Slalom rankings serve multiple purposes within the paddling community: assessment of personal progress, eligibility for certain competitions, evidence of qualification for training funds and sponsorship, and more. It is in the best interest of the community as a whole, as well as of its members, that these rankings be made as accurate as possible -- especially because the more accurate the rankings are, the fairer they are, and there is no higher obligation on any sports organization than promoting fairness.
This paper explains some of the basic concepts of sports ranking systems
and then applies those concepts to the current ranking system, with
the purpose of identifying problems with the current system.
A proposal is made for a ranking system which substantially resembles
the current one, but which attempts to remedy those problems.
Finally, directions for future study (and hopefully thus for further
improvements) are discussed.
All sports ranking systems have the same goal: to produce
an ordered list of competitors based on observed performance.
Roughly speaking, they all have the same inputs (scores/results) and
the same outputs (a list). Most make some sort of attempt
to "level the playing field", whether by factoring out human bias
or by adjusting scores for circumstances.
All sports ranking systems can be checked against reality in the same
way, by looking at the correlation between expected results (as
predicted by the ranking) and observed results (as actually
experienced in competition). This correlation is rarely, if ever,
perfect; but it does provide a first-order estimate of the
accuracy of the ranking system.
There are a number of problems faced by any ranking system.
Here are a few of the ones that relate to slalom rankings.
Sports ranking systems attempt
to reduce the performance of a competitor to a single number.
This is a drastic oversimplification of reality -- necessary as
it may be to produce an ordered list.
Let me try to explain that by example. If you were trying to characterize
the performance of a car, you might start with its elapsed time
accelerating from 0 to 60 MPH, a popular metric. But then you
could add stopping distance from 55 MPH...turning radius...
MPG city and highway...ground clearance...drag coefficient...
wheelbase...until you finally had a collection of thousands of
numbers, each one of which describes some aspect of the vehicle's
performance.
Now which one -- or combination of ones -- should be used to "rank"
the car against its peers?
The answer to that question is tricky, because in part it depends on
what the "rank" is designed to indicate. A rank designed to indicate
safety will be quite different than a rank designed to indicate performance:
in fact, it might use a completely different set of numbers.
The problem is the same in the sport of slalom. We could try to
come up with metrics that describe the performance of paddlers on big-water
courses and tight technical ones; natural and artificial; ones with
and without major holes, left-handed or right-handed offset moves,
cold or hot weather conditions, and so on. Eventually we'd have
any number of measurements, each one of which describes some aspect
of the paddler's performance. But which one or ones should we use
for comparison?
Some sports have a fixed competition which varies little, if any,
and can be used as a universal metric for assessing performance.
For example, the 100-meter dash (which is essentially the same
wherever it's contested, perhaps with an adjustment for altitude
or wind-assist), or the pole vault. This makes ranking relatively
easy, because it means that competitors anywhere in the world can
be compared against each other solely on the basis of a single
number (elapsed time or height).
Slalom is not like this (with the sole exception of the national
pool slalom). Courses vary dramatically, sometimes even
on the same river on the same day. (E.g. Amoskeag/Junior Olympic
Qualifier 1998, where the water was visibly rising throughout the race.)
At any given site, water level, gate placement, weather conditions,
and other factors all influence how difficult the course is.
This makes comparisons between races problematic -- even if they're
held on the same site on consecutive days. We simply can't assume
that such multi-day events are of the same difficulty.
Nor can we assume that "major" events are difficult, and that "minor"
events aren't, or vice versa. (Yes, in general, this is roughly true;
but there are numerous exceptions, enough so that the standard deviation
is large and therefore the assumption is invalid for rankings purposes.)
Paddlers are human beings and don't always perform the same way
every time they race. So which performance(s) actually reflect
their true ability? Their best? Worst? Median? Average?
Average with the best and worst removed? Or something else?
Polls are notoriously subject to error due to missing information,
disregard for available information, personal bias, geographic bias,
and a myriad of other problems. Such systems can't be taken
seriously; they're just thinly-disguised popularity contests.
Some sports attempt to do rankings solely by head-to-head competition.
This is less reliable than perhaps it might appear on the surface.
For example, the NCAA basketball tournament provides 63 head-to-head matchups.
It's thus sometimes cited as an example of a system which provides a
thorough assessment based on head-to-head competition. But of the
2016 possible head-to-head matchups available to a field
of 64 teams, it only encompasses 3.1% of them; and half the teams
involved play only one game, yielding a single data point each.
This is hardly a significant data sample, and certainly not one
which would support any but the most general conclusions.
Head-to-head results
are not always the best indicator of how two athletes or teams
actually compare, because they may be heavily influenced by
the particular match-ups which do or don't occur -- and
in the case of a sport as geographically dispersed as slalom,
most of the possible head-to-head matchups will never happen.
Systems which attempt to use an algorithm
to determine which athletes/teams are better than others
are widespread.
Probably the most famous and successful of these systems are the ones
devised by Jeff Sagarin -- an MIT grad who has been refining
his algorithms for many years.
The problem isn't the output of the system -- Sagarin's is
quantifiably superior to its competitors. The problem is that
increasing the accuracy of a ranking system requires increasing
the complexity of the algorithm, which may make it difficult
for participants in the sport to understand.
For example, Sagarin's methodology encompasses the concept
of a "good loss" -- wherein a team that lost a game may move
up in the rankings because it did so by a small margin
against a vastly superior opponent. This is completely logical
within the rankings system, but confuses many observers who
do not grasp the overall methodology.
If this problem didn't exist, ranking systems would be much simpler.
However, it's quite common, and so most ranking systems make some
sort of attempt to deal with it gracefully. Most do this by going
beyond the binary (A:1, B:0) and using margin-of-victory (A:72, B:65)
in each competition to assist in the ordering. Others factor in
home court/field advantage, or difficulty of that particular
competition, or use other means to attempt to correctly order
A, B, and C in such a circumstance.
This is a substantial problem for ranking systems where only
a relatively small number of competitors are involved. In the
case of slalom, where the number is now over 1000, it's a major problem.
The best way to assess the relative accuracy of a ranking system is to
check it against the original data -- the results. An accurate ranking
system should show a high correlation to results.
That's not the end of the matter, though. One of the many questions
that can be raised is "Which results?". For example, the
Sagarin rankings of NCAA Division 1 basketball teams are designed so
as to weight recent results more heavily than older ones; this is done
in order to provide a ranking which represents the current strength
of each team, not the team's strength throughout the season.
This is neither good nor bad: it's just a design choice in the algorithm
which needs to be taken into account when assessing the algorithm's
accuracy.
Another question is "How should that correlation be measured?" To continue
the example above, critics of Sagarin's rankings often point out that
they are not especially effective at predicting the results of individual
games. They miss the point that the rankings are not designed to
be effective at predicting single outcomes, and therefore that critiques
based on their failure to do so are unfounded.
The real answer to assessment lies in the design goal that the
ranking system is designed to address: it should be measured solely
on the basis of how well it meets that goal.
The algorithm used to compute these rankings is the same as that used in
1996, 1997, and 1998. Here's a step-by-step explanation of it; this
explanation is close to what's already on the NWSC web site, but this
one has been edited to make it a bit more complete, a bit more up-to-date,
and hopefully a bit more understandable.
3.1. Results from races come in a variety of formats; in order to prepare
them for subsequent steps, each race's data is rewritten into the same format,
which looks like this:
The fields mean exactly what they mean on a standard slalom scorestrip.
3.2. The classes for each race are translated from the many names that show up
in results to a list of canonical racing classes. In other words, the "race classes"
are turned into "ranking classes". Here's an example of part
(the full table has 526 entries) of the translation table used to do this:
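Since the full table isn't reproduced here, a minimal Python sketch of this translation step may help; the class names in this fragment are illustrative stand-ins, not entries from the real 526-entry table:

```python
# Hypothetical fragment of the translation table mapping the many
# "race class" names found in raw results to canonical ranking classes.
# (The real table has 526 entries; these examples are made up.)
CLASS_TABLE = {
    "K1": "K-1",
    "K-1 Men": "K-1",
    "Mens Kayak": "K-1",
    "K1W": "K-1W",
    "Womens Kayak": "K-1W",
    "C1": "C-1",
}

def ranking_class(race_class):
    """Return the canonical ranking class, or None if the class
    isn't one that gets ranked (open boats, squirt boats, etc.)."""
    return CLASS_TABLE.get(race_class.strip())
```

Classes that translate to None are then dropped in the next step.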
3.3. Results for classes which don't currently get ranked -- for example,
open boats, squirt boats, sit-on-tops, etc. -- are dropped.
3.4. Scores with DNS, DNR, or DNF are interpreted numerically, with 999.99 used
for every one of those. Mostly, this is just used to gather per-race
statistics, because races where someone DNF'd don't count toward their
ranking.
3.5. If a race was scored with 2 seconds for touches/50 seconds for misses,
nothing happens in this pass. But if it was scored with 5 seconds/50
seconds, the penalties are recalculated to the 2/50 system. This is done
via a look-up table and a small algorithm. The impact of mixing results
like this is negligible: in 1998 and 1997 I computed rankings both ways
and the differences in final rankings were insignificant.
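A minimal sketch of the penalty recalculation. Note an assumption: a penalty total alone is ambiguous (50 seconds could be one miss or ten touches), so this sketch decomposes misses-first; the real lookup table presumably resolves such cases from per-gate data:

```python
def convert_penalty(old_penalty):
    """Convert a penalty total scored under the 5s-touch/50s-miss
    system to the 2s-touch/50s-miss system.

    Assumption (hypothetical): misses are taken out first, then the
    remainder is treated as 5-second touches. Ambiguous totals like
    exactly 50 would need per-gate data to resolve correctly.
    """
    misses, remainder = divmod(old_penalty, 50)
    touches = remainder // 5
    return 2 * touches + 50 * misses
```

For example, three touches (15 seconds under the old system) become 6 seconds under the new one.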
3.6. Every paddler's name is converted to canonical form, which I've hopefully
spelled correctly. Here's an example:
The table used to do this has around 5000 entries, and covers every paddler
who has competed in a ranked US race for the last several years.
3.7. Every race result is converted into this form:
Class=Name Score Ratio
where "Score" is their combined-run total score for the race, and "Ratio" is
the ratio of their score to the best-score-of-the-day. For example, here are
part of the results from the Riversport Slalom in 1997:
In this table, you can see that Cathy Hearn's Ratio is 1.092; that was
derived by dividing Cathy's score (270.49) by the best combined score of
the day (Jason Beakes, 247.76), i.e. 270.49/247.76 = 1.092.
In English, this means "Cathy Hearn's score was about 109% of Jason Beakes's."
Boats which did not complete two runs are dropped at this point. (See previous
comment about how DNFs don't count toward rankings.)
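This step can be sketched in a few lines of Python, using the Riversport numbers above:

```python
def race_ratios(results):
    """Step 3.7: ratio of each boat's combined score to the
    best score of the day. `results` is a list of (name, score);
    boats without two complete runs have already been dropped."""
    best = min(score for _, score in results)
    return {name: round(score / best, 3) for name, score in results}

# Using the 1997 Riversport example from the text:
ratios = race_ratios([("Jason Beakes", 247.76), ("Cathy Hearn", 270.49)])
```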
3.8. The ratio from the previous pass is inverted to give a competitor's race
ratio: this number reflects how far off they were from the
best-score-of-the-day. (The boat with the best score of the day has a race
ratio of 1.000.) Two lookups happen: last year's rank class (A, B, C, D or
U for unranked), and membership on the national A team. (The reason for this is
that the strength-of-field assignment, which we'll get to later, is based on this.)
Again, using Cathy Hearn as an example, 1/1.092 (from previous table) = 0.916.
Or, in English, "Cathy raced about 91% as fast as Jason."
In the case of boats which competed in the same race class more than once
(e.g. K-1 Masters and K-1) only the better of those two results is used.
This is done in order to comply as best as possible with our rules concerning
competition in two age classes and to try to level the playing field. (Because,
for example, someone who is 41 can take four runs, while someone who is 39 can
only take two. It seems that the person taking four already has an advantage,
so we shouldn't give them an additional advantage by counting this as two
races instead of one.)
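A minimal sketch of the inversion and the better-of-two rule, again using Cathy Hearn's numbers from the text:

```python
def invert_ratio(ratio):
    """Step 3.8: invert the step-3.7 ratio so the boat with the
    best score of the day gets a race ratio of 1.000."""
    return round(1.0 / ratio, 3)

def best_per_boat(entries):
    """When a boat raced the same ranking class twice (e.g. K-1
    and K-1 Masters), keep only the better (higher) race ratio.
    `entries` is a list of (name, race_ratio) pairs."""
    best = {}
    for name, ratio in entries:
        best[name] = max(ratio, best.get(name, 0.0))
    return best
```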
3.9. Each race result is weighted by the race weight; the race weight is given by
race weight = (field strength + importance factor) / 20
where field strength and importance factor both have maximum values of 10;
thus the race weight has a maximum value of 1.000. The table of assigned field
strength and importance factors, along with the criteria used to make these
assignments, is here. Continuing the example above, and using Riversport's
1997 field strength of 9 and importance factor of 5 (thus giving a race weight
of 0.700 by the equation just above):
To continue the example, Cathy's unweighted ratio is 0.916; the race weight is 0.700,
so the weighted ratio is 0.916 * 0.700 = .641.
This number is a competitor's race weight: think of it as "how much credit
you get for doing this well at this race against this competition".
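The weighting step, sketched in Python with the Riversport 1997 numbers from the text (field strength 9, importance 5):

```python
def race_weight(field_strength, importance):
    """Step 3.9: race weight = (field strength + importance) / 20.
    Both factors max out at 10, so the maximum weight is 1.000."""
    return (field_strength + importance) / 20.0

def weighted_ratio(ratio, field_strength, importance):
    """A boat's race ratio scaled by the race weight."""
    return round(ratio * race_weight(field_strength, importance), 3)
```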
3.10. All results from all races are combined. If a paddler has done more than
three races, their best (highest) three race weights are selected.
These three best results are then averaged to give the competitor's Rank Ratio.
That Rank Ratio is then adjusted if they've done fewer than three races:
if a paddler has done only two races, they're assessed a 5% penalty; if only one
race, a 10% penalty. Using the same example as before:
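A minimal sketch of the best-three averaging and the short-season penalties:

```python
def rank_ratio(weighted_ratios):
    """Step 3.10: average the best three weighted race ratios;
    apply a 5% penalty if the paddler has only two races,
    a 10% penalty if only one."""
    best = sorted(weighted_ratios, reverse=True)[:3]
    avg = sum(best) / len(best)
    if len(best) == 2:
        avg *= 0.95
    elif len(best) == 1:
        avg *= 0.90
    return round(avg, 3)
```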
3.11. The results are sorted by rank ratio and separated by racing class.
In other words, all K-1's are listed in order from highest rank ratio
to lowest; all K-1W's are listed in the same order, and so on for the
other classes.
3.12. Within each class, a boat's percentile is computed. For example, in 1997,
Cathy Hearn was the top-ranked K-1W; she is thus assigned the 100.00 percentile.
All other K-1W's are then assigned a percentile based on the ratio of their
Rank Ratio to Cathy's. Continuing the example from above:
Taking the last boat as an example, Nancy Beakes' Rank Ratio was 0.654;
Cathy Hearn's was 0.841. Thus Nancy Beakes's percentile is 0.654/0.841 = 77.8.
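The percentile computation can be sketched directly from that example:

```python
def percentiles(class_rank_ratios):
    """Step 3.12: within a class, percentile = 100 x (rank ratio /
    top boat's rank ratio), so the top-ranked boat sits at 100.0."""
    top = max(class_rank_ratios.values())
    return {name: round(100 * r / top, 1)
            for name, r in class_rank_ratios.items()}

# 1997 K-1W example from the text:
pct = percentiles({"Cathy Hearn": 0.841, "Nancy Beakes": 0.654})
```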
3.13. A number of lookup tables are consulted. The first is a table which
is used to decide which letter class (A, B, C, D) the boat is in. Here's
that table:
In other words, a boat whose percentile is 77.8 is assigned to the "B" class.
Note that the current cutoff for automatic admission to Team Trials
is at the 75th percentile.
3.14. Each boat is assigned an ordinal number: the highest-ranked boat in
each class is "1", the second-highest is "2", and so on. Ties are handled
by assigning both boats the same number and skipping the subsequent one.
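The tie-handling rule (share the number, skip the next) is what's sometimes called standard competition ranking; a minimal sketch:

```python
def ordinals(ranked):
    """Step 3.14: assign ordinals within a class. `ranked` is a list
    of (name, rank_ratio) sorted from highest to lowest. Tied boats
    share a number and the next ordinal is skipped (1, 2, 2, 4...)."""
    out = []
    for i, (name, ratio) in enumerate(ranked):
        if i > 0 and ratio == ranked[i - 1][1]:
            out.append((name, out[-1][1]))   # same number as the tie
        else:
            out.append((name, i + 1))        # position-based ordinal
    return out
```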
3.15. If the paddler is a citizen of a country other than the US, a notation
to that effect is added to their name. The lookup table that I use for
this is slowly becoming more accurate, but I wouldn't be surprised to find
that I've missed someone.
However, the presence of non-US paddlers also has no impact on rankings,
since the breakpoints for class assignments as well as the cutoff for
automatic admission to Team Trials are assigned on a percentile basis, not
on the number of boats. All it really does is provide our guests with
an inkling of how they rank among people who have competed here in the
last year.
3.16. Where data is available, boats are marked by age group, e.g. "Jr",
"Ms", etc. I've not done this with 1999 rankings because of the
unreliability of the current data, but numerous athletes have remarked
that they find it useful to be able to compare themselves within
their age group.
Here's what the final result looked like in 1997:
To summarize what this table says: Evy Potochny was C-ranked; she was the
#56 K-1W in the rankings. Her best three races were Lehigh, Riversport
and Bellefonte; her Rank Ratio was 0.366 (which compares her to the
fastest boat overall) and her percentile was 43.5 (which assesses her
performance within the K-1W class).
3.17. In the case of rec boats (plastic, cruiser, etc.), all of the above
is repeated *except* that better-of-two instead of combined runs are used.
In order to provide an adequate statistical basis for comparison, the rec
boats are lumped together with the race boats to crunch through the numbers,
then the race boats are dropped out. This ensures that at races where the
overwhelming majority of boats are glass (e.g. Mid-America #2) that there
are enough boats to compare against. (And since the race boats are dropped
out of these calculations *before* the percentiles are calculated, people
who race rec boats aren't penalized for racing primarily against glass boats.)
Also, because rec boats haven't been previously ranked, I found it
necessary to assign guesstimates to a handful (4) of rec boats in order
to provide a starting point for computations. I minimized the number of
such estimates (because I loathe making up numbers, even when I can do
so with a high degree of confidence). It's also worth noting that if my
estimates are wrong, the errors thus introduced will diminish with each
iteration of rankings. To put it another way: each time the rankings
are run, the effect of my initial estimates decreases, so after a few
times through, even gross errors will disappear...and hopefully I didn't
make any of those. Let me demonstrate:
Here are the four estimates I made for rec boats:
These were arrived at by comparing performance in glass vs. performance
in plastic and were done only to make it possible to compute initial rankings
for rec boats. I think my estimates were reasonably close, given that
the final rankings for these boats were:
3.18. That's it. Please note that although all the calculations were done to
several decimal places, that does not mean that rankings are accurate to
that degree. For example, the difference between a rank ratio of .453 and .456
falls WELL within the variability of manual timing systems. And boats
which fall, say, at percentile 91 and 88, are essentially indistinguishable.
SUMMARY: The overall formula that's used in this system is:
race weight = (field strength + importance factor) / 20
where the field strength and importance are assigned from the table below.
(Any race falling into multiple categories in the table above is assigned
the factor from the highest category.)
This is actually a pretty good idea. As explained
above,
one of the problems with assessing slalom performance is
the absence of a standard across races (since races are of
different length, difficulty, etc.). The closest that we can
come is to use the fastest boat, which, as it turns out, is
usually one of the most consistent boats as well (which
means it makes a good "measuring stick" for the rest of the field).
Here's a little bit of data to back that up. This is the result
of analyzing the data from the 1999 season.
For each boat, I calculated the average of their two runs, then
used that to calculate the percent difference (for each run)
against that average. For example, the data for my K-1 runs at
the Codorus SL is 122.48 (1st) and 128.26 (2nd);
the average is 125.37; so the percent difference is 2.3%.
I then calculated the average across boats depending on how
close to the best time-of-the-day they were. The data shown
below came from 65 races -- the ones that I had full scores
for both runs.
What this shows is that faster boats are more consistent boats -- and
thus make a better "measuring stick" than slower boats. (Repeating
this analysis using different percentage cutoffs yields different numbers,
obviously, but the trend remains the same.)
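The per-boat part of that analysis can be sketched in a couple of lines, using the Codorus example from the text:

```python
def run_spread_pct(run1, run2):
    """Percent difference of a run from the boat's two-run average.
    E.g. runs of 122.48 and 128.26 average 125.37, and each run is
    about 2.3% off that average."""
    avg = (run1 + run2) / 2
    return round(100 * abs(run1 - avg) / avg, 1)
```

Averaging this spread across boats, grouped by how close they finished to the best time of the day, gives the consistency figures discussed above.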
As seen above in
step 10 of the current rankings system,
the numbers from each boat's best three races are averaged.
This tends to mitigate the effects of an exceptional
performance at one race, which might otherwise exert an
undue influence on a boat's rank. There are many open
questions here, though:
Unfortunately, there are no easy answers to any of these, but one
of the things I'm working on is a set of experiments to see what effect,
if any, different answers to those questions have on the rankings
thus generated.
It's probably worth noting at this point that using the best N races,
whatever they might be, helps alleviate the effect of old results
on rapidly developing paddlers. By that what I mean is that when
a paddler's results improve greatly during the course of a year,
the most recent ones will probably be used to determine their ranking.
This is a good property
of the ranking system -- it helps in providing some assurance that
the rankings reflect current ability. The rare exception to this
is a paddler whose ability significantly declines within a year --
and that's usually due to injury.
The more data that's used, the more paddlers that are included.
That in itself is desirable, because it reflects an inclusive
rather than an exclusive viewpoint.
But beyond that, the use of more data means that the number of
paddlers affected by the 5%/10% penalty drops, and it also means
that there are additional races which can be used as part of
each paddler's best three.
Let's look at the current ranking system's effect on boats by gender
by comparing races that are open only to the 4 ICF classes with races
that are open to all 7 US classes.
This means that, roughly speaking, over the past four years the boats
in the classes that the ICF recognizes have had about 250% (39 to 15)
as many opportunities to significantly improve their ranking
(by participating in races with a high importance factor)
as their counterparts in the non-ICF classes.
Here's a table showing the breakdown for each year, for each race, by where
it was held.
Every entry looks like (number), (state), e.g. "2,TN" means "2 races,
Tennessee".
The last two lines provide a summary by which half of the US (east/west of
the Mississippi) was involved.
Conclusion: Paddlers living west of the Mississippi will have to make many
long trips east to have any chance of improving their ranking.
Those who live west of the Rockies have it even worse; and those who live
in New England are also at a disadvantage. In fact, it could be
argued that paddlers living somewhere in the triangle between western PA,
southern WI, and central KY are ideally situated, since they were
within a day's driving time of 29 of these races.
It's sometimes asserted that the large imbalance to the east
is because most of the "good" slalom
paddlers are in the eastern US. It's true that most of the highly-ranked
slalom paddlers are in the eastern US, but that's in large part
because that's where the races are held that enable them to
be highly-ranked -- it's not necessarily because they're "good", per se.
It's a self-perpetuating system.
To put it another way: western paddlers are not
highly ranked as a group, NOT necessarily because they're bad paddlers;
but in large part because they have few opportunities to race at
events where they can raise their ranking.
Let's look at the effect of paddler age on their opportunities to improve
their ranking; in particular, let's examine the opportunities to compete
at race with high importance factors depending on age.
What's this mean? It means that a C-1 paddler who is 17 has somewhere
between two and eight times the opportunities of his 19-year-old
competitor to boost his ranking -- depending on how good each is.
In the case of a strong intermediate 19-year-old C-1 who is good enough to
compete at Nationals but not good enough to make Team Trials, this means
that this boat has one chance a year to compete in a heavily-weighted
race... and if he can't make it to that race because of distance,
scheduling or other factors, then he will remain ranked as he is,
will not be prequalified to the following year's Sr Team Trials, and
the cycle will repeat.
The importance factor table still lists the CIWS qualifiers and
finals, even though these races have not existed for several years;
similarly, the USOF and USOF qualifiers, which haven't been held
since 1995, are also listed.
The importance factor table also does not make it clear which races/series
should be considered "major", which "regional", and which "local";
nor does it make it clear how to differentiate these from a "C-D race series".
For example,
is the Esopus June race a "major double-header" -- because it's held on
class III water and is the only doubleheader in the entire northeast;
or is it a local/regional race? The statement that the highest of the
applicable classifications should be used is of some help here, but does
not clear up all the uncertainty.
The importance factor biases mentioned above often
combine to make it difficult even for
talented paddlers to be ranked as they should be.
Consider the combined effect of the geographic and age biases
on a hypothetical C-1 paddler who is 25, lives in the Pacific Northwest,
and is currently ranked around #40-45 -- which would make him
a strong intermediate/advanced C-ranked boat. (1999 C-1 rankings
#40-45 are Renner, Harris, Baldwin, Denz, McEwan, Criscione).
Consider that during the last five years, this hypothetical paddler has had
only three races with importance >= 8 within 1000 miles of his home -- and
because of his age, he could only compete in one. What chance has
this paddler had to improve his ranking? He may really *be* a C-1
which should be ranked in the 40's; or he may be a C-1 that's a top-20
boat but which simply isn't ranked there due to
the rankings system.
There are other combinations of these biases which also work against
the fair assessment of paddler ability: try working through that example again
with a 17-year-old C-1W, for example. And if you do the math, you'll find
that this boat has almost no chance whatsoever to achieve a rank ratio
comparable to B-ranked C-1's regardless of how good it is -- because
this boat can't attend Sr or Jr Team Trials and thus is excluded from 5
of the most heavily-weighted races in the country.
No ranking system, no matter how well-designed, can address all of these
issues. For example, the awarding of bids to major races is beyond
its scope. But it can certainly attempt to minimize the effects caused
by these external decisions.
This is the most obvious problem, and affects the first three entries
(4 team boats=10; 3 team boats = 9; 2 team boats = 8)
in the table. Since the
formula
which is used to compare each competitor's performance normalizes it
to the best score of the day, the only relevant field strength
is the strength of that boat. The presence or absence of
other boats does not affect the performance of either -- at least
not in any objective, quantifiable way.
To put it another way, if a competitor finishes at 115% of
Scott Shipley's time, we do not need to and should not consider
whether Jason Beakes or Cathy Hearn or Steve Conklin or anyone
else showed up at the race -- because they have no effect on
either performance. (We could speculate that Scott will paddle
faster if he has to race against Jason than, say, against a field
comprised entirely of B/C-ranked paddlers. But that's just speculation,
and there does not exist a way to confirm or deny this based on
measurable data.)
The first four entries in the table of field strength are
determined by how many national team athletes are at a race.
(I actually use "boats", not "athletes", because it avoids
counting C-2's twice.)
Let's look at the fourth entry for a moment -- without loss
of generality.
That entry says that if a single national team athlete competes
at a race, the field strength is 7.
But not all national team athletes (boats) are equally fast:
consider this table of 1999 national team boats,
ordered by rank ratio as determined in 1999 rankings:
The problem is that no matter which of these boats competes
at a race, the field strength for the race will be set to 7. Yet there's
a clear variation in measured performance between them -- in fact,
it's almost 25% easier (if I may use the word "easier" when talking
about racing against national team members) to race against
the C-2 team of Long/Long than it is to race Shipley in K-1.
A paddler who posts a score at 125% of Long/Long's would be expected
to post a score at about 156% of Shipley's; yet with the current
field strength assignment, their results would be weighted
the same way in either case.
This analysis can be repeated for the first three lines in the table
to generalize the problem...and that is, that assigning
a field strength based on national team membership presumes that
all national team boats are equal -- which they're not.
This is a problem similar to the preceding one, except that the
issue isn't boats on/not on the national team, it's boats within
a class. Consider line 6 of the table, which assigns a field
strength of 5 to any race where the fastest boat is B-ranked.
Here's a table which shows the highest and lowest B-ranked boats
from 1999 in each of the four ICF classes, ordered by rank ratio:
There's roughly a 36% difference between fastest and slowest; yet if either
attends a race, the field strength will be 5.
In 1997, Jana Freeburn finished 1st in K-1W on the first day of
Sr Team Trials, and 2nd in K-1W on the second day. I believe that
by the qualification rules in place at the time, this would have
made her the #1 K-1W on the US Team; however, she declined the spot.
This skews the strength-of-field for any 1997 race at which her presence
set the field strength (because it would be set to 6, for an A-ranked
athlete, not 7, for a national team member). And it skews the
strength-of-field for any race at which she was one of several boats
determining field strength, for the same reason.
Granted, this is a relatively rare set of circumstances, but it's
one that should not have an effect on rankings.
The international calendar varies from year to year, but during
significant parts of each season, the national team is out of the US.
Any races held during that time period will have a maximum field
strength of 6, simply because few (if any) national team members
are available to compete in them.
Races which are held near places where national team members live
occasionally have them as competitors; races in other areas don't.
For example, the Penn Cup Riversport race in 1997 had three national
team members racing in the expert class, mostly because of its
proximity to Washington, DC. However, it's quite rare for a race
west of the Rockies to have national team members on hand.
The problems with both of these combined to cause some races to be
overweighted (Nationals, Jr Olympics) and others to be underweighted
(Ocoee DBH, Rattlesnake, Animas). The result is a rankings system
which enables paddlers to move up in rankings by doing relatively
poorly at "important" races that are also attended by national
team members -- and keeps paddlers who, by virtue of geography,
gender, age or other constraints, perform well at "unimportant"
races against strong competition, from doing the same.
This leads to a large number of anomalies in the rankings: paddlers
who are ranked ahead of others that they have never come close to
beating head-to-head, but who didn't happen to attend a sufficiently
highly-weighted race. These anomalies in turn cause the ranking
numbers to correlate less well with actual results than if
they were not present. They also cause paddler confidence in the
accuracy of the rankings to decrease, because in some
cases, they're particularly egregious.
In order to address as many of the problems listed above as possible,
while simultaneously making evolutionary rather than revolutionary
changes, what I propose to do is change the way races are weighted.
Specifically, I propose to replace field strength with "rank ratio
of athlete with best score of race" and importance factor with
"difficulty factor".
These changes would revise the current formula to:

    race weight = (rank ratio of fastest boat + difficulty factor) / 2     (Equation 6.1)

(Note that the change in the denominator from 20 to 2 is just to
keep things normalized: rank ratio and difficulty factor both
range from 0 to 1, so the highest possible weight
for a race is 1.0, just as it is with the current formula.)
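The revised weighting is simple enough to express directly. Here's a minimal sketch (the function name is mine, not the paper's):

```python
def race_weight(fastest_rank_ratio, difficulty_factor):
    """Proposed race weight: the average of the rank ratio of the boat
    with the best score and the race's difficulty factor.  Both inputs
    run from 0 to 1, so dividing by 2 keeps the weight in 0..1."""
    return (fastest_rank_ratio + difficulty_factor) / 2
```

For example, a race whose fastest boat has rank ratio 0.450 and whose difficulty factor is 0.428 would be weighted (0.450 + 0.428) / 2 = 0.439.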
Using the rank ratio of the boat with the best score of the race
immediately addresses the problems encountered with field strength:
it does so by providing a much more accurate metric than simply
noting that the fastest boat was "B-ranked". In other words,
rather than just setting this to "5" as in the current system,
this number would vary between .850 (if Richard Dressen is
the fastest boat) and .542 (Heather Warner). If we presume
that these reflect actual performance differences -- and part of
the goal is to make them do so -- then these provide a fine-grained
way of assessing relative performance against a known standard.
Another way of saying this is that while the fastest boat is not
an ideal measuring stick for the rest of the field, it is
the best one (and perhaps the only one) available.
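To make the mechanics concrete, here's a sketch of how a single boat's weighted ratio for one race would be computed, consistent with the inverse ratios shown in Tables 3.4 through 3.6 (the helper name and signature are mine):

```python
def weighted_race_ratio(boat_score, best_score, race_weight):
    """One boat's weighted ratio for one race: the inverse of its
    ratio to the best score (so the fastest boat gets 1.0),
    scaled by the weight of the race."""
    return (best_score / boat_score) * race_weight
```

So a boat finishing at twice the winning score, in a race weighted 0.7, would get 0.5 * 0.7 = 0.35.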
The difficulty factor is more problematic. A previous approach to
this problem tried to use gradient, length, and other physical
properties of the course to assess difficulty. The problem with
that approach is that it doesn't take into account water
flow, how the course is set, weather conditions, or any of the
myriad other variables that determine how hard a course actually
is to paddle fast and clean.
But if these properties -- which have the advantage of being directly
measurable, and therefore immune to human bias -- aren't sufficient
to determine difficulty, what is? How can we devise a system which
reflects what actually happens on courses, e.g. "The course for
the second day of the Esopus Doubleheader was substantially easier
than the first day."?
The answer, I think, is to use the paddlers themselves to measure the
course -- in particular, the paddlers who perform well and who
therefore, as shown above, perform consistently.
To explain: consider a course about which nothing is known. Send
four boats down it -- A, B, C, and D-ranked K-1's. If the results
look like this:
then clearly it's a course of considerable difficulty: enough to make
the B-ranked and C-ranked paddlers have slow runs and/or miss
a number of gates,
and to put a D-ranked paddler in the water.
But if the results look like this:
then clearly it's a much easier course: the C-ranked paddler is not
all that far off the A-ranked one, percentage-wise, and so on.
This method can be generalized to use with almost any field of boats
on almost every race course: think of each boat's run as a "probe"
which reveals something about the course. Lots of A-ranked boats
missing gates? Must be hard. Lots of C-ranked boats turning in
clean runs? Must be fairly easy.
The problem is not this concept -- everyone who races is accustomed
to looking at race scores and guesstimating the difficulty based on
who appears to have had trouble and who didn't. The problem is trying
to turn this intuitive concept into a reliable mathematical metric that can
be used to weight races.
The current answer -- and I say current because I'm continuing
to research this and look for better answers -- is to use this average:

    difficulty factor = average of (race ratio x rank ratio), taken over
    all boats whose score is within 150% of the best score     (Equation 6.2)
This is a lot easier to understand with an example.
So let's take the above two hypothetical four-boat races, and use them
to illustrate how this works.
Granted, this is just a hypothetical example to illustrate how the
calculations are done, but it turns out that this actually works
quite well for most real races. (Where it doesn't work well is when
there are very few boats in the race, or very few boats within 150%
of the top boat. This tends to happen at races that are geographically
remote or at which a single national team member participates as part
of a relatively small and inexperienced field. See below for adjustments.)
The reason for the constraint that boats be within 150% of the fastest
boat in order to be used to calculate this difficulty metric is that
this uses the most consistent boats available at this particular
race to assess the difficulty. It's a compromise number: make it
too small and too few boats are included to make the average useful;
make it too large and too many inconsistently-performing boats are
included, clouding the picture of how hard the course really was.
("Does it appear to be difficult because it really was,
or did several C-ranked boats
just have a really bad day on a III+ river?")
There's one other piece to this: in order to avoid having a race
with few participants end up being heavily weighted in terms
of difficulty, the difficulty factor is
adjusted 5% downward if fewer than 20 boats are used in the average,
and 10% downward if fewer than 10 boats are used.
This doesn't mean that the race is "easier"; it means that the data
sample size is too small to make an accurate estimate, so we
choose to err on the conservative side. Note that the more races
(and thus paddlers) that are included each time rankings are run,
the less of a problem this will be.
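Putting the pieces of this section together, here's a sketch of the difficulty computation: average the (race ratio x rank ratio) products over the boats within 150% of the best score, then apply the small-sample adjustments. The function name and input format are my assumptions; the metric itself follows the text and Tables 6.3/6.4:

```python
def difficulty_factor(results, cutoff=1.5):
    """Difficulty of a race under the proposed metric.

    `results` is a list of (combined_score, prior_rank_ratio) pairs;
    a DNF is represented by a score of None."""
    finishers = [(score, rr) for score, rr in results if score is not None]
    best = min(score for score, _ in finishers)
    # Use only the boats within 150% of the best score.
    products = [(score / best) * rr
                for score, rr in finishers
                if score / best <= cutoff]
    average = sum(products) / len(products)
    # Small-sample adjustment: err on the conservative side
    # when few boats qualify for the average.
    if len(products) < 10:
        average *= 0.90
    elif len(products) < 20:
        average *= 0.95
    return average
```

Applied to the moderate example race (Table 6.4) the raw average is .766; with only three qualifying boats, the 10% small-sample adjustment brings it down to about .690.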
Here's what the difficulty factor, rank-ratio-of-fastest-boat, and resulting
race weight looked like in 1999:
For the most part, this aligns rather well with what
one might expect:
the 5 races on the Ocoee are at the top; races
like Trials Qualifiers, Nationals, and
a few other hard ones (e.g. Rattlesnake) follow;
races of moderate difficulty such as Amoskeag
and Nooksack and Punchbrook are in the middle,
and easy races like Lehigh and Farmington are
near the bottom.
Note that all three days of Sr Trials wind up weighted within 3.4% of each
other -- which is quite close to what one might expect for races whose courses
are carefully designed to be difficult and consistent,
and whose competitors include nearly all current team members.
But there are some things that I'd consider anomalies: the Esopus
and Payette races are probably weighted too low, and the races
at Bellefonte (Bellefonte, Dog Days, June Jamboree) are weighted too high.
However, the good news is that these anomalies
aren't as large as the ones generated by the
importance-factor/strength-of-field system.
In other words: this isn't perfect; but it's better than what we have.
Part of the problem is that these 1999 numbers were calculated using
1998 rank ratios as the starting point, and by all the arguments in
Section 5 we already know that some of those rank ratios are way off.
(For example, most of the paddlers in the west are probably under-ranked;
thus races like Nooksack, Snoqualmie, and SalmonlaSac aren't weighted
as highly as they ought to be.)
Since this is an iterative process, with each year's rank ratios
used as the starting point for the next year's calculations,
it'll take a cycle or two through to work those problems out.
(In other words, each time an estimate is made, the effects
of a previously inaccurate estimate, if any, will diminish.)
This would be a good point to revisit the critique of the current
system outlined in Section 5 and see if the proposed system
addresses those points.
The issues pertaining to importance factor
all disappear because the importance factor isn't used.
The issues with field strength also go away, for the same reason.
That's not to say that there aren't issues. For example, replacing
the importance factor with a difficulty factor just means that a new
set of issues will have to be addressed. Among those are questions
such as "Should difficulty account for half the weight of a race?",
"What's the most accurate metric for difficulty?", "Is that metric
stable when the race is sparsely attended, or when the field is of
scattered strength?" The same thing goes for the other part of
the race weighting formula, the rank ratio of the fastest boat:
"What about once-in-a-lifetime runs by a single boat that skew
the race weighting?" and "Since the fastest boat is almost always
a K-1, what effect will this have on race weights?" and "Should the
strength of the fastest boat account for half the weight of a race?"
Those are not easy questions to answer -- although I think they're
much easier to deal with than the issues posed by the current system.
But those questions aside for a moment,
does this proposed system result in rankings that are more accurate
than the current one?
For that matter, how can we measure how accurate the rankings are
from either system?
For that, we need to come up with a way of calculating how well
estimated performance (the rankings) matches up with measured
performance (the results).
One way to assess how well a ranking system approximates reality
is to compute the mean-squared error between (a) actual results
and (b) what the ranking system indicates. This isn't necessarily
the best way, but it has the advantage of being computationally
simple -- and it's certainly good enough to indicate the presence
or absence of large errors.
Let's try an example, using the hypothetical results from our
four-boat race of moderate difficulty, and rank ratios assigned by some
hypothetical ranking system "1":
Now let's try it for another hypothetical ranking system "2" -- same
race, same results, but different rank ratios:
It's a contrived example -- but it does show that for this particular
race ranking system 2 was considerably closer to reality
than ranking system 1.
One thing to note about this: ranking system 2 had a poorer estimate
of the rank ratio of our hypothetical C-ranked paddler than ranking system 1;
however, it had much better estimates for the B- and D-ranked paddlers,
enough so that it came out ahead in the overall comparison.
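Here's a sketch of the per-race mean-squared-error computation used in these examples. Following Tables 6.6 and 6.7, the average is taken over the boats other than the fastest, whose error is zero by construction. Function name and argument layout are mine; small differences from the tables' totals come from rounding in the tables' intermediate values:

```python
def race_mse(scores, rank_ratios):
    """MSE between observed and predicted performance for one race.

    Observed:  each boat's race ratio (score / best score).
    Predicted: rank ratio of the fastest boat / each boat's rank ratio.
    """
    best = min(scores)
    top_rr = rank_ratios[scores.index(best)]
    errors = [((score / best) - (top_rr / rr)) ** 2
              for score, rr in zip(scores, rank_ratios)
              if score != best]
    return sum(errors) / len(errors)
```

For the moderate example race this yields roughly .47 under hypothetical ranking system "1" and roughly .22 under system "2" -- system 2 is the closer match.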
The trick is in making this work for all races, not just one: and
the problem with that is that
a set of rank ratios which matches up beautifully with the results for
one race may do quite poorly when checked against a different one.
What this leads us to is:
Ranking system 1 is better than ranking system 2 if the average (over
all races) mean-squared error between predicted and observed results
is less for 1 than for 2.
It's worth noting that this doesn't say anything about the internal
workings of 1 and 2: and that's because the exact algorithms used
aren't relevant to this. All it says is that whichever ranking system
better matches the original data -- which are the only measurements
we really have -- is a better ranking system.
It's also worth observing that NO ranking system will reduce the
average mean-squared error to zero. Not only will any occurrence of
the "A beats B, B beats C, C beats A" problem prevent this from
happening -- but any time A beats B by any percentage other than the
ratio of their rank ratios, the mean-squared error will be nonzero.
In practice, this isn't that much of an issue: where the measurements
provided by the rankings are critical (for funding, team trials
eligibility, etc.) they are almost entirely determined by head-to-head
competition at a handful of races -- and that in turn mitigates the
effects of this problem. (Data point: 1999 US Senior Team members account
for 129 results in this year's overall results. Team Trials, Ocoee DBH,
Nationals and NOC combined accounted for 100 of those.)
To restate that last paragraph in another way: every ranking system
will show some anomalies, because the nature of the results makes it
impossible to eliminate them all. The accuracy of the ranking system
does not depend solely on the number of anomalies, but also on
their severity.
The current NWSC ranking system produces an average mean-square error (MSE)
of .387 (for 1999 races); the proposed system yields .307. One experiment
(see Section 7.5) yielded an average MSE of .280.
So at first glance, that means the proposed system is about 21-27% more
accurate, even though it was seeded with numbers from the current system.
My expectation is that, when seeded with numbers generated from itself,
it would improve even further.
Here's a look at the top 10 K-1W's, C-1's, and K-1's
for 1999 as computed by the current and proposed ranking system.
(No slight meant to the C-2's; just trying to illustrate the
point without too much more data.)
Most of the numbers match up quite closely between the two systems:
one example of a major shift that becomes immediately apparent
is Shaun Smith in K-1, who doesn't even make the top 20 under the
current system, but ends up in 8th under the proposed one.
It's worth looking at this example to understand why such
a discrepancy occurs, and why it can be argued that the latter
ranking is the more correct one.
The reason now becomes apparent: Shaun turned in two of his best three
performances of the year at the Ocoee DBH, where the only people who
beat him were current and former national team members. Under the current
system, the weight on Trials is so high and the weight on the Ocoee DBH
is so low that his second-best performance of the year -- 8% off Scott Shipley
at Ocoee DBH #1 -- doesn't even count as one of his best three races.
However, under the proposed system, both Ocoee races are weighted much
more heavily, and as a result, Shaun's final rank ratio reflects what
he's capable of.
Similar analyses can be repeated for all the differences between the two
ranking systems:
it's important to remember, though, that neither one is correct
in the sense of being demonstrable fact. Both are estimates, and the
assessment of which is more accurate can't be made on the basis of
one or a handful of individual rankings, but must be made across
the entire set of rankings.
The difficulty metric proposed in
Equation 6.2
was derived empirically -- mostly because there
doesn't seem to be a theoretical basis to work from.
But it's by no means definitive: perhaps there are other difficulty
metrics which would yield better results.
For example, I've already tried different percentage cutoffs;
no doubt further investigation can be done there.
There have also been suggestions
that using raw times or times-with-50's-but-not-touches or some other
functions may yield better results. That's possible; the only
way to find out is to do the experiments.
Similarly, the overall race weighting formula
( Equation 6.1)
may not represent the best way to balance the difficulty
of the race with the strength-of-fastest-boat.
One experiment that I've tried was to multiply those factors
rather than add them; the result is highly nonlinear behavior
in the resulting rankings.
Is three the right number of races to use? Or would using
four improve the results? What about using only three races,
but selecting them by throwing out the best/worst races for
each boat?
Should older results be devalued? Or does the selection
of the best three (or equivalent) automatically handle
this because older races will eventually not be used
to compute the current ranking?
This might seem counter-intuitive at first glance, and even at second:
but it's not. Because we select the best three races for each boat,
any boat that does a sufficient number of races will eventually
manage to have two good runs on the same day; that race is likely
to be one of their best three, and may even dominate the average.
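For reference, a sketch of the best-three selection itself (how boats with fewer than three results are averaged is my assumption: use whatever is available):

```python
def boat_rank_ratio(weighted_ratios, n_best=3):
    """A boat's rank ratio: the average of its n_best highest
    weighted race ratios from the season."""
    top = sorted(weighted_ratios, reverse=True)[:n_best]
    return sum(top) / len(top)
```

Experimenting with n_best -- four races? best three after discarding the extremes? -- is exactly the kind of question Section 7.3 raises.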
We already do this: each year's rankings are used to compute
the following year's. The problem is that the presence of
many fast-developing paddlers skews those results. A paddler
who was of high C strength in 1998 may be competing at a high B
level in 1999 -- but because that paddler's rank ratio, as used
in the race weight, is from 1998, that won't be adequately reflected.
A possible solution to this is to compute and publish rankings
more often. Possible times include just after team trials, mid-summer
(before nationals), late October (when the season is mostly over),
and late winter (before the season starts).
A lot of people inside and outside the slalom community have
contributed ideas and questions.
I'll try to list them all, and hopefully not be remiss by omitting anyone:
Jonathan Altman, Karin Baldzer, John Brennan, Bob Campbell,
Chris Carter, Chuck Cooper,
Lee deWolski, Oliver Fix, Renee Gelblat, Tom Gelder, Bert Hinkley,
Peter Kennedy, Dave Kurtz, Keech LeClair, Brian Parsons,
Sylvan Poberaj, Joel Reeves, Mike Sloan, Merril Stock, Max Wellhouse.
Joan Schaech proofread this and sanity-checked the content multiple
times for intelligibility. Any remaining errors are mine.
John Koeppe has made a number of helpful comments about revision 2.0
of this paper, some of which are included in 2.1, and others of which have
led to additional study that will be included in future revisions.
Many, many race organizers have passed along results which have enabled
a sizable quantity of data to be amassed and studied. Thanks to their
efforts, the 1999 rankings used more data and included more paddlers
than ever. Again, I'll try to list them and hope I didn't leave
someone out:
Charlie Albright, Scott Bowman, Chris Carter, Mark Ciborowski,
Linda & Mark Davidson, John Day, Lee deWolski, Wayne Dickert,
Steve Exe, Jennie Goldberg, Kirk Havens, Sonny & Amy Hunt,
Ray Ingram, Don & Paula Jamison, Ralph Johns, Barbara Kingsborough,
Dave Kovar, Dave Kurtz, Ben Kvanli, Keech & Ann LeClair,
David Martin, Sean McCarthy, Peggy & Dave Mitchell, Randolph Pierce,
Mark Poindexter, Bob Putnam, Bob Ruppel, Marilyn & Wayne Russell,
Ted Ryan, Susan Saphire, Walt Sieger, David Sinish, Dave Slover,
Merril Stock, John Trujillo, Boo Turner, Tom Vollstedt, Don Walls,
Henry Wight, Rick Wright, Andreas Zimmer.
2. Sports ranking systems in general - goals, issues, assessment
2.1. Goals
2.2. Issues
2.2.1 Multivariate performance
2.2.2. Absence of standard
2.2.3. Repeatability
2.2.4. Human bias
2.2.5. Head-to-head only
2.2.6. Algorithms
2.2.7. The A beats B, B beats C, C beats A problem
2.3. Assessment
3. Explanation of the current NWSC ranking system
Canonical form for race results

  Class  Name(s)  Time-1  Penalty-1  Total-1  Time-2  Penalty-2  Total-2  Better-Score  Total-Score
Table 3.1
Class name translation table (excerpt)

  Race Class    Ranking Class
  C-1W          C-1W
  C-1W Jr       C-1W
  C-1W Junior   C-1W
  C-1W expert   C-1W
  C-1           C-1
  C-1 (A/B)     C-1
  C-1 A/B       C-1
  C-1 C/D       C-1
  C-1 Cadet     C-1
  C-1 Expert    C-1
  C-1 Jr        C-1
Table 3.2
Athlete name translation table (excerpt)

  Name in results    Name in Rankings
  Carleton Goold     Goold, Carleton
  Carleton Gould     Goold, Carleton
  Carlton Gould      Goold, Carleton
  Goold, C           Goold, Carleton
  Goold, Carlton     Goold, Carleton
  Gould, Carleton    Goold, Carleton
Table 3.3
Combined-run scores and ratio to fastest

  Class  Name(s)          Combined Score  Ratio to Best
  K-1W   Thomas, Natalie  471.65          1.904
  K-1W   Potochny, Evy    462.75          1.868
  K-1W   Gelblat, Renee   531.46          2.145
  K-1W   Hearn, Cathy     270.49          1.092
  K-1W   Weld, Kara       272.07          1.098
  K-1W   Beakes, Nancy    317.22          1.280
Table 3.4
Competitor race ratio and rank

  Class  Name(s)          Inverse Ratio to Best  Current Rank/Team
  K-1W   Thomas, Natalie  0.525                  C
  K-1W   Potochny, Evy    0.535                  U
  K-1W   Gelblat, Renee   0.466                  C
  K-1W   Hearn, Cathy     0.916                  ATEAM
  K-1W   Weld, Kara       0.911                  ATEAM
  K-1W   Beakes, Nancy    0.781                  B
Table 3.5
Weighted race ratios

  Class  Name(s)          Weighted Ratio
  K-1W   Thomas, Natalie  0.367
  K-1W   Potochny, Evy    0.374
  K-1W   Gelblat, Renee   0.326
  K-1W   Hearn, Cathy     0.641
  K-1W   Weld, Kara       0.638
  K-1W   Beakes, Nancy    0.547
Table 3.6
1997 K-1W rank ratios & best races (excerpt)

  Class  Name(s)          Rank Ratio  Best Races
  K-1W   Thomas, Natalie  0.350       Bellefonte,Riversport
  K-1W   Potochny, Evy    0.366       Lehigh,Riversport,Bellefonte
  K-1W   Gelblat, Renee   0.369       Lehigh,Codorus,Farmington
  K-1W   Hearn, Cathy     0.841       Trials-3,Trials-2,Nationals
  K-1W   Weld, Kara       0.824       Trials-3,Trials-2,Nationals
  K-1W   Beakes, Nancy    0.654       Trials-2,Trials-1,NOC-DBH-2
Table 3.7
1997 K-1W rank ratios & percentiles (excerpt)

  Class  Name(s)          Rank Ratio  Percentile
  K-1W   Thomas, Natalie  0.350       41.6
  K-1W   Potochny, Evy    0.366       43.5
  K-1W   Gelblat, Renee   0.369       43.9
  K-1W   Hearn, Cathy     0.841       100.0
  K-1W   Weld, Kara       0.824       98.0
  K-1W   Beakes, Nancy    0.654       77.8
Table 3.8
Class assignments by percentile

  Assigned class  Percentile
  "A" Ranked      85% to 100%
  "B" Ranked      65% to 84%
  "C" Ranked      40% to 64%
  "D" Ranked      below 40%
Table 3.9
1997 NWSC K-1W rankings (excerpts)

  Rank  Name(s)               Rank Ratio  Percentile  Best Races
  A1    Hearn, Cathy (Sr)     0.841       100.0       Trials-3,Trials-2,Nationals
  A2    Bennett, Rebecca      0.832       98.9        Trials-3,Trials-2,Nationals
  A3    Weld, Kara            0.824       98.0        Trials-3,Trials-2,Nationals
  A4    Altman, Renata (Sr)   0.792       94.2        Trials-3,Nationals,Trials-2
  A5    Stalheim, Megan (Jr)  0.781       92.9        Trials-2,Trials-3,Nationals
  A6    Freeburn, Jana        0.774       92.0        Trials-2,Trials-1
  A7    Hearn, Jennifer       0.735       87.4        Trials-3,Trials-2,Trials-1
  A8    Larsen, Hannah (Jr)   0.732       87.0        Trials-3,Trials-2,Jr-Trials-1
  A9    Brown, Amy            0.729       86.7        Trials-3,Trials-2,SnydersMill
  A10   Jorgensen, Anna (Jr)  0.729       86.7        Trials-3,Trials-2,Jr-Trials-2
  A11   Mitchell, Anne        0.718       85.4        Trials-3,Trials-1,Trials-2
  B12   Miller, Aleta (Jr)    0.691       82.2        Trials-3,Trials-1,Nationals
  B13   Beakes, Nancy         0.654       77.8        Trials-2,Trials-1,NOC-DBH-2
  ...
  C54   Gelblat, Renee (Ms)   0.369       43.9        Lehigh,Codorus,Farmington
  C55   Green, Polly          0.367       43.6        Jefferson,Animas,W-Trials-Qual
  C56   Potochny, Evy         0.366       43.5        Lehigh,Riversport,Bellefonte
  C57   Weldon, Amanda (Jr)   0.363       43.2        Lehigh,Bellefonte,FiddlersElbow
  C58   Wiley, Janet          0.360       42.8        Animas
  C59   Hoffheimer, Mary      0.356       42.3        Farmington,Riverfest,Blackwater
  C60   Wiley, Amy            0.355       42.2        Animas
  C61   Baldwin, Hailey (Jr)  0.354       42.1        NW-Jr-Oly-Qual,Jr-Trials-2,Jr-Trials-1
  C62   Thomas, Natalie       0.350       41.6        Bellefonte,Riversport
  ...
Table 3.10
1999 estimated rank ratios for K-1 Rec

  Class    Name              Estimated Rank Ratio
  K-1 Rec  Beakes, Jason     .950
  K-1 Rec  Poindexter, Mark  .685
  K-1 Rec  Maxwell, Tyler    .550
  K-1 Rec  Collins, Dave     .550
Table 3.11
1999 calculated rank ratios for K-1 Rec (excerpts)

  Class    Name              Actual Rank Ratio
  K-1 Rec  Beakes, Jason     1.000
  K-1 Rec  Poindexter, Mark  .730
  K-1 Rec  Maxwell, Tyler    .606
  K-1 Rec  Collins, Dave     .643
Table 3.12
Race Weight Assignment Table

  Points  Field Strength (fastest times)  Importance of Race
  10      4 National "A" Team athletes    Olympic/National Team Trials,
                                          U.S. Nationals
  9       3 National "A" Team athletes    CIWS Finals, Junior Trials,
                                          Jr/Sr/Ms Nationals
  8       2 National "A" Team athletes    CIWS Qualifiers, Junior Olympics
  7       1 National "A" Team athlete     Team Trials Qualifier/USOF Qualifiers,
                                          Mid-America Series
  6       "A" ranked athlete              Divisional Championships,
                                          Major Cup Series, Major Double Headers,
                                          Junior Olympic Qualifiers
  5       "B" ranked athlete              Other Local/Regional Races,
                                          C-D Race Series
  4-3     "C" ranked athlete              Citizens Races
  2-1     "D" ranked athlete              Flatwater/Pool/Jiffy Slaloms
Table 3.13
4. Good things about current system
4.1. Use of fastest boat as metric
Performance consistency vs. percent-of-fastest-boat

  Race Ratio                            #runs used  Average % difference
  1.000 (fastest)                       130         2.3%
  1.000 to 1.111 (90%-100% of fastest)  798         2.6%
  1.111 to 1.250 (80%-90% of fastest)   1496        4.7%
  1.250 to 1.428 (70%-80% of fastest)   1616        5.9%
  1.428 to 1.666 (60%-70% of fastest)   1004        7.7%
  1.666 to 2.000 (50%-60% of fastest)   516         9.2%
Table 4.1
4.2. Averaging of data from multiple races
4.3. Use of as many races as possible
5. Problems with the current system
5.1. Problems with importance factor
Let's start by listing the races whose importance factors (half the
overall weight) are >= 8. A look at the rankings for any of the past
several years will show that these races have a very heavy influence
on the rankings of paddlers in the A & B classes.
Here are the top-weighted races over the last four years under the
current system:
Races with importance factor >= 8, 1996-2000

  Event               #Races/Year  Weight  Restrictions           '96-'00 Locations
  Team Trials         3 [*]        10      ICF classes only;      TN, WI, WI, TN, TN
                                           qualifiers
  Nationals           1            10      none                   TN, WI, IN, WI, CA
  Jr/Sr/Ms Nationals  1            9       age < 18 or age > 30   IN, ID, IN, IN, IN
  Jr Team Trials      2            9       ICF classes only;      NH, IN, NH, IN, WI
                                           age <= 18
  Jr Olympics         1            8       age <= 18; qualifiers  WI, NC, VA, CO, TX

  [*] 1996 Team Trials had only 2 races.
Table 5.1
5.1.1. Gender bias in importance factor
Races with importance factor >= 8, 1996-2000

                          Open to ICF classes              Open to all classes
  Weight  Event           1996 1997 1998 1999 2000 Total   1996 1997 1998 1999 2000 Total
  10      Sr Team Trials  2    3    3    3    3    14      0    0    0    0    0    0
  10      Nationals       1    1    1    1    1    5       1    1    1    1    1    5
  9       Jr Team Trials  2    2    2    2    2    10      0    0    0    0    0    0
  9       Jr/Sr/Ms Natls  1    1    1    1    1    5       1    1    1    1    1    5
  8       Jr Olympics     1    1    1    1    1    5       1    1    1    1    1    5
          Total Races     7    8    8    8    8    39      3    3    3    3    3    15
Table 5.2
5.1.2. Geographic bias in importance factor
Races with importance factor >= 8, 1996-2000, distribution by state

  Weight  Name            1996  1997  1998  1999  2000  Total
  10      Sr Team Trials  2,TN  3,WI  3,WI  3,TN  3,TN  14
  10      Nationals       1,TN  1,WI  1,IN  1,WI  1,CA  5
  9       Jr Team Trials  2,NH  2,IN  2,NH  2,IN  2,WI  10
  9       Jr/Sr/Ms Natls  1,IN  1,ID  1,IN  1,IN  1,IN  5
  8       Jr Olympics     1,WI  1,NC  1,VA  1,CO  1,TX  5

          Total Races                      7  8  8  8  8  39
          ... east of Mississippi River    7  7  8  7  6  35
          ... west of Mississippi River    0  1  0  1  2  4
Table 5.3
5.1.3. Age bias in importance factor
Events open to 17-year-old and 19-year-old C-1's, with/without qualifying

  Weight  Event           Races  17-y-o w/o  19-y-o w/o  17-y-o with  19-y-o with
                                 qualifying  qualifying  qualifying   qualifying
  10      Nationals       1      1           1           1            1
  10      Sr Team Trials  3      -           -           3            3
  9       Jr Team Trials  2      2           -           2            -
  9       Jr/Sr/Ms Natls  1      1           -           1            -
  8       Jr Olympics     1      -           -           1            -
          Totals          8      4           1           8            4
Table 5.4
5.1.4. Obsolescence/vagueness of importance factor
5.1.5. Biases in importance factor acting in combination
5.2. Problems with Field Strength
5.2.1. Field strength problem with inclusion of boats other than fastest
5.2.2. Field strength problem with variance within team
1999 US National Team Rank Ratios (from 1998 rankings)

  Class  Name(s)                       Rank Ratio
  K-1    Shipley, Scott                1.000
  K-1    Beakes, Jason                 0.976
  K-1    Parsons, Scott                0.972
  K-1    Giddens, Eric                 0.964
  C-1    Hearn, David                  0.899
  C-1    Jacobi, Joe                   0.866
  C-1    Michelson, Kevin              0.856
  C-2    Haller, Lecky/Taylor, Matt    0.852
  C-1    Conklin, Steve                0.833
  K-1W   Hearn, Cathy                  0.824
  C-2    Hepp, David/McCleskey, Scott  0.816
  K-1W   Bennett, Rebecca              0.809
  K-1W   Leith, Sarah                  0.788
  C-2    Ennis, Chris/Grumbine, John   0.785
  K-1W   Seaver, Mary Marshall         0.780
  C-2    Long, Chad/Long, Kenneth      0.776
Table 5.5
5.2.3. Field strength problem with variance within class
1999 Rank Ratios of highest/lowest B-ranked boats

  Class  Class/Ordinal  Name(s)                      Rank Ratio
  K-1    B21            Dressen, Richard             0.850
  C-1    B11            Haller, Lecky                0.759
  K-1W   B10            Beakes, Nancy                0.699
  C-2    B5             Babcock, Frank/Larimer, Jeff 0.678
  K-1    B56            Gagne, Patrick               0.652
  C-1    B32            Larimer, Jeff                0.600
  C-2    B12            Peterman, Will/Winger, Ethan 0.556
  K-1W   B28            Warner, Heather              0.542
Table 5.6
5.2.4. Field strength problem with team/non-team boats
5.2.5. Field strength problem with team boats and yearly competitive calendar
5.2.6. Field strength problem with team boats and geography
5.3 Problems with importance factor and field strength acting in combination
6. A recommended approach
6.1. Why this approach?
6.2. How race results indicate difficulty
Example hard race

  Class  Score
  A      137.1
  B      188.4
  C      530.2
  D      DNF
Table 6.1
Example moderate race

  Class  Score
  A      137.1
  B      156.2
  C      194.9
  D      311.5
Table 6.2
6.3. Quantifying the difficulty factor
Difficulty calculations - hard race

  Class  Score  Race Ratio  Rank Ratio  Ratio Product
  A      137.1  1.000       .887        .887
  B      188.4  1.374       .713        .979
  C      530.2  3.867       .422       1.631
  D      DNF    -           .274        -
  Average (race ratio <= 150%, 2 boats)  .933
Table 6.3
Difficulty calculations - moderate race

  Class  Score  Race Ratio  Rank Ratio  Ratio Product
  A      137.1  1.000       .887        .887
  B      156.2  1.139       .713        .812
  C      194.9  1.421       .422        .599
  D      311.5  2.272       .274        .622
  Average (race ratio <= 150%, 3 boats)  .766
Table 6.4
6.4. The proposed weighting system in practice
Race weights for 1999

  Race             Difficulty Metric  Rank Ratio of Fastest  Race Weight
  Trials-1         1.000              1.000                  1.000
  Trials-3         0.986              1.000                  0.993
  Ocoee-DBH-1      0.969              1.000                  0.984
  Ocoee-DBH-2      0.962              1.000                  0.981
  Trials-2         0.956              0.957                  0.956
  Nationals        0.903              1.000                  0.951
  Rattlesnake      0.931              0.960                  0.945
  Tariffville      0.892              0.960                  0.926
  NOC              0.894              0.957                  0.925
  Dickerson1       0.837              0.960                  0.898
  Mid-Am-1         0.821              0.960                  0.890
  Dickerson2       0.821              0.950                  0.885
  Jr-Trials-1      0.808              0.917                  0.862
  Aspen-DBH-1      0.787              0.918                  0.852
  SnydersMill      0.813              0.888                  0.850
  Mulberry         0.728              0.959                  0.843
  Jr-Trials-2      0.766              0.917                  0.841
  FIBArk           0.762              0.918                  0.840
  Aspen-DBH-2      0.765              0.915                  0.840
  Southeasterns-1  0.767              0.905                  0.836
  Jr-Sr-Ms-Natl    0.742              0.917                  0.829
  Yampa            0.712              0.915                  0.813
  Jr-Olympics      0.676              0.917                  0.796
  PEPCO-1          0.646              0.927                  0.786
  PEPCO-2          0.628              0.927                  0.777
  Animas           0.809              0.709                  0.759
  Riverfest        0.556              0.917                  0.736
  Mid-Am-2         0.740              0.721                  0.730
  BCE-JrOlyQual    0.561              0.892                  0.726
  Missouri-1       0.701              0.721                  0.711
  Amoskeag         0.557              0.863                  0.710
  Texas-Spring-2   0.663              0.754                  0.708
  Texas-Fall-2     0.633              0.754                  0.693
  WACKO-1          0.630 [2]          0.721                  0.675
  WACKO-2          0.630 [2]          0.721                  0.675
  Mascoma          0.708              0.619                  0.663
  BigPiney         0.572              0.754                  0.663
  Texas-Fall-3     0.544              0.761                  0.652
  Texas-Spring-1   0.519              0.761                  0.640
  Southeasterns-2  0.592              0.685                  0.638
  Missouri-2       0.592              0.685                  0.638
  Loyalsock        0.573              0.699                  0.636
  Texas-Spring-3   0.492              0.761                  0.626
  Texas-Fall-1     0.484              0.761                  0.622
  Esopus-DBH-1     0.557              0.680                  0.618
  Esopus-DBH-2     0.555              0.680                  0.617
  Bellefonte       0.527              0.699                  0.613
  DogDays          0.516              0.699                  0.607
  WestQuals        0.569              0.604 [1]              0.586
  JuneJamboree     0.449              0.699                  0.574
  Fiddlehead       0.474              0.636                  0.555
  Farmington       0.424              0.680                  0.552
  CoveredBridge    0.475              0.628                  0.551
  Punchbrook       0.497              0.578                  0.537
  Salmon           0.445              0.628                  0.536
  Snoqualmie       0.557              0.495                  0.526
  Gallatin         0.488              0.550                  0.519
  Nooksack         0.543              0.487                  0.515
  TJClassic        0.397              0.604                  0.500
  Mokelumne        0.397              0.604                  0.500
  SalmonlaSac      0.497              0.490                  0.493
  Riversport       0.475              0.450                  0.462
  Blackwater       0.456              0.463                  0.459
  Payette          0.506              0.388                  0.447
  Esopus           0.482              0.407                  0.444
  Lehigh           0.428              0.450                  0.439
  FallCreek        0.378              0.463                  0.420
  FiddlersElbow    0.430              0.407                  0.418
  Codorus          0.406              0.407                  0.406
  SE-JrOlyQual     0.435              0.375                  0.405
Table 6.5
[1] The West Qualifier was lightly attended, and was won -- narrowly --
    by a boat that was unranked at the time. This is an estimate
    based on the rank ratio of the 2nd-place boat.
[2] Estimated based on better-of-two results because
I don't have results for both runs.
6.5. Assessing the proposed weighting system's effectiveness
6.6. How right is right?
Mean-squared error computations for moderate race, ranking system "1"

  Class  Score  Race Ratio  Rank Ratio  Predicted Ratio      Error
                                        (fastest RR / RR)
  A      137.1  1.000       .887        1.000                (1.000-1.000)^2 = 0.000
  B      156.2  1.139       .713        1.244                (1.139-1.244)^2 = 0.011
  C      194.9  1.421       .422        2.102                (1.421-2.102)^2 = 0.464
  D      311.5  2.272       .274        3.237                (2.272-3.237)^2 = 0.931
  Mean-squared error = 0.469
Table 6.6
Mean-squared error computations for moderate race, ranking system "2"

  Class  Score  Race Ratio  Rank Ratio  Predicted Ratio      Error
                                        (fastest RR / RR)
  A      137.1  1.000       .900        1.000                (1.000-1.000)^2 = 0.000
  B      156.2  1.139       .750        1.200                (1.139-1.200)^2 = 0.004
  C      194.9  1.421       .405        2.222                (1.421-2.222)^2 = 0.642
  D      311.5  2.272       .375        2.400                (2.272-2.400)^2 = 0.016
  Mean-squared error = 0.221
Table 6.7
6.7. How wrong is wrong?
Comparison of current and proposed ranking systems

         Current system                            Proposed system
  Class  Rank  Name(s)                RR     Pct   Rank  Name(s)                RR     Pct
  K-1W   A1    Hearn, Cathy           0.824  100.0 A1    Bennett, Rebecca       0.815  100.0
  K-1W   A2    Bennett, Rebecca       0.809  98.2  A2    Hearn, Cathy           0.807  99.0
  K-1W   A3    Altman, Renata         0.799  97.0  A3    Leith, Sarah           0.800  98.2
  K-1W   A4    Leith, Sarah           0.788  95.6  A4    Altman, Renata         0.788  96.7
  K-1W   A5    Seaver, Mary Marshall  0.780  94.7  A5    Seaver, Mary Marshall  0.773  94.8
  K-1W   A6    Larsen, Hannah         0.743  90.2  A6    Stalheim, Megan        0.754  92.5
  K-1W   A7    Stalheim, Megan        0.742  90.0  A7    Miller, Aleta          0.728  89.3
  K-1W   A8    Miller, Aleta          0.726  88.1  A7    Larsen, Hannah         0.728  89.3
  K-1W   A9    Jorgensen, Anna        0.714  86.7  A7    Beakes, Nancy          0.728  89.3
  K-1W   B10   Beakes, Nancy          0.699  84.8  A10   Jorgensen, Anna        0.719  88.2
  C-1    A1    Hearn, David           0.899  100.0 A1    Hearn, David           0.887  100.0
  C-1    A2    Jacobi, Joe            0.866  96.3  A2    Jacobi, Joe            0.845  95.3
  C-1    A3    Michelson, Kevin       0.856  95.2  A2    Conklin, Steve         0.845  95.3
  C-1    A4    Ennis, Chris           0.849  94.4  A4    Michelson, Kevin       0.839  94.6
  C-1    A5    Conklin, Steve         0.833  92.7  A5    Ennis, Chris           0.833  93.9
  C-1    A5    Bahn, Ryan             0.833  92.7  A6    Boyd, Adam             0.829  93.5
  C-1    A7    Boyd, Adam             0.817  90.9  A7    Bahn, Ryan             0.826  93.1
  C-1    A8    Davis, Samuel          0.804  89.4  A8    Sanders, Lee           0.819  92.3
  C-1    A9    Sanders, Lee           0.797  88.7  A9    Davis, Samuel          0.813  91.7
  C-1    A10   Crane, Austin          0.782  87.0  A10   Crane, Austin          0.789  89.0
  K-1    A1    Shipley, Scott         1.000  100.0 A1    Shipley, Scott         0.992  100.0
  K-1    A2    Beakes, Jason          0.976  97.6  A2    Beakes, Jason          0.958  96.6
  K-1    A3    Parsons, Scott         0.972  97.2  A3    Parsons, Scott         0.949  95.7
  K-1    A4    Giddens, Eric          0.964  96.4  A4    Giddens, Eric          0.944  95.2
  K-1    A5    Geltman, Louis         0.934  93.4  A5    Jackson, Eric          0.936  94.4
  K-1    A6    Heyl, Brett            0.923  92.3  A6    Geltman, Louis         0.928  93.5
  K-1    A6    Jackson, Eric          0.923  92.3  A7    Smith, Shaun           0.916  92.3
  K-1    A8    Braunlich, Kurt        0.920  92.0  A8    Kimmet, Nick           0.915  92.2
  K-1    A9    Nielsen, Corey         0.912  91.2  A9    Heyl, Brett            0.913  92.0
  K-1    A10   Harris, Cody           0.905  90.5  A10   Braunlich, Kurt        0.909  91.6
Table 6.8
1999 Results for K-1 Shaun Smith
(Weighted race ratios as shown in Section 3.9; the three best races
were highlighted in the original.)

  Race         Race Ratio  Weighted Race Ratio  Weighted Race Ratio
                           (Current System)     (Proposed System)
  Ocoee-DBH-1  1.080       0.741                0.912
  Ocoee-DBH-2  1.046       0.765                0.938
  NOC          1.111       0.765                0.833
  Trials-1     1.112       0.899                0.899
  Trials-2     1.288       0.776                0.743
  Trials-3     2.268       0.441                0.438
Table 6.9
7. Directions for future research
7.1. Difficulty Metric
7.2. Race Weighting
7.3. Race Inclusion
7.4. Time-weighting
7.5. Result selection - better-of-two vs. combined
7.6. Recomputation interval
8. References
9. Acknowledgements
All contents © copyright Rich Kulawiec 1999, 2000.
All rights reserved.
Please send comments to Rich Kulawiec.