Category: Math

  • Interesting Statistical Anomaly – Triathlon Race Timing

    I was analyzing the data from the 2010 Peachtree City Sprint Triathlon (well, really I was just processing the ugly text I can rip off the web into something I can mangle with Excel), and I found some interesting things.

    First off, all props to Event Tech for getting the results posted so quickly,¹ although it would be nice if I could pull the data down and actually use it with a bit less manual processing. Basically I have to import this…

    Event Tech Website

    …into Excel and do a bunch of manual manipulation to end up with this…

    Excel Times Replacement

    …which for one allows me to sort by time and do various other things.

    However, today I was doing something I hadn’t done before: I summed up the swim/bike/run/T1/T2 split times and compared the result with the total time that Event Tech had calculated. Interestingly, it was generally off: by one, two, or three seconds.
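
    If you want to reproduce that check without Excel, here is a minimal sketch in Python. The CSV layout, file name, and column names are assumptions for illustration only; Event Tech’s actual export format differs.

      import csv
      from datetime import timedelta

      def to_seconds(t):
          # Parse an 'H:MM:SS' or 'MM:SS' split into whole seconds.
          parts = [int(p) for p in t.split(":")]
          while len(parts) < 3:
              parts.insert(0, 0)
          h, m, s = parts
          return int(timedelta(hours=h, minutes=m, seconds=s).total_seconds())

      # Hypothetical column names -- adjust to whatever the results file actually uses.
      split_columns = ["swim", "t1", "bike", "t2", "run"]

      with open("tri_results.csv", newline="") as f:
          for row in csv.DictReader(f):
              summed = sum(to_seconds(row[c]) for c in split_columns)
              total = to_seconds(row["total"])
              print(row["name"], summed - total)  # 0 means the parts add up exactly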

    Hmmm… fascinating.

    Even more interesting is that it was rarely off by zero seconds.

    I quickly realized that the total time was always equal to or less than the sum of the individual parts, which implied to me that the total time was your chip time from the starting timing mat to the finishing timing mat, and that the summation I had done was introducing rounding errors of some sort into the equation. After all, if you finish the swim in 10:00.4, your time on the sheet will say “10:00”, but that 0.4 seconds still hangs on there and will contribute to your final time.

    However, that doesn’t work. If you assume, as I did, that there are three splits introducing rounding errors (three because the maximum error was three seconds), then each split should be equally likely to round down as to round up. That would mean there should have been instances where the summed split times were less than the total chip time. There were none. Something was going on.

    The next guess was that all of the splits were being rounded up (three splits total). That would account for the sum of the splits always being greater than the total chip time. However, I ran a Monte Carlo simulation and got these results:

    Bin – Frequency
    0 – 21
    1 – 438
    2 – 464
    3 – 21

    This is the expected distribution if everything is random. However, if you analyze the results from the race you get:

    Bin – Frequency
    0 – 11
    1 – 266
    2 – 496
    3 – 159

    This is markedly different from a random result. Something is biasing those numbers. The mean of the difference between the rounded times and the summed times for the Monte Carlo simulation is (as expected) approximately 1.5, but the mean for the race results is 1.8.
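
    For the curious, here’s roughly how such a Monte Carlo simulation can be set up. The rounding rules below are just one plausible assumption (each of three splits rounded up to a whole second, the chip total rounded to the nearest second), and the split durations are arbitrary.

      import math
      import random
      from collections import Counter

      def monte_carlo(trials=944, splits=3):
          # Tally (sum of rounded-up splits) minus (chip total rounded to nearest second).
          bins = Counter()
          for _ in range(trials):
              # Arbitrary split durations; only the fractional seconds matter here.
              times = [random.uniform(300, 3600) for _ in range(splits)]
              summed = sum(math.ceil(t) for t in times)  # each split rounded up
              total = round(sum(times))                  # chip total, nearest second
              bins[summed - total] += 1
          return bins

      for diff, count in sorted(monte_carlo().items()):
          print(diff, count)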

    At this point I stopped. I could go on theorizing about why the numbers are off in the specific manner they are, but really it’s not that important.

    What does this all mean? Absolutely nothing! As I mentioned above, your race result is your chip time from start to finish; it’s only when I started summing up the broken-out numbers that I noticed anything wonky and decided to geek out on this. I’m confident that my race time is accurate, and even if it weren’t, it would only be off by 1, 2, or 3 seconds. If those seconds make or break me, I should have trained harder!

    I will probably send this link to Event Tech and ask if they have any insight; they probably do. It’s their software after all.


    1: 2010 Tri PTC Results, although the white on black background thing should really go, guys. It’s tough on the eyeballs.

  • Virus Comparison

    One of the blogs I read has an interesting quote:

    To translate that into simpler terms: they didn’t just compare the genome of the new Ebola variant to other Ebola viruses; that wouldn’t be particularly difficult, since the Ebola genome is only about 19K of data. But they compared it to the sequenced genomes of every previously sequenced virus, and found nearly identical sequences in Ebola and feline leukemia virus! [ed: emphasis mine]

  • 365! Is a Big Number

    For reasons I won’t get into, I was looking at the Birthday Problem today. The probability of at least two people out of a group of n people having the same birthday (month/day) is equal to

    p(n) = 1 - 365! / (365^n x (365-n)!)

    Notice that 365! term? And the subsequent (365-n)! in the denominator? That pesky “!” means “factorial”, and you derive a factorial by multiplying together all of the integers from 1 to n, like this: 1 x 2 x 3 x … x n = n!

    So:
    1! = 1
    2! = 1 x 2 = 2
    3! = 1 x 2 x 3 = 6
    4! = 1 x 2 x 3 x 4 = 24
    10! = 1 x 2 x … x 10 = 3,628,800
    20! = 2,432,902,008,176,640,000 ≈ 2.43e18

    Obviously these numbers get big fast. I’m familiar with the factorial, so when I read “365!” I actually started laughing out loud. I don’t have access to any particular piece of equipment or software that is capable of calculating that number. Thankfully, Wikipedia has a nice table with a short breakdown of the results depending on n. Now I know that if ten people are in a room, and assuming a uniform distribution of birthdates (which is a faulty assumption), the probability of at least two people sharing a birthday is 11.7%. My day is complete.
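
    For anyone who wants to check the Wikipedia numbers themselves, here’s a quick Python sketch — Python’s integers are arbitrary precision, so computing 365! itself is no problem:

      from math import factorial

      def birthday_prob(n, days=365):
          # P(at least two of n people share a birthday), assuming uniform birthdays:
          # 1 - days! / ((days - n)! * days**n)
          return 1 - factorial(days) // factorial(days - n) / days**n

      print(len(str(factorial(365))))    # number of digits in 365!
      print(f"{birthday_prob(10):.1%}")  # ~11.7% for ten people in a room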

  • People are Dumb Everywhere

    MarkCC has a post linking to another post about how the British aren’t teaching the integers very well. Some subset of British citizens is unaware that -10 is lower than -2. Go check it out, especially the quote.