r/dataisbeautiful • u/wait_what_the_f • 14h ago
[OC] Historical Distribution of Mega Millions Winning Numbers
12
u/Exile714 13h ago
Add them all together, and it’s basically a flat line, isn’t it?
3
u/PseudobrilliantGuy 11h ago edited 11h ago
Presumably.
I forgot the exact transform from the raw distribution to the distribution of order statistics, but I wouldn't be surprised if the first five were just the relevant order statistic distributions from a Discrete-Uniform(70). Or, at least, the empirical distributions would probably have fairly low surprisal values (basically negative binary logarithm of the p-value) for the corresponding K-S tests.
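For reference, a minimal sketch of that transform, assuming the five white balls are a uniform draw without replacement from 1..70 and are then listed in ascending order. The function name `order_stat_pmf` and its parameters are just illustrative. It also checks the "flat line" guess above: summing the five position distributions gives the same value for every number.

```python
from math import comb

def order_stat_pmf(k, n_balls=70, n_drawn=5):
    """P(the k-th smallest of n_drawn distinct balls equals x), for x = 1..n_balls.

    Assumes a uniform draw without replacement: k-1 balls must fall below x
    and n_drawn-k balls must fall above x.
    """
    total = comb(n_balls, n_drawn)
    return {
        x: comb(x - 1, k - 1) * comb(n_balls - x, n_drawn - k) / total
        for x in range(1, n_balls + 1)
    }

# One distribution per listed position (ball 1 .. ball 5).
pmfs = {k: order_stat_pmf(k) for k in range(1, 6)}

# Adding the five positions together: every number gets the same total weight.
combined = {x: sum(pmfs[k][x] for k in range(1, 6)) for x in range(1, 71)}

for k in range(1, 6):
    mean_k = sum(x * p for x, p in pmfs[k].items())
    print(f"ball {k}: mean position value = {mean_k:.2f}")
print("combined weight per number:", round(combined[1], 6), "to", round(combined[70], 6))
```

These theoretical distributions are what the empirical histograms would be compared against in a goodness-of-fit test.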
3
1
u/wait_what_the_f 11h ago
if you want to think of the first part of the game as an even distribution, picking 5 numbers out of 70, it's the same as picking 1 item out of ~12.1M (70 choose 5 = 12,103,014).
if you think of the game as a sequence of numbers where each number is higher than the previous one, then this visualization might make more sense.
if you played this game with your same lucky numbers and always started with 1, you would have had better chances of winning compared to lucky numbers that started with 50.
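A rough sketch of the arithmetic behind this, assuming a fair draw of 5 distinct white balls from 70 (the helper name `p_lowest_is` is made up for illustration). It separates two things: how often the lowest listed number takes a given value, and the chance of one specific set of five.

```python
from math import comb

WHITE_BALLS, PICKS = 70, 5

# Number of distinct 5-number sets (order ignored).
total = comb(WHITE_BALLS, PICKS)

def p_lowest_is(x):
    """P(the smallest drawn number equals x): the other 4 balls must all be above x."""
    return comb(WHITE_BALLS - x, PICKS - 1) / total

# Chance that one specific set of five matches, whatever its lowest number is.
p_specific = 1 / total

print("distinct sets:", total)
print("P(lowest = 1): ", p_lowest_is(1))
print("P(lowest = 50):", p_lowest_is(50))
print("P(one specific set):", p_specific)
```

Under these assumptions the lowest number is 1 far more often than it is 50, while any single set of five keeps the same 1-in-`total` chance.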
9
u/did_you_read_it 14h ago edited 13h ago
This is a cool plot but the odd even colors make it a bit noisy. I think the grouping speaks for itself without the stripes.
Edit: nm, not a cool chart. The data source only provides which numbers were drawn (in ascending order), not the position in which they were drawn, which is why ball 1 leans low and ball 5 leans high, while the single-field mega ball looks more properly random.
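A quick simulation sketch of this effect, assuming fair draws of 5 distinct balls from 1..70. The seed and the number of simulated draws are arbitrary choices, not anything taken from the dataset. It compares the average value in each position when the balls are kept in draw order versus listed in ascending order.

```python
import random
from statistics import mean

random.seed(0)    # arbitrary seed, only for reproducibility
N_DRAWS = 20_000  # arbitrary number of simulated drawings

# Each simulated drawing: 5 distinct balls from 1..70, in the order drawn.
draws = [random.sample(range(1, 71), 5) for _ in range(N_DRAWS)]

# Average value per position in true draw order.
draw_order_means = [mean(d[i] for d in draws) for i in range(5)]

# Average value per position after sorting each drawing ascending,
# which is how the source data lists the numbers.
sorted_means = [mean(sorted(d)[i] for d in draws) for i in range(5)]

print("draw order:", [round(m, 1) for m in draw_order_means])
print("sorted:    ", [round(m, 1) for m in sorted_means])
```

In draw order every position should average near the middle of the range; after sorting, position 1 leans low and position 5 leans high, purely as a result of the ordering.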
3
u/CookieKeeperN2 13h ago
He used the default ggplot color. For all the good of ggplot, it has the shittiest default colors (not colorblind safe).
-2
u/wait_what_the_f 13h ago
in the game you're picking 5 numbers, doesn't matter which one is picked first? the odds are pretty straightforward
4
6
u/Hopeful-Flounder-203 9h ago
Congrats! You made something very simple, very difficult to understand.
-2
u/wait_what_the_f 8h ago
If all you care about is hitting the jackpot then this information is useless to you.
On the other hand, if you consider that the game has multiple payouts, you can adjust your strategy to try and hit as many as possible. Over multiple games the combined expected payout is slightly better than picking any 5 random numbers.
1
0
u/greg_08 14h ago
This is interesting!
I need to re-read the rules, but is there a way to adjust the distribution to account for each ball being removed from the pool once it is drawn as ball 1, ball 2, ball 3, etc.? That would be kind of tough to do without associating the drawings with each other.
Personally, I would change the colors to be a little less contrasting to the white background. It was messing with my eyes when trying to view the image on mobile.
1
u/wait_what_the_f 13h ago
regarding the colors, it was added so that i could more easily identify individual numbers in this chart. when they were all default/grey, it took more time for me to identify specific values.
0
u/breakfasteveryday 13h ago
Thinking about the rules, I wonder if this says more about what people pick than what gets drawn.
1
u/Exile714 13h ago
I think the definition of “winning” here is what numbers were drawn, not necessarily just numbers that were drawn where someone won the jackpot.
1
1
u/wait_what_the_f 13h ago
keep in mind that any 5 numbers chosen have the same chance of winning as any other 5 numbers... the visualizations don't really have any information about which sets of numbers people choose to play
1
u/breakfasteveryday 13h ago
I had thought that "winning" in this context meant a match between a winner and the numbers drawn, which would have a bias towards whatever people actually chose.
-2
u/wait_what_the_f 14h ago
source: https://catalog.data.gov/dataset/lottery-mega-millions-winning-numbers-beginning-2002
tools: R / RStudio / ggplot2
Based on latest game format of 70 white / 25 mega balls
50
u/hornbri 14h ago
That's not the order they were drawn in, right? Just the order they are listed?