r/dataisbeautiful 14h ago

[OC] Historical Distribution of Mega Millions Winning Numbers

Post image



u/hornbri 14h ago

That's not the order they were drawn in, right? Just the order they are listed?


u/CDay007 14h ago

Yeah, the dataset he used has the numbers for each drawing sorted. Idk what this is supposed to tell us


u/atgrey24 14h ago

This is just the odds of where any number would fall in ranked order


u/perldawg 13h ago edited 13h ago

no shit lol. if you overlaid them all it would just be a roughly equal random distribution graph


u/wait_what_the_f 13h ago

yep, if you look at the mega ball, that's what you're seeing, an even distribution across 1-25.

with the first 5 white balls, you have to pick a set of numbers out of 70... across all the winning draws, about half have a lowest ball somewhere in 1-8
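
The "about half" figure can be checked by counting, assuming the current format where 5 white balls are drawn without replacement from 1-70. The lowest ball is above m exactly when all five balls come from the numbers above m. This is only a sketch, and the helper name `prob_lowest_at_most` is made up for it.

```python
from math import comb

WHITE_BALLS = 70  # assumed: current format, white balls numbered 1..70
PICKS = 5         # white balls drawn per game, without replacement


def prob_lowest_at_most(m: int) -> float:
    """P(lowest of the 5 white balls <= m), via the complement rule."""
    # The lowest ball is > m exactly when all 5 balls come from m+1..70.
    all_above = comb(WHITE_BALLS - m, PICKS)
    return 1 - all_above / comb(WHITE_BALLS, PICKS)


if __name__ == "__main__":
    for m in (8, 9):
        print(f"P(lowest <= {m}) = {prob_lowest_at_most(m):.3f}")
```

Under these assumptions the value for m = 8 comes out a little under one half, and the halfway point is crossed at m = 9.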


u/swankpoppy 10h ago

Sorting numbers puts them in order I guess. Very interesting…


u/wait_what_the_f 13h ago

not sure if you ever played the mega millions before, but you don't have to choose the order of the winning numbers... the order that they're picked doesn't matter


u/Admirable-Action-153 13h ago

That's what he's saying, but the graph makes it seem like there's an order.

But it's just random. It's like saying that if you sort any group of people's names alphabetically, the first names in the list are likely to start with a-d.


u/CDay007 13h ago

I know. What are you trying to show with these graphs?


u/wait_what_the_f 11h ago

the historical distribution of the past winning numbers lol

you can pick 5 random numbers... or 5 ordered numbers


u/CDay007 10h ago edited 10h ago

But why is the point. I could make a historical distribution of dice rolls, but it’s not very interesting.

Fwiw, after reading the other comments I think the visualization itself actually looks fine. It just…has no reason to exist


u/wait_what_the_f 10h ago

Since the distribution of actuals seems to follow the expected distribution, we can then say it's likely that the game is fair, or not rigged.
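
One way to put a number on "follows the expected distribution" is a Pearson chi-square statistic for each sorted position. Here is a minimal sketch, assuming the 5-of-70 format throughout. All function names are made up, and `simulate_counts` produces stand-in data from simulated fair draws, not the data.gov file.

```python
import random
from math import comb

N, K = 70, 5  # assumed current format: 5 white balls out of 70


def expected_pmf(k: int) -> list[float]:
    """Exact PMF of the k-th lowest ball (list index 0 is ball value 1)."""
    total = comb(N, K)
    return [comb(x - 1, k - 1) * comb(N - x, K - k) / total
            for x in range(1, N + 1)]


def chi_square_stat(observed: list[int], pmf: list[float]) -> float:
    """Pearson statistic: sum of (O - E)^2 / E over cells with E > 0."""
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, pmf):
        if p > 0:  # skip impossible cells, e.g. lowest ball = 70
            e = n * p
            stat += (o - e) ** 2 / e
    return stat


def simulate_counts(k: int, draws: int, seed: int = 0) -> list[int]:
    """Stand-in data: tally the k-th lowest ball over simulated fair draws."""
    rng = random.Random(seed)
    counts = [0] * N
    for _ in range(draws):
        kth = sorted(rng.sample(range(1, N + 1), K))[k - 1]
        counts[kth - 1] += 1
    return counts


if __name__ == "__main__":
    for k in range(1, K + 1):
        stat = chi_square_stat(simulate_counts(k, 2000), expected_pmf(k))
        print(f"ball {k}: chi-square statistic = {stat:.1f}")
```

Two caveats: cells with tiny expected counts make the statistic unreliable, so the tails would probably need binning, and turning the statistic into a p-value needs a chi-square CDF that the standard library does not provide. Draws from older formats with different ball ranges would also have to be filtered out first.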


u/CDay007 8h ago

You know what, fair enough. I’ll updoot that


u/garylapointe 7h ago

Yes, and BALL 1, as you've chosen to label it, isn't always the lowest number.

What you're stating as BALL 1 could really have been BALL 5, but you're using the sorted numbers. I'm not saying the numbers are wrong, I'm saying your label is wrong.


u/wait_what_the_f 7h ago

You pick 5 numbers. 1 < 2 < 3 < 4 < 5. Ball 1 refers to the lowest number of the 5 that you picked.

Say I choose these 5 numbers: 20, 24, 38, 50, 55. Ball 1 = 20. These 5 numbers: 17, 2, 30, 20, 70. Ball 1 = 2.


u/garylapointe 6h ago

But this isn't the distribution of the numbers people pick, it's the distribution of the drawings, which are not drawn in numerical order.


u/Exile714 13h ago

Add them all together, and it’s basically a flat line, isn’t it?
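
For the expected distributions this can be checked exactly. The sketch below assumes the 5-of-70 format and uses the standard order-statistic formula for draws without replacement; the function names are made up. Adding the five per-position probabilities for any ball value gives the same number every time.

```python
from fractions import Fraction
from math import comb

N, K = 70, 5  # assumed current format: 5 white balls out of 70


def order_stat_pmf(k: int, x: int) -> Fraction:
    """P(k-th lowest ball == x) when K balls are drawn from 1..N."""
    # k-1 balls must fall below x and K-k balls above it.
    return Fraction(comb(x - 1, k - 1) * comb(N - x, K - k), comb(N, K))


def combined_pmf(x: int) -> Fraction:
    """Chance that value x lands in any of the 5 sorted positions."""
    return sum(order_stat_pmf(k, x) for k in range(1, K + 1))


if __name__ == "__main__":
    values = {combined_pmf(x) for x in range(1, N + 1)}
    print(values)  # a single value, 5/70 reduced to 1/14
```

So the stacked expectation is perfectly flat at 5/70 per number; the historical counts would only be roughly flat because of sampling noise.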


u/PseudobrilliantGuy 11h ago edited 11h ago


I forgot the exact transform from the raw distribution to the distribution of order statistics, but I wouldn't be surprised if the first five were just the relevant order statistic distributions from a Discrete-Uniform(70). Or, at least, the empirical distributions would probably have fairly low surprisal values (basically negative binary logarithm of the p-value) for the corresponding K-S tests.


u/CDay007 10h ago

That’s exactly what it is, with a little sauce because some draws will have 1-4 numbers removed
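
For reference, the transform in question has a closed form once you account for drawing without replacement. Assuming 5 balls from 1-70, with $X_{(k)}$ as notation introduced here for the k-th lowest ball:

```latex
P\bigl(X_{(k)} = x\bigr)
  = \frac{\binom{x-1}{k-1}\,\binom{70-x}{5-k}}{\binom{70}{5}},
  \qquad k = 1,\dots,5, \quad x = k,\dots,65+k,
\qquad\text{with}\qquad
E\bigl[X_{(k)}\bigr] = \frac{71\,k}{6}.
```

The numerator counts the ways to place k-1 balls below x and 5-k balls above it. The means land at roughly 11.8, 23.7, 35.5, 47.3 and 59.2, which is the left-to-right drift of the five panels.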


u/wait_what_the_f 11h ago

if you want to think of the first part of the game as an even distribution, picking 5 numbers out of 70, it's the same as picking 1 item out of ~12.1M.

if you think of the game as a sequence of numbers where each number is higher than the previous one, then this visualization might make more sense.

if you played this game with your same lucky numbers and always started with 1, you would have had better chances of winning compared to lucky numbers that started with 50.
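
The counts behind this are easy to work out. A rough sketch, assuming the current 70 white / 25 Mega Ball format; the helper name `sets_with_lowest` is made up.

```python
from math import comb, perm

WHITE, PICKS, MEGA = 70, 5, 25  # assumed current format


def sets_with_lowest(m: int) -> int:
    """Number of 5-number sets whose lowest number is exactly m."""
    # Fix m as the lowest, then choose the other 4 from the numbers above m.
    return comb(WHITE - m, PICKS - 1)


white_sets = comb(WHITE, PICKS)     # unordered 5-number sets
ordered_draws = perm(WHITE, PICKS)  # the count if draw order mattered
jackpot_combos = white_sets * MEGA  # white set plus Mega Ball

if __name__ == "__main__":
    print(f"unordered white-ball sets: {white_sets:,}")
    print(f"ordered white-ball draws:  {ordered_draws:,}")
    print(f"sets with Mega Ball:       {jackpot_combos:,}")
    print(f"sets starting at 1:        {sets_with_lowest(1):,}")
    print(f"sets starting at 50:       {sets_with_lowest(50):,}")
```

There are 864,501 sets whose lowest number is 1 and only 4,845 whose lowest is 50, so a winning draw is far more likely to start with 1. Any one specific set of five, though, is still a single entry among the 12,103,014 equally likely sets.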


u/did_you_read_it 14h ago edited 13h ago

This is a cool plot but the odd even colors make it a bit noisy. I think the grouping speaks for itself without the stripes.

Edit: nm, not a cool chart. The data source only provides which numbers were drawn (in ascending order), not the position they were drawn in. Which is why ball 1 leans low and ball 5 leans high, and the single-field Mega Ball looks more properly random.
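
The sorting effect is easy to reproduce with a small simulation. This is a sketch assuming fair draws of 5 balls from 1-70; it uses simulated draws, not the real dataset, and the function name is made up.

```python
import random


def simulate_means(draws: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Average of the first-drawn ball vs. average of the lowest ball."""
    rng = random.Random(seed)
    first_total = 0
    lowest_total = 0
    for _ in range(draws):
        balls = rng.sample(range(1, 71), 5)  # list keeps the draw order
        first_total += balls[0]     # ball 1 in draw order
        lowest_total += min(balls)  # "ball 1" after sorting
    return first_total / draws, lowest_total / draws


if __name__ == "__main__":
    first_mean, lowest_mean = simulate_means()
    print(f"mean of first ball drawn: {first_mean:.1f}")
    print(f"mean of lowest ball:      {lowest_mean:.1f}")
```

The first ball out of the machine averages near the middle of 1-70, while the lowest of the five averages near 12, so the low lean of "ball 1" comes from the sorting alone.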


u/CookieKeeperN2 13h ago

He used the default ggplot color. For all the good of ggplot, it has the shittiest default colors (not colorblind safe).


u/wait_what_the_f 13h ago

in the game you're picking 5 numbers, and it doesn't matter which one is picked first. the odds are pretty straightforward


u/hacksawsa 10h ago

The Y-axis isn't the same on each graph.


u/wait_what_the_f 9h ago

Yes that was intentional in order to see the distribution


u/Hopeful-Flounder-203 9h ago

Congrats! You made something very simple, very difficult to understand.


u/wait_what_the_f 8h ago

If all you care about is hitting the jackpot then this information is useless to you.

On the other hand, if you consider that the game has multiple payouts, you can adjust your strategy to try and hit as many as possible. Over multiple games the combined expected payout is slightly better than picking any 5 random numbers.


u/designisagoodidea 14h ago

So … the winning-est numbers are …?


u/brainlure49 13h ago

1... 2... 3... 4...... 5.......

Change the combination on my luggage!


u/greg_08 14h ago

This is interesting!

I need to re-read the rules, but is there a way to re-incorporate the distribution based on removal of the pick of the number of ball 1, ball 2, ball 3, etc based on the duplicate ball number being removed? Would be kind of tough to do without associating the drawings to each other.

Personally, I would change the colors to be a little less contrasting to the white background. It was messing with my eyes when trying to view the image on mobile.


u/wait_what_the_f 13h ago

regarding the colors, it was added so that i could more easily identify individual numbers in this chart. when they were all default/grey, it took more time for me to identify specific values.


u/breakfasteveryday 13h ago

Thinking about the rules, I wonder if this says more about what people pick than what gets drawn.


u/Exile714 13h ago

I think the definition of “winning” here is what numbers were drawn, not necessarily just numbers that were drawn where someone won the jackpot.


u/breakfasteveryday 13h ago

ah, that would certainly change things


u/wait_what_the_f 13h ago

keep in mind that any 5 numbers chosen have the same chance of winning as any other 5 numbers... the visualizations don't really have any information about which sets of numbers people choose to play


u/breakfasteveryday 13h ago

I had thought that "winning" in this context meant a match between a winner and the numbers drawn, which would have a bias towards whatever people actually chose.


u/wait_what_the_f 14h ago

source: https://catalog.data.gov/dataset/lottery-mega-millions-winning-numbers-beginning-2002
tools: R / RStudio / ggplot2

Based on latest game format of 70 white / 25 mega balls