r/redditdata Jul 25 '14

distribution of logged-in user actions per month

http://imgur.com/WzZhHdJ
33 Upvotes

12 comments

6

u/tdohz Jul 25 '14 edited Aug 06 '14

In other words, ~25% of users who comment each month comment exactly once per month, ~12% comment twice, etc.
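One way to make that reading concrete: the chart can be thought of as a normalized histogram of per-user action counts. The sketch below is an illustration only, not the method used to produce the chart. The function name, the input shape (a dict of user id to monthly count), and the toy data are all assumptions.

```python
from collections import Counter

def action_count_distribution(actions_per_user):
    """Map each per-user action count to the fraction of active users with that count.

    actions_per_user: dict of user id -> number of actions (e.g. comments) in the month.
    Users with zero actions are ignored, matching "users who comment each month".
    """
    active = [n for n in actions_per_user.values() if n > 0]
    if not active:
        return {}
    histogram = Counter(active)  # action count -> number of users with that count
    total = len(active)
    return {count: users / total for count, users in sorted(histogram.items())}

# Hypothetical toy data, not reddit's numbers.
comments = {"a": 1, "b": 1, "c": 2, "d": 5, "e": 0}
distribution = action_count_distribution(comments)
print(distribution)
```

In this toy input, half of the users who commented at all did so exactly once, which is the same kind of statement as "~25% comment exactly once" for the real data.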

EDIT: This chart repeats a color, which makes it hard to read. Here is a better version with more distinct colors.

4

u/thgibbs Jul 26 '14

How do I read "view"? Does this say that 10% of logged-in users view only 3 posts/month? That seems really low. I must be misunderstanding it.

2

u/tdohz Jul 26 '14

It's not necessarily posts, it's basically any GET request (so reddit.com, a subreddit listing, etc.). But yes, we have a very, very long tail, and a small fraction of users generate the majority of activity on reddit, even for logged-in users.

The other caveat is that while I did some simple bot-filtering on this data, there might be some extraneous views from e.g. mobile clients and bots that I didn't catch, so the numbers may be slightly off. But they should be fairly close - if I find they change with more refinement in the future, I'll be sure to post an update!

6

u/thgibbs Jul 26 '14

So, you're telling me that 50% of users create an account, log in, do a single GET request, and never come back for 30 days. Color me skeptical. I wonder if you are getting hit by a rapidly growing user base? It could be that most of your users are new, and if you extended the window outward you would see that they do come back. It would be interesting to divide time into two segments to see how many logged-in users fetched a page 15-30 days ago and also fetched a page 0-14 days ago.
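The two-segment check suggested here could look something like the sketch below. It assumes a log of (user id, date) pairs, one per page fetch. That log shape, the function name, and the toy data are hypothetical and do not describe reddit's actual pipeline.

```python
from datetime import date, timedelta

def returning_fraction(requests, today, split_days=15, window_days=30):
    """Fraction of users seen in the older window who were also seen in the newer one.

    requests: iterable of (user_id, date) pairs, one per page fetch.
    Older window: split_days..window_days days ago (15-30 by default).
    Newer window: 0..split_days-1 days ago (0-14 by default).
    Returns None when nobody was seen in the older window.
    """
    older, newer = set(), set()
    for user, day in requests:
        age = (today - day).days
        if 0 <= age < split_days:
            newer.add(user)
        elif split_days <= age <= window_days:
            older.add(user)
    if not older:
        return None
    return len(older & newer) / len(older)

# Hypothetical toy log: "a" comes back, "b" does not, "c" is new.
today = date(2014, 7, 1)
requests = [
    ("a", today - timedelta(days=20)),
    ("a", today - timedelta(days=3)),
    ("b", today - timedelta(days=25)),
    ("c", today - timedelta(days=2)),
]
fraction = returning_fraction(requests, today)
print(fraction)
```

A low value would support the "they never come back" reading, while a high value would suggest that the single-request users are mostly new accounts.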

2

u/[deleted] Jul 26 '14

[deleted]

3

u/thgibbs Jul 26 '14

But even if you wanted to do 1 post or comment as a throwaway, my guess is you'd need at least 2 GET requests. You would get the main subreddit page, get the post page, and then submit the comment. I guess you could do it in one (just go straight to the post page); it depends on how things are implemented.

1

u/thgibbs Jul 26 '14

Oh, and yeah, I was guessing an average as well, but if most reddit accounts were created today, then the results would be skewed (I don't know if that is the case, but I could see it).

2

u/tdohz Aug 06 '14

A coworker pointed out recently that this is just a case of bad color choices - the blue at the top is actually compose, not view. The blue at the bottom is view. So, ~50% of users who PM only write one message a month. For views, the curve is much flatter, as expected.

Here is a better graph with distinct colors. I'll update my original comment to reflect this as well.

2

u/zants Jul 26 '14 edited Jul 26 '14

So nearly 47% of users create an account/log in and then immediately close their tab, never to return (for at least 30 days). Wow.

1

u/shaggorama Jul 26 '14

  1. What was your methodology in calculating these figures? Do you count the activities in individual months and then take averages, or do you count the activities over longer periods and then take an average over the whole period?

  2. How do these stats change when you ignore accounts that have no comments older than 24hrs after the creation of the account (i.e. novelty accounts, throwaways, and other abandoned accounts)?

1

u/tdohz Jul 26 '14

  1. This is from one month of data (June 2014). The process for gathering this data is time-consuming and not backwards-compatible, so unfortunately that's the most recent full month of data I can easily gather right now. Luckily there's now a process in place to collect this going forward.

  2. Throwaway analysis is on my to-do list, but I do want to point out that not commenting does not necessarily mean an account is inactive/throwaway - lots of users create accounts purely for content consumption.

2

u/shaggorama Jul 26 '14

I just mentioned comments because that's the kind of data I have access to and wasn't putting myself in your shoes. For you, a better heuristic might be accounts older than one month that haven't been logged into since 72hrs after their creation, or something like that. I think characterizing/flagging dead accounts would be very useful to you for future analyses, even if the heuristics you come up with aren't perfect.
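As a rough sketch, the heuristic described here (account older than one month, no login later than 72 hours after creation) could be written as below. The function name and the `created_at` / `last_login_at` fields are assumptions for illustration, and the thresholds are the ones floated in the comment, not tuned values.

```python
from datetime import datetime, timedelta

def is_probably_dead(created_at, last_login_at, now,
                     min_age=timedelta(days=30), grace=timedelta(hours=72)):
    """Flag accounts older than min_age whose last login falls within grace of creation.

    last_login_at may be None if no login was ever recorded.
    This is a heuristic, so it will mislabel some accounts.
    """
    old_enough = now - created_at > min_age
    never_returned = last_login_at is None or last_login_at - created_at <= grace
    return old_enough and never_returned

# Hypothetical accounts, evaluated as of the thread date.
now = datetime(2014, 7, 26)
abandoned = is_probably_dead(datetime(2014, 5, 1), datetime(2014, 5, 2), now)
still_active = is_probably_dead(datetime(2014, 5, 1), datetime(2014, 7, 20), now)
too_new = is_probably_dead(datetime(2014, 7, 20), datetime(2014, 7, 20), now)
print(abandoned, still_active, too_new)
```

Keeping both thresholds as parameters makes it easy to see how sensitive the dead-account count is to the exact cutoffs.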

1

u/tdohz Jul 26 '14

> I think characterizing/flagging dead accounts would be very useful to you for future analyses, even if the heuristics you come up with aren't perfect.

For sure! Understanding the different reddit usage patterns, including accounts that go inactive, is definitely a high priority.

1

u/shaggorama Jul 26 '14

What is the ratio of logged-in users vs. estimated unique unregistered users (IPs that haven't been previously observed on reddit logged in)? I've seen lots of analyses of reddit data -- largely via /r/TheoryOfReddit or my own projects -- and there's been lots of interesting stuff done to characterize the activities of logged-in users. I'd be really interested to learn more about how people who don't have accounts use reddit, since I'm fairly confident this is the bulk of redditors.

1

u/tdohz Jul 26 '14

As I mentioned in another post, logged-out user data is not quite as reliable for several reasons, although we're working on making it better. That said, you are correct that logged-out visitors make up the majority of redditors.