Webjay playlist popularity metric

Someone asked me recently about the Webjay popularity metric. It was a good metric — simple and reliable — so I thought I’d pass it along here. I do this with confidence that Yahoo doesn’t mind because its metrics are much more sophisticated.

The metric was based on playlist plays: whenever somebody played a playlist, that event was used as input. A “play” was defined as fetching a playlist in a playable format like XSPF, M3U, SMIL, or ASX.
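To make that concrete, here is a minimal sketch in Python. It is not Webjay’s actual code; the play_log list, the field names, and record_play are stand-ins for whatever table the play events were actually written to.

```python
import time

# Hypothetical set of playable formats, mirroring the ones named above.
PLAYABLE_FORMATS = {"xspf", "m3u", "smil", "asx"}

def record_play(play_log, playlist_id, user_id, fmt):
    """Log a play event if the playlist was fetched in a playable format."""
    if fmt.lower() in PLAYABLE_FORMATS:
        play_log.append({
            "playlist": playlist_id,
            "user": user_id,
            "timestamp": time.time(),  # when the playlist was fetched
        })
```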

We recorded who fetched each playlist, so that we could filter out plays according to various reputation metrics. The reputation metric that ended up the winner was how old an account was. I tried others that were more exclusive, but they filtered out so much data that there wasn’t enough left for the statistics to be reliable. By sticking to plays by people who had been around a while, we got rid of plays by people who were just looking around for the first time. New people invariably play the most popular items, so filtering out their activity fixed a feedback loop. (Note to old Webjay hands: feedback loops like this were the reason the same few playlists would get stuck to the top of the listings.)
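A rough sketch of the age filter, again with made-up details: the thirty-day cutoff and the account_created lookup are assumptions, not the values Webjay actually used.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff; the post doesn't say how old an account had to be.
MIN_ACCOUNT_AGE = timedelta(days=30)

def filter_by_reputation(plays, account_created, now=None):
    """Keep only plays by users whose accounts are old enough.

    account_created maps a user id to the datetime the account was created.
    """
    now = now or datetime.utcnow()
    return [
        play for play in plays
        if now - account_created[play["user"]] >= MIN_ACCOUNT_AGE
    ]
```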

At this point we had a set of play data that covered the entire lifespan of the project. If we counted each play as a single point, the sum would give the relative popularity of a playlist within all playlists. It would be a hall of fame, with playlists from different time periods competing. (Though the point scores would have had to be normalized against the changing popularity of the site, by dividing by the total points within a given time period.) Given the amount of data and the number of competing playlists over such a long time period, the results would probably have been an accurate indicator of playlist quality.
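This hall-of-fame version was never the one that shipped, but the arithmetic is simple enough to sketch. The period_of helper is hypothetical; it would bucket each play’s timestamp into whatever period the totals were computed over.

```python
from collections import defaultdict

def all_time_ranking(plays, period_of):
    """Hall-of-fame ranking: one point per play, with each point divided by
    the total plays in that play's time period, so quiet eras and busy eras
    count equally.

    period_of(timestamp) buckets a timestamp into a period (say, a month).
    """
    totals = defaultdict(int)
    for play in plays:
        totals[period_of(play["timestamp"])] += 1

    scores = defaultdict(float)
    for play in plays:
        scores[play["playlist"]] += 1.0 / totals[period_of(play["timestamp"])]

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```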

However, we needed a sense of freshness, because regular visitors want to know what’s happening on an ongoing basis. To make this work, the timestamps of the plays were recorded, and plays were given more value the more recent they were. Timestamps worked well as weights because they increase monotonically: a more recent play automatically contributes a larger number than an older one. The ranking of a playlist was simply the sum of the timestamps of its plays.
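In code, the whole freshness ranking fits in a few lines. This is a sketch of the idea rather than the production query, and it assumes the same play records as the earlier sketches.

```python
from collections import defaultdict

def freshness_ranking(plays):
    """Rank playlists by the sum of the Unix timestamps of their plays.

    Timestamps only ever increase, so a recent play contributes a larger
    number than an old one and fresh activity outweighs history.
    """
    scores = defaultdict(float)
    for play in plays:
        scores[play["playlist"]] += play["timestamp"]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)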

However, there was again a feedback loop. The most popular playlists of all time still had an advantage in the popularity listing on the home page, and thus still got stuck to the top of the listing. There was a need to allow playlists to compete within different time windows, so that they could be on an even footing. New candidates should be competing with other new candidates.

To set the time window of the ranking, the plays were filtered according to different time periods. I think the time periods were a day, a week, two weeks and a month. This gave us popularity contests among peers. The best playlist today, the best playlist this month, etc. Note that the filtering didn’t rely on when a playlist was created, so sometimes an old one would be rediscovered and rise to the top.
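Something like this, using the window lengths mentioned above (which are from memory):

```python
import time

# The four windows: a day, a week, two weeks, a month.
WINDOWS_SECONDS = {
    "day":       24 * 3600,
    "week":      7 * 24 * 3600,
    "two_weeks": 14 * 24 * 3600,
    "month":     30 * 24 * 3600,
}

def plays_in_window(plays, window, now=None):
    """Keep only the plays that fall inside the given time window."""
    now = now or time.time()
    cutoff = now - WINDOWS_SECONDS[window]
    return [p for p in plays if p["timestamp"] >= cutoff]
```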

So which time window to use? There could have been pages dedicated to each one, but traffic off the home page was always going to dominate. Also, it is inconvenient to make users click around. The solution was for the different popularity contests to share the home page: each time the popularity rankings were recalculated, one of the four time windows was chosen at random. On a user level, this meant the home page would be showing one of four different rankings depending on when you viewed it.
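The window-picking step itself is tiny. Building on the sketches above:

```python
import random

# Reuses WINDOWS_SECONDS, plays_in_window, and freshness_ranking from the
# sketches above.
def home_page_ranking(plays):
    """Each time the rankings are recalculated, pick one of the four windows
    at random and rank within it, so the home page rotates among the four
    contests."""
    window = random.choice(sorted(WINDOWS_SECONDS))
    recent = plays_in_window(plays, window)
    return window, freshness_ranking(recent)
```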

This constantly shifting ranking set worked to sift playlists up through the ranks. A promising new playlist would get exposure by appearing on the home page in the “today’s most popular” set. It would compete with the other brand new playlists for enough popularity to advance to the “this week’s most popular” set. If it made the cut, it would then be on a footing to advance to the two-week set, and from there to the one-month set. At each step a bit of popularity gave the playlist the opportunity to compete for more.

A bit of good luck was that this metric captured the attention span of the community. A good playlist would be discovered, rise to the top, be tried out by all the regulars, and sink down as the regulars got bored with it.

A deliberate strength of this metric was that it was based on actual behavior rather than on vote counts, so it was not as gameable as systems using the Digg approach. This also provided more input data, which improves the quality of the statistics.

A weakness of this method was that it relied on a single home page, and a single ranking can never be representative of all the different interest groups. A project that I never got to do was to filter according to similarity with an input set of playlists or playlisters, so that you’d have the world according to Jim (who likes punk and modern classical) or according to Chromegat (who likes hip hop).

So that’s the metric. It developed over many sessions of trying to manage feedback loops and turn user behaviors into meaningful data, and took a lot of tweaking to get right. I hope this is useful to others.