hit, git and split

In Rolling Stone’s current piece on the best music blogs of 2008 (update: link corrected), three out of the four winners which do MP3s are using Yahoo! Media Player. That’s a pretty good score, and it shows that the media player has found a place in the world. I feel great. It’s like seeing your kid graduate from college and get a decent job. To see that the software has grown up, or at least reached its decadent 20s, means the completion of a long project.

Early work on the software now known as Yahoo! Media Player began in the fall of 2004 as version 2.0 of Webjay. My vision was to move outward from Webjay's centralized form: rather than having a single site for authoring playlists, any web authoring tool on any site would be able to create playlists using HTML. The difference would be ease of use: rather than you going to Webjay, it would come to you. When Webjay was acquired by Yahoo!, this unreleased software was picked up and began a new life as a project codenamed "goose." During my first year at Yahoo!, while Webjay proper was going down, the new version was coming up, and right around the same time that we officially shuttered webjay.org we also bootstrapped a good development team for goose.

The first goose release went live on July 31, 2007 in the most modest way I could arrange: as a player for 30-second samples on an easter egg page within Yahoo!'s massively trafficked music site. The power of an AJAX-based player was evident in a subtle way, though, in that it supported Yahoo!'s subscription service in off-the-shelf Internet Explorer on Windows. The subscription service wasn't an impressive product, but the underlying code was truly hair-raising and couldn't have been done with a traditional Flash MP3 player.

The next major iteration came in January of 2008, when we released a version of the player that could handle third-party content and run in third-party pages. It was a dramatically more open piece of work, and we got great buzz right out of the gate, with articles all over blogdom and a warm reception from users. From there we picked up the pace on the release schedule quite a lot, turning the crank on a new rev a month later. Along with user interface changes based on feedback, we introduced the ability to open XSPF playlists that weren't accessible to straight AJAX or Flash apps, as well as an integrated screen scraper that could turn almost any page on the web into a playlist just by linking to it. A month later we did the last rev of the first version of the player. It had many fit-and-finish improvements, auto-attribution for MP3 hosts being deep-linked, a buy button with an affiliate program for web publishers, and a "Find in page" button to help you associate a track with the place in the page it came from. The first major version was complete. We went into quiet mode to work on version 2.0, which will be out in alpha form very soon and will have significant improvements.

And with that, my part in this is done. There is an excellent team to run the show, the product has good support on the business side of Yahoo!, there is a healthy user and developer community, and the software has good market share. It’s time for me to let go and move on, and so today is my last official day at Yahoo!

I don’t know exactly what I’ll do next, though I do have general ideas about areas to explore. What I do know is that tomorrow morning I’ll sit down to start work on whatever comes next.

Thanks for everything, y’all. See you on the flip side.

Checking in

I suppose I ought to blog once in a while, so this post is to check in. Since the last time I posted here, the big news is that the software I have been working on in stealth mode for the last three years finally went public under the name of Yahoo! Media Player. It has gotten great reactions, been picked up on a bunch of notable pages, and been covered by well-known sites like TechCrunch. This software was originally going to be Webjay 2.0, but it wasn't released before the Yahoo! acquisition and ended up becoming the nucleus of a new Yahoo! project.

It’s not much like Webjay the site, which was a combination playlist editor, portal, generator, and social networking site. But philosophically it is still about media with URLs, openness, sharing, and interoperability.

It is also still about playlists, but it puts a major twist on the concept. The player accepts all sorts of traditional playlists, like XSPF and M3U, as well as feed formats like RSS and Atom; it even has an integrated screen scraper that can use a remote web page as a playlist. But primarily, the web page in which the player is embedded is the playlist.

Web pages are a very good playlist format. They are visually customizable, semantically rich, standardized, documented, open, flexible, decentralized and implemented world-wide. To the extent that they didn’t have syntax for everything playlist-oriented, we were able to use semantic HTML with a light sprinkling of extensions.
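To make the page-as-playlist idea concrete, here is a toy sketch of the simplest possible convention: ordinary MP3 links in a page are read as playlist entries, in document order. It is written in Python rather than the player's own JavaScript, and the markup convention is an assumption for illustration, not the player's actual parsing rules.

    # Toy illustration: read the MP3 links in an HTML page as a playlist.
    from html.parser import HTMLParser

    class MP3LinkCollector(HTMLParser):
        """Collects (url, title) pairs for links that point at MP3 files."""

        def __init__(self):
            super().__init__()
            self.tracks = []            # playlist entries, in page order
            self._href = None
            self._text = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href", "")
                if href.lower().endswith(".mp3"):
                    self._href = href
                    self._text = []

        def handle_data(self, data):
            if self._href is not None:
                self._text.append(data)

        def handle_endtag(self, tag):
            if tag == "a" and self._href is not None:
                title = "".join(self._text).strip() or self._href
                self.tracks.append((self._href, title))
                self._href = None

    page = """
    <p>Two songs I liked this week:</p>
    <a href="http://example.com/song-one.mp3">Song One</a>
    <a href="http://example.com/song-two.mp3">Song Two</a>
    """

    collector = MP3LinkCollector()
    collector.feed(page)
    print(collector.tracks)   # the page, read as a playlist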

However, I can't use the player on my blog here, which is the reason I haven't been writing on this blog. This blog is hosted by wordpress.com, which blocks JavaScript. I need to move my blog to another host.

The code name for the player project, by the way, was “goose.”

Webjay playlist popularity metric

Someone asked me recently about the Webjay popularity metric. It was a good metric — simple and reliable — so I thought I’d pass it along here. I do this with confidence that Yahoo doesn’t mind because its metrics are much more sophisticated.

The metric was based on playlist plays, so if somebody played a playlist this was used as input. A “play” was defined as fetching a playlist in a playable format like XSPF, M3U, SMIL, or ASX.

We recorded who fetched each playlist, so that we could filter out plays according to various reputation metrics. The reputation metric that ended up the winner was how old an account was. I tried others that were more exclusive, but they filtered out so much data that there wasn't enough left for the statistics to be reliable. By sticking to plays by people who had been around a while, we got rid of plays by people who were just looking around for the first time. New people invariably play the most popular items, so filtering out their activity fixed a feedback loop. (Note to old Webjay hands: feedback loops like this were the reason the same few playlists would get stuck to the top of the listings.)
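A minimal sketch of that bookkeeping, with hypothetical field names and an illustrative 30-day cutoff standing in for whatever threshold was actually used:

    # Hypothetical sketch of the reputation filter; the field names and the
    # 30-day threshold are illustrative, not Webjay's actual values.
    from dataclasses import dataclass

    MIN_ACCOUNT_AGE = 30 * 24 * 60 * 60   # "been around a while", in seconds

    @dataclass
    class Play:
        playlist_id: str
        account_created: int   # Unix timestamp of the account's creation
        played_at: int         # Unix timestamp of the playlist fetch

    def trusted_plays(plays, now):
        """Keep only plays by accounts old enough to count toward popularity."""
        return [p for p in plays if now - p.account_created >= MIN_ACCOUNT_AGE]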

At this point we had a set of play data that covered the entire lifespan of the project. If we counted each play as a single point, the sum would give the relative popularity of a playlist within all playlists. It would be a hall of fame, with playlists from different time periods competing. (Though the point scores would have had to be normalized against the changing popularity of the site, by dividing by the total points within a given time period.) Given the amount of data and the number of competing playlists over such a large time period, the results would probably have been an accurate indicator of playlist quality.

However, we needed a sense of freshness, because regular visitors want to know what's happening on an ongoing basis. To make this work, the timestamps of the plays were recorded, and plays were given more value if they were more recent. Timestamps were used because they ascend monotonically: a more recent play is always a larger number, so using raw timestamps as weights favors recent activity with no separate decay function. The ranking of a playlist was simply the sum of the timestamps of its plays.
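Continuing with the toy Play records from the sketch above, the scoring itself is only a few lines; a plain count of plays would give the hall-of-fame variant, while summing timestamps gives the recency-weighted one:

    # Illustrative scoring, using the Play records from the sketch above.
    from collections import defaultdict

    def recency_weighted_scores(plays):
        """Rank playlists by the sum of the Unix timestamps of their plays.
        Newer plays contribute larger numbers, so recent activity dominates
        without needing a separate decay function. Replacing play.played_at
        with 1 gives the plain all-time count instead."""
        scores = defaultdict(int)
        for play in plays:
            scores[play.playlist_id] += play.played_at
        return dict(scores)

    # Most popular first:
    # ranking = sorted(scores, key=scores.get, reverse=True)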

However, there was again a feedback loop. The most popular playlists of all time still had an advantage in the popularity listing on the home page, and thus still got stuck to the top of the listing. There was a need to allow playlists to compete within different time windows, so that they could be on even footing. New candidates should be competing with other new candidates.

To set the time window of the ranking, the plays were filtered according to different time periods. I think the time periods were a day, a week, two weeks and a month. This gave us popularity contests among peers. The best playlist today, the best playlist this month, etc. Note that the filtering didn’t rely on when a playlist was created, so sometimes an old one would be rediscovered and rise to the top.

So which time window to use? There could have been pages dedicated to each one, but traffic off the home page was always going to dominate. Also, it is inconvenient to make users click around. The solution was for the different popularity contests to share the home page. This was done by choosing at random from the four possible time windows each time the popularity rankings were calculated. On a user level, what this meant was that the home page would be showing one of four different rankings, depending on when you viewed it.
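Putting the windows and the random choice together, again as an illustrative sketch on top of the pieces above (the window lengths are the ones I believe were used):

    # Sketch of the home page ranking: pick one of the four windows at random,
    # keep only the plays inside it, then apply the timestamp-sum score.
    import random

    WINDOWS = {                      # window lengths in seconds
        "day": 1 * 24 * 60 * 60,
        "week": 7 * 24 * 60 * 60,
        "two weeks": 14 * 24 * 60 * 60,
        "month": 30 * 24 * 60 * 60,
    }

    def home_page_ranking(plays, now):
        """One of four different rankings, depending on when it is computed."""
        window = random.choice(list(WINDOWS))
        cutoff = now - WINDOWS[window]
        in_window = [p for p in plays if p.played_at >= cutoff]
        scores = recency_weighted_scores(in_window)   # from the sketch above
        return window, sorted(scores, key=scores.get, reverse=True)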

This constantly shifting ranking set worked to sift playlists up through the ranks. A promising new playlist would get exposure by appearing on the home page in the “today’s most popular” set. It would compete with the other brand new playlists for enough popularity to advance to the “this week’s most popular” set. If it made the cut, it would then be on a footing to advance to the two-week set, and from there to the 1-month set. At each step a bit of popularity gave the playlist opportunity to compete for more.

A bit of good luck was that this metric captured the attention span of the community. A good playlist would be discovered, rise to the top, be tried out by all the regulars, and sink down as the regulars got bored with it.

A deliberate strength of this metric was that it was based on actual behavior rather than on vote counts, so it was not as gameable as systems using the Digg approach. This also provided more input data, which improves the quality of the statistics.

A weakness of this method was that it relied on a single home page, and a single ranking can never be representative of all the different interest groups. A project that I never got to do was to filter according to similarity with an input set of playlists or playlisters, so that you’d have the world according to Jim (who likes punk and modern classical) or according to Chromegat (who likes hip hop).

So that’s the metric. It developed over many sessions of trying to manage feedback loops and turn user behaviors into meaningful data, and took a lot of tweaking to get right. I hope this is useful to others.