making a case for portable identifiers

One followup to my post on portable identifiers for songs using XSPF’s content resolution abilities happened on J. Herskowitz’ blog. I asked whether the problem in developing interoperability between music services is technical or economic. J’s answer was:

I think it is both. Since there appears to be a need for ongoing resolver work to map to lots of catalogs, the opportunity cost of one company to do so becomes too high. Just look at Paul Lamere’s work on Spiffy (http://research.sun.com:8080/SpiffyContentResolver/)- it was a great start, but he couldn’t rationalize the opportunity costs to keep it going.

As a consumer, I want it though…. I want to be able to find a playlist somewhere and then click “play” – by which enables me to determine what vendor fulfills it. Napster, Rhapsody, Yahoo, YouTube, free-range MP3s, etc.

Paraphrasing him, the value to users seems clear enough, but the work to enable it needs to be shared across vendors, since no one vendor benefits more than the others. It’s social value which has to be funded by everybody and nobody.

Back here at home I asked the question slightly differently: does this technology provide enough business benefit to be worth implementing? If not, what would have to be different?

Jay Fienberg came back with an answer a lot like J’s:

I think there’s a bit of a mismatch here: catalog resolution of the type described is especially beneficial and necessary in “open” multiple-catalog systems–where the goal is linking / sharing info between as many systems as possible. And, the question is being asked of people involved in furthering the goals of “closed,” single-catalog systems.

These single-catalog systems have the goal of, more or less, focusing only on incoming links, e.g., focusing on making their single catalog a more unique authority.

I think another way to look at this would be: how hard would it be for these services to expose their own unique, permanent identifiers to the public? (Not very, one would imagine.) Then, rather than these services building their own catalog resolution systems, they could make it possible for others to do so.

Similarly, Scott Kveton of MyStrands said: From the MyStrands perspective we’re simply not in the catalog resolution business. I would wager that Pandora isn’t either.

Jay’s trick of flipping the question around is insightful. Almost all online music businesses right now are in the distribution business, even if they see other functions like discovery or social connection as their main value, because they have no way to connect their discovery or social connection features with a reliable provisioning service from a third party. But provisioning is a commodity service which doesn’t give anybody an edge. They don’t want to import playlists from third parties because making playlists, through discovery or social features, is exactly where they are adding value.

Exporting playlists for others to provision, though, is a different story, and it makes much more sense from a business perspective. Let somebody else deal with provisioning. This is what it would mean for somebody like Launchcast or Pandora to publish XSPF with portable song identifiers that could be resolved by companies that specialize in provisioning.
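
To make the export idea concrete, here is a minimal sketch in Python of what the two halves might look like. It is not anybody’s actual implementation. The `urn:example:...` identifiers, the `provisioner.example` URLs and the function names are all made up for illustration; only the XSPF namespace and the `playlist`, `trackList`, `track`, `identifier`, `title`, `creator` and `location` elements come from the XSPF format itself. The exporter publishes tracks with an identifier and no location, and a separate provisioner fills in locations for the identifiers it happens to know.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"
# Serialize XSPF elements without a namespace prefix.
ET.register_namespace("", XSPF_NS)


def q(name):
    """Return the namespace-qualified tag for an XSPF element name."""
    return f"{{{XSPF_NS}}}{name}"


def build_playlist(tracks):
    """Exporter side: publish identifiers and metadata, but no locations.

    `tracks` is a list of dicts with optional keys "identifier",
    "title" and "creator" (a hypothetical input shape for this sketch).
    """
    playlist = ET.Element(q("playlist"), {"version": "1"})
    track_list = ET.SubElement(playlist, q("trackList"))
    for info in tracks:
        track = ET.SubElement(track_list, q("track"))
        for field in ("identifier", "title", "creator"):
            if field in info:
                ET.SubElement(track, q(field)).text = info[field]
    return ET.tostring(playlist, encoding="unicode")


def resolve_playlist(xspf_text, catalog):
    """Provisioner side: add a <location> for every identifier we know.

    `catalog` maps portable identifiers to playable URLs. Tracks whose
    identifier is missing from the catalog are left untouched.
    """
    root = ET.fromstring(xspf_text)
    for track in root.iter(q("track")):
        identifier = track.find(q("identifier"))
        if identifier is not None and identifier.text in catalog:
            ET.SubElement(track, q("location")).text = catalog[identifier.text]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    exported = build_playlist([
        {"identifier": "urn:example:track:1", "title": "Song", "creator": "Artist"},
        {"identifier": "urn:example:track:2", "title": "Other Song"},
    ])
    # A hypothetical provisioner that only carries the first track.
    catalog = {"urn:example:track:1": "http://provisioner.example/stream/1"}
    print(resolve_playlist(exported, catalog))
```

The point of the split is that the exporter never needs to know who will provision the songs, and the provisioner never needs to know where the playlist came from. The only thing they have to agree on is the identifier.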

Chris Anderson said:

The portability problem is a bit of a prisoner’s dilemma for music providers. If everyone addresses it, the benefit is great, but if only a few do, and in different ways, then the costs can outweigh the gains.

In the absence of a bottom-up revolution resulting in audio resources that can be resolved to, there has to be cooperation among audio brokers. Perhaps Imeem et. al. could provide an API that takes XSPF <track/> fragments and provides a flash widget with the appropriate content.

And Scott Kveton again:

What I would love to talk about is using something akin to Musicbrainz to be the public commons that companies like MyStrands, Last.fm, Pandora and others can use as a basis for playlist portability.

And that’s where internet music vendors are right now: stuck waiting for ways to cooperate without disarming unilaterally. The closest thing to cooperation is that companies are willing to export Flash widgets that can be embedded in any third party site, and the reason we’re using Flash is that it allows us to define and limit points of interoperability.

Ok, so let’s just say that the business and technical problems can be factored into separate projects. Yves has been working on the technical problem of mapping identifiers from different vendors into a unified framework:

I played a bit with such lookup algorithms (using metadata+acoustic fingerprints) when I experimented linking a Creative Commons label collection (Jamendo) and Musicbrainz – this is described here, and uses a technique close to the “similarity flooding” one in the record linkage community:
http://blog.dbtune.org/post/2007/06/11/Linking-open-data%3A-interlinking-the-Jamendo-and-the-Musicbrainz-datasets

Yves’ work deals with interlinking experiments based on the Jamendo dataset, in particular equivalence mining – that is, stating that a resource in the Jamendo dataset is the same as a resource in the Musicbrainz dataset.

For example, we want to derive automatically that http://dbtune.org/jamendo/artist/5 is the same as http://musicbrainz.org/artist/0781a….
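
For a feel of what equivalence mining means at its very simplest, here is a rough Python sketch. To be clear, this is not Yves’ algorithm: he describes metadata plus acoustic fingerprints and a technique close to similarity flooding, while this only compares artist names with the standard library’s `difflib`. The `example.org` URIs, the function names and the 0.9 threshold are assumptions made up for the example, and expressing the result as `owl:sameAs` links is my guess at a reasonable output format.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Return a score between 0.0 and 1.0 for how alike two names are."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def mine_equivalences(left, right, threshold=0.9):
    """Link each resource in `left` to its best match in `right`.

    Both arguments map resource URIs to artist names. A link is emitted
    only when the best score reaches `threshold` (an arbitrary cutoff
    chosen for this sketch).
    """
    links = []
    for left_uri, left_name in left.items():
        best_uri, best_score = None, 0.0
        for right_uri, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_uri, best_score = right_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((left_uri, OWL_SAME_AS, best_uri))
    return links


if __name__ == "__main__":
    # Two hypothetical catalogs describing overlapping artists.
    catalog_a = {"http://example.org/a/artist/1": "The Example Band"}
    catalog_b = {
        "http://example.org/b/artist/x": "the  example band",
        "http://example.org/b/artist/y": "Another Act",
    }
    for triple in mine_equivalences(catalog_a, catalog_b):
        print(triple)
```

Name matching alone breaks down quickly in a real catalog (two different bands with the same name, one band spelled three ways), which is presumably why the real work brings in fingerprints and the structure of the surrounding data.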

It’s a fascinating and productive investigation. I am aware of at least one private proprietary effort to do this kind of thing, but no open project, and this is exactly where work has to be done (as Scott says above) for multiple vendors to become interoperable without unilateral disarmament. One immediately useful result of this work is to make a direct connection between the XSPF concept of content resolution and the semantic web concept of Equivalence Mining and Matching Frameworks. This allows music developers familiar with the application domain of catalog management to benefit from academic research into techniques that can be used to auto-generate links between data items in different data sources.

Phew. I’m done. This was a hard post to write because I had to digest all the different strands in this conversation. It took a long time to figure out what people were talking about. Still, now that I’ve done the legwork I feel like I understand the problem better than before, even if parts are still a complete mystery.