attribution and reuse

Play the Web is a blog with the premise of exploring technical hurdles for making chains of derivative works:

On this blog we want to talk about media reuse on the Internet and enabling reuse in a responsible way. Media companies’ reactionary response of restricting all use is throwing the baby out with the bathwater but conversely doing away with copyright on the Internet altogether is no better. There’s a middle way and we need to build tools to facilitate that path. Tools to recognise media and enable reuse.

They’re assuming that the end result of their work will be part of The microformats.org Initiative:

Our immediate challenge is discovering what licensing and ownership attributes are associated with a given piece of media. There are millions of discrete pieces of media on the Internet, how can software tell which are reusable, which are licensed, which are public domain, etc.? A simple solution to this problem is offered by microformats. By embedding meta-data with media in a standardised, machine-readable way we open the door to all kinds of applications that rely on this knowledge.

And they already have an excellent post on how to do attribution for a reused photograph:

I’m now kind of concerned with what to call “Attribution”. In the Creative Commons attribution is a legal term, but what I really want to relate is:

  1. From where did I find the content: Miss 604’s blog. (The Copied Source)
  2. From where did the original content come from: Squeaky Marmot (The Original Source or at least the source Miss 604 found)

Do you reuse content? Do others reuse your content? If so, what do you think? How would you like to see the “attribution”?

I have a couple data points to offer.

One, non-commercial users don’t care about copyright. They know zero about it, they don’t know of any reason to care, and they aren’t going to change. (Software developers who deal with free and open source software are an exception to this rule.) Commercial users may care, but can’t use content under a non-commercial license. So in practice the issue of attribution only has a real-world impact for derived works created by commercial entities. Source works which are licensed to allow both derivative works and commercial use are the ones we’re talking about.

Two, in XSPF there is an element for giving attribution to the sources of derived works. The idea is that one person would incorporate another person’s playlist into their own, and would use this element to give credit. It is defined as a chronologically-ordered stack:

An ordered list of URIs. The purpose is to satisfy licenses allowing modification but requiring attribution. If you modify such a playlist, move its //playlist/location or //playlist/identifier element to the top of the items in the //playlist/attribution element. xspf:playlist elements MAY contain exactly one xspf:attribution element.

Such a list can grow without limit, so as a practical matter we suggest deleting ancestors more than ten generations back.

<attribution>
  <location>http://bar.com/modified_version_of_original_playlist.xspf</location>
  <identifier>somescheme:original_playlist.xspf</identifier>
</attribution>
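
To make the stack behavior concrete, here is a minimal sketch in Python of what a tool might do when somebody modifies a playlist. It is only an illustration under my own assumptions, not code from any real XSPF library: the function name push_attribution and the max_items parameter are hypothetical, and the ten-item cap is just the practical limit suggested above.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"
MAX_GENERATIONS = 10  # practical limit suggested above, not a hard rule


def push_attribution(playlist: ET.Element, max_items: int = MAX_GENERATIONS) -> None:
    """Move the playlist's own location (or identifier) to the top of its
    attribution stack, then drop ancestors beyond max_items.

    Hypothetical helper for illustration only.
    """
    ns = {"x": XSPF_NS}
    attribution = playlist.find("x:attribution", ns)
    if attribution is None:
        attribution = ET.SubElement(playlist, f"{{{XSPF_NS}}}attribution")

    # Prefer the playlist's location; fall back to its identifier.
    for tag in ("location", "identifier"):
        own = playlist.find(f"x:{tag}", ns)
        if own is not None:
            playlist.remove(own)
            attribution.insert(0, own)  # newest entry goes on top
            break

    # Trim the oldest ancestors from the bottom of the stack.
    for old in list(attribution)[max_items:]:
        attribution.remove(old)
```

After calling this, the derived playlist would get a fresh location of its own, and the next person to remix it would push that one onto the stack in turn.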

The stack framework is a pretty elegant tool for handling this requirement, and I’m happy about how we did it. However, this element is rarely if ever used, because no current playlist-sharing site that I know of both expects playlists to cross site boundaries and expects users to make new playlists out of old ones.

relative paths in playlists

There is a new version of libSpiff, the XSPF library, with support for relative paths using the xml:base attribute. Up until now relative paths have never worked in playlists as far as I know, so whereas in an HTML document you could do…

<a href="my.mp3">my song</a>

In a playlist you always had to spell it out, like:

<a href="http://example.com/my.mp3">my song</a>

The long-term story here is about the maturation of playlists as an internet media type. They have rarely gotten enough respect to be implemented well, according to the same high standards as other media types, and as a result they could rarely be shared across different systems. Any application that did bother to support relative paths in M3U had to guess at how that should work, so no two apps would support it the same way and a different M3U file would have to be created for each app. That’s like having one kind of HTML for Internet Explorer and another for Firefox, which is how we did things in the bad old days before internet developers aggressively moved to web standards.

A plug for libSpiff: it has this kind of sophistication in many other ways as well. Unlike most playlist implementations it does a stellar job with the little details (like character sets outside of US-ASCII) that tell a user whether or not to trust your software. If you’re making an app that uses playlists, you can have that quality level just by using libSpiff.

I know that many developers consider playlisting too trivial to even have this kind of detail. But then again, most app developers don’t do a competitive job on playlist support. Playlists are one of the three atomic multimedia types, along with audio and video. Considering playlists less important is like giving red and green more respect than blue: Light blue and dark blue are finicky colors. Sky blue is pretty much all you need.

Relative paths are a baseline part of the web. If playlists are web documents, they need to support relative paths. This release of libSpiff makes it so.
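
For the curious, here is a rough sketch in Python of what resolving relative paths against xml:base looks like. This is not libSpiff’s API and not its implementation, just the general idea using the standard library; the function name resolved_locations is made up for this example, and it assumes the caller knows the URI the playlist was fetched from.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSPF_NS = "http://xspf.org/ns/0/"
# ElementTree exposes the xml:base attribute under the XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolved_locations(xspf_text: str, document_uri: str) -> list[str]:
    """Return every <location> in the playlist as an absolute URI.

    Hypothetical helper: each element inherits its base URI from its
    parent, and an xml:base attribute overrides it from that point down.
    """
    results = []

    def walk(element, base):
        # An xml:base may itself be relative to the inherited base.
        base = urljoin(base, element.get(XML_BASE, ""))
        if element.tag == f"{{{XSPF_NS}}}location":
            results.append(urljoin(base, (element.text or "").strip()))
        for child in element:
            walk(child, base)

    walk(ET.fromstring(xspf_text), document_uri)
    return results
```

With no xml:base anywhere, a location of my.mp3 resolves against the address of the playlist itself, which is the behavior people already expect from HTML.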


foo_xspf is a plugin for the foobar2000 audio software that adds XSPF support. XSPF is an open, XML-based playlist format developed by the Xiph.Org Foundation. It uses the open library libSpiff to parse the files.

Checking in

I suppose I ought to blog once in a while, so this post is to check in. Since I last posted here, the big news is that the software I have been working on in stealth mode for the last three years finally went public under the name of Yahoo! Media Player. It has gotten great reactions, been picked up on a bunch of notable pages, and been covered by well-known sites like TechCrunch. This software was originally going to be Webjay 2.0, but wasn’t released before the Yahoo! acquisition and ended up becoming the nucleus of a new Yahoo! project.

It’s not much like Webjay the site, which was a combination playlist editor, portal, generator, and social networking site. But philosophically it is still about media with URLs, openness, sharing, and interoperability.

It is also still about playlists. But it is a major twist on the concept. The player accepts all sorts of traditional playlists, like XSPF and M3U, as well as feed formats like RSS and Atom; it even has an integrated screen scraper which can use a remote web page as a playlist. But primarily the web page in which the player is embedded is the playlist.

Web pages are a very good playlist format. They are visually customizable, semantically rich, standardized, documented, open, flexible, decentralized and implemented world-wide. To the extent that they didn’t have syntax for everything playlist-oriented, we were able to use semantic HTML with a light sprinkling of extensions.
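
To give a feel for the idea, here is a small Python sketch of treating a page as a playlist by collecting its links to audio files. It is an illustration only and says nothing about how the player itself is implemented; the class name MediaLinkCollector, the list of file extensions, and the track dictionary layout are all my own assumptions for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of extensions that count as playable media in this sketch.
MEDIA_EXTENSIONS = (".mp3", ".ogg", ".m4a", ".wav")


class MediaLinkCollector(HTMLParser):
    """Collect <a href> links to media files, in document order."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.tracks = []       # one dict per track: location and title
        self._current = None   # track being built while inside a media link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(MEDIA_EXTENSIONS):
            self._current = {"location": urljoin(self.page_url, href), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data  # link text becomes the title

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.tracks.append(self._current)
            self._current = None
```

Feed it the HTML of a page and the tracks list comes out in reading order, with the link text as the track title and relative links resolved against the page address.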

However, I can’t use the player on my blog here, which is the reason why I haven’t been writing on this blog. This blog is hosted by wordpress.com, which blocks out JavaScript. I need to move my blog to another host.

The code name for the player project, by the way, was “goose.”

making a case for portable identifiers

One followup to my post on portable identifiers for songs using XSPF’s content resolution abilities happened on J. Herskowitz’ blog. I asked whether the problem in developing interoperability between music services is technical or economic. J’s answer was:

I think it is both. Since there appears to be a need for ongoing resolver work to map to lots of catalogs, the opportunity cost of one company to do so becomes too high. Just look at Paul Lamere’s work on Spiffy (http://research.sun.com:8080/SpiffyContentResolver/)- it was a great start, but he couldn’t rationalize the opportunity costs to keep it going.

As a consumer, I want it though…. I want to be able to find a playlist somewhere and then click “play” – by which enables me to determine what vendor fulfills it. Napster, Rhapsody, Yahoo, YouTube, free-range MP3s, etc.

Paraphrasing him, the value to users seems clear enough, but the work to enable it needs to be shared across vendors, since no one vendor benefits more than the others. It’s social value which has to be funded by everybody and nobody.

Back here at home I asked the question slightly differently: does this technology provide enough business benefit to be worth implementing? If not, what would have to be different?

Jay Fienberg came back with an answer a lot like J’s:

I think there’s a bit of a mismatch here: catalog resolution of the type described is especially beneficial and necessary in “open” multiple-catalog systems–where the goal is linking / sharing info between as many systems as possible. And, the question is being asked of people involved in furthering the goals of “closed,” single-catalog systems.

These single-catalog systems have the goal of, more or less, focusing only on incoming links, e.g., focusing on making their single catalog a more unique authority.

I think another way to look at this would be: how hard would it be for these services to expose their own unique, permanent, identifiers to the public? (Not very, one would imagine.) Then, rather than these services building their own catalog resolution systems, they could make it possible for others to do so.

Similarly, Scott Kveton of MyStrands said: From the MyStrands perspective we’re simply not in the catalog resolution business. I would wager that Pandora isn’t either.

Jay’s trick of flipping the question around is insightful. Almost all online music businesses right now are in the distribution business, even if they see other functions like discovery or social connection as their main value, because they have no way to connect their discovery or social connection features with a reliable provisioning service from a third party. But provisioning is a commodity service which doesn’t give anybody an edge. They don’t want to import playlists from third parties because *that’s* where they are adding value.

Exporting playlists for others to provision, though, is a different story, and it makes much more sense from a business perspective. Let somebody else deal with provisioning. This is what it would mean for somebody like Launchcast or Pandora to publish XSPF with portable song identifiers that could be resolved by companies that specialize in provisioning.
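
As a sketch of what that export could look like, here is some Python that writes an XSPF playlist whose tracks carry an identifier, creator, and title but no location, leaving the provisioning to whoever resolves it. The function name export_playlist and the urn:example identifier in the usage note are hypothetical; a real service would publish its own permanent identifiers.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"


def export_playlist(tracks):
    """Serialize tracks (dicts with identifier/creator/title) as XSPF.

    Hypothetical helper: no <location> is written, so a third party has
    to resolve each track from its identifier or its metadata.
    """
    ET.register_namespace("", XSPF_NS)  # emit XSPF as the default namespace
    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", {"version": "1"})
    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for info in tracks:
        track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
        for field in ("identifier", "title", "creator"):
            if info.get(field):
                ET.SubElement(track, f"{{{XSPF_NS}}}{field}").text = info[field]
    return ET.tostring(playlist, encoding="unicode")
```

Calling export_playlist([{"identifier": "urn:example:track:123", "creator": "Some Artist", "title": "Some Song"}]) gives a playlist that any resolver could try to fulfill from its own catalog.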

Chris Anderson said:

The portability problem is a bit of a prisoner’s dilemma for music providers. If everyone addresses it, the benefit is great, but if only a few do, and in different ways, then the costs can outweigh the gains.

In the absence of a bottom-up revolution resulting in audio resources that can be resolved to, there has to be cooperation among audio brokers. Perhaps Imeem et. al. could provide an API that takes XSPF <track/> fragments and provides a flash widget with the appropriate content.

And Scott Kveton again:

What I would love to talk about is using something akin to Musicbrainz to be the public commons that companies like MyStrands, Last.fm, Pandora and others can use as a basis for playlist portability.

And that’s where internet music vendors are right now: stuck waiting for ways to cooperate without disarming unilaterally. The closest thing to cooperation is that companies are willing to export Flash widgets that can be embedded in any third party site, and the reason we’re using Flash is that it allows us to define and limit points of interoperability.

Ok, so let’s just say that the business and technical problems can be factored into separate projects. Yves has been working on the technical problem of mapping identifiers from different vendors into a unified framework:

I played a bit with such lookup algorithms (using metadata+acoustic fingerprints) when I experimented linking a Creative Commons label collection (Jamendo) and Musicbrainz – this is described here, and uses a technique close to the “similarity flooding” one in the record linkage community:
http://blog.dbtune.org/post/2007/06/11/Linking-open-data%3A-interlinking-the-Jamendo-and-the-Musicbrainz-datasets

Yves’ work deals with interlinking experiences based on the Jamendo dataset, in particular equivalence mining – that is, stating that a resource in the Jamendo dataset is the same as a resource in the Musicbrainz dataset.

For example, we want to derive automatically that http://dbtune.org/jamendo/artist/5 is the same as http://musicbrainz.org/artist/0781a….

It’s a fascinating and productive investigation. I am aware of at least one private proprietary effort to do this kind of thing, but no open project, and this is exactly where work has to be (as Scott says above) for multiple vendors to become interoperable without unilateral disarmament. One immediately useful result of this work is to make a direct connection between the XSPF concept of content resolution and the semantic web concept of Equivalence Mining and Matching Frameworks. This allows music developers familiar with the application domain of catalog management to benefit from high-academia research into techniques that can be used to auto-generate links between data items within different datasources.
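
To show the shape of the problem, here is a deliberately naive Python sketch that links a name from one catalog to the closest name in another. It is not Yves’ method: there is no acoustic fingerprinting and no similarity flooding here, only string similarity from the standard library. The function names, the 0.85 threshold, and the urn:example identifiers in the usage note are assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def best_match(name: str, candidates: dict, threshold: float = 0.85):
    """Return the URI in candidates whose name is most similar to name,
    or None if nothing scores at least threshold.

    candidates maps a URI in the other catalog to that catalog's name
    for the resource. Hypothetical helper for illustration only.
    """
    best_uri, best_score = None, 0.0
    for uri, candidate_name in candidates.items():
        score = SequenceMatcher(None, normalize(name), normalize(candidate_name)).ratio()
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri if best_score >= threshold else None
```

Given candidates such as {"urn:example:b:1": "The Example Band"}, a lookup for "the  example band" would come back with urn:example:b:1. Real catalogs are much messier than this, which is exactly why the record linkage research is worth borrowing.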

Phew. I’m done. This was a hard post to write because I had to digest all the different strands in this conversation. It took a long time to figure out what people were talking about. Still, now that I’ve done the legwork I feel like I understand the problem better than before, even if parts are still a complete mystery.

Cruxy « Kafka’s SL World

In case you haven’t heard of it, there is now a Cruxy Player for Second Life, a portable music player for use at listening parties, as a promotional giveaway, or for just some relaxing downtime in your personal parcel of land. It can load and play music from mp3 playlists using the XSPF standard.

I love the Cruxy guys’ idea of doing a playlist module for SL. It would be cool to see this for all the online roleplaying environments, like World of Warcraft.

But aren’t these worlds supposed to be sealed off from the outside? Isn’t that the point?