license claims in HTML

When you put a Creative Commons license in a web page, it usually applies to that page. For example, if you generate HTML for the Attribution-ShareAlike license using the license chooser at CreativeCommons.org and put that claim into a web page at http://example.com, it means that the page at http://example.com can be freely shared as long as the sharer gives attribution and applies the same license to their copy.

By using the “about” attribute specified in RDFa, you can modify that claim HTML so that it applies to a different URL and not the page in which the HTML is embedded.

Let’s say you have a media file, “my.mp3” (which may or may not have embedded license info), online at http://example.com/my.mp3, and a web page at http://example.com. Let’s also say you have a chunk of HTML saying that the current web page is under an Attribution-ShareAlike license.

Your web page containing that chunk would normally have HTML along these lines:

      <html><head><title></title></head><body>
      [the HTML for the license claim]
      </body></html>
    

The modified HTML would look like this:

      <html><head><title></title></head><body>
      <div about="http://example.com/my.mp3">
      [the HTML for the license claim]
      </div>
      </body></html>
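
To make the scoping rule concrete, here is a rough Python sketch of how a consumer might decide which URL a license claim is about. It assumes the claim is a plain `<a rel="license" href="...">` link, and the BY-SA 3.0 URL, the `LicenseClaimParser` class, and the `find_claims` helper are all illustrative names of mine. It is not a conforming RDFa processor: it only looks for the nearest enclosing `about` attribute and expects well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are kept off the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LicenseClaimParser(HTMLParser):
    """Collect (subject URL, license URL) pairs from rel="license" links."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.about_stack = []  # one entry per open element: its "about" or None
        self.claims = []

    def _enclosing_subject(self):
        # The nearest enclosing "about" wins; otherwise the claim is
        # about the page the HTML is embedded in.
        for about in reversed(self.about_stack):
            if about is not None:
                return about
        return self.page_url

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        about = attrs.get("about")
        rel_values = (attrs.get("rel") or "").split()
        if "license" in rel_values and attrs.get("href"):
            subject = about or self._enclosing_subject()
            self.claims.append((subject, attrs["href"]))
        if tag not in VOID_TAGS:
            self.about_stack.append(about)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: every non-void start tag is closed.
        if tag not in VOID_TAGS and self.about_stack:
            self.about_stack.pop()


def find_claims(html_text, page_url):
    """Return the license claims found in html_text, served at page_url."""
    parser = LicenseClaimParser(page_url)
    parser.feed(html_text)
    parser.close()
    return parser.claims


# Assumed shape of the license claim; the real chooser output is longer.
CLAIM = ('<a rel="license" '
         'href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA</a>')

PLAIN = ("<html><head><title></title></head><body>"
         + CLAIM + "</body></html>")
MODIFIED = ('<html><head><title></title></head><body>'
            '<div about="http://example.com/my.mp3">'
            + CLAIM + "</div></body></html>")

print(find_claims(PLAIN, "http://example.com"))
print(find_claims(MODIFIED, "http://example.com"))
```

With the plain page the claim is attributed to http://example.com itself; with the wrapping `div` it is attributed to http://example.com/my.mp3.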
    

This is a new way to publish a license claim for a media file. The existing way is to embed the claim in the file itself using a tool like liblicense. You would use the new method when its benefits and drawbacks are a better match for your needs.

Pros of embedding within media files:

  1. A license claim inside a file travels with the file, so that the license claims on the copy are still identifiable. If you use the external HTML method, the only way to tell that a copy at a different URL is under the same license is to do a byte-for-byte comparison of the files.
  2. A license claim inside a media file is instantly accessible to any program which is already accessing the file and only slightly less accessible to a program which already has a copy of the file. A license claim in external HTML requires the HTML page to be found, fetched, and parsed.
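
The byte-for-byte comparison mentioned in the first point could look roughly like this in Python. The `is_identical_copy` and `write_temp` names and the fake file contents are mine, purely for illustration. Note how a copy that has merely been retagged no longer matches, which is exactly the weakness of the external HTML method.

```python
import filecmp
import os
import tempfile


def is_identical_copy(original_path, copy_path):
    """True only if both files hold exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(original_path, copy_path, shallow=False)


def write_temp(data):
    """Hypothetical helper for the demo: write bytes to a temporary file."""
    handle = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
    with handle:
        handle.write(data)
    return handle.name


# Stand-ins for a licensed original, a mirror, and a retagged copy.
original = write_temp(b"fake mp3 audio bytes")
mirror = write_temp(b"fake mp3 audio bytes")
retagged = write_temp(b"ID3 new tag" + b"fake mp3 audio bytes")

print(is_identical_copy(original, mirror))    # True
print(is_identical_copy(original, retagged))  # False

for path in (original, mirror, retagged):
    os.remove(path)
```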

Pros of using an external HTML file:

  1. A license claim embedded in a media file can only be recognized by fetching the file and parsing it. AJAX techniques usually can’t be used to parse a binary file. Bandwidth and latency limits may also prevent this. In contrast, an HTML file can be parsed by JavaScript, and is often small enough that bandwidth and latency are not a problem.
  2. A license claim inside a media file is hard for web spiders to see, and most search engines won’t index it. In contrast, a license claim in HTML is easy for a spider to see, and search engines index HTML as a matter of course.
  3. A license claim inside a media file requires a dedicated program like liblicense on the client side to edit. A license claim in HTML can be generated using a simple web application like the license chooser at CreativeCommons.org, and any decent content management system (like Drupal or WordPress) could easily do it.
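
As a sketch of how little a content management system would need to do for the third point, the following Python function emits a license claim scoped to a media URL. The function name, the link-only shape of the claim, and the BY-SA 3.0 URL are assumptions on my part; the real license chooser produces richer markup.

```python
from html import escape


def license_claim_html(media_url, license_url, license_name):
    """Build a license claim that applies to media_url, not the host page."""
    # Escape everything that ends up in markup so that odd URLs or names
    # cannot break out of the attribute or the link text.
    return (
        '<div about="{about}">\n'
        '  <a rel="license" href="{href}">{name}</a>\n'
        '</div>'
    ).format(
        about=escape(media_url, quote=True),
        href=escape(license_url, quote=True),
        name=escape(license_name),
    )


print(license_claim_html(
    "http://example.com/my.mp3",
    "http://creativecommons.org/licenses/by-sa/3.0/",
    "Creative Commons Attribution-ShareAlike 3.0",
))
```

A plugin would call something like this once per media file and drop the result into the page template.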

You don’t have to choose between these methods. Used together, they would give you the good parts of both.

As with all implementation proposals, this method may not work. It may be that the RDFa “about” attribute isn’t widely available enough, given that it is specific to XHTML 2 as far as I know. It may be that the rel-license microformat can’t be extended like this.

There’s one improvement to this method that I don’t know how to do — making it work in existing search engines with no changes on their part. If it’s possible to tweak the HTML syntax so that existing search APIs or query arguments could be used to find Creative Commons works, the entire open media ecosystem would benefit.

8 thoughts on “license claims in HTML”

  1. Hello Lucas!

    Another advantage of using RDFa is that you can mix-and-mash other information alongside licensing information – for example, you could state a list of contributors for that particular media file, or anything else that terms in DC, Music Ontology, FOAF, etc. allow you to express.

    I guess (I don’t really know as I didn’t have time to play with it yet) that SearchMonkey can help to build search applications on top of such RDFa – such as finding works by a specific person that have a particular license.

    Cheers!
    u

  2. Yves, I really like those mix and mash abilities, though I’m also reluctant to get pulled into the religious issues between the microformats.org and semweb folks. I’ll do whatever works after all these years of back and forth.

    SearchMonkey isn’t exactly what you’re thinking. It’s a way for sites to customize their listings in a search result.

    jon, that’s a great example of how easy it is to work with embedded HTML. A WordPress plugin to use that widget would take a couple days at the most.

  3. IIRC, the W3C says it’s OK to use RDFa in XHTML 1 or 1.1, although it doesn’t validate (yet).

    RDFa’s “about” attribute is very handy. It also allows one to have a single license page that points to mp3s displayed on many other pages / sites.

    Commonly, I’d put in my mp3s a URL that points to a singular license page (e.g., example.com/license.html), and then, with RDFa, I can have that page point to all of my mp3 URLs that are spread across many download pages.

  4. I think it’s not invalid in HTML 4 because of the weird loophole where you can always add an attribute to an HTML element.

    I have mixed feelings about having this important info in an attribute rather than as element content, but users can’t see it and it seems like it would be valuable to them. But I guess you can just put it in the document for users as a separate thing that isn’t explicitly machine readable.

  5. The idea that the page *is* the database has some conveniences, but is often awkward. There ultimately is no perfect way to create a single statement that does it all–you often do have to repeat yourself, e.g., having a version that is really clearly human readable, and another that is really clearly machine readable.

    This used to be the recommendation with CC–in your HTML, you’d put a human readable communication in visible text, and then also a separate machine readable chunk of RDF/XML hidden in the comments. But, without an easy tool to generate AND maintain the appropriate code, it’s not really viable to maintain over time.

    If you look at BMI as an example, they have a web form for entering songs in their rights database. One way to think about it is: the real data standard for the web is web forms. IMHO, the form is more important than the format.
