MAFF day

The conversation about content packaging technologies turned towards the Firefox MAFF format:

MAFF files are standard ZIP files containing one or more web pages, images, or other downloadable content. Additional information, like the original page address, is saved along with the content; this metadata allows the browser to open these files intelligently.

…so Paolo, the maintainer of the related code in Firefox, stopped in to give an overview of the work:

The only strict design goals for MAFF are (1) to be very easy to use and implement, and (2) to be based on existing and widely-used technologies. Backwards compatibility with the existing implementation is also very important.

At present, MAFF archives can be multi-page, but the basic atom is the “page”. Each “page” may be updated or used independently from the others.

Optional metadata (currently stored as RDF/XML) may link each “page” in the archive to an original location (URL) that can be queried for updated versions of the entire “page”, if desired. The root document may link to other resources in the archived “page”, through relative URLs, or to remote URLs, indifferently. All the local resources are “owned” exclusively by the “page”.
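To make the page-plus-metadata idea concrete, here is a rough Python sketch that builds a tiny two-page archive in memory and reads back each page's original location. The layout (one top-level folder per page, an index.rdf file in each folder), the MAF namespace URI, the originalurl and indexfilename property names, and the example.com address are all my assumptions for illustration, not something Paolo's description spells out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MAF_NS = "http://maf.mozdev.org/metadata/rdf#"  # assumed namespace URI

# Hypothetical metadata for one page; property names are assumptions.
SAMPLE_RDF = f"""<?xml version="1.0"?>
<RDF:RDF xmlns:MAF="{MAF_NS}" xmlns:RDF="{RDF_NS}">
  <RDF:Description RDF:about="urn:root">
    <MAF:originalurl RDF:resource="http://www.example.com/article.html"/>
    <MAF:indexfilename RDF:resource="index.html"/>
  </RDF:Description>
</RDF:RDF>
"""


def build_sample_archive() -> bytes:
    """Build a two-page archive: page1 has metadata, page2 is standalone."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("page1/index.html", "<html><body><img src='logo.png'></body></html>")
        zf.writestr("page1/logo.png", b"\x89PNG placeholder")
        zf.writestr("page1/index.rdf", SAMPLE_RDF)
        zf.writestr("page2/index.html", "<html><body>standalone</body></html>")
    return buf.getvalue()


def original_locations(archive_bytes: bytes) -> dict:
    """Map each top-level page folder to its original URL, or None if absent."""
    pages = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = zf.namelist()
        for name in names:
            pages.setdefault(name.split("/", 1)[0], None)
        for folder in pages:
            rdf_name = f"{folder}/index.rdf"
            if rdf_name not in names:
                continue  # no metadata: treat the page as authored standalone
            root = ET.fromstring(zf.read(rdf_name))
            node = root.find(f".//{{{MAF_NS}}}originalurl")
            if node is not None:
                pages[folder] = node.get(f"{{{RDF_NS}}}resource")
    return pages
```

Calling original_locations(build_sample_archive()) gives one entry per page folder, with None standing in for a page that has no original location.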

If no original location is specified, it currently just means that the file was authored as a standalone package to begin with.

The MAF extension does not support remote MAFF files, at present, but Firefox can access any remote ZIP file using the “jar:” protocol. In this case, I think the caching headers of the MAFF/ZIP file itself are used to check if updated versions of the remote archive exist. That’s different from using the original location in the archive’s metadata to check for updated versions of a “page”.

Something about the ability to index into the zip file using the jar scheme is inspiring.

Putting on my standards weenie hat, there’s a spec design problem in that the path after the “!/” should really be a #fragment, with treatment of the fragment defined in the media type definition for ZIP. But that’s just a nit for now.

What inspires me is that you can make the issue of your starting file within the larger archive a moot point. Without that approach, there is a need to write a protocol which defines a canonical starting file when you get the archive (e.g. “/index.html”). Using the jar scheme, you can just specify the starting file in the URL, like this: jar:http://www.foo.com/bar/baz.jar!/index.html. So there is no need for a new standard.
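As a small illustration of how little machinery this takes, here is a Python sketch that builds and splits a jar: URL. The function names are mine, and splitting on the first “!/” is a simplifying assumption; it is not meant to mirror how Firefox or Java actually parse these URLs.

```python
from typing import Tuple


def make_jar_url(archive_url: str, entry_path: str) -> str:
    """Point at one entry inside a remote archive: jar:<archive>!/<entry>."""
    return f"jar:{archive_url}!/{entry_path.lstrip('/')}"


def split_jar_url(jar_url: str) -> Tuple[str, str]:
    """Return (archive_url, entry_path) for a jar: URL."""
    if not jar_url.startswith("jar:"):
        raise ValueError("not a jar: URL")
    # Assumption: the first "!/" separates the archive from the entry.
    archive_url, sep, entry = jar_url[len("jar:"):].partition("!/")
    if not sep:
        raise ValueError("missing '!/' separator")
    return archive_url, entry
```

With this, the starting file is just whatever the link author put after the separator, so the archive itself never has to declare one.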

Another interesting issue here is that OS X comes with Java, which comes with a utility for managing jar files, and that utility can be used as a wedge for new code.

This parenthetical bit from Paolo's comment also caught my eye:

The latest MAF 0.15.1 provides an interesting feature that’s peculiar to ZIP compression (compared, for instance, to .tar.gz of the KDE war format): OGG files are stored without re-compressing them.

Thanks to this, audio and video in a large MAFF+OGG file are seekable in real-time exactly like a standalone OGG file.

The back story is that putting media into a zip file usually prevents streaming access. You have to download and unpack the whole thing before you can hear the first byte of an MP3 within it. But apparently that limitation goes away if the media file is *OGG*, because MAF stores OGG files in the archive without re-compressing them. That's a big advantage to Ogg in comparison to MP3. And given that Firefox has native support for both MAFF and Ogg, it would be easy to start using this feature.
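Here is a rough Python sketch of why an entry stored without compression stays seekable: its bytes sit verbatim inside the archive, so byte N of the media is at a fixed offset from the start of the entry's data. The file names and helper functions are hypothetical, the payload is a stand-in rather than real Ogg data, and this is only my reading of the mechanism, not MAF's actual code.

```python
import io
import struct
import zipfile


def build_archive(media: bytes) -> bytes:
    """Store the media entry uncompressed; deflate the HTML as usual."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("page1/clip.ogg", media, compress_type=zipfile.ZIP_STORED)
        zf.writestr("page1/index.html", "<html>...</html>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def stored_data_offset(archive: bytes, name: str) -> int:
    """Offset in the archive where a stored entry's raw bytes begin."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("entry is compressed; offsets do not map 1:1")
    # The local file header is 30 fixed bytes; the name and extra-field
    # lengths live at byte offsets 26 and 28 within it.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    return info.header_offset + 30 + name_len + extra_len


def read_range(archive: bytes, name: str, start: int, length: int) -> bytes:
    """Read a slice of a stored entry without unpacking anything."""
    base = stored_data_offset(archive, name)
    return archive[base + start: base + start + length]
```

For a remote file, the same arithmetic would turn a seek within the media into a byte-range request against the archive, which is not possible for a deflated entry.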

Kudos to Sull for developing this larger vision of MAFF as key prior art and then pinging Paolo.