MP3 files can contain text, of course, and I’ve occasionally found lyrics stored inside TEXT and USLT frames. But there’s no consistency at all, and there probably never will be; you’re more likely to find spam inside a TEXT frame.
Your idea for linking to time points is a cool notion, Lucas. Related to this, Real’s servers provide for a “start” parameter on a/v URIs, allowing one to jump to a time point, e.g.
Some of the various SMIL specs provide begin and end params for the same purpose (http://is.gd/5I3jL). Aside from that and Real’s faded format, my hunch is that most a/v is not very content-addressable, partly due to the fact that a given song can be found in the wild with many encoding variations. If I make in/out time points for lyrics on my rip of a CD track, your rip might not sync with it. Also, radio vs. album versions of a song may vary in duration and content.
Event-based synchronization, i.e. the beat-counting idea Piers brings up, might be worth looking into:
<a href="example.mp3#t=1017b,1683b" class="chorus">chorus</a>
This would need a filter to recognize beats and count them. Possible, just not as simple as time. Might be more consistent than seconds-based.
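To make the counting half of that concrete, here is a rough Python sketch. It assumes the made-up "b" suffix syntax from the link above (it is not part of any spec), and it assumes some beat detector, not shown here, has already produced a list of beat onset times in seconds for this particular file. The function names are invented for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical syntax from the link above: t=<start>b,<end>b
_BEAT_RANGE = re.compile(r"^t=(\d+)b,(\d+)b$")


def parse_beat_fragment(url):
    """Return (start_beat, end_beat) from the URL fragment, or None."""
    match = _BEAT_RANGE.match(urlsplit(url).fragment)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return (start, end) if start <= end else None


def beat_range_to_seconds(beat_times, start_beat, end_beat):
    """Look up onset times (in seconds) for 1-based beat numbers.

    beat_times is assumed to come from a beat detector run on this
    specific rip; producing it is the hard part and is not shown.
    """
    if start_beat < 1 or end_beat > len(beat_times):
        raise ValueError("beat range falls outside the detected beats")
    return beat_times[start_beat - 1], beat_times[end_beat - 1]
```

The point of splitting it this way is that the link only ever carries beat numbers, while the seconds are worked out per file, so two rips with different lead-in silence would still resolve to the same musical spot.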
Perhaps there’s another type of common event found in audio streams that could provide consistency, but I like drum beats because they’re less likely to get corrupted or folded than high frequencies, and less common than human voice-range freqs.
The karaoke industry seems to have cracked this nut, but I’m gonna hazard a guess that it’s all proprietary.
These guys sell player software that, they claim, syncs lyrics for 1 million songs: http://is.gd/5I48w. They appear to target music teachers in their marketing.
When you think about it, a technological component in a media player can auto-magically beat-sync two tracks by comparing basic structure and determining BPM. Word documents used to be the bane of the structured data movement, because they trapped content in a non-structured format, but ODF and OOXML have changed that game completely, creating a new class of semi-structured data; so why not music or video?
It’s fascinating to consider that if more artists released works under CC-NC by attribution, remix artists could provide additional value by micro-tagging individual samples within the deeper structure of their compositions – particularly if this functionality were baked into the software used to assemble the composition.
In addition, isn’t the original theory behind Pandora based on linking chord progressions and such, or is it more general? I never really got a bead on what Pandora was actually doing.
It would be utterly amazing to link into music files based on high level concepts like “the 23rd through 27th beats”, “the Doobie Brothers sample”, “the I-VI-II-V section”.
I suppose you could do it in two parts. One, you’d have a semantic map of a song that was something like sheet music but much richer. It would be able to express things like “this part is a Doobie Brothers sample.” Two, you’d have a piece of software that applied the map to a particular rip or encoding of the song, so that the map would be applicable to all the different rips/encodings.
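A toy version of that two-part split might look like the Python below. Everything in it is an assumption made for illustration: the map stores sections as beat ranges, each rip is described by just two numbers (where beat 1 lands and the tempo), and the tempo is taken to be constant, which real recordings often are not. The class names, the beat numbers for the sample, and the rip values are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One entry in the (hypothetical) semantic map, measured in beats."""
    label: str
    first_beat: int  # 1-based, inclusive
    last_beat: int   # 1-based, inclusive


@dataclass(frozen=True)
class Rip:
    """What we would need to know about one particular encoding."""
    first_beat_at: float  # seconds from start of file to beat 1
    bpm: float            # assumed constant for the whole track


def locate(section, rip):
    """Apply the map to a rip: return (start, end) in seconds."""
    seconds_per_beat = 60.0 / rip.bpm
    start = rip.first_beat_at + (section.first_beat - 1) * seconds_per_beat
    end = rip.first_beat_at + section.last_beat * seconds_per_beat
    return start, end


# Part one: the map, shared by everyone (beat numbers are made up).
song_map = [
    Section("the 23rd through 27th beats", 23, 27),
    Section("the Doobie Brothers sample", 65, 96),
]

# Part two: per-rip details, different for each encoding.
my_rip = Rip(first_beat_at=0.5, bpm=120.0)
your_rip = Rip(first_beat_at=2.0, bpm=120.0)
```

With these numbers, `locate(song_map[0], my_rip)` gives `(11.5, 14.0)` and the same section on `your_rip` gives `(13.0, 15.5)`: one map, two different sets of timestamps.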
Back in the days of Real-hacking that Kev alludes to, there were experiments with mixing multiple web-accessible MP3s on the fly. For example, I found a spoken word MP3 of a sermon and put it in parallel with an instrumental DJ track. Our jargon was “client side remix.” Anyhow I did do a few experiments with indexing into MP3 files using time ranges, so that you’d be plucking out just the chorus or guitar solo or whatever. The software I tried (Real and Quicktime) was too imprecise to make this work very well. But the technique was a lot of fun.
Sorry, I realize that this post is absurdly full of jargon and shorthand. Back story:
Kev, some pals, and I once did a bunch of hacks using SMIL and RAM playlists.
There is a new standard for linking into multimedia files, called “Media Fragments URI 1.0”. It is still in progress.
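For a feel of what a time-based link looks like under that draft, here is a small Python sketch that reads the temporal part of a fragment such as `#t=10,20`. It covers only a simplified subset as I understand it (plain seconds, mm:ss or hh:mm:ss, with an optional `npt:` prefix) and should not be read as a conforming parser. The function names are my own, and taking the last `t` when several appear is a choice made for this sketch.

```python
from urllib.parse import urlsplit, parse_qs


def _npt_to_seconds(text):
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' (fractions allowed) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_temporal_fragment(url):
    """Return (start, end) in seconds, or None if there is no t= fragment.

    A missing start is treated as 0.0; a missing end is returned as None,
    meaning "play to the end of the file".
    """
    fragment = urlsplit(url).fragment
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    value = values[-1]  # sketch's choice: the last t= wins
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = _npt_to_seconds(start_text) if start_text else 0.0
    end = _npt_to_seconds(end_text) if end_text else None
    return start, end
```

So `parse_temporal_fragment("example.mp3#t=10,20")` returns `(10.0, 20.0)`, and a player that understood the fragment could jump straight to that range.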
jwheare recently posted a vision about making music on the web more webby.