MP3 files can contain text, of course, and I’ve occasionally found lyrics stored inside TEXT and USLT frames. But there’s no consistency at all, probably never will be – more likely to find spam inside a TEXT frame.
Your idea for linking to time points is a cool notion, Lucas. Related to this, Real’s servers provide for a “start” parameter on a/v URIs, allowing one to jump to a time point, e.g.
Some of the various SMIL specs provide begin and end params for the same purpose (http://is.gd/5I3jL). Aside from that and Real’s faded format, my hunch is that most a/v is not very content-addressable, partly due to the fact that a given song can be found in the wild with many encoding variations. If I make in/out time points for lyrics on my rip of a CD track, your rip might not sync with it. Also, radio vs. album versions of a song may vary in duration and content.
Event-based synchronization, i.e. the beat-counting idea Piers brings up, might be worth looking into-
<a href=”example.mp3#t=1017b,1683b” class=”chorus”>chorus</a>
This would need a filter to recognize beats and count them. Possible, just not as simple as time. Might be more consistent than seconds-based.
Perhaps there’s another type of common event found in audio streams that could provide consistency, but I like drum beats because they’re less likely to get corrupted or folded than high frequencies, and less common than human voice-range freqs.
The karaoke industry seems to have cracked this nut, but I’m gonna hazard a guess that it’s all proprietary.
These guys sell player sw that syncs lyrics for 1 million songs, they claim:
http://is.gd/5I48w. They appear to target music teachers in their marketing.
When you think about it, a technological component in a media player can auto-magically beat-sync two tracks by comparing basic structure and determining BPM. Word documents used to be the bane of the structured data movement, because they trapped content in a non-structured format, but ODF and OOXML have changed that game completely, creating a new class of semi-structured data; so why not music or video?
It’s fascinating to consider that if more artists released works under CC-NC by attribution, remix artists could provide additional value by micro-tagging individual samples within the deeper structure of their compositions – particularly if this functionality were baked into the software used to assemble the composition.
It seems to me that the magic is to find a way to refer to semantic elements of a song, both of which are abstract things that only loosely map to specific bytes in a specific file.