I think I agree with Crosbie: why not make the web page the home of the track, one that happens to have many different representations attached to it (audio, video, score, etc.)?

You’d be playing a playlist of HTML pages with linked audio, rather than a playlist of audio files with linked HTML.
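
To make that concrete, here is a rough sketch of how a player might resolve a page-based playlist entry to something it can play. It assumes, purely for illustration, that a track page advertises its representations with `<link rel="alternate" type="..." href="...">` elements in its head; nothing above says pages would be marked up this way. The names `TrackPageParser`, `audio_for_page`, and the example.com URLs are all made up.

```python
from html.parser import HTMLParser


class TrackPageParser(HTMLParser):
    """Collect <link rel="alternate"> representations from a track page.

    Assumption: the page lists each representation (audio, video, score)
    as a <link rel="alternate"> with a MIME type and an href.
    """

    def __init__(self):
        super().__init__()
        self.representations = {}  # MIME type -> URL (first one seen wins)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("type") and attr.get("href"):
            self.representations.setdefault(attr["type"], attr["href"])


def audio_for_page(html_text):
    """Return the URL of the first audio representation on the page, or None."""
    parser = TrackPageParser()
    parser.feed(html_text)
    for mime, href in parser.representations.items():
        if mime.startswith("audio/"):
            return href
    return None


# Hypothetical track page; a real player would fetch each page URL in the playlist.
SAMPLE_PAGE = """
<html><head>
  <title>Example Track</title>
  <link rel="alternate" type="audio/mpeg" href="https://example.com/track.mp3">
  <link rel="alternate" type="video/mp4" href="https://example.com/track.mp4">
  <link rel="alternate" type="application/pdf" href="https://example.com/score.pdf">
</head><body></body></html>
"""

# The playlist is a list of pages; the audio is looked up from each page.
playlist = [SAMPLE_PAGE]
audio_urls = [audio_for_page(page) for page in playlist]
```

The point of the sketch is that the playlist only ever names pages; a video player or a score viewer could walk the same playlist and pick a different representation from each page.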