For Music Hack Day in San Francisco this past weekend I did a hack related to my blog post on “hyperaudio notation”. My idea was to caption a recorded song using music notation, as an instantiation of ideas like hypervideo, hyperaudio, popcorn.js, and WebVTT.
There is a recording and a score. The recording is an MP3, the score is a PNG. The purpose of the system is to move a highlight through the score in sync with the MP3, so that the listener can see which part of the notation in the image is currently being played. It’s like text captions for a person talking.
I could have designed it to show just a portion of the overall score, but showing the entire image with a moving highlight was easier.
To move the highlight in sync with the music, you train it. Pressing a button marked “start recording” begins a training run: the music starts playing, and a recorder starts listening for clicks within the image. Each time you click in the image, the current playback time and the click location are recorded. The trainer clicks in the image in sync with the music: when the first bar is played, click on the first bar in the image. Continue until you have provided music captions for as much of the song as you want. Then press “stop recording.”
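Here is a minimal sketch of what that training step might look like in the browser, written in TypeScript; the element IDs and the Caption shape are illustrative, not the actual demo code.

```typescript
// Hypothetical training-run capture: pair each click in the score image
// with the current playback time of the MP3. Element IDs are made up.
const audio = document.querySelector("#recording") as HTMLAudioElement;
const score = document.querySelector("#score") as HTMLImageElement;

interface Caption {
  time: number; // seconds into the recording when the click happened
  x: number;    // click position within the score image, in pixels
  y: number;
}

const captions: Caption[] = [];

function startRecording(): void {
  captions.length = 0;   // discard any earlier training run
  audio.currentTime = 0;
  audio.play();
}

// While the music plays, every click in the image becomes a caption.
score.addEventListener("click", (e: MouseEvent) => {
  const rect = score.getBoundingClientRect();
  captions.push({
    time: audio.currentTime,
    x: e.clientX - rect.left,
    y: e.clientY - rect.top,
  });
});

function stopRecording(): void {
  audio.pause();
}
```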
At this point, press the “play recording” button to rerun the training session.
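A companion sketch of playback, reusing `audio` and `captions` from the training sketch above and assuming an absolutely positioned highlight element layered over the image:

```typescript
// Hypothetical playback: on every timeupdate, move the highlight to the
// most recent caption whose time has already passed.
const highlight = document.querySelector("#highlight") as HTMLElement;

function playRecording(): void {
  audio.currentTime = 0;
  audio.play();
}

audio.addEventListener("timeupdate", () => {
  const t = audio.currentTime;
  // Captions were recorded in playback order, so the last one at or
  // before t marks the bar currently being played.
  const current = captions.filter((c) => c.time <= t).pop();
  if (current) {
    highlight.style.left = `${current.x}px`;
    highlight.style.top = `${current.y}px`;
  }
});
```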
The vision is that the training would be done by the person publishing the page, and visitors would just use the “play recording” button.
To see it in action, go to the live demo code or view a screencast I made. (The live code was super quick and dirty and assumes that you have exactly the same everything I do, including browser, bandwidth, etc. Chances that it will actually work are slim.)
I’m presuming this is a very quick pre-proof-of-concept idea: The synchronization offset between when beat one comes up aurally and when the green bar finally gets displayed is very disconcerting.
However, at the risk of completely missing the point, this feels like an extremely small niche in this presented format, and the problem has really already been solved: a more useful and general-purpose manifestation would be a time-based-event concept that is already pretty ubiquitous. That could easily be adapted to a concept such as this by knowing the tempo and transforming measures to time code, etc. (as any MIDI/recording software does already).
I’m thinking of SoundCloud annotations as an audio-file representation of this concept, which means you just replace the waveform display with a sheet-music display and voilà: this proof of concept.
Ok, so what am I missing?
What you’re missing is (1) that the approaches you’re thinking of are dramatically more labor for every song, and (2) that the approach I’m taking can be incrementally improved to do quite a lot.
On (1), knowing the tempo and transforming the measures requires a complete and correct score, including any tempo variability like fermata, as well as all the details of the arrangement. Questions like “how many bars do they vamp for that voiceover segment?” and “how do I notate the dubstep breakdown?” take a lot of care and time.
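To make that concrete, here is the measure-to-timecode arithmetic the comment describes, which only holds under the assumption of one fixed tempo and meter for the whole song (the constants are invented for the example):

```typescript
// Naive measure-to-timecode mapping, valid only for a steady tempo and meter.
const bpm = 120;           // assumed steady tempo, beats per minute
const beatsPerMeasure = 4; // assumed 4/4 throughout

// Start time, in seconds, of a 1-indexed measure.
function measureToSeconds(measure: number): number {
  return (measure - 1) * beatsPerMeasure * (60 / bpm);
}

// measureToSeconds(9) === 16: bar 9 starts 16 seconds in.
// A fermata on bar 3, or a vamp of unknown length, shifts every value
// after it; that bookkeeping is where the per-song labor goes.
```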
The simplified chart that I did took about three sittings; the extended version would take five or six. That’s for a relatively simple tune. For a more complex arrangement, like typical prog metal, it would take forever, while my simplified version would continue to take about the same two or three sittings.
On (2), I agree that the sync sloppiness is super disconcerting. It wouldn’t be hard to fix. I could also easily add all kinds of other objects that you click on in time with the original recording: animated GIFs, guitar tab, video of a conductor’s hands.
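For the sync issue, one easy fix (a sketch only, not what the demo currently does) is to shift each recorded click time earlier by a fixed latency estimate, since the trainer’s click always lags the beat slightly. This reuses the Caption shape from the earlier sketch:

```typescript
// Hypothetical latency compensation: shift captions earlier by a rough
// estimate of reaction time plus render delay. 200 ms is a guess, not measured.
const latencyEstimate = 0.2; // seconds

function correctedCaptions(raw: Caption[]): Caption[] {
  return raw.map((c) => ({ ...c, time: Math.max(0, c.time - latencyEstimate) }));
}
```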
So this hack is about finding a really feasible method to annotate music.