I’d think of it as a caching thing.
You have an audio resource at some URL. It’s just PCM bytes, no metadata; a WAV file would match this description. But I’m thinking about next-generation audio formats that have real compression, can be fetched cross-origin via CORS, and can be fed into web audio APIs.
For the most part the file would be kept with its URL. The bytes and the URL that serves them wouldn’t be decoupled in a way that lets one travel without the other. Since the client has the URL, and the URL is enough to get a metadata file, the metadata and the audio do travel together.
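To make "the URL is enough to get a metadata file" concrete, here is a minimal sketch. It assumes a purely hypothetical convention, not anything defined in this post, where the metadata lives at the audio URL with `.meta.json` appended to the path. The suffix and the function name are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical convention: metadata sits next to the audio file,
# at the same path plus this suffix. Not part of any standard.
METADATA_SUFFIX = ".meta.json"


def metadata_url_for(audio_url: str) -> str:
    """Derive the (hypothetical) metadata URL from an audio URL.

    Only the path is changed; scheme, host, query and fragment are kept,
    so the metadata request goes to the same origin as the audio.
    """
    parts = urlsplit(audio_url)
    return urlunsplit(parts._replace(path=parts.path + METADATA_SUFFIX))


if __name__ == "__main__":
    print(metadata_url_for("https://example.com/track.opus?v=2"))
```

Keeping the metadata on the same origin means a page that is already allowed to fetch the audio via CORS would, under the same server configuration, likely be able to fetch the metadata too.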
So let’s say some client chooses to save the audio for offline use. At that point it’s going to do a lot of reformatting; probably it will convert the bytes to a plain old MP3. It can then use the ID3 WOAF frame to hold the URL of the metadata file, if it wants to, or just copy the metadata itself into ID3 frames and throw out the metadata file. It’s like when your browser saves a page to disk and modifies it to include copies of the CSS, JavaScript, and images.
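As a rough illustration of the WOAF option, the sketch below builds a minimal ID3v2.3 tag containing a single WOAF frame (officially "Official audio file webpage") and prepends it to MP3 bytes. It follows the ID3v2.3 layout as I understand it: a 10-byte tag header with a syncsafe size, then a frame with a 4-byte ID, a plain big-endian size, 2 flag bytes, and the URL as ISO-8859-1 text. It assumes the MP3 has no existing ID3v2 tag; the function names are hypothetical.

```python
import struct


def syncsafe(n: int) -> bytes:
    """Encode n as a 4-byte ID3v2 syncsafe integer (7 bits per byte)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("size out of range for a syncsafe integer")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def woaf_frame(url: str) -> bytes:
    """Build an ID3v2.3 WOAF frame whose body is the URL in ISO-8859-1."""
    body = url.encode("latin-1")
    # Frame header: ID, size (plain big-endian in v2.3), two zero flag bytes.
    return b"WOAF" + struct.pack(">I", len(body)) + b"\x00\x00" + body


def id3v23_tag(frames: list) -> bytes:
    """Wrap frames in an ID3v2.3 tag header (version 3.0, no flags)."""
    payload = b"".join(frames)
    # The tag size excludes the 10-byte header and is stored syncsafe.
    return b"ID3" + bytes([3, 0]) + b"\x00" + syncsafe(len(payload)) + payload


def prepend_metadata_link(mp3_bytes: bytes, metadata_url: str) -> bytes:
    """Return the MP3 bytes with a new tag pointing at the metadata URL.

    Assumes mp3_bytes does not already start with an ID3v2 tag.
    """
    return id3v23_tag([woaf_frame(metadata_url)]) + mp3_bytes
```

Copying the metadata itself instead would mean emitting text frames such as `TIT2` or `TPE1` alongside, or instead of, the WOAF frame.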
I don’t see offline use as important. That whole way of approaching music is going away fast. The Web Audio API is the new standard. Developers writing JavaScript to feed into that API will pick up the audio file in whatever format pleases them; there’s no need to read from a standard format like MP3.
The concept for metadata in this post is a sibling to the Web Audio API: a web metadata API.