Music File Management

brian · January 27, 2018, 7:47pm

Our users have been pointing at Discogs as a silver bullet since day one. We’ve done a fair bit of investigation. While we will probably do something with it in the future, if we thought it was going to cure significant problems, it would have been integrated a long time ago.

Remember–the purpose of identifying content is so that we can enhance it with additional metadata or link it to things, not just identification for the sake of identification. So when we look at a data set, we are looking for coverage of a lot of content, and rich data–Discogs doesn’t really succeed at either.

Their data model is sparse. No reviews/biographies. No track-level credits. No composer/composition/performance structure. No regard for classical-specific concerns. The database is a fraction of the size of Rovi’s or TIDAL’s or Gracenote’s, so coverage is pretty poor too.

If we were to build a hypothetical Roon experience on top of Discogs data alone, it would not feel like Roon–too much would be missing.

We have a project underway now that is doing some work with Discogs data, and this is the primary benefit that we’ve found: Discogs is a good source for information about the various releases/tracklistings of albums. These could help us link more esoteric releases with data from richer data sources. That’s a nice-to-have, but not something that will change the fundamental equation.
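
To make the linking idea concrete, here is a rough sketch of how a tracklisting from one source might be matched against releases in a richer source. This is not how Roon does it; the function names (`normalize`, `tracklist_similarity`, `best_match`), the 0.8 threshold, and the sample release ids are all made up for illustration, and the fuzzy matching is just Python's standard `difflib`.

```python
from difflib import SequenceMatcher
import re


def normalize(title: str) -> str:
    """Lowercase a track title, drop parenthetical suffixes and punctuation."""
    title = re.sub(r"\([^)]*\)", "", title.lower())
    return re.sub(r"[^a-z0-9 ]+", "", title).strip()


def tracklist_similarity(a: list[str], b: list[str]) -> float:
    """Average per-position title similarity, scaled down when track counts differ."""
    if not a or not b:
        return 0.0
    pairs = zip(map(normalize, a), map(normalize, b))
    total = sum(SequenceMatcher(None, x, y).ratio() for x, y in pairs)
    # Dividing by the longer length penalizes missing or extra tracks.
    return total / max(len(a), len(b))


def best_match(tracks: list[str], candidates: dict[str, list[str]], threshold: float = 0.8):
    """Return the id of the best-scoring candidate, or None if nothing clears the threshold."""
    scored = [
        (tracklist_similarity(tracks, cand_tracks), cand_id)
        for cand_id, cand_tracks in candidates.items()
    ]
    score, cand_id = max(scored, default=(0.0, None))
    return cand_id if score >= threshold else None


# Hypothetical data: a sparse tracklisting and two candidates from a richer source.
sparse_release = ["So What", "Freddie Freeloader", "Blue in Green"]
rich_candidates = {
    "rich-1": ["So What (Remastered)", "Freddie Freeloader", "Blue In Green"],
    "rich-2": ["Round Midnight", "Ah-Leu-Cha"],
}
match = best_match(sparse_release, rich_candidates)
```

A real matcher would also need to deal with reordered tracks, multi-disc sets, and durations; this only shows the basic shape of the problem.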

MusicBrainz is crowdsourced like Discogs, but they did a far better job on their data schema/design. Unfortunately, the community is not as vibrant. The data sets are similar in size.

Yes. There have been businesses like that (AllMusicGuide…not thousands, but they’ve built an impressive amount of data by hand). Pandora also has some of that in their approach…there are some others.

Volunteers don’t work. Discogs/MusicBrainz are small and not very rich. That’s what volunteers create.

No one on earth is willing to pay for thousands of people in rooms scrubbing music data.

I agree. Scrubbing metadata is not the interesting problem for crowdsourcing to solve. It’s too hard to get a bunch of people to agree on how to make cleaner metadata for 100 million tracks in the global library. Crowdsourcing has to be more finely targeted.

If anything, Discogs/MusicBrainz demonstrate to me that crowdsourcing music metadata does not scale. The largest and richest data sources are commercial and not crowdsourced.

That doesn’t mean that crowdsourcing doesn’t have a place. There are lots of possible use cases for it–both explicit (users taking action) and implicit (mining data that users create passively).

We already do crowdsource our translations, and we are planning to crowdsource an internet radio directory soon too. We are using implicit crowdsourcing to build the second phase of the new radio algorithm (that goes outside of your library into TIDAL). I would really like us to use crowdsourcing to fix coverage gaps in artist/composer artwork, because the browsers for those look so crummy with all of the gaps today. There are half a dozen other places where we could point crowdsourcing.

Zero of them are “fixing errors in track listings”. And that’s ok, I think.

One last thought: Spotify crowdsources an immense amount of data, but none of it has to do with the music metadata itself. Their most valuable crowdsourced data are playlists generated by their users and histories of user actions taken within the service.