Music File Management

I find the quality of data on discogs for classical music other than Vinyl abysmal to be honest… Does anybody even maintain new classical recordings there?

IME, Discogs breaks down completely when it comes to downloads, classical or otherwise.

Because in usual situations, Roon isn’t ruining the file names + the user-facing metadata simultaneously when it makes a mistake.

Imagine you import Kind of Blue, and Roon totally misidentifies it as Swedish House Grooves 2004.

One day, months later, you go to listen to Kind of Blue and it’s missing. So you go searching for it.

You have no idea that it is now Swedish House Grooves 2004.

Imagine two worlds:

(1) The world where Roon files everything in directories by Artist/Album for you

The album is now located in \\NAS\Artists\Various Artists\Swedish House Grooves 2004\...

The words Kind of Blue no longer appear in the filename OR in the metadata in Roon, so search for Kind of Blue will come up empty everywhere, meaning:

  • If you search the filesystem using Windows/Mac tools, you will fail.
  • If you search Roon’s “Skipped Files” list, you will fail.
  • If you search Roon’s Metadata, you will fail
  • If you go into the Tracks browser, show file paths, and try to filter down to “Kind of Blue” you will fail.

You’d have to be a magician to know to search for Swedish House Grooves 2004. You will probably never find your files.

Roon is creating an utterly hopeless situation. Yes, a small % of the time, but it happened enough that it convinced us that this feature was fundamentally too dangerous. Getting things wrong a small % of the time is only OK if the consequences are reversible and the experience of reversing them isn’t so bad.

(2) The world we live in today

Ok, so Roon has mis-identified the metadata. Very annoying, but locating the files is not a problem:

  • Search the “Skipped Files” list for Kind of Blue. It might be there.
  • Go into the Tracks browser, and filter the Path column by Kind of Blue. It might be there.
  • If you imported via drag-drop, remember roughly when you did it, and go look through your drag-drop imports from that time.

By only making the mistake in our database and not propagating it to the filesystem, the problem becomes fixable. Once the files are found, there are some straightforward remedies, including telling Roon to just stop mucking with this album.
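To make the difference between the two worlds concrete, here is a minimal sketch. It is not Roon's code; the `Track` record, the helper names, and the example paths are all hypothetical. It only assumes that each track keeps its original file path alongside whatever metadata the identification step assigned.

```python
from dataclasses import dataclass


@dataclass
class Track:
    path: str         # location on disk, left untouched by the library (assumption)
    album_title: str  # title assigned by identification, which may be wrong


def find_by_path(tracks, needle):
    """Return tracks whose file path contains needle, ignoring case."""
    needle = needle.casefold()
    return [t for t in tracks if needle in t.path.casefold()]


def find_by_metadata(tracks, needle):
    """Return tracks whose identified album title contains needle, ignoring case."""
    needle = needle.casefold()
    return [t for t in tracks if needle in t.album_title.casefold()]


# Hypothetical library: the first album has been misidentified.
library = [
    Track(r"\\NAS\Music\Miles Davis\Kind of Blue\01 So What.flac",
          "Swedish House Grooves 2004"),
    Track(r"\\NAS\Music\Bill Evans\Waltz for Debby\01 My Foolish Heart.flac",
          "Waltz for Debby"),
]

lost_in_metadata = find_by_metadata(library, "kind of blue")
found_on_disk = find_by_path(library, "kind of blue")
```

Here the metadata search comes back empty, but the path search still finds the album, because the mistake never reached the filesystem. If the files had been renamed to match the wrong identification, both searches would fail.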


I’m afraid you do not need just a bunch of skilled volunteers, you would need literally thousands of them, and all of them would have to stay dedicated for many years to come. That sounds like work to me.

In pop/jazz, composer fields are often not populated. I eventually found 27 performances of “Stardust” (Hoagy Carmichael), for example, in my library. But that is a lot of manual effort. Roon had only found 5. And I have only done the detective work for a handful of favorite compositions. I really like being able to explore my library that way, which I simply could not do before. It also really pays dividends in radio, I have noticed, as Roon will then turn up obscure performances of favorite works buried away somewhere I had forgotten about.

With classical there are two things.

Dead composers in general shouldn’t be tagged as artists (Bach/Beethoven/Mozart etc.). There are some dead composer/performers who survived into the recording age (Elgar/Shostakovich etc.) but I could live with a few misattributions. There would be less manual work than now. Maybe it could be a toggle/setting.

The grouping of compositions is too flat. Just because the catalogue numbers are different (e.g. an orchestrated vs. piano version, an entire ballet vs. a suite, or a violin vs. a clarinet version), Roon will often group otherwise identical compositions differently. It’s the same, for example, with entire compositions vs. excerpts on box sets. In most cases I would like to group all of these variations together, but in some kind of hierarchically structured way (a bit like multi-part compositions). If I go to the trouble of grouping compositions Roon considers different, I seem to get what I consider an improvement in radio, as the mood stays similar but less familiar performances of familiar works are chosen.

As an aside, I haven’t personally seen a reduction in manual intervention. It’s just that the manual intervention has shifted elsewhere to get the best out of Roon’s very different features. On the one hand, for example, I spend much less time, depending on genre, on basic track/artist labeling stuff. But on the other hand I now spend much more time on composer/composition stuff I simply did not bother with before, as none of the other library managers made much use of it.

Our users have been pointing at Discogs as a silver bullet since day one. We’ve done a fair bit of investigation. While we will probably do something with it in the future, if we thought it was going to cure significant problems, it would have been integrated a long time ago.

Remember–the purpose of identifying content is so that we can enhance it with additional metadata or link it to things, not just identification for the sake of identification. So when we look at a data set, we are looking for coverage of a lot of content, and rich data–Discogs doesn’t really succeed at either.

Their data model is sparse. No reviews/biographies. No track-level credits. No composer/composition/performance structure. No regard for classical-specific concerns. The database is a fraction the size of Rovi’s or TIDAL’s or Gracenote’s, so coverage is pretty poor too.

If we were to build a hypothetical Roon experience on top of Discogs data alone, it would not feel like Roon–too much would be missing.

We have a project underway now that is doing some work with Discogs data, and this is the primary benefit that we’ve found: Discogs is a good source for information about the various releases/tracklistings of albums. These could help us link more esoteric releases with data from richer data sources. That’s a nice-to-have, but not something that will change the fundamental equation.

Musicbrainz is crowdsourced like Discogs, but they did a far better job on their data schema/design. Unfortunately, the community is not as vibrant. The data sets are similar in size.

Yes. There have been businesses like that (AllMusicGuide…not thousands, but they’ve built an impressive amount of data by hand). Pandora also has some of that in their approach…there are some others.

Volunteers don’t work. Discogs/Musicbrainz are small and not very rich. That’s what volunteers create.

No-one on earth is willing to pay for thousands of people in rooms scrubbing music data.

I agree. Scrubbing metadata is not the interesting problem for crowdsourcing to solve. It’s too hard to get a bunch of people to agree on how to make cleaner metadata for 100 million tracks in the global library. Crowdsourcing has to be more finely targeted.

If anything, Discogs/Musicbrainz do more to demonstrate for me that crowdsourcing music metadata does not scale. The largest and richest data sources are commercial and not crowdsourced.

That doesn’t mean that crowdsourcing doesn’t have a place. There are lots of possible use cases for it–both explicit (users taking action) and implicit (mining data that users create passively).

We already do crowdsource our translations, and we are planning to crowdsource an internet radio directory soon too. We are using implicit crowdsourcing to build the second phase of the new radio algorithm (that goes outside of your library into TIDAL). I would really like us to use crowdsourcing to fix coverage gaps in artist/composer artwork, because the browsers for those look so crummy with all of the gaps today. There are half a dozen other places where we could point crowdsourcing.

Zero of them are “fixing errors in track listings”. And that’s ok, I think.

One last thought: Spotify crowdsources an immense amount of data, but none of it has to do with the music metadata itself. Their most valuable crowdsourced data are playlists generated by their users and histories of user actions taken within the service.


Been my tagging reference for a very long time.

I would certainly be with you on those. But, as I said, there is currently no consistency in Roon for these kinds of works for identified albums.
Another example I came across yesterday is the Brahms Ballades op.10.

I have tagged my tracks with individual WORK tags for each of the Ballads, so in an ideal world I’d expect one of the following scenarios:

  1. Roon (and I am deliberately calling it Roon here, even if it’s in fact 3rd party metadata with added magic) thinks like me.

  2. Roon doesn’t think like me and groups these works together as op.10, consisting of 4 pieces.

In case 1 I would expect to have identified and unidentified albums all lead to individual works for each Ballad.
In case 2 I would either need to change my Ballad tagging to a combined op.10 or I would set the composition to “prefer file” on my identified albums, giving me a consistent picture across identified and unidentified albums.
I also could merge compositions, but as far as I understand merging it only would make sense to merge my single track works into the Ballads (4), op.10 composition and not vice versa. I can’t split the Ballads (4) op.10, can I?
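To illustrate the two cases, here is a small sketch of how grouping by the WORK tag plays out. It is only an assumption about how a library might group tracks, not a description of what Roon does internally; the `group_by_work` helper and the exact tag values are made up for the example.

```python
from collections import defaultdict


def group_by_work(tracks):
    """Group track PART values under their WORK tag, keeping track order."""
    works = defaultdict(list)
    for track in tracks:
        works[track["WORK"]].append(track["PART"])
    return dict(works)


# Case 1: every Ballade carries its own WORK tag (hypothetical tag values).
per_ballade = [
    {"WORK": f"Ballade, Op. 10 No. {n}", "PART": ""} for n in range(1, 5)
]

# Case 2: one combined WORK with four parts (hypothetical tag values).
combined = [
    {"WORK": "Ballades (4), Op. 10", "PART": f"No. {n}"} for n in range(1, 5)
]

case_1 = group_by_work(per_ballade)  # four separate compositions
case_2 = group_by_work(combined)     # one composition with four parts
```

A library that mixes both tagging styles ends up with five compositions for the same four pieces, which is the inconsistency in question.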

But in fact what I see is a mix of Variant 1 and 2 for identified albums, combined with my single Ballad work tags from my unidentified albums…

If I follow Brian’s and Joel’s advice to just “sit and wait” for Roon’s metadata to improve (and I am exaggerating here on purpose, no offense intended), I’m moving from being consistent (based on my file tags) to being inconsistent for an unknown amount of time. I can’t be sure that the work attribution for identified albums will ever change, since it may not be seen as an inconsistency by a third party.

So what to do? That’s a dilemma I’m thinking about a lot at the moment.

Hi Brian. I was not pointing out Discogs as a silver bullet. Just that crowdsourced databases already exist. Discogs and Musicbrainz are a perfect example of how they work in practice.

I was referring to volunteers, not professionals. Most volunteers don’t want to spend 40 hours a week, year in year out, on collecting and curating metadata. That’s why you need many of them to make it work, otherwise it’s a dead end.

From my seat, there are significant problems with these databases, not least the fact that, in spite of “guidelines”, there is a huge amount of rubbish and far too many inaccuracies. There are simply too many contributors who do not know what they are doing, who enter bad metadata, who don’t follow the guidelines when they exist; and the guidelines are often lacking in some important areas.


Brian, I think you made a pretty good case for folder browsing there. Just sayin’.

Klaus, I’m afraid I have nothing to add. (So then why am I posting? :wink: ) My original point was that it doesn’t matter all that much which choice is made; there just needs to be a choice, consistently employed. I’ll modify that now to say that the choice that makes life easier (and more flexible) for the user is preferred.

But whatever. The problem, as you note, is inadequate metadata for less-common genres (and, surely, less common music generally). Roon is working on improving metadata–they say that often, and I trust them–but they also insist that there is no adequate source, and that they are not in a position to create one, either through crowdsourcing or directly. Lots of people saying it can’t be done.

The thought that came immediately to my mind was of Google a few years back, with Google Maps. Google is huge with (comparatively) practically infinite resources–but their task was almost infinitely harder: To build an accurate and detailed map–so accurate and detailed that it could be used for real-time driving directions–of all the roads in (and beyond) the U.S.

So, I had the idea of taking this on–building up a first-rate metadata database focused on basic information–not reviews (although that, probably, is the easy part) but compositions, works, movements, variations, performers, dates and venues, alongside stuff like who produced, engineered and mastered the recording–the stuff a scholar or serious enthusiast (not to mention metadata-rich music-server software) would require. Sounds like a lot of fun. The problem is that I have little cash and only a little bit more expertise. I know a little about classical music–more than most people but much less than others and far from enough–and less about data structures, architecture, and crowdsourcing QC. Someone ought to do this, but I’m not the right person. Anyway, until it is done, what progress can be made?

Best,
Jim


@joel
But surely that’s one of the positive aspects of databases like Discogs?

As long as the editing rights are not abused, you - anyone - can go in & make an update and suggest other changes.
There’s potentially an infinite source of like-minded enthusiasts…resources which no Record Label could call on.

I’ve noticed errors & made / suggested amendments, and I’ve also added a couple of rarities myself - which others have later picked-up on & improved / added-to.

However it is clear that there are (other) weaknesses - particularly when it comes to Classical…


As an isolated database perhaps, but not if you have to code around it.


I meant a perfect example of “how” they work in practice, not that they work perfectly. Thought I was clear on that. It is a perfect example of how it works, with all the pros and cons. I know the pros and cons; they are inherent to a system like this. The only thing I was getting at is that I don’t see Roon user-based crowdsourced tagging turning out any different.


Got a hammer in your hand and everything looks like a nail :joy:


That’s a great Gretchen Peters Lyric


Not seeing it, and…never gonna happen. It would ruin the product.

We will of course keep making incremental improvements…the “can’t be done” isn’t the same as admitting that it’s impossible to move forward. Just that some of the obvious paths aren’t the clear-cut winners that they are made out to be.

Think about where we were with regards to metadata/editing/classical/file management 2.5 years ago:

  • No Library/Import settings
  • No file paths in the tracks browser
  • No “prefer local metadata”
  • Composer catalog system that was so dumb that it did more harm than good
  • No awareness of “classical” or filtering of “classical” content in the compositions/composer browsers
  • No “browse this composition on TIDAL” links
  • No editing in Roon, at all. If Roon got it wrong, you were out of luck.
  • Metadata pulled from file tags was extremely limited. No LIVE LABEL RECORDINGDATE LOCATION WORK PART COMPOSITION CREDITS PART MOVEMENT CONDUCTOR or dozens of others.
  • No support for using user-provided genres
  • No support for merging albums
  • No support for disabling/controlling work/part grouping
  • No support for custom delimiters in file tags
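
As a rough illustration of what the last item involves, here is a minimal sketch of splitting a multi-valued file tag on user-chosen delimiters. This is not Roon's parser; the `split_tag` helper, its default delimiter, and the sample values are assumptions made for the example.

```python
import re


def split_tag(value, delimiters=(";",)):
    """Split a raw tag value into trimmed, non-empty entries.

    delimiters is a sequence of literal separator strings chosen by the user.
    """
    if not delimiters:
        # No delimiter configured: treat the whole value as a single entry.
        stripped = value.strip()
        return [stripped] if stripped else []
    # Escape each delimiter so characters like "/" or "|" are matched literally.
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [part.strip() for part in re.split(pattern, value) if part.strip()]


artists = split_tag("Miles Davis; John Coltrane")
mixed = split_tag("Bill Evans / Scott LaFaro; Paul Motian", delimiters=(";", " / "))
```

Whether a slash or a semicolon separates names or is part of one ("AC/DC") depends on the collection, which is why the delimiter set has to be configurable.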

I’m sure I’m missing a lot…this is just what came to mind quickly. The point is–while there are still pain points, we are consistently shrinking them. The feedback we were hearing on these topics a couple of years ago was far noisier and more concerning than the issues we are dealing with today (and back then, our user base was a small fraction of the size that it is today).

The fundamental problems with classical are:

  • The music is published poorly
  • The people who listen to it have different opinions about how to organize it
  • There is a big gap between how the music is published/sold and how it is consumed when compared to other genres.

For example, in this thread, someone was complaining about composer names in the album artist field. The problem is–record labels often put composer names into the artist field. I understand why people groom those away, but it’s really not a clear-cut call whether we should do that in cases where it is actually the most accurate artist name according to the publisher.

The most acute improvements we will likely make for Classical are probably less about data and more about navigation/presentation. There are obvious missing navigational links from Composers -> Albums/Performances. There are also some faults in how we order/present data on the Composer/Composition screens–they do not do a good job of pushing the most important stuff towards the top of the list. This is being worked on as well.


Is there a definitive list of these somewhere?

Thanks for the suggestions. Now, if only the programs were for Windows.:grin: