The Metadata Blame Game

As a spinoff from the discussion in this thread (Roonie), I’m starting a more focused thread on the issue of non-existent metadata for many newer releases.

I call it The Metadata Blame Game because that’s what it is - one big round of finger pointing that goes something like this:

New release lacks meaningful metadata → Roon is not at fault because Roon gets metadata from MusicBrainz and/or Tivo and they have not provided any metadata → MusicBrainz and/or Tivo are not at fault because they receive the metadata from the record labels → the record labels are not at fault because they receive the metadata from the musicians → the musicians are not at fault because it’s up to the producer to forward the metadata to the label → etc. → etc.

This blame game needs to stop and a solution needs to be found.

Can AI be used to parse the metadata from the internet? For example, in the above linked thread I gave the example of the new release “Jamie Baum Septet - What Times Are These”, for which Roon provides no meaningful metadata. However, a quick Google search finds that the recording is available on Bandcamp, and the Bandcamp page for the recording has much more meaningful metadata:

On What Times Are These (Sunnyside), Jamie Baum’s fifth recording with The Jamie Baum Septet+, her flagship ensemble, the acclaimed New York-based flutist-composer presents her first exploration of spoken word and art song. To be specific, seven of Baum’s ten compositions respond to works by a cohort of eminent 20th and 21st century female poets (Adrienne Rich, Marge Piercy, Tracy K. Smith, Lucille Clifton, and Naomi Shihab Nye), interpreted by renowned guest vocalists Theo Bleckmann, Sara Serpa, Aubrey Johnson and KOKAYI.

The album follows the critically acclaimed Sunnyside releases In This Life (2013) and Bridges (2018), on which Baum explored the connections between South Asian qawwali, Near Eastern maqam, and Jewish sacred musical traditions, incorporating the scales and rhythms into her vivid harmonic and orchestrative argot. Here again, Baum blends multiple influences, tailoring the improvisational sections to the individualistic tonal personalities of her uniquely configured ensemble of virtuosos: herself on flute and alto flute; Jonathan Finlayson on trumpet; Sam Sadigursky on alto saxophone, clarinet and bass clarinet; Chris Komer, French horn; Brad Shepik, guitar; Luis Perdomo, piano and Fender Rhodes; Ricky Rodriguez on bass and… more
credits
released April 5, 2024

Jamie Baum - flutes, spoken word
Jonathan Finlayson - trumpet, spoken word
Sam Sadigursky - alto saxophone, clarinet, bass clarinet
Chris Komer - French horn
Brad Shepik - guitar, singing bowls
Luis Perdomo - piano, Fender Rhodes
Ricky Rodriguez - bass, electric bass guitar
Jeff Hirshfield - drums
Theo Bleckmann, KOKAYI, Sara Serpa, Aubrey Johnson - vocals
Keita Ogawa - percussion

As one can see, the information (aka metadata) is out there and readily available, now what does Team Roon need to do to get that information into Roon and to the Roon user?
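To make concrete how regular that credits list is, here is a minimal Python sketch that turns "Name - role, role" lines into (performer, role) pairs. It only works on text you already have in hand. The function name `parse_credits` and the sample constant are made up for illustration, and it says nothing about how Roon ingests metadata or about whether Roon may use the text.

```python
# A few lines copied from the credits above, used as sample input.
CREDITS = """\
Jamie Baum - flutes, spoken word
Jonathan Finlayson - trumpet, spoken word
Sam Sadigursky - alto saxophone, clarinet, bass clarinet
Theo Bleckmann, KOKAYI, Sara Serpa, Aubrey Johnson - vocals
"""


def parse_credits(text):
    """Turn 'Name[, Name...] - role[, role...]' lines into (performer, role) pairs.

    Hypothetical helper: assumes names and roles are comma-separated and that
    ' - ' separates the two halves, as in the sample above.
    """
    pairs = []
    for line in text.splitlines():
        # Split on the first ' - ' only; skip lines that do not fit the pattern.
        names, sep, roles = line.partition(" - ")
        if not sep:
            continue
        for name in names.split(","):
            for role in roles.split(","):
                if name.strip() and role.strip():
                    pairs.append((name.strip(), role.strip()))
    return pairs


if __name__ == "__main__":
    for performer, role in parse_credits(CREDITS):
        print(f"{performer}: {role}")
```

Even this simple case has traps. A name containing a comma, or a credit written as free prose, would be split wrongly, so a human check is still needed.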

11 Likes

As well, point the finger back at yourself. Despite the AI suggestion, most music metadata is human curated by volunteers. So, ask yourself: what are you doing to help ameliorate the inadequate metadata problem?

AJ

2 Likes

I’m paying for a Roon subscription, that’s what I’m doing. I don’t do volunteer work for for-profit companies.

9 Likes

Not paying enough. Not for metadata creation, that is for certain.

If you care that much to put your money where your mouth is, try hiring some college students proficient in data entry to do your bidding. You should be able to make a dent in deficient metadata at $20/hr.

AJ

1 Like

MusicBrainz is operated by the MetaBrainz Foundation, a California based 501(c)(3) tax-exempt non-profit corporation dedicated to keeping MusicBrainz free and open source.

https://musicbrainz.org/doc/About

AJ

1 Like

Not the creation of metadata but rather the retrieval of metadata from sources other than MusicBrainz and Tivo.

So what you’re suggesting is that I pay for a Roon subscription and then do all the work required so that Roon can sell it back to me. Sounds like a great deal for Roon but a not so great deal for Roon’s paying users. Capitalism at its finest!

Same as the Valence stuff for artwork and more for support. Something I have now decided to not do.

8 Likes

You oversimplify and make this out to be a solvable problem. It is not. It is an insurmountable issue that can only be mitigated with a lot of volunteer labor (to which you are unwilling to contribute).

As for the provided example, you can scrape all of the Bandcamp metadata that you want for personal use. Roon cannot do so for commercial use.

The Service (including, without limitation, any Content) is provided only for your own personal, non-commercial use (except with respect to individual recording artists, collections of recording artists, Artist Entities or Represented Artists (each, an “Artist”) selling Music, Merchandise or other Content (each as defined below) as authorized through the Service).

The term “Content” includes, without limitation, any User Submissions, videos, audio clips, written forum comments, information, data, text, photographs, software, scripts, graphics, and interactive features generated, provided, or otherwise made accessible by Company or its partners on or through the Service.

https://bandcamp.com/terms_of_use

AJ

7 Likes

Is AI the way forward here?

It’s not readily available for use by Roon. Most of it is copyrighted and/or subject to limitations.

I’m afraid that the whole subject of metadata accuracy and usage is far more complicated than you seem to acknowledge. It isn’t a case of blame games, it’s a case of reality. You are quite correct that you pay for a service. As a customer you can request that this service improves. If it doesn’t then you can accept the service as it is (whilst reserving the right to keep lobbying for change) or stop paying for the service.

5 Likes

While I sometimes find this less than convenient, I’ve tried to remind myself that this sort of metadata is something that I can wait for. I’m reminded of this:

One of my favorites, to boot.

That said, some stuff may still not make it because MB and TiVo won’t carry it. Roon has said they are looking at additional metadata sources for the near future, and I am also willing to wait for that.

1 Like

Time to clear the air a bit.

I don’t mind doing volunteer work when my labor is not then used to make money for someone else. Adding metadata to MusicBrainz, which MusicBrainz then sells to Roon, just makes one a sucker. One may disagree with me but this is how I feel, especially when I am paying Roon for its (lack of) ability to retrieve metadata.

As far as Bandcamp is concerned, in America everything is negotiable. Roon can find a way to make Bandcamp’s metadata available for Roon’s use. By “find a way” I mean PAY Bandcamp. Bandcamp is not the only source of metadata available on the big bad Internet. There are lots of databases and discographies available. Discogs, like Bandcamp, can also be approached and a deal can be made to make their extensive metadata available.

Solutions are available but it takes the willingness and the resources (aka money) to make them happen.

At this point it’s pretty clear that volunteers, Tivo and MusicBrainz are not the solution.

1 Like

I’ve been in or adjacent to the technical, legal, and financial issues surrounding metadata for a long time, and I’ve been in AI for data munging since long before it was cool. I agree with and can add to the points that @WiWavelength and @SukieInTheGraveyard made. In no particular order:

  • Automated data integration from semi-structured sources is not a solved problem, even if it may seem so to those susceptible to AI hype. Human curation is critical for decent-quality metadata, not least because some questions (are these two releases the same or different?) may require judgment based on external factors.
  • Metadata copyright is a big can of worms (actually, copyright is a whole Pacific Ocean full of worms), in particular because it’s not even clear which rights each participant (producer/label/distributor/…) has to what.
  • Bandcamp is just a distributor. I asked them about the status of the data on their album pages a while ago, and they responded that it was purely under uploader control, not theirs. It’s quite plausible that they don’t feel they have the rights to license it. And negotiating with individual uploaders (artists/producers/labels) would be impractical.
  • Adding to/editing MusicBrainz is laborious but feasible. I used to do it for lots of new independent jazz releases (like the one that @Jazzfan_NJ mentioned) but I’ve been way too busy with my day job. Then everyone can benefit, not just Roon users.
  • The metadata schema for Discogs is idiosyncratic and does not map seamlessly to anything else, not MusicBrainz, not Roon.

13 Likes

I sincerely hope it will be one day. From my experience with tagging, Discogs seems to be a pretty fast source of metadata, covering a bigger chunk of what is out there than MusicBrainz and TiVo. But that comes at a cost: Discogs’ metadata is vastly inconsistent, with albums and artists existing many times over in different variations, not to speak of composers and musicians listed by surname only and with no hyperlinks.

Something tells me it is pretty difficult to filter out metadata that fits Roon’s pattern from that. Maybe it is possible to offer Discogs data as an alternative in cases where MusicBrainz is failing.

I can tell you that the situation for classical music and for recordings from the early LP era is even worse. Some metadata exists for the majority of releases, but in many cases it is inconsistent: primary and album artists are mixed up, as are composers and artists; primary artists vary from track to track on one and the same album; compositions are linked wrongly, with lone movements floating around without an opus number; track durations are wrong or missing; and one and the same recording exists in tens or hundreds of reissues, variants and pressings. This is a mess of biblical dimensions.

Every time I import a new classical album and Roon shows new artists named ‘Rattle Berliner Philharmoniker’, ‘César Franck 1822-1890’ and ‘Richard Tucker Tenor’, as well as a new composition by Mozart called ‘Adagio’, I want to cry. If this happens with fewer than 25% of imported classical albums, I call it a good day.

I was thrilled by the composition list options in Roon 2.0, but my frustration with the metadata was so great that I manually sorted out which albums are really meaningful and well-integrated. The others have either been given new, consistent metadata or been thrown out of my core library.

Maybe one solution, at least for albums from streaming sources, would be to automatically identify inconsistent or incomplete metadata and hide the album. If not automatically, then maybe through all Roon users as a kind of crowd intelligence: simply flag an album as having garbage metadata and it is hidden for others too.
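As a rough idea of what such automatic flagging could look like, here is a small Python sketch with two heuristics aimed at the kind of bogus artist names I mentioned: names with life dates glued on, and names ending in a voice type or role. The patterns, the keyword list and the function names are my own assumptions for illustration, not anything Roon actually does, and they will produce both false positives and misses.

```python
import re

# Heuristic 1: a year range such as "1822-1890" inside an artist name.
LIFE_DATES = re.compile(r"\b\d{4}\s*[-–]\s*\d{4}\b")

# Heuristic 2: a trailing voice type or role (illustrative, incomplete list).
ROLE_SUFFIX = re.compile(r"\s(tenor|soprano|baritone|contralto|conductor)$",
                         re.IGNORECASE)


def artist_name_problems(name):
    """Return a list of reasons why an artist name looks suspicious."""
    problems = []
    if LIFE_DATES.search(name):
        problems.append("contains life dates")
    if ROLE_SUFFIX.search(name):
        problems.append("ends with a role or voice type")
    return problems


def flag_album(artist_names):
    """Map each suspicious artist name on an album to its list of problems."""
    flagged = {}
    for name in artist_names:
        problems = artist_name_problems(name)
        if problems:
            flagged[name] = problems
    return flagged


if __name__ == "__main__":
    names = [
        "Rattle Berliner Philharmoniker",
        "César Franck 1822-1890",
        "Richard Tucker Tenor",
        "Simon Rattle",
    ]
    for name, problems in flag_album(names).items():
        print(f"{name}: {', '.join(problems)}")
```

Note that a name like ‘Rattle Berliner Philharmoniker’ passes both checks. Catching a conductor fused with an orchestra would need a list of known artists to compare against, which is where simple rules stop and curation begins.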

I agree, but so far I cannot imagine an easy way to extract meaningful metadata from sources other than those Roon is already using. I hope the Roon team will at least find a method of telling consistent metadata from questionable metadata.

1 Like

Thank you for your detailed response. The way forward for Roon will be difficult and filled with problems but the one thing that is perfectly clear is that Roon’s current way of collecting metadata is no longer working as well as it did in the past. A new way forward needs to be found.

1 Like

Beyoncé’s album Cowboy Carter was released without many credits. They started updating them over the last couple of weeks, and it appears they are now fully populated in Roon.

Not sure if that was by design from the label/Beyoncé, and I wondered if Roon would eventually include them, but all is well now.

1 Like

I hear you, but what I know is that technical, labor, legal, and licensing costs of better metadata are likely much larger than any revenue derivable from distributing that metadata. If that were not the case, there would be a metadata aggregator much better than the current ones already. The fact that even incumbents like AllMusic/Tivo don’t do the job should tell you something. It’s not incompetence, it’s the lack of a viable business model.

For contrast: extremely detailed, accurate, well-structured data feeds can be licensed (for a lot of money) for legal, medical, and financial matters, and are the basis for very successful companies like Bloomberg and Thomson Reuters. It pays to provide high-quality AI+human curation when the customer depends on it for their high-margin business. But you and I would not be willing to pay the equivalent of a Bloomberg terminal fee for our jazz metadata 💰

3 Likes

I just love your responses. You understand that I am not looking for a fight but rather for ideas on how to solve this worsening issue of metadata retrieval. Your knowledge of these complex issues reflects what I’ve been reading about the evolving issue of AI: the conflicting issues of privately held databases/archives that are available for public use but not available to for-profit enterprises.

Again I thank you for your helpful insights into this complex issue.

2 Likes

This. I think many, many, many users would be shocked at what that Bloomberg fee actually is, something like 20 to 25K per user per year.

1 Like