Cleaning up and standardising file based metadata

I recently embarked on a bit of a SQL project to analyse and standardise file based metadata throughout my music library. I’ve tackled the task by firstly importing all tags into a sqlite table to which I can make changes and then later on write those changes back to the underlying files. The code used to update the files is the same as that used in puddletag (in fact puddletag must be installed to use the tag to database import to tag export capabilities) so there’s no risk of file corruption.

In any event, seeing as Roon suffers the same metadata issues as my library (and addressing it doesn’t seem to enjoy any sort of priority) the first thing I’ve tackled is inconsistent composer tags, whereby I encounter the following:

  1. song by an artist with no composer metadata
  2. same song by same artist with composer surname only
  3. same song by same artist with full name of composer
  4. same song by same artist with multiple composers
  5. same song by same artist with any combination of 2, 3 & 4

It isn’t perfect and I’ve no doubt an unintended anomaly will creep in here and there, but suffice to say if you have various instances of a track: Code of Silence performed by Bruce Springsteen with composers Joe Grushecky / Bruce Springsteen and

  • some have no composer entry
  • others have Springsteen or Bruce Springsteen or only or Grushecky only etc.

the end result in the database is that all instances of Code of Silence performed by Bruce Springsteen will have composer tags set to:

  • composer=Joe Grushecky
  • composer=Bruce Springsteen

Roon can then be configured to utilise its own and file based metadata for composers and you should end up with a very significantly improved result (assuming of course there is file based metadata to work with), the most obvious benefit of which is that all instances of the song across your library should be understood by Roon to be the same track and viewing all instances in Roon should actually yield the lot rather than the subset ordinarily yielded. Presumably then lyrics etc. should also pull through assuming Roon has the lyrics for the track.

I’m still in the process of finalising and testing, but if there’s interest I’m happy to pop it up on github when done. You’ll need to be running Linux, have puddletag and SQLite3 installed in order to use it.

What happens if you have codes of silence by Eric Idle with John Cleese as composer?

Serious query by the way.

Looks like it could be very useful (although not to a barely literate Windows user like me). It is annoying when you know that 2 songs are the same and Roon doesn’t link them and we have no way to do it manually.

.sjb

1 Like

It’ll be treated as a different song with same title, i.e. left as is.

The analysis is predicated on artist, song title and composer metadata. What it won’t do is pretend to know that Joni Mitchell and Jackson Browne may have covered the same song and thus use one’s metadata to fill in the other. Too much potential for a complete and utter mess as an end result. Those things I’ll handle manually by exception.

Here’s a sample of what the code does thus far, more work to do…

Evan, noobe here. I have two Linux devices: a Nuc ROCK, and a DigiOne. I’ve been told the ROCK is a no-go despite my boatload of extra memory and storage.

That leaves DigiOne, currently running Ropiee. I’m not a Linux person, and I’m not even sure this would be the right vehicle. But I’m very interested in your project.

Any advice, including “leave it be”, welcome.

further refinement… 109,095 tracks with improved composer metadata and I’ve not had to lift a finger to make it so.

John

You’ll need an instance of Linux you can interact with, albeit you don’t necessarily need a gui if you’re prepared to accept the results blind (which I wouldn’t be comfortable with). That said I don’t believe Rock will provide the flexibility to install additional packages.

In essence, after installing puddletag, python 2.7 and sqlite you call a Python script to read your library’s file based tags and write it out to a sqlite table. You then run a series of sqlite scripts against that table to improve the composer metadata and export all changed records to a new table. Finally you rerun the same Python script with an export parameter which then proceeds to write back the changed records to the underlying files in pretty much the same manner as puddletag would write any tag changes you make via its gui.

The steps to follow are simple enough I could do paint by numbers instructions.

A thought: I leave my DigiOne/ pi as is, buy another Pi with a screen, and an OS system that works (DietPi? recommendations?), do the SQL thing, then sometime later turn it into a needed endpoint.

Sound feasible?

PS: Tried it out on classical cuts?

PSPS: Its late and I’m at my worst, but I noticed a portion of the changes being a reordering of the composers. Why that order?

Another curiosity question: are you trying for fuzzy matches, i.e. “What a Wonderful World” = “What a Wonderful Wrold”?

dietpi would likely work. Not sure how well the system would perform writing to a usb connected hard drive. You could just plug the pi into your regular monitor rather than buy a screen for a one off like this.

It makes no distinction between genres, so it’ll do the same for classical composers

just a function of the code, and makes it easier to spot anomalies. Roon and other software that can leverage the metadata doesn’t care what order it is in, in file tags.

No, don’t want to go there at this juncture - I don’t want to introduce the possibility of changes degrading rather than enhancing the metadata.

Hah, running an update against my files it’s definitely better to put Roon to sleep whilst the update runs otherwise you’re getting Roon database updates everytime a change is written.

Just in the process of adding work and part tags to my files based on the presence of ‘:’ in track titles where genre includes Classical. 12,059 files to be written to.

My Pi arrives tomorrow. Make some more mistakes/discoveries and tell me about them :slight_smile: . My universe is only 30-35k tracks.

Just reported/ranted about an album imported from TIDAL in Metadata issues.