What can I do to improve my metadata matches?

I’m trying to help myself and other power users here. Here’s an example:

I’m prepared to update my tags and directory structure to get things “as I want them” with minimal post-Roon-import grooming.

If I want to increase the chances of a boxset actually being recognized as a boxset, what do I need to do? How do I arrange the directory structure and tags to achieve this vs. having Roon think that boxset constituents are individual albums?

This is a GREAT question. I’m going to have @jeremiah answer this, since he can answer it authoritatively.

The matching rules are broken into two phases. The first clumps the files into albums, the second actually identifies.

For clumping, the best you can do is put 1 album in 1 subdirectory, and have media number and track number fields completely filled out in the metadata, with no missing tracks. Contiguously numbered tracks in a subdir with no other files will always get clumped together. There are many other cases for clumping, but this one gives the best result.

Without a good clump, you will have a hard time getting a match.

@jeremiah will talk more about this, but I think putting the discs themselves in different directories would probably lead to fewer errors as well (both by the clumping algorithm and by you). Be sure the tags have proper media number tags!

By media number are you referring to discnumber?

On the same note, how are compilations identified during scanning?

Yes, media number is just a more generic term than disc number :slight_smile:

I’ll continue where Danny left off. There are two phases involved in getting metadata for your files. The files are first grouped into clumps that have a high likelihood of being a complete album. Then we use the filenames, tags, and track lengths to identify each of the clumps.

Your question pertains mostly to the clumping phase, as identification results will vary as a consequence of how the files are clumped. For example, if you have the Pink Floyd Discovery box set, which is just composed of re-releases of studio albums, how the files are organized will dictate whether the resulting identification pulls info from the box set or from each of the original albums. In particular, if you don’t have media numbers, we’ll assume each disc is a separate entity, while if you have correct media numbers, we’ll be able to identify the entire box set.

As Danny alluded to, the single most important factor for getting a good identification is to make sure the files for a given album (or album set) are together in a directory, they have proper media numbers and track numbers, no files are missing, and no files have duplicates.
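If you want to sanity-check a directory yourself before importing, here is a minimal Python sketch of the "no missing files, no duplicates" test described above. It is not Roon's clumping code; `check_numbering` is a made-up helper name, and it assumes you have already collected one (media number, track number) pair per file from your tags or filenames.

```python
from collections import Counter


def check_numbering(tracks):
    """Report gaps and repeats in a list of (media_number, track_number) pairs.

    Hypothetical helper, one pair per file. Returns (missing, duplicates),
    each a sorted list of (media_number, track_number) pairs.
    """
    counts = Counter(tracks)
    # A pair seen more than once means two files claim the same slot.
    duplicates = sorted(pair for pair, n in counts.items() if n > 1)

    missing = []
    for media in sorted({m for m, _ in counts}):
        present = {t for m, t in counts if m == media}
        # Each disc is expected to run from 1 to its highest track with no gaps.
        missing.extend(
            (media, t) for t in range(1, max(present) + 1) if t not in present
        )
    return missing, duplicates


if __name__ == "__main__":
    files = [(1, 1), (1, 2), (1, 4), (2, 1), (2, 1)]
    print(check_numbering(files))  # ([(1, 3)], [(2, 1)])
```

Note that a check like this cannot see tracks missing from the end of a disc, since it only knows the highest number that is present.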

When something doesn’t identify, the first things I check are: Are any files missing? Are there any extra files that don’t belong with this album? Then I look at the media and track numbers. These might occur in the file names (e.g. “05-03 Summer '68.flac”) or in the file’s tags. We check both, but in general the tags are more authoritative. If things aren’t identifying, you might look for a mistake in the tags or a disagreement between the tags and filenames.
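To compare filenames against tags, you first need the numbers out of the filename. The sketch below shows one way to read a "DD-TT Title" or "TT Title" name in Python. The regular expression and the `parse_filename` name are my own assumptions for illustration; the patterns Roon actually accepts are not documented in this thread.

```python
import re
from pathlib import PurePath

# Optional "DD-" media prefix, then a track number, a separator, and the title.
# This pattern is an assumption for illustration, not Roon's actual parser.
_NUMBERED = re.compile(r"^(?:(\d{1,2})-)?(\d{1,3})[ ._-]+(.+)$")


def parse_filename(name):
    """Return (media_number or None, track_number, title), or None if unnumbered."""
    stem = PurePath(name).stem  # drop the extension, e.g. ".flac"
    match = _NUMBERED.match(stem)
    if match is None:
        return None
    media, track, title = match.groups()
    return (int(media) if media else None, int(track), title.strip())


if __name__ == "__main__":
    print(parse_filename("05-03 Summer '68.flac"))  # (5, 3, "Summer '68")
    print(parse_filename("01 Track.flac"))          # (None, 1, 'Track')
    print(parse_filename("Track.flac"))             # None
```

Comparing the result with the media and track numbers in the tags is a quick way to spot the tag/filename disagreements mentioned above.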

Whether the files of a multi-disc set are together in one directory or separated into subdirectories is a matter of preference. If you separate the discs, make sure the subdirectories include a parseable indication of the disc number (“Disc 5”, or “Atom Heart Mother [disc 5]”, “Atom Heart Mother CD5”, etc) and that it agrees with the media number in the tags and/or filenames.

So this scheme will work:

Music/
    Miles Davis - The Complete Columbia Album Collection/
        01-01 Track.flac
        01-02 Track.flac
        ...        
        02-01 Track.flac
        02-02 Track.flac
        ...        
    Pink Floyd: Discovery/
        01-01 Track.flac
        01-02 Track.flac
        ...        
        02-01 Track.flac
        02-02 Track.flac
        ... 

And this will work just as well:

Music/
    Miles Davis - The Complete Columbia Album Collection/
        CD1/
            01 Track.flac
            02 Track.flac
            ...        
        CD2/
            01 Track.flac
            02 Track.flac
            ...        
        ...
    Pink Floyd: Discovery/
        CD1/
            01 Track.flac
            02 Track.flac
            ...        
        CD2/
            01 Track.flac
            02 Track.flac
            ...        
        ...  

One thing that likely won’t work is having all the discs of a box set separated into album-level directories, like this:

Music/
    Miles Davis - The Complete Columbia Album Collection CD1/
        01-01 Track.flac
        01-02 Track.flac
        ...        
    Miles Davis - The Complete Columbia Album Collection CD2/   
        02-01 Track.flac
        02-02 Track.flac
        ...        
    ...
    Pink Floyd: Discovery CD1/
        01-01 Track.flac
        01-02 Track.flac
        ...        
    Pink Floyd: Discovery CD2/
        02-01 Track.flac
        02-02 Track.flac
        ...        
    ...

Don’t do that!

Whether you have box sets or not, you want to get the directory organization and track/media numbers of your files correct. The other tags usually don’t matter as much except in the case of albums with very few (fewer than four) tracks. For these, it’s crucial to get the album and artist tags correct to facilitate identification.

@jeremiah, thanks for that explanation. I have many box sets and adopted a slightly different scheme (two in fact: an initial one and a more recent one). Wondering whether these would work:

Music/
    Miles Davis - The Complete Columbia Album Collection/
        01 Track.flac
        02 Track.flac
        ...
        nn Track.flac

i.e. sequential track numbering from 1 to nn where nn represents the number of tracks making up the box set. In each case media number is in the metadata.

More recently I adopted a slightly different approach:
Music/
    Miles Davis - The Complete Columbia Album Collection/
        CD1/
            01 Track.flac
            02 Track.flac
            ...
            nn
        CD2/
            03 Track.flac
            04 Track.flac

i.e. sequential track numbering from 1 to nn where nn represents the number of tracks making up the box set. In each case media number is in the metadata, but the tracks representing a disc are in their own subfolder.

In both cases (I’m sure I’ve not always been consistent here) the tracks will all carry the same album name: “The Complete Columbia Album Collection”. Where I’ve been inconsistent I may have named each album in the box set discretely.

Is there any special sauce pertaining to compilation albums?

Would it be possible to publish your metadata name mappings along the lines of that published here: SlimServer Supported Tags for users to check their metadata against?

Also, what does Roon do in the absence of metadata and/or a descriptive parseable directory name - does it then leverage the likes of AcoustID to identify the album?

This is going to be a real issue for Sooloos owners wanting to use their actual Sooloos collections with Roon. As far as I’m aware discs in the same set are not nested in the same directory.

Good question, @audiomuze. At this point, we don’t support continuous numbering of box set tracks so you are most likely going to run into problems; however, you’re not the first person to mention this so I’ll definitely take a look at how we might be able to translate that numbering scheme into one which will match with metadata, without generating any false positives. The risk is that we could mis-detect a box set and renumber the tracks in such a way that identification becomes impossible when it would have been fine if we didn’t fiddle with it. Anyway, I’ve taken note of it and will investigate soon. It’ll have to be post-release, though, as I’ve got my hands full with some super exciting upgrade-to-Tidal metadata features right now. I can’t wait to show you guys this stuff!

In the meantime, numbering each disc from one and using one of the schemes above will give us the best opportunity to identify the files and improve metadata.
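For anyone who needs to convert a continuously numbered box set themselves, here is a rough Python sketch of the renumbering. It assumes the media number in each file's tags is already correct, and it only computes the new numbers; writing them back to tags or filenames is left to your tagger of choice. `renumber_per_disc` is a hypothetical helper name, not anything Roon provides.

```python
def renumber_per_disc(tracks):
    """Map continuous box-set numbering to numbering that restarts on each disc.

    tracks: list of (media_number, continuous_track_number) pairs, one per file.
    Returns a dict from each original pair to (media_number, per_disc_track).
    Assumes the media numbers are already correct (hypothetical helper).
    """
    mapping = {}
    next_track = {}  # media_number -> tracks already assigned on that disc
    # Sorting keeps each disc's tracks in their original relative order.
    for media, continuous in sorted(tracks):
        next_track[media] = next_track.get(media, 0) + 1
        mapping[(media, continuous)] = (media, next_track[media])
    return mapping


if __name__ == "__main__":
    box_set = [(1, 1), (1, 2), (2, 3), (2, 4), (2, 5)]
    for old, new in renumber_per_disc(box_set).items():
        print(old, "->", new)
```

With the sample input, disc 2's tracks 3, 4 and 5 become tracks 1, 2 and 3 of disc 2, while disc 1 is unchanged.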

There’s nothing special required for compilations, aside from having them properly numbered and together in a directory. Tags do become important when files are missing, in which case it will help to have a consistent albumartist tag across the entire album.

A big table like that isn’t really necessary for us as we only use a scant handful of tags (album, artist, albumartist, track name, track number, and media number) in addition to the file and folder names and track lengths to identify.

As for which specific tags we look for to get a value for say “artist”, our goal is to use every commonly found tag out there. When I get reports of a missed identification, I can examine the files’ tags via the contents of the support package. Whenever I find something we aren’t looking at, I add it immediately. So I highly recommend including a support package with reports of wrong or missing identification.

I hesitate, further, to provide a list as I don’t want it to appear that we recommend widespread grooming of tags. In most cases we’re going to identify the files without even looking at the tags and get full metadata from the cloud, so the grooming would’ve been wasted effort. I would only recommend adjusting tags if identification fails and the album in question is either composed of very few tracks or is missing some tracks. Even then, it’ll eventually be much easier to do the editing within Roon.

As for your second question, we don’t do any acoustic fingerprinting at this time but I’m aware of the possibility and will certainly consider adding it once I get a sense of how well we’re doing as is and assess the need. The good news is, even without good artist or album title strings from tags or the directory name, if your files are properly numbered and none are missing or extra we have a very high success rate identifying by approximate CD TOC. This is why I’m harping over and over on file/folder identification: if you get that right, everything just works.

Good point! :frowning: I’ll have to look into this.

Thanks @jeremiah, that’s clarified a lot for me. It’s trivial for me to correct/rename and renumber box set tracks where I’ve kept discs in discrete sub-folders. Those that are in a single folder and sequentially numbered will have to be dealt with when I encounter them (makes me glad I retained cue sheets when ripping, so I should be able to sort numbering relatively easily).

On the topic of cue sheets, does Roon support single FLAC files containing an embedded cue sheet for track identification? I’m guessing not and can’t imagine it’d be a high priority.

One area where I’m going to have a challenge is my home-grown folder “VA - Singles Gathered”, where I’ve collected individual tracks I like from albums I’ve otherwise discarded as something I don’t listen to. But it’s a corner case and would hardly diminish the Roon experience.

You’re the first to bring up single FLAC rips with me in a long time! No, we’re not supporting that right now.

Your home-made compilations should clump together just fine and display their tags in Roon so long as they (as ever…) are properly numbered.

If it were me I would not bother supporting embedded cue sheets.

One aspect of Sooloos I loved was the ability to play a work within an album. This is particularly useful for classical music, but had applications for modern music as well. Kate Bush springs to mind with “The Ninth Wave”.

Will Roon allow this? Can Roon iron out the bug in Sooloos that meant that all tracks after the selected work were also played?

I’m hoping teaser #2 comes out tomorrow… it teases our classical offering :slight_smile:

The Sooloos “work” stuff was a complete hack job… something I threw in there with a horrible text munging algorithm. It got the job done in many cases, but fundamentally the engine never understood the data.

Roon completely changes that. I’ll speak more about this in the forums post-teaser #2.

The CD structure examples above are based on naming the subdirectories of a multi-disc set “CD1,2,3”. Can you please confirm that “Disc 1,2,3” can also be used? I prefer this naming as it aligns with the naming of multiple discs in the Roon GUI.

BR//Joakim

Yes indeed. You can use CD, Disc, or Disk; with or without a space and case doesn’t matter.
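As a way to test your own folder names against that rule, here is a small Python sketch that pulls a disc number out of a directory name. It accepts CD, Disc, or Disk, with or without a space, in any case, as described above. The exact pattern is my assumption; Roon's real parser may accept more or fewer forms.

```python
import re

# "CD", "Disc" or "Disk", optional whitespace, then the number; case-insensitive.
# The pattern is an illustrative assumption, not Roon's actual rule set.
_DISC = re.compile(r"\b(?:cd|disc|disk)\s*(\d+)\b", re.IGNORECASE)


def disc_number_from_dirname(dirname):
    """Return the disc number found in a directory name, or None if there is none."""
    match = _DISC.search(dirname)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for name in ["Disc 5", "Atom Heart Mother [disc 5]",
                 "Atom Heart Mother CD5", "Pink Floyd: Discovery"]:
        print(name, "->", disc_number_from_dirname(name))
```

The word boundary before the keyword keeps names like "Discovery" from being read as a disc marker. Whatever number comes out should agree with the media number in the tags and filenames.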

I looked at the albums that were not identified. Some were caused by directory structure as discussed here – I found it easiest to fix the structure on disk, and the problems went away. Some were legitimately oddball stuff that wouldn’t be in the metadata database.

But I noticed one thing that Roon could perhaps mitigate: some albums showed up under “Igor Stravinsky” while others were under “Stravinsky, Igor”. I imagine this kind of thing would be common with a variety of sources. Might make sense to implement logic to handle this case.
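To make the suggestion concrete, a very rough Python sketch of that kind of logic could look like the following. This is only my guess at a heuristic, not anything Roon does: it flips a single "Last, First" pair and leaves everything else alone, and it would still need an exception list for band names that happen to contain one comma.

```python
def normalize_artist(name):
    """Turn "Last, First" into "First Last"; leave other names unchanged.

    Heuristic sketch only (hypothetical helper). Names with "&" or more than
    one comma are assumed to be groups and are returned as-is.
    """
    parts = [part.strip() for part in name.split(",")]
    if len(parts) == 2 and all(parts) and "&" not in name:
        last, first = parts
        return f"{first} {last}"
    return name.strip()


if __name__ == "__main__":
    print(normalize_artist("Stravinsky, Igor"))    # Igor Stravinsky
    print(normalize_artist("Igor Stravinsky"))     # Igor Stravinsky
    print(normalize_artist("Earth, Wind & Fire"))  # Earth, Wind & Fire
```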
