Best Practice: Duplicate Finding & Deleting (FLAC library >10tb)

My son and I would like to combine our libraries. We’ve both ripped many of the same albums as well as different ones. We began ripping over 20 years ago, and even within our own libraries we have duplicates, so they will be profuse!

NOTE: The FLAC library is 10+ TB, so the effort is going to be significant, especially considering that approximately 60-70% of our library is 16-bit. The other 30-40% is higher than 16-bit, e.g. 24-bit, DSD and a few Double DSD. So, just among the different bit depths and formats, there are going to be inherent duplicates.

What is the most accurate and expeditious method to ID duplicates and then delete them? Use Roon to do this? If so, what instructions might you suggest?

Or, before creating a Core, would you recommend doing the “cleaning” off-line from Roon, using other software or media players? If so, what Windows-based programs have you found to be best?

Conducting a search here, I found the following from May 2017:

Editing and deleting of duplicate albums – “When I am editing albums and delete a duplicate within the Roon I am returned to the first album in my database.”
“With 12,000 albums this means I have to scroll through my library to find the place where I did my last edit. Very frustrating.”
“Can this be fixed in the next release or is there an existing process to do this.?”

If this hasn’t been fixed, using Roon for detection and then deletion doesn’t appear to be as practical as the size of my library demands.

Any suggestions/observations would be much appreciated…

-Mike

What computer system are you running on: PC, Mac or Linux?

Duplicate removal is messy at best. Roon can only give you some idea of what’s there: in the Focus settings, you can select Duplicates as an Inspector option.

I’m using a DIY-built Windows 10 HTPC as a Roon Core. But I have another DIY-built desktop with 6 HD bays that can be used if duplicate identification/deletion is done outside of Roon.

Maybe try to google for a duplicate detection/removal application that has some music functionality and folder-level detection. I’m a Mac user and just did this exercise a few days ago. I managed to kill 1.4 TB of dupes in the end, I think, but there are still lots more in different formats. I have 14 TB of music so I feel your pain :crazy_face:

I can’t imagine what going through that quantity of files must be like. But I did my much smaller library manually through Windows. I’d create a new location and go through a list of your titles and move your unique files and the best version of duplicates manually into that location for Roon to then detect and catalog. That is assuming you have the time. One problem I did find is a lot of my older rips were not great. But being something of a hoarder I still have the vast bulk of original disks.

Have a look at MP3TAG. Has a lot of information that you can view and sort by. And it does work on other file types. Automating something like this could be dangerous. Yes it will take some time but at least you will know what it is doing. And you can update tags and artwork at the same time if they are missing.

Roon really isn’t the tool for this, or any other library management for that matter.
In the past I have used the tool linked below for finding duplicate files (JPGs, not audio), but it should work just as well for audio. It is free to try, too. Other tools are available, but I have no experience with them. Any tool is going to take quite a while on such a large number of files.

https://www.mindgems.com/products/Fast-Duplicate-File-Finder/Fast-Duplicate-File-Finder-About.htm
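
For straight copies (the same rip sitting in two folders), the idea behind a file duplicate finder can be sketched in a few lines of Python. This is only an illustration of the general technique, not how the tool above works internally; the function names and the `*.flac` pattern are my own assumptions. It compares file sizes first, then hashes only the files whose sizes match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, pattern: str = "*.flac") -> list[list[Path]]:
    """Group files under root whose bytes are identical."""
    # Cheap first pass: only files of equal size can be byte-identical.
    by_size = defaultdict(list)
    for path in root.rglob(pattern):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Expensive second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Bear in mind that this only catches files that are identical byte for byte. Two rips of the same CD with different tags or artwork will hash differently, and a 16-bit and a 24-bit version of the same album will never match this way.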

One thing to bear in mind, if you or your son need lower quality duplicates for good reasons (eg an mp3 version to play in the car/phone etc) then it is good to know that ahead of deleting them all!

There is also another tool which I have used which can do this but I have always found it to be incredibly slow, even on a small library (it fingerprints the audio), but it may have improved by now as I haven’t used a recent version:

http://www.jthink.net/songkong/

Note that a discount may be available for Roon users at times (see thread…). Again, you can try before you buy and the author has a Roon account here @paultaylor.

I have used this one from Illustrate…

https://www.dbpoweramp.com/perfecttunes.htm

It is a bit persnickety [for lack of a more precise word] but will give you a very good starting point to cull from.

Great responses, thanks!

I feared that Roon wouldn’t be a good duplicate finding tool for my sizable project. So, I appreciate your inputs accordingly.

“SongKong” looks to be an interesting addition to the shortlist.

I do use dbPoweramp for ripping. Hence, “PerfectTunes” would be a product to consider. So, a comparison between “SongKong” and PerfectTunes would be interesting. I do like the fact that in addition to identifying duplicates, SongKong will correct metadata; thus allowing Roon to more easily “find” and catalog all of my music.

Doing a browser search, I also found this product: MindGems Audio Dedupe. It appears to be somewhat similar to SongKong and PerfectTunes.

I will delve into the details of these 3 programs to determine which appears to be best. If trials are available, they would be helpful in making a decision. Speed and accuracy would be the first two deciding factors. The ease of wading through the results, correcting and deleting would be the third. Has anyone had any experience with using any, or all 3, of these programs?
-Mike

Thanks for mentioning SongKong. It can quickly identify duplicates using just the existing metadata, but that relies on the metadata being correct, so it is less accurate. SongKong’s main task is Fix Songs, and once you have used this to identify albums and add fingerprints (and MusicBrainz IDs), the Delete Duplicates function should run pretty quickly. Fix Songs can take a while to run because fingerprinting is CPU-intensive, matching relies on the internet to query the database, and if many changes are found, I/O is required to write all the changes to your files. But Fix Songs can be run completely unattended on a music collection of any size, so it works well if you just let it run overnight. On a reasonably modern computer with a broadband connection, expect about 5000 songs to be processed per hour; for an older low-powered computer or NAS this may go down to 1000 songs an hour.
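
To put those rates into perspective, here is a quick back-of-the-envelope calculation in Python. The track count of 200,000 is purely a hypothetical figure for a 10+ TB library (the real number has not been stated in this thread); only the two rates come from the paragraph above.

```python
def hours_to_process(track_count: int, songs_per_hour: int) -> float:
    """Estimated run time in hours at a constant processing rate."""
    return track_count / songs_per_hour


TRACK_COUNT = 200_000  # hypothetical, replace with your own track count

for label, rate in (("modern computer", 5000), ("older computer or NAS", 1000)):
    hours = hours_to_process(TRACK_COUNT, rate)
    print(f"{label}: {hours:.0f} hours (~{hours / 24:.1f} days)")
```

With that assumed count, the estimate is 40 hours on a modern machine and 200 hours on a low-powered one, so "overnight" would really mean several nights.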

Hi, just to note that with SongKong you can run Fix Songs on your whole collection in preview mode; this does not require a license. It will create a report showing exactly what has been matched and what metadata has been added.

It is not currently possible to do the same comprehensive test with Delete Duplicates because, unless you are just using the metadata-only option, it relies on the fingerprints/metadata added by Fix Songs, and these are not added when running only in preview mode. I’m working on a solution to this.

Would certainly be interesting to see your analysis of the three applications.

Hi,

In my opinion, what is important to understand is that no software can properly process duplicate albums because… there’s no such thing as an “album” in your music collection. Your music collection is just tracks and an “album” is just an abstract view of some of these tracks.

So… from my experience:

  • You need to identify the duplicate tracks/albums before you do any cleaning or reorganizing. Put the list on a piece of paper per artist.
  • You need to identify which part of these duplicates are part of different editions (initial, remasters, special editions, hi-rez, lossy, etc), which are part of compilations and which ones are really duplicates of the same album.
  • Once you have the list, delete the true duplicate tracks (albums) before Roon scans them.
  • You can import the others (other editions) into Roon and then merge them (keeping the appropriate edition as principal). This will allow you to access them later on, if needed. This work should be done for each artist, in the artist view, not the album view.

I understand where you are coming from, but I respectfully disagree. In the simple case, SongKong matches a folder of songs to the tracks of a single album, and each song then has an MB song ID and an MB album ID added. Then, with Delete Duplicates, you can elect to find duplicate songs based on having the same song ID and the same album ID (i.e. you have the same album twice), or just the same song ID (you have the same song, but maybe on two completely different albums, e.g. the original and a compilation).

You can also elect to search only within an audio format, so that, for example, if you have a FLAC library and an MP3 library, it would only find duplicates if the song is in the FLAC library twice.

Also, MB can group editions of an album under one album (known as a release group). This means we can choose in SongKong to consider songs duplicates only if they are on the exact same album, or also if they are on different editions of the album.
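
The three matching rules described above can be illustrated with a small Python sketch. To be clear, this is not SongKong’s code; the `Track` class, its field names and the `scope` values are hypothetical, chosen only to show how the choice of key changes what counts as a duplicate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    path: str
    song_id: str           # e.g. a MusicBrainz song (recording) ID
    album_id: str          # e.g. a MusicBrainz album (release) ID
    release_group_id: str  # groups the editions of one album
    audio_format: str      # "flac", "mp3", ...


def duplicate_groups(tracks, scope="song+album", same_format_only=False):
    """Group tracks that count as duplicates under the chosen rule.

    scope: "song+album"         -> same song on the exact same album
           "song+release_group" -> same song on any edition of the album
           "song"               -> same song anywhere (original, compilation)
    """
    key_builders = {
        "song+album": lambda t: (t.song_id, t.album_id),
        "song+release_group": lambda t: (t.song_id, t.release_group_id),
        "song": lambda t: (t.song_id,),
    }
    make_key = key_builders[scope]
    groups = defaultdict(list)
    for track in tracks:
        key = make_key(track)
        if same_format_only:
            # Never match a FLAC file against an MP3 file.
            key += (track.audio_format,)
        groups[key].append(track)
    return [group for group in groups.values() if len(group) > 1]
```

The stricter the key, the fewer (and safer) the matches: "song+album" only flags an album you hold twice, while "song" also flags the compilation copy.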

Hi Paul,

Well, we’re saying essentially the same thing: there’s no perfect way of identifying an album.
That’s the reason SongKong calculates an albumId and binds it later on to the track metadata.

Some software (like SongKong) does a better job than others, but you can’t simply rely on its assumptions to automatically delete tracks. I mean, this is a great way of getting the duplicate track list, but you’re still obliged to manually check that everything is OK before deleting anything.

Don’t you agree?

Understood.

Although our 100% FLAC database is >10 TB, it is in fairly good order in terms of metadata. Since we began ripping over 20 years ago, we have learned a lot, especially the importance of creating accurate metadata.

Paul, I downloaded and ran a status report on SongKong yesterday. I then tried to see what & how duplicates would be discovered. But I found as you said:

I would have liked to see the output of Delete Duplicates – a format and layout example via my database, of the found duplicates. From past experience with this subject, I know that human intervention must come into play by analyzing the results and performing a pick & choose process before ANY tracks are permanently deleted.

Hence, after SongKong finds duplicates, what would the output look like? That is, how easy would it be to see whether the tracks are exact duplicates, or differ in bit depth, or come from different albums, etc.? And more importantly, using the duplicate output, can one easily make delete decisions track by track? Or is it a delete-all-or-nothing situation?

Lastly, Paul, assuming I purchase SongKong and we point it at my collection and my son’s, what sequence of actions would you recommend: what should be done first, second, etc.? Given your advice above, it sounds like Fix Songs should be the first step (before running Delete Duplicates)…

If your metadata is in good shape (for both of your libraries) and if you have been careful to put each album in a separate folder, then imho your task will be much easier. And software like SongKong should be able to create a nearly faultless list of your duplicates.

Note: the “nearly-faultless” does not mean that the software is not up to the task, it only reflects my own experience of the fact that metadata/folder structures are often not perfect, although we think they are… :slightly_smiling_face: I’ve seen that too often, so, I’m cautious.

Acknowledged, and so true about more than just metadata/folder structures. :slightly_smiling_face:
Thanks for your advice in this thread…

Hi, you can run Delete Duplicates and generate a report. If you already have reasonable metadata, you could just run it with the default of Same song and same album (metadata only); this should find some duplicates and give you an idea of the report created. Basically, the report groups songs by album and shows the duplicate songs: the one that is kept and the ones that are deleted.

In the options you can configure to either delete the duplicates or move them to a folder, and in the case of duplicates you can specify preferred deletion criteria to determine the song that should be kept and songs that should be deleted/moved.

You can run in Preview mode first to see the results, check them, and then run for real. But SongKong does not offer the ability to individually select each duplicate. It is designed to be configured and then run; if you have many duplicates, checking each one manually is too time-intensive. You could, however, use my other software, Jaikoz, if you want to take that approach.
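
The general pattern described here (rank the files in each duplicate group, keep one, move the rest aside, and preview before touching anything) can be sketched in Python as follows. This is not SongKong’s implementation. The `Candidate` class, the ranking rule (highest bit depth, then highest sample rate) and the quarantine folder are all assumptions for illustration; pick whatever criteria suit your library.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Candidate:
    path: Path
    bit_depth: int
    sample_rate: int


def plan(group):
    """Pick one file to keep from a duplicate group; the rest are extras."""
    # Assumed preference: highest bit depth, then highest sample rate,
    # then the alphabetically first path as a stable tie-breaker.
    ranked = sorted(group, key=lambda c: (-c.bit_depth, -c.sample_rate, str(c.path)))
    return ranked[0], ranked[1:]


def apply_plan(groups, quarantine: Path, preview: bool = True):
    """Return (kept, extra) pairs; move extras only when preview is False."""
    report = []
    for group in groups:
        keep, extras = plan(group)
        for extra in extras:
            # Prefix a running number so equal file names do not collide.
            destination = quarantine / f"{len(report):06d}_{extra.path.name}"
            report.append((keep.path, extra.path))
            if not preview:
                quarantine.mkdir(parents=True, exist_ok=True)
                if destination.exists():
                    raise FileExistsError(destination)
                shutil.move(str(extra.path), str(destination))
    return report
```

Moving to a quarantine folder instead of deleting means a wrong match costs nothing: you can listen, compare and only empty the folder once you are satisfied.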

Yes, Fix Songs needs to be run first.

Attn: paultaylor:
I’ve been running SongKong samples and really like its abilities.

Because of your presence here, and also from the Roon questions I’ve perused on your forum, might you be able to point to the metadata settings, screen by screen, that you’ve found to give the best results in terms of Roon’s acceptance and usage?

It’s good that you’ve provided so many variables from which one may choose. But it’s a little daunting trying variations for Roon usage, when I presume you’ve established a routine (or routines) that simply “plays” well with Roon.

As previously mentioned, I’m dealing with over 10tb of FLAC (only) files. Metadata is fairly complete & accurate to a point, but consistency between fields & tracks/albums is questionable.

As you can imagine, genres are quite varied, dominated by Rock, Classical and Jazz. Album covers are 99.9% complete, however. And bit depths are important to identify, because we like to perform listening comparisons of 16-bit vs 24-bit vs DSD and some Double DSD, etc.

Screenshots of recommended settings for Roon, if available, would be more than helpful and appreciated. -Mike

Hi, the short answer is that the Default profile should work well enough, except that you may want to remove performer from the list of fields added, because Roon may not recognise performer and performer(instrument) as the same person - https://community.jthink.net/t/artist-artist-details/9240/8

Genres are difficult. SongKong can add genres from Discogs, but by default it will only add one if your files currently don’t have a genre. This is because genres are inherently subjective; there is no objectively correct genre. You can modify this behaviour, though.

SongKong identifies 16-bit versus 24-bit and DSF from the audio encoding itself and sets the Is HD flag; you can add this to the album title if you wish with the Add [HD] to album title option.
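
For anyone curious what "from the audio encoding itself" can mean for FLAC, here is a minimal Python sketch that reads the bit depth and sample rate from the STREAMINFO block at the start of a FLAC file. It is not SongKong’s code and says nothing about how its Is HD flag is defined; the function name is hypothetical, and it only handles plain FLAC files that start with the `fLaC` marker (not DSF, and not files with an ID3 tag prepended).

```python
import struct
from pathlib import Path


def read_flac_streaminfo(path: Path) -> dict:
    """Read sample rate, channels and bit depth from a FLAC file header."""
    with path.open("rb") as handle:
        # 4-byte marker + 4-byte block header + 34-byte STREAMINFO block.
        header = handle.read(42)
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError(f"{path} does not look like a plain FLAC file")
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 18..25 pack four fields: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1) and a 36-bit total sample count.
    (packed,) = struct.unpack(">Q", header[18:26])
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bit_depth": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }
```

Run over a library, this gives a reliable 16-bit versus 24-bit split without trusting folder names or tags.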

I need to do an extensive review of how Roon currently processes the metadata provided to it. This is not so easy, since Roon doesn’t document this in much detail, and the processing is specific to the tagging format, so it may handle a field in a WAV file okay but not the equivalent field in a FLAC file. Until I have done this, I cannot offer you a definitive Roon mapping.