Best Practice: Duplicate Finding & Deleting (FLAC library >10tb)

@Mrmb Just to add my experience on this point:
It is important that each listener indicates the main genre for each artist (at least the important, beloved ones) on all tracks of the artist. The best moment of doing so is while ripping.
I keep all this clean for genres I prefer, like Progressive Rock, Blues Rock, Greek music, etc.
Then, before importing the library to Roon, you go to Settings>>Library>>Import settings and choose whether you want Roon genres, your genres, or both. I chose both.
Then, later on, after the library has been imported to Roon, you go to Settings>>Library>>Genres mapping and map these (sometimes customized) genres to the AllMusic ones (I think these are the genres Roon uses by default).
Like this, you can still get to your favorite albums by using your genres while at the same time letting the “Discover” option in Roon guide you through the music specialists’ genres Daedalus. :slightly_smiling_face:


I am also interested in your future findings on this point.

Very good suggestions about Genres and how to handle them in Roon…Thanks @Themis!

Genre is an important but usually personal, subjective identifier, as previously mentioned. At the very least, consistency in designating what is meaningful to each of us and our libraries would seem to be paramount in genre naming.


After trialing “SongKong” and “dbPoweramp’s” “PerfectTunes”/“De-dup”, I prefer the latter for duplicate handling.

Although it took days for De-Dup to “listen” to my sizeable library, it was a simple hands-off process whose results are then saved for future use. In my case, that was important, because a brief power outage wiped out De-Dup’s active output. From the saved results, De-Dup recreated its graphical output in hours, versus the days the original “listening” took.

I prefer De-dup’s graphical output that aligns each duplicate together. One can then listen to each and easily see each one’s metadata with a click. However, I prefer SongKong’s metadata findings and output compared to the “ID Tag” element of PerfectTunes. By comparison, the “ID Tag” portion of PerfectTunes is rudimentary. My take is that one could obtain similar or better abilities by using the free (with contributions requested) “MP3Tag” program, or by using several media/music players.


I’m using SongKong as I type and hope to have some positive results…it’s a 13tb library with 250k tracks so I too expect it to take some time to complete.


I sent the following email to Paul Taylor (https://community.roonlabs.com/u/paultaylor) today to ask him a licensing question, to add a couple minor suggestions, BUT mainly to tell him how much I liked what he has accomplished with SongKong:

Hello Paul,

Following my SongKong Pro purchase today (7/19/20), I received the attached two (2) emails. After checking them and clicking on any links, I cannot locate a number that will change my Lite version to a paid version. No other emails have been received (in Junk/Spam or otherwise).

I did check your forum, but found nothing that would suggest how, when, or in what manner a License Number would be forwarded.

BTW, I’m the “Mrmb” (https://community.roonlabs.com/u/Mrmb) that initiated the following Roon post: Best Practice: Duplicate Finding & Deleting (FLAC library >10tb)

Your advice on the Roon forum helped lead to my purchase. But the results of testing the Lite version with my folders absolutely cinched the deal!:-)

Although my Roon thread involved duplicate removal, after seeing the metadata results from SongKong, I purchased it primarily for that purpose.

I spent several hours testing various SongKong configurations. You’ve managed to develop a program that removes much of the time-consuming and painstaking labor involved in dealing with the metadata in any music library, especially one in the multiple-terabyte range. That isn’t to say that a couple of minutes and a few clicks are all that is necessary to obtain tagging results that are completely satisfactory for one’s personal needs and the configuration and condition of one’s music folder(s). However, given the complexity of the subject and the mind-boggling number of variables involved, spending time testing and adjusting repeatedly is completely warranted and expected, i.e. one must spend time working through a personal learning curve. SongKong’s outstanding results far EXCEED the learning time spent, however!

Your tutorials, comments on various Roon threads and in the Jthink forum helped.

I do have a couple very minor recommendations:

  1. After I discovered it, the Help.pdf is superb. However, I spent quite some time searching before finding it. That is, I looked for something labeled a “Users Manual” or “Instructions” on your website (& under a Support tab), plus I searched the Web and your forum to no avail. You may consider calling the Help.pdf a Users/Instruction Manual as well as considering it a Help add-on to the program itself. Placing a Users &/or Instruction Manual more prominently on your main Jthink/SongKong website and on your Forum, for example as a Sticky/Read This First/User/Instruction Manual, would have eliminated my time spent searching and head scratching.

  2. If Chrome is already open, a SongKong Report doesn’t pop up. For the same reason, clicking on individual Reports in the Report tab does nothing to output a report if Chrome is already open. I obviously learned this the hard way. Some sort of note/error message etc. about closing one’s browser to see Reports would have helped. Until I figured out the Chrome-is-open connection, I did locate where the reports were saved on my PC and opened them there.

  3. Lastly, there is this email’s subject: where is the License found? Maybe I overlooked it or missed it. But a registration instruction on your download page, or a Sticky/Read-This-First post on your Forum would perhaps preclude the issue I am having…

Best Wishes,

Hi @Mrmb

Sorry for the delay, I have been away on a long weekend so haven’t been monitoring my email the last few days. Your email domain rejected my email as spam. I have now resent from two other email addresses; if that doesn’t get through I can PM you the details. I will follow up on your other points tomorrow.

To finalize my above questions (Best Practice: Duplicate Finding & Deleting (FLAC library >10tb)), with Paul’s answers via his SongKong Forum, I’ve copied our SongKong Forum posts below. Let me add that Paul has been very responsive and that I’m a fan of his SongKong efforts.

Paul,

  1. If even a couple potential, or new users find your new “Where is the User Manual?” topic, that’s a positive.
    In any case, the Help.pdf is cogently well done and is a very worthwhile read for a potential/new user who is trying to get a handle on SongKong’s options and capabilities.
    My problem was that when seeking an answer to a question, the SongKong program wasn’t being used. I was on the Internet/Forum & didn’t think of downloading/loading SongKong to find Help (a User’s Manual). In fact, after no Forum mention of a Manual, I did a Jthink Website search and a Google search for a SongKong User’s Manual (obviously to no avail). I’ve found that before purchasing something like SongKong, perusing a Forum or reading Instructions &/or a Manual aids making an informed purchase decision.
  2. The “Reports” being HTML should have given me more of a clue that it was browser activated (and the problem was that Chrome was already running in the background). But when I was in the middle of attempting to activate a Report & nothing happened, I was perplexed. Hence, my note inclusion suggestion, i.e. something more specific about the Report opening in one’s browser or perhaps programming that would energize Windows’ Open With Menu. Nevertheless, no harm or foul.

I should add that in the overall scheme of things, I realize that the above 2 issues are quite inconsequential! But regardless of that being so, perhaps worthy of a mention.

  1. Acknowledged and resolved…Thanks

FYI: The SongKong Forum post is here: https://community.jthink.net/t/no-license-received-yet-purchase-order-introduction-emails-received/9306/4


Can I ask a few more questions about this, since what you say about De-Dup seems to apply to SongKong as well? For the best duplicate deletion you need to run Fix Songs first and then set a suitable duplicate identification method for Delete Duplicates. If you do this, SongKong will also create a report that groups the duplicates by artist and album, showing the duplicate that was kept and the duplicates that were deleted/moved. This is an HTML report and can be viewed independently of SongKong.

Below I show a simple example whereby I just made a copy of an album and then ran SongKong against them both.

Paul, using only the trial version, I didn’t delve deep into SongKong’s duplicate processing to see what you depicted. Nevertheless, on the surface, I still personally favor dBpoweramp De-dup’s graphical depiction of supposed duplicates aligned together with each one’s cover art (see attached pic).


By clicking the folder Icon, one sees the metadata associated with each track. And clicking the Play Icon, each track can be listened to. Plus, other than aligning the duplicates together, there are no automatic Keeps or Deletes being done. That process is purely a manual one on a track-by-track basis, or globally. SongKong’s may be likewise, I don’t know.

Although not depicted by the picture, tracks related to an album are all depicted sequentially, and in each picture (GUI), are kept in the same order album wise. So, if the track one has determined to delete is located for example, at the Top or Middle or Bottom, then, when scrolling-down, each Top/Middle/Bottom track can be deleted. The pictures definitely aid this process. Plus, it’s easy peasy to visually connect all the tracks to one album as one scrolls down deleting the duplicates.

At any rate, I haven’t a clue about the accuracy of either program. One could indeed be better than the other, or they could be identical in their output accuracy. When scrolling through the 1,000’s of my outputs though, everything made sense.

In any case, personal preference is at play with both programs – e.g. ( Tomayto / Tomahto).

With that said, as I own both programs, SongKong’s metadata handling is a hands-down/no contest Winner!

I enjoyed learning SongKong’s intricacies and am appreciative of how it has effortlessly improved the data contained within my hundreds of thousands of tracks. Kudos on a well done effort and, more importantly, on being extremely helpful and supportive both personally and globally with an eye towards tweaking and improving an already mature program!! As an avid Roon fan, I hope your tweaks and setting suggestions help Roon to sort and accurately identify the millions of tracks we users throw at it.

Thanks, that is very useful. De-Dup’s screenshot certainly looks prettier than the SongKong one, and I like the idea of being able to play songs and get to the metadata (I may add these), although with SongKong you can easily see the filename, which seems to be missing with De-Dup.

Importantly, SongKong Delete Duplicates is not a manual process, since I think it can automatically delete the right songs if configured correctly (as discussed at the start of this thread). However, if you are not comfortable with that you could run in preview mode and use the report to manually delete songs, but this could quickly become a chore if you have many duplicates.

I also note that in your De-Dup screenshot the duplicates found are the same song on totally different albums (i.e. original album versus compilation). SongKong can also do this, but personally I think it is better to restrict it to only delete duplicate songs on copies of the same album, otherwise you end up with incomplete albums. This is configurable in SongKong; I don’t know if it is in De-Dup.


The screen shot I provided was culled from the internet. What follows is a pic from my PC:

In retrospect, my previous comment about clicking on the Folder Icon was inaccurate. Clicking on the Folder Icon brings up a Windows Explorer screen, focused on that file. Only when hovering the cursor over the file, its metadata is then shown. I’m not sure if that is built in to Windows or it’s because of other Software I have loaded. Nevertheless, one needs to click to obtain file information; whereas as you pointed out SongKong provides it at the outset ( Tomayto / Tomahto again). On the other hand, clicking the Play Icon, does indeed play each track.

De-dup is both automatic and manual, with the Mass Delete Button seen at the top of the screen and the Delete button seen by each track. The other thing seen at the top of the screen is a “Possible Matches” header. From what I’ve seen it sorts those at the beginning and Tentative Matches (I believe that was the nomenclature) below those.

Also one can readily see, what I attempted to explain (above) about the graphics. That is, once you’ve established which album you want to delete (if doing so individually), the order of the albums as depicted by the graphic stays the same. The ordering is identical as one scrolls down through the tracks of the album. This makes for a quicker more unerring deletion decision – I.e. once the order is visually established – reading the text as a double-check – needn’t be done.

In a large collection like mine, a mass delete is out of the question. I have 100’s if not 1,000’s of the same albums and tracks. Some are from different issues, reissues and remasters. And others are of varying bit rates – I.e. 1) Standard Def.; 2) High Def. (of various bit rates) and 3) DSD (dsf); and a few 4) Double DSD. I like listening to the same song in its different iterations, to discern what the differences are and what is favorable and not – yeah, I know somewhat daffy, but an audiophile is oftentimes “audiophoolish”.

FYI: De-dup’s other Options can be seen below:

As I mentioned, I pulled the previous pic from the internet, so I’m unsure of the sequence before or after what’s depicted. But when the same song was on two different albums (E.g. Original vs a Compilation vs Different bit rates), I found that those albums were sorted together (sequentially), which helped prevent deleting one song from one album and another from the second album – ending up with incomplete albums. However, I even have some purposeful incomplete albums. Inconsistently consistent is the state of my library :slightly_smiling_face:

Thanks again, I have now added the ability to play files to my SongKong enhancement list. I have also added something about displaying metadata, although doing it the same way as De-Dup will not work because SongKong can be run in remote mode as a client-server with the files on the server not being accessible from the client device, so I probably would have to create metadata in the report itself.

You clearly have an extensive library, but there are a couple of options in SongKong that may help you. If you run Delete Duplicates in preview mode then it will not delete anything but generate a report you can use.

Assuming that you would not want to delete the same song in different formats (e.g. DSF, WAV, FLAC), you can prevent this easily by enabling “Find duplicates within same audio format only”.

If you set “Song is a duplicate if has same” to the most restrictive option of “Same MusicBrainz song and same album (specific version e.g same country/date) and sounds the same”, then it will only find songs that have been found on the same version of an album multiple times and have the same acoustic fingerprint (so sound the same).

This will not find all your duplicates, but it should generate a list of ones that really are duplicates without removing any alternative versions, remasterings, et cetera.
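To make the idea of that most restrictive setting concrete, here is a minimal Python sketch of grouping tracks by a strict key. It is not SongKong’s implementation; the `Track` fields and the `find_strict_duplicates` name are hypothetical, and real acoustic fingerprints are compared by similarity rather than by the exact string equality assumed here. Like preview mode, it only reports groups and deletes nothing.

```python
from collections import defaultdict
from typing import NamedTuple


class Track(NamedTuple):
    # All field names are illustrative assumptions, not SongKong's data model.
    path: str
    audio_format: str   # e.g. "flac", "dsf", "wav"
    recording_id: str   # stands in for "same MusicBrainz song"
    release_id: str     # stands in for "same album, specific version"
    fingerprint: str    # stands in for "sounds the same" (exact match assumed)


def find_strict_duplicates(tracks):
    """Group tracks that agree on format, song, album version and fingerprint."""
    groups = defaultdict(list)
    for t in tracks:
        key = (t.audio_format, t.recording_id, t.release_id, t.fingerprint)
        groups[key].append(t.path)
    # Only keys seen more than once are duplicates; nothing is deleted here.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

With a key this strict, a DSF copy or a different pressing of the same song falls into its own group of one and is left alone, which is the trade-off described above: fewer duplicates found, but little risk of losing an alternative version.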

I have been using a file explorer called Directory Opus for years now

https://www.gpsoft.com.au/

in addition to many useful functions, it also has a very good duplicate search

Good suggestions Paul…

Presently, after recognizing SongKong’s very robust but easily executed capabilities, I’m running it to “Fix” (& add to) my metadata. Cost wise, it was well worth it for that function alone. After years of collating and curating my library and its metadata, I thought it was in fairly good shape. I found otherwise when trialing SongKong.

I’m currently using a 6-hard drive DIY PC with my collection spread across several drives. Once I get the collection in as good a metadata state as possible, I will condense the data onto a couple of hard drives and load them into a DIY HTPC running the Roon Core. From there, I will energize Roon and see the end results. I may attack the duplicate issue sometime thereafter, with the plan being to run SongKong’s delete duplicates following your suggestions…Thanks!

Can anyone tell me if these programs work on music files stored on a Nucleus+?

There is no reason why they should not. So long as the OS can see the files (in Windows File Explorer) then so can SongKong or any other program. SongKong works in trial mode until you license it, so it is trivial to find out.

I have found in the past that some programs cannot access network stored files unless the drive containing them is mapped to a Windows drive letter. This is easy to do though if any programs still need that.
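A quick way to check the “can the OS see the files” point before starting a multi-day scan is a small script like the sketch below. The function name and the extension list are my own assumptions, and the path you pass in would be whatever mapped drive letter or network path applies on your machine.

```python
import os


def count_audio_files(root, extensions=(".flac", ".dsf", ".mp3", ".wav")):
    """Return how many audio files are visible under root, or None if root is unreachable."""
    if not os.path.isdir(root):
        return None
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        # str.endswith accepts a tuple, so one call covers every extension.
        total += sum(1 for name in filenames if name.lower().endswith(extensions))
    return total
```

If this returns `None`, or a count far below what you expect, a tagging or de-dup tool running on the same PC will most likely have the same trouble reaching the files.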

SongKong is good but will take a while if the files are not on the same machine you run it on.

Indeed, on my sizeable collection, SongKong running on an Intel i7-8700, 3.2 GHz, 6-Core CPU, took: 3-days; 3-hours; 18-minutes; 46-seconds, but who’s counting :wink: However, considering SongKong is analyzing (essentially “listening” to) each and every track without any “babysitting”, the length of time question is really quite moot…

I just did a side by side between perfecttunes and songkong on a library of about 180GB, 33K mostly mp3 files.

The baseline library has been a growing mess for many years and has gone through a few passes of various tools, including SongKong in Fix mode, over the past couple of years. So I made two copies of the whole library and ran both apps. Sorry, I didn’t really track timing. Both took about 2 days on a mid-level Lenovo i7 with an SSD.

PerfectTunes mass delete worked well for the exact matches. The scanning mode was lengthy but the delete itself was quick. It deleted 455 exact dupes.
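For anyone who wants a no-cost sanity check of the “exact dupes” category, the sketch below finds files that are byte-for-byte identical. To be clear, this is not how PerfectTunes works, and it is a narrower test: two rips with identical audio but different tags or artwork will not match. The function names are my own.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:  # a unique size cannot have an identical twin
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)
    # Report only; deciding what to delete is left to the user.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Grouping by size first means only files that share a size are ever hashed, which matters on a library of this scale.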

As for its “suggested” matches, most were way off so after deleting a few, I decided it wasn’t worth the time and left them alone. It was rarely even close.

SongKong deleted fewer files (449). And it actually made the library slightly larger. I think the reason for this is that it created a lot of extra duplicate directories with (#’s), each with a few of the songs in them. That is a huge problem, as reconciling them would take forever. I think that’s why I’ve had so many problems with SongKong before. Maybe it’s an easy setting I could have changed, but that’s never been clear.

All in all, I feel I trust SongKong for matching/fixing but used PerfectTunes for deduping. I wouldn’t have bothered with PerfectTunes but I had a license as part of dBpoweramp, which I needed for a Nimbie batch ripper.
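To get a list of those extra directories instead of hunting for them by eye, something like the following sketch could help. It assumes the copies are named like the original folder plus a space and a number in parentheses, e.g. “Album (1)”, sitting next to a folder called “Album”; that naming pattern and the function name are my guesses, so adjust the regular expression to match what you actually see.

```python
import os
import re

# Assumed pattern: folder name, a space, then a number in parentheses, e.g. "Album (1)".
NUMBERED = re.compile(r"^(?P<base>.+) \((?P<n>\d+)\)$")


def find_numbered_siblings(root):
    """Map each base folder path to the numbered copies that sit in the same parent."""
    found = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        names = set(dirnames)
        for name in sorted(dirnames):
            m = NUMBERED.match(name)
            # Only report a numbered folder when its un-numbered sibling also exists.
            if m and m.group("base") in names:
                base = os.path.join(dirpath, m.group("base"))
                found.setdefault(base, []).append(os.path.join(dirpath, name))
    return found
```

It only reports; nothing is moved or merged. One caveat: a legitimately named folder such as “Album (1999)” next to “Album” would also be listed, so the output still needs a human look.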

That’s a whole prequel to the saga - I got so sick of trying to fix my library, I just decided to start over and re-rip all my CDs to lossless using a Nimbie, set up Roon server, built a HiFiBerry, and am now trying this Roon stuff. So far so good!

Not that dBpoweramp is great at ripping - it ■■■■ the bed for unmatched disks and puts them all in the same Unknown folder by default. A change to the naming string fixes this and creates Unknown Artist + GUID. But I had to spot-check and guess which of 1200 CDs didn’t match a release and re-rip them. Maybe about 100 in all.

Anyway, at this point I’m just trying to salvage the old MP3’s I’ve downloaded or gotten from friends and add them to roon in a separate folder.

Below is an example of the extra directories on the right that songkong created for a 3CD boxed set of star trek music! (a must have btw)