We’ve fallen into the trap of trying to discern simple, consistent rules for parsing composer catalogs before; our current system is significantly more manual and labor-intensive. Machines are still involved, but they need to know about specific composers, their catalogs, and the (many) exceptions. There is absolutely no way this sort of thing could be done fully automatically. The tricky cases are the rule, not the exception.
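To give a rough idea of what "machines that know about specific composers" can look like, here is a minimal Python sketch. It is not our actual system: the names (`CatalogEntry`, `RULES`, `EXCEPTIONS`, `parse_catalog`), the three patterns, and the single exception entry are all made up for illustration, and a real table would be far larger and messier.

```python
import re
from typing import NamedTuple, Optional


class CatalogEntry(NamedTuple):
    composer: str
    catalog: str            # e.g. "BWV", "K", "Hob"
    group: Optional[str]    # e.g. a Hoboken group such as "XVI"; None if unused
    number: str             # kept as a string: numbers can carry letter suffixes


# Hypothetical per-composer rules. Each composer gets its own pattern,
# because no single pattern works across catalogs.
RULES = {
    "Bach": ("BWV", re.compile(r"^BWV\s*(?P<number>\d+[a-z]?)$")),
    "Mozart": ("K", re.compile(r"^KV?\.?\s*(?P<number>\d+[a-z]?)$")),
    "Haydn": ("Hob", re.compile(
        r"^Hob\.?\s*(?P<group>[IVX]+[a-z]?)\s*[:/]\s*(?P<number>\w+)$")),
}

# Hypothetical hand-curated overrides for strings the patterns cannot handle.
# In practice this is the part that experts have to maintain.
EXCEPTIONS = {
    ("Mozart", "K. Anh. 9"): CatalogEntry("Mozart", "K", "Anh", "9"),
}


def parse_catalog(composer: str, raw: str) -> Optional[CatalogEntry]:
    """Return a structured entry, or None if the string must stay opaque."""
    raw = raw.strip()
    # Curated exceptions win over the general pattern.
    if (composer, raw) in EXCEPTIONS:
        return EXCEPTIONS[(composer, raw)]
    rule = RULES.get(composer)
    if rule is None:
        return None  # unknown composer: no rule, so do not guess
    catalog, pattern = rule
    match = pattern.match(raw)
    if match is None:
        return None  # known composer, unrecognized format: needs a human
    fields = match.groupdict()
    return CatalogEntry(composer, catalog, fields.get("group"), fields["number"])
```

The point of the design is that anything the table does not explicitly understand comes back as `None` and goes to a person, rather than being guessed at by a generic rule.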
The music history “industry” is awful at making and sticking to conventions across composers and eras, even in eras when the idea of international standards and reasonably modern practices for handling data were well established (for instance, the Hoboken catalog, one of the catalogs most rife with invented inconsistency, was conceived in the mid-20th century).
I’d love it if the music history world would “grow up” and ratify a proper international standard, and bear the editorial burden of renumbering everything under a system that could be understood by machines…but I’m not holding out hope.
In this case, there is literally no good source on earth for consistent data. Just a series of bad editors: first the people who assigned the catalog numbers haphazardly in the first place, then the people who printed potentially incomplete or erroneous catalog numbers onto CD packaging, then the people at Rovi/TiVo who entered the data.
I place most of the blame on the music historians. If they had taken the care to create and document (originally or retroactively!) a crisp, mechanically verifiable, self-consistent, and comprehensible system that worked the same for every composer and reflected these hierarchies clearly, then it would be much harder to get wrong, and downstream users of the data could be reasonably expected to handle it properly.
By making things so complex and inconsistent, they have created a situation where only experts know how to deal with the catalogs and non-experts are forced to treat the numbers as opaque. They have also made the effort required to handle everything properly too large to repeat in each system that might want to do so.
Because catalog numbers are only comprehensible to experts, the only real solution is one where experts lay hands on the data and organize it properly. Whether that’s centralized (a standardization process plus renumbering) or ad hoc (a crowd-sourcing system like the one we may eventually create), experts must do the work.
Sorry, been holding that rant in for a long time. It is incredibly frustrating how “close but far” the world is from having a firm handle on this stuff, and also frustrating how simple this problem looks to people on the outside.