Maximalism? What is the Genre for Hiromi?

brian · March 30, 2018, 11:54pm

The field gets a fair amount of research attention. There is a lot of prior art for extracting mood/genre information and determining audio-based similarity. At this point, the most widely used mood and genre databases out there were generated by CNNs, not humans.

The whole tradition around classifying music via genres is a mostly irredeemable mess. Genres vary widely in granularity and meaning. Different people have different strongly held convictions about how it should work that are often self-contradictory. Even in official, maintained genre systems, genres are overloaded to represent other concepts. Bebop is a pretty well-understood/defined thing, but String Quartet should be a form. Then there is International…Fusion could mean many things, but is usually understood in a particular way.

The goal of a machine learning system doesn’t have to be “better human-visible labels”. That may not even be a problem worth solving. Humans may have already proven that getting everyone to agree on labels is futile, and practically speaking, most people aren’t willing to laboriously design and implement their own labeling system.

But clustering or relating items without labeling the relationships is likely a lot more tractable.
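To make that concrete, here is a minimal sketch of relating tracks without ever naming the relationship. It assumes each track has already been reduced to a numeric feature vector by some audio model; the vectors, track names, and the `nearest_tracks` helper are all made up for illustration, and cosine similarity is just one reasonable choice of distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_tracks(query, library, k=2):
    """Return the k tracks most similar to `query`, with no genre labels involved."""
    scored = [
        (cosine_similarity(library[query], vec), name)
        for name, vec in library.items()
        if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical feature vectors; a real system would get these from an audio model.
library = {
    "track_a": [0.9, 0.1, 0.3],
    "track_b": [0.8, 0.2, 0.4],
    "track_c": [0.1, 0.9, 0.7],
    "track_d": [0.2, 0.8, 0.6],
}

print(nearest_tracks("track_a", library, k=1))
```

Nothing here says what `track_a` and `track_b` have in common, only that they sit close together, which is the point.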

Anyways, the main point I want to make is a little bit of a tangent, but interesting with respect to your “humanistic” comment.

There’s an interesting shift going on in how machine learning systems are operating. We started with big collective systems like PageRank or collaborative filtering, where the whole world shares one model. When things appear personalized, it is just personal queries or simple filtering of the shared model.

This is changing. Machine learning systems are starting to adopt a split model where the models themselves are trained on a user-by-user basis. This is happening for the usual boring reason: computing power continues to get cheaper.

In this scheme there’s usually an expensive centralized training step that is shared, and then the model is duplicated for each user and refined based on data that came from that user (often these models then do their inference work on the user’s hardware). So this is still a bit of a hybrid approach. At some point, it will be practical to centralize it again.
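A toy sketch of that split, under heavy assumptions: the "shared model" is just a list of linear weights standing in for the expensive centralized training step, and `refine` is a hypothetical helper that copies those weights and nudges the copy with plain gradient steps on one user's ratings. Real systems are far more involved; this only shows the shape of the idea.

```python
def predict(weights, features):
    """Linear score for one item."""
    return sum(w * x for w, x in zip(weights, features))

def refine(shared_weights, user_examples, lr=0.1, epochs=50):
    """Copy the shared weights and adapt the copy to one user's data."""
    weights = list(shared_weights)  # per-user copy; the shared model is untouched
    for _ in range(epochs):
        for features, rating in user_examples:
            error = predict(weights, features) - rating
            # Gradient step on squared error for this single example.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

# Stand-in for the output of the shared, centralized training step.
shared = [0.5, 0.5]

# Hypothetical feedback from one user: likes the first kind of item, not the second.
user_examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]

personal = refine(shared, user_examples)
print(shared, personal)
```

The shared weights stay as they were, while the personal copy drifts toward that one user's taste.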

The benefit of this sort of approach is that it brings the model closer to the user who is experiencing the results. We don’t have to come up with a model that everyone agrees on; the model can learn the objective parts generically and the subjective parts in the context of one user’s experience.