Embarrassing Search Capability

brian · December 29, 2018, 1:03am

I agree. Our current search is terrible. It’s a Lucene index over the textual information in the music metadata done in the simplest way possible.

At the time we built it (prior to our launch in 2015), we had no usage data on which to base a smarter search feature. And of course, like @AndersVinberg said, people do not give that kind of data away, not even for a price, and definitely not to partners in the context of a relationship like the one we have with TIDAL. These sorts of data assets are considered extremely valuable and in need of protection.

So we had to build our own. This meant acquiring enough users, and waiting enough time for them to create data. Sometime in 2017, I started to judge our pile of data as “likely to be big enough to do interesting things” and we kicked off projects to begin warehousing it properly + building data/learning pipelines to make use of it.

Along the way, we had to totally re-invent our approach to running/hosting cloud services, carve out real, modern places to run regular batch processing, come up with new processes for development, deployment, release management, etc for all of the data processing bits + cloud services that we will have to build.

We also had to build basic expertise in the team about machine learning–both casual expertise in the hands of our product design team, and also deeper domain/implementation expertise to get the work done. I spent months, personally, studying this field + getting practical experience under my belt just to be able to properly manage this sort of transition. We also employ outside experts for advice (in a consulting capacity) to make sure that we are not missing anything–since this is both a new domain of expertise for us, and also a very fast-moving field.

Recently, we moved 95% of our cloud infrastructure to a new cloud provider and transitioned most of our services to a container-based architecture. That has opened up significant efficiencies in how we develop and roll out cloud services, efficiencies that we will need to deliver on our goals in this area. We “flipped the switch” on that just a few weeks ago, after months of running old/new systems in parallel and iterating on the processes, operations management, and other aspects of the “new world”. Like most operational transitions, if it happens silently, it was a success, and few people noticed this one. Getting that done moved a large amount of the backend work done over the past year into the “shippable” column.

All of this is groundwork required to start deploying product improvements based on our new data-related capabilities. Whether that be a better search feature that uses usage data to form a model of content relevance and uses that to improve results, recommendation systems, smarter navigation paths in-app, improvements to “radio”…there is a lot of potential to be extracted from all of this.

Contrary to the snide remarks about “junior coders”, doing this stuff well is actually a lot of work. The good news is…we’ve been putting effort into this for a long time, and we’ll be able to start releasing the fruits of that labor soon.