You are jumping around there a bit.
Are you looking for answers to old roon radio or the new version?
The old one, the 1.5 version. I don’t have streaming.
Shortly after writing that about DS Audio, it occurred to me that I could probably do the same thing by putting Roon on shuffle and get more or less the same effect, but with additional Roon goodness. Doing that now, and it works fine (though I’m still uncertain about what it means to press the thumbs-down button to skip a suggested track – will doing that have consequences in the future?).
Are you shuffling your library or using radio, they are different things.
Radio is designed to play tracks related to what you initiate it with. This could be related or associated artists, genres, or bands that are similar in style. All streaming services offer this kind of thing. The new Roon Radio allows this to expand beyond your local collection to tracks that are available in Tidal or Qobuz but not added to your library, so it’s a great way to discover new music and artists you might not know or just haven’t got to yet. The new radio also gets data from other users’ libraries, as to what they like and dislike, to try to improve what it chooses. If you don’t have streaming services then it’s purely limited to what you have in your local library. I highly recommend you trial Tidal or Qobuz to try it out. It’s initiated by a single track, or when the last track in your queue finishes.
I think this question is based on classical thinking about database queries, which is not really how we do things today. A generation (or two) of software engineers were taught at the altar of correctness and integrity: you don’t want the same $100,000 to be withdrawn twice. But today, if I ask Google, which is the most popular R&B artist in Seattle? and it tells me Symone Kamaria, how do I know that is correct? What does that question even mean? It’s no longer about correctness, it’s about value.
I have no inside knowledge of how Roon’s search algorithm works. But let me tell you a somewhat similar story: a few years ago I talked with people who ran a cloud-scale search engine about the database architecture. They told me, when you enter a search, they start a timer. Then they send the query, plus everything they know about you and your computer and location and time history and reputation and the color of your dog, to various information providers: not just the search engine but the advertising system, geography-based advertising, advertiser accounts, fraud detection… When the timer hits 10 milliseconds, the results from any of the providers that have replied in time go into an algorithm; the search results and the ads are constructed and sent out. If the advertising account status doesn’t respond in time, maybe an ad is displayed even though the advertiser’s budget is already exhausted. Timeliness trumps correctness.
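The pattern they described is a deadline-driven scatter-gather: fan the query out, take whatever has come back when the timer fires, ignore the rest. Here is a minimal Python sketch of that idea (the provider names and latencies are invented for illustration; this is not anyone’s real architecture):

```python
import asyncio
import random

# Hypothetical providers with varying response times -- illustration only.
async def query_provider(name: str) -> tuple[str, str]:
    # Simulate a provider whose latency varies per request.
    await asyncio.sleep(random.uniform(0.001, 0.02))
    return name, f"result from {name}"

async def scatter_gather(deadline_s: float = 0.010) -> dict[str, str]:
    providers = ["search", "ads", "geo-ads", "fraud"]
    tasks = [asyncio.create_task(query_provider(p)) for p in providers]
    # Wait only until the deadline; use whatever has answered by then.
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for t in pending:  # late providers are simply ignored
        t.cancel()
    return dict(t.result() for t in done)

results = asyncio.run(scatter_gather())
# 'results' holds only the providers that beat the 10 ms timer;
# on any given run, slow providers just aren't in the answer.
```

The page gets built from `results` as-is, which is exactly why an ad can run against an exhausted budget: the budget check lost the race.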
How did R&B and Seattle and my search history and my age go into that result? AND or OR?
This is why I was sitting in a Frankfurt hotel, reading the British Guardian, and an ad for Canon zoom lenses from B&H Photo in New York appeared on the front page. Was it relevant at the time? Yes. Was it accurate? Huh?
In all builds
If you elect to “shuffle play” something, the thumbs purely mean “add to queue” or “reject” and have no other effect. Shuffle play is simple randomized play of the content that you chose, no added flavors or ingredients.
For Library Radio (or 1.5), not the new cloud-based Roon Radio
The thumbs up on the queue screen purely means “add to queue” and has no further effect.
The thumbs down on the queue screen means “reject” and slightly adjusts the current radio session only. There is no long-lasting effect.
For the new, cloud-based Roon Radio released in 1.6–
Give honest feedback to the algorithm and trust us to do the best thing with it. It is designed to learn from everything you do while it is active. It is not going to over-react and ban John Coltrane from your life just because you thumbed down something once. There is no way for you to ruin your own experience with those thumbs.
The thumbs are primarily used to improve large-scale models that are shared amongst everyone. They help the algorithm learn how to make better picks for everyone. They do not immediately or drastically update our model of you as a listener in any way that is worthy of concern.
The number of aggregate feedback items we’ve collected so far is already very large™ compared to any one user’s feedback, and especially compared to any single action that you take, and the feature has been live for less than five days. You do not have enough statistical power over the algorithm as one person to skew it in any meaningful way.
I love Anders’ notion of value above. It’s a great way to think about it…so let me frame radio in those terms.
There are three goals for Roon Radio: to have lots of people enjoy it, to minimize negative feedback, and to encourage people to grow their libraries. Some of the first things I looked at after the algorithm was live for a few days were the amount of usage it was getting (both in absolute terms + per-person), the proportions of positive/negative/no feedback, and the number of library adds that result from radio play.
The algorithm is good if people use it a lot without complaining very much and their libraries are growing as a result. This is the Value we are trying to maximize.
A more concrete example
Let me give a simple example of a way in which the algorithm might learn. There are more complex, mathematically intense models inside of Roon Radio too, but they require mathematical foundations that I haven’t thought through how to explain here…perhaps I will explain some of them one day. For now, let’s start with a simple kind of learning that we can explain in mostly familiar terms.
For this example, let’s start by considering the subset of tracks that Roon Radio has picked for anyone at any time in the past which have also received the “bad for radio” feedback.
(Just to make my point, I actually clicked “Bad For Radio” after taking this screenshot, even though I love this song…because it. doesn’t. matter. I am just one person, not big enough to screw up the data.).
Let’s further focus on the “This Track is bad for radio” feedback. Given enough time, every track is going to get some. Maybe someone particularly hates Stairway To Heaven or has a particularly narrow view that radio should only play 3 minute songs, or whatever. So they say “Bad for Radio” to us. That’s fine.
We might start by disregarding all tracks that have been played very few times or only received a small number of feedback events because we expect data for those to be too noisy. So maybe we only look at tracks that have at least 10 feedback events and have been picked more than 100 times (more likely we would not use absolute numbers to make this more robust to the data set changing size).
First we must establish a baseline by looking at the mean or median number of “Bad For Radio” feedback events per track in that data set. This tells us how much “bad for radio” we should expect for a typical track.
I would expect a “u-shaped” distribution–a concentration of tracks that have little/no “bad for radio” feedback, another concentration that have a high proportion of “bad for radio” feedback, and a relatively weak amount of data between those extremes–but I would definitely make the plot to test this hypothesis if we were actually doing this.
Once we understand the distribution, we might be able to come up with a rule. For example, “any track that is two standard deviations from the mean in its proportion of “bad for radio” feedback is probably “bad for radio” for everyone”. Essentially, we have come up with a very simple model for predicting future “bad for radio” feedback. We could use that model to make it much less likely that we pick those tracks.
We could also eliminate them entirely, but then there would be no way for a track to slowly redeem itself over time. I think if we were to build this system in practice, the elimination would be probabilistic, like “reduce the likelihood of picking Despacito based on how far from the mean it is”. This would allow the algorithm to smoothly reduce the prevalence of a track as feedback comes in, but without making a rash change all at once. One feedback event moves the needle an insignificant amount, so there isn’t ever a place where the straw breaks the camel’s back and the algorithm has a “step change” in its behavior as a result of one increment of data showing up.
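The steps above (filter out noisy tracks, establish a baseline, smoothly downweight outliers) can be sketched in a few lines of Python. Every number, name, and threshold here is made up for illustration; it is a toy version of the brainstormed idea, not Roon’s code:

```python
import statistics

# Toy per-track counts: (times_picked, "bad for radio" votes).
counts = {
    "Track A": (500, 12),
    "Track B": (800, 160),   # heavily flagged
    "Track C": (300, 10),
    "Track D": (1000, 20),
    "Track E": (50, 9),      # too little data: excluded as noisy
}

MIN_PICKS, MIN_VOTES = 100, 10

# 1. Drop tracks with too little data to trust.
eligible = {t: (p, v) for t, (p, v) in counts.items()
            if p > MIN_PICKS and v >= MIN_VOTES}

# 2. Baseline: the typical proportion of "bad for radio" votes.
rates = {t: v / p for t, (p, v) in eligible.items()}
mean = statistics.mean(rates.values())
sd = statistics.stdev(rates.values())

# 3. Smoothly downweight tracks the further they sit above the
#    baseline, instead of banning them outright (z <= 0: no penalty).
def pick_weight(track: str) -> float:
    z = (rates[track] - mean) / sd
    return 1.0 if z <= 0 else 1.0 / (1.0 + z)

weights = {t: pick_weight(t) for t in eligible}
```

With this toy data, "Track B" ends up with the smallest pick weight, "Track E" never gets scored at all, and no single extra vote can produce a step change in any track’s weight.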
Here’s a refinement to that idea–what if a bunch of people vote “Bad For Radio” on Despacito because they are temporarily tired of it? So many people hate it that it is being picked one millionth as often as before (i.e. never, for practical purposes). Is that good?
I say no–we shouldn’t ban it forever. Its un-welcome-ness is a temporary situation, so maybe we should do something to weight the feedback based on when it arrived. More recent feedback has a stronger weight attached to it, but feedback from a year ago is only half as informative–something like that. Techniques like that give the system extra robustness and a self-healing quality. If people continue to hate Despacito, it will continue to meet the model criteria for a “bad for radio” track. As people stop hating it, it will slowly return to normal pick probability.
To be absolutely clear–I am not describing the actual guts of Roon Radio. This is more like a brainstorming session about what to do with “bad for radio” feedback that is meant to help you understand the long torturous path between a single feedback event and tangible impact on the algorithm’s behavior.
Over time, as we collect more data, we can add more models to the system to make it smarter, or enhance existing models to make them perform better. Most of those models are data-driven, and will slowly change their exact behaviors over time based on what the data teaches them.
It’s actually too early in our feedback gathering to have a model like the above. 5 days into Roon Radio, we don’t have enough data collected to make that work. As of today, “bad for radio” does nothing at all other than take note. In a month or three, we will have enough data to actually implement an idea like the above.
I hope that helped. This is a complex topic, and I could write at least as many words about how to develop a good mindset for understanding all of the machine-learning systems that we interact with daily and having good intuition of how they work and what they are/aren’t good at doing.
Think about self-driving cars for a second–humans often cause accidents out of inattentiveness or failing to see something or by being drunk. Self-driving cars are much better than us at paying attention and at keeping their eyes on the road and they can’t drink. But they are worse than a human with tracking a lane when the road markings are ambiguous or in poor condition or when there is a lot of snow on the road. They don’t make some of our most common mistakes, but they also make some “easy” mistakes that we don’t make.
This kind of gap between what the humans can optimize for and what the machines can optimize for is key to understanding all of these systems–the goal isn’t to make the same radio stream that a DJ does. That’s a distinct idea, with distinct value, best judged differently. The value is in a radio feature that a lot of people enjoy engaging with a lot, annoys them as little as possible, and that helps them find new music delivered for a tiny fraction of the cost of a 24/7 personal DJ. That’s it.
Wow! Thank you, thank you, thank you Brian. I will link your post to the similar thread I started.
To be sure about it, does your explanation apply to both current track and upcoming track thumbs? And, if I’m reading accurately, neither shapes my individual algorithm in my profile directly?
If so, I (and evidently some others) can relax about this and thumb away.
Brian, wow, thanks so much for this incredible post. So enlightening. The mere fact that you took the time to write such a comprehensive explanation, and even more so on a Saturday afternoon, is a testament to how much you care about your user community. Respect and appreciation!
Let’s look at all of the feedback items–
“This Doesn’t Fit” isn’t a real item–it exposes three more–
There is one kind of feedback that I think makes a lot of sense with respect to your personal profile–“I don’t like this” – if you pick that option, it seems logical that we would keep track of that with your profile and let it influence future behavior for you specifically, due to the way it was worded.
“Not Related”, “Bad for Radio”, and “Holiday Music” are impartial judgements. It would make no sense for us to interpret those as evidence of your personal taste because they are worded to express an opinion of the algorithm’s performance, not something about you.
“I don’t want to hear this right now” is worded deliberately vaguely to give you an escape hatch that lets you skip something without really influencing future behavior (personally or globally). It is basically equivalent to giving no/neutral feedback.
The upcoming track thumbs-down shows the same feedback options as above, and behaves the same as the current track thumbs-down. If you know enough to thumbs-down something, your input is just as valid as if you’d heard it first.
The now-playing thumbs-up and upcoming track thumbs-up are currently recorded, but we aren’t doing anything with them beyond that. I am honestly not sure how we will take them into account, aside from using them as a way to measure algorithm performance. If we do so, we will be sure to respect the ambiguity in the user interface, which can’t distinguish between “I like this track” and “I think this is a good pick”.
It’s not in our best interest to interpret your feedback in a way that isn’t true to the words on the screen at the time when you made the click…we even keep track of the software versions that go along with the feedback so that we can understand it in full context even if the choices change over time.
I think there is nothing to be anxious about here, but let me know if you feel otherwise…
Thanks…helpful insight–especially the John Coltrane example.
I like the idea that songs from the cloud, from artists and albums I don’t know, are chosen based on some intelligence rather than a genre-focus-and-shuffle approach. That is, they aren’t the album tracks that don’t match and get played the least across the monster data set, or songs that aren’t good for radio, like 30-second crowd-clapping and spoken-word intros before the next track, 30-second instrumental intros to popular songs left unattached, or highly experimental songs that the data shows are disliked for radio. If I am going to be introduced to a band like The Beatles for the first time, I don’t want the track to be “Revolution 9” when my seed is some other 3-minute Beatlesque pop song. When I try a new artist on a streamer I tend to start with the most-played tracks, so it seems some of that type of intelligence, combined with other signals, is used to dish up a playlist.
NOTE: When I hear a new artist or album in Radio, the first thing I do is exit the radio now-playing screen to see when the album was originally released, the AMG star rating, and whether the track was an AMG Pick. If it’s in my local library I might try to see the back cover and any other art. I add new stuff I buy from reviews to my library and sometimes hear it for the first time in radio.
Radio exploration is of special interest for the car, so that alone is worth at least a $10 streamer (vs FM or SAT radio), especially if Roon goes mobile one day.
Thanks again. I’m getting a good handle on it. The anxiety was because of the unknown. I’ve been asking questions to reduce the unknown, and your answers are as good as diazepam (almost).
The anxiety came also from past experience (not PTSD level anxiety though!). Not Roon experience, but Pandora. After having thumbed up some tracks, I found them to repeat obnoxiously. Fortunately, they have a control panel where you can edit your thumb action and I got that straightened out. Don’t use Pandora much anymore now though!
I intend to, once it’s available in California.
Yes, I’ve built these things myself, I understand what you mean. But what I want, for value, is “will it play a pleasing and surprisingly apropos series of tracks given the seed?” And I think it would have to be magical to infer what I want from a single seed. I think the thing to do with local radio is to treat the tracks as suggestions and then thumb-up and thumb-down until you have enough in your queue.
Yes, thank you, Brian. I appreciate the thought you put into your answer. Once Qobuz is available here, I’ll try the new radio. I’m sure I’ll appreciate it.
This makes sense as an explanation for where the anxiety is coming from. Then again, Pandora operates in a very different environment–they start by knowing nothing about you and have to learn everything they can learn from radio play. And anyone who uses it on lots of devices (car, tv, roku, computer, …) is likely not to even have a single login, so they are really prone to amplifying “noise” from a small number of interactions.
On the other hand…we start with your library, play history, and favorites, and have a far better picture from the get-go. We will probably add some kind of initial taste profiling step to the onboarding flow in the app in the future to help “bridge the gap” for new users. This is less important for radio, because even if we don’t know about you at all, we can still organize picks around the seed. When we build more recommendations-oriented stuff, even something relatively straightforward like a “New Releases For You” list, it will be crucial to know something about you to get the out-of-box experience right for new users.
Anyways–we have much higher-quality sources of information about you than the radio thumbs. Pandora has no other choice but to amplify a few feedback interactions into a noisy picture. I view radio feedback as being primarily about improving the radio experience.
So, I wonder how different is selecting some focus, then doing shuffle on it, as opposed to selecting some focus, then doing radio on it? Presumably, even with a single song as a seed, it is essentially doing a shuffle among some set of songs it thinks are similar, which probably means “matches the seed in some ways”, like genre.
@Brian, you may have noticed that one of the distinguishing features of audio enthusiasts is their delight in the details of improving the SQ. Here, I’m wondering how to improve the sequence.
In 1.6, we don’t let you start radio on a focus because the intention behind that interaction is almost totally indiscernible. 1.5 did allow radio kickoffs on more combinations, but the results were super indistinct. You can shuffle anything of course–and that is really the best thing to do for random play on a focus.
Valid radio seeds today include artists, albums, tracks, and genres. These give us a clear “point in space” to organize picks around.
Step 1: use our newest/best stuff. I know you’re waiting for Qobuz in the US…but there is more improvement in taking that step than in anything else you can do in the meantime.
thank you for the explanation, really appreciated.
I would like to check what you mean with
Does this mean that the learning algorithm learns from what I add to the playlist while the radio is going?
I asked this in a different thread
Hope I manage to explain the idea.
The stuff that you put in the queue does eventually make its way back into the data pipeline that feeds radio, but not quite in the way you are thinking. The fact that it is interspersed with radio tracks does have influence on what the algorithm learns. It is not analogous to a thumbs-up, but it is also more than zero-influence and more than the influence of playing the track on a different day.
The idea of recording a track added to the queue “at the same level as a thumbs up” would be bad data handling practice. To the extent that things are recorded, we try not to discard information along the way–that has a way of poisoning the well. A thumbs up is a thumbs up. A track played in the middle of a radio stream deliberately is a different thing.
I feel a KB item coming; will there be one?
This explanation is fascinating
I chose the wrong time to do my 4 week road trip away from my system
Thanks for your great explanations regarding Roon Radio.
With reference to the above quote, I have a work core and a home core which I authorise/deauthorise as needed. Will Roon Radio learn my tastes from both cores, based on my login details, so that they affect each other, or as separate entities?
They’ll affect each other. All of the taste stuff is based on your Roon profile–so if you have multiple people in the household using Roon, they can all be separate.