Voice control trumps Roon

I purchased a lifetime subscription to Roon under a year ago and have a sizeable local library, so I have significant motivation to keep using it (if you believe in worrying about sunk costs). And yet I have stopped using it entirely in the last few weeks for the lack of one feature: voice control.

Without Roon I have CD-quality music streaming to my two proper hi-fis and assorted smart speakers around the house. Everyone’s phones and all the speakers can control (play, pick, change volume) any of the hi-fis/speakers via voice control or from the phones’ lock screens / dedicated apps.

I tried for almost a year to convince other members of my household to use Roon and it just never stuck. Even setting up the Home Assistant plugin, which allowed play/pause control via voice commands, didn’t help. The UX just can’t compete with saying “Play song X on speaker Y” or “Play genre X on speaker Y”.

I know Roon has plenty of special features, but for one reason or another I don’t require them; its main selling point for me was being able to easily access my local library.

Basically, Roon needs this feature to stay relevant, even if it only started with a basic Alexa/Siri integration.


Funny. I tried to get my family to start using the voice control I meticulously set up, but they all thought it was silly. After a year of talking to my home I find I agree with them.


I can scarcely think of a less desirable feature for Roon than voice control. The whole point of Roon for me is that I can quickly and easily see all sorts of music that I might want to listen to and get it playing immediately.


I agree that voice isn’t sufficient for all use cases. A decent desktop/mobile UI still has a well-deserved place, particularly for reading up on details, or obvious things like modifying custom playlists. And Roon does this very well.

But it’s the basic day-to-day listening and device control that is so much easier by voice: starting a playlist, skipping a track, changing volume.


I always thought I would use voice to play all my music. It’s OK for saying “play Money by Pink Floyd”, but 80% (made-up number) of the fun of Roon for me is picking what I listen to next and diving into the rabbit hole of who else the band members played with, etc.
I might be an outlier, but I do not see myself going back to voice much. Very happy it works for you, though.


Voice control is a waste of time and space. My friends try to use it with their Echo Show, but it fails so often and just won’t understand the wife’s voice. Going to dinner and hearing constant “Alexa, play …” at loud volume is tedious and not very relaxing. We all end up using connect or Bluetooth in the end. Voice is fine for a few simple commands like turning things on or off. I just don’t use my Alexa for anything other than turning lights on or off.


Agree, a dedicated listening session is much better via a proper UI.

Perhaps I should be clearer. I am in no way suggesting that voice alone is sufficient, or that it’s the best possible interface, but rather that when comparing Roon against alternatives for my use case of playing CD-quality music around my house, having the option of voice tips the balance away from Roon.

I love to sit with the Roon app for a dedicated listening session, but no one else can be bothered to use it, to pick the right playback zone or use any of its other features.

As it is, I can walk into a room and ask my assistant to play a playlist, genre or a custom station, and music begins playing. I can then pull my phone out and dig deeper in the app if required.

And anyone can get music with only a very basic introduction to the voice commands, rather than an in-depth tutorial in how to use Roon / install the app on their own device. Most people are not power listeners.


I have some experience with Mozilla DeepSpeech (an open-source speech recognition system) and might tinker with it and see what I can do to control Roon. Previously I got some very rudimentary voice controls written for Kodi Media Center, like play/pause/up/down/select, but I didn’t go as far as full library search features like “Play [ARTIST]” or “Play [GENRE]”. Doing more complex stuff like that requires a lot more sophistication, and probably some customization of the microphone to filter out what’s being played on the speakers, plus a custom speech model to interpret all the artists and songs in your Roon library.
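
To make the gap between “play/pause” and “Play [ARTIST]” a bit more concrete, here is a rough sketch of the step that comes after speech recognition: turning a transcript into a command. It assumes the recognizer has already produced plain text, and it uses fuzzy text matching against a list of names as a stand-in for a custom speech model, which is a much simpler technique. The artist and genre lists, the function names and the shape of the returned dictionary are all made up for illustration; nothing here talks to Roon, Kodi or DeepSpeech.

```python
import difflib
import re

# Hypothetical library contents; a real setup would pull these from the player.
ARTISTS = ["Pink Floyd", "Arctic Monkeys", "Rihanna", "Eminem"]
GENRES = ["jazz", "rock", "classical"]

# The easy part: single-word transport commands.
TRANSPORT = {"play", "pause", "next", "previous", "stop"}

# "play <something>" with an optional "on <zone>" suffix.
PLAY_RE = re.compile(r"^play (?P<what>.+?)(?: on (?P<zone>.+))?$")


def closest(name, candidates, cutoff=0.6):
    """Return the candidate whose spelling is closest to name, or None."""
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


def parse_command(transcript):
    """Map a recognized transcript to a small command dictionary."""
    # Normalize: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", "", transcript.lower())
    text = re.sub(r"\s+", " ", text).strip()

    if text in TRANSPORT:
        return {"intent": "transport", "action": text}

    match = PLAY_RE.match(text)
    if not match:
        return {"intent": "unknown", "text": text}

    what, zone = match.group("what"), match.group("zone")

    # Genres are short words, so require a near-exact match.
    genre = closest(what, GENRES, cutoff=0.8)
    if genre:
        return {"intent": "play_genre", "genre": genre, "zone": zone}

    # Artist names get a looser match to absorb recognition errors.
    artist = closest(what, ARTISTS)
    if artist:
        return {"intent": "play_artist", "artist": artist, "zone": zone}

    return {"intent": "unknown", "text": text}
```

For example, `parse_command("Play artic monkees on kitchen")` resolves to the artist “Arctic Monkeys” in the zone “kitchen” despite the misspelling. With a library of thousands of artists, albums and tracks the matching gets far more ambiguous, which is where the extra sophistication comes in.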

In general I find that if you are precise in what you ask for and it can hear you properly, then it is fairly good within the limits of the skills that various companies produce for their equipment. The biggest problem I find is that the third-party skills themselves often leave a lot to be desired and are poorly implemented and/or very limited.

I personally use it a lot in my home for controlling numerous devices, including basic control of Roon via Harmony. I also ask Alexa a lot of general stuff, which is where I find it most useful. For example, if I can’t remember a track but I know the artist and the decade or rough range of release years, I can ask Alexa a simple qualified question including such constraints and it usually gets me the answer I want. I just tried “tell me a song by Rihanna and Eminem released 10 years ago” and it gave me the correct result: “Love the Way You Lie”. Another: I couldn’t remember an old track by P!nk, so I asked “tell me a song by Pink released about 20 years ago” and got the correct answer, “Get the Party Started” from 2001.

These are just a couple of examples, but I use these kinds of more complex queries a lot with Alexa before typing a more specific query into Roon, to hopefully get a meaningful search result out of it instead of nothing or lots of rubbish.

When you have this kind of query capability in the background, a well-implemented voice skill with direct play in Roon could be very useful indeed. Of course, that assumes you have a voice it can understand, in an environment in which it can hear you clearly. That tends to be a problem when you combine non-Alexa playback with Alexa trying to understand you, because it can’t dim/mute the music after the wake word.

I also know people who seem to really struggle to do anything with Alexa at all. I guess I am lucky with fairly accentless (Southern England) English being my native language, so it understands me quite well, and more so since I set up voice imprinting so it specifically recognises me.

I should think there would be a huge ongoing burden in implementing fully integrated Alexa search in Roon. All the metadata from Tidal and Qobuz would probably need to be uploaded to Amazon to be searchable. And then what do you do about additional content on users’ local storage? As an extreme example, I have a unique dubplate that was handed to me by the original artist back in my DJing days and subsequently ripped to digital, plus my own produced music in Roon. These uploads would also most likely demand updated agreements with Tidal and Qobuz to cover the additional metadata usage.

Then what level of integration? Transport and now-playing display on Show devices only (which is what I initially wanted to do)? What levels of search? Auto dim/mute on wake word (not even sure this is possible except when playing via an Alexa device)? Do you support Alexa device playback and thus have to provide an internet-accessible stream (which really is the domain of a mobile access project)? Some of these could not be separated in the Alexa SDK, but hopefully that has changed by now.

It seems that there is a huge amount of work involved, a lot of ongoing support and new infrastructure required, and probably new business agreements etc., depending on the level of integration carried out. This has to be balanced against the fact that some users already have at least some basic transport voice control available (via Harmony and Deep Harmony in my case). While I do have this, it is somewhat let down by the Alexa Harmony skills, which I just do not find reliable, along with the problem of Alexa hearing me while music is playing that it can’t dim/mute, which in the end is the real killer. If it can’t hear me, then everything else is pointless.

I really want to see this, but I am still not sure that a good user experience on Roon’s terms is fully possible, so I can fully understand the lack of it. Of course, I haven’t looked at the Alexa SDK for a few years now, so maybe it has improved some more.

My biggest issue with Alexa, however, is that some skills I want are US-only, Tidal for example.

Voice control in Roon would be of no value to me. Sorry the OP has left Roon.

The complexity of integrating Alexa and making useful intents is probably why nobody has figured out a good way to do it. I think anyone who tries it just ends up going down a rabbit hole and never ends up with anything that is in any way useful.

I’d want the speech recognition system running on the same computer that Roon Core is running on, no cloud stuff, because it’s too slow and too much data would have to pass back and forth between Roon and the cloud.

There are ways to do it with DeepSpeech. To give you an idea of where it’s at, this chess game is running entirely locally, no cloud.

And this was 2 years ago; DeepSpeech has improved a lot since then. It’s been spun out into a startup and rebranded as https://coqui.ai/

I have started controlling devices by voice through my Google Home, and it’s very useful.

I don’t need to do anything to turn the TV on, control the volume, mute, or switch on my coffee machine, boiler, shutters and of course various lights.

It would also be very useful for me to be able to play something in Roon, control volume and skip songs, especially in radio mode, where I just want to try out new songs, keep the ones I like and skip the others, quickly tag the playing song into a playlist, etc.

I think that in 2022 this is becoming a differentiator that should be taken seriously.

You can do a few of these things with Chromecast speakers: adjust volume, next track, pause and resume. I’m running a CCA into my house audio, and any of the Google Home devices (or my phone) can control it by name, e.g. “Hey Google, pause CCA House”.

Chromecast speakers may work to some extent but, probably like most people, I use Roon (with HQPlayer) with a DAC connected to an analogue setup. I’m mostly interested in being able to skip and tag songs playing in radio mode, or search for songs/playlists.

Voice control and any association with one of the large data-grabbing or privacy-invasive technologies out there, particularly Amazon, would be repellent to me. If it were included, I would need to be sure I could disable it and that there is no way it could surreptitiously help itself to information about me, my devices etc. through Roon.

I will say, and I do mean this in all sincerity, that Alexa is a very very good kitchen timer.

I find that unless my hands are full or dirty, I forget to use voice control. If someone would invent a multi-timer voice-controlled timer with no internet connectivity that could understand all the voices in my household, I’d be very happy. I hate the privacy concerns just for a kitchen timer I can shout at, but I continue to make that trade-off.


Sure. But my top DAC still has S/PDIF inputs, and the CCA has S/PDIF output, so you can just use it as a voice-controllable streamer.

Last summer I could hear my neighbour in his yard shouting “Hey Google. Play ‘Arctic Monkeys’”.

The speaker plays something. Then shouting again: “No! ‘Arctic Monkeys’!!” Over and over again.

Cycle, rinse and repeat.


Yes, my experience exactly, and that sounds like me on many an occasion.

I ended up shouting at mine in the hope that it would eventually understand and respond with the actual music I wanted. I’ve pretty much given up on it now, as it’s pretty much useless for picking more than an artist or maybe an album (if lucky), and I realize that’s not how I listen to music.

It’s now relegated to controlling lights and a few other things (great for timers and alarms).