Voice Control Extension

My post about modernizing the roon SDK strayed off topic, so I'm creating a separate topic for anyone who would be interested in an extension that lets them control roon endpoints using a voice assistant like Alexa.


I’m new to roon and already wishing I could control playback using my voice. I have a lot of expertise in building conversational experiences, so I’m thinking I may take a crack at it. I know the roon developers attempted this and abandoned the project, which I can understand. Developing conversational experiences is hard (or at least harder than it should be), and building a “good” experience for a domain as broad as music is doubly difficult. You often need the help of people who are experts in things like Language Understanding and Conversation Modeling. Fortunately I’m friends with some of the brightest minds in the world in these fields, so I have plenty of smart people I can ask should I get stuck. :slight_smile:

I have a day job that keeps me busy, so this is a weekend project at best. My current project ships in 2 weeks, so I have some free time coming up that I can spend on this. If anyone is interested in collaborating or just kicking ideas around, please feel free to chime in.

I already have some Roon control with Alexa. Not perfect - Alexa Stop usually advances to the next track. I have Sonos in several rooms and it behaves better than Roon with Alexa. That said, I would welcome proper and full control of Roon using Alexa. Not interested in any other voice control application than Alexa since it is already well established in my home.

Did you build the Alexa Skill yourself? If so, I’m assuming you did it as a Custom Voice Skill. It sounds like you need to add more training utterances for your Stop intent, unless you’re using the pre-built intents. Unfortunately this is the nature of the beast… You typically need at least 100 utterances per intent to get things reliable, and then you have to be careful about how you balance your training utterances. You don’t want 200 for one intent and 50 for another. There are about another dozen things you have to watch out for.
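
To make the balancing point concrete, here is a minimal sketch of a check you could run over your training data before uploading it. Everything in it is hypothetical: the `TrainingSet` shape, the `checkBalance` name, and the 2x ratio threshold are my own assumptions for illustration, not anything from the Alexa tooling. Only the 100-utterance floor comes from the rule of thumb above.

```typescript
// Hypothetical shape: intent name -> list of training utterances.
type TrainingSet = Record<string, string[]>;

interface BalanceReport {
  tooFew: string[];    // intents below the minimum utterance count
  imbalanced: boolean; // true when the largest intent dwarfs the smallest
}

// minPerIntent follows the "at least 100" rule of thumb;
// maxRatio = 2 is an assumed threshold, tune it for your own skill.
function checkBalance(
  training: TrainingSet,
  minPerIntent = 100,
  maxRatio = 2,
): BalanceReport {
  const counts = Object.entries(training).map(([intent, utterances]) => ({
    intent,
    count: utterances.length,
  }));
  const tooFew = counts.filter((c) => c.count < minPerIntent).map((c) => c.intent);

  const sizes = counts.map((c) => c.count);
  const largest = Math.max(...sizes);
  const smallest = Math.min(...sizes);
  // An empty intent is always treated as imbalanced.
  const imbalanced = smallest === 0 || largest / smallest > maxRatio;

  return { tooFew, imbalanced };
}

// Generate placeholder utterances so the example is self-contained.
const fake = (n: number): string[] =>
  Array.from({ length: n }, (_, i) => `sample utterance ${i}`);

const report = checkBalance({ PlayIntent: fake(200), StopIntent: fake(50) });
console.log(report);
```

With 200 utterances for one intent and 50 for another, the report flags `StopIntent` as too small and the set as imbalanced.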

I am using the built-in ones. Good idea to create custom ones, thanks; I think I will try that, although most of the time when I want to use voice control I am on the Sonos app.

Hi @Steven_Ickman,
good to hear that you'd like to do something for Alexa. @Klaus_Engel and I published a Roon Extension for the Apple Watch (https://rooExtend.com). This extension enables Siri to control Roon.
Maybe you'd be interested in contacting @Klaus_Engel to talk developer to developer.
Best DrCWO

@DrCWO I saw the rooExtend stuff… Nice work. I’d love to sync up with you, @Klaus_Engel, and others to get a better understanding of how the roon related projects are going. I’m just getting started with my project, so no real timelines yet for when I’ll have anything worth showing. I have a simple little extension working that can dump out now playing information for all the zones in my house. I have to admit that I was impressed with how simple the node SDK is to use. The documentation for the SDK is incomplete and has some errors, but once I tweaked my code to match the actual JSON coming in over the wire, everything pretty much just worked.

I’m thinking about creating a new NPM library called roon-kit that packages up the RoonExtension class I’ve built with some other utility classes to make building roon extensions even easier. My day job is designing SDKs, so it’s difficult for me to start any new project without first building a framework :slight_smile:

Is there a separate discord server or gitter channel where developers chat? I don’t want to spam the community here if there’s a better place to get technical. I’d love to learn more about the architecture for rooExtend.


I’ve started working on this project if you want to follow along:

I have a tendency to build frameworks, so I’m designing this extension as a more general-purpose application called Zone Control. The primary focus will be to let you control your roon endpoints using voice, but I’m designing it to be more flexible than that. A lot more flexible…

The architecture of Zone Control will be broken into a collection of controllers, endpoints, drivers, and plugins. If you’re familiar with the Media Player Remote Interfacing Specification (MPRIS) you can think of Zone Control as just a more modern version of MPRIS that’s multi-zone aware.

The main Zone Controller app will expose a series of configurable endpoints that can be used to control your system's zones via Voice, HTTP (other apps), and even MPRIS. These endpoints generate commands which get routed through a configurable set of plugins before being sent to drivers. Drivers understand how to control your system's zones (Roon, Volumio, etc.). Plugins can modify commands before they're routed to a driver, so you could create a plugin that does a phonetic search over your library and corrects the command “play Billy Eyelish” to “play Billie Eilish” before sending the command to the Roon Driver.
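
To give a rough feel for the flow, here is a minimal sketch of a command passing through a plugin chain and on to a driver. None of these names (`Command`, `Plugin`, `Driver`, `route`, `FakeRoonDriver`) come from the actual project; they are placeholders I made up for illustration. The plugin also uses a fixed lookup table as a stand-in for a real phonetic search, and the driver just returns a string instead of talking to Roon.

```typescript
// Hypothetical command shape produced by an endpoint (voice, HTTP, MPRIS).
interface Command {
  action: "play" | "pause" | "stop" | "next";
  zone: string;
  query?: string; // e.g. the artist name for a "play" command
}

// A plugin may rewrite a command before it reaches a driver.
type Plugin = (command: Command) => Command;

// A driver knows how to control one system's zones (Roon, Volumio, etc.).
interface Driver {
  name: string;
  handles(zone: string): boolean;
  execute(command: Command): string;
}

// Stand-in for a phonetic library search: a fixed table of known mis-hearings.
const corrections: Record<string, string> = {
  "billy eyelish": "Billie Eilish",
};

const correctArtist: Plugin = (command) => {
  if (command.query === undefined) return command;
  const fixed = corrections[command.query.toLowerCase()];
  return fixed ? { ...command, query: fixed } : command;
};

// Fake driver: reports what it would do rather than calling the Roon API.
class FakeRoonDriver implements Driver {
  name = "roon";
  handles(zone: string): boolean {
    return zone.startsWith("roon:");
  }
  execute(command: Command): string {
    const what = `${this.name} ${command.action} ${command.query ?? ""}`.trim();
    return `${what} @ ${command.zone}`;
  }
}

// Run the command through every plugin in order, then hand it to the
// first driver that claims the target zone.
function route(command: Command, plugins: Plugin[], drivers: Driver[]): string {
  const finalCommand = plugins.reduce((cmd, plugin) => plugin(cmd), command);
  const driver = drivers.find((d) => d.handles(finalCommand.zone));
  if (!driver) throw new Error(`No driver for zone ${finalCommand.zone}`);
  return driver.execute(finalCommand);
}

const result = route(
  { action: "play", zone: "roon:kitchen", query: "Billy Eyelish" },
  [correctArtist],
  [new FakeRoonDriver()],
);
console.log(result); // roon play Billie Eilish @ roon:kitchen
```

Keeping plugins as plain functions from command to command means the chain is just a list you can reorder in configuration, and drivers never need to know which plugins ran before them.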

The project is just a bunch of empty stubs right now but I’m hoping to have a simple command line version of the controller working in the next week or two.