Google Action for Roon control

Playing with Roon on my Google Home Mini makes it clear how handy it would be to have voice control over my collection. Google Play Music already has something like this: you can tell your Google speaker “play <something> on <speaker>”, and most of the time, in my experience, it fails for unknown reasons. (There’s an endless thread about that on the Google Home product forum: !msg/googlehome/E-Blpmr2UjI/Zez1f0o5CAAJ.) But when it succeeds (for some reason I can always play “Crocodile Rock” via voice command), it’s nice.

However, there’s this thing called Google Actions, which extends the capabilities of these devices, much like Alexa Skills. And I think all the pieces are available to build one for Roon. So you’d say something like, “Hey Google, tell <servicename> to play <something> on <zone>”. That command would go up to Google and be decoded. Google would then find the publicly available server (let’s call it S1) for this <servicename>, and send the command on to it, along with, I believe, the authenticated Google ID of the user making the request. (Might be a <servicename>-specific ID, I forget.)

Meanwhile, back on your LAN, a Roon extension (call it S2) would be running, having opened on startup an encrypted messaging connection to S1 and registered the Google user ID associated with your S2. Thus when S1 receives the request from Google’s service, it sends it down to S2. S2 would receive both the textual transcription of the original audio command and the original audio itself. It would then have to figure out what the user meant (which seems to be where the Google Play Music integration fails), and then use the Roon API to make that happen.
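The text→intent step S2 would need could start out as simple pattern matching. A rough sketch of the idea (all names hypothetical; a real extension would pull the zone list from the Roon API rather than hard-coding it):

```typescript
// Hypothetical sketch of S2's text -> intent mapping.
// In a real extension the zone names would come from the Roon API;
// they're hard-coded here for illustration.
const ZONES = ["living room", "kitchen", "office"];

interface Intent {
  action: string;
  item: string;
  zone: string;
}

function parseCommand(text: string): Intent | null {
  // Match commands like "play crocodile rock on kitchen".
  const m = text.toLowerCase().match(/^play (.+?)(?: on (.+))?$/);
  if (!m) return null;
  const [, item, zone] = m;
  if (zone && !ZONES.includes(zone)) return null; // unknown zone name
  // Default to the first zone when none was spoken.
  return { action: "play", item, zone: zone ?? ZONES[0] };
}
```

Real commands are messier than this regex, of course ("play some Elton John", "put on that album from yesterday"), which is exactly the sophistication gap mentioned below.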

S1 is pretty minimal; all it does is forward requests from Google, so it would fit in the free tier of many cloud application services. S2 is harder; it would take some sophistication to get the text->intent mapping right. I’d imagine an Alexa Skill works in a similar way, though I’ve never looked at that architecture. Perhaps S2 could handle both of them.
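To see how minimal S1 really is: its core is just a lookup table from Google user ID to the live S2 connection. A sketch with the connection stubbed out (all names hypothetical):

```typescript
// Hypothetical sketch of S1's relay logic.
// In practice the connection would be e.g. a WebSocket opened by S2.
interface Connection {
  send(msg: string): void;
}

// Google user ID -> live S2 connection.
const connections = new Map<string, Connection>();

// An S2 instance calls this over its long-lived connection on startup.
function register(userId: string, conn: Connection): void {
  connections.set(userId, conn);
}

// Called when a request arrives from Google's service; forwards it to
// the matching S2, or reports that no extension is online for that user.
function forward(userId: string, request: object): { ok: boolean; error?: string } {
  const conn = connections.get(userId);
  if (!conn) return { ok: false, error: "no extension online for this user" };
  conn.send(JSON.stringify(request));
  return { ok: true };
}
```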

Google Play Music is not relevant; it uses machinery not available to third parties, for a bunch of tricky reasons. However, Actions on Google (the official name) for Google Assistant can use Dialogflow to parse the user request into a structured intent.
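For a sense of what “structured intent” means here: Dialogflow’s v2 webhook hands the fulfillment code something roughly like the object below (shape abbreviated; the intent name and the `item`/`zone` parameters would be defined by whoever builds the Action):

```typescript
// Roughly the shape Dialogflow v2 delivers to a fulfillment webhook
// (abbreviated; "PlayMusic", "item", and "zone" are developer-defined).
const webhookRequest = {
  queryResult: {
    queryText: "play crocodile rock on the kitchen speaker",
    intent: { displayName: "PlayMusic" },
    parameters: { item: "crocodile rock", zone: "kitchen" },
  },
};

// S2 would receive this (forwarded via S1) and only have to map the
// already-extracted parameters onto Roon API calls, rather than
// re-parsing the raw text itself.
const { item, zone } = webhookRequest.queryResult.parameters;
```

That would shift the hard text->intent work from S2 up into Dialogflow, which is the main attraction of the Actions route.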

Yes, I was just citing it as an example of the UX.

Since my wife has an affinity for these NSA listening posts, er, “personal assistants”, it’s crossed my mind as well - some input or comment from Roon would be interesting.

Clearly the Chromecast SDK has some facilities here. The fact that I can say “pause”, “resume”, “stop”, “volume up”, “volume down”, and “next song” to a Google speaker, and have that reflected in the Roon remote control, means that there’s some level of voice control integration already going on. However, I suspect that’s a limited and local set of commands.

[Later] Yes, looking at the Chromecast SDK reveals that there are “Chrome senders” and “Chrome players”, and the players have a limited set of events they can send back to the sender: volume changes, play/pause, track seek, etc.
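That limited event set maps quite naturally onto transport controls; if S2 ever wanted to mirror those same commands, the table is small. A hypothetical mapping (the Cast event names and the Roon-side action names are both illustrative, not the actual API identifiers):

```typescript
// Hypothetical mapping from the limited set of player events the Cast
// side can report to the Roon transport actions they'd correspond to.
// Both sides' names are illustrative, not real API identifiers.
const castToRoon: Record<string, string> = {
  PLAY: "play",
  PAUSE: "pause",
  STOP: "stop",
  SEEK: "seek",
  VOLUME_CHANGED: "change_volume",
  NEXT: "next",
};
```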