Playing with Roon on my Google Home Mini makes it clear how handy it would be to have voice control over my collection. Google Play Music already has something like this: you can tell your Google speaker “play <something> on <speaker>”. Most of the time, in my experience, it fails for unknown reasons. (There’s an endless thread about that at https://productforums.google.com/forum/?utm_medium=email&utm_source=footer#!msg/googlehome/E-Blpmr2UjI/Zez1f0o5CAAJ ) But when it succeeds (for some reason I can always play “Crocodile Rock” via voice command), it’s nice.
There’s also this thing called Google Actions, which extend the capabilities of these devices, much like Alexa Skills, and I think all the pieces are available to build one for Roon. You’d say something like, “Hey Google, tell <servicename> to play <something> on <zone>”. That command would go up to Google and be decoded. Google would then find the publicly available server (let’s call it S1) for this <servicename>, and send the command on to it, along with, I believe, the authenticated Google ID of the user making the request. (Might be a <servicename>-specific ID, I forget.)
Meanwhile, back in your LAN, a Roon extension (call it S2) would be running. On startup it would have opened an encrypted messaging connection to S1 and told S1 which Google user ID it belongs to. Thus when S1 receives the request from Google’s service, it can look up the right S2 by user ID and send the request down to it. S2 would receive both the textual transcription of the original audio command, and the original audio itself. It would then have to figure out what the user meant, which seems to be where the Google Play Music integration fails, and then use the Roon API to make that happen.
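To make the S1 side of that concrete, here is a rough Python sketch of the routing step only. Everything in it is made up for illustration: `Relay`, `register`, and `forward` are hypothetical names, and a plain callback stands in for the encrypted connection an S2 would really hold open. It is not based on any actual Google or Roon API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for an open S2 connection: a function that
# delivers one command string down to the extension on the LAN.
Deliver = Callable[[str], None]


@dataclass
class Relay:
    """Toy model of S1: remembers which S2 belongs to which user ID."""

    connections: Dict[str, Deliver] = field(default_factory=dict)

    def register(self, user_id: str, deliver: Deliver) -> None:
        # Called when an S2 connects and announces its owner's user ID.
        self.connections[user_id] = deliver

    def unregister(self, user_id: str) -> None:
        # Called when an S2 disconnects; unknown IDs are ignored.
        self.connections.pop(user_id, None)

    def forward(self, user_id: str, command_text: str) -> bool:
        # Called when the voice service posts a request for this user.
        deliver = self.connections.get(user_id)
        if deliver is None:
            return False  # no S2 currently online for this user
        deliver(command_text)
        return True
```

A real S1 would wrap `forward` in an HTTPS endpoint and `register` in whatever connection handshake S2 uses, but the lookup by user ID is the whole job.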
S1 is pretty minimal, as all it does is forward requests from Google, and it would fit in the free tier of many cloud application services. S2 is harder; it would take some sophistication to get the text->intent mapping right. I imagine an Alexa skill would work in a similar way, though I’ve never looked at that architecture. Perhaps S2 could handle both of them.
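As a starting point for the text->intent part, here is a deliberately naive Python sketch. It assumes the transcription looks like “play <something> on <zone>”, splits on the last “ on ”, and fuzzy-matches the tail against a list of zone names using the standard library’s `difflib`. The names `PlayIntent`, `parse_play_command`, and `match_zone` are invented for this example, and the zone list is assumed to come from Roon somehow; none of this touches the real Roon API.

```python
import difflib
import re
from typing import List, NamedTuple, Optional


class PlayIntent(NamedTuple):
    query: str           # what to play, as spoken
    zone: Optional[str]  # matched zone name, or None if none was recognised


_PLAY_RE = re.compile(r"^\s*play\s+(?P<rest>.+?)\s*$", re.IGNORECASE)


def match_zone(spoken: str, zones: List[str]) -> Optional[str]:
    """Return the known zone closest to the spoken text, if any is close enough."""
    by_lower = {z.lower(): z for z in zones}
    hits = difflib.get_close_matches(
        spoken.strip().lower(), list(by_lower), n=1, cutoff=0.6
    )
    return by_lower[hits[0]] if hits else None


def parse_play_command(text: str, zones: List[str]) -> Optional[PlayIntent]:
    """Parse 'play <something> on <zone>'; return None if it is not a play command."""
    m = _PLAY_RE.match(text)
    if m is None:
        return None
    rest = m.group("rest")
    # Split on the LAST " on " so titles containing "on" are less likely to break.
    idx = rest.lower().rfind(" on ")
    if idx > 0:
        zone = match_zone(rest[idx + 4:], zones)
        if zone is not None:
            return PlayIntent(rest[:idx].strip(), zone)
    # No recognisable zone: treat the whole remainder as the thing to play.
    return PlayIntent(rest, None)
```

For example, `parse_play_command("play Crocodile Rock on Kitchen", ["Kitchen", "Living Room"])` gives a query of "Crocodile Rock" and a zone of "Kitchen", while "play Get On Up" keeps the full title because "Up" matches no zone. The hard part, resolving the query against the library itself, is left out entirely.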