Looks interesting, but you really shouldn’t have to do phoneme analysis on the bot/skill side of things. If you give the speech rec system a full list of keywords for your domain (your library), it should match things regardless of accent. Language plays into things, as your music titles might have French terms while you’re going through an English speech recognizer, but understanding thick accents is more dependent on the recognizer’s acoustic model than anything.
BTW… I am not an expert on all things speech rec. I’ve just been in lots of meetings with people who are, so I’m parroting back the things they’ve told me over the years.
Right. But individual differences in phoneme pronunciation can be extreme. It makes it difficult to do well (few false positives) on anything other than a small set of carefully chosen keywords. Amazon’s Alexa technology is probably at the leading edge there in recognizing unrestricted speech. Google often seems like it does better, but it’s really matching against statistical probabilities of various Google search queries, I think. Not really unrestricted.
So if we were dealing with “free text” searching on the bot/skill side of things, where you’re searching over an unknown corpus, then I would agree that doing some phoneme processing could help. But in this case we have a known corpus in the form of the user’s music library. I’m just saying that if we give the recognizer the list of all the phrases in that corpus, then doing additional post analysis probably isn’t going to yield much improvement.
The things worth focusing on are trying to deal with ambiguity like “alexa ask roon to play some country” and responding with “did you want to play your country hits playlist or the genre country?”
Yep, you may be right. Surely you’ll need some way to disambiguate the Artie Shaw version of “Krazy Kat” from the Frankie Trumbauer version.
It would be nice if the browse API returned play counts, as you could feed those into the ranking algorithm to help with disambiguation. The language model can help with identifying the user’s intent (“do they want to play a track, or songs by an artist?”), which helps with disambiguating type, but at some point you need to ask the user to disambiguate things. I’ll probably start with re-ranking the output of the browse call using edit distance, but if push comes to shove I’ll just build my own Lucene index where I can get a reliable relevance score over the entire music corpus. My gut says that’s where I’m going to end up, given I can then tune everything. Play counts would REALLY help though.
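To make the re-ranking idea concrete, here’s a minimal sketch of edit-distance re-ranking over candidate titles. The `rerank` helper and its use against browse results are my own illustration, not part of any Roon or Alexa API; a real implementation would re-rank the actual browse response items.

```typescript
// Illustrative sketch: re-rank candidate titles by Levenshtein edit
// distance to the recognized phrase, so the closest match comes first.

function editDistance(a: string, b: string): number {
    const m = a.length, n = b.length;
    // dp[j] holds the distance between a[0..i) and b[0..j); start with
    // the row for the empty prefix of a.
    const dp: number[] = Array.from({ length: n + 1 }, (_, j) => j);
    for (let i = 1; i <= m; i++) {
        let prev = dp[0];
        dp[0] = i;
        for (let j = 1; j <= n; j++) {
            const tmp = dp[j];
            dp[j] = Math.min(
                dp[j] + 1,                                // deletion
                dp[j - 1] + 1,                            // insertion
                prev + (a[i - 1] === b[j - 1] ? 0 : 1)    // substitution
            );
            prev = tmp;
        }
    }
    return dp[n];
}

// Re-rank candidate titles (e.g. from a browse call) by distance to the query.
function rerank(query: string, candidates: string[]): string[] {
    const q = query.toLowerCase();
    return [...candidates].sort(
        (a, b) => editDistance(q, a.toLowerCase()) - editDistance(q, b.toLowerCase())
    );
}
```

A Lucene index would give you proper relevance scores instead of raw distances, but something like this is a cheap first pass over a small candidate list.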
Does the Alexa service give you the phoneme analysis of the user command? That’s what was missing from the Google version (though they might have added it since).
So looking at the Alexa docs… the priming for music skills consists of uploading your service’s music catalog to them, and they have a 500,000-entry limit for each type of catalog (artist, playlist, genre, track, etc.)
That could potentially be a bit limiting given the long tail of music that Roon users are likely to have in their libraries. There are some tricks I could try to reduce the number of priming terms I need to give them, but likely I’ll end up needing to just look at the distribution of artists/albums/tracks across all libraries and take the top 500,000 entries from each category. I was hoping not to have to store data like this in the cloud, as that gets into GDPR and data-residency gunk for users in Europe, so I’ll just have to figure that out.
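The “take the top entries from each category” step could look something like the sketch below: count how many libraries contain each entry and keep the most common ones up to the per-catalog cap. The function and data shape are hypothetical; a real version would run per catalog type (artist, track, etc.) over whatever aggregate data is actually available.

```typescript
// Hypothetical sketch: pick the most common entries across all user
// libraries, capped at Alexa's per-catalog entry limit.

const CATALOG_LIMIT = 500_000;

function topEntries(libraries: string[][], limit = CATALOG_LIMIT): string[] {
    // Count how many libraries contain each entry (dedupe within a library).
    const counts = new Map<string, number>();
    for (const library of libraries) {
        for (const entry of new Set(library)) {
            counts.set(entry, (counts.get(entry) ?? 0) + 1);
        }
    }
    // Keep the most frequent entries up to the limit.
    return [...counts.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, limit)
        .map(([entry]) => entry);
}
```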
That’s interesting. Having a domain-specific catalog should help with accuracy.
No… they don’t even give you the raw text of the user’s utterance, for privacy reasons. You only get the intent, the slots that were matched, and the individual slot values. At least with Google you get the original utterance along with the intent & entities that were matched. Back when I built the prototype Alexa adapter for the Bot Framework, I did a trick where I would run the Context Free Grammar (CFG) file Alexa has you build in reverse, and I could sort of reconstruct the original utterance. That was so I could pass the utterance over to LUIS for recognition, since their slot-based system sucks when compared to LUIS…
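The “run the grammar in reverse” trick can be sketched in a few lines: given a sample-utterance template from the interaction model and the matched slot values, substitute the values back in to approximate what the user said. The template syntax and slot names below are hypothetical examples; a real version would first pick the template that corresponds to the matched intent.

```typescript
// Illustrative sketch of reconstructing an utterance from an interaction
// model template and matched slot values. Template and slot names are
// made up for the example.

function reconstructUtterance(
    template: string,
    slots: Record<string, string>
): string {
    // Replace each {slotName} placeholder with its matched value,
    // leaving unmatched placeholders as-is.
    return template.replace(/\{(\w+)\}/g, (_m: string, name: string) =>
        slots[name] ?? `{${name}}`
    );
}

// e.g. template "play {track} by {artist}" with matched slots
// { track: "Krazy Kat", artist: "Artie Shaw" }
```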
One thought I immediately had (like 10 minutes ago) is that you could cluster libraries into similar catalogs (so all of the people who like classical are together and everyone who likes rock is together), but a) I don’t think that will work because you’d need multiple skills, and b) a lot of people like both.
Amazon does say that you can contact them and request a higher entity limit, but honestly I figure that if I run into that, it means a lot of people are using the skill, so it’s a good problem to have.
This kind of strayed off topic so I created a separate thread for the Voice Control Extension I’m thinking about.
I thought I’d share the new RoonExtension class I built to wrap the current node-roon-api.
You create a new instance of RoonExtension and pass in options that tell it which services you require and whether you want to subscribe to zones & outputs. You can then call extension.start_discovery() and the extension class will take care of initializing the API with the requested services and setting up any subscriptions you’ve requested. The extension also provides the RoonApiStatus service for you, and I’m planning to add RoonApiSettings support soon.
Everything has been modernized to include full TypeScript definitions, so you’ll get IntelliSense everywhere in VS Code. It uses EventEmitter-based events, so you can have multiple components subscribe to events. And it lets you wait for a core to pair asynchronously, so for simple things you don’t need to subscribe to events at all.
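The “subscribe to events or just await pairing” pattern can be sketched with plain Node primitives. This is not the real RoonExtension class or the node-roon-api types; the `Core` interface, `pair`, and `get_core` names here are stand-ins to show the shape of the idea.

```typescript
import { EventEmitter } from 'events';

// Minimal sketch of the pattern: an EventEmitter-based wrapper where
// callers can either subscribe to events or simply await pairing.
// "Core" and the event name are stand-ins, not real node-roon-api types.

interface Core {
    display_name: string;
}

class ExtensionSketch extends EventEmitter {
    private core?: Core;

    // Called by the discovery machinery when a core pairs.
    pair(core: Core): void {
        this.core = core;
        this.emit('core_paired', core);
    }

    // Resolve with the currently paired core, or the next one to pair.
    get_core(): Promise<Core> {
        if (this.core) return Promise.resolve(this.core);
        return new Promise<Core>(resolve => this.once('core_paired', resolve));
    }
}
```

Simple callers can then write `const core = await extension.get_core();` while more complex components still hook the events directly.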
I haven’t checked this code into GitHub yet but I can if others are interested. I can also publish it as a freestanding package to NPM if there’s enough interest.
This sounds really useful - I’ve been playing with the Roon API for a while, trying to promisify the callbacks and respond to events.
I’d love to see your code and help test if needed.
I’m out of town this week but next weekend (or the week after) I’ll try to get something published to GitHub. I’m thinking about creating a roon-kit library that has a bunch of utility classes like the RoonExtension class.
Microsoft MVP for 20 years and an avid Roon user. I just found this thread and will follow what you are doing and help test and code. I have both Google Home and Alexa.
I am also a Home Assistant enthusiast and have the Roon extension running there. That works reasonably well. HA has a cloud service that exposes your home automation to remote internet connections. I haven’t tried it yet, but I believe that allows me to operate Roon while on the go (not sure that has value). That also enables voice integration.
Thanks @Bill_Wolff I will post back once I have something on GitHub.