Published on February 3rd, 2021 | by Emergent Enterprise
How Watson, IBM’s AI, Is Powering Audio-Interactive VR/AR Environments
In real life, the most natural way for people to communicate is usually by voice. Yet in virtual and augmented reality experiences, that option often isn't even available. That is changing, as can be seen in this post from Anne McKinnon at VR Scout. Developers of AR/VR applications and devices are increasingly integrating voice recognition and artificial intelligence, resulting in more engaging and true-to-life experiences. This combination of technologies is making its way into gaming and training, letting users direct the experience in powerful ways. As people grow accustomed to voice recognition devices and to interactions with AI and bots, talking to tech is becoming more common and even expected.
Image courtesy: Ben Hider / Getty Images
As VR and AR continue to become more mainstream, the expectations of users are also on the rise. The last big breakthrough we saw in ease-of-use interactivity was the launch of hand tracking and gesture recognition on enterprise and consumer VR/AR devices.
IBM predicts that AI will unlock the next generation of interactivity for XR experiences. In the 2021 Unity Technology Trends Report, the company describes how the maturing of AI will play a key role beyond hand tracking and into the world of voice.
This will include query-based voice interactions for a new level of digital agency, and even the ability to interact with and control digital environments through conversation.
Curious to learn more, I reached out to Joe Pavitt, Master Inventor and Emerging Technology Specialist at IBM Research Europe.
NATURAL LANGUAGE PROCESSING
Natural language processing is a type of machine learning that powers realistic conversation between humans and machines. IBM’s key technology in this area is Watson, their AI assistant.
Pavitt describes how Watson uses classifiers to recognize different components in speech. This makes it easier to interpret varying inputs and requests, and also easier for a developer to build speech interfaces into an experience.
“When you program [speech] into a game, you may have 10 intents that you will need to handle, but the freedom of being able to use your own voice as the user makes it feel like you’ve got infinite things to ask. Even if you ask something completely obscure, you could still classify it in such a way that it’s integrated with the story and the flow of what you’re expecting,” says Pavitt.
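The idea Pavitt describes — a small, fixed set of intents absorbing an open-ended range of player utterances, with anything obscure still mapping somewhere the story can handle — can be sketched in a few lines. This is an illustrative toy classifier, not Watson's actual approach (Watson uses trained machine-learning classifiers, not keyword overlap), and the intent names and trigger words are assumptions made up for the example:

```python
# Toy sketch: map free-form utterances onto a small, fixed set of intents.
# Watson does this with trained ML classifiers; keyword overlap stands in here.
INTENT_KEYWORDS = {
    "engine_power": {"engine", "thrust", "power", "speed"},
    "fire_weapons": {"fire", "phasers", "torpedoes", "attack"},
    "open_comms": {"hail", "channel", "message", "contact"},
}

def classify(utterance: str) -> str:
    words = set(utterance.lower().split())
    # Pick the intent whose keyword set overlaps the utterance the most.
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    # Anything completely obscure falls back to a catch-all intent,
    # which the game can still fold into the story flow.
    return best_intent

print(classify("increase the engine thrust to 70%"))  # engine_power
print(classify("hail the Klingon ship"))              # open_comms
print(classify("what is your favorite color"))        # unknown
```

The point of the pattern is the asymmetry Pavitt notes: the developer handles perhaps ten intents, while the player experiences the freedom of unconstrained speech.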
He gives the example of Star Trek: Bridge Crew, a VR game that was made in collaboration with Ubisoft. You can play with 'crewbots' who, with the help of Watson-powered voice recognition, will listen to and carry out commands.
“You could be the captain of the Enterprise, and bark orders at anyone with your voice. You didn’t have to keep hitting menu buttons, you were just talking to the characters and telling them what to do,” says Pavitt.
In terms of voice recognition, he explains how natural language processing works in this context.
“You have ‘increase the engine power to 70%’. That’s classified as ‘engine power’. We know the intent of what they want to do. You could also say ‘increase the engine thrust to 70%’. You could say ‘increase thrust’ without saying the word ‘engine’ and it would still be classified ‘engine power’. So functionally, that’s how it works,” says Pavitt.
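Pavitt's example separates two jobs: classifying the intent (several phrasings all land on 'engine power') and extracting the value to act on (the 70%). A minimal sketch of that split, with trigger words and function names that are illustrative assumptions rather than anything from Watson:

```python
import re

# Sketch of the 'engine power' example: different phrasings classify to the
# same intent, and the percentage is extracted as a separate entity.
ENGINE_TRIGGERS = {"engine", "thrust", "power"}  # assumed trigger words

def parse_command(utterance: str):
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    # Intent: any engine-related word classifies as 'engine_power'.
    intent = "engine_power" if words & ENGINE_TRIGGERS else "unknown"
    # Entity: pull out a percentage value if one was spoken.
    match = re.search(r"(\d+)\s*%", utterance)
    level = int(match.group(1)) if match else None
    return intent, level

print(parse_command("increase the engine power to 70%"))   # ('engine_power', 70)
print(parse_command("increase the engine thrust to 70%"))  # ('engine_power', 70)
print(parse_command("increase thrust"))                    # ('engine_power', None)
```

All three utterances resolve to the same intent, which is exactly the behavior Pavitt describes: the player never has to learn a fixed command phrasing.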