Voice Capture Technology: Why the Status Quo Is No Longer “Good Enough”
In a consumer study on voice recognition technology conducted by PricewaterhouseCoopers, 62% of respondents expressed frustration that their devices lack understanding, reliability, and accuracy. Our own research, based on conversations with both clients and consumers, revealed that a significant number of consumers are frustrated when a device fails to hear or understand what is said. This can happen because the user is too far away, because there is too much background noise (such as music or a movie streaming on the TV), or because objects between the talker and the device interfere with the sound. Another point of frustration is that the device mistakes words that sound similar to the trigger word for the trigger word itself. The bottom line? There is clearly room for better performance. People are tired of shouting at devices that don’t work as they expect. The status quo just isn’t good enough anymore.
So Why Aren’t OEMs Stepping Up to Deliver?
In many cases, the smart devices finding their way into millions of homes, schools, and businesses are voice-enabled products that are heavily subsidized by the platforms to which they connect. The end game is a “market share grab,” capturing the largest number of customer connections at the lowest possible cost while providing a “good enough” solution that people will buy.
Few companies are optimizing toward the most capable device and the “best” user experience. As a result, today’s consumer generally believes that repeating commands, moving closer to the device, clearing obstructions from the audio path, lowering the volume of competing audio sources, and making other adjustments are a necessary evil of getting voice capture to work reliably. That is decidedly not the case, and some companies are beginning to differentiate their products with better performance.
A Better Solution: EveryWord™ Ultra Far-field Voice Capture
At ArkX Laboratories, we’ve developed an advanced production-ready voice solution that delivers on the promise of enhanced noise reduction, 3X the usable range of typical competitors’ offerings, and more accurate real-world trigger word performance.
How does that work? Our EveryWord™ ultra far-field technology is based on 3-D reverberation science rather than the traditional beamforming technology used by many competitors. 3-D reverberation doesn’t rely on geometric constraints to define microphone configuration, placement, or orientation. Older beamforming technologies often produced false positives and false negatives, or required users to shout repeatedly before the device heard them accurately. 3-D reverberation overcomes those problems.
In addition, our technology tolerates fixed and moving obstructions in the audio path, making it well suited to complex living spaces, workspaces, and places with competing talkers and noise.
Another game-changer is the use of 12 independent acoustic echo cancellers (versus the one or two standard in competing products), which provide superior barge-in performance. This means a talker can speak a command without hesitation and without lowering background audio sources such as TVs or other audio products.
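Barge-in depends on acoustic echo cancellation: the device estimates how its own playback reaches its microphones and subtracts that estimate, so a talker’s command survives even while audio is playing. The sketch below uses the textbook normalized least-mean-squares (NLMS) adaptive filter to show the idea. It is a generic illustration under simplifying assumptions (one channel, a linear echo path, synthetic noise as the playback signal, no talker present), not ArkX’s implementation; the function name, its parameters, and the echo-path values are hypothetical.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=16, mu=0.5, eps=1e-6):
    """Hypothetical single-channel NLMS echo canceller (illustrative sketch).

    far_end: samples sent to the loudspeaker (the reference signal).
    mic: samples captured by the microphone (talker plus echo of far_end).
    Returns the residual signal after the estimated echo is subtracted.
    """
    w = [0.0] * taps        # adaptive estimate of the echo path
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Predicted echo: the current filter applied to recent playback.
        y = sum(wi * hi for wi, hi in zip(w, history))
        e = d - y  # what remains after removing the predicted echo
        # NLMS update: step size normalized by the reference energy.
        norm = eps + sum(h * h for h in history)
        w = [wi + (mu * e / norm) * hi for wi, hi in zip(w, history)]
        residual.append(e)
    return residual


# Synthetic test: playback is white noise, the room adds a short echo.
rng = random.Random(0)
far_end = [rng.gauss(0.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.1]  # hypothetical loudspeaker-to-mic response
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
residual = nlms_echo_canceller(far_end, mic)

# Compare echo energy before and after cancellation over the last 500 samples.
before = sum(v * v for v in mic[-500:])
after = sum(v * v for v in residual[-500:])
print(f"residual/echo energy ratio: {after / before:.2e}")
```

Because no talker is present in this synthetic signal, the residual shrinks toward zero as the filter learns the echo path. In a real device the microphone also carries the talker’s voice, and echo cancellers commonly pause adaptation during such double-talk so the command itself is not cancelled. The source does not say how ArkX’s 12 cancellers are arranged.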
Finally, ArkX offers platform-neutral solutions. The ArkX solution is simultaneously compatible with multiple voice services and trigger-word providers, including Alexa, Google, Siri, Cortana, AliGenie, Baidu/Kitt.ai, and Tencent. Users can combine the best skills available from each platform and craft a solution that suits their specific needs. In addition, through our new collaboration with Sensory, EveryWord Voice Control allows OEMs to create their own branded voice experience with custom wake words and command sets. Gone are the days when users had to start every command with “Alexa” or a similar provider wake word. OEMs can now use their own brand name as the wake word that begins every command, reinforcing their brand with users each time they speak.
Why OEMs Are Now Taking Notice
From an OEM perspective, there are real limitations with the current built-in options that dominate the voice space. A growing number of companies across a wide variety of verticals want a much higher standard of performance. They seek something that can be customized to work seamlessly within their ecosystems and is uniquely “ownable” by their brand. From a business point of view, a better customer experience can increase brand value and translate into higher margins.
The bottom line is that this technology can easily be built into most electronic devices, from consumer and industrial products to medical devices and even robots, allowing nearly any electrical device to become a high-performing voice-operated device. Our solutions deliver markedly better human-to-human and human-to-machine speech recognition than anything else in the marketplace today.