4 Reasons Why OEMs Are Choosing EveryWord™ Voice Capture Technology
When talking to our clients, we always ask, “Why did you choose our ArkX Labs Voice Capture Solution?” The answer varies by vertical, but the one common response is, “The performance of existing OEM solutions for voice-enabled devices just doesn’t cut it anymore.” Those solutions lack range, struggle in loud environments or spaces with competing noise sources, and capture voice commands with subpar accuracy.
There are several reasons why companies are selecting EveryWord™ Ultra Far-Field for their voice-enabled medical and industrial products, smart speakers, and other smart devices. Companies evaluating their options are usually looking for a combination of performance factors. When we go head-to-head with the “status quo,” EveryWord far-field has proved its exceptional human-to-human and human-to-machine speech recognition performance, including superior accuracy and recognition rate, noise reduction and echo cancellation, low power consumption, and a natural voice experience. Our customizable mic array design also allows industrial designers greater flexibility, and our technology’s ability to operate in loud and reverberative environments is unmatched.
Here are just a few examples:
Accurate Voice Recognition Rate
Recently, a public safety technology company asked us to test the transcription capabilities of EveryWord against their current voice capture solution. For this client, accuracy in capturing voice commands in high noise conditions was key.
In a side-by-side WRR (word recognition rate) test using an ASR (automatic speech recognition) evaluation tool, the results were clear: EveryWord outperformed the client’s current solution with a 98.27% accuracy rate vs. an 80.53% rate.
That is a remarkable difference.
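For readers who want to see how a figure like this is typically derived, here is a minimal Python sketch. It assumes WRR is defined as 1 minus the word error rate (WER), with WER computed from a word-level edit distance. This is a common convention, not necessarily the formula used by the evaluation tool in the test above. The function name and the sample transcripts are hypothetical.

```python
def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Return WRR = 1 - WER, using word-level Levenshtein distance.

    Assumption: WRR is defined as 1 - WER. Other tools may count only
    correctly recognized words and ignore insertions.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr

    wer = prev[-1] / len(ref)
    return 1.0 - wer


# Hypothetical transcripts: one substitution and one deletion out of
# five reference words.
print(word_recognition_rate("turn on the siren now", "turn on a siren"))
```

Under this definition, a heavily garbled transcript with many inserted words can push the result below zero, which is one reason evaluation tools differ in how they report the number.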
Noise Reduction & Echo Cancellation from 3X the Distance
EveryWord captures voice commands with superior clarity and accuracy at far-field talker-to-microphone distances of up to 9 meters, around corners and obstacles, and in noisy and reverberative environments. In hospital, industrial, and public environments, for example, EveryWord’s powerful noise reduction, echo cancellation, de-reverberation, and full-duplex controls identify and suppress the noise sources that compete with speech, delivering highly intelligible, natural-sounding audio for a great conferencing experience.
The EveryWord solution separates itself from the competition in what many experts consider the most important function of a far-field audio front-end: the robustness of its acoustic echo cancellation (AEC). In simple terms, AEC is what enables a smart speaker, for example, to hear you over itself when it is playing loudly. It is also what enables you to have a natural full-duplex conversation with someone without echo. To reliably support barge-in during loud playback, or a conversation without echo, the AEC must create and maintain a unique model for the acoustic path between each microphone and each speaker. That means a 4-microphone stereo solution needs 4 × 2 = 8 AECs. Most other solutions have only 1 or 2 available, while EveryWord has 12 AECs.
That’s a big deal.
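To make the arithmetic above concrete, here is a small Python sketch that enumerates the microphone-to-speaker echo paths for a given device. It only counts paths and checks them against an available number of AEC instances. It does not implement echo cancellation, and the function names are hypothetical.

```python
from itertools import product


def echo_paths(num_mics: int, num_speakers: int) -> list[tuple[int, int]]:
    """List every (microphone, speaker) pair that needs its own echo model."""
    if num_mics < 1 or num_speakers < 1:
        raise ValueError("need at least one microphone and one speaker")
    return list(product(range(num_mics), range(num_speakers)))


def has_full_coverage(available_aecs: int, num_mics: int, num_speakers: int) -> bool:
    """True if there is one AEC instance for every echo path."""
    return available_aecs >= len(echo_paths(num_mics, num_speakers))


# A 4-microphone stereo (2-speaker) device, as in the example above.
paths = echo_paths(4, 2)
print(len(paths))                   # 4 x 2 = 8 echo paths
print(has_full_coverage(12, 4, 2))  # 12 AECs cover all 8 paths
print(has_full_coverage(2, 4, 2))   # 2 AECs do not
```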
Flexible Mic Array
EveryWord Voice Technology needs no more than 4 integrated microphones to achieve clear 360° voice capture from up to 9 meters. Until now, algorithmic requirements often limited the positioning, configuration, and orientation of mic arrays. This, in turn, constrained the industrial engineer’s design vision. The restriction arose because conventional algorithms use a planar array and view the world only in the 2-D plane of that array. ArkX Labs, however, uses algorithms that hear the world in 3 dimensions by exploiting reverberation. This means microphones can be placed in any array geometry, and the product can be mounted on walls, on ceilings, and at odd angles without killing performance.
For example, an EveryWord-based conference system can easily accommodate a 12-seat conference room while remaining fully integrated and compact, which streamlines deployment, integration, ongoing management, and operation. Because our technology does not rely on geometric constraints to define microphone configuration, placement, or orientation, it delivers better performance and gives industrial designers the mounting flexibility to achieve their product visions with fewer constraints.
What’s the Secret?
Exploiting reverberative energy in all 3 dimensions. Conventional beamforming originated in free-field acoustics, which focuses on the direct (aka shortest) path between the person speaking and the mic array. If you happen to live in an anechoic chamber, this approach is ideal, but in real home and office environments, the reverberative energy starts exceeding the direct-path energy a short distance away from the person speaking.
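To put a rough number on that “short distance,” room acoustics uses the notion of a critical distance, the point where direct and reverberant energy are equal. The Python sketch below uses a standard textbook approximation, d_c ≈ 0.057 · sqrt(Q · V / RT60), with room volume V in cubic meters, reverberation time RT60 in seconds, and source directivity Q (1 for an omnidirectional source). This formula is not taken from ArkX Labs material, and the room dimensions are hypothetical.

```python
import math


def critical_distance_m(volume_m3: float, rt60_s: float, directivity: float = 1.0) -> float:
    """Approximate distance (meters) at which reverberant energy equals direct energy.

    Uses the common approximation d_c ~= 0.057 * sqrt(Q * V / RT60).
    It assumes a diffuse reverberant field, so treat the result as an
    order-of-magnitude estimate only.
    """
    if volume_m3 <= 0 or rt60_s <= 0 or directivity <= 0:
        raise ValueError("volume, RT60, and directivity must be positive")
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


# Hypothetical office: 5 m x 4 m x 2.5 m (50 cubic meters), RT60 of 0.5 s.
print(round(critical_distance_m(5 * 4 * 2.5, 0.5), 2))
```

For this hypothetical room the estimate comes out at well under a meter, which illustrates why a talker several meters from the device is heard mostly through reflections.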
The bottom line is that by focusing on the acoustic energy in all 3 dimensions, our advanced far-field algorithms can tolerate fixed and moving obstructions in the audio path, which lets them better characterize and suppress noise and capture speech. This enables voice-enabled devices such as smart speakers and other voice agents to perform at longer ranges, around acoustic obstructions, and in reverberative living spaces, workspaces, or places with competing talkers and noise.
EveryWord™ also provides audio output processing that enhances the fidelity and volume of playback. This improves the “naturalness” and intelligibility of the voice coming from the other end of the call or from another audio source. Beamforming technologies employed by most OEM devices just cannot compete.