
Multimodal context and voice recognition for seamless voice control technology interfaces with low upfront cost

Periodic Reporting for period 1 - XMOS (Multimodal context and voice recognition for seamless voice control technology interfaces with low upfront cost)

Reporting period: 2019-01-01 to 2019-12-31

Electronic devices are proliferating and they’re getting smarter. We’re moving towards a new era – one where intelligence is embedded in the fabric of the world around us – our homes, vehicles, workspaces and cities, even the things we wear.

This is the “Artificial Intelligence of Things” (AIoT). By 2025, it is forecast to have grown to 65 billion devices, consuming 180 zettabytes of data and costing 3 trillion dollars globally (Gartner: 2019). This proliferation of devices is going to create challenges around latency (delay), privacy and accuracy. We’re going to need new ways to interact with these technologies: a more intelligent interface that keeps costs low and design potential high.

Today, these interactions still rely on touch (keypad) and are command-based rather than conversational. The remote control still dominates the living room and the keypad search experience is limited and frustrating. This is particularly true when compared to the freedom of simply telling your TV what you want to watch, from anywhere in the room. There is also a shift in the way we’re going to expect our devices to behave. We’ll want them to know when we want them to interact with us and – equally important – when we don’t. Our devices will be aware of our presence, able to distinguish us from others and interpret information to ‘understand’ the environment they’re operating in.

The application of machine learning or artificial intelligence (AI) to the various data inputs from connected devices (IoT) gives context to the information collected. This will enable us to deliver intelligent decisions that elevate the end-user experience and create a more intuitive interaction with technology.

This project is playing a fundamental role in the development of these new human-machine interfaces, developing technologies that combine voice with other sensors (e.g. radar and imaging). Artificial intelligence is applied to the captured data, enabling local decisions to be implemented in nanoseconds and context-rich metadata to be sent to the cloud to support broader services and experiences.
During the first reporting period the XMOS project focused on developing core technologies to enable low cost, multi-modal interfaces:

- A low-cost voice interface processor that delivers an optimized far-field voice interface
- Novel radar-based sensor techniques which can add complementary functionality to the voice interface
- Enhanced AI models targeting speaker identification
- New processor technologies for high performance edge AI/AIoT applications

The key achievement of the project so far has been the development and market launch of a very low-cost far-field voice interface processor device (XVF3510) which implements algorithms developed under this project to extract voices from high levels of background noise. This product was launched into the market in August 2019 and has successfully passed initial customer trials. It is currently being designed into commercial products that will launch in 2020.
The XVF3510 two-microphone voice processor solution developed under this project is the lowest cost/premium performance device available today. It is already enabling wider application of voice technologies and will be a key component of the XMOS business in the next three years. It also forms a core component of the next generation of low-cost interfaces that this project is targeting in the next period.

In addition, the project has demonstrated the use of radar imaging for detecting and identifying individuals and the use of low power microcontrollers to detect keywords and classify speakers.
The next phase of this project will integrate a selection of these key technologies into a complete demonstrator system including both client and cloud-based functionality, utilising advanced capabilities in a new edge processor device.

Overall the XMOS project is enabling the realisation of the next generation in voice control technologies with disruptively low upfront cost and new capabilities, including:

- Novel healthcare & fitness applications
- Low power/Energy efficiency (near zero power wake up with presence & identification)
- Safety systems (real time, multi-modal)

The unique architecture under development in this project is versatile, scalable, cost-effective and powerful. The fast processing and neural network capabilities will enable manufacturers to build smarter, sensing technology products that make life simpler, safer and more satisfying for all.
Potential customers testing the XVF3510 live at IFA.
XMOS stand at Voice Summit in Newark.
Radar device for detecting human presence.
Low cost voice client (inside the white line in centre)
Amazon qualified 3510 demonstration and evaluation system.