XMOS has developed a low-cost, cloud-connected "Thin-client" AIoT design comprising a far-field voice interface, artificial intelligence inferencing and user customisation. The design can be produced as a stand-alone module or integrated into a more complex system. Its architecture is flexible, enabling developers to add new capabilities and AI models to support a wide range of use cases.
The Thin-client's central component is xcore.ai, a cross-over processor that integrates most of the required digital functionality at low cost. The XMOS AIoT module demonstrates this in a very compact voice interface design (55 mm x 30 mm) that includes microphones, AI inferencing and Wi-Fi.
Incorporating voice control into an existing product design was one of the primary use cases for this project, so removing the need for engineers to learn new programming environments was critical to gaining wider customer acceptance. XMOS therefore developed a framework that enables machine learning engineers to develop AI models using their preferred tools.
XMOS investigated three areas of embedded AI voice processing: speaker localisation, speaker identification and keyword detection. In all three areas, the core technology has been prototyped and demonstrated to be viable, and the algorithms and models have been made available for exploitation. Customer trials showed the most interest in local, customised keyword detection, as this offers brand differentiation without a dependency on the tech giants’ voice service platforms.
XMOS also investigated short-range radar as a complement to the voice interface. Radar allows a system to detect human presence, and especially people's movement, without the privacy implications of capturing visual images.
Using a 60 GHz radar chip, XMOS demonstrated the detection and classification of people in a room, with the AI tools identifying each person as one of a group of pre-registered individuals. This enables new applications in which an appliance adapts automatically to each user, e.g. a cooker could disable functions depending on whether a child or an adult is present.
While radar offers novel capabilities, other use cases require the higher resolution that cameras provide. Internet-connected camera systems typically send images to cloud-based AI inferencing systems. Processing these images and running inferencing at the edge, i.e. without image data leaving the sensor, significantly improves performance and helps address privacy concerns and data protection regulations.
XMOS created an AIoT module with a camera that uses the XMOS AI tools to validate a live image against a pre-registered photograph. No captured image data is stored or transmitted, which protects privacy, and the module can take action locally (e.g. turning on an appliance only when the owner is present).
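The local validation step can be illustrated with a minimal sketch. The matching method used by the XMOS AI tools is not described here, so everything below is an assumption for illustration only: it supposes that a model has already reduced the live image and the pre-registered photograph to fixed-length feature vectors, and that the two are compared by cosine similarity against a hypothetical threshold. The function names and the value 0.8 are invented for this example and are not part of any XMOS API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as "no match"
    return dot / (norm_a * norm_b)


def is_registered_owner(
    live_features: Sequence[float],
    registered_features: Sequence[float],
    threshold: float = 0.8,  # hypothetical value, would need tuning on real data
) -> bool:
    """Decide locally whether the live image matches the registered photograph.

    Only the boolean decision leaves this function; the feature vectors
    are neither stored nor transmitted.
    """
    return cosine_similarity(live_features, registered_features) >= threshold


if __name__ == "__main__":
    registered = [0.88, 0.12, 0.41]  # illustrative features from the registered photo
    live = [0.90, 0.10, 0.40]        # illustrative features from the live frame
    if is_registered_owner(live, registered):
        print("Owner present: enable appliance")
    else:
        print("Owner not recognised: keep appliance off")
```

In a design along these lines, only the final yes/no decision drives the appliance, which is consistent with the requirement that no captured image data is stored or transmitted.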
The XMOS project ran a diverse set of trials to validate the product/market fit of these designs, and the feedback from users on the technical and market readiness of this research was very valuable. A clear market priority emerged for low-cost voice interfaces with embedded keyword detection, which can be designed into products today. Multi-sensor applications and edge AI capabilities are still in the early stages of market acceptance; there is much interest, but it will be a few years before mass adoption.