Periodic Reporting for period 2 - NR1 (AI-centric Server on Chip for increasing complexity and scale of AI inference applications, enabling the scale of real-life AI applications.)
Reporting period: 2024-07-01 to 2025-06-30
NeuReality addresses these challenges with the NR1, the world’s first Network Addressable Processing Unit (NAPU) designed as a Server On Chip (SoC). Together with the NR1-M module, the NR1-S server, and a cloud-enabled SDK, the company provides a truly AI-centric infrastructure. The architecture removes the CPU from the critical data path and introduces hardware-based control engines, enabling linear scalability, higher utilization of AI accelerators, reduced TCO, and an order-of-magnitude improvement in cost-performance and energy efficiency.
Since mid-2024, NeuReality has advanced the positioning of NR1 into a full AI Inference Appliance. This turnkey solution integrates compute, software, and orchestration into one platform that is Private GPT and Agentic AI ready, delivering value “out of the box” across on-premises, edge, and cloud deployments. The company's roadmap includes appliance configurations for standalone servers, clustered data centers, edge deployments, and Appliance-as-a-Service models, expanding the accessibility of enterprise-ready AI inference.
This strategic approach directly addresses the growing demand from enterprises and cloud service providers for efficient deployment of Large Language Models (LLMs) and other AI workloads. By maximizing the utilization of AI accelerators (increasing from ~30% in CPU-centric systems to 100% with NR1), NeuReality unlocks economically viable AI applications at scale, bridging the gap between cutting-edge technology and real-world adoption.
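As a rough illustration of how utilization drives the economics, the sketch below relates accelerator utilization to effective throughput and cost per inference. All figures in it are assumptions chosen for the example, not measurements from the project.

```python
# Illustrative only: how accelerator utilization translates into effective
# throughput and cost per inference. All figures are assumptions for the
# sake of the example, not NeuReality measurements.

PEAK_THROUGHPUT = 10_000      # inferences/s per accelerator at 100% utilization (assumed)
FLEET_SIZE = 8                # accelerators per server (assumed)
SERVER_COST_PER_HOUR = 4.0    # fully loaded server cost in $/h (assumed)

def cost_per_million_inferences(utilization: float) -> float:
    """Cost in $ per one million inferences at a given accelerator utilization."""
    effective_rate = PEAK_THROUGHPUT * FLEET_SIZE * utilization   # inferences/s
    inferences_per_hour = effective_rate * 3600
    return SERVER_COST_PER_HOUR / inferences_per_hour * 1_000_000

print(f"CPU-centric (~30% util.): ${cost_per_million_inferences(0.30):.2f} per 1M inferences")
print(f"NR1-style  (~100% util.): ${cost_per_million_inferences(1.00):.2f} per 1M inferences")
```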
On the hardware side, the NR1 SoC was fully integrated and validated. The memory subsystem, ARM N1 CPUs and interconnects were tested at full speed, and complete RTL and emulation flows confirmed the design. The system successfully booted Linux and executed inference workloads, showing robust functionality at target performance levels. Advanced security mechanisms such as secure boot and TrustZone were also implemented to ensure that the device meets the reliability and security requirements of modern data centers.
On the software side, driver optimization and dedicated DSP components for audio and vision workloads were completed. Integration with Qualcomm AI100 Pro accelerators was also finalized, delivering linear scalability in which every additional card continues to operate at peak capacity.
Within WP3, the platform software stack was delivered. The SDK was extended to support large language models, with an OpenAI-compatible client API, PyTorch and ONNX support, and integration with orchestration frameworks such as Kubernetes. Monitoring and logging capabilities were also integrated, creating a software platform that can be easily deployed and managed in enterprise or cloud environments.
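Because the SDK exposes an OpenAI-compatible client API, existing applications can in principle be repointed at the appliance by changing only the endpoint. The sketch below shows this pattern with the standard openai Python client; the endpoint URL, API key, and model name are illustrative placeholders, not documented NR1 values.

```python
# Minimal sketch of calling an OpenAI-compatible inference endpoint.
# The base_url, api_key, and model name are illustrative placeholders;
# consult the NR1 SDK documentation for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="http://nr1-appliance.local:8000/v1",  # hypothetical appliance endpoint
    api_key="not-needed-on-prem",                   # placeholder credential
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",                 # illustrative model name
    messages=[{"role": "user", "content": "Summarize the quarterly report."}],
)
print(response.choices[0].message.content)
```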
In quantitative terms, performance reached 22,000 frames per second for ResNet50 on a single NR1-M module, scaling linearly to 220,000 frames per second across ten modules. Networking throughput improved by a factor of two to almost three thanks to optimized offloads and protocol enhancements.
From the customer perspective, live demonstrations at SC24 attracted strong attention. Proof-of-concept deployments were initiated at Cirrascale, Fidelity and Rebellions, while new collaborations with Qualcomm, AMD and Nvidia were established to broaden the ecosystem.
These achievements mark the transition of NR1 into a validated and enterprise-ready inference appliance, showing both technological maturity and initial market traction with early adopters.
Benchmarks of image classification workloads (ResNet50) showed 22,000 frames per second per NR1-M module. Scaling to ten modules maintained linear efficiency and reached 220,000 frames per second in combination with Qualcomm AI100 accelerators. This demonstrated that NR1 removes the CPU bottleneck that typically limits accelerator utilization to around 30%.
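As a simple sanity check of the linear-scaling claim, the snippet below compares the reported aggregate throughput against the ideal figure (single-module rate times module count), using only the numbers quoted above.

```python
# Scaling efficiency = measured aggregate throughput / ideal linear throughput.
single_module_fps = 22_000          # ResNet50 on one NR1-M module (reported)
modules = 10
measured_aggregate_fps = 220_000    # ten modules with Qualcomm AI100 (reported)

ideal_fps = single_module_fps * modules
efficiency = measured_aggregate_fps / ideal_fps
print(f"Ideal: {ideal_fps} fps, measured: {measured_aggregate_fps} fps, "
      f"scaling efficiency: {efficiency:.0%}")   # -> 100%
```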
For large language models, NeuReality integrated vLLM and model sharding. Tests at Cirrascale with Llama 3.1 (70B parameters) indicated more than 60% higher performance than a CPU-centric configuration. These results were confirmed in joint work with Cirrascale and Qualcomm.
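The report does not detail the exact configuration, but the sketch below illustrates the general vLLM pattern for serving a sharded 70B model, where tensor parallelism splits the weights across several accelerators. The checkpoint name and degree of parallelism are illustrative assumptions, not the NR1 setup.

```python
# Sketch of serving a large model with vLLM using tensor-parallel sharding.
# Model name and tensor_parallel_size are illustrative; the actual NR1
# deployment may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative checkpoint
    tensor_parallel_size=4,                     # shard weights across 4 accelerators (assumed)
)

outputs = llm.generate(
    ["Explain the benefit of disaggregated AI inference in one sentence."],
    SamplingParams(max_tokens=64, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```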
Energy and cost benefits derive from this higher utilization. Based on system measurements, NR1 achieves a reduction in inference cost of up to 90% compared to GPU servers with a CPU-centric architecture, while also lowering power consumption. These figures are consistent across both vision and LLM workloads.
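For intuition on the energy side, the sketch below computes energy per inference as average system power divided by sustained throughput; all numbers are illustrative assumptions rather than figures taken from the report.

```python
# Energy per inference = average system power / sustained throughput.
# All values are illustrative assumptions, not NR1 measurements.
def joules_per_inference(avg_power_watts: float, throughput_ips: float) -> float:
    return avg_power_watts / throughput_ips

cpu_centric = joules_per_inference(avg_power_watts=1200.0, throughput_ips=6_600)   # assumed
nr1_based   = joules_per_inference(avg_power_watts=1000.0, throughput_ips=22_000)  # assumed

print(f"CPU-centric: {cpu_centric:.3f} J/inference")
print(f"NR1-based:   {nr1_based:.3f} J/inference")
```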
The results have been validated in live demonstrations, such as Conversational AI and system performance showcases at SC24, and in early customer engagements with Cirrascale, Fidelity and Rebellions. These activities provided external confirmation of the platform’s capabilities in realistic deployment environments.
NeuReality also advanced its intellectual property portfolio to support these results, reaching a total of twenty-four patent families, with six already granted and eighteen pending. This portfolio strengthens the technical foundation for further exploitation.
Further steps are required to achieve full impact. These include broader demonstrations with enterprise and service provider customers, continued IPR support, development of commercialization and financing pathways, and alignment with regulatory and standardisation frameworks that can facilitate adoption across markets.