Periodic Reporting for period 1 - NR1 (AI-centric Server on Chip for increasing complexity and scale of AI inference applications, enabling the scale of real-life AI applications.)
Reporting period: 2023-07-01 to 2024-06-30
NeuReality's NR1 redefines future AI computing systems for ultra-scalability in disaggregated infrastructures, enabling the growth of real-world AI applications. With its innovative AI SoC, AI-centric server, and cloud-enabled virtualized software development kit, NR1 delivers linear scalability, flexibility, cost optimization, and energy efficiency (a 10x improvement in power and cost-performance efficiency compared to CPU-centric architectures). The system architecture puts AI computing at the center, with hardware-based data-path control engines for optimized resource allocation and utilization, and a dedicated AI accelerator for AI pre-processing, post-processing, and deep neural network calculations. Overall, NR1 offers a substantial cost advantage over current CPU-centric AI accelerator solutions through reduced total cost of ownership, elimination of system bottlenecks, and purpose-built system acceleration for disaggregated deep learning use cases.
- Successful Tapeout (TO) of the NR1: The design of NR1 was completed and ready for manufacturing without any issues.
- Bring up of the NR1 including the buildup of NR1-M module and NR1-S server: The bring-up process involved testing and verifying the NR1 chip and system after manufacturing. This included validating the NR1-M module, which housed the NR1 chip designed for connectivity via PCIe, and the NR1-S server, a complete system for running AI tasks.
- NR1-M with External DLA on NR1-S AI Inference Appliance: The integration of an external Deep Learning Accelerator (DLA) with our NR1-S AI inference appliance, using Qualcomm Cloud AI 100 DLAs, was successfully achieved. This setup enabled a benchmark demonstration system to showcase the enhanced AI processing capabilities of the NR1-S platform.
- Release of Software SDK and Toolchain: We have released our Software Development Kit (SDK) and comprehensive toolchain tailored for our early-adopter customers. This SDK includes the essential tools and software developers need to efficiently build and deploy applications on our platform, enhancing accessibility and usability.
- Establishing a Software Developers' Portal: To foster collaboration and engagement within our developer community, we have established a dedicated software developers' portal. This portal serves as a central hub for developers to access resources, share insights, and collaborate on projects related to our system and software ecosystem.
• In all testing, we ran varied AI pipelines across model, base, and full AI workloads as real-world scenarios spanning diverse modalities (text, audio, and images), namely Conversational AI and Computer Vision. Inference cost was measured as Cost per AI Query (tokens, sequences, frames, or audio seconds) relative to the Nvidia DGX-H100, a high-priced GPU (graphics processing unit) system, and the mid-market L40S, both commonly used for AI training and inference. In every use case, NR1-S paired with a comparable AI accelerator significantly reduced cost versus the CPU-centric Nvidia GPU systems, with average savings of 90%.
• The performance limitations of CPU-centric architecture extend beyond the specific AI pipelines we tested. As pipelines grow in complexity, CPU bottlenecks increasingly hinder system performance.
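The Cost per AI Query comparison above can be sketched as a simple calculation: cost per query is a system's hourly operating cost divided by the queries it serves per hour, and the savings figure is the relative reduction versus the baseline. The sketch below is illustrative only; the dollar amounts and throughput figures are hypothetical assumptions, not NeuReality or Nvidia data.

```python
# Illustrative Cost-per-AI-Query comparison. All figures below are
# hypothetical placeholders, not measured NR1-S or DGX-H100 numbers.

def cost_per_query(hourly_cost_usd: float, queries_per_hour: float) -> float:
    """Cost of serving one AI query (a token, sequence, frame, or audio second)."""
    return hourly_cost_usd / queries_per_hour

def savings_pct(baseline: float, candidate: float) -> float:
    """Relative cost reduction of `candidate` versus `baseline`, in percent."""
    return 100.0 * (1.0 - candidate / baseline)

# Hypothetical figures: a CPU-centric GPU server vs. an NR1-S-style appliance.
baseline = cost_per_query(hourly_cost_usd=50.0, queries_per_hour=100_000)
nr1s = cost_per_query(hourly_cost_usd=10.0, queries_per_hour=200_000)

print(f"baseline: ${baseline:.6f}/query, appliance: ${nr1s:.6f}/query")
print(f"savings: {savings_pct(baseline, nr1s):.0f}%")
```

With these assumed inputs the formula yields a 90% reduction, matching the order of savings the report cites; real results depend on actual pricing and throughput per workload.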
Cost Reduction and Efficiency:
• Projected cost reduction of up to 90% within two years through the elimination of system overheads and bottlenecks.
• Potential Impact: Expected to facilitate cost-effective adoption of AI technologies, making them more accessible to businesses of all sizes and sectors.
Energy Efficiency and Sustainability:
• Energy efficiency is forecast to become a priority, reducing power consumption and carbon footprint in AI-centric data centers within two years.
• Potential Impact: Anticipated contribution to environmental sustainability goals by mitigating the environmental impact of data center operations and promoting green computing practices.
Beyond State of the Art:
Achieved groundbreaking resource utilization, enabled by our R&D and operational efficiency, pushing the boundaries of what is currently achievable in AI inference.