WEKA's Triple Play: New Hardware, Record Performance, and RAG Architecture Advance Enterprise AI

WEKA unveils comprehensive AI infrastructure solutions: NVIDIA Grace CPU storage, record-breaking benchmarks, and new RAG reference architecture for enterprise inferencing.


WEKA has announced three major developments that strengthen its position in enterprise AI infrastructure: an industry-first NVIDIA Grace CPU storage solution, record-breaking performance benchmarks, and a new reference architecture for AI inferencing.


Grace CPU Integration: Power-Efficient AI Infrastructure


WEKA's latest innovation, previewed at Supercomputing 2024, combines its AI-native data platform with NVIDIA's Grace CPU Superchip and Supermicro's storage server technology. This solution addresses a critical challenge in modern data centers: delivering high-performance AI capabilities while managing power and space constraints.


The new system leverages 144 Arm Neoverse V2 cores to deliver twice the energy efficiency of traditional x86 servers. When paired with NVIDIA networking technology, including ConnectX-7 NICs and BlueField-3 SuperNICs, the solution can achieve network speeds of up to 400Gb/s.


Key benefits include:

  • Up to 10x faster time to first token

  • 10-50x increase in GPU stack efficiency

  • 4-7x reduction in data infrastructure footprint

  • Potential reduction of up to 260 tons of CO2e per petabyte stored annually

  • 10x lower energy costs


Benchmark Dominance


WEKA has announced record-setting performance in cloud environments across all SPECstorage Solution 2020 workloads. Notable achievements include:


  • AI Workloads: Outperformed competitors by 175% in raw performance at 64% of the infrastructure cost on Microsoft Azure

  • EDA Workloads: Delivered 60% faster response times compared to NetApp's fastest 8-node system

  • Video Data Analysis: Achieved 12,000 streams, surpassing its own previous record of 8,000 streams

  • Software Builds: Processed 7,472 builds, outperforming competitors with lower latency


WARRP: Simplifying AI Inferencing


WEKA's newest initiative, the WEKA AI RAG Reference Platform (WARRP), provides a blueprint for building production-ready AI inferencing environments. This infrastructure-agnostic architecture helps organizations implement Retrieval-Augmented Generation (RAG), a critical technique for improving AI model accuracy and reducing hallucinations, at scale.
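
To make the RAG flow concrete, the sketch below shows a minimal retrieve-then-generate loop against two of the components WARRP names: a Milvus vector database and an NVIDIA NIM endpoint (NIM microservices expose an OpenAI-compatible chat API). This is an illustrative sketch, not the WARRP implementation: the collection name, field names, URLs, model name, and the embed() helper are all assumptions.

```python
# Minimal RAG sketch (illustrative only, not the WARRP implementation).
# Assumes a Milvus collection "docs" already populated with text chunks and
# their embeddings, and a NIM microservice serving an OpenAI-compatible API.
import hashlib
import requests
from pymilvus import MilvusClient

NIM_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical NIM endpoint
milvus = MilvusClient(uri="http://localhost:19530")    # hypothetical Milvus address

def embed(text: str) -> list[float]:
    # Stand-in embedding; in practice, call an embedding model (e.g., via
    # NeMo Retriever). It must be the same model that embedded the stored chunks.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest]  # 32-dim toy vector

def answer(question: str) -> str:
    # 1. Retrieve: vector-search the question against stored document chunks.
    hits = milvus.search(
        collection_name="docs",        # hypothetical collection name
        data=[embed(question)],
        limit=3,
        output_fields=["text"],        # hypothetical field holding chunk text
    )
    context = "\n".join(hit["entity"]["text"] for hit in hits[0])

    # 2. Augment and generate: ground the LLM's answer in the retrieved text.
    resp = requests.post(NIM_URL, json={
        "model": "meta/llama3-8b-instruct",  # example NIM model name
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    })
    return resp.json()["choices"][0]["message"]["content"]
```

In a WARRP-style deployment, the pieces around this loop (GPU scheduling via Run:ai, containerized services on Kubernetes, ingestion into Milvus) are what make the same pattern portable across clouds and on-premises clusters.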


WARRP integrates:

  • NVIDIA NIM microservices and NeMo Retriever

  • Run:ai's GPU orchestration capabilities

  • Kubernetes for container and workload orchestration

  • Milvus vector database for ingesting and searching vectorized data


The architecture offers:

  • Hardware-, software-, and cloud-agnostic deployment options

  • Streamlined GenAI application development

  • Workload portability across cloud and on-premises environments

  • Optimized model loading and unloading for complex inference workflows


Infrastructure for the AI Era


These developments reflect WEKA's comprehensive approach to addressing technical challenges in AI adoption. Recent studies indicate that data management (32%) and security (26%) remain top technical inhibitors to AI/ML success. WEKA's enhanced platform aims to solve these challenges through unified access, hybrid cloud support, enhanced data liquidity, and streamlined data pipelines.

