Key Considerations for Selecting the Right Memory Technology for AI and Edge Computing Systems
Key takeaways:
Memory as a Core AI Performance Driver: Bandwidth, power, latency and thermal behavior now shape throughput, energy efficiency and reliability more than raw compute alone.
Different Workloads Demand Different Memories: HBM suits bandwidth-intensive training, LPDDR is ideal for power-constrained edge inference and non-volatile memories enable instant-on, resilient systems.
Effective Trade-offs Require Expertise: Partners like McKinsey Electronics help align memory choices with performance targets, power limits and long-term scalability.

Memory selection has become one of the most critical architectural decisions in modern AI and edge computing systems. As model sizes grow and inference moves closer to the edge, memory subsystems increasingly determine system throughput, energy efficiency, latency and even functional feasibility. Engineers must balance competing constraints across bandwidth, power, capacity, cost and integration complexity, often under tight thermal and form-factor limits.
Let’s explore the practical considerations engineers face when selecting memory technologies for AI training and edge inference, focusing on DRAM variants and emerging non-volatile memories.

Why Memory Is a Bottleneck in AI and Edge Systems
AI workloads are dominated by data movement rather than arithmetic. Matrix multiplications, attention mechanisms and convolution layers continuously stream weights and activations between compute units and memory. This creates the well-known memory wall, where performance scaling is limited by memory bandwidth and latency rather than compute capability.
In edge systems, the challenge is compounded by strict power budgets (often <10 W), limited cooling and the need for fast wake-up and deterministic latency.
As a result, memory choice directly affects not only performance, but energy per inference, thermal stability and system reliability.
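The energy cost of data movement can be made concrete with a back-of-envelope sketch. The per-bit transfer energies and the 50 MB traffic figure below are illustrative assumptions, not datasheet values:

```python
# Back-of-envelope energy-per-inference estimate from memory traffic alone.
# The pJ/bit transfer energies below are assumed, illustrative values.
DRAM_PJ_PER_BIT = {"HBM": 3.5, "LPDDR5": 2.0, "DDR5": 7.0}

def energy_per_inference_mj(bytes_moved: float, tech: str) -> float:
    """Millijoules spent moving `bytes_moved` bytes to and from memory."""
    picojoules = bytes_moved * 8 * DRAM_PJ_PER_BIT[tech]
    return picojoules * 1e-9  # pJ -> mJ

# Example: 50 MB of weights + activations streamed per inference
for tech in ("HBM", "LPDDR5", "DDR5"):
    print(tech, round(energy_per_inference_mj(50e6, tech), 2), "mJ")
```

At tens of inferences per second, this traffic alone consumes a meaningful slice of a sub-10 W edge power budget, which is why energy per transferred bit matters as much as peak bandwidth.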
Memory Technology Options and Trade-Offs

Bandwidth vs Power: A Central Trade-Off
HBM for AI Training
HBM achieves its bandwidth through 3D stacking and very wide interfaces (1024-bit or wider). Modern AI accelerators can exceed 1 TB/s aggregate memory bandwidth, enabling high utilization of thousands of parallel compute units.
However, this comes at the cost of complex silicon interposers or advanced 3D packaging, high power density near the compute die and significant BOM cost.
For large-scale training and high-end inference accelerators, HBM is typically required to keep the compute units fed, but it is overkill for most edge workloads.
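The >1 TB/s figure follows directly from the interface geometry. A minimal sketch, assuming an HBM3-class pin rate of 6.4 Gb/s (actual parts vary; check vendor datasheets):

```python
def hbm_bandwidth_gbs(stacks: int, bus_bits: int = 1024,
                      gbps_per_pin: float = 6.4) -> float:
    """Aggregate bandwidth in GB/s across `stacks` HBM stacks.
    6.4 Gb/s/pin is an HBM3-class assumption, not a specific part."""
    return stacks * bus_bits * gbps_per_pin / 8  # bits -> bytes

print(hbm_bandwidth_gbs(1))  # one 1024-bit stack: ~819 GB/s
print(hbm_bandwidth_gbs(6))  # six stacks: ~4915 GB/s, well past 1 TB/s
```

The same arithmetic shows why the interposer is unavoidable: routing a 1024-bit bus per stack is only practical with 2.5D/3D packaging.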

LPDDR for Edge Inference
LPDDR5/5X offers substantially lower energy per transferred bit compared to DDR5 and HBM. While peak bandwidth is lower, it is often sufficient for quantized or sparsity-optimized inference models.
For battery-powered or thermally constrained systems, LPDDR enables longer operational lifetime, simplified power delivery and smaller form factors.
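Whether LPDDR's lower peak bandwidth is "sufficient" can be checked up front. The model size, inference rate and the ~17 GB/s per-channel figure below are illustrative assumptions:

```python
def required_bandwidth_gbs(params_millions: float, bytes_per_param: float,
                           inferences_per_sec: float) -> float:
    """Sustained bandwidth (GB/s) to stream all weights once per inference.
    Ignores activations and cache reuse -- a deliberately pessimistic sketch."""
    bytes_per_inf = params_millions * 1e6 * bytes_per_param
    return bytes_per_inf * inferences_per_sec / 1e9

# 100M-parameter model quantized to INT8 (1 byte/param) at 30 inferences/s:
need = required_bandwidth_gbs(100, 1.0, 30)  # 3.0 GB/s
channel_gbs = 17.0  # ~one 16-bit LPDDR5X channel at 8533 MT/s (approx.)
print(need, "GB/s needed;", "fits" if need < channel_gbs else "exceeds", "one channel")
```

Quantization and sparsity shrink `bytes_per_param`, which is exactly how edge models are pulled inside an LPDDR bandwidth envelope.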
Latency, Persistence and System Behavior
Volatile vs Non-Volatile Memory
Traditional DRAM loses state when power is removed, requiring model reloads and reinitialization. In contrast, MRAM and FeRAM retain data without power, enabling instant-on behavior and reduced boot latency.
This is particularly valuable in industrial edge nodes, automotive systems and intermittently powered devices.
Recent MRAM developments have reduced access latency significantly, making it suitable for on-chip buffers, metadata storage and checkpointing. However, it remains slower than SRAM and lacks the density and cost efficiency required to replace DRAM at scale.
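The instant-on benefit can be quantified as avoided reload time. A sketch with assumed flash throughput and initialization cost (hypothetical numbers, for illustration only):

```python
def cold_start_ms(model_mb: float, flash_mbps: float = 400.0,
                  init_ms: float = 50.0) -> float:
    """Cold-boot cost: reload a model from flash plus a fixed init overhead.
    Both defaults are assumptions. With weights or state retained in
    non-volatile memory, this cost shrinks toward init_ms or zero."""
    return model_mb / flash_mbps * 1000 + init_ms

print(cold_start_ms(200))  # 200 MB model: 550.0 ms paid on every cold boot
```

For intermittently powered devices that wake many times per hour, this per-wake cost multiplied out is often the strongest argument for MRAM or FeRAM state retention.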
Architectural Implications for Engineers
AI Training Accelerators
Bandwidth dominates over latency.
HBM is typically mandatory to avoid compute underutilization.
Thermal co-design between memory and compute is essential.
Supply constraints and cost volatility must be factored early.
Edge AI Inference
Power efficiency and deterministic latency dominate.
LPDDR is usually the primary working memory.
Non-volatile memories can offload model storage or retain system state.
Memory capacity often constrains model size more than compute.
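The capacity constraint in the last point is easy to check before committing to a part. A minimal sketch; the 20% activation/runtime overhead is an assumed rule of thumb, not a measured figure:

```python
def fits_in_memory(params_billions: float, bits_per_param: int,
                   capacity_gb: float, overhead: float = 0.2) -> bool:
    """True if the model's weights, plus an assumed fractional overhead for
    activations and runtime buffers, fit in the working memory."""
    weights_gb = params_billions * bits_per_param / 8
    return weights_gb * (1 + overhead) <= capacity_gb

print(fits_in_memory(7, 16, 8))  # 7B params in FP16: 14 GB of weights -> False
print(fits_in_memory(7, 4, 8))   # same model 4-bit quantized: 3.5 GB -> True
```

This is the sense in which capacity, not compute, often sets the model-size ceiling at the edge: the same accelerator runs the quantized model but simply cannot hold the FP16 one.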
Always-On and Safety-Critical Systems
Non-volatility and endurance become primary requirements.
MRAM and FeRAM enable resilience to power loss.
Hybrid memory hierarchies are increasingly common.
Emerging Trends Engineers Should Track

Hybrid Memory Hierarchies
Rather than relying on a single memory type, many designs now combine LPDDR or DDR for active computation and non-volatile memory for persistence and standby reduction.
This approach improves energy efficiency while maintaining performance.

HBM Cost and Supply Pressure
AI demand has tightened HBM supply chains, pushing engineers to evaluate whether bandwidth requirements can be relaxed through model compression, sparsity or architectural changes.
Practical Design Checklist
Quantify sustained bandwidth requirements, not peak values.
Evaluate energy per inference, not just throughput.
Consider thermal coupling between memory and compute.
Assess startup latency and data retention needs.
Plan for supply availability and long-term scalability.
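The checklist above can be mechanized as a first-pass filter over candidate parts. Every number below (sustained bandwidths, pJ/bit energies, capacities) is an illustrative assumption, not vendor data; thermal coupling and supply risk still require engineering judgment outside the code:

```python
from dataclasses import dataclass

@dataclass
class MemoryCandidate:
    name: str
    sustained_gbs: float   # derated sustained bandwidth, not the peak value
    pj_per_bit: float      # assumed transfer energy
    capacity_gb: float

def shortlist(candidates, need_gbs, need_gb, energy_budget_mj, bytes_per_inf):
    """Keep candidates meeting the sustained-bandwidth, capacity and
    energy-per-inference targets from the checklist."""
    keep = []
    for c in candidates:
        energy_mj = bytes_per_inf * 8 * c.pj_per_bit * 1e-9
        if (c.sustained_gbs >= need_gbs and c.capacity_gb >= need_gb
                and energy_mj <= energy_budget_mj):
            keep.append(c.name)
    return keep

parts = [MemoryCandidate("LPDDR5X", 12.0, 2.0, 16),
         MemoryCandidate("DDR5", 30.0, 7.0, 64),
         MemoryCandidate("HBM3", 600.0, 3.5, 24)]
print(shortlist(parts, need_gbs=5, need_gb=8,
                energy_budget_mj=1.0, bytes_per_inf=50e6))  # ['LPDDR5X']
```

Filtering on sustained figures and energy per inference, rather than peak bandwidth alone, is precisely what steers edge designs toward LPDDR even when faster parts exist.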
Selecting the right memory technology for AI and edge computing is a multi-dimensional optimization problem. HBM enables unmatched performance for training but brings cost and thermal challenges. LPDDR remains the most practical choice for edge inference, while emerging non-volatile memories offer compelling advantages for persistence and power efficiency.
As AI systems continue to evolve, memory architecture is no longer a secondary concern; it is a primary determinant of system performance, efficiency and reliability.
To navigate these trade-offs in real deployments, system designers increasingly rely on partners who understand both the semiconductor roadmap and the realities of edge and AI system integration. McKinsey Electronics supports customers across AI training and edge computing applications by combining deep component expertise with an engineering-led distribution model. From sourcing advanced DRAM and low-power LPDDR solutions to advising on emerging non-volatile memories and long-term availability risks, McKinsey Electronics helps architects translate memory choices into resilient, efficient and scalable system designs in which performance targets, power budgets and reliability requirements are all met simultaneously.