AMD Chief Technology Officer Mark Papermaster and Liquid AI CEO Ramin Hasani explore how efficient AI architecture can unlock scalable, enterprise-ready intelligence across PCs, devices and the edge.
As generative AI races from novelty to necessity, the industry is confronting a hard reality: building ever-bigger models alone is not a sustainable path forward. Compute intensity, energy consumption and latency have become critical constraints. The next phase of AI will be defined not only by scale but also by efficiency – how intelligently systems are designed, from silicon through software, to deliver real-world value.
Efficiency was among the central themes of a recent episode of AMD’s Advanced Insights series with Mark Papermaster, executive vice president and chief technology officer at AMD, and Ramin Hasani, co-founder and CEO of Liquid AI. Their conversation offered a view into how first-principles thinking is reshaping AI inference and enabling intelligence to run where it matters most: on devices, close to users and data.
Moving Beyond Cloud-Only AI
For much of the past decade, AI innovation has been synonymous with hyperscale data centers and ever-larger foundation models trained on vast GPU clusters. While that approach has driven remarkable breakthroughs, both leaders agreed it represents only part of the AI opportunity.
Hasani explained that Liquid AI was founded around a different question: Where should intelligence live? “It’s not just about quality,” he said. “We think about where AI is served, the latency you get, how much battery it consumes and what that means for real hardware outside the data center.”
As AI expands into PCs, edge devices and embedded systems, the economics of inference – power efficiency, responsiveness and cost – become paramount. Papermaster said AI efficiency is already visible in the cloud, with models shrinking while delivering higher accuracy and larger context windows. The challenge now is to extend that efficiency beyond the data center, to systems constrained by power, thermals and footprint.
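To see why those economics favor low-power silicon, a back-of-envelope sketch helps: energy per token is simply power draw divided by token throughput, so an efficient processor can win decisively per token even when it is slower in absolute terms. The power and throughput figures below are illustrative assumptions, not numbers from the episode:

```python
# Back-of-envelope inference economics. All figures are hypothetical
# assumptions chosen for illustration, not measured benchmarks.

def energy_per_token_mj(power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token in millijoules: power / throughput."""
    return power_watts / tokens_per_second * 1000.0

# A hypothetical data-center GPU serving a large model vs. a
# hypothetical on-device NPU serving a compact model.
cloud_gpu = energy_per_token_mj(power_watts=300.0, tokens_per_second=60.0)
device_npu = energy_per_token_mj(power_watts=5.0, tokens_per_second=20.0)

print(f"cloud GPU:  {cloud_gpu:6.0f} mJ/token")   # 5000 mJ/token
print(f"device NPU: {device_npu:6.0f} mJ/token")  #  250 mJ/token
```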
Liquid AI’s approach centers on building compact, specialized foundation models designed with hardware in the loop from the start. Instead of trillion-parameter architectures, the company focuses on models orders of magnitude smaller that are optimized to run efficiently on processors such as neural processing units (NPUs).
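The appeal of "orders of magnitude smaller" comes down to simple arithmetic: weight memory scales linearly with parameter count and bytes per parameter, and that footprint determines whether a model fits a device's memory budget at all. A minimal sketch, using standard precision sizes and illustrative model classes:

```python
# Approximate memory footprint of model weights alone; a simplification
# that ignores KV cache, activations and runtime overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_footprint_gb(num_params: float, precision: str) -> float:
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for params, label in [(1e12, "1T-class"), (7e9, "7B-class"), (1e9, "1B-class")]:
    sizes = ", ".join(
        f"{p}: {weight_footprint_gb(params, p):7.1f} GB" for p in BYTES_PER_PARAM
    )
    print(f"{label:>8} -> {sizes}")
```

At fp16, a trillion-parameter model needs roughly 2 TB for its weights alone, while a 1B-parameter model quantized to int4 fits in about half a gigabyte – small enough to live on a laptop or phone.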
“NPUs are where you can sustain AI workloads at very low power,” Hasani said. “That’s critical for battery health and for enabling intelligence that runs continuously in the background.”
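Hasani's point about battery life can also be made concrete with simple arithmetic. The sketch below assumes a hypothetical 60 Wh laptop battery and illustrative sustained power draws; the takeaway is that an always-on background workload is only viable when the silicon running it draws well under a watt or two:

```python
# Hypothetical battery-draw arithmetic for an always-on background AI
# workload. Battery capacity and power figures are illustrative assumptions.

BATTERY_WH = 60.0  # a typical thin-and-light laptop battery

def battery_fraction_per_day(sustained_watts: float) -> float:
    """Fraction of one full charge consumed by 24 hours of sustained load."""
    return sustained_watts * 24.0 / BATTERY_WH

for watts, silicon in [(0.5, "NPU"), (8.0, "GPU")]:
    pct = battery_fraction_per_day(watts) * 100.0
    print(f"{silicon}: {watts:4.1f} W sustained -> {pct:5.0f}% of a charge per day")
```

At half a watt, a day of continuous background inference costs about a fifth of a charge; at 8 W it would burn through more than three full charges.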
Papermaster said this philosophy aligns with AMD’s long-standing focus on holistic design. From CPUs and GPUs to NPUs and system architectures, AMD engineers optimize performance per watt across the computing stack so AI can move out of the data center and into everyday systems.
