The Macro: GPUs Are the Wrong Tool for Inference and Everyone Knows It
The AI infrastructure industry has a hardware problem that gets worse every quarter. GPUs are excellent for training. They are mediocre for inference. The reason is physics: inference workloads are dominated by memory bandwidth, not compute. You spend most of the time moving data between memory and processing units, not doing the actual math. This is the “memory wall,” and it means that the majority of your expensive GPU silicon sits idle during inference, waiting for data to arrive.
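The memory-wall claim is easy to check with back-of-envelope arithmetic. The sketch below uses illustrative round numbers (a 70B-parameter model in FP16 on an accelerator with roughly 3.3 TB/s of memory bandwidth and 1,000 TFLOPS of peak compute, not any specific vendor's figures) to compare the time spent streaming weights per generated token against the time spent on the actual math:

```python
# Back-of-envelope roofline for single-batch LLM decoding. All numbers
# are illustrative assumptions, not measurements of any specific chip.

PARAMS = 70e9            # model parameters (70B-class model)
BYTES_PER_PARAM = 2      # FP16
MEM_BW = 3.3e12          # bytes/s of memory bandwidth (~3.3 TB/s)
PEAK_FLOPS = 1.0e15      # peak compute (~1,000 TFLOPS)

def decode_step_times():
    """Per-token time to stream all weights once from memory vs. time to
    execute the matmul FLOPs (~2 FLOPs per parameter per token)."""
    t_mem = PARAMS * BYTES_PER_PARAM / MEM_BW   # ~42 ms moving data
    t_compute = 2 * PARAMS / PEAK_FLOPS         # ~0.14 ms doing math
    return t_mem, t_compute

t_mem, t_compute = decode_step_times()
# Memory time dwarfs compute time: the silicon idles waiting on bandwidth.
print(f"memory-bound by roughly {t_mem / t_compute:.0f}x")
```

Under these assumptions the processor spends hundreds of times longer waiting on memory than computing, which is the memory wall in one division.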
Cerebras solved this by building a wafer-scale chip where memory and compute sit next to each other, eliminating the data movement bottleneck entirely. The result is inference speeds that make GPU-based systems look glacial. But Cerebras hardware is exotic, expensive, and requires a fundamentally different deployment model. You cannot just rent Cerebras capacity the way you rent GPU instances.
The rest of the inference market has tried to optimize around the GPU’s limitations. NVIDIA’s TensorRT optimizes model execution on their hardware. Groq built an LPU (Language Processing Unit) specifically designed for sequential inference workloads. Together AI and Fireworks AI offer optimized inference APIs. AWS Inferentia provides custom inference chips in the cloud. But all of these approaches either accept the memory wall and optimize around it or require entirely new hardware ecosystems.
Piris Labs, backed by Y Combinator (W25), is taking a different path: photonic hardware that uses light instead of electrons to move data, which is fundamentally faster and more energy-efficient. The claim is Cerebras-level inference speed, but scalable and at a fraction of the cost of traditional GPU clusters.
The Micro: Light Instead of Electrons
Ali Khalatpour (CEO) and Keyvan Moghadam (President) founded Piris Labs in San Francisco. The company describes itself as “The Unified Fabric for AI,” which speaks to the interconnect layer, the part of the system responsible for moving data between compute units.
The technical thesis is that the data movement bottleneck is best solved by changing the physical medium rather than optimizing around it. Photonic interconnects have been used in data centers for years for long-distance links. What Piris Labs appears to be doing is bringing photonics closer to the compute, using light to handle the short-distance data movement that currently relies on copper traces and electrical signaling.
The physics argument is sound. Optical signals lose far less energy over distance than electrical ones, generate no resistive heat in the channel, and can carry many independent signals simultaneously over a single waveguide through wavelength division multiplexing. In theory, photonic interconnects should dramatically reduce the latency and energy cost of moving data within an inference system.
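Wavelength division multiplexing is where the bandwidth advantage comes from: each color of light is an independent channel, so aggregate link bandwidth scales with the channel count. The numbers below are generic illustrations (a common 112 Gbps electrical SerDes lane rate, eight wavelengths), not Piris Labs specifications:

```python
# Illustrative WDM arithmetic: total bandwidth of one physical link
# scales with the number of wavelength channels sharing it. The channel
# count and per-channel rate are assumptions for illustration only.

def wdm_bandwidth_gbps(channels: int, gbps_per_channel: float) -> float:
    """Aggregate bandwidth when `channels` wavelengths share one waveguide."""
    return channels * gbps_per_channel

# One electrical lane vs. one waveguide carrying 8 colors at the same rate:
electrical = wdm_bandwidth_gbps(1, 112)   # single 112 Gbps SerDes lane
optical = wdm_bandwidth_gbps(8, 112)      # 8 wavelengths, one waveguide
print(optical / electrical)  # -> 8.0
```

The same multiplication that made long-haul fiber economical is what photonic interconnect companies are trying to bring down to chip-to-chip distances.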
In practice, photonic computing has been “just around the corner” for decades. Companies like Lightmatter, Luminous Computing (now part of Alphabet), and Celestial AI have been working on optical interconnects and photonic computing for years. The challenge has always been manufacturing: making photonic components cheaply enough, reliably enough, and at the density required to compete with mature semiconductor processes.
What makes Piris Labs’ approach interesting is that they are offering a full-stack inference service, not selling hardware. You do not buy their photonic chips. You send inference requests to their API and get results back. This means the hardware complexity is their problem, not the customer’s. If the photonic system delivers the promised performance, the customer does not care how it works underneath.
The “improved effective FLOP utilization” claim is the technical metric that matters. Current GPU clusters waste enormous compute capacity because the processors are starved for data. If Piris Labs can keep processors fed with data continuously through faster photonic interconnects, the same compute hardware delivers dramatically more useful work per second.
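Effective FLOP utilization (often called model FLOPs utilization, or MFU) is simple to define: useful model FLOPs actually delivered per second, divided by the hardware's peak. A hedged sketch with hypothetical throughput numbers shows why faster interconnects move this metric so dramatically:

```python
# Sketch of effective FLOP utilization (MFU). All throughput and peak
# numbers are hypothetical, chosen only to illustrate the mechanism.

def mfu(tokens_per_sec: float, flops_per_token: float, peak_flops: float) -> float:
    """Achieved useful FLOP/s divided by the hardware's peak FLOP/s."""
    return tokens_per_sec * flops_per_token / peak_flops

FLOPS_PER_TOKEN = 2 * 70e9   # ~2 FLOPs per parameter, 70B-class model
PEAK = 1.0e15                # ~1,000 TFLOPS peak compute

# A bandwidth-starved system serving 25 tok/s vs. a hypothetical system
# whose interconnect keeps the same silicon fed at 1,000 tok/s:
print(f"starved: {mfu(25, FLOPS_PER_TOKEN, PEAK):.2%}")
print(f"fed:     {mfu(1000, FLOPS_PER_TOKEN, PEAK):.2%}")
```

In this toy comparison the compute hardware is identical in both rows; only the rate at which data arrives changes, and utilization moves by 40x. That is the lever Piris Labs is claiming to pull.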
The Verdict
Photonic AI hardware is a high-risk, high-reward bet. If the technology works at scale, it could fundamentally change the economics of AI inference. If the manufacturing challenges prove too difficult, it joins the long list of photonic computing ventures that showed promise but never delivered.
At 30 days: what is the actual inference speed on standard benchmarks compared to NVIDIA H100 clusters and Cerebras? Published benchmarks on common models (Llama, Mistral, etc.) would be the most credible evidence.
At 60 days: is manufacturing scaling? The gap between a working lab prototype and a production system running 24/7 is where most hardware companies fail. Reliability, yield, and cost at scale are the questions.
At 90 days: are real customers running production inference through the service? If paying customers are generating meaningful inference volume, the technology has crossed the credibility threshold. If it is still in private alpha with a handful of test users, the commercial viability remains unproven.
I respect the ambition. Attacking the memory wall with photonics is the right long-term approach to AI inference, and building a service rather than selling hardware is the right business model. The question is whether Piris Labs can solve the manufacturing and engineering challenges that have defeated larger, better-funded photonic computing companies before them.