<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Compute on ben&#39;s blog</title>
    <link>https://benjamin.mendes.im/tags/compute/</link>
    <description>Recent content in Compute on ben&#39;s blog</description>
    <generator>Hugo -- 0.152.0</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 23 May 2026 13:34:26 +0100</lastBuildDate>
    <atom:link href="https://benjamin.mendes.im/tags/compute/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The bet hiding inside the AI hardware boom</title>
      <link>https://benjamin.mendes.im/posts/2026/compute-thesis-architecture-dependency/</link>
      <pubDate>Sat, 23 May 2026 13:34:26 +0100</pubDate>
      <guid>https://benjamin.mendes.im/posts/2026/compute-thesis-architecture-dependency/</guid>
      <description>&lt;p&gt;&lt;img loading=&#34;lazy&#34; src=&#34;https://benjamin.mendes.im/i1/Screenshot%202026-05-23%20at%2013.32.58.png&#34;&gt;&lt;/p&gt;
&lt;p&gt;There is a quiet but very expensive bet being made across the AI compute layer right now, and I think it deserves more scrutiny than it is getting.&lt;/p&gt;
&lt;p&gt;The bet is that the best way to handle the growing demand for AI compute is to build silicon shaped around the architecture we have today. In practice, that means chips increasingly tuned for transformers. &lt;a href=&#34;https://www.etched.com/&#34;&gt;Etched&lt;/a&gt; is the clearest example, with hardware designed explicitly around transformer workloads. But the broader pattern shows up across the industry too: more memory bandwidth tuned for attention, more matrix throughput tuned for the operations LLMs actually use, and more interconnect tuned for the shapes of current models.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p><img loading="lazy" src="/i1/Screenshot%202026-05-23%20at%2013.32.58.png"></p>
<p>There is a quiet but very expensive bet being made across the AI compute layer right now, and I think it deserves more scrutiny than it is getting.</p>
<p>The bet is that the best way to handle the growing demand for AI compute is to build silicon shaped around the architecture we have today. In practice, that means chips increasingly tuned for transformers. <a href="https://www.etched.com/">Etched</a> is the clearest example, with hardware designed explicitly around transformer workloads. But the broader pattern shows up across the industry too: more memory bandwidth tuned for attention, more matrix throughput tuned for the operations LLMs actually use, and more interconnect tuned for the shapes of current models.</p>
<p>The compute layer is not just scaling. It is co-evolving with the architecture sitting on top of it.</p>
<p>I understand why this is happening. The demand is real, it is growing quickly, and infrastructure teams need to ship hardware that works for today’s workloads. Transformers dominate for a reason. They are useful, scalable, and already deeply embedded in the current AI stack. If you are trying to reduce cost, increase throughput, and serve billions of model calls, it makes sense to optimize around the thing everyone is actually running.</p>
<p>But the question that keeps nagging at me is this: <strong>compute demand is not a constant. It is a function of the architecture you choose to run.</strong></p>
<p>Or more precisely, it is a function of both the architecture and the capabilities you are trying to unlock. Those two things move together. AI is not the first workload to become compute-heavy at scale. Search, video transcoding, and crypto mining all created massive infrastructure demands. What feels different about LLMs is the compute cost per use. Training is the obvious tax, but inference is the larger and stickier one. Every conversation, every code completion, every agent step, and every generated token draws from the same well.</p>
<p>That makes the hardware question especially important. If the dominant architecture is expensive per use, then the industry has two ways to respond. One is to build more efficient hardware for that architecture. The other is to find architectures that change the cost structure entirely.</p>
<p>Right now, billions of dollars are flowing toward the first path. Chips are being built for the transformer era. The risk is that this pays off only if the transformer remains dominant long enough for those chips to recoup their design and deployment costs.</p>
<p>That is not a theoretical concern. Researchers across multiple labs are actively trying to displace the transformer, or at least reduce its centrality. State-space models, mixture-of-experts variants, diffusion-based language models, hybrid retrieval systems, and other approaches are all attempts to change the performance-cost frontier. Some may fail. Some may become components inside transformer-like systems rather than replacements. But the direction of travel is clear: there is serious effort going into making today’s architecture less inevitable.</p>
<p>There is also a strong counterargument. Many of the candidates that might replace or weaken the transformer still rely on similar low-level operations. Matrix multiplications, high memory bandwidth, large activations, and attention-like primitives do not disappear overnight. A chip built for transformer workloads may turn out to be more general than the phrase “transformer-native” implies. If the next architecture inherits enough of the same computational primitives, then the silicon ages more slowly than my critique suggests.</p>
<p>That is why I do not think the compute bet is obviously wrong. The tension is what makes it worth watching.</p>
<p>Transformer-specific silicon is a bet that the transformer remains dominant long enough to amortize the hardware. If new architectures arrive quickly and meaningfully change the workload, that bet looks worse than buying more general-purpose GPUs. If the transformer remains the core architecture for several more years, then transformer-native chips could become a generational advantage.</p>
<p>The key point is that these are two sides of the same coin. You cannot argue that architectural change is plausible without also admitting that specialized compute is riskier. But you also cannot argue that specialization is reckless without admitting that architectural durability would make it enormously valuable.</p>
<p>So where does that leave the current wave of AI silicon?</p>
<p>I do not think the work happening at the compute layer is wrong. The current architecture is dominant for a reason, and chips tuned for it will likely deliver real performance gains over the next few years. What I am less convinced by is the quiet assumption underneath the investment: that the architecture will stay dominant for as long as the silicon needs to remain relevant.</p>
<p>To be fair, these chips do not necessarily need a decade. At hyperscaler volume, they may only need a few strong years to amortize. That makes the bet shorter and more defensible than the harshest version of the critique implies. If the transformer era lasts long enough, the payoff could be enormous. If it does not, a lot of very expensive silicon may age quickly.</p>
<p>Building chips around a single architectural shape, at a moment when serious researchers are trying to replace it, is a bold bet.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
