
Introduction
At GTC 2026, Nvidia made one architectural decision that defines the next chapter of AI infrastructure — and it has significant investment implications beyond Nvidia itself.
The company has shelved its previously planned Rubin CPX chip, which was designed to handle the prefill stage of inference. Prefill is compute-intensive but bandwidth-light: it processes the entire input prompt in parallel using dense matrix operations well suited to GPUs, so it does not need the extreme bandwidth of HBM. CPX was therefore designed around cheaper conventional DRAM. That cost logic collapsed when DRAM and HBM prices converged during the current memory shortage, removing the economic rationale for a separate prefill chip.
In CPX’s place, Nvidia has deployed technology from its $20 billion acquisition of Groq — integrating the Groq 3 LPU as a dedicated accelerator for the decode stage of inference, where tokens are generated one by one under tight latency constraints and high memory-bandwidth demand. This is where Groq’s SRAM-heavy architecture excels. The result is a disaggregated inference pipeline — Rubin GPUs handle prefill, Groq LPUs handle decode — orchestrated by Dynamo 1.0, Nvidia’s new open-source inference operating system.
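To make the prefill/decode split concrete, the sketch below is a toy Python illustration of our own, with tiny stand-in dimensions; it is not Nvidia's implementation. Prefill is one large parallel matrix multiply over the whole prompt, while decode re-reads the full weight matrix once per generated token, which is why the former is compute-bound and the latter bandwidth-bound.

```python
# Toy illustration of why prefill and decode stress hardware differently.
# Dimensions are tiny stand-ins, not real model sizes.
import numpy as np

d_model, vocab = 1024, 8000
rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, vocab), dtype=np.float32)  # stand-in for model weights

def prefill(prompt_embeddings: np.ndarray) -> np.ndarray:
    # One large, parallel matmul over ALL prompt tokens at once:
    # arithmetic-intensive, which is what throughput-oriented GPUs excel at.
    return prompt_embeddings @ W            # shape: (prompt_len, vocab)

def decode(state: np.ndarray, steps: int) -> list[int]:
    tokens = []
    for _ in range(steps):
        # One token at a time: each step re-reads all of W for a sliver of
        # arithmetic, so memory bandwidth and latency dominate. This is the
        # stage Groq's SRAM-heavy LPU is positioned to accelerate.
        logits = state @ W                  # shape: (1, vocab)
        tokens.append(int(logits.argmax()))
        state = rng.standard_normal((1, d_model), dtype=np.float32)  # toy state update
    return tokens

prompt = rng.standard_normal((128, d_model), dtype=np.float32)  # 128-token prompt
_ = prefill(prompt)                         # compute-bound stage
print(decode(prompt[:1], steps=4))          # bandwidth-bound stage
```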
One objection is worth flagging up front: does the rise of SRAM-heavy inference chips from Groq, Cerebras, and others reduce the overall need for HBM and DRAM? The short answer is no. Training remains overwhelmingly HBM- and DRAM-bound, and even within inference the prefill stage must stream the full set of model weights, which are far too large for on-chip SRAM, from high-bandwidth external memory. Section 2 sets out this argument in full.
The investment punchline: this architectural shift is a direct tailwind for Samsung (manufacturing the LPX, supplying both SRAM and HBM), the memory complex broadly (as agentic AI makes memory the binding constraint on inference economics), and the optical networking supply chain (as Nvidia formally commits to co-packaged optics across scale-out today and scale-up from the Feynman generation onward).
Section 1 — The Architecture: Groq, Dynamo, and Disaggregated Inference
The strategic logic of the Groq acquisition mirrors the Mellanox playbook. Just as Mellanox gave Nvidia ownership of the networking layer in 2020, Groq gives it ownership of the latency-sensitive decode layer in inference — the competitive front where Google TPUs and other in-house chips had been making inroads.
The Groq 3 LPX is already in volume production at Samsung, with availability expected in Q3. Nvidia is offering the chip in a dedicated rack housing 256 LPUs, designed to sit alongside the Vera Rubin GPU rack rather than replace it. Huang’s guidance on deployment mix was explicit: for predominantly high-throughput workloads, Vera Rubin GPUs alone remain the right choice; for coding and latency-sensitive workloads, adding LPX capacity to roughly 25% of a data centre’s footprint is optimal.
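As a worked example of that guidance: the 256-LPU rack and the roughly 25% mix come from the announcement, while the total footprint below is our own hypothetical.

```python
# Worked example of the deployment-mix guidance. Total rack count is an
# assumption; LPUs per rack (256) and the ~25% share are from the keynote.
total_racks = 100                          # hypothetical data-centre footprint
lpx_share = 0.25                           # "roughly 25%" for latency-sensitive work
lpus_per_rack = 256                        # dedicated LPX rack configuration

lpx_racks = round(total_racks * lpx_share)
print(f"{lpx_racks} LPX racks -> {lpx_racks * lpus_per_rack:,} LPUs "
      f"alongside {total_racks - lpx_racks} Vera Rubin GPU racks")
# 25 LPX racks -> 6,400 LPUs alongside 75 Vera Rubin GPU racks
```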
Dynamo 1.0 is the software layer that makes this split work in practice. It disaggregates the inference pipeline — routing prefill and attention work to Rubin GPUs, and decode and token generation to Groq LPUs — delivering up to 7x inference performance improvement on Blackwell GPUs, with the combined GPU-LPU system claiming up to 35x higher inference throughput per megawatt.
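A conceptual sketch of what disaggregation means at the scheduling level follows. This is not Dynamo's actual API; every class and method name is hypothetical. The structural point is that prefill and decode become independently schedulable stages on separate hardware pools, with the KV cache handed off between them.

```python
# Conceptual sketch of disaggregated inference scheduling. NOT Dynamo's real
# API: all names here are hypothetical stand-ins for the idea of separate
# prefill and decode pools connected by a KV-cache handoff.

class ToyWorker:
    def __init__(self, kind: str):
        self.kind = kind

    def prefill(self, prompt: str) -> dict:
        # Stands in for a Rubin GPU pass: one parallel sweep over the prompt
        # that produces the KV cache consumed by decode.
        return {"kv_cache_for": prompt}

    def decode(self, kv_cache: dict, n_tokens: int) -> str:
        # Stands in for a Groq LPU loop: sequential, latency-sensitive
        # token generation against the handed-off KV cache.
        return " ".join(f"tok{i}" for i in range(n_tokens))

class DisaggregatedRouter:
    def __init__(self):
        self.gpu_pool = [ToyWorker("rubin_gpu")]   # compute-bound prefill pool
        self.lpu_pool = [ToyWorker("groq_lpu")]    # bandwidth-bound decode pool

    def serve(self, prompt: str, max_new_tokens: int) -> str:
        kv = self.gpu_pool[0].prefill(prompt)               # stage 1: GPU
        return self.lpu_pool[0].decode(kv, max_new_tokens)  # stage 2: LPU

print(DisaggregatedRouter().serve("summarise the GTC keynote", 5))
```

The KV-cache handoff is the structural change: each pool can be sized and utilised for the stage it suits rather than GPUs idling through the sequential decode loop, which is plausibly where the claimed per-megawatt gains originate.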
Looking further out, Nvidia has already previewed the Feynman generation — which includes a next-generation LPU alongside new GPU and CPU architectures — cementing the LPU as a permanent fixture in Nvidia’s roadmap rather than a one-generation experiment. The trillion-dollar demand figure Huang cited — orders for Blackwell and Vera Rubin systems through 2027, double last year’s $500 billion guidance — is the commercial validation of this full-stack bet.
Section 2 — Investment Implications
i. Samsung — Winning Across Every Vector
Samsung is the standout beneficiary of GTC 2026, with exposure across three distinct revenue streams from the new architecture: fabricating the Groq 3 LPX in its foundry, supplying the SRAM that the LPU is built around, and supplying HBM for the Rubin GPUs on the prefill side.
No other single name in the supply chain has this breadth of exposure to the new Nvidia architecture. The Samsung SRAM position deserves particular attention: SRAM is manufactured using a logic-style process — closer in character to advanced foundry work than to commodity memory fabrication — and Samsung’s independent foundry business gives it both the process know-how and the capacity to scale this alongside its conventional memory operations. Memory and foundry, DRAM and SRAM — Samsung wins across every dimension simultaneously.
ii. Memory Broadly — A Regime Change, Not a Cycle
The memory sector has historically been valued as a cyclical. GTC 2026 reinforces why the AI era demands a different framework.
It is worth addressing a bear case that has gained traction in recent months: the idea that SRAM-heavy inference chips from Groq, Cerebras, and others will reduce the overall need for HBM and DRAM. This misreads the architecture. Training remains entirely dependent on high-bandwidth external memory and is not affected by inference chip proliferation. Within inference itself, the prefill stage — which runs input tokens through the full set of model weights and parameters, and maintains context memory across increasingly long sequences — cannot be served by on-chip SRAM alone. The weight matrices of frontier models are simply too large. As model sizes and context windows continue to grow with each generation of agentic AI, the external memory load on the prefill side increases rather than decreases. SRAM on inference chips solves the decode latency problem; it does not displace the need for HBM and DRAM in training or prefill.
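A back-of-envelope calculation makes the scale mismatch concrete. Every figure below is our illustrative assumption (parameter count, precision, and on-chip SRAM capacity), not a vendor specification.

```python
# Why frontier weights cannot live in on-chip SRAM: a back-of-envelope sketch.
# All figures are illustrative assumptions, not vendor specifications.
params = 1e12                 # assume a 1-trillion-parameter frontier model
bytes_per_param = 1           # assume FP8 weights
weight_bytes = params * bytes_per_param            # ~1 TB of weights

sram_per_chip_bytes = 500e6   # assume ~500 MB of SRAM on an LPU-class die
chips_to_hold_weights = weight_bytes / sram_per_chip_bytes

print(f"Weights: {weight_bytes / 1e12:.1f} TB")
print(f"SRAM-class chips needed just to park them: ~{chips_to_hold_weights:,.0f}")
# ~2,000 chips before serving a single request: external HBM/DRAM stays essential.
```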
Every architectural decision Nvidia made at GTC reinforces this. Agentic AI workflows — where models reason across long multi-step chains — require large, persistent KV-cache storage, translating directly into sustained DRAM and HBM demand. The convergence of DRAM and HBM pricing that triggered the CPX cancellation is itself a signal of tightness across the memory supply chain, not weakness.
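The KV-cache point can be quantified with the standard transformer sizing formula; the model dimensions below are illustrative assumptions, not those of any specific frontier model.

```python
# Rough KV-cache sizing using the standard transformer formula:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * context_len * bytes_per_value
# All model dimensions below are illustrative assumptions.
layers, kv_heads, head_dim = 80, 8, 128
context_len = 1_000_000        # million-token agentic context (assumed)
bytes_per_value = 2            # FP16

kv_bytes = 2 * layers * kv_heads * head_dim * context_len * bytes_per_value
print(f"KV cache per session: {kv_bytes / 1e9:.0f} GB")   # ~328 GB
# A single long-running agent session can exceed one GPU's HBM capacity,
# which is why persistent KV storage translates into sustained DRAM/HBM demand.
```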
The NAND complex is equally relevant. As AI inference scales, particularly for retrieval-augmented generation and long-context agentic workloads, storage throughput becomes a meaningful bottleneck. Both SSDs and HDDs benefit from this demand, but we have a clear preference for SSDs within this theme: the latency and throughput characteristics of NAND-based solid-state storage are far better suited to real-time inference pipelines than rotational media. Within SSDs, our preferred names are SanDisk and Kioxia as the more focused pure plays; Western Digital retains exposure through its flash business, though with a more blended revenue profile.
The more important shift across the memory complex is structural: as hyperscalers commit to multi-year infrastructure buildouts of this scale, memory manufacturers gain the pricing power and contract visibility — through long-term agreements — that the sector has historically lacked. This is a genuine regime change for how memory should be valued, affecting DRAM, HBM, and NAND players alike.
iii. Optical Networking — Gradual But Inevitable
The copper versus optics debate has run for years. Nvidia’s announcements at GTC 2026 provide the clearest resolution yet — not a sudden switch, but a structured, generational migration that is already underway and accelerating.
The physics of the copper wall is not in dispute. At the bandwidth densities required by large-scale AI clusters, copper’s signal integrity limitations are a hard constraint. Nvidia’s Spectrum-X and Quantum-X silicon photonics switches deliver 5x better power efficiency and 10x higher network resiliency versus pluggable transceivers.
The migration is happening in tranches. Today, CPO is advancing in scale-out — the networking layer that connects racks and clusters. Copper continues to dominate scale-up (within-rack, NVLink connections) in the current Vera Rubin generation. However, Huang’s Feynman generation preview is a critical signal: the Feynman platform explicitly incorporates both copper and co-packaged optics for scale-up, meaning optical is already being designed into the intra-rack layer for the generation after Rubin. Full optical dominance across both layers will come as bandwidth requirements push beyond the 1.6T / 200G per lane threshold — at which point copper’s physics simply cannot keep pace.
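The 1.6T / 200G threshold is easiest to see as lane arithmetic. The port and lane speeds below follow the threshold cited above; the copper-reach figures are representative assumptions of ours, not measured specifications.

```python
# Lane arithmetic behind the "copper wall". Port and lane rates follow the
# 1.6T / 200G-per-lane threshold; the reach figures are assumed, not measured.
port_gbps = 1600
lane_gbps = 200
lanes_per_port = port_gbps // lane_gbps
print(f"{port_gbps}G port = {lanes_per_port} electrical lanes at {lane_gbps}G")  # 8 lanes

# Usable passive-copper (DAC) reach shrinks as lane rate rises (assumed values):
assumed_reach_m = {100: 2.0, 200: 1.0, 400: 0.5}
for rate, reach in assumed_reach_m.items():
    print(f"{rate}G lane -> ~{reach} m of copper")
# As reach shrinks toward intra-rack distances, even scale-up links need optics,
# which is consistent with Feynman designing CPO into the rack from the start.
```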
For investors, the CPO transition, particularly scale-up CPO, unlocks a TAM expansion not yet reflected in consensus estimates for optical supply chain names. The beneficiaries span the full stack.
Section 3 — Beyond the Chip: Space, Software, and the Agent Economy
Vera Rubin Goes to Space
Nvidia announced Space-1 Vera Rubin — a version of its Vera Rubin architecture being designed for orbital deployment, with the goal of bringing AI data centres into space. Commercial timelines remain long-dated, but the announcement reflects how broadly Nvidia is conceiving the demand for inference compute — and how deeply it intends to own the infrastructure layer wherever that compute lives.
NemoClaw, OpenClaw, and the Software Moat
Hardware drives Nvidia's revenue; software is building its moat. NemoClaw is Nvidia's full-stack platform for deploying autonomous AI agents on-premises, enabling one-command installation of Nvidia's Nemotron models and the OpenClaw runtime with enterprise-grade governance and privacy controls.
Nvidia’s embrace of OpenClaw — the open-source agentic AI platform Huang called “the fastest-growing open-source project in history” — deepens this further. By tying OpenClaw’s explosive adoption to Nvidia’s hardware and software ecosystem via NemoClaw, Nvidia is extending its lock-in from the chip level up to the application layer. Combined with Dynamo as the inference operating system, the software switching costs around Nvidia infrastructure are compounding — and that matters as much as the hardware roadmap for long-term competitive positioning.
Conclusion
GTC 2026 confirmed that the inference era is here — and that Nvidia’s response is a fully integrated system, not just a faster chip. The trillion-dollar demand figure is the headline, but the architectural choices — disaggregated inference, the Groq LPU, CPO networking, Dynamo as the inference OS — are the substance.
For investors, the opportunity is not confined to Nvidia’s own multiple. The more compelling risk-reward lies in the supply chain: Samsung across memory and fabrication; the memory complex broadly as AI transforms the demand structure of the sector; and the optical networking supply chain as CPO migrates from scale-out today to scale-up in the Feynman generation and beyond. These names win regardless of which hyperscaler wins the model race — because they are the substrate the inference era is built on.
IMPORTANT DISCLOSURES & DISCLAIMER
This document has been prepared by Lighthouse Canton Pte. Ltd. ("LC"), a company regulated by the Monetary Authority of Singapore ("MAS"), and/or Lighthouse Canton (DIFC) Ltd, regulated by the Dubai Financial Services Authority ("DFSA"). This document is for informational purposes only and does not constitute investment advice, a recommendation, or an offer or solicitation to buy or sell any financial instrument.
This document is directed at and intended for institutional investors and accredited investors (as defined under the Securities and Futures Act of Singapore and/or applicable DIFC regulations). It is not intended for retail investors or members of the public. Recipients should not rely on this document as the basis for any investment decision without obtaining independent professional advice.
The information contained herein has been obtained from sources believed to be reliable, but LC makes no representation or warranty, express or implied, as to its accuracy, completeness, or timeliness. Opinions, estimates, and projections expressed herein are subject to change without notice and do not necessarily reflect the views of LC or its affiliates.
Investments in financial instruments carry risk, including the possible loss of the principal amount invested. Past performance is not indicative of future results. The value of investments and any income derived from them may go down as well as up. Recipients should be aware of and comply with all applicable laws and regulations in their respective jurisdictions before taking any action based on this document.
This document is confidential and may not be reproduced, distributed, or transmitted in whole or in part without the prior written consent of Lighthouse Canton Pte. Ltd. © 2026 Lighthouse Canton Pte. Ltd. All rights reserved.

