Tech Explained: LPCAMM2 Memory

I was watching a teardown video on one of the latest laptops and saw it using a relatively new memory format called LPCAMM2. That sent me down a rabbit hole to read up on it, and I thought I should share it with you.

To see why LPCAMM2 matters, we need to start with what laptops used earlier. For years, upgradeable laptops used SO-DIMM modules. Anyone who has opened up a laptop in the last 10 years will have seen one: a small pluggable RAM stick that you slide into a socket. It was cheap and easy to replace. But as memory speeds rose, SO-DIMM became problematic. The socket adds height, traces get longer, board area goes up, and you often need two modules to get full bandwidth.

That is why many thin laptops moved to soldered LPDDR. Putting memory much closer to the CPU shortens the signal path, improves signal quality, and helps with power and thickness. LPDDR also tends to use less power than standard DDR. The downside is that you lose RAM upgradeability and repairs get harder. Apple laptops are an example.

LPCAMM2 is the middle path. It uses LPDDR5X-class memory but keeps it on a replaceable module. The module lies almost flat and is pressed onto a compression connector, held down with 3 screws. This lowers height, saves space, and makes routing easier.

DDR5 SO-DIMM laptop memory typically runs at around 5600 MT/s. LPCAMM2 modules are now showing 8533 MT/s to 9600 MT/s. That is a big jump, with more data moved per second. One LPCAMM2 module can also expose a 128-bit interface, so one flat module can do the job that often needed two SO-DIMMs before.
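To put rough numbers on that, here is a small back-of-the-envelope sketch. It uses the usual peak-bandwidth formula (transfer rate times bus width) and assumes a DDR5 SO-DIMM presents a 64-bit data bus, which is not stated above. These are theoretical peaks, not measured throughput, and the function and constant names are just mine for illustration.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


SODIMM_BUS_BITS = 64    # assumption: one DDR5 SO-DIMM is a 64-bit module
LPCAMM2_BUS_BITS = 128  # from the post: one module exposes 128 bits

one_sodimm = peak_bandwidth_gb_s(5600, SODIMM_BUS_BITS)
two_sodimms = 2 * one_sodimm
lpcamm2_low = peak_bandwidth_gb_s(8533, LPCAMM2_BUS_BITS)
lpcamm2_high = peak_bandwidth_gb_s(9600, LPCAMM2_BUS_BITS)

print(f"1x SO-DIMM @ 5600 MT/s: {one_sodimm:.1f} GB/s")
print(f"2x SO-DIMM @ 5600 MT/s: {two_sodimms:.1f} GB/s")
print(f"1x LPCAMM2 @ 8533 MT/s: {lpcamm2_low:.1f} GB/s")
print(f"1x LPCAMM2 @ 9600 MT/s: {lpcamm2_high:.1f} GB/s")
```

Under these assumptions, a single LPCAMM2 module comes out ahead of a pair of 5600 MT/s SO-DIMMs, which is the "one module instead of two" point in numbers.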

Power is the other reason this matters. Because it is based on LPDDR5X, LPCAMM2 is designed for lower power operation than classic DDR5 SO-DIMM, especially in idle and standby. In a laptop, memory is always active in the background, so these savings matter.

I am really looking forward to this tech taking off in a big way. Hopefully regular consumer laptop OEMs switch to it soon. I am not a fan of soldered RAM in devices anyway.

If you liked the post, Share it with your friends!

Tech Explained: Taalas AI Inference Chip

A couple of days back I heard about a new startup called Taalas that is doing some interesting work in the LLM inference space. I thought I’d do a deep dive today on what I found.

They take a trained model like Llama 3.1 8B and turn it into a fixed chip. Instead of a GPU pulling weights from High Bandwidth Memory (HBM) for every token, the weights are baked into silicon in a big ROM-like fabric. The model and the hardware are basically the same thing, so you are really buying a specific model in PCIe card form.
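To see why pulling weights from memory for every token hurts, here is a rough sketch of the arithmetic. It assumes a single user (batch size 1), a nominal 8 billion parameters, and that every weight is read once per generated token. The 3 TB/s memory bandwidth is a hypothetical round number I picked for illustration, not the spec of any particular GPU, and the weight widths are assumptions too.

```python
def bytes_per_token(num_params: float, bytes_per_weight: float) -> float:
    """Weight bytes that must be read to generate one token at batch size 1."""
    return num_params * bytes_per_weight


def max_tokens_per_s(bandwidth_bytes_s: float, num_params: float,
                     bytes_per_weight: float) -> float:
    """Upper bound on single-user tokens/s if weight reads are the only cost."""
    return bandwidth_bytes_s / bytes_per_token(num_params, bytes_per_weight)


PARAMS = 8e9                    # nominal size of an "8B" model
HYPOTHETICAL_BANDWIDTH = 3e12   # assumption: 3 TB/s, illustrative only

for label, width in (("16-bit weights", 2), ("8-bit weights", 1)):
    gb = bytes_per_token(PARAMS, width) / 1e9
    ceiling = max_tokens_per_s(HYPOTHETICAL_BANDWIDTH, PARAMS, width)
    print(f"{label}: {gb:.0f} GB read per token, "
          f"ceiling of about {ceiling:.1f} tokens/s per user")
```

Real GPUs soften this by batching many users together so one weight read serves several tokens, but for a single fast stream the memory bus sets the ceiling. Baking the weights into the die removes that read entirely.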

On their first part, HC1, they talk about roughly 17k tokens per second per user for that 8B model at around 200 W of power. The nearest competitor is the Cerebras chip, which does around 2k tokens per second. With every weight on the die and no HBM in the loop, most of the work becomes local switching instead of moving GBs back and forth. To visualize it, think of a grid where each weight is a tiny logic cell, not a number in memory. When an activation vector arrives, that grid lights up along fixed paths and each cell contributes its small multiply and add. The layer output appears with almost no indexing overhead. SRAM around it holds the KV cache and adapters, so you can still add LoRA-style tweaks. You can try it out at chatjimmy.ai. It’s mind-blowingly fast: the response appears the moment you hit the Enter key.
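Here is a quick sketch of what those figures work out to per token. It only uses the numbers quoted above (17k and 2k tokens per second, around 200 W), treats them as given rather than measured, and the variable names are mine. Energy per token is computed only for HC1, since that is the only power figure mentioned.

```python
def latency_per_token_us(tokens_per_s: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1e6 / tokens_per_s


def energy_per_token_mj(power_w: float, tokens_per_s: float) -> float:
    """Average energy per generated token, in millijoules."""
    return power_w / tokens_per_s * 1e3


HC1_TOKENS_PER_S = 17_000       # quoted figure, per user
HC1_POWER_W = 200               # quoted figure, approximate
COMPARISON_TOKENS_PER_S = 2_000 # quoted figure for the nearest competitor

hc1_latency = latency_per_token_us(HC1_TOKENS_PER_S)
comparison_latency = latency_per_token_us(COMPARISON_TOKENS_PER_S)
speedup = HC1_TOKENS_PER_S / COMPARISON_TOKENS_PER_S
hc1_energy = energy_per_token_mj(HC1_POWER_W, HC1_TOKENS_PER_S)

print(f"HC1: about {hc1_latency:.1f} us per token")
print(f"Comparison: about {comparison_latency:.1f} us per token")
print(f"Speedup: about {speedup:.1f}x")
print(f"HC1: about {hc1_energy:.1f} mJ per token")
```

At that rate a 500-token answer takes roughly 30 milliseconds, which is why the response feels instant.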

Where could that be useful? Anywhere you are happy to standardise on a stable model and really care about latency and cost per query. That means robots, edge devices, superfast agentic frameworks, etc.

Am I really convinced that this is the way ahead? I am not sure. Since the model is baked into the silicon, you lose flexibility when you want to change things. You need big deployment volumes to make the economics work. By the time a tape-out happens (they say 60 days), the next generation model would be out there. I am also not sure how it scales to large trillion+ parameter models. A classic use case would be stable, popular older models like GPT-4o. If its weights were open sourced, its fans would love to bake it with Taalas and use that for deployment, since OpenAI sunsetted the model last week. There is potential, so let’s see where this goes.
