The Nostalgic Joy of Running Large Language Models on Modest Hardware

The tech community has been buzzing about DeepSeek’s latest language model releases, and reading through various discussions brought back memories of my early computing days. Someone mentioned running a 671B parameter model at 12 seconds per token using an NVMe SSD for paging, and while many scoffed at the impracticality, it struck a chord with me.

Remember when waiting was just part of the computing experience? Back in the 80s, loading a simple game from a cassette tape could take 10-15 minutes, and we’d sit there watching those hypnotic loading stripes, filled with anticipation. The thought of having a machine that could answer complex questions in just a few hours would have seemed like science fiction back then.