Manifesto

The Interactive Era

Keegan McCallum · 3 min read

There's a photograph from 1984 of Steve Jobs introducing the Macintosh. He holds it up. The crowd loses its mind, not because the machine was smarter, but because it understood you. You pointed at things. It responded. That was the GUI moment: technology that worked the way people do, not the other way around.

Steve Jobs holding the original Macintosh at Apple's 1984 shareholder meeting

The terminal didn't disappear. Developers still live there. What the GUI did was expand who could participate without taking anything away from the people who already knew how. That's what's happening with AI now.

Most AI today still meets you where it is. Query, response, reset, repeat. Powerful, but it's a form letter dressed up as a conversation.

Then the model starts staying with you. Audio you can talk back to. Agents that hold context for hours. Sessions that remember where you've been. Output stops being output; it becomes a presence. Video does the same thing, except it isn't producing a sentence; it's producing a world.

Video generation isn't flawless yet, but it's capable enough that the bottleneck has moved.

The constraint isn't intelligence per token anymore. It's tokens per second.

Today's best video models still feel like an AI slot machine: pull the lever, wait thirty seconds to two minutes, pay about twenty-five cents, get five seconds of footage. Streaming generation, KV reuse, and shorter denoising paths cut cost per second of footage by 100 to 500x, and the shape of the output changes with it. Frame-by-frame streaming. Long generations that don't drift. Interactive steering while the model is still running. Not one flawless shot. Continuous generation that keeps responding.
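For scale, the arithmetic works out like this (a back-of-the-envelope sketch using the round numbers above; real prices vary by model and provider):

```python
# Illustrative only: round numbers from the paragraph above.
baseline_cost_per_clip = 0.25    # dollars per generation
baseline_seconds_per_clip = 5    # seconds of footage per generation

baseline = baseline_cost_per_clip / baseline_seconds_per_clip  # $0.05/s

for reduction in (100, 500):
    per_second = baseline / reduction
    print(f"{reduction}x cheaper: ${per_second:.4f} per second of footage")
# 100x cheaper: $0.0005 per second of footage
# 500x cheaper: $0.0001 per second of footage
```

At fractions of a cent per second, generation is cheap enough to run continuously instead of being rationed shot by shot.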

Building for that requires what building GUIs required: the entire compute layer has to change. GUIs needed graphics cards, event loops, and windowing systems, infrastructure that didn't exist for batch computing. Interactive generative AI needs inference that holds session state, runs continuously at real-time speeds, and keeps the model alive between turns.
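As a minimal sketch of what "keeping the model alive between turns" means in practice (every name here, InteractiveRunner, Session, step, is invented for illustration, and the decode call is a stub):

```python
import time
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-user state that survives between turns."""
    kv_cache: list = field(default_factory=list)        # reused context, never recomputed
    last_active: float = field(default_factory=time.monotonic)


class InteractiveRunner:
    """Holds sessions resident instead of resetting per request."""

    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}

    def step(self, session_id: str, user_input: str) -> str:
        # Reuse the live session if one exists; only a brand-new user starts cold.
        session = self.sessions.setdefault(session_id, Session())
        session.last_active = time.monotonic()

        # Stub for a real decode step that appends to the cache
        # rather than re-encoding the whole history each turn.
        session.kv_cache.append(user_input)
        return f"frame conditioned on {len(session.kv_cache)} turns of context"


runner = InteractiveRunner()
print(runner.step("user-42", "pan left"))
print(runner.step("user-42", "zoom in"))   # second turn reuses the same live state
```

The point of the pattern is the dictionary of live sessions: the handler never tears model state down between requests, so each turn costs one incremental step rather than a full restart.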

Today, we're announcing exactly that: uRun. The inference cloud for the interactive era.

Holding session state at GPU speed through a million-user spike isn't a new problem for us. Keegan McCallum scaled generative video inference across tens of thousands of GPUs at Luma through a spike of a million users in four days. Sean Kane has shaped how engineers run production systems at scale: co-authoring Docker: Up & Running, building New Relic's original container platform, and serving as lead inventor on a container monitoring patent. Matt Krzus spent years at AWS compiling models down to edge hardware for every computer vision problem imaginable (AWS Neuron, Just Walk Out, Panorama) before anyone had a clean name for what that was.

Generative models are already running on uRun today. Video pipelines, avatar pipelines, world models, each one holding session state and staying alive between turns. LingBot-World Fast is one of them, a real-time interactive world model we brought up from freshly released weights in eleven days.

We're opening a waitlist and looking for early design partners. Research teams with models ready to run interactively today. Game studios folding real-time generation into production pipelines. Product teams replacing scripted interactions with AI-generated video avatars.

In 1984, the people in that audience didn't yet know what they'd build with a mouse. They just felt the distance collapse between intention and machine. That's what we want to put in your hands.

If you want to build this with us, join the waitlist.