For years, non-determinism has been an accepted limitation of large language models (LLMs): ask the same model the same prompt, under identical settings, and you can receive different answers. This "quirk" of AI created headaches for researchers and enterprise users alike, undermining reproducibility and complicating debugging and regulatory compliance.
What Was the Real Problem?
Most blamed floating-point arithmetic and GPU concurrency. But the team at Thinking Machines Lab, led by Mira Murati, identified the real culprit: batch-dependent computation. When a GPU processes multiple prompts simultaneously, core operations such as matrix multiplication, normalization, and attention can take different mathematical pathways depending on how requests are grouped into batches. Because floating-point addition is not associative, a different reduction order produces subtly different numbers, and those tiny differences compound into visible output drift.
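To see the failure mode concretely, here is a minimal PyTorch sketch (an illustration, not the team's code): the same input row is multiplied by the same weights at two different batch sizes, which can be enough to change the result at the bit level.

```python
import torch

# Sketch of batch-dependent computation: the same row, multiplied by
# the same weights, once alone and once inside a larger batch.
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

weights = torch.randn(512, 512, device=device)
x = torch.randn(1, 512, device=device)
batch = torch.cat([x, torch.randn(63, 512, device=device)])  # x plus 63 fillers

solo = (x @ weights)[0]         # row computed at batch size 1
batched = (batch @ weights)[0]  # same row computed at batch size 64

# On a GPU these are frequently not bitwise equal: the matmul kernel
# chosen for a (64, 512) input may reduce partial sums in a different
# order than the one chosen for (1, 512). On CPU the two may match.
print(torch.equal(solo, batched))
print((solo - batched).abs().max().item())
```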
The Batch-Invariant Kernel Solution
Thinking Machines solved this by engineering "batch-invariant kernels": custom GPU operations that force each calculation (matrix multiplication, attention, normalization) to follow the same mathematical pathway regardless of batch size or grouping. Layer normalization and matrix multiplication now process each request's data in a fixed, repeatable order, eliminating the variance introduced by internal batching and shifting GPU workloads.
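The principle behind a batch-invariant reduction can be sketched in plain PyTorch. This is an illustration of the idea, not the Lab's kernel code: the reduction schedule (here, a hard-coded chunk size) is fixed in advance rather than chosen per shape, so a given row is always summed in the same order no matter how many other rows share the batch.

```python
import torch

def batch_invariant_row_sum(x: torch.Tensor) -> torch.Tensor:
    # Each row is reduced in fixed-size chunks, strictly left to right.
    # Because CHUNK never depends on the batch size or GPU load, a given
    # row always yields the same bits, alone or inside any batch.
    CHUNK = 128
    out = torch.empty(x.shape[0], dtype=x.dtype, device=x.device)
    for i, row in enumerate(x):
        acc = torch.zeros((), dtype=x.dtype, device=x.device)
        for start in range(0, row.numel(), CHUNK):
            acc = acc + row[start : start + CHUNK].sum()
        out[i] = acc
    return out
```

A real kernel parallelizes within this fixed schedule rather than looping in Python; pinning the schedule can cost some throughput on certain shapes, which is the usual price of determinism.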
Real-World Results
Their results speak for themselves: with conventional kernels, 1,000 runs of an identical prompt against Qwen-3 produced 62 different completions. With batch-invariant kernels, all 1,000 outputs were identical: true determinism at last. This result marks a turning point for research reproducibility, regulatory trust, and real-world AI deployment.
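A reproduction along these lines is straightforward to sketch. The endpoint, model name, and prompt below are placeholders, and the script assumes an OpenAI-compatible inference server; it is not the team's actual harness.

```python
from collections import Counter
from openai import OpenAI  # assumes an OpenAI-compatible inference server

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

unique = Counter()
for _ in range(1000):
    resp = client.chat.completions.create(
        model="qwen3",  # placeholder model name
        messages=[{"role": "user", "content": "Tell me about Richard Feynman"}],
        temperature=0,  # greedy sampling; any remaining variation is nondeterminism
        max_tokens=1000,
    )
    unique[resp.choices[0].message.content] += 1

print(f"{len(unique)} unique completions out of 1000")
```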
Why This Is a Game Changer
- True reproducibility transforms LLMs from unpredictable black boxes into reliable research and business tools.
- It unlocks enterprise, finance, scientific, and regulated applications that require full audit trails and compliance.
- The open-sourcing of this technology could set a new industry standard and catalyze further innovation across all AI domains.
The library is open source on GitHub.
If You Want to Dig More
- Official Blog: Defeating Nondeterminism in LLM Inference (Thinking Machines Lab)
- Tech Industry Coverage: TechCrunch
- In-depth News: Times of India
- Community Discussions: Reddit/LocalLLaMA
- LinkedIn Announcements: Analytics India Magazine
- Company Press: Thinking Machines Lab News
