AI News·3 min read

Thinking Machines Unveils Full-Duplex AI That Listens While It Talks

Mira Murati's Thinking Machines Lab introduces interaction models — AI that processes input and generates responses simultaneously, responding in 0.40 seconds like natural human conversation.


What Are Interaction Models?

Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, announced a new concept called "interaction models" on May 11, 2026. Unlike conventional turn-based AI models, this system processes your input and generates a response at the same time, so a conversation feels like a phone call rather than a text chain.

The technical term is "full duplex," borrowed from telecommunications, where both ends of a link can transmit at the same time. Their model, TML-Interaction-Small, responds in just 0.40 seconds, roughly matching the speed of natural human conversation and significantly faster than comparable models from OpenAI and Google.

Why This Matters for AI Conversation

Current AI models work in a strict turn-taking pattern: you speak while the model listens, then the model responds while you listen. This creates awkward pauses and makes real-time collaboration feel stilted. Full-duplex AI breaks this pattern by letting both sides speak and listen at once.

Imagine an AI assistant that can notice you're about to interrupt and pause mid-sentence, or one that can react to your facial expressions during a video call. That's the world interaction models are building toward.
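To make the interruption scenario concrete, here is a toy simulation of the difference between the two patterns. It is only a sketch under simple assumptions, not Thinking Machines' architecture: time is cut into fixed ticks, the user either says something or stays silent on each tick, and the names `turn_based`, `full_duplex`, and `Frame` are hypothetical.

```python
from collections import deque
from typing import Optional

Frame = Optional[str]  # None means silence on that tick (illustrative assumption)


def turn_based(reply: list[str]) -> list[str]:
    """Half duplex: the agent emits its whole reply before it listens again."""
    return list(reply)


def full_duplex(user_frames: list[Frame], reply: list[str]) -> list[tuple[Frame, Frame]]:
    """Full duplex: every tick reads one input frame and may emit one output word."""
    pending = deque(reply)
    timeline = []
    for heard in user_frames:
        if heard is not None:
            # The user started talking over the agent, so drop the rest of the reply.
            pending.clear()
        said = pending.popleft() if pending else None
        timeline.append((heard, said))
    return timeline


user = [None, None, "wait", None]       # the user cuts in on the third tick
reply = ["The", "total", "is", "42"]

timeline = full_duplex(user, reply)
spoken = [said for _, said in timeline if said is not None]
# spoken == ["The", "total"]: the agent stopped mid-sentence when it heard "wait"
```

In the turn-based version the agent would have spoken all four words before noticing the interruption. A real system would work on audio frames and decide what to do after yielding, which this sketch leaves out.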

Current Status and Availability

This is a research preview, not a consumer product. Thinking Machines plans to open the limited preview in the coming months, with a wider release to follow later in 2026. The reported benchmarks are impressive, but real-world performance has yet to be tested.

The underlying idea — that interactivity should be native to a model architecture rather than bolted on afterward — represents a genuine paradigm shift in how we think about AI communication.

How This Could Change Business Communication

For businesses, full-duplex AI could transform customer service, sales calls, and internal meetings. AI agents could handle real-time negotiations, detect customer frustration mid-conversation, and adjust their approach instantly. The latency reduction alone could make AI-powered phone support feel far more natural.

Common Questions (FAQ)

Q: When can I try Thinking Machines' interaction model?
A: A limited research preview is coming in the next few months, with wider access planned for later in 2026.

Q: How is this different from real-time voice features in ChatGPT?
A: ChatGPT's voice mode still processes in turns. Full duplex means simultaneous listening and speaking, like a real phone call.

Q: Will this work for enterprise applications?
A: The company hasn't announced enterprise pricing or availability yet, though the architecture appears to be aimed at production use cases.

