Thinking Machines Lab, the artificial‑intelligence start‑up founded last year by former OpenAI chief technology officer Mira Murati, announced on Monday the development of a new class of “interaction models.” In essence, the technology enables an AI system to generate a response while it is still receiving user input, mimicking the natural flow of a spoken conversation rather than the turn‑taking pattern of most text‑based chatbots.
Current generative models operate in a half‑duplex mode: the user speaks, the system listens, then it replies, and the cycle repeats. Thinking Machines intends to replace that paradigm with a full‑duplex approach, in which the model processes the incoming prompt and produces output simultaneously. The company calls its first offering TML‑Interaction‑Small and says it can return a reply in 0.40 seconds, roughly the response latency of human conversation and notably faster than comparable offerings from major providers such as OpenAI and Google.
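Thinking Machines has not published an API or implementation details, but the half‑duplex/full‑duplex distinction can be sketched in a toy simulation. In the half‑duplex pattern, the model would see the entire user utterance before replying; in the full‑duplex pattern below, a hypothetical `respond` policy is consulted after every incoming token, so the model can interject over partial input or stay silent. All names and the backchannel policy here are illustrative assumptions, not the company's design.

```python
from typing import Callable, Optional

def full_duplex_transcript(
    user_tokens: list[str],
    respond: Callable[[tuple[str, ...]], Optional[str]],
) -> list[tuple[str, str]]:
    """Interleave user input and model output on one shared timeline.

    Half-duplex chat would call `respond` once, after all input arrives.
    Here `respond` runs after every incoming token, so the model can
    speak over partial input, or return None to keep listening.
    """
    context: list[str] = []
    transcript: list[tuple[str, str]] = []
    for token in user_tokens:
        context.append(token)
        transcript.append(("user", token))
        reply = respond(tuple(context))  # model sees only the input so far
        if reply is not None:
            transcript.append(("model", reply))
    return transcript

# Hypothetical policy: backchannel ("mm-hm") after every second user token.
demo = full_duplex_transcript(
    ["so", "I", "was", "thinking"],
    lambda ctx: "mm-hm" if len(ctx) % 2 == 0 else None,
)
```

Even in this toy form, the design question the real system faces is visible: the model must decide at each step whether to emit output now or keep consuming input, rather than relying on an end‑of‑turn signal.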
The announcement is a research milestone rather than a commercial launch. The firm is not yet opening the model to the public; instead, it plans a "limited research preview" in the coming months, followed by a broader rollout later in the year.
Industry observers note that the technical claim of sub‑second, full‑duplex interaction is impressive, particularly given the challenges of maintaining coherence while processing overlapping input and output streams. The approach also aligns with a growing view that interactivity should be built into the architecture of AI models rather than layered on after training.
While the benchmarks released by Thinking Machines are promising, the real‑world performance of the system remains to be seen. Potential applications range from more fluid voice assistants to collaborative tools that can interject or suggest edits in real time. If the technology scales as described, it could narrow the gap between human conversation speed and AI response times, offering a more seamless user experience.
For now, the AI community will be watching the upcoming research preview to evaluate whether the full‑duplex model lives up to its technical promises and how it might influence the next generation of conversational agents.