Thinking Machines Lab, an artificial intelligence start-up founded last year by former OpenAI chief technology officer Mira Murati, announced on Monday that it has developed a new class of "interaction models." The technology lets an AI system generate a response while it is still receiving user input. This mimics the natural flow of spoken conversation rather than the strict turn-taking pattern of most text-based chatbots. Today, most generative models operate in a half-duplex mode: the user speaks, the system listens, and then it replies, repeating the cycle. Thinking Machines aims to replace this paradigm with a full-duplex approach, in which the model processes incoming input and produces output at the same time.
The company's first such model, TML-Interaction-Small, can reportedly return a reply in 0.40 seconds. That is roughly the pace of human conversational turn-taking and, according to the company, significantly faster than comparable systems from major providers such as OpenAI and Google. The announcement is a research preview, however, not a commercial launch. The firm is not yet releasing the model to the public; instead, it plans a "limited research preview" in the coming months, followed by a broader rollout later in the year.
Industry observers have noted that the technical claim of sub-second, full-duplex interaction is impressive, especially considering the challenges of maintaining coherence while processing overlapping input and output streams. This approach aligns with a growing belief that interactivity should be integrated into the architecture of AI models rather than added after training. While the benchmarks released by Thinking Machines are promising, the real-world performance of the system remains to be evaluated. Potential applications for this technology include more fluid voice assistants and collaborative tools that can interject or suggest edits in real time. If the technology scales as described, it could significantly narrow the gap between human conversation speed and AI response times, resulting in a more seamless user experience.
For now, the AI community will be closely monitoring the upcoming research preview to determine whether the full-duplex model fulfills its technical promises and how it may influence the next generation of conversational agents.