San Francisco-based robotics startup Physical Intelligence has unveiled new research demonstrating that its latest AI model, π0.7, can direct robots to perform unfamiliar tasks without explicit training—a capability the company’s own researchers say has surprised them.
The model represents what the company describes as an early but meaningful step toward a general-purpose robot brain—one that can be guided through unfamiliar tasks using plain language and successfully execute them. If validated, the findings suggest robotic AI may be approaching an inflection point similar to what occurred with large language models, where capabilities begin compounding in ways that outpace what the underlying data would predict.
The core innovation lies in compositional generalization—the ability to combine skills learned in different contexts to solve problems the model has never encountered. Traditional robot training has relied on rote memorization: collecting data on specific tasks, training specialist models, and repeating for each new task. π0.7 breaks this pattern.
“Once it crosses that threshold where it goes from only doing exactly the stuff that you collect the data for to actually remixing things in new ways,” says Sergey Levine, co-founder of Physical Intelligence and UC Berkeley professor, “the capabilities are going up more than linearly with the amount of data.”
The paper’s most striking demonstration involved an air fryer the model had essentially never seen in training. Combing through the entire training dataset, the researchers found only two relevant episodes: one in which a different robot merely pushed an air fryer closed, and one from an open-source dataset in which another robot placed a plastic bottle inside one at a person’s instruction. The model had somehow synthesized those fragments, plus broader web-based pretraining data, into a functional understanding of how the appliance works.
“With zero coaching, the model made a passable attempt at using the appliance to cook a sweet potato,” says Ashwin Balakrishna, a research scientist at Physical Intelligence. “With step-by-step verbal instructions—essentially, a human walking the robot through the task the way you might explain something to a new employee—it performed successfully.”
That coaching capability matters because it suggests robots could be deployed in new environments and improved in real time without additional data collection or model retraining.
The researchers acknowledge significant limitations. In at least one case, they attribute failure to their own team’s inability to properly prompt the model. “Sometimes the failure mode is not on the robot or on the model,” Balakrishna says. “It’s on us. Not being good at prompt engineering.” He describes an early air fryer experiment that produced a 5% success rate; after the team spent about half an hour refining how the task was explained to the model, the success rate jumped to 95%, he says.
The model also isn’t yet capable of executing complex multi-step tasks autonomously from a single high-level command. “You can’t tell it, ‘Hey, go make me some toast’,” Levine says. “But if you walk it through—‘for the toaster, open this part, push that button, do this’—then it actually tends to work pretty well.”
The team also acknowledged that standardized benchmarks for robotics don’t really exist, which makes external validation of their claims difficult. Instead, the company measured π0.7 against its own previous specialist models—purpose-built systems trained on individual tasks—and found that the generalist model matched their performance across a range of complex work, including making coffee, folding laundry, and assembling boxes.
What may be most notable about the research is not any single demonstration but the degree to which the results surprised the researchers themselves—people whose job it is to know exactly what is in the training data and therefore what the model should and shouldn’t be able to do.
“My experience has always been that when I deeply know what’s in the data, I can kind of just guess what the model will be able to do,” Balakrishna says. “I’m rarely surprised. But the last few months have been the first time where I’m genuinely surprised.”
The paper itself uses careful hedging language throughout, describing π0.7 as showing “early signs” of generalization and “initial demonstrations” of new capabilities. These are research results, not a deployed product, and Physical Intelligence has been restrained from the start about commercial timelines.
Physical Intelligence has raised over $1 billion to date and was most recently valued at $5.6 billion. The company is now said to be in discussions for a new round that would nearly double that figure to $11 billion.
