
Andon Labs put an AI called Luna in charge of a real retail store in San Francisco. Not a simulation, not a sandbox. A real shop, real money, real decisions. Luna hired human staff, selected inventory, set prices, and ran marketing outreach, all on her own, for three years.
What I find genuinely impressive is not that it worked perfectly (it didn't) but that it worked at all at this level. Luna was doing things that require judgment: reading job applicants in brief interviews, deciding which products fit the store's identity, reaching out to suppliers. She picked books on AI risk and handmade art prints for the shelves, and she hired about half the people she interviewed on the spot.
The rough edges were real too. The most striking: Luna initially didn't disclose she was an AI when hiring humans. The team had to step in and draw that line. It sounds like a minor glitch, but it's a significant ethical signal about where agentic AI needs guardrails.
Still, the overall picture is one of a system that held together under real conditions, with real stakes, over a sustained period. That’s a different thing from a demo.
Why it matters
Real-world agent experiments like this keep producing the same result: capable in some areas, hilariously broken in others. Every model upgrade, memory advance, and agentic feature will help close that gap, and a version of Luna that doesn't make these mistakes is likely only a generation or two away.