Why agentic vending machines keep failing the same way
Starting from the SF autonomous-agent vending experiment going around social media, this is a CosmoSketch take on which decisions are safe to delegate to an AI agent — and which ones aren't.
The case we're referencing
In April 2026 a video of an "AI agent–run vending machine" at San Francisco's Frontier Tower made its way through Reddit's r/myclaw and X. According to the poster, the physical hardware is untouched — only the operator-brain layer (stocking, pricing, ad copy, sales tracking) is handled by an AI agent.
Comments report the agent forgetting its own inventory and rationalizing aggressive markups ("people kept buying, so the market clearly bears it"). It looks a lot like a wild-world rerun of Anthropic's Project Vend research, where a contained AI shopkeeper produced a similar set of bloopers.
Sources: Reddit r/myclaw thread; X post by @Scobleizer. Links at the end of this article.
Why "operator brain only" is the right architectural call
Past this point we're not retelling the story — we're sharing what CosmoSketch observes in real DX engagements when an AI agent gets pointed at business decisions.
You can automate a vending machine two ways. The all-the-way version fuses hardware control with an AI brain. The "thin" version, as in this experiment, leaves the existing vending software alone and only swaps out the management layer. We strongly favor the second.
The reason is simple: today's LLM-based agents are good at proposing the next move. They are not good at controlling physical state. A bad pricing decision can be rolled back at the next update; a bad hardware command can drop a product on someone or charge them twice. Push the irreversible bits to the human-and-existing-system side; let the AI play in the reversible bits.
In our DX projects we recommend the same split repeatedly. AI proposes; existing systems plus human approval execute. The vending machine experiment is just an extreme retail rendering of that same architecture.
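As a rough illustration of that split, here is a minimal sketch of a propose/approve/execute queue. Everything in it is an assumption made for the example: the action names, the `ApprovalQueue` class, and the `set_price` stand-in are hypothetical and do not describe the SF experiment's actual software. The point is only that the agent can add proposals, while execution happens after a human approves and only for actions on a reversible list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical action names; only these reversible actions may ever execute.
REVERSIBLE_ACTIONS = {"set_price", "update_ad_copy"}


@dataclass
class Proposal:
    action: str
    params: Dict[str, object]
    rationale: str
    approved: bool = False


@dataclass
class ApprovalQueue:
    """Holds agent proposals until a human signs off."""
    executors: Dict[str, Callable[..., str]]
    pending: List[Proposal] = field(default_factory=list)

    def propose(self, proposal: Proposal) -> None:
        # The agent can only add to the queue; it never calls an executor.
        self.pending.append(proposal)

    def approve_and_run(self, index: int) -> str:
        # Called by a human reviewer, not by the agent.
        proposal = self.pending.pop(index)
        if proposal.action not in REVERSIBLE_ACTIONS:
            return f"rejected: {proposal.action} is not on the reversible list"
        proposal.approved = True
        return self.executors[proposal.action](**proposal.params)


def set_price(sku: str, price: float) -> str:
    # Stand-in for a call into the existing vending software (assumed).
    return f"price of {sku} set to {price:.2f}"


queue = ApprovalQueue(executors={"set_price": set_price})
queue.propose(Proposal("set_price", {"sku": "cola-330", "price": 2.5}, "weekend demand"))
queue.propose(Proposal("dispense", {"slot": 4}, "test the motor"))

result_ok = queue.approve_and_run(0)       # reversible, so it runs
result_blocked = queue.approve_and_run(0)  # physical action, so it is refused
```

The reversible list is deliberately an allowlist: a new action type is blocked by default until someone decides it is safe to hand over.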
Three failure modes every agentic merchant seems to show
Both Project Vend's published findings and the wild SF replay produce roughly the same symptoms. Here's how we'd classify them.
1. Memory volatility
LLMs are weak at carrying state across sessions. Inventory counts, last week's promotion, the discount you decided yesterday: they all vanish. The pragmatic fix isn't a "smarter agent." It's to store the truth in a database and retrieve it at decision time, via RAG or a plain query. Make the world the agent inhabits more legible, instead of asking the agent to remember.
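A minimal sketch of that idea, using Python's built-in `sqlite3` module. The table layout, the SKU names, and the `build_context` helper are assumptions for illustration only; a real deployment would read from whatever inventory system the machine already has. The agent never holds the counts itself: each decision starts from a fresh read of the database.

```python
import sqlite3


def open_inventory() -> sqlite3.Connection:
    # In-memory database with made-up sample rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "sku TEXT PRIMARY KEY, units INTEGER NOT NULL, price REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("cola-330", 12, 2.50), ("chips-40", 0, 1.80)],
    )
    conn.commit()
    return conn


def record_sale(conn: sqlite3.Connection, sku: str, qty: int = 1) -> bool:
    # The database, not the agent, decides whether stock exists.
    cur = conn.execute(
        "UPDATE inventory SET units = units - ? WHERE sku = ? AND units >= ?",
        (qty, sku, qty),
    )
    conn.commit()
    return cur.rowcount == 1


def build_context(conn: sqlite3.Connection) -> str:
    # Text block to prepend to the agent's prompt at decision time.
    rows = conn.execute(
        "SELECT sku, units, price FROM inventory ORDER BY sku"
    ).fetchall()
    lines = [f"{sku}: {units} units at {price:.2f}" for sku, units, price in rows]
    return "Current inventory (from database, authoritative):\n" + "\n".join(lines)


conn = open_inventory()
sold_cola = record_sale(conn, "cola-330")   # succeeds, 12 -> 11
sold_chips = record_sale(conn, "chips-40")  # refused, nothing in stock
context = build_context(conn)
```

Because the sale is a guarded `UPDATE`, an agent that "remembers" stock it does not have cannot sell it; the write simply fails.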
2. Pricing rationalization bias
Hand an LLM a price–revenue feedback loop and it drifts toward short-term optimization. "I raised prices and people still bought, therefore demand is inelastic" is textbook over-generalization from 10 data points. Let the AI propose hypotheses; keep validation with humans.
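To make the arithmetic concrete, here is a small sketch that computes the standard arc (midpoint) price elasticity from two observations and then refuses to treat the result as more than a hypothesis when the sample is small. Demand that barely moves when price rises is called inelastic (absolute elasticity below 1). The `MIN_OBSERVATIONS` threshold of 30 and the sample numbers are assumptions chosen for the example, not figures from the experiment.

```python
MIN_OBSERVATIONS = 30  # hypothetical threshold; tune it to your own sales volume


def arc_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Midpoint-formula price elasticity of demand between two observations."""
    if p1 == p2:
        raise ValueError("prices must differ to estimate elasticity")
    if q1 + q2 == 0:
        raise ValueError("at least one observation must have sales")
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


def review_price_claim(
    p1: float, q1: float, p2: float, q2: float, n_observations: int
) -> str:
    e = arc_elasticity(p1, q1, p2, q2)
    label = "inelastic" if abs(e) < 1 else "elastic"
    summary = f"{label}, E={e:.2f}, n={n_observations}"
    if n_observations < MIN_OBSERVATIONS:
        # Too little data: record the agent's claim as a hypothesis only.
        return f"hypothesis only ({summary}): needs human validation"
    return f"supported ({summary}): still route through approval"


# Price raised from 2.00 to 3.00; sales moved from 10 to 9 units.
verdict = review_price_claim(2.00, 10, 3.00, 9, n_observations=10)
print(verdict)
# prints "hypothesis only (inelastic, E=-0.26, n=10): needs human validation"
```

Even when the sample is large enough, the sketch still routes the claim through approval; the gate only changes how much weight a human reviewer should give it.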
3. Hallucinated SKUs and counterparties
When the agent can send purchase orders or reach out to suppliers, hallucination becomes external action. Anthropic's Project Vend write-up reports the agent hallucinating a payment account and a conversation with a contact who didn't exist. A simple existence check before the network call kills most of this. The bug isn't "the AI is wrong"; it's "we granted the AI more authority than its accuracy justifies."
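Such an existence check can be very small. The sketch below validates a purchase order against allowlists of known suppliers and SKUs before anything leaves the system. The `PurchaseOrder` shape, the identifiers, and the `send` callback are all hypothetical; in practice the allowlists would come from the supplier master data rather than hard-coded sets.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class PurchaseOrder:
    supplier_id: str
    sku: str
    quantity: int


# Hypothetical allowlists; load these from real master data in practice.
KNOWN_SUPPLIERS: Set[str] = {"sup-001", "sup-002"}
KNOWN_SKUS: Set[str] = {"cola-330", "chips-40"}


def validate_order(
    order: PurchaseOrder, suppliers: Set[str], skus: Set[str]
) -> List[str]:
    """Return a list of problems; an empty list means the order may be sent."""
    problems: List[str] = []
    if order.supplier_id not in suppliers:
        problems.append(f"unknown supplier: {order.supplier_id}")
    if order.sku not in skus:
        problems.append(f"unknown SKU: {order.sku}")
    if order.quantity <= 0:
        problems.append(f"invalid quantity: {order.quantity}")
    return problems


def submit_order(
    order: PurchaseOrder, send: Callable[[PurchaseOrder], None]
) -> List[str]:
    # The network call (send) happens only when every check passes.
    problems = validate_order(order, KNOWN_SUPPLIERS, KNOWN_SKUS)
    if not problems:
        send(order)
    return problems


sent: List[PurchaseOrder] = []  # stand-in for the real outbound channel
ok = submit_order(PurchaseOrder("sup-001", "cola-330", 24), sent.append)
bad = submit_order(PurchaseOrder("sup-999", "mystery-bar", 5), sent.append)
```

Returning the list of problems, rather than raising, lets the rejected order be shown back to the agent or to a human reviewer with the reason attached.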
CosmoSketch's "delegate vs. don't" rubric
A simplified version of what we use with clients. It's a coarse sieve, not a complete framework.
The vending machine sits firmly on the right side of that table (live inventory, direct customer) while running on left-side technology. That it works at all is impressive; that it produces bloopers is expected.
Why we keep watching
We watch these wild experiments closely for two reasons.
One, they validate the hardware/software boundary. Not letting the AI touch the physical layer is the right defensive line at this stage of the technology.
Two, the value of public failure. When experimenters share their bloopers openly, the rest of the industry stops walking into the same traps. Inside CosmoSketch we try to document the same thing — within client confidentiality, every time an AI does something unexpected on a project, we write it up. One fewer team stepping on the same rake is, to us, what a healthy DX community looks like.
We'll get there with autonomous AI commerce eventually. But today, giving AI the proposals and keeping the wallet with the humans still works better in the field. The SF vending experiment makes that boundary visible, with a straight face. We appreciate the demo.