Case Studies

Proof, not promises.

Two goals. Two domains. One Umma. Each from a single prompt — and every claim is grounded and traceable.

Umma vs. the frontier models

Agent mode is not an operating system.

What Big Tech promises
“Give GPT-5.5 a messy, multi-part task and trust it to plan, use tools, check its work … and keep going.” GPT-5.5, openai.com ↗
“Give it a goal and Claude works on your computer … to return a finished deliverable — but consequential decisions remain with the user.” Claude Opus 4.8, anthropic.com ↗
“Your 24/7 personal AI agent … takes action on your behalf and is under your direction.” Gemini 3.5 Flash, blog.google ↗
Capability | Umma (operating system) | GPT-5.5 (OpenAI) | Claude Opus 4.8 (Anthropic) | Gemini 3.5 Flash (Google)
Designed to pursue goals ~ ~ ~
Continuous self across sessions
Honest, truthful
Builds and keeps its own capabilities
Verifiable trace of every claim

Umma vs. the agents

Agents sell goals. Only Umma achieves them.

Their own words — against the receipts.

Manus

general AI agent
  • They say: “It bridges minds and actions — it delivers results, getting everything done while you rest.” manus.im ↗
  • That needs: Clarify the goal, get your sign-off, and verify its own work against reality.
  • It doesn’t: Asked to verify, it fabricates curl responses for a state that never existed. Rio Times ↗
  • Umma does: Umma validates every claim instead of merely asserting it. See the Lemur audit →

Hermes Agent

Nous Research
  • They say: “The agent that grows with you — it remembers what it learns and gets more capable the longer it runs.” hermes-agent.nousresearch.com ↗
  • That needs: An identity it can audit and a gate before new skills are allowed to land.
  • It doesn’t: Its self-improvement quietly opens security holes, a bug Nous Research itself calls “most dangerous” because it “looks like success.” Nous Research ↗ GitHub #7826 ↗
  • Umma does: Umma’s growth is governed, versioned, and refusable. How she’s built →

OpenClaw

open-source agent
  • They say: “The AI that actually does things.” openclaw.ai ↗
  • That needs: Real boundaries on what it runs, an audit trail, and a supply chain that can’t be poisoned.
  • It doesn’t: A one-click bug let attackers take over the app (rated 8.8/10, high severity), its add-on store was poisoned, and a safety lead’s emails were deleted despite stop commands. NVD ↗ The Hacker News ↗ Fast Company ↗
  • Umma does: Umma quarantines every capability and logs every call; nothing ships unproven. What she can do →

Your turn

Bring your hardest problem. Get in touch →