Case study
Hermes: a multi-agent coding system running in production
An instruction enters the system. A planning agent scopes the change, then a human decision gate reviews and approves it. A writing agent implements the approved plan, a testing agent checks the result, and the change passes a second human decision gate before it deploys.
Context
Most AI coding tools stop at the suggestion. They draft a snippet and hand the real work (planning, testing, integrating, shipping) back to a person. We wanted to know what it takes to run agents across the full delivery loop in a way an engineering team would actually trust.
Hermes is our answer, and it is live. It is a multi-agent coding system that plans, writes, tests, and deploys software, with human oversight built in at the decision gates that matter.
What we built
A multi-agent architecture. Specialised agents handle distinct stages of delivery (planning the change, writing the code, testing it, and moving it toward deployment) and are coordinated rather than run as one monolithic prompt.
Human oversight at decision gates. Hermes does not run unchecked. At each gate, a person reviews and approves the work before the system proceeds, so every change stays accountable and reviewable.
LLM orchestration that routes each stage to the right model and keeps outputs structured, so what each agent produces can be checked rather than taken on trust.
A production posture from the start. Hermes is not a demo. It runs live and builds real systems, which means it has to be reliable, observable, and safe to leave running.
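To make the flow concrete, here is a minimal sketch of the loop described above: a planning step whose structured output is checked, a human gate, writing and testing steps, a second human gate, then deployment. This is not Hermes's code, which is not published. The record types, the JSON plan format, and every name here (Plan, CheckResult, parse_plan, gate, run_change, GateRejected) are hypothetical, and the agents are plain stand-in callables rather than LLM calls.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical record types: the real Hermes schemas are not published.
@dataclass
class Plan:
    summary: str
    files: list[str]

@dataclass
class CheckResult:
    passed: int
    failed: int

class GateRejected(Exception):
    """Raised when a human reviewer declines to approve at a decision gate."""

def parse_plan(raw: str) -> Plan:
    """Check the planning agent's JSON output before anything downstream uses it."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("plan must be a JSON object")
    summary, files = data.get("summary"), data.get("files")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("plan is missing a summary")
    if not isinstance(files, list) or not files or not all(isinstance(f, str) for f in files):
        raise ValueError("plan must list at least one file")
    return Plan(summary=summary, files=files)

# Stand-in for the human decision: gets the gate name and the artifact under
# review, returns True to approve. A real system would use a review UI.
Approver = Callable[[str, object], bool]

def gate(name: str, artifact: object, approve: Approver) -> None:
    if not approve(name, artifact):
        raise GateRejected(name)

def run_change(
    instruction: str,
    plan_agent: Callable[[str], str],
    write_agent: Callable[[Plan], str],
    test_agent: Callable[[str], CheckResult],
    deploy: Callable[[str], None],
    approve: Approver,
) -> str:
    plan = parse_plan(plan_agent(instruction))  # structured output, checked
    gate("plan", plan, approve)                 # first human gate
    diff = write_agent(plan)
    report = test_agent(diff)
    if report.failed:
        return "tests failed: not deployed"     # stop before the deploy gate
    gate("deploy", diff, approve)               # second human gate
    deploy(diff)
    return "deployed"

if __name__ == "__main__":
    seen = []

    def approve_all(name: str, artifact: object) -> bool:
        seen.append(name)
        return True

    result = run_change(
        "add a health-check endpoint",
        plan_agent=lambda text: json.dumps({"summary": text, "files": ["app.py"]}),
        write_agent=lambda plan: f"diff touching {plan.files}",
        test_agent=lambda diff: CheckResult(passed=3, failed=0),
        deploy=lambda diff: None,
        approve=approve_all,
    )
    print(result, seen)  # deployed ['plan', 'deploy']
```

The design choice the sketch illustrates is that a rejected gate raises rather than returning a flag, so no later stage can run by accident, and a malformed plan fails in parse_plan before a reviewer ever sees it.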
Outcome
Hermes is running live today, building production systems with human oversight at the decision gates. It is the clearest proof of how we think about AI agents: not chatbots, but systems that do real work inside a delivery process, with people kept in control where control matters.
It is also how we pressure-test what we recommend to clients. The patterns we built into Hermes (multi-agent coordination, structured outputs, human-in-the-loop gates) are the same ones we bring to agent work for regulated firms.
“Hermes runs live, with people at the decision gates. That is the difference between an agent that demos and an agent you can trust in production.”
Built with: multi-agent architecture, LLM orchestration, human-in-the-loop decision gates.