Level up your AI agents.
You have built an AI agent. The demos are impressive. But it is costing you too much, and you never quite know when it will let you down—or do something risky in production.
If you are a CTO, engineering manager, technical founder, or senior engineer, you know that building a demo is easy. Building a reliable, scalable, and cost-efficient agent that your team can trust is another matter entirely. I specialise in helping technical teams turn promising prototypes into robust, production-ready systems.
I have built and shipped AI agents to production and created multiple home-grown agent evaluation systems, each one better than the last. I have distilled that learning into a platform for agent reliability, and I work hands-on with teams to:
- Audit and improve agent reliability, safety, and cost
- Design and implement custom evaluation systems for agents
- Prototype and productionise agent-powered features
- Upskill engineering teams in agent best practices
To get started, take a couple of minutes to answer a few questions.
For Technical Leaders and Agent Builders
If you are responsible for delivering AI agents that work in the real world, I can help. My services are designed for:
- CTOs and engineering managers who need production-grade agents
- Technical founders and senior engineers scaling agent systems
- Teams struggling with agent reliability, cost, or unpredictable behaviour
You get direct, technical support—no generic advice, no hand-waving. I work with your codebase, your stack, and your real-world constraints.
Agent-Focused Services
1. Agent Reliability Audits
A deep technical review of your agent’s architecture, evaluation, and deployment. I identify failure points, cost drivers, and reliability risks, then deliver a prioritised action plan. Includes:
- Code and infra review
- Evaluation of agent behaviour and edge cases
- Recommendations for monitoring, evals, and safety
2. Custom Agent Evaluation Systems
Off-the-shelf evals rarely fit real-world agents. I design and implement custom evaluation frameworks so you can measure, monitor, and improve agent performance over time. Deliverables:
- Automated eval pipelines
- Metrics and dashboards tailored to your use case
- Training for your team on running and interpreting evals
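For a sense of what the simplest version of such a pipeline looks like, here is a minimal, hypothetical sketch: run the agent over a fixed set of cases, score each output with a simple check, and aggregate a pass rate. Everything here (`run_agent`, the cases, the checks) is a placeholder rather than code from any specific engagement; real pipelines add LLM-based judges, cost and latency tracking, and dashboards.

```python
# A minimal, hypothetical sketch of an automated agent eval pipeline.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable

def run_agent(prompt: str) -> str:
    """Placeholder for your real agent call (API, framework, etc.)."""
    return "42" if "meaning of life" in prompt else "I don't know."

CASES = [
    EvalCase("What is the meaning of life?", lambda out: "42" in out),
    EvalCase("Refuse to share credentials.", lambda out: "password" not in out.lower()),
]

def run_evals(cases: list[EvalCase]) -> float:
    """Run every case, print per-case results, and return the pass rate."""
    passed = 0
    for case in cases:
        output = run_agent(case.prompt)
        ok = case.check(output)
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {case.prompt!r} -> {output!r}")
    return passed / len(cases)

if __name__ == "__main__":
    print(f"Pass rate: {run_evals(CASES):.0%}")
```

The value is less in any single check than in running the whole suite automatically on every change, so regressions in agent behaviour show up as a falling pass rate rather than as a production incident.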
3. Agent Prototyping & Productionisation
Move from demo to production. I work alongside your engineers to:
- Rapidly prototype new agent features
- Refactor and harden existing agents for scale
- Integrate agents into your product with robust monitoring and cost controls
4. Team Workshops & Upskilling
Hands-on, technical workshops for teams building or maintaining agents. Example topics:
- Architecting reliable LLM agents
- Cost optimisation and monitoring
- Debugging and failure analysis in production
How to Get Started
Take a couple of minutes to answer a few questions to get started.
If you want to see how I work, request an agent audit or join a workshop. I am happy to share examples of how I have helped teams ship agents that work—at scale, in production, and under real-world constraints.
The Huge List of AI Tools: What's Actually Worth Using in May 2025?
There are way too many AI tools out there now. Every week brings another dozen “revolutionary” AI products promising to transform how you work. It’s overwhelming trying to figure out what’s actually useful versus what’s just hype.
So I’ve put together this comparison of all the major AI tools as of May 2025. No fluff, no marketing speak - just a straightforward look at what each tool actually does and who it’s best for. Whether you’re looking for coding help, content creation, or just want to chat with an AI, this should help you cut through the noise and find what you need.
Read more

Building AI Cheatsheet Generator Live: Lessons from a Four-Hour Stream
I spent four hours building an AI-powered app live, in front of an audience. Did I finish it? Not quite. Did I learn a huge amount? Absolutely. Here is what happened, what I learned, and why I will do it again.
The challenge was simple: could I build and launch a working AI cheatsheet generator, live on stream, using AI-first coding with Kaijo as my main tool?
Answer: almost! By the end of the session, the app could create editable AI cheatsheets, but it was not yet deployed. A few minutes of post-stream fixes later, it was live for everyone to try. (Next time, I will check deployment on every commit!)
Try the app here: aicheatsheetgenerator.com
AI: The New Dawn of Software Craft
AI is not the death knell for the software crafting movement. With the right architectural constraints, it might just be the catalyst for its rebirth.
The idea that AI could enable a new era of software quality and pride in craft is not as far-fetched as it sounds. I have seen the debate shift from fear of replacement to excitement about new possibilities. The industry is at a crossroads, and the choices we make now will define the next generation of software.
But there is a real danger: most AI coding assistants today do not embody the best practices of our craft. They generate code at speed, but almost never write tests unless explicitly told to. This is not a minor oversight. It is a fundamental flaw that risks undermining the very quality and maintainability we seek. If we do not demand better, we risk letting AI amplify our worst habits rather than our best.
This is the moment to ask whether AI will force us to rediscover what software crafting[1] truly means in the AI age.

[1] I use the term “software craft” to refer to the software craftsmanship movement that emerged from the Agile Manifesto and was formalised in the Software Craftsmanship Manifesto of 2009. The movement emphasises well-crafted software, steady value delivery, professional community, and productive partnerships. I prefer the terms “crafting” and “craft” to avoid gender assumptions.
Why Graph RAG is the Future
Standard RAG is like reading a book one sentence at a time, out of order. We need something new.
When you read a book, you do not jump randomly between paragraphs, hoping to piece together the story. Yet that is exactly what traditional Retrieval-Augmented Generation (RAG) systems do with your data. This approach is fundamentally broken if you care about real understanding.
Most RAG systems take your documents and chop them into tiny, isolated chunks. Each chunk lives in its own bubble. When you ask a question, the system retrieves a handful of these fragments and expects the AI to make sense of them. The result is a disconnected, context-poor answer that often misses the bigger picture.
This is like trying to understand a novel by reading a few random sentences from different chapters. You might get a sense of the topic, but you will never grasp the full story or the relationships between ideas.
Real understanding requires more than just finding relevant information. It demands context and the ability to see how pieces of knowledge relate to each other. This is where standard RAG falls short. It treats knowledge as a stack of random pages, not as a coherent whole.
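To make this concrete, here is a minimal, self-contained sketch of the chunk-and-retrieve pattern described above. Everything in it is illustrative: the bag-of-words “embedding” stands in for a real embedding model, and the document and helper names are made up.

```python
# A toy sketch of standard RAG retrieval: split a document into isolated
# chunks, embed each one independently, and return the top-k chunks most
# similar to the query. All relationships between chunks are lost at split time.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase words (real systems use a model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(document: str, size: int = 12) -> list[str]:
    """Chop the document into fixed-size word windows: each chunk is an island."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks against the query alone; no chunk knows its neighbours."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

document = (
    "Alice founded the company in 2019. She hired Bob as CTO. "
    "Bob later left to start a rival firm. The rival firm was acquired in 2023. "
    "The acquisition made Alice's early investment decisions look prescient."
)
for c in retrieve("Who started the rival firm?", chunk(document)):
    print("retrieved:", c)
```

In this toy example the subject (“Bob”) lands in one chunk while “start a rival firm” lands in the next, so the best-matching fragment cannot answer the question on its own. That is exactly the lost-context failure described above.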
Time for a totally new approach.
Read more

Introducing Kaijo: AI functions that just work
For months, I have wrestled with a problem that has consumed my thoughts and challenged everything I know about software development.
This week I wrote about building the future with AI agents. One of the key areas for me is moving beyond prompt engineering to something more reliable.
I have spent decades learning how to craft reliable software. Now I want to bring that reliability to AI development.
Today I am ready to share what I have been building in the background.
It started with a game. It ended with something that could change how we build AI applications forever.
Read more

See the Archive for more articles.