Case Study: GPT Meal Generator

Updated: June 2024

The Challenge

Cherrypick had operated a basic meal generator since early 2023. Customers could select how many meals they wanted for the week, and the system would generate a plan. They could reject meals and ask for alternatives, but there was no way to specify preferences or understand the reasoning behind recipe selections.

Customer feedback was clear: they wanted more personalisation and explanations for why certain recipes were chosen. The existing system felt rigid and opaque, leading to frequent plan changes and lower engagement with the generated meal plans.

This presented a perfect opportunity to explore how LLMs could add genuine value beyond the typical chatbot implementations flooding the market.

The Solution

We built a production LLM system that creates truly personalised meal plans with natural language explanations. The breakthrough was recognising that this was not a chat problem but a completion problem.

Instead of forcing customers into lengthy conversations, we designed an interface that generates a complete meal plan with one click. Customers can then refine their plans using LLM-generated rejection options that feel customised and natural. This approach eliminated conversation fatigue while maintaining the personalisation benefits of LLMs.
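
To make the completion-style approach concrete, here is a minimal sketch of what a one-click generation call might look like, assuming a JSON-mode chat completions API such as OpenAI's Python client. The prompt shape, model name, and response schema are illustrative assumptions, not Cherrypick's actual implementation.

```python
import json
from openai import OpenAI

client = OpenAI()

def generate_plan(preferences: str, candidate_recipes: list[dict], meals: int) -> dict:
    """One-shot plan generation: a single structured request, not a chat loop."""
    prompt = (
        f"Select {meals} compatible recipes for the week from the candidates below.\n"
        f"Customer preferences: {preferences}\n"
        f"Candidates: {json.dumps(candidate_recipes)}\n"
        'Respond with JSON: {"meals": [{"recipe_id": "...", "reason": "..."}]}'
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(response.choices[0].message.content)
```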

The system excels at tasks that uniquely benefit from LLM capabilities: understanding dietary preferences, combining compatible recipes, and generating natural explanations for choices. These were problems that would have been difficult to solve effectively with traditional programming approaches.

[Image: Meal Generator Interface]

Technical Implementation

The implementation required careful consideration of both technical and business constraints. We calculated costs upfront, understanding that consumer applications with large user bases and small margins cannot sustain expensive LLM interactions.

Our meal generator requires only a few LLM calls per plan generation, compared to the dozens of messages typical in chat sessions. This made the system financially viable where other approaches would have eroded profit margins entirely.
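
The arithmetic behind that viability is simple enough to sketch. All prices and token counts below are assumed placeholders for illustration, not figures from the project.

```python
# Back-of-envelope cost comparison. All prices and token counts are
# assumptions for illustration, not actual figures from the project.
PRICE_PER_1K_INPUT = 0.005   # $ per 1K input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # $ per 1K output tokens (assumed)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single LLM call at the assumed prices."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Completion-style: ~3 calls per plan, each with a large curated context.
plan_cost = 3 * call_cost(input_tokens=4_000, output_tokens=500)

# Chat-style: ~20 turns, each resending the growing conversation history.
chat_cost = sum(call_cost(input_tokens=1_000 + 300 * turn, output_tokens=200)
                for turn in range(20))

print(f"per plan: ${plan_cost:.3f}, per chat session: ${chat_cost:.3f}")
```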

Quality control became paramount given the unpredictable nature of LLMs. We built a multi-layered evaluation system starting with automated validation of JSON structure and recipe ID verification against provided context. This prevented hallucinated recipes from reaching customers.
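
A first validation layer along these lines might look like the sketch below; the plan schema and function name are assumptions for illustration.

```python
import json

def validate_plan(raw_output: str, allowed_recipe_ids: set[str]) -> dict:
    """Structural checks before a generated plan can reach a customer."""
    plan = json.loads(raw_output)  # raises ValueError on malformed JSON
    meals = plan.get("meals")
    if not isinstance(meals, list) or not meals:
        raise ValueError("plan must contain a non-empty 'meals' list")
    for meal in meals:
        recipe_id = meal.get("recipe_id")
        if recipe_id not in allowed_recipe_ids:
            # Any ID outside the provided context is a hallucinated recipe.
            raise ValueError(f"hallucinated recipe id: {recipe_id!r}")
    return plan
```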

Expert review formed the second evaluation layer. Cherrypick’s Head of Food, Sophie, assesses generated plans for nutritional balance and flavour combinations, ensuring meals work well together throughout the week. These evaluations became training data for continuous improvement.
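
One plausible way to capture such reviews for later reuse is a simple structured record, sketched below; the fields and rating scales are assumed, not Cherrypick's actual schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExpertReview:
    """One expert assessment of a plan, stored for reuse as evaluation data."""
    plan_id: str
    reviewer: str          # e.g. "Head of Food"
    nutrition_score: int   # 1-5 scale (assumed)
    flavour_score: int     # 1-5 scale (assumed)
    notes: str
    reviewed_on: date
```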

Context curation proved crucial for reliable results. Rather than sending the model our full recipe database and risking dangerous selections, we only provide recipes the customer can actually eat. This approach uses more tokens but delivers consistent, safe results.
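
In sketch form, curation can be as simple as filtering on allergens and dietary suitability before the prompt is assembled; the field names here are illustrative assumptions.

```python
def eligible_recipes(all_recipes: list[dict], customer: dict) -> list[dict]:
    """Return only recipes the customer can actually eat."""
    excluded = set(customer.get("allergens", []))   # e.g. {"peanuts"}
    diets = set(customer.get("diets", []))          # e.g. {"vegetarian"}
    return [
        r for r in all_recipes
        if not excluded & set(r.get("allergens", []))  # no excluded allergens
        and diets <= set(r.get("suitable_for", []))    # meets every diet
    ]
```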

Results

The system delivered measurable business impact that validated our approach. Customers began changing their meal plans 30% less frequently, indicating they were more satisfied with the initial recommendations. More importantly, basket usage of meal plans increased by 14%, demonstrating that customers found the personalised plans more actionable and appealing.

[Image: Meal Generator Rejection Interface]

These improvements came from a production-ready system serving real customers daily. The multi-layered evaluation framework proved its worth during launch, catching quality issues before they reached users. Despite initial concerns about LLM costs, the system operates comfortably within budget constraints thanks to careful interface design.

Beyond the meal generator itself, we shipped major additional features including a health scores system while maintaining high delivery standards. The evaluation infrastructure we built became reusable for future AI features, creating lasting value beyond this single project.

"We built a production LLM meal generator that made our customers happier and increased revenue. Our AI system reduced customer plan changes by 30% (meaning happier customers) and increased basket usage by 14% (meaning more revenue per customer). We shipped major new features including health scores while maintaining high quality delivery."

Key Learnings

The success came from treating this as a product challenge first, then finding the right technical implementation. We learned that LLMs excel when applied to problems that genuinely benefit from their unique capabilities. Natural language explanations and dietary preference understanding were perfect fits, but we would never use LLMs for simple categorisation tasks.

Interface design proved more important than the underlying technology. Our guided approach with LLM-generated options delivered superior results compared to chatbot interfaces. Users received personalisation benefits without conversation fatigue, creating a sustainable interaction model.

Building evaluation frameworks from day one prevented quality disasters and enabled confident model comparisons. This upfront investment paid dividends during scaling, allowing us to iterate quickly while maintaining reliability.

Perhaps most importantly, we learned to work with LLM limitations rather than fight them. Careful context curation eliminated dangerous outputs while preserving the flexibility that makes these systems valuable. The key was designing constraints that enhanced reliability without sacrificing the core benefits.

The Impact

This case study demonstrates how to build LLM applications that actually work in production. Too many AI projects become investor demos rather than shipped products.

The key was treating this as a product challenge first, then finding the right technical implementation. The result was a system that improved customer experience while operating within business constraints.

Want similar results for your team? Chris can help you identify where LLMs add real value and build systems that work in production, not just in demos.
