How Vet-AI deployed AI in pet healthcare in 10 weeks without the hallucinations

Vet-AI, a UK Artificial Intelligence (AI) scale-up, has built a new AI-powered veterinary triage system using Google Cloud Platform (GCP) data and AI solutions to quickly guide pet owners to helpful advice.

The system combines Large Language Model (LLM) capabilities from Gemini, retrieval-augmented generation (RAG), and a human ‘vet-in-the-loop’ safety model. According to Rob Gray, the organization’s Head of Engineering, this approach means the system is far from being ‘just another chatbot.’

The triage system went live after just 10 weeks of development and is currently operating at 81% clinical accuracy. It offers a generative AI-based conversational interface that draws on real clinical data to give owners timely, accurate advice.

The company claims this use of generative AI has cut online vet consultation times by 8%, helping to make pet care – notoriously expensive in the UK – more accessible and affordable for worried owners.

Given the sensitivity of the use case, Gray says the development team used extensive “ghost mode” testing before go-live to head off any chance of generative AI hallucinations.

Following the project’s success, Vet-AI plans to expand into multimodal image and video analysis and to develop a wider set of specialist pet care AI agents trained on clinical datasets.

From decision tree to LLM with RAG

Vet-AI is based in the University of Leeds’s £40m innovation hub. Its mission is to address veterinary burnout, help the 60% of the UK population who own pets manage rising pet care costs, and make good pet care guidance accessible at scale.

The veterinary industry faces significant challenges, Gray explains. Burnout is on the rise among veterinary professionals, with almost 70% reporting losing a colleague to suicide.

Vet-AI has been working to address these issues since 2019, when it launched its proprietary ‘Joii’ Pet Care telemedicine app. Originally built on a decision tree, the platform has been cautiously transitioning to AI over the past 18 months.

The concept is to connect owners with licensed vets through online video and chat consultations for a fraction of the price of a practice visit. Since launch, the platform has delivered over 400,000 consultations.

To reach the next stage, the company needed to build a fast, scalable pipeline to process and structure the resulting dataset, which includes 100,000 labeled images.

The aim was to enable instant retrieval of relevant clinical insights and past interactions. Joii users could then describe their pets’ symptoms in-app and immediately know what level of care their sick animals need.

Built-in classification models would also trigger a warning if the animal’s condition warranted immediate in-clinic or pet hospital care.

Gray states that the company was already familiar with GCP and decided to build on that foundation.

To build the triage system, Gray and his team used a range of Google Cloud services, including the Gemini LLM, the BigQuery data warehouse, Vertex AI’s model training, and Cloud Run for AI model hosting.

The team also used Genkit with Firebase to build a proof of concept, drawing on Vet-AI’s extensive earlier internal experimentation with LangChain and LangGraph.

The development process moved quickly. Gray and his three-person IT team went into full production after six weeks of prototyping and four weeks of development and testing.

Careful with advice

Gray stresses that accuracy is paramount, which is one reason this is as much a RAG story as it is a generative AI or GCP one.

Gray explains:

Not only are we basing all this on nearly half a million real conversations between pets and clinicians and have humans in the loop, but we also use something called Assembly AI that transcribes all our conversations between clinician and pet owner.

We then summarize that, creating the backbone of our RAG dataset – so when you’re in conversation with the Triage tool, we’re not only leaning into the power of the models, but we’re also checking that we have a clinical history of similar cases – and if we have similar cases, how did that clinician navigate that scenario?
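The retrieval step Gray describes can be illustrated with a minimal sketch. The article does not say how Vet-AI indexes or searches its consultation summaries, so everything below is an assumption: the sample case summaries are invented, the function names are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever embedding or vector search a production RAG system would more likely use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_similar(query, case_summaries, top_k=2):
    """Return up to top_k summaries most similar to the owner's description."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in case_summaries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop summaries with no overlap at all rather than returning noise.
    return [s for score, s in scored[:top_k] if score > 0]


# Invented example summaries, standing in for transcribed consultations.
CASES = [
    "Dog vomiting after eating chocolate; advised urgent clinic visit.",
    "Cat with itchy skin and hair loss; advised flea treatment and follow-up.",
    "Dog limping after a walk; advised rest and a video consultation.",
]

if __name__ == "__main__":
    print(retrieve_similar("my dog ate chocolate and is vomiting", CASES, top_k=1))
```

In a RAG setup, the retrieved summaries would then be passed to the LLM alongside the owner's message, so the reply can draw on how a clinician handled a similar case.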

All RAG data is thoroughly checked for relevance, Gray adds, while groundedness is a key metric for ensuring that when prior cases are retrieved, the LLM actually uses them.

Gray continues:

And that’s really, really important – because it’s all well and good bringing back similar cases, but if the LLM isn’t making use of that and just talking about something random, hallucinations might happen, which neither us nor the vets nor the owners want.
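As a rough illustration of what a groundedness score measures, the sketch below computes the share of an answer's content words that also appear in the retrieved cases. This is only a lexical stand-in: the article does not describe how Vet-AI calculates groundedness, and production metrics are usually model-based rather than word-overlap counts. The stopword list, function names, and sample text are all hypothetical.

```python
import re

# A tiny, hypothetical stopword list; real systems use far larger ones.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "is", "in", "for",
             "your", "with", "it"}


def content_words(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def groundedness(answer, retrieved_cases):
    """Fraction of the answer's content words found in the retrieved cases.

    1.0 means every content word is supported by the evidence;
    0.0 means none are (or the answer has no content words).
    """
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    evidence = set()
    for case in retrieved_cases:
        evidence |= content_words(case)
    return len(answer_words & evidence) / len(answer_words)


if __name__ == "__main__":
    cases = ["Dog vomiting after eating chocolate; advised urgent clinic visit."]
    print(groundedness("Urgent clinic visit advised", cases))
    print(groundedness("Try a new grain-free diet", cases))
```

A low score flags a reply that ignores the retrieved cases, which is the failure mode Gray describes.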

User experience is equally important, he notes. The system tracks not just factuality, integrity, safety, and readability of all triage interactions, but also the amount of empathy the agent displays as it interacts with an owner.

Gray says:

It’s all about clinical accuracy; we avoid that temptation by an LLM just to give the user what they need now. Plus, we always have 60 vets online and whenever advice cannot be given or it is not grounded in truth, we bring them straight in.

We think that provides a level of assurance when you’re interacting with the AI, as you know there’s a team of trained professionals there to handle cases that the AI falls short on, or our RAG data set doesn’t have any prior cases of.

I can also say we haven’t launched a single model or provided a clinical outcome without running it in what we call ‘ghost mode’ to monitor it in a real-world environment with all the medical sign-off.
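The two safeguards in these comments, handing off to a vet and ghost mode, can be sketched as follows. The article does not define either mechanism in detail, so this rests on stated assumptions: ghost mode is read here as running the model on live cases while only logging its output for comparison with the clinician's decision, and the hand-off rule uses a hypothetical groundedness threshold of 0.7. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class GhostLog:
    """Stores (case_id, ai_outcome, vet_outcome) tuples for later review."""
    records: list = field(default_factory=list)

    def record(self, case_id, ai_outcome, vet_outcome):
        self.records.append((case_id, ai_outcome, vet_outcome))

    def agreement_rate(self):
        """Share of logged cases where the AI matched the vet."""
        if not self.records:
            return 0.0
        matches = sum(1 for _, ai, vet in self.records if ai == vet)
        return matches / len(self.records)


def handle_case(case_id, ai_outcome, vet_outcome, log, ghost_mode=True):
    """Return the outcome shown to the owner.

    In ghost mode the AI outcome is logged but never shown;
    the owner only sees the vet's decision.
    """
    if ghost_mode:
        log.record(case_id, ai_outcome, vet_outcome)
        return vet_outcome
    return ai_outcome


def route(advice, grounded_score, min_grounded=0.7):
    """Hand off to a vet when there is no advice or it is poorly grounded."""
    if advice is None or grounded_score < min_grounded:
        return "vet"
    return "ai"


if __name__ == "__main__":
    log = GhostLog()
    handle_case("c1", "urgent", "urgent", log)
    handle_case("c2", "home care", "urgent", log)
    print(log.agreement_rate())
    print(route("rest and monitor", 0.2))
```

The agreement rate collected in ghost mode is the kind of evidence that could support a sign-off decision before the model's output is shown to owners.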

Next steps

Building on this success, Gray wants to develop more agents to complement the triage application.

This work is starting with the Google Agent Development Kit, which the team is using to build clinical specialist systems for conditions like gastroenteritis. These would not only provide guidance on whether to take a pet to the vet but could also offer actual, accurate medical advice.

More image and video analysis capabilities are planned so that owners can share images of conditions such as a cat’s skin problem and have the system identify the type and severity.

Work has also started on allowing Joii to remember a pet’s entire care journey so it can offer continuous, personalized support, again using GCP technology (in this case, the Vertex AI Agent Engine Memory Bank).
