Case study · Healthcare

Healthcare Information Platform · Healthcare · Compliance

A fine-tuned Llama 3.1 model for healthcare inquiries.

Domain fine-tuning that beat general-purpose models on accuracy, latency, and cost — while keeping data inside the customer's VPC.

+22%

accuracy lift vs. base Llama 3.1 70B

−61%

inference cost vs. GPT-4o on the same task

VPC

fully self-hosted, no data leaves customer cloud

Client

Healthcare Information Platform

Healthcare · Compliance

Headline metric

+22%

accuracy lift vs. base Llama 3.1 70B

Deliverables

4

shipped to production

Stack

5+ tools

across the build

01

Challenge

A healthcare information platform had data-residency requirements that ruled out hosted frontier models. Base Llama 3.1 was close but not accurate enough on the clinical inquiry task. They needed a self-hostable model that hit the eval bar.

02

Approach

We curated a 14,000-pair instruction dataset of approved clinical inquiry/response examples. We fine-tuned Llama 3.1 70B with LoRA, running three evaluation rounds, then distilled the result to an 8B model to reduce inference cost. The model is served with vLLM inside the customer's AWS VPC, with an audit log entry written on every call.
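As a rough illustration of the "audit log on every call" idea, the sketch below wraps a generation function so that each call appends one JSON record to a log. It is not the deployed implementation: `audited`, `fake_generate`, the record fields, the `example-8b` model ID, and the choice to store SHA-256 digests instead of raw text are all assumptions made for this example.

```python
import hashlib
import json
import time
import uuid
from typing import Callable, List


def _digest(text: str) -> str:
    """SHA-256 hex digest, so the log can reference text without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audited(generate: Callable[[str], str], sink: List[str],
            model_id: str) -> Callable[[str], str]:
    """Wrap `generate` so every call, including a failed one, is logged."""
    def wrapper(prompt: str) -> str:
        started = time.time()
        status, response = "error", ""
        try:
            response = generate(prompt)
            status = "ok"
            return response
        finally:
            # Runs on success and on exception, so no call goes unlogged.
            sink.append(json.dumps({
                "request_id": str(uuid.uuid4()),
                "model_id": model_id,
                "timestamp": started,
                "latency_s": time.time() - started,
                "status": status,
                "prompt_sha256": _digest(prompt),
                "response_sha256": _digest(response),
            }, sort_keys=True))
    return wrapper


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a request to the self-hosted model."""
    return "stub response to: " + prompt


audit_log: List[str] = []  # in practice this would be durable storage
ask = audited(fake_generate, audit_log, model_id="example-8b")
answer = ask("What is a normal resting heart rate?")
```

Logging digests keeps sensitive text out of the log while still letting a reviewer confirm which prompt produced which response. Whether that trade-off suits a given compliance regime is a separate decision.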

03

Outcome

The fine-tuned model beat base Llama 3.1 70B by 22% on the eval set and matched GPT-4o accuracy at 61% lower inference cost. It is fully self-hosted: no patient data ever leaves the customer's VPC.
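For readers who want to reproduce this kind of comparison on their own task, here is a minimal sketch of the arithmetic behind an accuracy lift and a cost reduction. The case study does not state whether the 22% is a relative lift or a percentage-point gain, nor how cost was measured, so the code shows both readings of "lift". Every input below is a made-up placeholder, not project data, and exact-match scoring is an assumption.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that match the reference (case-insensitive)."""
    if not references or len(predictions) != len(references):
        raise ValueError("need equal-length, non-empty inputs")
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)


def relative_lift(base: float, tuned: float) -> float:
    """Improvement as a fraction of the base score."""
    return (tuned - base) / base


def point_lift(base: float, tuned: float) -> float:
    """Improvement in absolute accuracy (percentage points / 100)."""
    return tuned - base


def cost_reduction(baseline_cost: float, new_cost: float) -> float:
    """Fraction of the baseline cost that is saved."""
    return (baseline_cost - new_cost) / baseline_cost


# Placeholder eval set: four items with toy labels.
references = ["a", "b", "c", "d"]
base_preds = ["a", "b", "x", "y"]   # base model: 2 of 4 correct
tuned_preds = ["a", "b", "c", "y"]  # tuned model: 3 of 4 correct

base_acc = accuracy(base_preds, references)
tuned_acc = accuracy(tuned_preds, references)
rel = relative_lift(base_acc, tuned_acc)
pts = point_lift(base_acc, tuned_acc)

# Placeholder costs in arbitrary units per 1,000 requests.
saving = cost_reduction(baseline_cost=100.0, new_cost=39.0)
```

The two lift definitions can differ widely for the same pair of scores, so a reported figure is easier to interpret when it names which one it uses.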

The team owned this project end-to-end.

Deliverables

What we shipped.

  • LoRA fine-tune pipeline
  • Eval harness
  • vLLM deployment
  • Drift monitoring

Stack

The tools we used.

  • Llama 3.1
  • LoRA
  • vLLM
  • Weights & Biases
  • AWS VPC

Have a similar problem?

Let's talk about your version of this.

30-minute discovery call. No NDA gatekeeping. We'll tell you straight whether we're a fit.

Book a discovery call

info@aiinfox.com