How to evaluate an AI development company in the UK.
An objective framework for UK CTOs, Heads of Engineering, and procurement leads weighing London, Manchester, and Edinburgh AI consultancies against offshore senior-only options. Nine criteria, the red flags to watch for, and the contract terms that protect you.

A practical evaluation framework for UK organisations hiring an AI development company — without the pitch deck.
If you are a UK CTO or Head of Engineering evaluating AI development companies in 2026, the field has organised itself into three rough categories: City of London boutiques charging £300-£500 per senior hour; mid-market UK consultancies in Manchester, Edinburgh, and Bristol running discovery-then-build at £150-£280 per hour; and offshore senior-only firms (Aiinfox among them) shipping fixed-price scopes at the equivalent of £70-£120 per senior hour. The cheapest vendor will burn your budget twice. The most expensive will sell you a slide deck and a Magic Circle-style discovery phase. The difference between a good evaluation and an expensive mistake is asking the right structured questions while every vendor is still on the first call: before the demo, before the SOW, before the contract review. This page is the framework we wish every UK buyer brought to that first conversation.
We have written this honestly. Aiinfox is one of the vendors you may be evaluating, and we will not pretend otherwise — but the criteria below were not reverse-engineered from our strengths. They are the same criteria your data protection officer, your finance team, and your engineering team will surface during the diligence cycle anyway, organised into one document so you can run them in week one rather than week six. We name no competitor companies. We frame red flags in the structural language that applies to the whole market, not as ammunition for a sales narrative. Read it, copy what is useful into your own RFP or ITT, and apply it to every vendor on your shortlist — including us.
The nine criteria below are sequenced the way they actually break a UK engagement. Seniority and delivery model first, because if the engineers writing your code are not the engineers on the kickoff call, nothing else matters. UK GDPR and ICO posture second, because data protection scope determines what is even buildable. Eval-first delivery third, because an AI system without evals is a demonstration, not a product. Time-zone honesty, cost transparency in GBP, takeover clauses, IP assignment, and post-launch support round out the contractual surface. Treat this as a checklist, not an essay — and walk away from any vendor who cannot answer all nine in plain English on the first call.
Why teams pick Aiinfox
- Senior engineers only — 8+ years average, no junior pool
- Eval harness in week one, not retrofitted in phase two
- UK GDPR · Data Protection Act 2018 · ICO-aligned, with a data processing agreement (DPA) ready and UK-region deployment
- Fixed-price 6-week target; overrun cost on us if we miss
- IP assignment, takeover clause, and 30-day warranty in every SOW
- Four to five hours of native overlap with UK business hours from IST, no late-night calls
Production work, not prototypes.
1. Seniority verification
What good looks like: named engineers with public GitHub, prior production credits, and direct calendar access — same people through launch. Red flag: 'tech lead and the team' with no named engineers, or LinkedIn profiles you cannot locate.
2. Eval-first delivery
What good looks like: a written eval set with ground-truth answers before any prompt is committed, plus latency and cost telemetry from day one. Red flag: 'we'll add evals in phase two' or no answer when you ask about failure modes.
3. UK GDPR & ICO posture
What good looks like: DPA template in hand, DPIA scoped for personal-data-at-scale engagements, UK-region inference pinned, audit logs exportable for ICO inspection. Red flag: 'we work with personal data all the time' without a written DPA template or region pinning.
4. FCA awareness for fintechs
What good looks like: SM&CR-aware audit trails, no autonomous decisions on regulated outcomes, human approval in the loop. Red flag: 'our AI agent decides on credit / claims / KYC' with no operator-identity logging.
5. Fixed-price scope in GBP
What good looks like: a one-page SOW with scope, acceptance criteria, six-week timeline, and a fixed GBP number written in 72 hours. Red flag: T&M with an open-ended discovery phase that runs for two months before any code is committed.
6. Takeover, IP & exit clauses
What good looks like: source in your GitHub from day one, deployment in your UK cloud account, IP assigned to you, runbooks at handover. Red flag: code in a vendor repo, a 'managed AI platform' wrapper, or licensed-not-assigned IP.
Where this work has shipped.
1. Seniority verification
Ask: 'who exactly writes my code, and will they be on every demo?' Verify against public GitHub commits, named LinkedIn profiles, and direct calendar access. Walk away from agency-style 'tech lead plus team' answers with no named engineers.
2. Eval-first delivery
Ask to see an eval set from a comparable prior engagement before the SOW is signed. Eval coverage on 200+ reference cases is table stakes for production work. 'Evals come in phase two' is the single biggest red flag in the UK market.
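For buyers who want to know what an eval set looks like in practice, here is a minimal sketch. It assumes exact-match scoring against ground-truth answers and a nearest-rank p95 for latency. The names `EvalCase`, `run_evals`, `answer_fn`, and `p95` are hypothetical, and a real harness would add cost telemetry and fuzzier scoring on top.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected: str  # ground-truth answer, written before any prompt exists


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def run_evals(answer_fn: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case through answer_fn and report pass rate and p95 latency."""
    if not cases:
        raise ValueError("an eval set needs at least one case")
    latencies: list[float] = []
    passed = 0
    for case in cases:
        start = time.perf_counter()
        answer = answer_fn(case.question)
        latencies.append(time.perf_counter() - start)
        # Assumption: case-insensitive exact match counts as a pass.
        if answer.strip().lower() == case.expected.strip().lower():
            passed += 1
    return {
        "cases": len(cases),
        "pass_rate": passed / len(cases),
        "p95_latency_s": p95(latencies),
    }
```

In use, `answer_fn` is whatever wraps the vendor's system: `run_evals(my_agent, cases)` returns one report that can be tracked from week one. The point to look for is that the `expected` answers exist before the prompts do.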
3. UK GDPR & ICO posture
DPA template in hand, DPIA scoped, UK-region inference pinned (AWS London, Azure UK South, GCP europe-west2), audit logs exportable. International transfers covered by the UK International Data Transfer Agreement (IDTA) or EU SCCs with the UK Addendum, spelled out in the DPA.
4. FCA awareness (fintech only)
If you are FCA-supervised, the vendor must understand SM&CR accountability — every model and tool call audit-logged with operator identity, no autonomous regulatory decisions, humans approve regulated outcomes. Vendors who handwave this away should not be on the shortlist.
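As an illustration of what "audit-logged with operator identity" and "humans approve regulated outcomes" can mean in code, here is a minimal sketch. It is not a statement of FCA requirements. `AuditLog`, `AuditRecord`, and `finalise_decision` are hypothetical names, and a production system would write to append-only storage, not an in-memory list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str    # UTC, ISO 8601
    operator_id: str  # the named human or service account responsible
    action: str       # e.g. "model_call", "tool_call", "human_approval"
    detail: str


class AuditLog:
    def __init__(self) -> None:
        self._records: list[AuditRecord] = []

    def record(self, operator_id: str, action: str, detail: str) -> AuditRecord:
        # Refuse to log anything that cannot be tied to an operator.
        if not operator_id:
            raise ValueError("every audited action needs an operator identity")
        entry = AuditRecord(
            datetime.now(timezone.utc).isoformat(), operator_id, action, detail
        )
        self._records.append(entry)
        return entry

    def export_jsonl(self) -> str:
        """One JSON object per line, ready to hand to an auditor."""
        return "\n".join(json.dumps(asdict(r)) for r in self._records)


def finalise_decision(log: AuditLog, proposal: str, approver_id: Optional[str]) -> dict:
    """A model proposal becomes a decision only with a named human approver."""
    if not approver_id:
        raise PermissionError("regulated outcomes need a named human approver")
    log.record(approver_id, "human_approval", proposal)
    return {"decision": proposal, "approved_by": approver_id}
```

The design choice worth asking a vendor about is the failure mode: in this sketch a missing operator or approver raises an error, so an unattributed action cannot be logged and an unapproved outcome cannot be finalised.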
5. Honest time-zone story
India-based offshore vendors who claim 'full UK business hours' should instead be honest about the four-to-five-hour native overlap from IST. That overlap is workable, but vagueness about it is not. Onshore consultancies should be equally honest about senior availability during peak project months.
6. Fixed-price scope discipline
A vendor who cannot scope a six-week build in 72 hours will not deliver one in six months. T&M with open discovery is a budget extension mechanism — useful for some research-grade engagements but not for production AI builds where the scope is reasonably definable.
7. Takeover, IP & exit
Source in your GitHub from day one, deployment in your cloud, IP assigned (not licensed) on payment, runbooks at handover. Vendor-locked code or 'managed service' wrappers convert you from buyer to tenant — read the exit clauses before signing the MSA.
8. References that match your shape
Ask for references from a comparable industry, scale, and compliance footprint. A vendor with healthcare references and no FCA-supervised fintech work will struggle with a fintech engagement. Generic 'enterprise references' that map to nothing in your sector are marketing, not proof.
How we ship.
Define the bar
Write a one-page brief: problem, success metric, UK GDPR scope, FCA scope (if relevant), hard budget ceiling in GBP. Send the same document to every shortlisted vendor, because different briefs make the responses impossible to compare.
Run a structured 30-minute call
Walk every vendor through the nine criteria in the same order. Take notes in a shared spreadsheet. Anyone who cannot answer UK GDPR posture and seniority verification on the first call should not get a second.
Ask for the SOW in 72 hours
The 72-hour test separates vendors who scope cleanly from vendors who run open-ended discovery. Pass: a one-page SOW with acceptance criteria. Fail: a £15-25k discovery proposal that itself precedes any committed scope.
Run a 2-3 week paid pilot
Never sign a £150k engagement on the strength of a deck. Pay £8-20k for a scoped 2-3 week pilot with acceptance criteria. The vendor who ships clean pilot code is the vendor who will ship the production system.
The cost benchmark for an honest UK AI build. Written down.
City of London boutique senior rates land at £300-£500 per hour. Manchester / Edinburgh / Bristol mid-market consultancies at £150-£280. Offshore senior-only firms (Aiinfox included) at the equivalent of £70-£120 per hour or, more usefully, fixed-price v1 scopes between £20,000 and £100,000 for a six-week build. A reasonable UK v1 with UK GDPR and FCA compliance scope lands at £50-£100k fixed-price. Anything quoted under £20k is either a pilot or a corner-cutter; anything quoted over £200k without fine-tuning and a multi-quarter scope is paying for a sales narrative.
Questions teams actually ask.
How do I verify a senior engineer's seniority before signing a UK SOW?
Three concrete checks. First, ask for named engineers on the proposal and confirm their LinkedIn profiles match the seniority claimed — eight-plus years of production AI or ML work, not eight years of general software with six months of LangChain experience. Second, ask for direct access to those engineers on the discovery call and again at kickoff — the same names, not a swap. Third, ask for public artefacts: GitHub commits, conference talks, or shipped products you can locate. Vendors who refuse all three are running a senior-figurehead-plus-junior-pool model, which remains the most common failure pattern in the UK consultancy market.
What contract terms should I insist on in a UK MSA and SOW?
Six non-negotiables. (1) IP assignment — your code, your prompts, your evals, your data, assigned to you on payment under English law. (2) Source in your GitHub organisation from day one, not a vendor repository. (3) Acceptance criteria written into the SOW with a defined test plan. (4) A 30-day production warranty — bugs introduced by the vendor are fixed at no charge for 30 days post-launch. (5) A clean exit clause — runbooks, on-call docs, and credentials handed over on termination for any reason. (6) Data protection terms (UK GDPR-aligned DPA, processor obligations, sub-processor list, breach notification timeline) signed before any personal data is processed. Anyone resisting any of the six is protecting a lock-in.
What does an honest takeover clause look like for a UK engagement?
It looks like the vendor losing zero leverage if you decide to leave. Concretely: code lives in your GitHub organisation from commit one. Deployment runs in your AWS London, Azure UK South, or GCP europe-west2 account under your IAM. Secrets live in your secret manager. Runbooks, on-call docs, and architectural decision records are checked into the repository, not parked in a vendor wiki. IP assigns to you on payment, not on contract end. If your vendor cannot describe a clean handover to a different team on a single page, the architecture itself is the lock-in. We see this most often with 'managed AI platforms' that wrap an open-source stack and charge enterprise pricing for the wrapper — the moment you try to leave, the wrapper goes with them.
How should I handle UK GDPR posture in the vendor evaluation?
Four asks on the first call. (1) Show me your DPA template — vendors who 'work with personal data all the time' but cannot produce a UK GDPR-aligned DPA template have not done the work. (2) Walk me through how you would scope a DPIA for an engagement processing personal data at scale, in plain English. (3) Tell me which UK or EU region your inference endpoints run in and how that is enforced — vendors who say 'OpenAI handles that' have not pinned the region. (4) Walk me through a prior incident response — what happened, what was the timeline, what evidence was produced for the ICO. Anyone who cannot do all four on the first call is selling compliance, not practising it.
How does the time-zone overlap actually work with an India-based offshore team?
Indian Standard Time is UTC+5:30 all year, which gives roughly four to five hours of native daily overlap with UK business hours. In winter (GMT), 1:30pm IST is 8am in the UK and 6:30pm IST is 1pm. During British Summer Time (UTC+1), the same IST window maps to 9am to 2pm UK time. Daily standups, twice-weekly demos, and most ad-hoc problem-solving fit inside that window without late-night calls on either side. The honest framing from an offshore vendor is that overlap, not a claim of 'full UK business hours all day.' If a vendor cannot articulate the overlap precisely on the first call, they are either bending the truth or have not thought about it, and neither is a good signal. Onshore vendors should be honest about senior availability during peak project months, when senior staff often get rotated onto larger accounts.
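The overlap arithmetic above is easy to check for yourself. This is a small sketch, assuming an India working day of 09:30 to 18:30 IST and a UK working day you choose. Both sets of hours and the function names are assumptions for illustration, not anything a vendor has committed to.

```python
IST_OFFSET_MIN = 5 * 60 + 30  # IST is UTC+5:30 all year (no daylight saving)


def to_utc_minutes(hhmm: str, offset_min: int) -> int:
    """Convert a local 'HH:MM' wall-clock time to minutes after midnight UTC."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes - offset_min


def overlap_hours(
    uk_offset_min: int,              # 0 for GMT, 60 for British Summer Time
    uk_day=("08:00", "18:00"),       # assumed UK working day
    ist_day=("09:30", "18:30"),      # assumed India working day
) -> float:
    """Hours per day when both teams are inside their own working hours."""
    uk_start, uk_end = (to_utc_minutes(t, uk_offset_min) for t in uk_day)
    ist_start, ist_end = (to_utc_minutes(t, IST_OFFSET_MIN) for t in ist_day)
    overlap_min = min(uk_end, ist_end) - max(uk_start, ist_start)
    return max(overlap_min, 0) / 60


print(overlap_hours(0))                                # 5.0 (GMT, 08:00 UK start)
print(overlap_hours(0, uk_day=("09:00", "17:30")))     # 4.0 (GMT, 09:00 UK start)
print(overlap_hours(60, uk_day=("09:00", "17:30")))    # 5.0 (BST, 09:00 UK start)
```

With a 09:00 UK start, the window is four hours in winter and five in summer, which is the range a vendor should be able to state without prompting.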
Should IP assign to me, or be licensed?
Assigned, with one narrow exception. Custom code, prompts, evals, fine-tuned weights trained on your data, and integration glue should all assign to you on payment — that is the default in UK AI MSAs and what your legal team will expect. The narrow exception is pre-existing vendor frameworks or internal libraries that pre-date the engagement; those are usually licensed to you under a perpetual, royalty-free, transferable, sub-licensable licence rather than assigned. That is fair, provided the licence is genuinely perpetual, transferable, and survives change of control. Watch for 'licensed for your internal use' language without 'perpetual and transferable' — that is a renewal trap.
What is a fair fixed-price range for a UK v1 AI build in 2026?
A focused v1 — single AI agent, single RAG system, or single voice pipeline — with UK compliance scope (UK GDPR, FCA-awareness, ICO posture) lands at £20,000-£100,000 fixed-price for a six-week build with offshore senior-only delivery, £60,000-£200,000 with UK mid-market consultancies, and £150,000-£400,000+ with City boutiques. Anything quoted under £20k is either a pilot, a corner-cutter, or a junior-pool play — be honest about which. Anything quoted over £200k without fine-tuning, multi-system scope, or a multi-quarter timeline is paying for a sales narrative, not engineering. The most expensive vendor is rarely the best one; the cheapest one almost never is.
What does honest post-launch support look like for a UK engagement?
A 30-day production warranty as default — the vendor fixes their own bugs at no charge for 30 days after the system is live. After that, an optional retainer for tuning, eval refresh, drift monitoring, and on-call response in UK business hours, priced separately and renewable monthly. The retainer should be optional, not bundled into a 'managed service' that you cannot cancel without losing access to your own code. If the vendor's post-launch model requires you to keep paying them to keep your system running, you have bought a service, not a build — and you should know that before signing the MSA.
Want a fixed-price scope inside 72 hours?
30-minute discovery call in UK business hours. We will walk you through the nine criteria above against our own delivery model — and tell you on the call if we are not the right fit. UK GDPR-aligned, DPA-ready, deployable inside your UK cloud.
Reply within 1 business day · India & USA
Compare this framework against the Aiinfox UK country pillar, the UK GDPR AI development deep-dive, and the UK fintech AI development page for the FCA-aware compliance posture in detail. See the voice agent case study and the medical-inquiry RAG case study for documented references that satisfy the seniority and eval-first criteria above. Practice pages: AI agent development, RAG development services, and generative AI. Sibling buying guides for the USA, Canada, and Australia.
