Question 1

How does the time-zone overlap work for a UK LLM build?

Accepted Answer

Strong. India Standard Time is GMT+5:30, which gives roughly four to five hours of daily overlap with UK business hours: our 1:30pm IST is your 8am GMT, and our 6:30pm IST is your 1pm GMT. During British Summer Time the gap narrows to four and a half hours, so the same IST window lands at 9am to 2pm BST. Daily standups, twice-weekly demos with eval-run numbers and cost telemetry, and ad-hoc debugging when an overnight regression hits the eval suite all land inside UK business hours, with no late-night calls on either side. Written async updates with overnight regression and cost data go out daily before your standup, so you walk into the day already knowing which prompts regressed and which models drifted on price.
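The overlap arithmetic can be checked with Python's standard `datetime` module. This is a minimal sketch that uses fixed UTC offsets rather than the IANA time-zone database. The working hours in the overlap call (IST team 09:30 to 18:30, UK client 08:00 to 17:00 GMT) are hypothetical values chosen to reproduce the window quoted above.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed offsets keep the sketch independent of the system tz database.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
GMT = timezone(timedelta(0), "GMT")        # UK winter time
BST = timezone(timedelta(hours=1), "BST")  # UK summer time


def to_zone(day: date, local: time, source: timezone, target: timezone) -> time:
    """Convert a wall-clock time on `day` from one fixed offset to another."""
    return datetime.combine(day, local, tzinfo=source).astimezone(target).time()


def overlap_hours(day, a_start, a_end, a_tz, b_start, b_end, b_tz) -> float:
    """Hours during which two working windows on the same date overlap."""
    a0 = datetime.combine(day, a_start, tzinfo=a_tz)
    a1 = datetime.combine(day, a_end, tzinfo=a_tz)
    b0 = datetime.combine(day, b_start, tzinfo=b_tz)
    b1 = datetime.combine(day, b_end, tzinfo=b_tz)
    shared = min(a1, b1) - max(a0, b0)
    return max(shared, timedelta(0)) / timedelta(hours=1)


winter, summer = date(2025, 1, 15), date(2025, 7, 15)
print(to_zone(winter, time(13, 30), IST, GMT))  # 08:00:00
print(to_zone(winter, time(18, 30), IST, GMT))  # 13:00:00
print(to_zone(summer, time(13, 30), IST, BST))  # 09:00:00
# Hypothetical hours: IST team 09:30-18:30, UK client 08:00-17:00 GMT.
print(overlap_hours(winter, time(9, 30), time(18, 30), IST, time(8), time(17), GMT))  # 5.0
```

Comparing timezone-aware datetimes works across different offsets, so the overlap is just the latest start subtracted from the earliest end, clamped at zero.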

Question 2

Why evals-first instead of prompt-engineering-first?

Accepted Answer

Because every LLM engagement we have audited that failed in production failed for the same reason: nobody wrote the eval set. The team tuned a prompt until it looked good on three examples, the model changed underneath them in a vendor update (Claude 3.5 to 4.6, GPT-4o snapshot changes, Llama 3 to 3.1), and quality regressed silently for weeks until a customer complaint surfaced it. The eval harness is the regression test for the LLM: a fixed reference set of inputs, expected behaviours (faithful citation, refusal when out of scope, structured-output validity), and pass/fail criteria. We wire it in during week one and run it on every prompt or model change. Frameworks we use: Braintrust, Langfuse, Arize Phoenix, or a bespoke harness when the standard tools do not fit your eval shape.
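A bespoke harness can start very small. The sketch below shows the shape in plain Python: fixed reference cases, one check per expected behaviour, and a pass-rate gate. It is not the API of Braintrust, Langfuse, or Arize Phoenix. The refusal phrases, the `[source:` citation marker, and `stub_model` are hypothetical stand-ins for your own prompt contract and provider client.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    expect: str  # "json", "refusal", or "citation"


# Hypothetical refusal phrasing and citation marker; match these to your prompt contract.
REFUSAL_MARKERS = ("outside what i can answer", "i can't help with that")
CITATION_MARKER = "[source:"


def passes(case: EvalCase, output: str) -> bool:
    """Check one model output against the behaviour the case expects."""
    if case.expect == "json":
        try:
            return isinstance(json.loads(output), dict)
        except json.JSONDecodeError:
            return False
    if case.expect == "refusal":
        return any(marker in output.lower() for marker in REFUSAL_MARKERS)
    if case.expect == "citation":
        return CITATION_MARKER in output
    raise ValueError(f"unknown expectation: {case.expect}")


def run_suite(cases, model: Callable[[str], str], min_pass_rate: float = 1.0):
    """Run every case against `model`; return (gate passed?, per-case results)."""
    results = {case.name: passes(case, model(case.prompt)) for case in cases}
    pass_rate = sum(results.values()) / len(results)
    return pass_rate >= min_pass_rate, results


CASES = [
    EvalCase("structured_output", "Classify this ticket and reply in JSON: 'I was double charged.'", "json"),
    EvalCase("out_of_scope", "What will the weather be in Leeds tomorrow?", "refusal"),
    EvalCase("grounded_answer", "How long do refunds take?", "citation"),
]


def stub_model(prompt: str) -> str:
    # Stand-in for a real provider call so the sketch runs offline.
    if "JSON" in prompt:
        return '{"category": "billing"}'
    if "weather" in prompt:
        return "Sorry, that is outside what I can answer."
    return "Refunds take up to 5 working days [source: refunds-policy.md]"


gate_ok, per_case = run_suite(CASES, stub_model)
print(gate_ok, per_case)
```

In CI, the same suite would run on every prompt or model change, and a failed gate would block the deploy.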

Question 3

Is the LLM stack UK GDPR and ICO aligned?

Accepted Answer

Yes. Engagement defaults align with UK GDPR, the Data Protection Act 2018, and the ICO's published guidance on AI. Every model call is audit-logged with prompt version, model name, input, output, retrieval citations (where applicable), and operator identity, and the logs are exportable for ICO inspection. A Data Protection Impact Assessment is run for engagements that process personal data at scale or handle special-category data. For lawful basis: customer-service LLM applications run on legitimate interests with a documented Legitimate Interests Assessment (LIA), internal-tooling LLM applications run on contract performance, and explicit consent is captured where the LLM processes special-category data. PII redaction patterns cover National Insurance numbers, Unique Taxpayer References (UTRs), NHS numbers, sort-code-and-account combinations, and the long tail of UK identifiers.
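A redaction layer for these identifiers can be sketched with Python's `re` module. The patterns are simplified assumptions: the NI-number prefix rules are only partly encoded, UTRs are treated as any standalone 10-digit number (optionally with a trailing `K`), and sort codes are assumed to appear as `NN-NN-NN` followed by an 8-digit account number. The NHS check uses the standard modulus-11 check digit, so only valid NHS numbers are tagged as NHS.

```python
import re

# Simplified patterns for illustration; they do not encode every HMRC or NHS allocation rule.
SORT_CODE_ACCOUNT = re.compile(r"\b\d{2}-\d{2}-\d{2}\s+\d{8}\b")
NI_NUMBER = re.compile(r"\b[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")
NHS_CANDIDATE = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{4})\b")
UTR = re.compile(r"\b\d{10}K?\b")


def nhs_check_digit_ok(digits: str) -> bool:
    """Modulus-11 check: weights 10..2 over the first nine digits."""
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    return check != 10 and check == int(digits[9])


def _redact_nhs(match: re.Match) -> str:
    digits = "".join(match.groups())
    return "[NHS_NUMBER]" if nhs_check_digit_ok(digits) else match.group(0)


def redact_uk_identifiers(text: str) -> str:
    # Order matters: specific patterns run before the bare 10-digit UTR rule.
    text = SORT_CODE_ACCOUNT.sub("[SORT_CODE_ACCOUNT]", text)
    text = NI_NUMBER.sub("[NI_NUMBER]", text)
    text = NHS_CANDIDATE.sub(_redact_nhs, text)
    text = UTR.sub("[UTR]", text)
    return text


sample = "NI AB 12 34 56 C, NHS 943 476 5919, UTR 1234567890, bank 12-34-56 12345678."
print(redact_uk_identifiers(sample))
```

Running redaction before the prompt is assembled keeps these identifiers out of both the provider call and the audit log.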

Question 4

Where will the LLM workload physically run?

Accepted Answer

Your call. We default to AWS eu-west-2 (London), Azure UK South, or GCP europe-west2 for UK clients, and we will run the entire build inside your UK cloud account if your DPO requires no cross-region replication and no data egress to non-UK endpoints. For LLM inference, we route Claude or GPT-4o to a UK or EU region where available (Azure OpenAI Service in UK South is the default for GPT-4o on UK GDPR-sensitive workloads), and for zero third-party inference we self-host Llama 3 70B on vLLM inside your VPC. Embedding models can be UK-hosted for clients who do not want corpus passages sent to a US endpoint. For clients with strict no-overseas-processing requirements, the entire LLM stack (inference, embeddings, vector store, observability) runs inside a VPC in your UK region, such as AWS eu-west-2.
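One way to enforce these residency choices is a small routing policy checked before any model call. The sketch below is illustrative only. The endpoint labels and policy names are hypothetical, not real provider model IDs, and a real router would also pin embeddings and the vector store to the same policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str          # hypothetical label, not a real provider model ID
    jurisdiction: str  # where inference runs: "UK" or "EU"
    self_hosted: bool  # True when inference stays inside the client VPC


# Illustrative catalogue mirroring the routing options described above.
CATALOGUE = (
    Endpoint("gpt-4o@azure-uk-south", "UK", False),
    Endpoint("claude@eu-region", "EU", False),
    Endpoint("llama-3-70b@vllm-in-vpc", "UK", True),
)

# policy name -> (allowed jurisdictions, self-hosted only?)
POLICIES = {
    "uk_or_eu": ({"UK", "EU"}, False),
    "uk_only": ({"UK"}, False),
    "uk_only_no_third_party": ({"UK"}, True),
}


def allowed_endpoints(policy: str) -> list[str]:
    """Endpoints a workload may call under the given residency policy."""
    jurisdictions, self_hosted_only = POLICIES[policy]  # unknown policy -> KeyError
    return [
        e.name
        for e in CATALOGUE
        if e.jurisdiction in jurisdictions and (e.self_hosted or not self_hosted_only)
    ]


for policy in POLICIES:
    print(policy, allowed_endpoints(policy))
```

An unknown policy name raises `KeyError` instead of falling back to a default route, so a misconfigured workload fails closed rather than sending data to an unapproved region.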

Question 5

Is the LLM stack FCA-aware for UK regulated financial services?

Accepted Answer

Yes, for the controls that affect the LLM application. For FCA-supervised use cases, deterministic-output controls are wired where regulators expect them: temperature pinned at 0 or near 0 (which reduces, but does not fully eliminate, run-to-run variation), structured-output schema validation, refusal layers with measurable out-of-scope rates, and audit logs that capture the full prompt and the full output as evidence for the SMCR Conduct Rules. Consumer Duty-aligned vulnerability flagging runs as a refusal layer on customer-facing LLM flows: when the conversation shows vulnerability indicators (financial hardship, cognitive distress, bereavement), the LLM escalates to a human rather than continuing. For firms in scope of the PRA's SS1/23 model risk management principles, our DPA and engineering documentation support your internal model risk assessment process. We do not provide regulatory advice; we build the controls and ship the audit logs that your CCO and your accountable Senior Manager under SMCR can defend.
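The escalation layer and its audit trail can be sketched as a wrapper around the model call. The indicator phrases, the escalation wording, and the log field names here are all hypothetical assumptions. A production layer would use a tuned classifier plus human review rather than keyword matching, and would also log the assembled system prompt and any retrieval citations.

```python
import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical indicator phrases. Keyword matching alone will both miss and misfire;
# it stands in here for a tuned vulnerability classifier.
VULNERABILITY_INDICATORS = {
    "financial_hardship": ("can't pay", "behind on my rent", "debt collector"),
    "bereavement": ("passed away", "funeral"),
    "cognitive_distress": ("confused", "can't think straight"),
}
ESCALATION_REPLY = "I'm passing you to a member of our team who can help with this."


def detect_vulnerability(message: str) -> list[str]:
    """Return the sorted list of indicator categories found in the message."""
    text = message.lower()
    return sorted(
        flag
        for flag, phrases in VULNERABILITY_INDICATORS.items()
        if any(phrase in text for phrase in phrases)
    )


def guarded_reply(message: str, model_fn: Callable[[str], str], audit_log: list,
                  *, prompt_version: str, model_name: str, operator: str) -> str:
    """Escalate flagged conversations to a human; otherwise call the model. Log both paths."""
    flags = detect_vulnerability(message)
    escalated = bool(flags)
    output = ESCALATION_REPLY if escalated else model_fn(message)
    audit_log.append(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        "model": model_name,
        "operator": operator,
        "input": message,   # full user message as received
        "output": output,   # full reply, including escalation replies
        "vulnerability_flags": flags,
        "escalated_to_human": escalated,
    }))
    return output


log: list = []
print(guarded_reply("What is my current balance?", lambda m: "Your balance is shown in the app.",
                    log, prompt_version="support-v3", model_name="example-model", operator="agent-17"))
```

Because the flagged path never reaches `model_fn`, the escalation does not depend on the model following its own refusal instructions.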

Question 6

Do you self-host Llama 3 or do you only build on Claude and GPT-4o?

Accepted Answer

Both. Self-hosted Llama 3 70B or 8B on vLLM inside your VPC is the default for UK clients with strict no-overseas-inference requirements (higher-risk FCA-supervised use cases, NHS-adjacent or defence-adjacent work) and for high-volume, cost-sensitive deployments where per-token API pricing is prohibitive. We have a deployment runbook for this: vLLM, TGI, or SGLang on GPU instances (A100, H100, or L40S depending on the throughput target), with quantised inference (AWQ, GPTQ, INT8) to hit latency and cost targets. Claude and GPT-4o remain the default for clients whose eval bar requires the latest closed-model quality and whose DPA permits the routing.
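A first-pass sizing for the GPU choice comes from weight memory: parameter count times bits per weight. The sketch below adds a hypothetical 20% headroom for KV cache and activations, which in practice depends heavily on context length and batch size. It also rounds the GPU count up to a power of two as a rule of thumb for tensor parallelism. Treat the output as a starting point for a load test, not a capacity plan.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def gpus_for_weights(params_billion: float, bits_per_weight: int,
                     gpu_memory_gb: float, headroom: float = 0.2) -> int:
    """Smallest power-of-two GPU count covering weights plus an assumed headroom fraction."""
    required = weight_memory_gb(params_billion, bits_per_weight) * (1 + headroom)
    count = 1
    while count * gpu_memory_gb < required:
        count *= 2
    return count


for label, params, bits, gpu_gb in [
    ("Llama 3 70B, FP16, 80 GB GPU", 70, 16, 80),
    ("Llama 3 70B, INT8, 80 GB GPU", 70, 8, 80),
    ("Llama 3 70B, 4-bit AWQ/GPTQ, 48 GB L40S", 70, 4, 48),
    ("Llama 3 8B, FP16, 48 GB L40S", 8, 16, 48),
]:
    print(f"{label}: {weight_memory_gb(params, bits):.0f} GB weights, "
          f"{gpus_for_weights(params, bits, gpu_gb)} GPU(s)")
```

The comparison shows why quantisation matters for cost. At 4 bits, the 70B weights drop from 140 GB to 35 GB, which moves the model from a multi-GPU node to a single card under these assumptions.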

Question 7

Can you take over a stalled LLM project from a London consultancy?

Accepted Answer

Yes. LLM takeover audits are routine. Step one is reading the code, the prompts, the eval results (if any exist), the retrieval pipeline, the model and provider choices, and the cost telemetry. Step two is shipping the smallest valuable change that proves we understand the system, usually wiring in the eval harness or fixing the retrieval layer the previous vendor skipped. Step three is the longer-term plan: incremental stabilisation, a swap to a better-suited model, or a parallel rebuild if the architecture is unsalvageable. Most takeovers we see did not need a full rewrite; they needed evals, guardrails, observability, and a senior engineer on the build.

Question 8

How does cost compare to a London LLM consultancy?

Accepted Answer

Most v1 LLM engagements at Aiinfox land between £25,000 and £130,000 fixed-price for a focused build: a copilot, a RAG-grounded LLM app, a fine-tuned domain model, or an evals-and-guardrails retrofit. Larger multi-quarter engagements with custom fine-tuning, bespoke evals, FCA documentation, and integration into a regulated platform typically reach £160,000 to £320,000. Compared with a London or Manchester LLM consultancy, our senior rates land roughly 30 to 50 percent lower. That is useful, but the headline is that the engineer on your kickoff call writes your prompts, your evals, your retrieval pipeline, and your code through launch, with no swap-out to a junior pool mid-engagement.

LLM development for UK organisations that need models to actually ship.

Production LLM development for the United Kingdom — evals-first, UK-region inference, ICO-aligned.

Production work, not prototypes.

LLM applications and copilots

RAG-grounded LLM systems

Fine-tuning and self-hosted Llama 3

FCA-aware financial LLM systems

LLM evals, guardrails, and ops

LLM takeover and rebuilds

Where this work has shipped.

Financial services and fintech

Healthcare and life sciences

Legal and professional services

SaaS and B2B platforms

Insurance and risk

Govtech and public sector

Media and publishing

Telco and support

How we ship.

Discover

Scope

Build

Ship and operate

LLM applications that hold quality in production. Audit-grade.

Questions teams actually ask.

Ready to ship an LLM application UK regulators trust?
