Buying Guide · United States

How to evaluate an AI development company in the USA.

An objective framework for US CTOs, VPs of Engineering, and procurement leads weighing multiple AI development vendors. Nine criteria, the red flags to watch for, and the contract terms that protect you after the kickoff call.

  • 50+ AI systems shipped to production
  • 12 industries served end-to-end
  • <2s average voice-agent p95 latency
  • 99.95% production uptime across deployments

Overview

A practical evaluation framework for US teams hiring an AI development company — without the pitch deck.

If you are a US CTO or VP of Engineering evaluating AI development companies in 2026, the field has split into three roughly distinct categories: Bay Area boutiques charging $400-$600 per senior hour, mid-market consultancies running repeated discovery phases before any build at $200-$350 per hour, and offshore senior-only firms (Aiinfox among them) shipping fixed-price scopes at the equivalent of $90-$150 per senior hour. The cheapest vendor can burn your budget twice over. The most expensive one can sell you a slide deck and a junior pool. The difference between a good evaluation and an expensive mistake is asking the right structured questions in the first 30 minutes: before the demo, before the SOW, before the contract review. This page is the framework we wish every US buyer brought to the first call.

We have written this honestly. Aiinfox is one of the vendors you may be evaluating, and we will not pretend otherwise — but the criteria below were not reverse-engineered from our strengths. They are the same criteria your security team, your finance team, and your engineering team will surface during the diligence cycle anyway, written down in one place so you can run them in week one rather than week six. We name no competitor companies. We frame red flags in the structural language that applies to the whole market, not as ammunition for a sales narrative. Read it, copy what is useful into your own RFP, and apply it to every vendor on your shortlist — including us.

The nine criteria below are sequenced the way they actually break a US engagement. Seniority and delivery model come first, because if the engineers writing your code are not the engineers on the kickoff call, nothing else matters. Eval-first delivery comes second, because an AI system without evals is a demo, not a product. Compliance posture comes third, because HIPAA, SOC 2, and CCPA scope determines what is even buildable. Cost transparency, takeover clauses, IP assignment, and post-launch support round out the contractual surface. Treat this as a checklist, not an essay, and walk away from any vendor that cannot answer all nine in plain English on the first call.

Why teams pick Aiinfox

  • Senior engineers only — 8+ years average, no junior pool
  • Eval harness in week one, not retrofitted in phase two
  • HIPAA · SOC 2 · CCPA aligned, BAA-ready, US-region deployment
  • Fixed-price 6-week target; overrun cost on us if we miss
  • IP assignment, takeover clause, and 30-day warranty written into every SOW
  • Frisco, TX office + US-hours pod for CT business coverage
What we build

Production work, not prototypes.

1. Seniority verification

What good looks like: named engineers with public GitHub, prior production credits, and direct calendar access — same people through launch. Red flag: 'team lead plus the team' language with no named engineers, or LinkedIn profiles you cannot find.

2. Eval-first delivery

What good looks like: a written eval set with ground-truth answers before any prompt is written, plus latency and cost telemetry from day one. Red flag: 'we'll add evals in phase two' or no answer when you ask what the failure modes are.

3. Compliance posture (HIPAA · SOC 2 · CCPA)

What good looks like: BAA signed before PHI is shared, SOC 2-aligned controls, US-region inference pinned, audit logs exportable. Red flag: 'we work with HIPAA data all the time' with no written controls, no BAA template, and no region pinning.

4. Fixed-price scope with acceptance criteria

What good looks like: a one-page SOW with scope, acceptance criteria, timeline, and a fixed USD number, delivered within 72 hours. Red flag: time-and-materials with an open-ended discovery phase that runs for two months before any code is committed.

5. Takeover and exit clauses

What good looks like: source in your GitHub from day one, runbooks at handover, IP assigned to you, and a clean exit path written into the MSA. Red flag: code in a vendor repository, deployment in a vendor cloud, or a 'managed service' wrapper that locks you in.

6. Post-launch support and the 30-day question

What good looks like: a 30-day production warranty written into the SOW, an optional retainer for ongoing tuning, and on-call docs for your team. Red flag: 'we move to a support contract after launch' with no warranty window and no runbooks.

Industries

Where this work has shipped.

1. Seniority verification

Ask: 'who exactly will write my code, and will they be on every demo?' Verify with public GitHub commits, named LinkedIn profiles, and direct calendar access. Walk away from agency-style 'team lead plus team' answers with no named engineers.

2. Eval-first delivery discipline

Ask to see an eval set from a comparable prior engagement before the SOW. Eval coverage on 200+ reference cases is table stakes for production work. 'Evals come in phase two' is the single biggest red flag in this market.
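
To make "eval set" concrete when you ask for one, here is a minimal sketch of what an eval harness does: it runs ground-truth cases through the system and reports pass rate and p95 latency. The names `EvalCase`, `run_evals`, and `toy_system` are hypothetical, and exact-match grading is an assumption; a real engagement would likely need fuzzier scoring and cost telemetry as well.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected: str  # ground-truth answer, written before any prompt exists


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def run_evals(cases: list[EvalCase], system: Callable[[str], str]) -> dict:
    """Run every case through `system`, recording correctness and latency."""
    passed = 0
    latencies = []
    for case in cases:
        start = time.perf_counter()
        answer = system(case.question)
        latencies.append(time.perf_counter() - start)
        # Exact match after normalization (an assumption; grading is often fuzzier).
        if answer.strip().lower() == case.expected.strip().lower():
            passed += 1
    return {
        "cases": len(cases),
        "pass_rate": passed / len(cases),
        "p95_latency_s": p95(latencies),
    }


if __name__ == "__main__":
    # Stand-in for the system under test; a real one would call a model.
    def toy_system(question: str) -> str:
        return {"What is the refill window?": "30 days"}.get(question, "unknown")

    cases = [
        EvalCase("What is the refill window?", "30 days"),
        EvalCase("Who approves overrides?", "the pharmacist"),
    ]
    print(run_evals(cases, toy_system))
```

A vendor's real eval set should be far larger than this toy one, but the shape is the same: cases with known answers, a pass rate, and a latency figure you can compare across builds.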

3. Compliance posture (HIPAA · SOC 2 · CCPA)

Look for a BAA template in hand, SOC 2-aligned controls written down, US-region inference pinned, and audit logs exportable for evidence. If your engagement touches PHI or PII at scale, the absence of written controls is a hard stop.

4. Honest time-zone story

Offshore vendors who claim 'full US business hours from India' without a named US pod are bending the truth. The honest answer is either (a) a real US-hours pod or office, or (b) async-first with named overlap windows. Either is fine — vagueness is not.

5. Fixed-price scope discipline

A vendor who cannot scope a six-week build in 72 hours will not deliver one in six months. T&M with an open discovery phase is a budget extension mechanism, not an engineering discipline.

6. Takeover and IP assignment

Require source in your GitHub from day one, deployment in your cloud account, and IP assigned to you in the MSA. Vendor-locked code, a vendor cloud, or 'managed service' wrappers convert you from buyer to tenant, so read the exit clauses before you sign.

7. Cost transparency (USD, fixed, written)

Hourly rates without a scope are a category error. Demand a fixed USD number with acceptance criteria. If the vendor's number is a range with a 3x spread ('$80k-$240k'), they have not understood the scope yet — do not let them learn on your dollar.
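
The spread test is simple arithmetic, sketched here so it can sit next to your quote spreadsheet. The function names are hypothetical, and the default threshold of 3.0 is taken from the '$80k-$240k' example above rather than from any industry standard.

```python
def quote_spread(low_usd: float, high_usd: float) -> float:
    """Ratio of the top of a quoted range to the bottom."""
    if low_usd <= 0 or high_usd < low_usd:
        raise ValueError("expected 0 < low_usd <= high_usd")
    return high_usd / low_usd


def needs_rescoping(low_usd: float, high_usd: float, threshold: float = 3.0) -> bool:
    """True when the range is wide enough to suggest the scope is not understood."""
    return quote_spread(low_usd, high_usd) >= threshold


if __name__ == "__main__":
    print(quote_spread(80_000, 240_000))
    print(needs_rescoping(80_000, 240_000))
    print(needs_rescoping(90_000, 90_000))  # a single fixed number has a spread of 1.0
```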

8. References that match your shape

Ask for references from a comparable industry, scale, and compliance footprint. A vendor with five healthcare references and no fintech work will struggle with fintech. Generic 'enterprise references' that map to nothing in your sector are marketing, not proof.

9. Post-launch support and warranty

Look for a 30-day production warranty written into the SOW, an optional retainer for ongoing tuning, and on-call docs for your team. 'We move to a support contract after launch' with no warranty window and no runbooks is a red flag.

Process

How we ship.

01

Define the bar

Write a one-page brief: problem, success metric, compliance scope (HIPAA, SOC 2, CCPA), and a hard budget ceiling. Send the same document to every shortlisted vendor; different briefs make the responses impossible to compare.

02

Run a structured 30-minute call

Walk every vendor through the nine criteria above in the same order. Take notes in a shared spreadsheet. Anyone who cannot answer compliance posture and seniority verification on the first call should not get a second.
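
If the shared spreadsheet ends up as a script, the gating rule above could be sketched as follows. The criterion keys and function names are hypothetical; the only rule taken from this guide is that seniority verification and compliance posture must both be answered on the first call.

```python
# Criteria that gate a second call, per the guide; the key names are hypothetical.
MUST_PASS = ("seniority_verification", "compliance_posture")


def earns_second_call(answers: dict[str, bool]) -> bool:
    """True only if both gating criteria were answered clearly on the first call."""
    return all(answers.get(criterion, False) for criterion in MUST_PASS)


def rank_vendors(notes: dict[str, dict[str, bool]]) -> list[tuple[str, int, bool]]:
    """Return (vendor, criteria answered, second call?) rows, best candidates first."""
    rows = [
        (vendor, sum(answers.values()), earns_second_call(answers))
        for vendor, answers in notes.items()
    ]
    # Vendors that pass the gate sort ahead of those that do not,
    # then by how many criteria they answered, then by name for stability.
    return sorted(rows, key=lambda row: (not row[2], -row[1], row[0]))


if __name__ == "__main__":
    notes = {
        "Vendor A": {"seniority_verification": True, "compliance_posture": True},
        "Vendor B": {"seniority_verification": True, "compliance_posture": False,
                     "eval_first_delivery": True, "fixed_price_scope": True},
    }
    for row in rank_vendors(notes):
        print(row)
```

Note the design choice: a vendor that answers more criteria overall still ranks below one that clears the two gates, which mirrors the "should not get a second call" rule.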

03

Ask for the SOW in 72 hours

The 72-hour test separates vendors who scope cleanly from vendors who run open-ended discovery. Pass: a one-page SOW with acceptance criteria. Fail: a discovery proposal that itself costs $20k.

04

Run a 2-3 week paid pilot

Never sign a $200k engagement on a deck. Pay $10k-$25k for a scoped 2-3 week pilot with acceptance criteria. The vendor who ships clean pilot code is the vendor most likely to ship the production system.

Proof

The cost benchmark for an honest US AI build. Written down.

Bay Area boutique senior rates land at $400-$600 per hour. Mid-market US consultancies land at $200-$350. Offshore senior-only firms (Aiinfox included) land at the equivalent of $90-$150 per hour or, more usefully, fixed-price v1 scopes between $25,000 and $120,000 for a six-week build. A reasonable US v1 with HIPAA or SOC 2 compliance scope lands at $60k-$120k fixed-price. Anything quoted under $25k is either a pilot or a corner-cutter; anything quoted over $250k without fine-tuning and a multi-quarter scope is paying for a sales narrative.

FAQ

Questions teams actually ask.

How do I actually verify a senior engineer's seniority before signing?

Three concrete checks. First, ask for named engineers on the proposal and confirm their LinkedIn profiles match the seniority claimed — eight-plus years of production AI or ML work, not eight years of general software with six months of LangChain. Second, ask for direct access to those engineers on the discovery call and again on the kickoff — the same names, not a swap. Third, ask for public artifacts: GitHub commits, conference talks, or shipped products you can find. Vendors who refuse all three are running a senior-figurehead-plus-junior-pool model, which is the single most common failure pattern in the US market.

What contract terms should I insist on in the MSA and SOW?

Six non-negotiables. (1) IP assignment — your code, your prompts, your evals, your data, assigned to you on payment. (2) Source in your GitHub from day one, not a vendor repo. (3) Acceptance criteria written into the SOW with a defined test plan. (4) A 30-day production warranty — bugs introduced by the vendor are fixed at no charge for 30 days post-launch. (5) A clean exit clause — runbooks, on-call docs, and credentials handed over on termination for any reason. (6) Data handling terms aligned with your compliance scope (BAA for HIPAA, DPA for CCPA, SOC 2-aligned controls written in). Anyone resisting any of these six is protecting a lock-in.

What does an honest takeover clause look like?

It looks like the vendor losing zero leverage if you choose to leave. Concretely: code lives in your GitHub org from commit one. Deployment runs in your AWS, Azure, or GCP account under your IAM. Secrets live in your secret manager. Runbooks, on-call docs, and architectural decision records are checked into the repo, not in a vendor wiki. IP assigns to you on payment, not on contract end. If your vendor cannot describe a clean handover to a different team in a single page, the architecture is the lock-in. We see this most often with 'managed AI platforms' that wrap an open-source stack and then charge enterprise pricing for the wrapper — the moment you try to leave, the wrapper goes with them.

Should IP be assigned to me, or licensed?

Assigned, with one narrow exception. Custom code, prompts, evals, fine-tuned weights trained on your data, and integration glue should all assign to you on payment; that is the default you should expect in a US AI MSA. The narrow exception is pre-existing vendor frameworks or internal libraries that pre-date the engagement; those are usually licensed to you under a perpetual, royalty-free, transferable license rather than assigned. That is fair, provided the license is genuinely perpetual and transferable. Watch for 'licensed for your use' language without 'perpetual and transferable', because that is a renewal trap.

What does honest post-launch support look like in 2026?

A 30-day production warranty as default — the vendor fixes their own bugs at no charge for 30 days after the system is live. After that, an optional retainer for tuning, eval refresh, drift monitoring, and on-call response, priced separately and renewable monthly. The retainer should be optional, not bundled into a 'managed service' that you cannot cancel without losing access to your own code. If the vendor's post-launch model requires you to keep paying them to keep your system running, you bought a service, not a build — and you should know that before signing.

How do I sanity-check a vendor's claimed compliance posture?

Four asks. (1) Show me your BAA template (HIPAA) and your DPA template (CCPA) — vendors who 'work with HIPAA data all the time' but cannot produce a template have not actually done the work. (2) Show me a redacted audit log from a comparable prior engagement — what fields are captured per model and tool call. (3) Tell me which region your inference endpoints run in and how that is enforced — vendors who say 'OpenAI handles that' have not pinned the region. (4) Walk me through a prior incident response — what happened, what was the timeline, what evidence was produced. Anyone who cannot do all four on the first call is selling compliance, not practicing it.
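
For asks (2) and (3), it helps to know roughly what a per-call audit record and a region check look like. This is a sketch under stated assumptions, not any vendor's actual schema: the field names, the `AuditRecord` type, and the two example region names in `ALLOWED_REGIONS` are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Assumption: example US region identifiers; use whatever your cloud actually names them.
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str     # UTC, ISO 8601
    request_id: str
    actor: str         # user or service that triggered the call
    call_type: str     # "model" or "tool"
    target: str        # model or tool name
    region: str
    input_sha256: str  # hash instead of raw text, so PHI stays out of the log


def make_record(request_id: str, actor: str, call_type: str,
                target: str, region: str, payload: str) -> AuditRecord:
    """Build one audit record, refusing calls outside the pinned regions."""
    if call_type not in {"model", "tool"}:
        raise ValueError(f"unknown call_type: {call_type}")
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"region {region} is not in the pinned US set")
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        request_id=request_id,
        actor=actor,
        call_type=call_type,
        target=target,
        region=region,
        input_sha256=hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    )


def export_jsonl(records: list[AuditRecord]) -> str:
    """Serialize records as JSON Lines, one record per line, for evidence export."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

A redacted log from a real engagement will have more fields than this, but if a vendor cannot show something of this shape, with an enforced region check rather than a stated intention, that answers the question.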

What is a fair fixed-price range for a US v1 AI build in 2026?

A focused v1 — single AI agent, single RAG system, or single voice pipeline — with US compliance scope (HIPAA, SOC 2, or CCPA) lands at $25,000-$120,000 fixed-price for a six-week build with offshore senior-only delivery, $80,000-$250,000 with mid-market US consultancies, and $200,000-$500,000+ with Bay Area boutiques. Anything quoted under $25k is either a pilot, a corner-cutter, or a junior-pool play — be honest about which. Anything quoted over $250k without fine-tuning, multi-system scope, or a multi-quarter timeline is paying for a sales narrative, not engineering. The most expensive vendor is rarely the best one; the cheapest one almost never is.

How do I avoid a stalled engagement that I cannot escape?

Three structural protections written into the contract before kickoff. First, a milestone-based payment schedule — typically 25 percent on kickoff, 25 percent on demo one, 25 percent on demo two, 25 percent on acceptance — so you can stop paying if the work stops shipping. Second, twice-weekly demos against the SOW acceptance criteria, with written go/no-go after each demo. Third, source access from day one in your GitHub, so if you do walk away you walk away with the work in progress. Vendors who resist any of the three are protecting their downside, which means they expect to need that protection — which is itself the signal.
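
The payment schedule above is easy to work through in numbers. The sketch below computes the four 25 percent payments and how much you have paid if you stop after a given milestone; the function names are hypothetical, and putting any rounding remainder on the final payment is an assumption, not a contract convention.

```python
# The 25/25/25/25 split described above, as (milestone, percent) pairs.
MILESTONES = [("kickoff", 25), ("demo one", 25), ("demo two", 25), ("acceptance", 25)]


def payment_schedule(total_usd: int) -> list[tuple[str, int]]:
    """Split a fixed price into milestone payments in whole dollars."""
    amounts = [total_usd * pct // 100 for _, pct in MILESTONES]
    # Assumption: any rounding remainder lands on the final (acceptance) payment.
    amounts[-1] += total_usd - sum(amounts)
    return [(name, amount) for (name, _), amount in zip(MILESTONES, amounts)]


def paid_through(total_usd: int, milestone: str) -> int:
    """Total paid if the engagement stops right after `milestone` is invoiced."""
    paid = 0
    for name, amount in payment_schedule(total_usd):
        paid += amount
        if name == milestone:
            return paid
    raise ValueError(f"unknown milestone: {milestone}")


if __name__ == "__main__":
    for name, amount in payment_schedule(120_000):
        print(f"{name}: ${amount:,}")
    print(paid_through(120_000, "demo one"))
```

On a $120,000 scope, walking away after the first demo means $60,000 paid, with the work in progress already in your own repository.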

Let's build it

Want a fixed-price scope inside 72 hours?

30-minute discovery call in your US business hours. We will walk you through the nine criteria above against our own delivery model — and tell you on the call if we are not the right fit. HIPAA, SOC 2, and CCPA-aligned. Frisco, TX office for CT-hours coverage.

Book a discovery call

Reply within 1 business day · India & USA

  • Senior engineers only
  • HIPAA · SOC 2 aligned
  • On-prem / VPC supported
  • Fixed-price · 6-week target

Compare this framework against the Aiinfox US country pillar, the HIPAA AI development deep-dive, and the SOC 2 AI development page for the compliance posture in detail. See the medical-inquiry RAG case study and the voice agent case study for documented references that match the seniority and eval-first criteria above. Practice pages: AI agent development, generative AI, and RAG development services. Sibling buying guides for the UK, Canada, and Australia.