Domain intelligence for better models. Agentic AI for real workflows.

Zstate builds the training infrastructure that makes models smarter, and the AI-native systems that put them to work across domains.

Coding dataset
258k
Real software engineering tasks with full agent trajectories, tool calls, and human acceptance signals. Nothing synthetic.
3.7M Trajectory Steps
130k Reward Signals
1.7M Tool Interactions
Explore →
Healthcare dataset
5M+
Prescription digitisation, diagnostic reasoning, radiology, pathology, and drug grounding.
Explore →
Domain experts
5000+
Specialists across software engineering, healthcare & finance
What we work on
Domain Intelligence
Training infrastructure for models: RL environments, RLHF and SFT datasets, reward signals, and evaluations.
Agentic AI
AI-native systems that turn domain workflows into production software, with automation, orchestration, and human review where needed.

Generic annotation fails where domain judgment matters

Generic annotation
  • Labels created by people who have never done the work
  • Tasks reduced to shallow fragments with little workflow context
  • Benchmarks that reward looking right, not being right
  • Little connection between training data and production behaviour
Zstate approach
  • Tasks, rubrics, and reward signals designed with domain experts
  • RL environments and evaluation loops grounded in real workflows
  • Training infrastructure built with downstream system behaviour in mind
  • Signals that help models learn judgment, not just pattern matching

Domain Intelligence and Agentic AI

01
Primary

Domain Intelligence

Expert-led data, evaluation, and feedback systems that keep specialist context inside the model-development loop.

  • RLHF preference data & reward model training
  • SFT instruction datasets from domain experts
  • Red-teaming & adversarial evaluation
  • Clinical NLP, diagnostic Q&A, EHR abstraction
  • Earnings analysis, risk data, compliance evaluation
  • Medical coding & ICD abstraction
Start a project →
02
Powered by the same expertise

Agentic AI

Production-grade agentic systems across software engineering, healthcare, and finance, built with domain context carried through design, evaluation, and deployment.

  • End-to-end agentic system design & build
  • Multi-agent pipelines & workflow automation
  • AI-native architecture, not retrofitted legacy code
  • From prototype to scalable production deployment
  • Compliance-aware engineering for regulated industries
Start an agentic project →

Four areas. Genuine depth in each.

01 - Coding

Software engineering data

SWE trajectories with reasoning, tool calls, and acceptance signals.

02 - Healthcare

Healthcare records & reasoning

5M+ clinical records across prescriptions, diagnostics, and pharma.

Complete agent trajectories across 258k real-world software engineering problems with reasoning traces, tool calls, code edits, and explicit user acceptance signals. Three core layers: Task, Trajectory, and Reward datasets.

  • 258k real engineering tasks. Every task comes from real-world software engineering work and carries its complete agent trajectory. Nothing synthetic.
  • Three derived datasets. Task dataset (258k cleaned prompts), Trajectory dataset (3.7M trajectory steps), and Reward dataset (130k acceptance signals).
  • Beyond SWE-Bench. Full lifecycle: reasoning → tool calls (6-7 per task across 22 tools) → code edits → human acceptance. Real production tasks, not curated benchmarks.
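
As a rough illustration of how the Task, Trajectory, and Reward layers described above might relate to one another, here is a minimal Python sketch. Every class and field name in it is a hypothetical assumption made for illustration; it is not the published Zstate schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    tool: str        # hypothetical tool name, e.g. "run_tests"
    arguments: dict  # arguments passed to the tool
    output: str      # what the tool returned


@dataclass
class Step:
    reasoning: str                        # reasoning trace for this step
    tool_call: Optional[ToolCall] = None  # present only if a tool was invoked
    code_edit: Optional[str] = None       # e.g. a diff, if the step edited code


@dataclass
class Task:
    task_id: str
    prompt: str  # cleaned task prompt


@dataclass
class Trajectory:
    task_id: str  # joins back to Task.task_id
    steps: List[Step] = field(default_factory=list)


@dataclass
class Reward:
    task_id: str   # joins back to Task.task_id
    accepted: bool  # explicit user acceptance signal


def tool_calls_per_task(trajectory: Trajectory) -> int:
    """Count the steps in a trajectory that invoked a tool."""
    return sum(1 for step in trajectory.steps if step.tool_call is not None)
```

Keying all three layers on a shared task identifier is one plausible design: it would let a consumer train on prompts alone, on full trajectories, or on trajectories filtered by acceptance.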

5M+ connected healthcare records covering prescription digitisation, diagnostic reasoning, radiology, pathology, and drug grounding mapped to symptoms, diseases, and side effects.

  • 5M+ connected healthcare records. Prescription digitisation, diagnostic reasoning, radiology, and pathology report interpretation in one corpus.
  • Prescription, diagnostic, and report workflows. Extraction and interpretation tasks spanning prescriptions, clinical reasoning, radiology, and pathology.
  • Drug data that completes the corpus. A drug layer tied to symptoms, diseases, and side effects, which provides the grounding context for medical AI training and evaluation.
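
To make the idea of a drug layer tied to symptoms, diseases, and side effects more concrete, the following Python sketch shows one way such grounding could be indexed. The structure, the field names, and the placeholder entries are all assumptions for illustration and say nothing about the actual corpus format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class DrugEntry:
    name: str
    diseases: FrozenSet[str]      # diseases the drug is linked to
    symptoms: FrozenSet[str]      # symptoms the drug is linked to
    side_effects: FrozenSet[str]  # known side effects


def index_by_side_effect(entries: Iterable[DrugEntry]) -> Dict[str, Set[str]]:
    """Build a reverse index: side effect -> names of drugs that list it."""
    index: Dict[str, Set[str]] = {}
    for entry in entries:
        for effect in entry.side_effects:
            index.setdefault(effect, set()).add(entry.name)
    return index


# Placeholder entries only; these are not real clinical facts.
EXAMPLE_ENTRIES = [
    DrugEntry("drug_a", frozenset({"disease_x"}), frozenset({"symptom_1"}),
              frozenset({"effect_p", "effect_q"})),
    DrugEntry("drug_b", frozenset({"disease_y"}), frozenset({"symptom_1"}),
              frozenset({"effect_q"})),
]
```

A reverse index like this is the kind of lookup an evaluation could use to check whether a model's answer about a side effect is grounded in a listed drug.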

Preference data and SFT datasets for earnings reports, 10-K filings, risk model assessment, regulatory compliance, fraud detection, and trade rationale evaluation.

  • Earnings & analyst evaluation. Preference data and SFT datasets over earnings reports, 10-K filings, and sell-side research. Evaluated by credentialed analysts.
  • Risk & compliance data. Training and evaluation data for risk modelling, regulatory tasks, and stress testing. Reviewed by risk professionals.
  • Fraud detection & trade rationale. Expert-annotated datasets for fraud detection, trade rationale evaluation, and financial reasoning benchmarks.
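
Preference data of the kind mentioned above is commonly stored as chosen/rejected pairs. The sketch below shows, under stated assumptions, how expert scores on candidate responses might be turned into such pairs. The class names, the `expert_score` field, and the tie-skipping rule are hypothetical choices, not a description of Zstate's pipeline.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class RatedResponse:
    text: str
    expert_score: float  # assumed: higher means the expert preferred it


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def to_preference_pairs(prompt: str, responses: List[RatedResponse]) -> List[PreferencePair]:
    """Turn scored responses into chosen/rejected pairs, skipping ties."""
    pairs: List[PreferencePair] = []
    for a, b in combinations(responses, 2):
        if a.expert_score == b.expert_score:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if a.expert_score > b.expert_score else (b, a)
        pairs.append(PreferencePair(prompt, chosen.text, rejected.text))
    return pairs
```

Skipping ties is one design choice among several; a real pipeline might instead keep them with an explicit "no preference" label.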

Multi-agent system design, custom RL environments, and production deployment with guardrails, observability, human-in-the-loop checkpoints, and scalable infrastructure.

  • Agentic system design. Multi-agent architectures with tool use, memory, orchestration, and handoff logic for long-horizon workflows.
  • RL environment engineering. Custom RL environments that simulate real expert decision workflows and yield high-signal training data and meaningful evaluations.
  • Production deployment & ops. Prototype to production with guardrails, observability, human-in-the-loop checkpoints, and reliable infrastructure at scale.
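
As a minimal sketch of what an RL environment built around an expert decision workflow could look like, the Python example below walks an agent through a fixed sequence of decision points and rewards agreement with the expert's action. The `reset`/`step` interface follows a common convention; the class names and the exact-match reward rule are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DecisionPoint:
    observation: str    # what the agent sees at this point in the workflow
    expert_action: str  # the action a domain expert took here


class WorkflowEnv:
    """Toy environment: one episode is one pass through the workflow."""

    def __init__(self, decision_points: List[DecisionPoint]):
        if not decision_points:
            raise ValueError("workflow needs at least one decision point")
        self._points = list(decision_points)
        self._index = 0

    def reset(self) -> str:
        """Start a new episode and return the first observation."""
        self._index = 0
        return self._points[0].observation

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Apply an action; return (next_observation, reward, done)."""
        if self._index >= len(self._points):
            raise RuntimeError("episode is over; call reset()")
        point = self._points[self._index]
        # Assumed reward rule: 1.0 for matching the expert, else 0.0.
        reward = 1.0 if action == point.expert_action else 0.0
        self._index += 1
        done = self._index >= len(self._points)
        next_obs = "" if done else self._points[self._index].observation
        return next_obs, reward, done
```

An exact-match reward is the simplest possible signal; a rubric-based or graded reward designed with domain experts would replace it in any realistic setting.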

How we work

01
Scope the workflow
We define the task, decision points, quality bar, and operational constraints.
02
Design the signal
We shape the right combination of datasets, reward signals, evaluations, or RL environments.
03
Deliver in loops
Work is delivered in structured iterations with QA, feedback, and calibration built in.
04
Deploy or extend
If the workflow needs to run in production, we build the agentic system around it.

Some teams start with training infrastructure. Others start with the workflow. We support both.

Why Zstate

Domain experts, not generic labour
The core signal comes from people who understand the work, not from low-context annotation.
Real workflows, not shallow proxies
We build around actual decision processes, including tasks, rubrics, evaluations, and RL environments.
Training and production under one roof
The same company that shapes model signals can also build the system that uses them.
Depth where judgment matters
We focus on software engineering, healthcare, and finance, where expertise changes model quality.

Ready to build AI your domain trusts?

Whether you need domain intelligence or an agentic AI system, let's start with a conversation.

Domain Intelligence
Start a project
Expert-led data, evaluations, and red-teaming for models that need real domain depth.
Agentic AI
Start an agentic project
Production systems, environments, and workflow automation for complex industries.