Domain intelligence for better models. Agentic AI for real workflows.

Production-grade agentic systems across software engineering, healthcare, and finance, built with domain context carried through design, evaluation, and deployment.

End-to-end agentic system design & build
Multi-agent pipelines & workflow automation
AI-native architecture, not retrofitted legacy code
From prototype to scalable production deployment
Compliance-aware engineering for regulated industries

Four areas. Genuine depth in each.

01 - Coding

Software engineering data

SWE trajectories with reasoning, tool calls, and acceptance signals.

02 - Healthcare

Healthcare records & reasoning

5M+ clinical records across prescriptions, diagnostics, and pharma.

03 - Finance

Financial analysis data

Earnings, risk, compliance, and trade rationale evaluation datasets.

04 - Agentic AI

Agentic systems

Multi-agent systems, RL environments, and production deployment.

Complete agent trajectories across 258k real-world software engineering problems with reasoning traces, tool calls, code edits, and explicit user acceptance signals. Three core layers: Task, Trajectory, and Reward datasets.

258k real engineering tasks. Complete agent trajectories across real-world software engineering problems with reasoning traces, tool calls, code edits, and explicit user acceptance signals. Nothing synthetic.
Three derived datasets. Task dataset (258k cleaned prompts), Trajectory dataset (3.7M full agent traces), and Reward dataset (130k acceptance signals).
Beyond SWE-Bench. Each task covers the full lifecycle: reasoning → tool calls (6 to 7 per task, drawn from 22 tools) → code edits → human acceptance. These are real production tasks, not curated benchmarks.
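As a rough illustration of how the Task, Trajectory, and Reward layers could relate to one another, the sketch below models one record of each and joins them on a shared task identifier. Every class and field name here (`TaskRecord`, `TrajectoryRecord`, `RewardRecord`, `task_id`, `summarize`, and the tool name `read_file`) is a hypothetical placeholder chosen for the example, not the actual dataset schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    tool: str          # hypothetical tool name, e.g. "read_file"
    arguments: dict    # arguments passed to the tool


@dataclass
class Step:
    reasoning: str                         # reasoning trace for this step
    tool_call: Optional[ToolCall] = None   # set when the step invokes a tool
    code_edit: Optional[str] = None        # set when the step edits code


@dataclass
class TaskRecord:
    task_id: str
    prompt: str        # cleaned task prompt


@dataclass
class TrajectoryRecord:
    task_id: str
    steps: List[Step] = field(default_factory=list)

    def tool_call_count(self) -> int:
        # Count only the steps that actually invoked a tool.
        return sum(1 for s in self.steps if s.tool_call is not None)


@dataclass
class RewardRecord:
    task_id: str
    accepted: bool     # explicit user acceptance signal


def summarize(task: TaskRecord, traj: TrajectoryRecord, reward: RewardRecord) -> dict:
    """Join the three layers for one task; assumes they share a task_id."""
    if not (task.task_id == traj.task_id == reward.task_id):
        raise ValueError("records refer to different tasks")
    return {
        "task_id": task.task_id,
        "prompt": task.prompt,
        "tool_calls": traj.tool_call_count(),
        "accepted": reward.accepted,
    }


# Illustrative data only; not drawn from the real corpus.
example = summarize(
    TaskRecord("t1", "Fix the failing unit test"),
    TrajectoryRecord("t1", [
        Step("Inspect the test file", ToolCall("read_file", {"path": "test_a.py"})),
        Step("Patch the off-by-one error", code_edit="- i <= n\n+ i < n"),
    ]),
    RewardRecord("t1", True),
)
```

Keeping the acceptance signal in its own record, keyed by task, is one way a reward layer could be used independently of the full traces.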

5M+ connected healthcare records covering prescription digitisation, diagnostic reasoning, radiology, pathology, and drug grounding mapped to symptoms, diseases, and side effects.

5M+ connected healthcare records. Prescription digitisation, diagnostic reasoning, radiology, and pathology report interpretation in one corpus.
Prescription, diagnostic, and report workflows. Extraction and interpretation tasks spanning prescriptions, clinical reasoning, radiology, and pathology.
Drug data that completes the corpus. A drug layer tied to symptoms, diseases, and side effects supplies the grounding context for medical AI training and evaluation.

Preference data and SFT datasets for earnings reports, 10-K filings, risk model assessment, regulatory compliance, fraud detection, and trade rationale evaluation.

Earnings & analyst evaluation. Preference data and SFT datasets over earnings reports, 10-K filings, and sell-side research. Evaluated by credentialed analysts.
Risk & compliance data. Training and evaluation data for risk modelling, regulatory tasks, and stress testing. Reviewed by risk professionals.
Fraud detection & trade rationale. Expert-annotated datasets for fraud detection, trade rationale evaluation, and financial reasoning benchmarks.

Multi-agent system design, custom RL environments, and production deployment with guardrails, observability, human-in-the-loop checkpoints, and scalable infrastructure.

Agentic system design. Multi-agent architectures with tool use, memory, orchestration, and handoff logic for long-horizon workflows.
RL environment engineering. Custom RL environments that simulate real expert decision workflows, producing high-signal training data and meaningful evaluations.
Production deployment & ops. Prototype to production with guardrails, observability, human-in-the-loop checkpoints, and reliable infrastructure at scale.
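To make the idea of a human-in-the-loop checkpoint concrete, here is a minimal sketch in which an agent's proposed actions run in order and any action flagged by a guardrail waits for a reviewer's decision. The function `run_with_checkpoints`, its parameters, and the action names are all assumptions made for illustration; a production system would replace the callbacks with a real review queue and real action execution.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_checkpoints(
    actions: Iterable[str],
    needs_review: Callable[[str], bool],
    approve: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Process actions in order, gating flagged ones behind human approval."""
    executed: List[str] = []
    blocked: List[str] = []
    for action in actions:
        # Guardrail: flagged actions run only if the reviewer approves them.
        if needs_review(action) and not approve(action):
            blocked.append(action)
            continue
        executed.append(action)  # stand-in for actually performing the action
    return executed, blocked


# Hypothetical run: destructive actions are flagged and the reviewer declines.
executed, blocked = run_with_checkpoints(
    ["read_report", "delete_records", "send_summary"],
    needs_review=lambda a: a.startswith("delete"),
    approve=lambda a: False,
)
```

In this run the two low-risk actions proceed and the flagged one is held back, which is the behaviour a checkpoint is meant to guarantee.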

How we work

Scope the workflow

We define the task, decision points, quality bar, and operational constraints.

Design the signal

We shape the right combination of datasets, reward signals, evaluations, or RL environments.

Deliver in loops

Work is delivered in structured iterations with QA, feedback, and calibration built in.

Deploy or extend

If the workflow needs to run in production, we build the agentic system around it.

Some teams start with training infrastructure. Others start with the workflow. We support both.

Why Zstate

Domain experts, not generic labor

The core signal comes from people who understand the work, not from low-context annotation.

Real workflows, not shallow proxies

We build around actual decision processes, including tasks, rubrics, evaluations, and RL environments.

Training and production under one roof

The same company that shapes model signals can also build the system that uses them.

Depth where judgment matters

We focus on software engineering, healthcare, and finance, where expertise changes model quality.

Generic annotation fails where domain judgment matters