Domain intelligence for better models.

Expert-verified training data and reward signals, built for STEM, software, healthcare, finance, RL environments, and general knowledge work.

Coding · Healthcare · Finance

Start a project →

Coding dataset: 58k tasks

Real software engineering tasks with full agent trajectories, tool calls, and human acceptance signals. Nothing synthetic.

Trajectory steps: 833k
Reward signals: 29k
Tool interactions: 382k

Healthcare dataset: 1M+ records

Connected clinical records across prescriptions, diagnostics, radiology, pathology, and drug grounding.

Domain experts: 2M+

Specialists across software engineering, healthcare & finance.


Generic annotation fails where domain judgment matters.

Generic annotation

Labels from people who have never done the job.
Shallow fragments with none of the workflow around them.
Benchmarks that pay for looking right, not being right.
Judgment errors that surface only in production.

The Zstate approach

Tasks, rubrics, and reward signals designed with working domain experts.
Complete trajectories captured from real production workflows.
Signals grounded in downstream behaviour, so models learn judgment.

Five expert verticals, one standard.

Built with the people who do the work.

Scientists & engineers

Verify AI reasoning, check proofs, and train models that handle units, methods, and causation with real rigor.

12k+ proofs reviewed

Typical work

Derivations & proofs
Math, physics, bio, chem
Units & causation checks
Gold references

Apply as a STEM expert →

Verify with domain rigor

Specialists derive the solution, check the proof, and attach a gold reference the model can learn from.

Hard STEM problem arrives (source trigger)
Expert derives the solution (specialist review)
Proof checked for rigor (verified by experts)
Gold reference attached (signal ready)

Software engineers

Review AI code, write production-grade solutions, and shape the next generation of coding models.

58k engineering tasks

Typical work

Real engineering problems
Stack-matched projects
Failure analysis
Reference patches

Apply as an engineer →

Score real engineering work

Stack-matched engineers review the trace, flag failure modes, and turn the fix into a reward signal.

Repo task enters the queue (source trigger)
Engineer reviews the trace (stack-matched review)
Failure modes scored (precise analysis)
Reference patch becomes signal (reward ready)

Financial analysts & risk professionals

Evaluate earnings, filings, risk models, and trade rationale. Credentialed judgment, not pattern matching.

4,800+ filings scored

Typical work

Earnings & 10-K evaluation
Risk & compliance
Fraud rationale
Analyst scoring

Apply in finance →

Audit financial judgment

Credentialed analysts score the rationale, attach risk flags, and leave an audit trail with the signal.

Filing lands in the workspace (source trigger)
Analyst scores the rationale (credentialed review)
Risk flags attached (compliance check)
Trade signal leaves with an audit trail (signal ready)

Clinicians & healthcare specialists

Prescription digitisation, diagnostic reasoning, radiology, pathology. 1M+ records grounded by experts.

1M+ connected records

Typical work

Clinical workflows
Drug grounding
Diagnostic reasoning
Connected records

Apply in healthcare →

Ground every clinical case

Clinicians ground the record, check drugs and imaging, and ship outcomes with full clinical context.

Clinical record enters review (source trigger)
Specialist grounds the case (clinician review)
Drugs and imaging checked (domain grounding)
Outcome ships with context (signal ready)

Sharp readers & writers

Teach AI to follow instructions, write clearly, and stop inventing facts. Judgment most people already have.

120k+ ranked responses

Typical work

Instruction following
Factuality ranking
Hallucination flags
Gold responses

Apply as a generalist →

Rank for clarity and truth

Readers rank tone and clarity, flag hallucinations, and publish preference signals that models can train on.

Model responses enter ranking (source trigger)
Readers rank clarity and tone (expert ranking)
Hallucinations flagged (factuality check)
Preference signal published (signal ready)

From real workflows to reward signals. Every dataset walks the same path.

Three stages take source material to production-ready signals, with the same rigor across STEM, coding, finance, healthcare, and generalist work.

Collect complete context

Capture reasoning traces, tool calls, code edits, records, and source material from real production work.

Review with specialists

Domain experts apply explicit rubrics and score the reasoning, method, tool use, and outcome.

Deploy durable signals

Turn accepted work into evaluations, reward signals, and environments that retain source judgment.


STEM task

{
  "environment": "STEM",
  "task": "selectivity-028",
  "system": "Pd catalyst screen",
  "conditions": "65°C · 12 h",
  "variables": 18,
  "result": "validated yield"
}

Software

{
  "task": "build-regression / TS-184",
  "suite": "34 tests · 3 services",
  "stack": "TypeScript · Node",
  "ci": "reproduced",
  "reward": 0.91
}

RL env

{
  "env": "warehouse-routing / E-12",
  "actions": 64,
  "constraints": 8,
  "success_rate": 0.912,
  "episodes": 1200
}

Intelligence for the work that moves the world forward.

Expert-verified samples across STEM, software, healthcare, finance, RL, and knowledge work.

Healthcare

{
  "case": "discharge-plan / 7A",
  "guidelines": 4,
  "findings": 12,
  "safety": "complete",
  "reviewer": "attending MD"
}

Finance

{
  "task": "portfolio-stress / Q3",
  "holdings": 42,
  "factors": 6,
  "shortfall": "2.7%",
  "audit": "locked"
}

Knowledge

{
  "task": "policy-brief / research",
  "sources": 28,
  "claims": 16,
  "citations": "100%",
  "hallucination": "none"
}


Built with the people who do the work.

Tell us about your model, your domain, and your gap. We'll scope the dataset or the system.

Start a project → · Meet the team