Research & development
We build systems and protocols that help people verify claims about AI systems and reduce harm from overconfidence, manipulation, and unsafe deployment.
Evaluation harnesses
Repeatable tests for reliability, robustness, and generalization. The output is a report, not a vibe.
- Task suites with clear pass/fail criteria
- Failure mode tracking and regression tests
- Uncertainty reporting and calibration checks (see the sketch after this list)
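As a concrete illustration, here is a minimal harness sketch in Python. It assumes a model exposed as a callable that returns an answer plus a self-reported confidence in [0, 1]; the names (`Task`, `run_suite`, `calibration_gap`) are illustrative, not a fixed API, and the calibration metric is a deliberately crude stand-in for a proper binned check such as ECE.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative assumption: a "model" is any callable returning
# (answer, self-reported confidence in [0, 1]). Not a real API.
Model = Callable[[str], tuple[str, float]]

@dataclass
class Task:
    prompt: str
    passes: Callable[[str], bool]  # explicit pass/fail criterion, no vibes

@dataclass
class Result:
    task: Task
    answer: str
    confidence: float
    passed: bool

def run_suite(model: Model, tasks: list[Task]) -> list[Result]:
    """Run every task and record a hard pass/fail for each one."""
    results = []
    for task in tasks:
        answer, confidence = model(task.prompt)
        results.append(Result(task, answer, confidence, task.passes(answer)))
    return results

def calibration_gap(results: list[Result]) -> float:
    """Mean absolute gap between stated confidence and actual outcome.
    A crude stand-in for a proper binned calibration check."""
    if not results:
        return 0.0
    return sum(abs(r.confidence - float(r.passed)) for r in results) / len(results)
```

A regression test then amounts to re-running the same suite against a new model version and diffing the per-task pass/fail records.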
Governance scaffolds
We treat governance as an engineering discipline: define gates, log decisions, and document incidents.
- Deliberation-before-action patterns for high-stakes tasks
- Capability gating and staged rollouts (sketched after this list)
- Audit logs and reproducible artifacts
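A minimal sketch of the gating pattern, under the same illustrative assumptions as above: `Gate` and `GateDenied` are hypothetical names, and the JSON-lines file stands in for a real append-only audit store.

```python
import json
import time
from typing import Callable

class GateDenied(Exception):
    """Raised when a high-stakes action fails its safety gate."""

class Gate:
    """Hypothetical capability gate: a named check that must pass before
    a high-stakes action runs, with every decision appended to a log."""

    def __init__(self, name: str, check: Callable[[dict], bool], log_path: str):
        self.name = name
        self.check = check        # deliberation step, runs before the action
        self.log_path = log_path  # stand-in for an append-only audit store

    def _log(self, record: dict) -> None:
        record["gate"] = self.name
        record["ts"] = time.time()
        with open(self.log_path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def run(self, action: Callable[[], object], context: dict) -> object:
        allowed = self.check(context)
        self._log({"context": context, "allowed": allowed})
        if not allowed:
            raise GateDenied(self.name)  # a failed gate is data, not a secret
        return action()
```

The check is forced to run before the action, and denials are logged rather than swallowed; that ordering is what turns a failed gate into data instead of a silent dead end.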
What success looks like
A successful result is a public artifact: protocol + results + failure cases + a clear explanation of what was learned.
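To make that concrete, here is one hypothetical shape such an artifact could take; the field names are ours, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    """Hypothetical schema for a publishable result."""
    protocol: str                                    # how the evaluation was run
    results: dict                                    # metrics, including failed gates
    failure_cases: list[str] = field(default_factory=list)
    lessons: str = ""                                # what was learned, in plain language
```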
We will publish failures.
If a system fails a safety gate, that’s data. Hiding it is how ecosystems get corrupted.