Transparency & accountability

We publish how we evaluate, what we measure, what failed, and what changed. This page is the “receipts” hub.

Evaluation protocol

Every serious claim should come with a protocol, metrics, and documented failure modes.

  • Defined scope (what the claim is and isn’t)
  • Measurable success criteria
  • Adversarial and stress testing
  • Reproducible runs and versioned results
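The checklist above can be captured in a small, versioned results manifest so runs stay reproducible and tamper-evident. A minimal sketch; every field name here is an illustrative assumption, not our actual schema:

```python
import hashlib
import json

def results_manifest(protocol_version, metrics, failures, run_inputs):
    """Bundle a run's protocol version, measured metrics, and observed
    failure modes into one content-addressed record (illustrative schema)."""
    record = {
        "protocol_version": protocol_version,  # ties results to a specific protocol
        "metrics": metrics,                    # measurable success criteria
        "failure_modes": failures,             # what broke under stress testing
        "inputs": run_inputs,                  # what the run consumed
    }
    # Hash the canonical JSON so any later edit to the record is detectable.
    canonical = json.dumps(record, sort_keys=True)
    record["digest"] = hashlib.sha256(canonical.encode()).hexdigest()
    return record

manifest = results_manifest(
    protocol_version="2024.1",
    metrics={"accuracy": 0.91},
    failures=["degrades on inputs > 10k tokens"],
    run_inputs={"dataset": "eval-set-v3", "seed": 7},
)
```

Sorting the keys before hashing makes the digest independent of field order, so two honest re-runs of the same record produce the same digest.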

Audit & incident notes

We log material changes and publish incident-style writeups when something breaks a gate.

  • Changelog for protocol updates
  • Safety gate failures and mitigations
  • Postmortems for regressions
  • Third-party review where feasible
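The four note types above could share one record shape in the audit log. A sketch under assumed, illustrative field names:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class IncidentNote:
    """One audit-log entry: a protocol change, gate failure,
    postmortem, or external review (hypothetical record shape)."""
    kind: str              # "changelog" | "gate_failure" | "postmortem" | "review"
    summary: str
    opened: date
    mitigations: list = field(default_factory=list)
    reviewed_externally: bool = False  # third-party review where feasible

note = IncidentNote(
    kind="gate_failure",
    summary="Safety gate tripped on adversarial prompt set",
    opened=date(2024, 5, 1),
    mitigations=["rolled back release", "added prompt set to regression suite"],
)
```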

Accountability index

Our accountability framework tracks program, research, and governance transparency in explicit pillars, each with verifiable artifacts or a “0+ / coming soon” status.

  • Governance & oversight structures: 0+ (coming soon)
  • Audited financial statements & filings: 0+ (coming soon)
  • Independent evaluations & research reviews: 0+ (coming soon)
  • Safeguarding & misconduct reporting: 0+ (coming soon)
  • Security & privacy posture: 0+ (coming soon)
  • Engineering transparency & performance metrics: 0+ (coming soon)

How to read our claims

We use cautious language and try to keep claims falsifiable. Here’s the translation:

  • “We observed…” = measured result under a documented protocol
  • “We believe…” = hypothesis, not established fact
  • “We verified…” = protocol + reproducible artifacts exist

Principle: If it can’t be checked, it shouldn’t be marketed as proven.
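The vocabulary above can be read as a lookup from phrasing to the evidence it requires. A hypothetical sketch; the mapping and function names are ours for illustration, not a real checker we run:

```python
# Hypothetical mapping from claim phrasing to the artifacts it requires.
EVIDENCE_REQUIRED = {
    "we observed": {"documented_protocol"},
    "we believe": set(),  # hypothesis: no artifact required, no proof implied
    "we verified": {"documented_protocol", "reproducible_artifacts"},
}

def claim_is_supported(phrase, artifacts):
    """A claim is supported when every artifact its phrasing requires exists."""
    required = EVIDENCE_REQUIRED.get(phrase.lower())
    if required is None:
        return False  # unknown phrasing: if it can't be checked, don't market it
    return required <= set(artifacts)
```

Under this reading, “We verified…” backed only by a protocol, with no reproducible artifacts, fails the check; “We believe…” passes with nothing, because it promises nothing.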