Transparency & accountability

We publish how we evaluate, what we measure, what failed, and what changed. This page is the “receipts” hub.

Evaluation protocol

Every serious claim should come with a protocol, metrics, and documented failure modes.

  • Defined scope (what the claim is and isn’t)
  • Measurable success criteria
  • Adversarial and stress testing
  • Reproducible runs and versioned results
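The checklist above can be captured in a small, versioned results manifest so runs stay reproducible and tamper-evident. A minimal sketch; every field name here is an illustrative assumption, not our actual schema:

```python
import hashlib
import json

def results_manifest(protocol_version, metrics, failures, run_inputs):
    """Bundle a run's protocol version, measured metrics, and observed
    failure modes into one content-addressed record (illustrative schema)."""
    record = {
        "protocol_version": protocol_version,  # ties results to a specific protocol
        "metrics": metrics,                    # measurable success criteria
        "failure_modes": failures,             # what broke under stress testing
        "inputs": run_inputs,                  # what the run consumed
    }
    # Hash the canonical JSON so any later edit to the record is detectable.
    canonical = json.dumps(record, sort_keys=True)
    record["digest"] = hashlib.sha256(canonical.encode()).hexdigest()
    return record

manifest = results_manifest(
    protocol_version="2024.1",
    metrics={"accuracy": 0.91},
    failures=["degrades on inputs > 10k tokens"],
    run_inputs={"dataset": "eval-set-v3", "seed": 7},
)
```

Sorting the keys before hashing makes the digest independent of field order, so two honest re-runs of the same record produce the same digest.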

Audit & incident notes

We log material changes and publish incident-style writeups when something breaks a gate.

  • Changelog for protocol updates
  • Safety gate failures and mitigations
  • Postmortems for regressions
  • Third-party review where feasible
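The four note types above could share one record shape in the audit log. A sketch under assumed, illustrative field names:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class IncidentNote:
    """One audit-log entry: a protocol change, gate failure,
    postmortem, or external review (hypothetical record shape)."""
    kind: str              # "changelog" | "gate_failure" | "postmortem" | "review"
    summary: str
    opened: date
    mitigations: list = field(default_factory=list)
    reviewed_externally: bool = False  # third-party review where feasible

note = IncidentNote(
    kind="gate_failure",
    summary="Safety gate tripped on adversarial prompt set",
    opened=date(2024, 5, 1),
    mitigations=["rolled back release", "added prompt set to regression suite"],
)
```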

Accountability index

Our accountability framework tracks program, research, and governance transparency in explicit pillars, each with verifiable artifacts or a “0+ / coming soon” status.

  • Governance & oversight structures: 0+ (coming soon)
  • Audited financial statements & filings: 0+ (coming soon)
  • Independent evaluations & research reviews: 0+ (coming soon)
  • Safeguarding & misconduct reporting: 0+ (coming soon)
  • Security & privacy posture: 0+ (coming soon)
  • Engineering transparency & performance metrics: 0+ (coming soon)

How to read our claims

We use cautious language and try to keep claims falsifiable. Here’s the translation:

  • “We observed…” = measured result under a documented protocol
  • “We believe…” = hypothesis, not established fact
  • “We verified…” = protocol + reproducible artifacts exist

Principle: If it can’t be checked, it shouldn’t be marketed as proven.
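The vocabulary above can be read as a lookup from phrasing to the evidence it requires. A hypothetical sketch; the mapping and function names are ours for illustration, not a real checker we run:

```python
# Hypothetical mapping from claim phrasing to the artifacts it requires.
EVIDENCE_REQUIRED = {
    "we observed": {"documented_protocol"},
    "we believe": set(),  # hypothesis: no artifact required, no proof implied
    "we verified": {"documented_protocol", "reproducible_artifacts"},
}

def claim_is_supported(phrase, artifacts):
    """A claim is supported when every artifact its phrasing requires exists."""
    required = EVIDENCE_REQUIRED.get(phrase.lower())
    if required is None:
        return False  # unknown phrasing: if it can't be checked, don't market it
    return required <= set(artifacts)
```

Under this reading, “We verified…” backed only by a protocol, with no reproducible artifacts, fails the check; “We believe…” passes with nothing, because it promises nothing.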