Security Policy

Research context

Failure-First is a defensive AI-safety research project that studies how AI systems fail under adversarial pressure. This public repository contains pattern-level findings and methodology descriptions only. Operational testing infrastructure, adversarial datasets, and full evaluation traces live in a private repository that is governed by the same Design Charter as this one.

Coordinated vulnerability disclosure

We practice coordinated vulnerability disclosure (CVD) for AI-safety vulnerabilities discovered through this research.

Current status

We have submitted 10 responsible disclosures to model providers (Nvidia, Alibaba, Zhipu, Google/Gemma, Mistral, and others) covering two vulnerability classes: context-collapse attacks and transcription-loophole injection. Initial notifications were sent on 2026-04-07. Public discussion follows standard CVD practice: affected parties get a remediation window of at least 90 days before any specifics surface.

Our CVD process

Discovery — pattern identified through systematic evaluation
Verification — finding confirmed across multiple test conditions with statistical controls
Private notification — affected provider contacted via their security reporting channel
Remediation window — minimum 90 days before public discussion of specifics
Public disclosure — pattern-level description only; never operational detail
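
The 90-day minimum in the remediation-window step is simple date arithmetic. The following is a minimal Python sketch, assuming the window is counted in calendar days from the private-notification date (the policy does not state how the days are counted). The function name `earliest_public_date` is hypothetical and is not part of any project tooling.

```python
from datetime import date, timedelta

# Minimum remediation window stated in the CVD process above.
MIN_REMEDIATION_DAYS = 90


def earliest_public_date(notified_on: date, window_days: int = MIN_REMEDIATION_DAYS) -> date:
    """Return the first date on which specifics could be discussed publicly.

    Assumption: the window is counted in calendar days from the notification date.
    """
    if window_days < MIN_REMEDIATION_DAYS:
        raise ValueError("window must be at least the 90-day minimum")
    return notified_on + timedelta(days=window_days)


# Initial notifications under "Current status" were sent on 2026-04-07.
print(earliest_public_date(date(2026, 4, 7)))  # 2026-07-06
```

Because 90 days is a minimum, a longer window agreed with a provider can be passed as `window_days`; anything shorter is rejected.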

Research ethics anchor

Disclosure decisions are constrained by charter §3.1, §3.2, §3.6, and §9:

Findings serve the defensive research mission
Operational details are never published
Affected parties are notified before any public discussion
Pattern-level descriptions enable defensive improvements without enabling attacks

Reporting security concerns

For this repository or failurefirst.org

If you find an issue with this repository or the site (exposed credentials, vulnerable dependencies, web-platform issues):

Non-sensitive — open a GitHub issue
Sensitive — email research@failurefirst.org
Private channel — use the GitHub Security tab to file a Security Advisory

Please include:

Affected URL, file, or commit SHA
Reproduction steps (or a minimal PoC)
Impact assessment from your point of view
Any disclosure timeline you would like us to honour

For AI-safety vulnerabilities discovered elsewhere

If you discover vulnerabilities in AI systems through independent research and want to coordinate disclosure:

Do

Follow responsible-disclosure practice
Report to affected vendors before public disclosure
Document findings at the pattern level for academic discussion
Open a GitHub issue if you want to coordinate with us

Do not

Post operational exploits in public issues
Share working bypass techniques without vendor notification
Weaponize research findings

Scope

In scope

Security issues with this GitHub repository or failurefirst.org
Vulnerabilities in public documentation or site infrastructure
Dependency security issues
Collaboration on coordinated disclosure of AI-safety vulnerabilities

Out of scope

Vulnerabilities in third-party AI systems — report directly to the vendor
Requests for operational exploit code or adversarial datasets
Requests for model-specific jailbreak techniques
Best-practice recommendations without a concrete finding (we appreciate them, but they are not security reports)

Response timeline

Stage	Target
Acknowledgement	Within 3 business days
Initial assessment	Within 7 business days
Resolution	Depends on severity and complexity

If you have not received an acknowledgement after 3 business days, please re-send to research@failurefirst.org with [SECURITY] in the subject line.
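
For reporters who want to work out when a re-send is due, the targets above can be computed as follows. This is a rough Python sketch, assuming business days run Monday to Friday and ignoring public holidays (the policy does not define either). The helper `add_business_days` is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: business days are Monday to Friday; public holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


received = date(2026, 5, 15)  # a Friday
print(add_business_days(received, 3))  # 2026-05-20 (acknowledgement target)
print(add_business_days(received, 7))  # 2026-05-26 (initial-assessment target)
```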

Research ethics

This project operates within established AI-safety research norms. A full research-ethics charter is maintained in the private repository; the public-facing summary is in DESIGN_CHARTER.md, particularly §9 (Research Ethics Boundaries).

Contact

Non-sensitive — open a GitHub issue
Sensitive disclosures — research@failurefirst.org
CVD coordination — open a GitHub issue with institutional affiliation

Last updated: 2026-05-16
