Merged
163 changes: 163 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,169 @@ All notable changes to VectorPin will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0-rc.1] — 2026-05-14

Release candidate for 0.2.0. **This is a wire-format break.** Pins
produced by 0.1.x do not verify under the default 0.2.0 verifier;
a `LegacyV1Verifier` is shipped in all three languages as an opt-in
migration aid. The break is the response to a security audit
(2026-05) that identified four cross-implementation issues. See
[`docs/spec.md` §12](docs/spec.md#12-changes-from-v1) for the full
v1 → v2 change list.

### Protocol — wire-format v2

- Protocol version field bumped to `v: 2`. Strict v2 verifiers reject
v1 pins.
- **`v` and `kid` are now signed.** Both are part of the canonical
payload, defeating downgrade attacks and cross-key swap attacks.
- **Domain separator.** Signed bytes are now
`b"vectorpin/v2\x00" || canonical_json(header)` (13-byte tag),
preventing cross-protocol signature reuse with any sister Trust-Stack
protocol.
- **NaN/Inf rejection at sign time.** `+0.0` and `-0.0` remain distinct.
- **NFC normalization mandatory** on every string-typed field
(`model`, `kid`, `ts`, every `extra` key, every `extra` value).
Control characters U+0000–U+001F and bidi overrides U+202A–U+202E /
U+2066–U+2069 are rejected.
- **`extra` is strictly `map<string, string>`.** Non-string values
cause `PARSE_ERROR`.
- **Strict timestamp format.** `ts` must match exactly
`^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$`. No
fractional seconds, no offset variants.
- **Unknown top-level fields rejected** at parse time.
- **Size limits enforced** (`docs/spec.md` §4.3): pin JSON ≤ 64 KiB,
≤ 32 `extra` entries, key ≤ 128 B, value ≤ 1 KiB, `vec_dim` ≤ 2^20,
decoded `sig` length exactly 64.
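
The rules above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the reference implementation: the helper names (`check_string`, `check_ts`, `check_vector`, `signing_input`) are hypothetical, and the canonical JSON shown here (sorted keys, compact separators, UTF-8) is only a stand-in for whatever `docs/spec.md` actually defines.

```python
import json
import math
import re
import unicodedata

# 12 ASCII bytes plus one NUL byte = the 13-byte tag named in the changelog.
DOMAIN_TAG = b"vectorpin/v2\x00"

# Same pattern as the changelog entry. fullmatch() is used below so that a
# trailing newline (which a bare `$` would tolerate) is rejected too.
TS_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z")

# Control characters U+0000-U+001F and the bidi ranges U+202A-U+202E, U+2066-U+2069.
FORBIDDEN_RE = re.compile(r"[\u0000-\u001f\u202a-\u202e\u2066-\u2069]")


def check_string(value: str) -> str:
    """Hypothetical helper: require NFC and reject forbidden code points."""
    if not unicodedata.is_normalized("NFC", value):
        raise ValueError("string is not NFC-normalized")
    if FORBIDDEN_RE.search(value):
        raise ValueError("string contains a control or bidi-override character")
    return value


def check_ts(ts: str) -> str:
    """Hypothetical helper: accept only the strict second-resolution UTC form."""
    if TS_RE.fullmatch(ts) is None:
        raise ValueError("ts must look like 2026-05-14T00:00:00Z")
    return ts


def check_vector(values) -> None:
    """Hypothetical helper: reject NaN and +/-Inf; signed zeros pass through untouched."""
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or Inf")


def signing_input(header: dict) -> bytes:
    """Prefix the serialized header with the domain tag.

    ASSUMPTION: sorted keys and compact separators approximate the spec's
    canonical JSON; the real canonicalization rules live in docs/spec.md.
    """
    body = json.dumps(header, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return DOMAIN_TAG + body.encode("utf-8")
```

Because `v` and `kid` sit inside the header that gets serialized, changing either one changes the signed bytes, which is what makes a downgrade or key swap detectable.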

### Verification — replay protection and revocation

- New `KeyEntry` registry shape with optional
`(valid_from, valid_until)` window. Pins whose `ts` falls outside
the window fail with `KEY_EXPIRED`. This separates routine rotation
from compromise-driven revocation while keeping historical pins
verifiable.
- Replay-protection check: callers may supply
`expected_record_id` / `expected_collection_id` / `expected_tenant_id`,
verified against the reserved `vectorpin.*` keys in `extra`. Returns
`RECORD_MISMATCH` / `COLLECTION_MISMATCH` / `TENANT_MISMATCH` on
divergence (spec §5 step 8).
- Spec failure-mode taxonomy expanded to include `KEY_EXPIRED`,
`PARSE_ERROR`, the three `*_MISMATCH` codes, and `UNSUPPORTED_DTYPE`.
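
A minimal sketch of the two checks described above, using only the Python standard library. Everything here is illustrative: `KeyWindow` is a stand-in for the validity part of a registry entry (not the real `KeyEntry`), the window bounds are assumed to be inclusive, and of the reserved keys only `vectorpin.record_id` appears in this changelog; `vectorpin.collection_id` and `vectorpin.tenant_id` are guessed by analogy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class KeyWindow:
    """Stand-in for a registry entry's validity window (hypothetical type)."""
    valid_from: Optional[datetime] = None
    valid_until: Optional[datetime] = None


def parse_ts(ts: str) -> datetime:
    # The strict `ts` format is always UTC with a literal trailing Z.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def check_key_window(window: KeyWindow, ts: str) -> Optional[str]:
    """Return "KEY_EXPIRED" if ts is outside the window, else None.

    ASSUMPTION: both bounds are inclusive; the spec may define this differently.
    """
    when = parse_ts(ts)
    if window.valid_from is not None and when < window.valid_from:
        return "KEY_EXPIRED"
    if window.valid_until is not None and when > window.valid_until:
        return "KEY_EXPIRED"
    return None


# ASSUMPTION: only `vectorpin.record_id` is named in the changelog; the other
# two reserved key names are guesses made for illustration.
BINDINGS = (
    ("vectorpin.record_id", "RECORD_MISMATCH"),
    ("vectorpin.collection_id", "COLLECTION_MISMATCH"),
    ("vectorpin.tenant_id", "TENANT_MISMATCH"),
)


def check_bindings(
    extra: Mapping[str, str],
    expected_record_id: Optional[str] = None,
    expected_collection_id: Optional[str] = None,
    expected_tenant_id: Optional[str] = None,
) -> Optional[str]:
    """Compare caller-supplied expectations against the reserved `extra` keys."""
    expected = (expected_record_id, expected_collection_id, expected_tenant_id)
    for (key, code), want in zip(BINDINGS, expected):
        # A missing key counts as a mismatch once the caller expects a value.
        if want is not None and extra.get(key) != want:
            return code
    return None
```

In this sketch a pin signed before a key's `valid_until` keeps verifying after rotation, since the comparison uses the pin's own `ts` rather than the current time.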

### Implementations

All three reference implementations produce byte-for-byte identical
canonical bytes and Ed25519 signatures from the same deterministic
seed (verified by `testvectors/v2.json` and the per-language
cross-language test).

#### Python

- `PROTOCOL_VERSION = 2`, `DOMAIN_TAG = b"vectorpin/v2\x00"` exported
from `vectorpin.attestation`.
- `Pin.from_*` strict schema: 64 KiB cap, type/regex/length checks on
every field, `vec_dtype` allowlist, sig length 64 enforced.
- `Verifier` (strict v2) and `LegacyV1Verifier` (opt-in v1+v2).
- `Verifier.verify(..., expected_record_id=..., expected_collection_id=..., expected_tenant_id=...)`
enforces replay-protection bindings.
- `KeyEntry` carries `(valid_from, valid_until)`; `KEY_EXPIRED` fires
per §7.

#### Rust

- `pub const DOMAIN_TAG: &[u8] = b"vectorpin/v2\x00"`,
`pub const PROTOCOL_VERSION: u32 = 2` exported.
- New `VerifyError` variants: `KeyExpired`, `ParseError(String)`,
`RecordMismatch`, `CollectionMismatch`, `TenantMismatch`,
`UnsupportedDtype(String)`.
- `VerifyOptions` builder carries replay-protection expected values.
- `LegacyV1Verifier` opt-in.

#### TypeScript

- Async signing/verifying API throughout (`signAsync` / `verifyAsync`).
Drops the globally-mutable `ed25519.etc.sha512Sync` hook.
- `Signer.fromPrivateBytes` makes a defensive copy of the seed.
`Signer.wipe()` zeros it.
- Pinned exact crypto deps: `@noble/ed25519@2.3.0`,
`@noble/hashes@1.8.0`.
- Prototype-pollution guards in `pinFromDict`; strict base64url
alphabet enforced before signature decode.

### Hardening — implementation surface

Beyond the wire-format break, the audit-driven hardening also closes
implementation-level findings:

- **Python CLI**: `vectorpin keygen` now writes the private seed with
mode `0o600` via `O_EXCL` (no umask reliance, refuses to clobber an
existing key); parent directory created with mode `0o700`. The
public key is explicitly set to `0o644`.
- **Python adapters**: LanceDB validates `id_column` / `vector_column`
/ `pin_column` against an identifier regex and rejects `record_id`
containing NUL, newline, or backslash. Qdrant and Pinecone refuse
an `api_key` over `http://` for non-loopback hosts unless
`VECTORPIN_ALLOW_INSECURE_HTTP=1` is set.
- **Python audit loop**: a single malformed pin in
`audit-{lancedb,chroma,qdrant}` no longer aborts the run; bad rows
are surfaced as `parse_error` and the audit continues.
- **Python `Signer.from_pem`**: requires explicit `password=...` or
`allow_unencrypted=True` to load an unencrypted PEM. Default
behavior refuses.
- **Python dependency bounds**: `cryptography>=42,<46`,
`numpy>=1.26,<3` in `pyproject.toml`.
- **Rust**: `#![forbid(unsafe_code)]` on the crate.
`Signer::generate` returns `Result<Self, SignerError>` (error variant
`SignerError::EmptyKeyId`).
`Signer::private_key_bytes` returns `Zeroizing<[u8; 32]>`.
`vec_dim` cast via `u32::try_from` on signer + verifier sides.
`Verifier::add_key` returns `Result<(), VerifyError>` (error variant
`VerifyError::KeyDecodeFailed`).
`zeroize = "1"` added as a direct dep.
- **TypeScript**: switched to async signing/verifying API
(`signAsync` / `verifyAsync`), dropping the globally-mutable
`ed25519.etc.sha512Sync` hook. `Signer.fromPrivateBytes` makes a
defensive copy. New `Signer.wipe()` zeros the seed. Module-load
assertion that `crypto.getRandomValues` is available. Prototype-
pollution guards in `pinFromDict`. Sanitized error detail strings
(strip control chars, truncate). `@noble/ed25519@2.3.0` and
`@noble/hashes@1.8.0` pinned to exact versions.
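
Two of the Python items above lend themselves to a short sketch: the exclusive-create key write and the plain-HTTP guard. This is a POSIX-only illustration built on the standard library, not the CLI's or the adapters' actual code; the function names `write_private_seed`, `is_loopback`, and `check_endpoint` are hypothetical. Only the `VECTORPIN_ALLOW_INSECURE_HTTP=1` switch and the file modes come from the changelog.

```python
import ipaddress
import os
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def write_private_seed(path: Path, seed: bytes) -> None:
    """Create the key file exclusively with mode 0o600 (hypothetical helper)."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # O_EXCL makes os.open raise FileExistsError instead of clobbering a key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        # The mode passed to os.open is still filtered by the umask, so set
        # the final permissions explicitly on the open descriptor.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(seed)


def is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost


def check_endpoint(url: str, api_key: Optional[str]) -> None:
    """Refuse to pair an api_key with plain http on a non-loopback host."""
    parts = urlsplit(url)
    if not api_key or parts.scheme != "http" or is_loopback(parts.hostname or ""):
        return
    if os.environ.get("VECTORPIN_ALLOW_INSECURE_HTTP") != "1":
        raise ValueError("refusing to send api_key over plain http")
```

Setting the mode on the descriptor rather than the path avoids a window in which the file exists with looser permissions.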

### Test vectors

- `testvectors/v2.json` — 4 positive fixtures covering f32, f64,
`model_hash`, and `extra` with `vectorpin.record_id`. Each carries
`expected_canonical_bytes_b64` for cross-language equality assertion.
- `testvectors/negative_v2.json` — 17 fixtures exercising every
failure mode in spec §5: tampered vector, tampered source, wrong
model, wrong `v`, wrong `kid`, bit-flipped sig, wrong sig length,
unknown top-level field, non-string `extra` value, NaN in vector,
NFD source, fractional-seconds `ts`, offset `ts`, lowercase `t`/`z`
`ts`, record_id mismatch, oversize JSON.
- `testvectors/v1.json` and `testvectors/negative_v1.json` retained
for `LegacyV1Verifier` coverage.

### Migration

Existing v1 pins do not verify under the strict default v2 verifier
in any language. To migrate a corpus:

1. Read each pin with `LegacyV1Verifier` (opt-in flag /
constructor / class).
2. Re-sign with the v2 `Signer`, which writes `v: 2` and the new
canonical bytes.
3. Write the re-signed pin back to the vector store.

Plain re-pinning preserves the bound `(source, vector, model)` triple
while replacing the now-deprecated v1 signature.
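
The three migration steps can be sketched as a generic loop. To avoid guessing at the real VectorPin API, the verifier, signer, and store are passed in as plain callables; `migrate_corpus`, `verify_legacy`, `resign_v2`, and `write_back` are hypothetical names, and a real run would wire them to `LegacyV1Verifier`, the v2 `Signer`, and the adapter for the vector store in use.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Pin = TypeVar("Pin")


def migrate_corpus(
    rows: Iterable[Tuple[str, Pin]],
    verify_legacy: Callable[[Pin], bool],
    resign_v2: Callable[[Pin], Pin],
    write_back: Callable[[str, Pin], None],
) -> Tuple[int, List[str]]:
    """Re-sign every pin that still verifies; report the ones that do not.

    All three callables are placeholders supplied by the caller (hypothetical
    wiring, not VectorPin functions).
    """
    migrated = 0
    rejected: List[str] = []
    for record_id, old_pin in rows:
        # Step 1: verify under the legacy rules before trusting the pin.
        if not verify_legacy(old_pin):
            rejected.append(record_id)
            continue
        # Steps 2 and 3: re-sign as v2 and store the new pin.
        write_back(record_id, resign_v2(old_pin))
        migrated += 1
    return migrated, rejected
```

Pins that fail legacy verification are collected rather than re-signed, because re-signing unverified data would attach a fresh v2 signature to content that may already have been tampered with.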

### Documentation

- New Zensical-rendered documentation site (`docs/`, `zensical.toml`):
index, getting-started, pin-protocol, CLI guide, adapters, detectors,
deployment, security, troubleshooting. The normative protocol
reference remains `docs/spec.md`. Published at
`https://docs.vectorpin.org/` via GitHub Pages.

## [0.1.1] — 2026-05-07

Patch release. No protocol changes; pins produced by 0.1.0 verify on
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -14,8 +14,8 @@ abstract: >-
post-embedding modification breaks signature verification on read. Reference
implementations in Python, Rust, and TypeScript are byte-for-byte compatible,
locked together by shared test vectors. Part of the ThirdKey Trust Stack.
-version: "0.1.1"
-date-released: 2026-05-07
+version: "0.2.0-rc.1"
+date-released: 2026-05-14
keywords:
- vector database
- embedding store
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "vectorpin"
-version = "0.1.1"
+version = "0.2.0rc1"
description = "Verifiable integrity for AI embedding stores."
readme = "README.md"
requires-python = ">=3.11"
2 changes: 1 addition & 1 deletion rust/Cargo.lock


2 changes: 1 addition & 1 deletion rust/Cargo.toml
@@ -3,7 +3,7 @@ resolver = "2"
members = ["vectorpin"]

[workspace.package]
-version = "0.1.1"
+version = "0.2.0-rc.1"
edition = "2021"
rust-version = "1.75"
license = "Apache-2.0"
2 changes: 1 addition & 1 deletion typescript/package.json
@@ -1,6 +1,6 @@
{
"name": "vectorpin",
-"version": "0.1.1",
+"version": "0.2.0-rc.1",
"description": "Verifiable integrity for AI embedding stores. TypeScript reference implementation.",
"license": "Apache-2.0",
"author": "Jascha Wanger / ThirdKey.ai",