Feature/pgvector adapter #8

Merged
jaschadub merged 2 commits into main from feature/pgvector-adapter on May 15, 2026

No description provided.

jaschadub added 2 commits May 15, 2026 11:11
VectorPin can now pin records in a pgvector-equipped Postgres table.
This is the highest-leverage adapter to add: pgvector is the de-facto
choice for teams that already operate Postgres and want to bolt
embedding search onto an existing OLTP database, and a vector row is
structurally indistinguishable from any other row to surrounding RBAC,
backup, replication, and CDC machinery — meaning VectorPin's signed
provenance is the only out-of-band integrity check available.

src/vectorpin/adapters/pgvector.py (new)
- PgVectorAdapter with the same shape as QdrantAdapter / LanceDBAdapter:
  iter_records, get, attach_pin, plus a classmethod .connect(dsn, ...).
- iter_records uses a plain client cursor + fetchmany(batch_size) to
  bound memory without requiring an explicit transaction (a server-side
  DECLARE CURSOR without WITH HOLD needs a transaction block, which
  autocommit mode does not provide).
- TLS guard mirroring QdrantAdapter._enforce_tls: rejects non-loopback
  postgres DSNs without sslmode=require (or stronger), unless
  VECTORPIN_ALLOW_INSECURE_HTTP=1 is set. Postgres credentials live
  inside the DSN, so plaintext to a remote host leaks them.
- Identifier validation on table_name / id_column / vector_column /
  pin_column: ^[A-Za-z_][A-Za-z0-9_]*$. Postgres has no parameterized
  form for identifiers; this is the only line of defense against
  '--table foo; DROP ...' shaped inputs.
- Pin column accepts JSONB (decoded to dict, parsed via Pin.from_dict)
  or TEXT (str, parsed via Pin.from_json). Both routes go through the
  strict v2 schema validation already on main.
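
The identifier check described above can be sketched as follows. Only the regular expression comes from the commit message; the function name `validate_identifier` and the error wording are illustrative assumptions, not the adapter's actual code.

```python
import re

# Pattern quoted from the commit message; everything else is illustrative.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_identifier(name: str, what: str = "identifier") -> str:
    """Return `name` unchanged if it is a plain SQL identifier, else raise ValueError.

    `validate_identifier` is a hypothetical helper name for this sketch.
    """
    # fullmatch() is used so that a trailing newline is rejected; with
    # match(), `$` would also match just before a final "\n".
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid {what}: {name!r}")
    return name


# Safe to interpolate into SQL text only after validation succeeds.
table = validate_identifier("embeddings", "table_name")
query = f"SELECT id FROM {table}"
```

Because identifiers cannot be bound as query parameters, a strict allow-list like this runs before any SQL string is assembled. Schema-qualified or quoted names are rejected by design under this pattern.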

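A minimal sketch of the TLS guard described in the adapter notes, using only the standard library. The function name `enforce_tls`, the set of sslmode values treated as "require or stronger", and the choice to pass keyword-form DSNs through unchecked are assumptions made for illustration; the real guard may differ.

```python
import ipaddress
import os
from urllib.parse import parse_qs, urlsplit

# Assumed reading of "sslmode=require (or stronger)".
_SECURE_SSLMODES = {"require", "verify-ca", "verify-full"}


def _is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other DNS name is treated as remote


def enforce_tls(dsn: str) -> None:
    """Raise ValueError for a remote URL-form DSN that does not demand TLS."""
    if os.environ.get("VECTORPIN_ALLOW_INSECURE_HTTP") == "1":
        return  # explicit operator opt-out
    if not dsn.startswith(("postgres://", "postgresql://")):
        return  # keyword-form DSN: passed through in this sketch
    parts = urlsplit(dsn)
    host = parts.hostname or ""
    if not host or _is_loopback(host):
        return  # local socket or loopback: no credentials cross the network
    sslmode = parse_qs(parts.query).get("sslmode", [""])[-1]
    if sslmode not in _SECURE_SSLMODES:
        # Report only the host; the DSN itself carries credentials.
        raise ValueError(f"refusing plaintext connection to {host!r}")
```

The error message deliberately omits the DSN so that credentials do not end up in logs or tracebacks.
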
src/vectorpin/adapters/__init__.py
- Registers PgVectorAdapter in the lazy-import map and __all__.

src/vectorpin/cli.py
- New audit-pgvector subcommand mirroring audit-lancedb/audit-chroma
  shape: --dsn, --table, --public-key, --key-id, --id-column (default
  id), --vector-column (default embedding), --pin-column (default
  vectorpin), --batch-size.
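
For orientation, the flag set above could be declared with argparse roughly as below. The flag names and the three column defaults are taken from the commit message; the program name `vectorpin`, which flags are required, and the `--batch-size` default are assumptions, since the message does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "vectorpin" as the program name is an assumption for this sketch.
    parser = argparse.ArgumentParser(prog="vectorpin")
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("audit-pgvector", help="audit pins stored in a pgvector table")
    p.add_argument("--dsn", required=True)    # assumed required
    p.add_argument("--table", required=True)  # assumed required
    p.add_argument("--public-key")
    p.add_argument("--key-id")
    # Defaults below are the ones listed in the commit message.
    p.add_argument("--id-column", default="id")
    p.add_argument("--vector-column", default="embedding")
    p.add_argument("--pin-column", default="vectorpin")
    # The commit message gives no default for --batch-size; 256 is a placeholder.
    p.add_argument("--batch-size", type=int, default=256)
    return parser


args = build_parser().parse_args(
    ["audit-pgvector", "--dsn", "postgresql://localhost/db", "--table", "items"]
)
```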

pyproject.toml
- New optional extra: pgvector = ['psycopg[binary]>=3.1', 'pgvector>=0.3'].
- Added to the 'all' extra.

tests/test_adapter_pgvector.py (new, 22 tests)
- 14 offline (no DB): TLS guard accepts loopback / sslmode=require,
  rejects remote plaintext, env-var escape hatch, keyword-form DSN
  pass-through; identifier validator accepts/rejects parametrized
  hostile inputs.
- 8 live integration: iter_records, attach_pin + get roundtrip, full
  sign-attach-verify roundtrip under the v2 Verifier, KeyError on
  unknown id (get + attach_pin), loopback DSN doesn't trip TLS, bad
  table/column names rejected at connect.
- Integration tests auto-discover the compose service via
  VECTORPIN_TEST_PGVECTOR_URL > PGVECTOR_URL > the compose-default
  DSN, and skip cleanly when no instance is reachable.
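
The DSN lookup order for the integration tests can be sketched as below. The two environment variable names and their precedence come from the commit message; the helper name `resolve_test_dsn` and the fallback DSN value are placeholders, because the actual compose default is not given here.

```python
import os
from typing import Mapping, Optional

# Placeholder only: the real compose-default DSN is not stated in the commit message.
COMPOSE_DEFAULT_DSN = "postgresql://postgres:postgres@localhost:5432/postgres"


def resolve_test_dsn(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty DSN in precedence order, else the compose default."""
    env = os.environ if environ is None else environ
    for var in ("VECTORPIN_TEST_PGVECTOR_URL", "PGVECTOR_URL"):
        value = env.get(var)
        if value:
            return value
    return COMPOSE_DEFAULT_DSN
```

A test fixture would then try to connect to the resolved DSN and skip the test when the connection fails, which matches the "skip cleanly" behavior described above.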

All 22 pass against pgvector/pgvector:pg16 from VectorSmuggle's
test_vector_dbs_docker/. Full repo suite: 148 pass, 1 skip (Pinecone
needs cloud creds). ruff clean.

The existing tests/test_adapter_pinecone.py::test_pinecone_live_roundtrip
requires a pre-populated index and a known record id to fetch — fine
for repeat-CI use but unfriendly for a first-time check. This script
is self-contained: it creates a fresh serverless index, seeds one
record, runs the full sign-attach-verify roundtrip via PineconeAdapter,
checks tamper rejection, and deletes the index on exit via try/finally
so a failure cannot leak resources in the operator's account.

Verified PASS against live Pinecone Serverless (AWS us-east-1, free-
tier-eligible). Cost per run: well under one cent.

Usage:
  export PINECONE_API_KEY=pcsk_xxx
  python scripts/pinecone_live_e2e.py

Optional knobs documented in the module docstring: PINECONE_INDEX_NAME,
PINECONE_NAMESPACE, PINECONE_CLOUD, PINECONE_REGION, PINECONE_READY_TIMEOUT.

Tamper-rejection assertion uses VerifyError.SOURCE_MISMATCH (enum
comparison) rather than the .value string form.

jaschadub merged commit 1cebabe into main on May 15, 2026
5 checks passed
jaschadub deleted the feature/pgvector-adapter branch on May 15, 2026 at 18:22