Feature/pgvector adapter #8
Merged
VectorPin can now pin records in a pgvector-equipped Postgres table. This is the highest-leverage adapter to add: pgvector is the de-facto choice for teams that already operate Postgres and want to bolt embedding search onto an existing OLTP database, and a vector row is structurally indistinguishable from any other row to surrounding RBAC, backup, replication, and CDC machinery, meaning VectorPin's signed provenance is the only out-of-band integrity check available.

src/vectorpin/adapters/pgvector.py (new)
- PgVectorAdapter with the same shape as QdrantAdapter / LanceDBAdapter: iter_records, get, attach_pin, plus a classmethod .connect(dsn, ...).
- iter_records uses a plain client cursor + fetchmany(batch_size) to bound memory without requiring an explicit transaction (autocommit mode forbids server-side DECLARE CURSOR).
- TLS guard mirroring QdrantAdapter._enforce_tls: rejects non-loopback postgres DSNs without sslmode=require (or stronger), unless VECTORPIN_ALLOW_INSECURE_HTTP=1 is set. Postgres credentials live inside the DSN, so plaintext to a remote host leaks them.
- Identifier validation on table_name / id_column / vector_column / pin_column: ^[A-Za-z_][A-Za-z0-9_]*$. Postgres has no parameterized form for identifiers; this is the only line of defense against '--table foo; DROP ...' shaped inputs.
- Pin column accepts JSONB (decoded to dict, parsed via Pin.from_dict) or TEXT (str, parsed via Pin.from_json). Both routes go through the strict v2 schema validation already on main.

src/vectorpin/adapters/__init__.py
- Registers PgVectorAdapter in the lazy-import map and __all__.

src/vectorpin/cli.py
- New audit-pgvector subcommand mirroring audit-lancedb/audit-chroma shape: --dsn, --table, --public-key, --key-id, --id-column (default id), --vector-column (default embedding), --pin-column (default vectorpin), --batch-size.

pyproject.toml
- New optional extra: pgvector = ['psycopg[binary]>=3.1', 'pgvector>=0.3'].
- Added to the 'all' extra.
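
The identifier validation described for PgVectorAdapter can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: only the regular expression comes from the description above, while the helper names validate_identifier and build_select and the choice of ValueError are assumptions made for the example.

```python
import re

# Pattern quoted in the PR description for table and column names.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_identifier(name: str, what: str = "identifier") -> str:
    """Return name unchanged if it is a plain SQL identifier, else raise.

    Hypothetical helper: the real function name and exception type are
    not shown in the PR description.
    """
    # fullmatch (rather than match) also rejects a trailing newline,
    # which "$" alone would tolerate.
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"invalid {what}: {name!r}")
    return name


def build_select(table: str, id_column: str, vector_column: str, pin_column: str) -> str:
    """Interpolate only validated identifiers into the statement text.

    Values (ids, pins) would still be passed as bound parameters; only
    identifiers need this treatment because they cannot be parameterized.
    """
    cols = ", ".join(
        validate_identifier(c, "column")
        for c in (id_column, vector_column, pin_column)
    )
    return f"SELECT {cols} FROM {validate_identifier(table, 'table')}"
```

With the CLI defaults listed above, build_select("items", "id", "embedding", "vectorpin") yields a plain SELECT over three columns (the table name "items" is made up for the example), while a table name such as "foo; DROP TABLE x" is rejected before any SQL text is assembled.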

tests/test_adapter_pgvector.py (new, 22 tests)
- 14 offline (no DB): TLS guard accepts loopback / sslmode=require, rejects remote plaintext, env-var escape hatch, keyword-form DSN pass-through; identifier validator accepts/rejects parametrized hostile inputs.
- 8 live integration: iter_records, attach_pin + get roundtrip, full sign-attach-verify roundtrip under the v2 Verifier, KeyError on unknown id (get + attach_pin), loopback DSN doesn't trip TLS, bad table/column names rejected at connect.
- Integration tests auto-discover the compose service via VECTORPIN_TEST_PGVECTOR_URL > PGVECTOR_URL > the compose-default DSN, and skip cleanly when no instance is reachable.

All 22 pass against pgvector/pgvector:pg16 from VectorSmuggle's test_vector_dbs_docker/. Full repo suite: 148 pass, 1 skip (Pinecone needs cloud creds). ruff clean.
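
The TLS guard behaviour exercised by the offline tests (loopback accepted, sslmode=require accepted, remote plaintext rejected, env-var escape hatch, keyword-form pass-through) might look something like the sketch below. It is an assumption-laden illustration rather than the adapter's implementation: the function name enforce_tls, the loopback host list, the handling of an empty host, and the ValueError are all choices made for this example. Only the environment variable name and the sslmode=require threshold come from the description above.

```python
import os
from urllib.parse import parse_qs, urlsplit

# Assumed set of hosts treated as loopback in this sketch.
_LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}
# libpq sslmode values that are "require or stronger".
_SECURE_SSLMODES = {"require", "verify-ca", "verify-full"}


def enforce_tls(dsn: str) -> None:
    """Raise ValueError for a remote URI-form DSN that does not demand TLS."""
    # Escape hatch named in the PR description.
    if os.environ.get("VECTORPIN_ALLOW_INSECURE_HTTP") == "1":
        return
    # Keyword-form DSNs ("host=... dbname=...") pass through unchecked,
    # matching the "keyword-form DSN pass-through" test case.
    if not dsn.startswith(("postgres://", "postgresql://")):
        return
    parts = urlsplit(dsn)
    host = parts.hostname or ""
    # Assumption: an empty host means a local default connection.
    if host == "" or host in _LOOPBACK_HOSTS:
        return
    # If sslmode is repeated, this sketch takes the last occurrence.
    sslmode = parse_qs(parts.query).get("sslmode", [""])[-1]
    if sslmode not in _SECURE_SSLMODES:
        raise ValueError(
            f"refusing plaintext connection to {host!r}: add sslmode=require "
            "or set VECTORPIN_ALLOW_INSECURE_HTTP=1"
        )
```

The error message deliberately names only the host, never the full DSN, because the DSN carries the credentials the guard is meant to protect.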
The existing tests/test_adapter_pinecone.py::test_pinecone_live_roundtrip requires a pre-populated index and a known record id to fetch, which is fine for repeat-CI use but unfriendly for a first-time check. This script (scripts/pinecone_live_e2e.py) is self-contained: it creates a fresh serverless index, seeds one record, runs the full sign-attach-verify roundtrip via PineconeAdapter, checks tamper rejection, and deletes the index on exit via try/finally so a failure cannot leak resources in the operator's account.

Verified PASS against live Pinecone Serverless (AWS us-east-1, free-tier-eligible). Cost per run: well under one cent.

Usage:
    export PINECONE_API_KEY=pcsk_xxx
    python scripts/pinecone_live_e2e.py

Optional knobs documented in the module docstring: PINECONE_INDEX_NAME, PINECONE_NAMESPACE, PINECONE_CLOUD, PINECONE_REGION, PINECONE_READY_TIMEOUT.

Tamper-rejection assertion uses VerifyError.SOURCE_MISMATCH (enum comparison) rather than the .value string form.
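
The create-then-always-delete pattern described for the script can be illustrated without touching a real account. In the sketch below, FakeIndexClient is a hypothetical in-memory stand-in and is not the Pinecone SDK; run_e2e and its check callback are likewise names invented for this example.

```python
class FakeIndexClient:
    """Hypothetical in-memory stand-in for a cloud index API."""

    def __init__(self):
        self.indexes = set()  # names of indexes that currently exist

    def create_index(self, name):
        self.indexes.add(name)

    def delete_index(self, name):
        self.indexes.discard(name)


def run_e2e(client, index_name, check):
    """Create an index, run the check, and delete the index no matter what."""
    client.create_index(index_name)
    try:
        # In the real script this step would be seed + sign + attach + verify.
        check(index_name)
    finally:
        # Runs on success, on assertion failure, and on any other exception,
        # so a failed run does not leave a billable resource behind.
        client.delete_index(index_name)
```

The exception from a failing check still propagates to the caller, so the run is reported as failed, but only after the index has been removed.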