[SEA-NodeJS] (8/9) Cumulative M0 integration — all features merged + INTERVAL fix + 25/25 datatype parity#385

Draft
msrathore-db wants to merge 35 commits into main from msrathore/sea-integration

@msrathore-db

Summary

Cumulative M0 anchor. Contains all of PRs 1, 3–7 plus:

  • `SeaArrowIpcDurationFix` — JS-side IPC FlatBuffer rewrite for INTERVAL YEAR-MONTH and DAY-TIME (kernel emits Arrow Duration; JS converts to thrift-format `"Y-M"` and `"D HH:mm:ss.fffffffff"`)
  • 25/25 datatype parity (BOOLEAN through nested STRUCT) byte-identical vs thrift on pecotesting
  • linux-x64-gnu prebuilt native artifact

Stack position

PR 8/9. This is the cumulative M0 deliverable — reviewers wanting to see "what does M0 actually do" should focus here.

Note on git topology: this branch was developed in parallel and then merged, so its history is not a clean linear ancestor of PR 2/9 (the napi-binding consumer). The napi loader and build-script work was re-imported here as a separate concern.

Verification

  • 25/25 datatype parity tests pass on `sea-integration@4d80267` vs thrift
  • M0 YAML spec at `~/databricks-driver-test/specs/m0-end-to-end.yaml` (34 test cases) passing
  • CloudFetch parity gate clean at 23.2 MB/s

Creates the napi-rs binding skeleton: Cargo.toml + lib.rs + module
stubs for database/connection/statement/result/error/logger. Captures
napi-rs tokio Handle via OnceCell in runtime.rs. Single working
#[napi] fn version() proves the binding loads + executes end-to-end
in Node.

Depends on krn-async-public-api branch (path dep on kernel).

Round 2 will add open/execute/fetch methods.
…kend

Refactors DBSQLClient/Session/Operation to dispatch through three
backend interfaces. ThriftBackend (lib/thrift-backend/) contains the
relocated existing thrift logic. SeaBackend (lib/sea/) is a stub for
M0; the sea-napi-binding feature wires the real impl.

Public surface (lib/index.ts) unchanged.
No new dependencies. All existing tests pass.

Files:
- lib/contracts/IBackend.ts (new)
- lib/contracts/ISessionBackend.ts (new)
- lib/contracts/IOperationBackend.ts (new)
- lib/contracts/IDBSQLClient.ts (adds useSEA?: boolean to ConnectionOptions)
- lib/thrift-backend/ThriftBackend.ts (new)
- lib/thrift-backend/ThriftSessionBackend.ts (new)
- lib/thrift-backend/ThriftOperationBackend.ts (new)
- lib/sea/SeaBackend.ts (new, M0 stub)
- lib/DBSQLClient.ts (dispatch through IBackend; useSEA picks SeaBackend)
- lib/DBSQLSession.ts (facade over ISessionBackend; staging stays here)
- lib/DBSQLOperation.ts (facade over IOperationBackend; iterators/fetchAll stay here)
- tests/unit/DBSQLClient.test.ts (retarget internal state lookup through backend; pre-seed client.backend in tests that bypass connect())
- tests/unit/DBSQLOperation.test.ts (retarget internal state lookup through backend)
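
The dispatch described in this commit could look roughly like the sketch below. It is an illustration only, not the driver's code: the interface members (`openSession`, `executeStatement`, `fetchAll`), the `kind` tag, the `selectBackend` helper, and the stub classes are all assumptions made for the example, and the methods are kept synchronous for brevity. Only the three interface names and the `useSEA` option come from the commit message.

```typescript
// Hypothetical, simplified contracts. The real ones live in lib/contracts/
// and their members are not shown in this PR description.
interface IOperationBackend {
  fetchAll(): unknown[];
}
interface ISessionBackend {
  executeStatement(sql: string): IOperationBackend;
}
interface IBackend {
  readonly kind: 'thrift' | 'sea';
  openSession(): ISessionBackend;
}

interface SketchConnectionOptions {
  host: string;
  useSEA?: boolean; // the option this commit adds to ConnectionOptions
}

// Stand-in for ThriftBackend: returns one fake row per statement.
class StubThriftBackend implements IBackend {
  readonly kind = 'thrift' as const;
  openSession(): ISessionBackend {
    return {
      executeStatement: (sql) => ({ fetchAll: () => [{ backend: 'thrift', sql }] }),
    };
  }
}

// Stand-in for the M0 SeaBackend stub: selectable, but not yet usable.
class StubSeaBackend implements IBackend {
  readonly kind = 'sea' as const;
  openSession(): ISessionBackend {
    throw new Error('SEA backend is a stub in this sketch');
  }
}

// Assumed selection rule: useSEA picks the SEA backend, otherwise thrift.
function selectBackend(options: SketchConnectionOptions): IBackend {
  return options.useSEA ? new StubSeaBackend() : new StubThriftBackend();
}

// Facade: the public surface never touches a concrete backend directly.
class SketchClient {
  private backend: IBackend;
  constructor(options: SketchConnectionOptions) {
    this.backend = selectBackend(options);
  }
  backendKind(): 'thrift' | 'sea' {
    return this.backend.kind;
  }
  query(sql: string): unknown[] {
    return this.backend.openSession().executeStatement(sql).fetchAll();
  }
}
```

Because the facade only holds an `IBackend`, tests that bypass `connect()` can pre-seed a backend, which matches the test retargeting listed above.
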

Single mapping function in lib/sea/SeaErrorMapping.ts converts the
napi-binding's surfaced kernel error (code+message+sqlstate) to the
appropriate existing JS error class. M0 minimum: PAT auth errors
land as AuthenticationError; cancel/timeout as OperationStateError;
network/internal as HiveDriverError. SQLSTATE preserved on the
error object via .sqlState property.

No new error classes. M1 may add nuance.
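
A minimal sketch of such a mapping function follows. The error classes here are local stand-ins that reuse the names from the commit message, and the `KernelError` shape and code strings (`Unauthenticated`, `Cancelled`, `Timeout`) are assumptions for illustration; the kernel's real codes are not listed in this PR. Treating `OperationStateError` as a `HiveDriverError` subclass is also an assumption.

```typescript
// Local stand-ins for the driver's existing error classes.
class HiveDriverError extends Error {
  sqlState?: string;
  constructor(message: string) {
    super(message);
    this.name = 'HiveDriverError';
  }
}
class AuthenticationError extends HiveDriverError {
  constructor(message: string) {
    super(message);
    this.name = 'AuthenticationError';
  }
}
class OperationStateError extends HiveDriverError {
  constructor(message: string) {
    super(message);
    this.name = 'OperationStateError';
  }
}

// Hypothetical shape of the error surfaced by the napi binding.
interface KernelError {
  code: string;
  message: string;
  sqlState?: string;
}

// One function owns the whole code-to-class mapping.
function mapKernelError(err: KernelError): HiveDriverError {
  let mapped: HiveDriverError;
  switch (err.code) {
    case 'Unauthenticated': // assumed code for PAT auth failures
      mapped = new AuthenticationError(err.message);
      break;
    case 'Cancelled': // assumed codes for cancel and timeout
    case 'Timeout':
      mapped = new OperationStateError(err.message);
      break;
    default: // network, internal, and anything unrecognised
      mapped = new HiveDriverError(err.message);
  }
  // Preserve SQLSTATE on the error object when the kernel supplied one.
  if (err.sqlState !== undefined) {
    mapped.sqlState = err.sqlState;
  }
  return mapped;
}
```

Keeping the mapping in one place means that adding nuance later only touches the `switch`, not the call sites.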
…wired

Adds real async methods on the opaque wrappers backing M0:
- openSession (free function) with PAT → kernel Session
- Connection::execute_statement → kernel ExecutedStatement
- Statement::fetch_next_batch / schema / cancel / close → kernel ResultStream
- Arrow batches returned as IPC bytes (per Layer 2 design)
- Error mapping preserves kernel ErrorCode + SQLSTATE for TS layer
- All entry points wrapped in catch_unwind

End-to-end smoke test against pecotesting passes.
No new dependencies beyond arrow-{ipc,array,schema} + futures.
Uses kernel async public API (no block_on).

Co-authored-by: Isaac
…indings

Round 1 scaffold declared tracing + tracing-subscriber as deps but
never used them. Removed. Logger bridge will re-add in round 3.

Other findings from 6b3affd-2026-05-15.md reviewed:
- Finding 2 (Database::Drop unreachable in Round 1b) — obsoleted by
  Round 2 (40d0b57): database.rs no longer declares a Database struct
  or Drop impl; it is now an `open_session` free function.
- Finding 3 (empty Connection::Drop) — obsoleted by Round 2: the Drop
  impl now spawns a real fire-and-forget close on the captured tokio
  handle.

Co-authored-by: Isaac
…nline type

Addresses code-bloat-watchdog findings from commit 0085928:
- Restores public-API JSDoc on DBSQLSession + DBSQLOperation methods
  (was deleted as scope creep; contracts unchanged so docs still apply)
- Adds makeStubbedClient() helper to tests/unit/DBSQLClient.test.ts;
  replaces 14× duplicated ThriftBackend pre-seed
- Imports WaitUntilReadyOptions instead of inline option types in
  IOperationBackend + DBSQLOperation.waitUntilReady

SeaBackend.connect() now wires PAT options to the napi binding's
openSession(). Non-PAT modes rejected with clear M0-scope error
(OAuth/Azure/Federation land in M1). E2E test against pecotesting
confirms PAT round-trips: connect → openSession → close all clean.

No new dependencies. SeaAuth helper is ~30 LOC.

Implements IOperationBackend over the napi binding's Statement.
fetchChunk decodes Arrow IPC bytes → apache-arrow RecordBatch →
ArrowResultConverter (Phase 1+2 reused unchanged) → JS rows.

All M0 datatypes round-trip via the same converter the thrift path
uses (BOOL, INT8/16/32/64, FLOAT, DOUBLE, DECIMAL, STRING, BINARY,
DATE, TIMESTAMP, INTERVAL, ARRAY, MAP, STRUCT). Unit tests
construct synthetic IPC batches; e2e test against pecotesting
confirms byte-identical parity vs thrift.

No new dependencies. ArrowResultConverter / ResultSlicer /
OperationIterator all reused unchanged (DRY).

Wraps napi Statement.cancel/close. finished() is a no-op for M0
(kernel Statement::execute.await blocks until complete; no polling
needed). cancel mid-fetch propagates within 200ms via kernel's
async cancellation token.

Implementation:
- lib/sea/SeaOperationLifecycle.ts — standalone helpers (seaCancel,
  seaClose, seaFinished, failIfNotActive) over a structurally-typed
  SeaStatementHandle so impl-results can pick them up cleanly.
- lib/sea/SeaOperationBackend.ts — IOperationBackend impl that
  composes the lifecycle helpers; fetch* methods are stubbed and
  owned by the parallel sea-results branch.

Tests:
- 27 unit tests (lifecycle helpers + backend integration)
- 4 e2e tests against pecotesting — cancel latency 64-80ms,
  cancel-mid-fetch throws OperationStateError(Canceled), close
  idempotent, finished() resolves <50ms.

Includes a binding-side fix in native/sea/src/{statement,connection}.rs
to keep the kernel's parent Statement alive alongside the
ExecutedStatement. Without this fix, Statement::Drop invalidates
the produced ExecutedStatement via the kernel's ValidityFlag and
every cancel/close on the resulting JS Statement throws
InvalidStatementHandle. Required because the operation feature's
200ms cancel acceptance is unreachable otherwise.

SeaSessionBackend wraps the napi Connection handle. executeStatement
passes through to napi.executeStatement and returns an
IOperationBackend (SeaOperationBackend in sea-results feature).
Session config + initialCatalog/initialSchema flow to napi
openSession. M0 stops at executeStatement; metadata methods +
per-stmt overrides defer to M1.

No new dependencies. Reuses existing ConnectionOptions / Session
config shapes.

Co-authored-by: Isaac

Resolved conflict in lib/sea/SeaBackend.ts by combining:
- auth: buildSeaConnectionOptions helper from SeaAuth (validates PAT,
  prepends slash, throws AuthenticationError on missing token,
  HiveDriverError on non-PAT authType naming M1 modes)
- execution: SeaBackendOptions {context, nativeBinding} constructor,
  rethrowKernelError mapping, SeaSessionBackend wrapping with
  initialCatalog/initialSchema/sessionConfig defaults

Trimmed tests/unit/sea/auth-pat.test.ts to keep only the
buildSeaConnectionOptions direct-function tests; the
SeaBackend.connect/openSession round-trip tests are covered by
tests/unit/sea/execution.test.ts which uses the new constructor shape.
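
The validation steps attributed to `buildSeaConnectionOptions` above can be sketched as follows. This is not the SeaAuth source: the option and result shapes, the `'access-token'` value used to mean PAT, the assumption that the slash is prepended to the HTTP path, and the exact error messages are all illustrative. The error classes are local stand-ins.

```typescript
// Local stand-ins for the driver's error classes.
class SketchHiveDriverError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'HiveDriverError';
  }
}
class SketchAuthenticationError extends SketchHiveDriverError {
  constructor(message: string) {
    super(message);
    this.name = 'AuthenticationError';
  }
}

// Hypothetical input and output shapes.
interface SketchAuthOptions {
  host: string;
  path: string;
  authType?: string; // assumed: undefined or 'access-token' means PAT
  token?: string;
}
interface SketchNativeOptions {
  host: string;
  httpPath: string;
  token: string;
}

function buildConnectionOptionsSketch(options: SketchAuthOptions): SketchNativeOptions {
  const authType = options.authType ?? 'access-token';
  // Non-PAT modes are out of M0 scope; name the modes that land in M1.
  if (authType !== 'access-token') {
    throw new SketchHiveDriverError(
      `SEA M0 supports PAT only; authType "${authType}" (OAuth, Azure, Federation) lands in M1`,
    );
  }
  // A missing or blank token is an authentication failure.
  if (typeof options.token !== 'string' || options.token.trim() === '') {
    throw new SketchAuthenticationError('SEA backend requires a non-empty PAT');
  }
  // Normalise the path so it always starts with a slash.
  const httpPath = options.path.charAt(0) === '/' ? options.path : `/${options.path}`;
  return { host: options.host, httpPath, token: options.token };
}
```

Calling it with `path: 'sql/1.0/warehouses/abc'` and a token yields `httpPath: '/sql/1.0/warehouses/abc'`; an empty token throws the authentication error before anything reaches the native binding.
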

Resolved conflicts in three files:

- lib/sea/SeaBackend.ts — kept the HEAD version (sea-auth +
  sea-execution lazy-open per session). Rejected sea-results' eager
  openSession-during-connect: it shares one Connection across
  multiple SeaSessionBackends, which would mean any session.close()
  closes the shared kernel Session for all other sessions.

- lib/sea/SeaSessionBackend.ts — kept HEAD shape (SeaSessionDefaults
  bag + UUIDv4 id + closed-flag idempotency + M1 deferred-error
  messages on metadata methods). Borrowed sea-results' useCloudFetch
  → sessionConfig.use_cloud_fetch String() projection.

- lib/sea/SeaOperationBackend.ts — took sea-results' version
  wholesale (the parity-gate-passing implementation with
  fetchChunk/hasMore/getResultMetadata wired through
  SeaResultsProvider + ArrowResultConverter + ResultSlicer).
  Harmonized the local SeaStatementNative reference into the
  SeaNativeStatement type from SeaNativeLoader, and layered in
  the cancel/closed idempotency flags (flag-set-before-await) so
  cancel mid-fetch propagates without a wire round-trip after the
  first call. Pulled in sea-results' SeaArrowIpc.ts +
  SeaResultsProvider.ts files unchanged.

Resolved conflict in lib/sea/SeaOperationBackend.ts by combining:
- sea-results' fetch pipeline (fetchChunk/hasMore/getResultMetadata
  via SeaResultsProvider + ArrowResultConverter + ResultSlicer)
- sea-operation's lifecycle helpers (cancel/close/waitUntilReady
  delegate to seaCancel/seaClose/seaFinished in
  SeaOperationLifecycle.ts; cancel-mid-fetch propagation via
  failIfNotActive at the top of every fetch path)

Type harmonisation: `statement` parameter typed as
SeaOperationStatement = SeaStatementHandle & Partial<SeaNativeStatement>
so that:
- lifecycle-only test stubs (which implement only cancel/close)
  compile without compromise
- the real napi Statement (which implements both surfaces) flows
  through unchanged
- fetch methods check `statement.fetchNextBatch`/`schema` is
  defined and throw a descriptive error if a lifecycle-only stub
  reaches the fetch path
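
The intersection type and the runtime check can be illustrated like this. The member signatures on both interfaces and the `requireFetchSurface` helper are assumptions made for the sketch; only the type names and the `SeaStatementHandle & Partial<SeaNativeStatement>` shape come from the commit message.

```typescript
// Lifecycle surface: what lifecycle-only test stubs implement (assumed signatures).
interface SeaStatementHandle {
  cancel(): Promise<void>;
  close(): Promise<void>;
}

// Full native surface (assumed signatures; batches travel as IPC bytes).
interface SeaNativeStatement {
  fetchNextBatch(): Promise<Uint8Array | null>;
  schema(): Promise<Uint8Array>;
  cancel(): Promise<void>;
  close(): Promise<void>;
}

// Lifecycle methods are required, fetch methods are optional.
type SeaOperationStatement = SeaStatementHandle & Partial<SeaNativeStatement>;

// The same statement, narrowed so the fetch methods are known to exist.
type FetchCapableStatement = SeaOperationStatement &
  Required<Pick<SeaNativeStatement, 'fetchNextBatch' | 'schema'>>;

// Hypothetical guard run at the top of each fetch path.
function requireFetchSurface(statement: SeaOperationStatement): FetchCapableStatement {
  if (typeof statement.fetchNextBatch !== 'function' || typeof statement.schema !== 'function') {
    throw new Error(
      'Statement does not implement fetchNextBatch/schema; a lifecycle-only stub reached the fetch path',
    );
  }
  return statement as FetchCapableStatement;
}
```

A stub with only `cancel` and `close` type-checks as a `SeaOperationStatement` but fails the guard at runtime, while the real statement passes through unchanged.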

Picked up clean:
- lib/sea/SeaOperationLifecycle.ts (cancel/close/finished helpers
  with idempotency, flag-set-before-await ordering, kernel-error
  mapping, OperationStateError on fetch-after-cancel)
- native/sea/src/statement.rs (StatementInner { _parent, executed }
  fix — keeps parent kernel Statement alive past cancel so the
  ValidityFlag stays set and cancel/close don't return
  InvalidStatementHandle). REQUIRED for cancel <200ms e2e.
- native/sea/src/connection.rs (pass `stmt` + `executed` into
  Statement::from_executed)
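
The flag-set-before-await ordering mentioned for the lifecycle helpers can be sketched as below. The class and its members are hypothetical and much simpler than SeaOperationLifecycle.ts; the point is only that the flag flips before the first `await`, so a second call returns without another native round-trip and fetch paths can fail fast.

```typescript
// Assumed minimal handle: just the two lifecycle calls.
interface SketchStatementHandle {
  cancel(): Promise<void>;
  close(): Promise<void>;
}

// Local stand-in for the driver's OperationStateError.
class SketchOperationStateError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'OperationStateError';
  }
}

class OperationLifecycleSketch {
  private statement: SketchStatementHandle;
  private cancelled = false;
  private closed = false;

  constructor(statement: SketchStatementHandle) {
    this.statement = statement;
  }

  async cancel(): Promise<void> {
    if (this.cancelled || this.closed) return; // idempotent
    this.cancelled = true; // set BEFORE awaiting, so concurrent callers see it
    await this.statement.cancel();
  }

  async close(): Promise<void> {
    if (this.closed) return; // idempotent
    this.closed = true; // same ordering as cancel
    await this.statement.close();
  }

  // Called at the top of every fetch path.
  failIfNotActive(): void {
    if (this.cancelled) throw new SketchOperationStateError('Operation was cancelled');
    if (this.closed) throw new SketchOperationStateError('Operation was closed');
  }
}
```

If the flag were set after the `await`, two overlapping `cancel()` calls would both reach the native handle, and a fetch issued during the cancel round-trip would not see the cancelled state.
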

Two assertions in tests/unit/sea/execution.test.ts were specific to
the pre-merge SeaBackend / SeaOperationBackend stubs:

1. connect() missing-token rejection now flows through
   SeaAuth.buildSeaConnectionOptions which throws AuthenticationError
   (still a HiveDriverError subclass) with message "non-empty PAT".
   Updated the regex match accordingly.

2. fetchChunk() is no longer a stub — the merged
   SeaOperationBackend uses the sea-results pipeline
   (SeaResultsProvider + ArrowResultConverter + ResultSlicer). The
   "throws M1-deferred error owned by sea-results" test is now
   incorrect by design; removed it with a pointer comment to the
   real coverage in SeaOperationBackend.test.ts and
   results-e2e.test.ts.

891/891 unit tests passing post-merge.

- YEAR-MONTH: convert Arrow Interval[YearMonth] to thrift "N-M" string
  format (with leading "-" for negatives) in Phase 1 of converter
- DAY-TIME: pre-process IPC schema bytes before apache-arrow@13 decode
  (which predates the Arrow Duration type id 18) to remap Duration ->
  Int64 with original time unit preserved in `databricks.arrow.duration_unit`
  field metadata; convert Int64 duration values to thrift
  "D HH:mm:ss.fffffffff" string format

Both interval flavours are formatted by the same converter helper
(formatDayTimeFromTotal); the duration_unit metadata gates between the
native Arrow Interval Int32Array path and the rewritten Duration Int64
path. No apache-arrow bump, no node_modules edits, no kernel-side change.

New: lib/sea/SeaArrowIpcDurationFix.ts (FlatBuffer rewriter using
apache-arrow's internal fb/* accessors).

M0 datatype parity now 25/25.
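
The two string formats can be illustrated with a small sketch. This is not `formatDayTimeFromTotal`: the function names, the unit names, and the handling of negative DAY-TIME values (a leading "-", mirroring YEAR-MONTH) are assumptions. It also takes plain `number` inputs for brevity, which only holds for values inside the safe-integer range; real Int64 durations would need BigInt.

```typescript
// Assumed unit names for the duration_unit metadata.
type DurationUnitSketch = 'SECOND' | 'MILLISECOND' | 'MICROSECOND' | 'NANOSECOND';

const UNITS_PER_SECOND: Record<DurationUnitSketch, number> = {
  SECOND: 1,
  MILLISECOND: 1e3,
  MICROSECOND: 1e6,
  NANOSECOND: 1e9,
};

// Left-pad a non-negative integer with zeros to a fixed width.
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = '0' + text;
  return text;
}

// Total months -> "Y-M", with a leading "-" for negative intervals.
function formatYearMonthSketch(totalMonths: number): string {
  const sign = totalMonths < 0 ? '-' : '';
  const abs = Math.abs(totalMonths);
  return `${sign}${Math.floor(abs / 12)}-${abs % 12}`;
}

// Duration in the given unit -> "D HH:mm:ss.fffffffff".
function formatDayTimeSketch(total: number, unit: DurationUnitSketch): string {
  const sign = total < 0 ? '-' : ''; // assumption: same sign rule as YEAR-MONTH
  const abs = Math.abs(total);
  const perSecond = UNITS_PER_SECOND[unit];
  const totalSeconds = Math.floor(abs / perSecond);
  // Scale the sub-second remainder up to nanoseconds (always 9 digits).
  const fractionNanos = (abs % perSecond) * (1e9 / perSecond);
  const days = Math.floor(totalSeconds / 86400);
  const hours = Math.floor(totalSeconds / 3600) % 24;
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const seconds = totalSeconds % 60;
  return (
    `${sign}${days} ${zeroPad(hours, 2)}:${zeroPad(minutes, 2)}:` +
    `${zeroPad(seconds, 2)}.${zeroPad(fractionNanos, 9)}`
  );
}
```

For example, `formatYearMonthSketch(14)` gives `"1-2"`, and `formatDayTimeSketch(90061500, 'MILLISECOND')` gives `"1 01:01:01.500000000"`. Carrying the unit alongside the Int64 value is what lets a single formatter serve every Duration unit.
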

Per D-006 architectural decision (Python team's workspace pattern):
all language bindings (PyO3, napi-rs) now live as workspace siblings
in the kernel repo at databricks-sql-kernel/{pyo3,napi}/.

What this commit removes from the nodejs repo:
- native/sea/Cargo.toml (path dep relocated; package now at
  databricks-sql-kernel/napi/Cargo.toml with path = "..")
- native/sea/build.rs
- native/sea/src/* (lib, runtime, database, connection, statement,
  result, error, logger, util — all 9 files)
- native/sea/package.json (the @databricks/sea-native-linux-x64-gnu
  sub-package moves to the kernel workspace too)
- native/sea/index.js (regenerated artifact)

What stays in nodejs:
- native/sea/index.d.ts — TS declarations consumed by lib/sea/ adapter
- native/sea/README.md (new) — explains the move; points readers at
  databricks-sql-kernel/napi/

What's updated:
- package.json: `build:native` and `build:native:debug` scripts now
  delegate to the kernel workspace via $DATABRICKS_SQL_KERNEL_REPO
  (defaults to ../../databricks-sql-kernel-sea-WT/napi-binding for the
  local dev worktree layout). Build copies index.node + index.d.ts
  back into native/sea/ for the loader to find.

Why workspace co-location:
- Arrow version pinning lockstep — no silent IPC version drift
- path = ".." (clean) vs ../../../../databricks-sql-kernel-sea-WT/...
- Single CI: cargo build --workspace covers kernel + pyo3 + napi
- Kernel API changes that break either binding caught at PR-review time
- Future cgo binding for Go SEA slots in as another workspace member

This branch (sea-napi-binding) is now a thin consumer of the kernel
napi crate. The actual Rust code lives at krn-napi-binding HEAD on
the kernel repo (commit debe3d7).

Resolves the modify/delete conflict from merging sea-napi-binding:
sea-integration had the StatementInner ValidityFlag fix in
native/sea/src/statement.rs (inherited via sea-operation merge), while
sea-napi-binding wants those files deleted because the Rust source
now lives in the kernel workspace at databricks-sql-kernel/napi/.

Resolution: propagate the StatementInner fix to the kernel napi crate
(commit a8d8df7 on krn-napi-binding), then accept the deletion here.
Built artifacts (.node, .d.ts) are repopulated by
`npm run build:native` which delegates to the kernel workspace.

native/sea/ now contains only README.md and the generated index.d.ts;
index.linux-x64-gnu.node sits next to it as a build artifact.

@github-actions

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).
