Processor durability and processor parallelism #144

Open
novatechflow wants to merge 2 commits into KafScale:main from novatechflow:processor-durability

@novatechflow
Collaborator

Summary

This implements #113 by removing an unnecessary .index download from the processor decode path instead of parallelizing two S3 reads where only one was actually used.
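A minimal sketch of the idea, not the code in this PR: if the decoder only consumes the .kfs bytes, a single read per segment is enough, and a regression test can assert that no .index key is ever requested. `ObjectStore`, `FetchSegment`, and `countingStore` are hypothetical names chosen for illustration, and the real decoders would normally also take a `context.Context`.

```go
package main

import (
	"fmt"
	"strings"
)

// ObjectStore is a hypothetical, minimal read interface over S3-style storage.
type ObjectStore interface {
	Get(key string) ([]byte, error)
}

// FetchSegment issues exactly one read, for the .kfs object.
// It never requests the sibling .index object.
func FetchSegment(store ObjectStore, kfsKey string) ([]byte, error) {
	if !strings.HasSuffix(kfsKey, ".kfs") {
		return nil, fmt.Errorf("not a segment key: %q", kfsKey)
	}
	data, err := store.Get(kfsKey)
	if err != nil {
		return nil, fmt.Errorf("get %s: %w", kfsKey, err)
	}
	return data, nil
}

// countingStore is an in-memory fake that records every requested key,
// so a test can check which objects the decode path touched.
type countingStore struct {
	objects map[string][]byte
	gets    []string
}

func (s *countingStore) Get(key string) ([]byte, error) {
	s.gets = append(s.gets, key)
	data, ok := s.objects[key]
	if !ok {
		return nil, fmt.Errorf("no such key: %s", key)
	}
	return data, nil
}
```

With a fake like `countingStore`, the regression check reduces to asserting that the recorded keys contain the .kfs key and nothing else.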

It also fixes a PITR bug: restore previously recovered only whole segments. While tracing the decode path, I tightened PITR/restore behavior in storage so that the final recovered segment is truncated at the requested cutoff and a matching .index is regenerated.
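The truncate-then-reindex step can be sketched roughly as follows. This is an illustration under stated assumptions, not the storage code in this PR: it assumes the cutoff is a timestamp in Unix milliseconds, that truncation happens on batch boundaries, and that the index holds one entry per kept batch. `Batch`, `IndexEntry`, and `TruncateAtCutoff` are hypothetical names, and the real .kfs/.index layouts may differ.

```go
package main

import "errors"

// Batch is a hypothetical stand-in for one record batch in the final segment.
type Batch struct {
	BaseOffset   int64 // first offset in the batch
	MaxTimestamp int64 // latest record timestamp in the batch, Unix millis
	Size         int   // encoded size in bytes
}

// IndexEntry maps a batch's base offset to its byte position in the segment.
type IndexEntry struct {
	Offset   int64
	Position int64
}

// TruncateAtCutoff keeps the leading batches whose MaxTimestamp is at or
// before cutoffMs and stops at the first batch past the cutoff. The index
// is rebuilt only from the kept batches, so it cannot point past the new
// end of the truncated segment.
func TruncateAtCutoff(batches []Batch, cutoffMs int64) ([]Batch, []IndexEntry, error) {
	kept := make([]Batch, 0, len(batches))
	index := make([]IndexEntry, 0, len(batches))
	var pos int64 // running byte position within the truncated segment
	for _, b := range batches {
		if b.Size <= 0 {
			return nil, nil, errors.New("batch has non-positive size")
		}
		if b.MaxTimestamp > cutoffMs {
			break
		}
		kept = append(kept, b)
		index = append(index, IndexEntry{Offset: b.BaseOffset, Position: pos})
		pos += int64(b.Size)
	}
	return kept, index, nil
}
```

Rebuilding the index from the kept batches, rather than trimming the old one, keeps the two files consistent by construction.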

Changes

  • stop downloading .index in the SQL and Iceberg processor decoders
  • keep .kfs/.index pairing validation in discovery
  • add regression coverage for the decoder behavior
  • add exact segment truncation for PITR/restore
  • rebuild the final restored .index after truncation
  • add recovery and CLI regression coverage with real batch fixtures
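To illustrate why pairing validation can stay in discovery even though the decoders no longer read .index, here is a rough sketch that checks a key listing for complete .kfs/.index pairs. `PairedSegments` is a hypothetical helper, and treating an unpaired segment as an error (rather than skipping it) is an assumption, not something this PR specifies.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PairedSegments returns the sorted base names that have both a .kfs and
// an .index object in keys. Any base name with only one of the two is
// reported as an error. Keys with other suffixes are ignored.
func PairedSegments(keys []string) ([]string, error) {
	const (
		hasKfs = 1 << iota
		hasIndex
	)
	seen := make(map[string]int)
	for _, k := range keys {
		switch {
		case strings.HasSuffix(k, ".kfs"):
			seen[strings.TrimSuffix(k, ".kfs")] |= hasKfs
		case strings.HasSuffix(k, ".index"):
			seen[strings.TrimSuffix(k, ".index")] |= hasIndex
		}
	}
	bases := make([]string, 0, len(seen))
	for base, flags := range seen {
		if flags != hasKfs|hasIndex {
			return nil, fmt.Errorf("segment %q is missing its .kfs or .index object", base)
		}
		bases = append(bases, base)
	}
	sort.Strings(bases)
	return bases, nil
}
```

Validation of this kind only needs the object listing, so it adds no extra downloads to the decode path.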

Testing

  • go test ./pkg/storage ./cmd/kafscale-cli -count=1 -timeout=120s
  • go test ./... -count=1 -timeout=120s in addons/processors/sql-processor
  • go test ./... -count=1 -timeout=120s in addons/processors/iceberg-processor

Checklist

  • Added/updated unit tests for new logic
  • Added/updated e2e coverage for bug fixes
  • Added license headers to new files

@novatechflow novatechflow force-pushed the processor-durability branch from 52803fe to 9e49edc Compare May 18, 2026 06:23
@novatechflow novatechflow self-assigned this May 18, 2026
@novatechflow novatechflow requested a review from klaudworks May 18, 2026 06:26
@novatechflow novatechflow linked an issue May 18, 2026 that may be closed by this pull request
@novatechflow novatechflow requested a review from kamir May 18, 2026 08:00
Development

Successfully merging this pull request may close these issues.

perf: parallelize S3 .index + .kfs downloads in processors
