ci: add OCI smoke gate workflow #2156
Draft
dliappis wants to merge 1 commit into
Builds the AMI as an OCI image via supabox's support/ami/Dockerfile,
brings up the supabox platform stack, and runs dctest's supadev-smoke
spec as a fast pre-flight check before any EC2/testinfra work.
Triggers on pull_request paths that affect the AMI build
(ansible/, nix/, migrations/, flake.{nix,lock}, Dockerfile-*), plus
workflow_dispatch and merge_group.
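
For reviewers who want to see the trigger set at a glance, it might look roughly like the fragment below. This is a sketch reconstructed from the prose, not a copy of the workflow file: the exact glob spellings (for example `ansible/**`) are assumptions, `flake.{nix,lock}` is written out as two entries, and the self-referencing workflow path is taken from the PR description rather than from this commit message.

```yaml
# Sketch only: glob spellings are assumed, not copied from the PR.
on:
  pull_request:
    paths:
      - "ansible/**"
      - "nix/**"
      - "migrations/**"
      - "flake.nix"
      - "flake.lock"
      - "Dockerfile-*"
      # Listed in the PR description as "or itself".
      - ".github/workflows/oci-smoke-gate.yml"
  workflow_dispatch:
  merge_group:
```

Note that `paths` only filters `pull_request` events here; `workflow_dispatch` and `merge_group` runs are not path-filtered.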
Flow:
1. Checkout postgres at PR commit.
2. Checkout supabox at pinned SHA (env.SUPABOX_REF).
3. Substitute PR's postgres into supabox/repos/postgres.
4. Install Nix + add the postgres binary cache substituter so
stage 1 of the AMI image is mostly a cache pull.
5. ./supabox init systemd,pg17 (generates env + certs, npm install).
6. docker compose build supabase-postgres-17 (AMI-as-OCI).
7. docker compose up -d --wait --wait-timeout 300.
8. ./dctest test/supadev-smoke.yaml --results-file ... --results-verbose.
9. Always capture docker state; on failure dump last 500 lines per
container log.
10. Upload supabox/diagnostics/ as a 14-day artifact.
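
Steps 6, 7, 9 and 10 could be sketched as the job fragment below. It is illustrative only and makes several assumptions that the source does not confirm: supabox is checked out into `supabox/`, the diagnostics file names and the artifact name are hypothetical, and `actions/upload-artifact@v4` is used for the upload. Step 8 is left as a comment because its `--results-file` path is elided in the source.

```yaml
# Sketch only: file names, artifact name and action versions are assumptions.
steps:
  - name: Build AMI-as-OCI image
    working-directory: supabox
    run: docker compose build supabase-postgres-17

  - name: Bring up supabox platform stack
    working-directory: supabox
    run: docker compose up -d --wait --wait-timeout 300

  # Step 8 (./dctest test/supadev-smoke.yaml ...) would run here.

  - name: Capture docker state
    if: always()   # runs whether earlier steps passed or failed
    working-directory: supabox
    run: |
      mkdir -p diagnostics
      docker ps -a > diagnostics/docker-ps.txt 2>&1 || true

  - name: Dump container logs on failure
    if: failure()
    working-directory: supabox
    run: |
      mkdir -p diagnostics/logs
      for c in $(docker ps -aq); do
        # docker inspect prints the name with a leading slash; strip it.
        name=$(docker inspect --format '{{.Name}}' "$c" | tr -d '/')
        docker logs --tail 500 "$c" > "diagnostics/logs/${name}.log" 2>&1 || true
      done

  - name: Upload diagnostics
    if: always()
    uses: actions/upload-artifact@v4
    with:
      name: oci-smoke-diagnostics   # hypothetical artifact name
      path: supabox/diagnostics/
      retention-days: 14
```

The `|| true` guards are a design choice in this sketch so that a failing diagnostics command does not mask the original test failure.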
Conventions followed:
- Runner blacksmith-2vcpu-ubuntu-2404 (matches testinfra-ami-build.yml).
- supabase/postgres/.github/actions/shared-checkout@HEAD for postgres
checkout.
- ./postgres/.github/actions/nix-install-ephemeral for Nix.
- Concurrency group includes pull_request.number || github.ref.
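
The concurrency convention above might be expressed along these lines. The group prefix and the `cancel-in-progress` setting are assumptions for illustration; the source only states that the group includes `pull_request.number || github.ref`.

```yaml
# Sketch only: the prefix and cancel-in-progress are not stated in the source.
concurrency:
  # pull_request.number is set on PR events; github.ref is the fallback
  # for workflow_dispatch and merge_group runs.
  group: oci-smoke-gate-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```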
Deliberate first-iteration omissions:
- Not gating testinfra-ami-build.yml yet — that wiring is a follow-up
once this proves stable.
- pause-restore.yaml coverage is a follow-up (blocked on the upstream
supabox YAML parse fix and on this gate stabilising).
- No matrix over PG 15 / 17 / 17-orioledb — starting with pg17.
SUPABOX_REF is SHA-pinned (not a tracking branch) so a sibling-team
change can't silently break postgres CI. Bump deliberately.
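
As a rough illustration of the pinning, the supabox checkout could read its ref from a workflow-level variable, as sketched below. The SHA is deliberately left as a placeholder, and the use of `actions/checkout@v4` and the `supabox` checkout path are assumptions; the PR may check out supabox differently (for instance with extra credentials if the repository is private).

```yaml
# Sketch only: placeholder SHA; checkout action and path are assumptions.
env:
  SUPABOX_REF: "<full-commit-sha>"   # bump deliberately, never a branch name

jobs:
  oci-smoke:
    runs-on: blacksmith-2vcpu-ubuntu-2404
    steps:
      - name: Checkout supabox at pinned SHA
        uses: actions/checkout@v4
        with:
          repository: supabase/supabox
          ref: ${{ env.SUPABOX_REF }}
          path: supabox
```

Pinning to a commit SHA rather than a branch means a supabox update only reaches this workflow through an explicit change to `SUPABOX_REF`, which shows up in review.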
Local-trial evidence: validated end-to-end on macOS Docker against
supabox a0fe25c on 2026-05-15 with 59/59 supadev-smoke tests passing
in ~5.5 min after init. CI-side wall-clock expected ~15-25 min cold
cache, less on warm.
Tracks RELENG-31.
What kind of change does this PR introduce?
CI — adds a new GH Actions workflow. Not wired as a required check; just runs and reports.
What is the current behavior?
There's no fast pre-EC2 validation for PRs touching the AMI build. testinfra-ami-build.yml is the only path, and it does the full Packer + EC2 round-trip (~20-40 min) before any service-level behavior is exercised.
What is the new behavior?
A new workflow, .github/workflows/oci-smoke-gate.yml, that runs on PRs touching ansible/, nix/, migrations/, flake.{nix,lock}, Dockerfile-*, or itself (plus workflow_dispatch and merge_group).
It builds the AMI as an OCI image via supabase/supabox's support/ami/Dockerfile, brings the supabox platform stack up, and runs dctest test/supadev-smoke.yaml. Diagnostics + container logs (on failure) are uploaded as a 14-day artifact.
This PR does not make any other workflow depend on it. It runs alongside testinfra-ami-build.yml and reports its own status. Promoting it to a required check is a follow-up once we see how reliable it is across real PRs.
Additional context
- Validated end-to-end locally against supabox a0fe25c on 2026-05-15: 59/59 supadev-smoke tests pass, ~5.5 min of dctest after init. CI wall-clock expected ~15-25 min cold cache.
- SUPABOX_REF is SHA-pinned, not tracking main, so sibling-team changes can't silently break this workflow. Bump deliberately.
- pause-restore.yaml coverage is intentionally deferred: upstream supabox needs a YAML parse fix first, and the spec is slower.
Relates RELENG-31.