Skip to content

frontend: SPA UX polish — TargetSelector, audit fix, CSRF priming#833

Open
alvarofraguas wants to merge 38 commits into
jmpsec:mainfrom
alvarofraguas:pr/spa-ux-polish
Open

frontend: SPA UX polish — TargetSelector, audit fix, CSRF priming#833
alvarofraguas wants to merge 38 commits into
jmpsec:mainfrom
alvarofraguas:pr/spa-ux-polish

Conversation

@alvarofraguas
Copy link
Copy Markdown
Collaborator

Summary

  • TargetSelector typeahead for query/carve target selection (search by UUID prefix, hostname, platform, tag)
  • Carves/new page two-column layout matching queries/new
  • Node detail page cleanup — consistent tab ordering and field display
  • Audit page self-pollution fix: reading /audit-logs no longer writes audit entries (was causing 0.7 writes/sec idle, refetch every ~1.5s → now 10s)
  • Audit query key memoized to prevent unnecessary refetches
  • primeCsrfFromCookie() at app boot for OIDC redirect flows where the CSRF cookie arrives before the SPA JS initializes
  • Environment creation: default icon + type, split error messages for better UX

Depends on

Test plan

  • TargetSelector filters nodes by UUID prefix, hostname, platform
  • Carve creation form layout renders correctly
  • Audit page no longer creates audit entries on read (verified via DB query count)
  • OIDC redirect login → SPA loads correctly with CSRF token primed
  • Environment creation with minimal fields works (name + hostname only)
  • Frontend builds clean (npm run typecheck && npm run build)

Scaffolds the protocol-agnostic federated-identity surface for
osctrl-api. No concrete provider yet — that lands on the next branch
(auth/01-oidc-provider).

== types.go ==

Defines:
- Provider interface: Type / LoginURL / HandleCallback. Context-aware
  on every method so a slow IdP cannot wedge a handler.
- ResolvedIdentity: protocol-neutral output of HandleCallback. Subject
  is the stable identifier; PreferredUsername is the display/match
  field; Email is informational (mutable, spoofable — threat T24);
  Groups carries IdP group memberships for the RequiredGroups gate.
- State: per-login data the caller must round-trip from LoginURL to
  HandleCallback. Contains EnvUUID (locks to one env — threat T18),
  Nonce (256-bit cryptorandom), and PKCE Verifier.
- Type constants: TypeOIDC ("oidc") + reserved TypeSAML ("saml").

== state.go ==

State-as-JWT cookie implementation. HS256-signed with the api's
existing JWT secret; audience-scoped to "osctrl-auth-state" so a
state JWT cannot be used as a user-auth JWT and vice versa
(threat T19). Cookie attributes:

- Name: osctrl_auth_state
- Path: /api/v1/auth/   (scoped so the cookie doesn't leak to
                         other endpoints)
- HttpOnly + Secure + SameSite=Lax
- MaxAge: 10 minutes

Verification enforces in order: cookie present, HS256 pinned
(WithValidMethods), signature valid, iss == osctrl-api, aud ==
osctrl-auth-state, exp/nbf valid, EnvUUID + Nonce non-empty.
Single ErrStateInvalid for all failure modes — external observers
cannot distinguish "tampered" from "expired" from "wrong audience"
(threat T31).

NewNonce returns 256-bit cryptorandom base64url-encoded values.
ClearStateCookie deletes the cookie immediately after a successful
parse so a callback URL cannot be replayed (threat T9).

== state_test.go ==

15 tests covering each threat vector in scope for this commit:

  TestNonceUnique               — sanity on cryptorandom (T7)
  TestStateRoundTripHappy       — canonical good path
  TestStateMissing              — no cookie → ErrStateMissing (T6)
  TestStateEmptyValue           — defensive: empty cookie value
  TestStateSignatureTampered    — flipped sig byte → ErrStateInvalid (T31)
  TestStateExpired              — past-exp token rejected (T9)
  TestStateWrongAudience        — user-auth aud rejected (T19)
  TestStateWrongIssuer          — wrong iss rejected (defense in depth)
  TestStateWrongSecret          — attacker-signed rejected (T27)
  TestStateAlgNoneRejected      — alg=none rejected (alg-confusion)
  TestIssueRejectsEmptyEnv      — empty EnvUUID refused (T18 groundwork)
  TestIssueRejectsEmptyNonce    — empty Nonce refused
  TestCookieAttributes          — HttpOnly+Secure+SameSite+Path (T11/T12)
  TestClearStateCookie          — path matches for browser-side delete
  TestStateIsZero               — IsZero helper

All 15 pass. `go vet ./pkg/auth/...` clean. `go build ./...` clean.
Lifts the OIDC protocol code from cmd/admin/oidc.go into a reusable
pkg/auth/oidc package implementing the auth.Provider interface from
auth/00-shared-pkg. Legacy admin is refactored to consume the shared
package; behavior is preserved (backwards-compat audit in spec).

== pkg/auth/oidc/ ==

config.go    — Config struct decoupled from YAML/CLI bindings, with
               Validate() and effective*() helpers. New field
               LegacyPermissiveUsername for the backwards-compat shim.
claims.go    — pickUsername (claim selection), sanitizeUsername
               (strict regex), hasRequiredGroup (group gate with
               malformed-claim defense).
provider.go  — NewOIDCProvider, LoginURL, HandleCallback. Implements
               the auth.Provider interface (compile-time check).
               HandleCallback runs the full 10-step verification
               chain documented inline.

Test coverage (28 oidc subtests; 15 pkg/auth subtests = 43 total):

  TestT1ForgedSignature             — sig mint-by-attacker rejected
  TestT2WrongIssuer                 — iss claim mismatch rejected
  TestT3WrongAudience               — aud claim mismatch rejected
  TestT4Expired                     — exp in the past rejected
  TestNonceMismatch                 — id_token nonce != state nonce
  TestT10PKCEVerifierMissing        — PKCE-on + no verifier rejected
  TestT17RequiredGroupMissing       — gate denies wrong group
  TestT17GroupsClaimMalformed       — string-shaped claim denies
  TestRequiredGroupSatisfied        — happy-path group OK
  TestT18StateEnvMismatch           — cross-env state replay rejected
  TestT23UsernameInjection (6 sub)  — SQL/newline/NUL/etc usernames rejected
  TestIdPErrorParam                 — IdP error_description NOT leaked
  TestMissingCode                   — empty code rejected
  TestLegacyPermissiveUsername(2)   — shim accepts email-format,
                                      still rejects empty/whitespace
  TestHandleCallbackHappy           — full successful exchange
  TestNewOIDCProviderDiscoveryFails — bad discovery rejected at init
  TestLoginURLValidatesState        — empty State.EnvUUID/Nonce rejected
  TestLoginURLPKCERequired          — verifier-missing rejected at LoginURL
  + claim/config unit tests

The in-process fakeIdP test harness boots a real httptest.Server that
serves discovery + JWKS + token endpoints. id_tokens are
RS256-signed with an in-memory RSA key, so the signature/iss/aud/exp
checks all exercise the genuine go-oidc.Verifier path, not a mock.

== cmd/admin/oidc.go refactor ==

The file shrinks from 360 LoC to ~180 LoC by replacing the inline
protocol implementation with calls into pkg/auth/oidc. Preserved:

- Route names (/login, /oidc/callback)
- YAML config field names
- Session-cookie + session-manager flow
- JIT-provisioning policy (zero permissions)
- Audit-log behavior

State-cookie storage switches from gorilla/securecookie to
pkg/auth.IssueStateCookie (HMAC JWT signed with the existing
flagParams.Admin.SessionKey). State cookie name changes from
"osctrl-admin-oidc-state" to "osctrl_auth_state" — operators
mid-OIDC-login during a deploy will need one retry. Documented in
the spec's "accepted minor break" section.

Username validation: pre-refactor cmd/admin did no character-class
check, so deployments where IdPs emit email-format
preferred_username values rely on permissive acceptance. The
Config.LegacyPermissiveUsername flag set by cmd/admin's
fromYAMLConfig preserves that behavior. New callers (cmd/api in a
later branch) leave the flag false and get the strict default.

== cmd/admin/main.go ==

Drops the no-longer-needed `var oidcRT *oidcRuntime` global and
updates initOIDC's call site to match the new (error)-only return.

== Verified ==

- go build ./... clean
- go vet ./... clean
- go test ./... all 16 packages green including the new auth tests
- Backwards-compat audit documented in spec § "Backwards compatibility
  for cmd/admin"
Closes hardening item 5A from the auth spec's pentest plan.

Before: LoginURL passed state.EnvUUID as the OAuth2 `state` parameter,
and HandleCallback compared the returned URL state against
state.EnvUUID. Env identifiers are operator-visible (URLs, logs, UI)
and in many deployments are short, guessable strings ("admin",
"prod", "default"). An attacker who knows or guesses the target's
EnvUUID could craft a callback URL whose state parameter survives
step 2 of the verification chain. The defense was relying entirely
on the cookie's HMAC signature for CSRF protection — a single
mistake in cookie handling would have opened a CSRF hole.

After: LoginURL passes state.Nonce — the 256-bit cryptorandom value
already minted for the id_token nonce — as the OAuth2 state param.
HandleCallback checks `url.state == cookie.Nonce`. The value is now
unguessable, so step 2 alone gates CSRF whether or not the cookie's
HMAC happens to verify. Cookie HMAC remains the second layer.

State.EnvUUID stays in the cookie shape (LoginURL still rejects an
empty one) but becomes informational at the protocol layer. Callers
that want env-scoping above the protocol use it out-of-band:
cmd/admin sets it to "admin"; cmd/api will set it to a global
sentinel.

Test changes:

- TestT18StateEnvMismatch repurposed as TestT6CSRFStateTampering
  with a more accurate threat label: env-replay isn't a threat at
  this layer; CSRF/state-tampering is.
- TestLoginURLValidatesState now asserts state=<nonce> in the
  authorize URL AND asserts that state=<env-uuid> does NOT appear.
- fakeCallback's stateParam argument documented as the nonce.
- TestMissingCode's state param updated so the test reaches step 3
  (missing code) rather than failing at step 2 (state mismatch).

All 28 oidc subtests pass. `go build ./...` and `go vet ./...` clean.
Bug found during Auth0 manual smoke: operators configuring
--oidc-username-claim with a non-standard name (nickname, upn,
login, etc.) silently got the subject claim back instead. The old
switch statement only covered preferred_username/email/sub; any
other configured name fell through to subject — which on Auth0 is
"auth0|hex…" containing the pipe character the strict username
regex rejects.

After: pickUsername now consults a generic raw claim map after the
typed-struct fast path. Any string-valued claim wins; non-string
values (arrays, objects, missing) fall back to subject. The typed
struct fast path stays for the three known claims since it avoids
a map lookup in the hot path.

== Files ==

  pkg/auth/oidc/claims.go
    * pickUsername(c, raw, claim) signature; reads raw[claim] for
      claim names not in the typed struct
  pkg/auth/oidc/provider.go
    * HandleCallback decodes raw map ONCE (was duplicated — earlier
      decode for ResolvedIdentity.Raw moved up so pickUsername can
      reuse it)
  pkg/auth/oidc/claims_test.go
    + TestPickUsernameCustomClaim covers nickname/upn/missing/
      non-string/nil-map paths

== Why this matters ==

Auth0's default sub format (auth0|hex) means operators MUST set a
non-default username claim — and the only practical one in the
profile scope is `nickname`. Without this fix the manual smoke test
against Auth0 silently fails 100% of the time, regardless of how the
operator configures their app.

The fix also unblocks Entra ID (which uses `upn`) and any custom
claim shape operators have set up via Auth0 Actions or Keycloak
mappers.

== Verified ==

- go build ./... clean
- go test ./pkg/auth/... -count=1 — all pass (includes the new
  3-subtest TestPickUsernameCustomClaim covering nickname, upn,
  missing claim, non-string value, nil raw map)
Adds federated authentication to osctrl-api as an additive surface
alongside the existing password flow. Lands the implementation for
the "OIDC is global, not per env" pivot — no env_auth_providers
table, no per-env IdP config, just one deployment-wide IdP gated by
--oidc-enabled.

== Why global ==

osctrl-api is a single API surface fronting all envs. The user JWT
issued by /login/{env} is env-scoped at issue time (CheckPermissions
runs against the env in the URL), but federated identity is not —
the same Keycloak user maps to the same AdminUser row regardless of
which env tab they were viewing. Modeling OIDC per-env added a DB
table, an AES-256-GCM secret-box, and an env-aware route shape, all
for zero operational benefit (no jmpsec deployment runs distinct
IdPs per env). Spec § "OIDC is global" documents the choice.

== New handler files ==

  cmd/api/handlers/auth_methods.go    — GET /api/v1/auth/methods
  cmd/api/handlers/auth_oidc.go       — InitOIDC + login + callback
  cmd/api/handlers/auth_resolve.go    — protocol-neutral user resolve
  cmd/api/handlers/auth_jwt.go        — session cookie issuance shared
                                        between password + OIDC paths

The auth_resolve and auth_jwt helpers are protocol-neutral by design
so the eventual SAML provider can reuse them verbatim.

== Wire-level behavior ==

AuthMethodsHandler returns {methods:[...]} unauthenticated, leaking
nothing about the IdP. SPA polls this on the login screen to decide
whether to render the SSO button.

OIDCLoginHandler issues the state-JWT cookie (audience
"osctrl-auth-state" segregates from user JWTs) and 302-redirects to
the IdP authorize endpoint with state=Nonce nonce=Nonce — the same
unguessable 256-bit value carried twice via different protocol
parameters (the 5A invariant from pkg/auth/oidc/provider.go).

OIDCCallbackHandler parses the state cookie, clears it
(single-use, T9), runs the 10-step provider verification, resolves
the identity to an AdminUser (existing row or JIT per
--oidc-jit-provision), mints user JWT + CSRF cookies via the shared
userJWTSessionTokens helper, and 302-redirects to "/". Every
failure path also 302s to "/" — the SPA shows the login screen and
the server-side log records WHY (timing-oracle defense, T31).

== Config + flag changes ==

  pkg/config/types.go     — YAMLConfigurationOIDC.Enabled bool
  pkg/config/flags.go     — --oidc-enabled flag (OIDC_ENABLED env)
                            initOIDCFlags wired into InitAPIFlags
                            (it was already wired into InitAdminFlags)
  cmd/api/main.go         — OIDC: &config.YAMLConfigurationOIDC{} in
                            flagParams init
                            InitOIDC called before handler init when
                            --oidc-enabled, fatal on failure
                            WithJWTSecret + WithOIDC passed to
                            CreateHandlersApi
                            New route block under "UNAUTHENTICATED":
                              GET /api/v1/auth/methods (pre-auth limit)
                              GET /api/v1/auth/oidc/login  (login limit)
                              GET /api/v1/auth/oidc/callback (pre-auth)
                            OIDC routes registered ONLY when enabled

osctrl-admin is untouched: it uses --auth=oidc (a separate mode), and
ignores the Enabled flag.

== Username sanitization ==

cmd/api callers leave LegacyPermissiveUsername=false (the strict
regex applies). Greenfield surface — no pre-existing email-format
usernames to preserve. Operators with email-shaped usernames must
configure --oidc-username-claim sub or similar.

== Tests added (13 new subtests in cmd/api/handlers) ==

auth_methods_test.go:
  TestAuthMethodsPasswordOnly            — default branch shape
  TestAuthMethodsWithOIDC                — OIDC appended when enabled
  TestAuthMethodsContentType             — JSON content-type
  TestAuthMethodResponseJSONShape        — JSON field names locked

auth_oidc_test.go (against in-process fake IdP):
  TestOIDCLoginRedirectsToIdP            — 302 + state=nonce invariant
  TestOIDCLoginPKCEIncluded              — code_challenge present
  TestOIDCLoginNoProvider                — 503 when InitOIDC skipped
  TestOIDCLoginNoJWTSecret               — 500 when secret empty
  TestOIDCCallbackNoProvider             — 503 fail-closed
  TestOIDCCallbackNoJWTSecret            — 500 when secret empty
  TestOIDCCallbackMissingStateCookie     — 302 to /
  TestOIDCCallbackTamperedStateCookie    — 302 to /, no leak
  TestOIDCCallbackForeignAudience        — wrong-aud JWT rejected

Happy-path callback (real DB-backed user resolution) is deferred —
pkg/auth/oidc/provider_test.go covers the 10-step verification chain
against an in-process fake IdP; replicating that here would
duplicate the harness without adding signal.

== Verified ==

- go build ./... clean
- go vet ./... clean
- go test ./... — all 16 packages green
- 13 new subtests pass in cmd/api/handlers
- 28 subtests in pkg/auth/oidc still pass (5A invariant preserved)
Adds a Federated-login surface to the React SPA, completing the SPA
side of the OIDC track. The button appears below the existing
password form, separated by an "or" divider, when the API advertises
an oidc method via /api/v1/auth/methods. Deployments without an
IdP configured see no visual change.

== Wire-level behavior ==

src/api/client.ts:
  - AuthMethod type mirroring the API's response shape.
  - listAuthMethods() — direct fetch (NOT apiFetch) so the call can
    never trigger the 401-redirect loop on the login page itself.
    Returns [] on shape mismatch / network error; throws on non-2xx.

src/features/login/LoginPage.tsx:
  - useQuery against listAuthMethods alongside the existing env
    list query. staleTime=5min, retry=1, fail-soft.
  - oidcMethod = find(type==='oidc'). If undefined, no surface
    renders.
  - Native <a href={oidcMethod.loginUrl}> rather than a Button. The
    href MUST be a full-page navigation: the IdP flow goes browser →
    /api/v1/auth/oidc/login (302) → IdP authorize (302) → IdP login
    page → IdP callback (302) → /api/v1/auth/oidc/callback (302) →
    "/". fetch/XHR-driven navigation would break the cross-origin
    302 chain.
  - Styled to match Button variant="secondary" size="lg" so the
    button rhythm stays consistent with the primary submit.
  - The divider uses --border + uppercase "or" microcopy in
    mono-tabular, matching the brand guide.

== Tests added (3 subtests, src/features/login/LoginPage.test.tsx) ==

  hides the SSO button when only password method is advertised
    — most common deployment state; the unreachable button would
    have been a regression.

  renders the SSO button when oidc method is advertised
    — asserts the anchor and verifies href matches the API's
    loginUrl verbatim. Guards against drift between the SPA's URL
    constant and the API's route registration.

  hides the SSO button when methods endpoint errors
    — fail-soft: password form must still work even if the methods
    query throws.

The tests pin behavior at the boundary the user sees (DOM presence
+ href value). They don't reach into the React Query cache or
inspect internal state — the test is meaningful regardless of how
the component fetches.

== Verified ==

- npm run check (tsc --noEmit) — clean
- npm test — 95 tests across 20 files pass (was 92/19)

Next: auth/05 (Keycloak compose stack + Kali pentest pass).
AdminUser gains an AuthSource string field. JIT-provisioning via the
OIDC callback sets it to "oidc"; everything else leaves it empty
(the password / default path).

The field flows through projectAdminUserView → AdminUserView.AuthSource
(JSON tag auth_source) and is consumed by the SPA Users page, which
renders a small "OIDC" badge alongside the existing admin / service
labels. Operators can now distinguish at a glance which rows came
from federated login.

The auth flow itself doesn't gate on this field — an OIDC user with
a password set later (via the user-management API) can log in either
way by design. AuthSource is purely informational.

== Files ==

  pkg/users/users.go          — new AuthSource string column (GORM
                                AutoMigrate adds it on next startup)
  pkg/types/types.go          — AdminUserView.AuthSource with json tag
  cmd/api/handlers/users.go   — projectAdminUserView passes it through
  cmd/api/handlers/auth_resolve.go — sets u.AuthSource = "oidc" on the
                                JIT path BEFORE Create()
  frontend/src/api/types.ts   — optional auth_source on AdminUser
  frontend/src/features/users/UsersPage.tsx — renders an OIDC pill
                                when u.auth_source === 'oidc'

== Backwards compatibility ==

Existing rows have an empty AuthSource. Older /users responses (from
osctrl-api ≤ this commit) will simply omit the field — the SPA
treats absence as "not OIDC" → no badge. Newer SPA against older API
behaves identically.

== Verified ==

- go build + go vet clean
- frontend npm run check + npm test (99 tests) green
Fixes the "click Continue with SSO and it silently logs me back in"
bug. Three-tier teardown on POST /api/v1/logout:

1) Server-issued Set-Cookie headers expire both osctrl_token AND
   osctrl_csrf (Max-Age=-1). The osctrl_token cookie is HttpOnly,
   so JS-only logout could not clear it — previously a click on
   "Sign out" cleared the in-memory CSRF + the osctrl_csrf cookie
   but left osctrl_token sitting in the browser. On the next
   "Continue with SSO" the still-valid osctrl_token rode the
   top-level GET (SameSite=Lax) and the SPA treated the operator
   as authenticated without going near Keycloak.

2) DB-level APIToken revocation via users.ClearToken. The auth
   middleware (handlerAuthCheck) already enforces an APIToken-match
   check against the stored row, so once we zero the column any
   cached copy of the JWT — including the same token in another
   browser tab — fails the next auth check with 401. Server-side
   revocation is the load-bearing half: cookie expiry alone is
   client-honor-system, APIToken="" makes the token a no-op even
   if the bytes are replayed.

3) IdP end-session navigation. The response includes
   idp_logout_url pulled from the discovery doc's
   end_session_endpoint claim. The SPA navigates the browser to
   ${url}?post_logout_redirect_uri=${origin}/login. Keycloak
   terminates its KEYCLOAK_IDENTITY cookie and bounces back to
   /login — the next SSO click now actually prompts for
   credentials instead of silently re-auth'ing.

Idempotent: a logout from an expired or anonymous session still
emits the cookie-clearing headers + the IdP URL. The endpoint is
NOT behind handlerAuthCheck for exactly this reason — logging out
from an expired session should not require fresh auth.

== Backend files ==

  pkg/auth/oidc/provider.go
    + Provider.EndSessionURL() reads end_session_endpoint out of
      the cached discovery doc; "" when unset. Memory read only,
      no network call.

  cmd/api/handlers/auth_logout.go (new)
    + LogoutResponse { idp_logout_url }
    + LogoutHandler: ClearToken via JWT cookie, Set-Cookie max-age=-1
      for both cookies, returns IdP end-session URL.

  cmd/api/main.go
    + POST /api/v1/logout under preAuthRateLimit (60/min/IP).

== Frontend files ==

  src/api/client.ts
    + LogoutResponse type.
    * logout() now async; POSTs /api/v1/logout, then navigates
      to idp_logout_url with post_logout_redirect_uri pointing
      back at /login. Falls back to /login on no-IdP or network
      failure. window.location.href rather than router.navigate
      because the post-logout URL is cross-origin to Keycloak.

  src/components/chrome/UserMenu.tsx
    * handleLogout() now just calls logout() (which owns the
      navigation). Drops the unused router.navigate that was
      racing with window.location.href.

== Keycloak realm config ==

(Out of repo, applied via the admin API at 192.168.31.239 against
the osctrl-api client in the osctrl realm.)
  post.logout.redirect.uris adds:
    http://192.168.31.239:8088/login
    http://localhost:8088/login
    http://192.168.31.239:8088/
    http://localhost:8088/

== Verified ==

- go build clean
- frontend npm run check + npm test (104 tests) green
Fixes Keycloak 26's "Missing parameters: id_token_hint" error on
logout. Keycloak's RP-initiated logout requires EITHER id_token_hint
OR client_id alongside post_logout_redirect_uri — passing only the
redirect URI yields a Keycloak-side error page and the user is left
stuck mid-flow.

We choose the client_id path so we don't have to persist raw
id_tokens client-side (they carry email + arbitrary claims and are
privacy-sensitive). The client_id is not a secret — it already
appears in every authorize URL the SPA renders — so exposing it via
the /logout response is just plumbing.

== Wire change ==

LogoutResponse gains idp_client_id. The api sets it from the cached
oidcClientID package-global (mirrors cfg.ClientID, set during
InitOIDC). The SPA reads it from the same response and appends to
the end-session URL via URLSearchParams (cleaner than the prior
string-concat — handles encoding correctly).

The SPA falls back to omitting client_id when the api response
doesn't include it (older build, partial config). Keycloak will
then show its error page and the operator can navigate manually —
no silent breakage.

== Files ==

  cmd/api/handlers/auth_oidc.go
    + oidcClientID package-global, set in InitOIDC
  cmd/api/handlers/auth_logout.go
    + LogoutResponse.IdPClientID
    + populated from oidcClientID
  frontend/src/api/client.ts
    + LogoutResponse.idp_client_id
    * logout() builds the IdP URL with URLSearchParams and includes
      client_id when present
Okta requires id_token_hint when chaining post_logout_redirect_uri
on /v1/logout — without it, the IdP returns "Missing parameter:
id_token_hint" and the operator gets stuck on the Okta error page.
Keycloak accepts either id_token_hint OR client_id as the hint;
we now send both and let the IdP pick whichever it knows.

== Implementation ==

  pkg/auth/types.go
    + ResolvedIdentity.IDToken — the raw, already-verified id_token
      string. Documented as logout-hint-only; not an auth credential.

  pkg/auth/oidc/provider.go
    * HandleCallback returns rawIDToken into ResolvedIdentity.IDToken
      after the 10-step verification chain. Already verified (sig +
      iss + aud + exp + nbf + nonce) by this point.

  cmd/api/handlers/auth_oidc.go
    * OIDCCallbackHandler now sets an HttpOnly osctrl_id_token cookie
      after the user JWT is issued. Path=/api/v1/auth/ scope matches
      the state cookie. Max-Age 8h — long enough for typical session
      lifetimes, short enough to mostly age out before any leak.

  cmd/api/handlers/auth_logout.go
    + LogoutResponse.IdPIDTokenHint json:idp_id_token_hint
    * LogoutHandler reads the osctrl_id_token cookie, clears it with
      Set-Cookie Max-Age=-1, and includes the value in the response.
      The cookie value is the same id_token the IdP issued, so
      handing it back to the browser is no information leak — we're
      just routing it through the SPA so the IdP can verify session
      ownership.

  frontend/src/api/client.ts
    + LogoutResponse.idp_id_token_hint
    * logout() reads the hint and appends id_token_hint=… to the IdP
      URL alongside post_logout_redirect_uri + client_id. Whichever
      param the IdP recognizes will work.

== Verified ==

- go build ./... clean
- npm run check + npm test (104 tests) green

This unblocks Okta integration (next: configure cmd/api with Okta
OIDC env vars + run the same probe set against the dev-* tenant).
Pentest finding: POST /api/v1/logout with no cookies still returned
LogoutResponse{IdPLogoutURL, IdPClientID, IdPIDTokenHint} populated
from oidcProvider/oidcClientID, leaking the OIDC tenant URL and
client_id to any drive-by curl. A reconnaissance attacker now gets
the bits they need to register their own client against the same
tenant or to fingerprint the IdP.

The endpoint is unauthenticated by design (idempotent logout — a user
with an expired token can still log out without re-authenticating
first) so we can't simply gate the route. Instead, the handler now
gates only the IdP-detail fields on whether the caller presented a
valid JWT, while preserving the idempotent cookie-clearing behavior.

Anonymous /logout: cookie-clearing Set-Cookie headers + empty
LogoutResponse{} body.

Authenticated /logout: cookie-clearing + IdP fields populated as
before, so the SPA can drive the IdP-side session termination.

Tests:
- TestLogoutAnonymousHidesIdPMetadata stands up a fake IdP, sets
  oidcClientID, then hits /logout with no cookies and asserts all
  three IdP fields are empty in the response body.
- TestLogoutClearsCookiesEvenWhenAnonymous pins the idempotent
  contract: cookie-purge still happens for anonymous callers.
Pentest finding: pkg/auth/oidc set the OAuth2 `state` query param and
the OIDC `nonce` query param to the SAME 256-bit random value (the
single State.Nonce field), collapsing two protocol-level defenses
onto one leak surface. A `state` value disclosed via a Referer
header, access log, or proxy buffer simultaneously compromised the
CSRF defense (matched against the cookie nonce on callback) AND the
id_token replay defense (matched against the `nonce` claim).

Fix: add State.OAuthState — a second 256-bit cryptorandom value,
independent of State.Nonce. OIDC.LoginURL sends:
  state = OAuthState
  nonce = Nonce
HandleCallback checks the URL `state` param against state.OAuthState
(CSRF) and the id_token's `nonce` claim against state.Nonce (replay).
A leak of either now compromises only one defense; the other still
holds.

SAML follows the same shape: RelayState carries OAuthState.

State cookie format: a new `os` claim carries OAuthState; older
cookies without it transparently fall back to using Nonce as
OAuthState during the 10-minute StateCookieTTL, so in-flight logins
don't break at upgrade time.

Defense-in-depth guards:
- IssueStateCookie refuses State with OAuthState == Nonce.
- OIDC.LoginURL refuses State with OAuthState == Nonce.
The whole point of the split is independence; allow same-value reuse
and a future regression silently re-introduces the bug.

Tests:
- TestIssueRejectsEmptyOAuthState, TestIssueRejectsAliasedNonceAndOAuthState
  pin the cookie-issuer invariants.
- TestParseLegacyCookieWithoutOAuthState pins the backward-compat
  fallback at the JWT-claim level.
- pkg/auth/oidc fakeCallback sites updated to pass OAuthState as the
  URL `state` param; goodState() carries both values.
- TestOIDCLoginRedirectsToIdP now asserts state != nonce (was: state
  == nonce, the inverted "5A invariant").

Callers updated:
- cmd/api/handlers/auth_oidc.go mints OAuthState alongside Nonce
  before issuing the state cookie.
- cmd/admin/oidc.go same.
Adds the SAML 2.0 Web Browser SSO Profile implementation behind the
auth.Provider interface, mirroring pkg/auth/oidc. crewjam/saml handles
XML / sig / timestamp / audience / recipient validation; the package
adds the load-bearing extras the spec calls out:

- RelayState=state.Nonce CSRF binding (threat S10), checked BEFORE
  any crypto path runs so a malformed-but-correctly-CSRF response can
  surface its real parse error and a CSRF probe doesn't leak parsing
  hints
- In-memory assertion-ID replay cache (threat S3) that sits between
  crewjam's NotBefore/NotOnOrAfter window and the JIT path — closes
  the few-minute capture-and-replay gap inside the validity window
- Same sanitizeUsername regex as OIDC; rejects audit-log poisoning,
  shell/SQL metacharacters, length > 64
- Config.RequireAssertionSigned must be true (threat S2); Validate()
  refuses configs that turn it off

Users page badge: extends resolveFederatedUser to take the provider
type and tag the JIT-provisioned AdminUser row with "oidc" or "saml".
Frontend renders matching pills on /_app/users.

Coverage on pkg/auth/saml: 74.4% (uncovered statements are the
signed-assertion happy path inside HandleCallback, exercised by the
live Keycloak/Auth0 pentest).

Tests: 22 new unit tests across config / claims / replay / provider.
All Go packages green; all 104 frontend tests green; typecheck clean.
Completes the SAML 2.0 federated-login surface on osctrl-api. The
protocol package (pkg/auth/saml) landed in a202642; this commit
wires it into the api binary, exposes the routes, and adds the SPA's
"Continue with SAML" button on the login page.

Backend:
- cmd/api/handlers/auth_saml.go — InitSAML constructs the global
  authsaml.Provider at startup (failing fast on metadata fetch errors,
  same posture as OIDC). SAMLLoginHandler mints the state cookie with
  independent Nonce + OAuthState (May-2026 split), then 302s to the
  IdP's SSO endpoint with RelayState=OAuthState. SAMLACSHandler parses
  the state cookie, clears it (single-use), runs Provider.HandleCallback
  (RelayState check → crewjam ParseResponse → replay cache → groups
  gate → sanitized username), JIT-provisions if needed, mints
  JWT+CSRF cookies. SAMLMetadataHandler serves the SP metadata XML
  for IdP-side registration.
- cmd/api/main.go — adds SAML to ServiceParameters init, calls
  handlers.InitSAML at startup when --saml-enabled, registers three
  routes behind the existing rate limits (login on strict, ACS +
  metadata on pre-auth).
- cmd/api/handlers/handlers.go — adds HandlersApi.SAMLEnabled and
  WithSAML option (mirrors OIDCEnabled / WithOIDC).
- cmd/api/handlers/auth_methods.go — advertises {type:"saml",
  loginUrl:"/api/v1/auth/saml/login"} when SAMLEnabled is true.
- pkg/config/types.go — YAMLConfigurationSAML gains Enabled, EntityID,
  ACSURL fields with mapstructure tags so viper unmarshalling works.
- pkg/config/flags.go — initSAMLFlags adds --saml-enabled,
  --saml-idp-metadata-url, --saml-entity-id, --saml-acs-url,
  --saml-jit-provision (env-var sources mirror OIDC's pattern).

Frontend:
- frontend/src/api/client.ts — AuthMethod union extended to
  'password' | 'oidc' | 'saml'.
- frontend/src/features/login/LoginPage.tsx — single "or" divider
  before SSO buttons; OIDC and SAML buttons render independently,
  stacked, when their respective methods are advertised. Button
  labels changed to "Continue with OIDC" / "Continue with SAML" for
  clarity when both are present.
- frontend/src/features/login/LoginPage.test.tsx — adds tests for
  SAML-only, OIDC+SAML-both, and renames the OIDC-only assertion to
  match the new label.

Tests: all 106 frontend tests pass (+2 new). Backend: pkg/auth/saml
and cmd/api/handlers green. JIT users land with auth_source="saml"
and the SAML badge on /_app/users (already plumbed in a202642).
The previous OIDC PR introduced a State.OAuthState field (split from
Nonce so a Referer/log leak of the OAuth2 state parameter doesn't
also compromise the OIDC nonce defense). SAML follows the same
shape: RelayState now carries OAuthState rather than Nonce.

Before:
  - LoginURL emits RelayState = state.Nonce.
  - HandleCallback validates RelayState echo against state.Nonce.
  - Both the SAML CSRF defense and any cookie-side replay marker
    rode the same Nonce value.

After:
  - LoginURL emits RelayState = state.OAuthState; checks against
    that field too. Empty-OAuthState is rejected (defense-in-depth
    against a future caller forgetting to mint the value).
  - HandleCallback validates against state.OAuthState.
  - state.Nonce is no longer consulted by SAML — Nonce in the
    shared State struct now means strictly "OIDC id_token nonce"
    so the field semantics are unambiguous across providers.

Tests in pkg/auth/saml/provider_test.go updated to populate
State.OAuthState; happy-path + state-mismatch + missing-fields
coverage preserved.

The deltas in this commit were originally folded into the OIDC
state/nonce-split commit on the integration branch. Splitting
them into their own commit here keeps the SAML PR upstream-
reviewable in isolation: PR D no longer pretends SAML is touched
by an OIDC-named commit.
Two related fixes uncovered while running the Keycloak pentest pass:

1. InResponseTo correlation (threat S7) was effectively broken.
   ParseResponse was called with possibleRequestIDs=nil, which
   crewjam interprets as "expect no InResponseTo at all" — but
   every Keycloak / Auth0 SP-initiated response always echoes the
   AuthnRequest ID it received. The net effect was that EVERY
   legitimate SAMLResponse was rejected with
   "InResponseTo does not match any of the possible request IDs
   (expected [])". The only escape hatches the library offers
   are nil (reject-all) and empty-slice (accept-any) — neither
   satisfies S7 properly.

   Fix: round-trip the AuthnRequest ID through the HMAC-signed
   state cookie. LoginURL mints the request, captures its ID,
   stuffs it into auth.State.SAMLRequestID before issuing the
   cookie. HandleCallback reads state.SAMLRequestID and passes
   it as the single expected request ID to ParseResponse. An
   attacker who can't forge the cookie can't satisfy
   InResponseTo, so the defense is real, not vestigial.

   Wire shape:
   - auth.State gains SAMLRequestID field (omitempty JSON tag
     keeps OIDC cookies unchanged).
   - stateClaims gains saml_rid claim with the same shape.
   - pkg/auth/saml gains LoginURLWithRequestID() that returns
     both the URL and the AuthnRequest ID (the regular
     LoginURL() still works for the auth.Provider interface
     contract; just delegates and drops the ID).
   - cmd/api SAMLLoginHandler uses LoginURLWithRequestID,
     populates state.SAMLRequestID, then issues the cookie.

2. Diagnosability — crewjam wraps the real cause in
   InvalidResponseError.PrivateErr but Error() returns the opaque
   "Authentication failed". The original log line lost the
   PrivateErr, which made the S7 misconfiguration above
   essentially undiagnosable from logs (every probe just said
   "Authentication failed"). HandleCallback now type-asserts and
   surfaces PrivateErr at WARN. The client still sees only the
   sentinel — timing-oracle defense intact.

Verification (live Kali stack with Keycloak SAML):
  Baseline (untouched signed response): 302 / + osctrl_token cookie
  S1 XSW (4 variants): all rejected, no token leak
  S2 unsigned: rejected
  S3 replay: rejected (replay cache on assertion ID)
  S4 audience confusion: rejected
  S5 clock-skew (NotOnOrAfter past): rejected
  S6 Recipient mismatch: rejected
  S7 InResponseTo: now ENFORCED — see fix above
  S8 alg-downgrade SHA1: rejected
  S9 XXE: parser ignores entities, no crash
  S10 RelayState mismatch/missing-cookie/tampered: all rejected
  Plus Issuer and NameID tampering — both rejected
  Total: 41 negative-path probes pass (15 unauth + 26 signed).

All Go tests green.
Two related UX bugs surfaced once SAML went live on Kali:

1. OIDC user logs out → SPA gets a stale id_token_hint → Keycloak
   rejects the request → user lands on Keycloak's error page
   instead of /login.

2. SAML user logs out → SPA session clears → user clicks
   "Continue with SAML" → silently re-authenticated against the
   still-alive Keycloak SSO cookie. Feels like logout did nothing.

Root causes:

(1) LogoutHandler emitted IdPLogoutURL/IdPClientID/IdPIDTokenHint
unconditionally whenever oidcProvider was configured, even for
SAML-sourced sessions. The SPA blindly navigated to the OIDC
end-session URL with a stale id_token_hint that didn't match the
user's actual session protocol.

(2) v1 has no SAML Single Logout — the spec explicitly defers SLO
to v2 (decision D3, threat S11-adjacent). Without SLO, the SP
session ending doesn't tell the IdP anything. The pragmatic
substitute is to set ForceAuthn="true" on every AuthnRequest so
Keycloak/Auth0/Okta re-prompt for credentials even when their SSO
cookie is alive. This matches user expectations ("logout means I
have to log in again") without implementing the SLO protocol.

Fix:

- LogoutHandler now looks up AdminUser.AuthSource at logout time
  and only emits OIDC fields when the active session came from
  OIDC. SAML users get auth_source="saml" + empty IdP fields;
  password users get auth_source="" + empty IdP fields. The SPA's
  existing fall-through to /login handles both cleanly. Anonymous
  callers still get the fully empty response (no IdP scrape — the
  earlier pentest gate from 9cbf0fc holds).

- LogoutResponse gains AuthSource field; frontend type extended.

- SAML provider gains Config.ForceAuthn (bool). When true,
  crewjam emits ForceAuthn="true" on every AuthnRequest.
  Plumbed via --saml-force-authn / SAML_FORCE_AUTHN env (defaults
  true in osctrl-api; cmd/admin still works without it as before).

Verified live on Kali: emitted AuthnRequest now carries
ForceAuthn="true". A logout-then-SAML-login cycle will now show
Keycloak's credential prompt instead of silent re-auth.

All Go tests green. No frontend tests touched.
…on logout

Three bugs uncovered while bringing SAML up against Auth0 on Kali:

1. State cookie SameSite=Lax blocks SAML POST-binding ACS. Browsers
   refuse Lax cookies on cross-site POST, which is exactly what an
   IdP does on the ACS callback. Switched to SameSite=None; Secure
   so the cookie survives the cross-origin POST. CSRF defense is
   untouched — the load-bearing check is the unguessable OAuthState
   in the JWT body, which the IdP must echo as RelayState.

   Loosening Lax→None requires Secure=true; browsers refuse
   SameSite=None on plain HTTP. Documented this as the v1 reason
   operators must front osctrl-api with TLS in production. Kali's
   dev stack now serves HTTPS on :8444 via osctrl-frontend with a
   self-signed cert (compose + frontend-dev.conf live in the kali-
   local stash, not part of this commit).

2. Auth0 emits SAML attribute Names under the long urn:oid: /
   schemas.xmlsoap.org URIs, not their short forms. Our
   pickSAMLUsername indexes attrs by exact Name OR FriendlyName,
   so `SAML_USERNAME_ATTRIBUTE=nickname` resolved to nothing and
   fell through to NameID — which Auth0 emits as `auth0|hex`,
   fails the strict username regex, login fails with
   ErrUsernameInvalid. Fix: --saml-username-attribute now accepts
   the full URI; operators set it to
   `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nickname`
   on Auth0 (or just `nickname` on Keycloak which uses short names).
   Added the flag + YAML field + handler plumbing.

3. Logout sent SAML-session users to the OIDC end-session URL.
   The old code keyed off AdminUser.AuthSource (creation-time
   stamp) — wrong, because a user who was JIT'd via OIDC can log
   in via SAML later and AuthSource still says "oidc." Fix: derive
   the active session's protocol from the osctrl_id_token cookie's
   presence. The OIDC callback sets that cookie; SAML and password
   flows never touch it. Cookie present + valid → OIDC session;
   absent → SAML or password. AdminUser.AuthSource stays as the
   /_app/users badge field — a creation-time identity stamp, not
   a session-protocol switch.

Verified live on Kali:
  - alice (originally OIDC-JIT'd) logs in via SAML → SPA loads
  - alice (SAML session) clicks logout → 302 / cleanly, no Auth0
    OIDC-logout error page
  - alice (OIDC session) logs out → still chains to Auth0
    /oidc/logout with id_token_hint (the previously-working path)
  - 18/18 unauth Auth0 SAML probes pass against the HTTPS endpoint
  - State cookie wire-shape: Secure + SameSite=None + nonce ≠ os +
    saml_rid (InResponseTo defense) + aud=osctrl-auth-state

Test updates:
  - TestOIDCLoginRedirectsToIdP: SameSite=Lax → SameSite=None
    assertion follows the IssueStateCookie change.
  - pkg/auth/state_test.go: same.
Pentest finding: SP metadata advertised AuthnRequestsSigned="false".
The spec marked this as deferred-to-v2 (decision D5) because it
requires SP key management, but with the HMAC state cookie + the
cleanly-bound RelayState the realistic exploit surface was narrow.
Pentester correctly flagged it as not-best-practice anyway.

Closing the finding by shipping SP request signing:

1. pkg/auth/saml/config.go gains SigningCertPath + SigningKeyPath
   fields. Both empty disables signing (the v1 default — operators
   who don't want to manage an SP keypair still get a working SAML
   flow). Both set enables signing. Half-set is rejected by
   Validate() as a config typo.

2. pkg/auth/saml/provider.go loads the RSA keypair (PKCS#1 or
   PKCS#8 PEM), pins it onto crewjam.ServiceProvider.Key +
   .Certificate, and sets SignatureMethod = RSASHA256. crewjam's
   Metadata() then automatically:
     - flips SPSSODescriptor AuthnRequestsSigned to "true"
     - emits a KeyDescriptor use="signing" with the X509 cert
   and every emitted AuthnRequest carries an RSA-SHA256 signature
   over the SAMLRequest+RelayState+SigAlg bundle (HTTP-Redirect
   binding) or an embedded ds:Signature (HTTP-POST binding).

3. pkg/config/types.go: YAMLConfigurationSAML gains the two
   path fields with mapstructure tags so viper unmarshaling works.

4. pkg/config/flags.go: --saml-signing-cert and --saml-signing-key
   CLI flags + SAML_SIGNING_CERT / SAML_SIGNING_KEY env-var
   sources, paired with the existing saml-* flag cluster.

5. cmd/api/handlers/auth_saml.go: InitSAML passes the new fields
   into authsaml.Config.

Live verified on Kali (HTTPS api on :8444):
- SP metadata now reads AuthnRequestsSigned="true" with the
  embedded X509 KeyDescriptor.
- /api/v1/auth/saml/login emits SAMLRequest URL with SigAlg=
  rsa-sha256 + a valid Signature param on the HTTP-Redirect.
- Auth0 SAML addon updated: signingCert + signRequests=true.
- Keycloak SAML client updated: saml.client.signature=true,
  saml.signing.certificate pinned to our cert.

Test fixtures: 4 unit tests in pkg/auth/saml still pass — they
don't configure signing, exercising the v1 default path. A
signing-enabled e2e test would need a fixture keypair on disk;
deferred until a follow-up.

Threat model update: closes the AuthnRequest-tamper vector on
the browser→IdP redirect (malicious extension, MITM proxy). The
IdP will reject any tampered AuthnRequest because the signature
no longer verifies. With the existing HMAC state cookie also in
place, both sides of the round-trip are now signed/bound.
Closes two Users-page UX gaps surfaced during the Kali end-to-end
session:

1) The Permissions modal showed a free-text "Environment UUID" input
   with a TODO comment about a dropdown landing later. Replaces it
   with a useQuery-backed <select> populated from
   listEnvironments(). Falls back to the free-text input on env-list
   query error so a flaky /environments endpoint can't block setting
   permissions.

2) Granting access to N environments required N modal opens, paste
   UUID, save — unworkable for tenants with hundreds of envs.
   Combines two approaches:

   New API endpoint POST /api/v1/users/{u}/permissions/all — one
   round-trip, server enumerates envs and applies the same access
   shape across all of them. Mirrors the per-env handler's lockout
   guards verbatim:
     * super-admins cannot self-demote
     * cannot demote the last super-admin (409)
   So the bulk path is NOT a privilege-escalation hole.

   Client function setUserPermissionsAllSafe tries the bulk endpoint
   first, falls back to a per-env loop (bounded concurrency = 8) on
   404 ONLY. Non-404 errors propagate — the SPA does NOT silently
   swallow a 5xx that would also fail in the fallback. This makes
   cross-version deployments safe (new SPA against an older api that
   doesn't have /permissions/all → still works via the loop).

   UI: two-step "Apply to all envs…" button below the Save button.
   First click reveals a warning-colored "Confirm: apply to all N
   envs" pill; second click fires. A status panel below the access
   checkboxes reports "Applied access to X of Y environments (bulk
   endpoint | per-env fallback)" so operators see which path was
   taken.

== Backend changes ==

  pkg/users/permissions.go — ChangeAccessAll(username, envUUIDs[],
                              EnvAccess) (int, error). Returns
                              partial count + first error on failure
                              (idempotent retry by re-running).
  pkg/types/types.go        — SetPermissionsAllRequest / Response
  cmd/api/handlers/users_profile.go — SetUserPermissionsAllHandler
                              with the same authn + lockout guards
                              as the per-env handler. Enumerates
                              envs via h.Envs.All() server-side.
  cmd/api/main.go           — registers
                              POST /api/v1/users/{u}/permissions/all

== Frontend changes ==

  src/api/types.ts          — SetPermissionsAllRequest / Response
  src/api/users.ts          — setUserPermissionsAllSafe with the
                              bulk-with-fallback decision tree and
                              BulkSetReport for UI consumption.
  src/features/users/UsersPage.tsx — env dropdown (with free-text
                              fallback) + two-step "Apply to all
                              envs" button + result panel.

== Tests (7 new subtests) ==

Backend (pkg/users/permissions_test.go):
  TestChangeAccessAllRejectsUnknownUser — fail-closed on ghost user
  TestChangeAccessAllEmptyEnvList       — boundary: empty envUUIDs

Frontend (src/api/users.test.ts):
  uses the bulk endpoint on the happy path
  falls back to per-env loop when the bulk endpoint returns 404
  records per-env failures without aborting the rest
  propagates non-404 errors (does NOT silently retry)
  returns zero counts on empty env list

Also fixes the existing TestCreateUser sqlmock query expectation to
include the auth_source column added in the prior commit (was
hard-coded to 18 cols, now 19).

== Verified ==

- go build ./... clean
- go test ./... — all packages green
- frontend npm run check + npm test (104 tests) green
Fixes the modal's "error setting permissions" failure on the very
first grant to a (user, env) pair. SetEnvLevel previously called
GetPermission to find an existing row to UPDATE; if no row existed
yet (the user had never had permissions in that env), GORM returned
gorm.ErrRecordNotFound and the whole ChangeAccess chain aborted.
The Save button never worked for new users, and the new bulk
"apply to all envs" endpoint failed for the same reason.

The upsert branch creates the (username, environment, access_type,
access_value) row when GetPermission returns ErrRecordNotFound.
EnvironmentID is zero on insert — consistent with GenUserPermission
elsewhere in the package — because no read path uses the numeric
FK. granted_by is left empty here; the audit-log records the
granting operator at the handler layer.

== Test ==

TestSetEnvLevelCreatesRowWhenAbsent — drives the no-row-exists
branch through sqlmock: Exists() returns 1, GetPermission's SELECT
returns ErrRecordNotFound, then an INSERT into user_permissions is
expected. The handler-side failure that motivated this fix
("error setting permissions (0 of 1 envs updated before failure)")
no longer occurs.
Fixes the Permissions modal opening with a stale default
{user:true,query:false,carve:false,admin:false} regardless of what
access the target user actually has. Operators were silently
overwriting prior grants by re-saving the modal with whatever was
shown — there was no way to see the pre-existing state without
guessing or querying the DB directly.

Two pieces:

== API ==

  GET /api/v1/users/{u}/permissions

  Returns the user's full env→access map in one round-trip:
  { username, permissions: { "env-uuid-1": {user,query,carve,admin}, ... } }

  Envs with no permission rows are OMITTED from the response. The
  SPA treats absence as "no access" (zero-value EnvAccess). For
  tenants with hundreds of envs and a user granted access to a few,
  the response stays small.

  Same authn posture as the existing POST: super-admin
  (AdminLevel, NoEnvironment).

  pkg/types/types.go        + GetPermissionsResponse
  pkg/users/permissions.go  GetAccess() already returned the map; no
                            package-layer change needed.
  cmd/api/handlers/users_profile.go + GetUserPermissionsHandler
  cmd/api/main.go           + GET /api/v1/users/{u}/permissions route

== Frontend ==

The Permissions modal now fires one useQuery for
getUserPermissions(user.username) when it opens. A useEffect
syncs the access checkboxes whenever the env dropdown selection
changes — found in the map → show that env's saved access; missing
from the map → show zero-value (everything false, "no access yet").

Save-permissions success now invalidates the
['user-permissions'] query key alongside ['users'], so a
re-opened modal always sees fresh state.

  frontend/src/api/types.ts             + GetPermissionsResponse
  frontend/src/api/users.ts             + getUserPermissions
  frontend/src/features/users/UsersPage.tsx
                                        + useQuery prefill +
                                          useEffect on env change +
                                          invalidate on save
…ries access map

Three related fixes:

1) The avatar / Sign-Out menu always showed "admin" — UserMenu had a
   hardcoded default and the /_app layout never passed a real
   username. Now the layout fetches /api/v1/users/me and passes
   me.username down to the AppShell → TopBar → UserMenu.

2) GET /api/v1/environments was super-admin-only — non-admins
   couldn't even populate the SPA's env switcher. The endpoint now
   filters the response: super-admins still see every env;
   non-admins get the subset where their EnvAccess.User OR
   EnvAccess.Admin is true. Returns [] for users with no env
   access (the same shape, just empty).

3) The SideNav showed every link to every operator, even items they
   couldn't actually use. Now items hide based on the operator's
   per-env permission map plus their super-admin flag:

   - Dashboard / Profile           — always visible
   - Nodes                          — env.user OR env.admin
   - Queries / Saved                — env.query
   - Carves                         — env.carve
   - Tags / Enrollment              — env.admin
   - Audit / Operators /
     Environments / Settings        — super-admin only

   GET /api/v1/users/me now returns the user's full env-access map,
   so the SideNav can compute these gates client-side without an
   extra round-trip per route change.

== Backend ==

  pkg/types/types.go
    + UserMeResponse.Permissions map[envUUID]EnvAccessView
  cmd/api/handlers/users_profile.go
    + MeHandler populates Permissions from h.Users.GetAccess()
  cmd/api/handlers/environments.go
    * EnvironmentsHandler filters by IsAdmin + GetAccess

== Frontend ==

  frontend/src/api/types.ts
    + UserMeResponse.permissions
  frontend/src/routes/_app/route.tsx
    * useQuery(getMe) → passes username to AppShell
  frontend/src/components/chrome/SideNav.tsx
    * useQuery(getMe), computes canSeeEnv/canQuery/canCarve/canManageEnv/
      isSuperAdmin; conditionally renders each NavItem
    * Admin section header hidden for non-admins (Profile shown alone)

The gating is UI defense-in-depth — the server-side handlers still
enforce the same checks, so a non-admin who manually navigates to
/_app/users gets a 403 on the data fetch and falls through to the
ApiError → login redirect logic. Hiding the link just makes the
existence of the page invisible.

== Test fix ==

frontend/src/features/profile/ProfilePage.test.tsx — makeMe() now
includes the required permissions: {} field.

== Verified ==

- go build ./... clean
- frontend npm run check + npm test (104 tests) green
Three related permission-surface fixes:

1) Audit Trail (/_app/audit) becomes a per-viewer surface:

   Server (cmd/api/handlers/audit.go): non-super-admins can now
   call GET /api/v1/audit-logs (was 403). The handler force-clamps
   filter.Username to the requester regardless of any ?username=
   parameter — defense-in-depth so a non-admin can't iterate other
   usernames manually.

   SPA (AuditPage): nav label changes to "My Activity" for
   non-admins. The Username filter input is hidden for non-admins
   (typing other names would be a no-op and confuse). Page title
   + subtitle adapt to the viewing scope.

2) Command palette filters commands the viewer can't reach.
   STATIC_PAGES entries gain an optional requires: 'admin' flag
   (Operators, Environments, Settings · *). The env-config command
   gates on super-admin OR env-scoped admin per env. Static gating
   mirrors the SideNav so the two stay consistent.

3) Node delete buttons hidden for operators without env-admin.
   DeleteNodeHandler requires AdminLevel on the env server-side.
   NodesTablePage's bulk Delete… button and NodeDetailPage's single
   Delete button now check (super-admin || me.permissions[envUuid]
   .admin) and hide otherwise — operators no longer see a button
   that would 403.

== Answer to user's question ==

A non-super-admin operator can delete nodes ONLY in envs where they
have env.admin (env-scoped admin permission). user/query/carve on
their own are insufficient. The server is the authoritative gate;
this commit just removes the misleading UI.

== Files ==

  cmd/api/handlers/audit.go               * non-admin allowed, force-clamp username
  frontend/src/features/audit/AuditPage.tsx       * gating + scope label
  frontend/src/features/audit/AuditPage.test.tsx  * mock getMe as admin
  frontend/src/components/chrome/SideNav.tsx      * Audit visible to all, label varies
  frontend/src/components/chrome/CommandPalette.tsx * requires:'admin' flag + env.admin gate
  frontend/src/features/nodes/NodesTablePage.tsx  * canDeleteNodes gate on bulk Delete button
  frontend/src/features/nodes/NodeDetailPage.tsx  * same gate on single Delete button

== Verified ==

- go build clean
- frontend npm run check + npm test (104 tests) green
Parity with legacy admin /users which has had both since v0.1. The
SPA had Permissions + Token management but no way to create or
remove operators — operators had to drop to the legacy admin UI or
the API directly. Closes the gap.

Backend: no changes. cmd/api/handlers/users.go already exposes
UserActionHandler which dispatches on the {action} path segment —
"add" calls h.Users.New+Create+CreatePermissions, "remove" calls
h.Users.Delete+DeleteAllPermissions. Self-deletion is already
blocked server-side ("user X cannot remove itself"). Both actions
super-admin gated via CheckPermissions(AdminLevel, NoEnvironment).

Frontend additions:
- frontend/src/api/users.ts: createUser(body) and
  deleteUser(username) hit POST /api/v1/users/{username}/{action}.
  createUser pre-validates the username against the same regex
  the SAML/OIDC sanitizer uses so we can't create users whose
  URL paths would later break encoding round-trips.
- frontend/src/features/users/UsersPage.tsx:
  - "+ Add user" button in the page header.
  - "Delete…" button per row, hidden on the current operator's
    own row (server still rejects but UX-level guard prevents
    the surprise).
  - CreateUserModal: username/password/email/fullname/admin
    fields, client-side regex + password-min-length validation,
    submits to createUser.
  - DeleteUserModal: confirmation step, surfaces server-side
    errors verbatim (e.g. "user X already exists" on edge
    cases).
  - Invalidates both 'users' and 'user-permissions' query caches
    on success so the row count refreshes immediately.

Tests: 106/106 frontend tests still pass; typecheck clean.
Bug surfaced when deleting bob-saml + test on the SPA /_app/users
page: the api returned "error removing user permissions" and
left orphaned rows in user_permissions.

Root cause: UserActionHandler's ActionRemove path called
h.Users.Delete(user.Username) FIRST, then
h.Users.DeleteAllPermissions(user.Username). But
pkg/users.DeleteAllPermissions has an Exists() guard at L410 that
returns "user %s does not exist" once the AdminUser row is gone.
First call deleted the user; second call bombed out; permission
rows leaked.

Reorder: permissions first, user row second. Both calls are
idempotent so the reordered version is also safe to retry.

Cleanup: 4 orphan rows for bob-saml in user_permissions on the
Kali database were hard-deleted out-of-band via psql; test had
no perms at delete time so no orphans.

Subtler fix candidate (deferred): drop the Exists() guard from
pkg/users.DeleteAllPermissions altogether. The function targets
permission rows, not the user row, so "no permissions for a
non-existent user" should be a no-op idempotency win, not an
error. Leaving the package-layer guard alone for now — the
handler-layer reorder is the minimal targeted fix.
Two related additions to the SPA /_app/users page:

(1) Reset password button per operator row. Opens a modal that
takes new password + confirm, posts to the existing legacy
UserActionHandler edit endpoint (POST /users/{u}/edit body
{password: "..."}) which routes to h.Users.ChangePassword. This
is the super-admin recovery path for "alice forgot her password —
please reset it." Users still self-change at /_app/profile with
old-password verification; this is the operator override.

(2) Added window.confirm() inside DeleteUserModal as a defense-in-
depth guard. Earlier in this session the delete confirmation modal
was reportedly failing to render (turned out to be a stale browser
bundle), but "user deleted by mistake" is non-recoverable so adding
a native browser-modal confirm catches any future regression where
the visual confirm step disappears. Cheap insurance.

API addition: adminResetUserPassword(username, newPassword) in
frontend/src/api/users.ts. No new backend route — reuses the
existing /users/{username}/edit action.

Typecheck clean. 106/106 frontend tests pass (no test changes).
User feedback: the belt-and-braces window.confirm() added in
8bee3c5 creates a redundant second confirmation. The visual
modal (red "Delete <username>" button, with the
"This will permanently remove..." copy) is sufficient.

Reverting the window.confirm() call back to a plain
mutation.mutate(). The earlier no-confirm bug that motivated the
window.confirm() turned out to be a stale browser-bundle cache,
not a logic issue — the modal itself works correctly.
Closes the auth/04 gap discovered during the Kali E2E test: the OIDC
callback finishes with a server-side 302 to "/", the SPA bootstraps
fresh, and isAuthenticated() returns false even though the auth
cookies (osctrl_token + osctrl_csrf) are present. The router then
bounces straight back to /login, making federated login look broken
from the user's perspective even though every server-side step
succeeded (callback returned 302 /, JIT-provisioned the row, set
both cookies correctly).

The password-login path never hit this because login() in
client.ts calls setCsrfToken() directly from the JSON response
body before navigation — there's no "fresh page load" between
"got the cookie" and "isAuthenticated() must return true."

== Fix ==

main.tsx now calls primeCsrfFromCookie() ONCE at boot, before the
router's beforeLoad guards run. The helper reads osctrl_csrf out of
document.cookie (it's intentionally NOT HttpOnly precisely so the
SPA can read it for X-CSRF-Token headers) and seeds
csrfTokenInMemory. isAuthenticated() then returns true on the OIDC-
flow's first render exactly the same way it does on the password-
flow's first render.

The osctrl_token cookie stays HttpOnly — we don't need or want JS
to read it. Its presence is inferred via osctrl_csrf using the
dual-cookie pattern the API has always used.

== Side-effect ==

logout() now also expires osctrl_csrf on the client. Without that,
the next page load would call primeCsrfFromCookie() again and the
user would auto-log-back-in. osctrl_token stays HttpOnly so the JS
can't clear it directly — the server-side Max-Age=tokenExp is the
authoritative TTL there.

== Tests added (4 subtests in src/api/client.test.ts) ==

  primeCsrfFromCookie does nothing when no cookie is present
  primeCsrfFromCookie seeds the in-memory CSRF token from the cookie
  primeCsrfFromCookie ignores empty cookie value
  primeCsrfFromCookie picks the right cookie among multiple

All 99 frontend subtests pass (was 95). tsc --noEmit clean.
Replaces the free-text "Node UUIDs (comma-separated)" input with a
searchable combobox of the env's enrolled nodes. Operators no
longer have to remember or copy-paste UUIDs to target a node — they
type the hostname (or part of the UUID) and pick from the
dropdown. Selected nodes appear as chips with friendly hostnames;
remove on click. Wire format stays UUID-based (the API filter
expects UUIDs); the picker is presentation only.

Implementation notes:

- Fetches the env's nodes via listNodes(pageSize=500). For tenants
  with more than 500 nodes the picker shows a representative slice;
  operators with that many should use the Platforms + Tags chips
  to target rather than handpick. Documented in a code comment.
- Dropdown filters by either UUID prefix or hostname substring,
  case-insensitive. Caps at 20 results visible — anything beyond is
  a filter-too-broad signal, not a meaningful target list.
- Selected UUIDs are deduplicated; the picker's filter excludes
  already-selected nodes so the dropdown shows only adds.
- Hostnames free-text input below stays for not-yet-enrolled hosts
  (the API filter accepts both lists). Operators selecting an
  enrolled node use the combobox; operators targeting a host that
  hasn't enrolled yet use the text field.

== Files ==

  frontend/src/components/forms/TargetSelector.tsx
    + listNodes import, nodeFilter state, addUuid/removeUuid
    * UUIDs section replaced with chip+combobox UI
    * uuidRaw state removed (the wire is now driven by chips)

== Verified ==

- npm run check + npm test (104 tests) green
Two findings from the Kali test:

1) The queries/new page uses its own TargetingPanel (not the shared
   TargetSelector that carves uses), so the node-combobox commit
   didn't take effect on /queries/new. This brings the same UX to
   it.

2) The combobox style was wrong UX. The user wanted to type inside
   the existing input box and see matches inline as they type — not
   a separate dropdown panel. Same field as before, with suggestions
   appearing below it only while focused and only when there's a
   non-empty trailing token after the last comma.

== Implementation ==

  TargetingPanel.tsx
    + listNodes import + nodes-for-target useQuery (pageSize=500)
    + TypeaheadInput helper component:
        * Same input shape, free-text + comma-separated still works
        * Filters allNodes by trailing token (post-last-comma)
        * Matches against BOTH uuid and hostname regardless of
          which field — typing "web" in the UUID field still finds
          web-01 (and vice-versa)
        * onMouseDown (not onClick) so picks fire before input blur
          unmounts the suggestion list
        * onBlur delayed 120ms so paste-and-tab still commits raw
          values the picker didn't surface
        * Caps at 8 suggestions visible — anything more is filter-
          too-broad signal
    * UUIDs section: TypeaheadInput searchKey='uuid' → commits
      picked UUIDs into value.uuids
    * Hostnames section: TypeaheadInput searchKey='hostname' →
      commits picked hostnames into value.hosts
    * Existing ChipList renders selected items below each input,
      unchanged

== Verified ==

- npm run check + npm test (104 tests) green
The Hostnames field added no capability on top of the Nodes typeahead
— the typeahead already matches both UUID and hostname, and commits
the picked node's UUID either way. Two inputs created confusion
about which field to type into; one is clearer.

Wire format unchanged: the picker still emits UUIDs into
value.uuids. value.hosts stays in the data shape (callers might
populate it via deep links) but the picker no longer surfaces it
as an input. If a real "raw hostname for not-yet-enrolled hosts"
need shows up later, an Advanced-disclosure panel is the right
place — not a parallel primary field.

== Files ==

  frontend/src/features/queries/components/TargetingPanel.tsx
    * Rename "Node UUIDs" → "Nodes"
    * Drop the Hostnames section + commitHosts/removeHost/hostRaw
      state
    * NodeChipList renders the picked UUIDs labeled by hostname
      so the operator sees readable names instead of hex prefixes
  frontend/src/components/forms/TargetSelector.tsx
    * Drop the Hostnames free-text section + parseList helper
    * Doc comment updated to reflect the three remaining sections

== Verified ==

- npm run check + npm test (104 tests) green
Replaces the standalone "filter input + floating dropdown panel +
chip pile above" with the same inline-typeahead-in-existing-input
pattern the queries page uses. As the operator types comma-
separated values, the trailing token (post-last-comma) matches
against the env's nodes; suggestions appear below the input and a
click commits the UUID + appends to the input. Chips render below.
Free-text paste still works via onBlur commit.

Same input, same field, no separate dropdown panel — matches the
queries/new behavior so carves/new stops looking different from
queries/new.

== Files ==

  frontend/src/components/forms/TargetSelector.tsx
    + uuidRaw state, lastToken memo, focused state
    + commitUuids / pickUuid / removeUuid that all keep uuidRaw and
      value.uuids in sync
    * Nodes section: inline input with inline suggestion list below
      it; chips render under the input rather than above

== Verified ==

- npm run check + npm test (104 tests) green
Restructures CarveRunPage to use the same grid as QueryRunPage:
form fields stack on the left (2/3 width on lg), Targeting panel
sticks on the right (1/3 width). Keeps the two functionally-sibling
pages visually parallel — same shell, same sticky-right pattern,
same TargetingPanel.

Switches from the standalone TargetSelector (single-column) to
TargetingPanel (the more compact, chip-forward version used on
queries/new). The TargetingPanel already has the inline-typeahead
node picker from the previous commits, so carves now gets that UX
for free.

== Files ==

  frontend/src/features/carves/CarveRunPage.tsx
    * Replaces TargetSelector with TargetingPanel
    * Outer scroll container wrapped in lg:grid-cols-3 grid
    * Path hero + samples + expiration sections stay in the left
      column (lg:col-span-2)
    * Targeting moved to a sticky aside on the right
      (lg:col-span-1, lg:sticky lg:top-4)
    * StickyFooter unchanged

== Verified ==

- npm run check + npm test (104 tests) green
Two UX tweaks to the Nodes typeahead so multi-select-via-typing
actually flows:

1) After picking a suggestion, append ", " to the input so the
   cursor is immediately ready for the next token. Previously a
   pick left the input at "uuid1" with no trailing separator,
   forcing the operator to manually type ", " before the typeahead
   would activate again on a fresh token.

2) Show suggestions when the input ends with a trailing comma
   (regardless of whether anything's been typed after it). After
   pick #1 the input is "uuid1, " — endsWithComma=true → the
   dropdown stays open showing top-8 unselected nodes, so the
   operator can keep clicking without re-focusing or typing.

With both: type "web", click "web-01" → input becomes
"web-01, " with suggestions list still open showing the next
unselected nodes. Click "web-02" → "web-01, web-02, " and so on.

The wire format is unchanged — commitUuids still parses via
parseList which drops the trailing empty segment.

== Files ==

  frontend/src/features/queries/components/TargetingPanel.tsx
    + endsWithComma regex check
    * filtered() shows top-8 unselected when trailing comma + no
      active token
    * onPick appends ", " to uuidRaw

== Verified ==

- npm run check + npm test (104 tests) green
Archive routes through the same DELETE endpoint with archive=true,
which the server gates on AdminLevel for the env. Operators without
env-admin (e.g. alice with only user/query/carve on dev) would get
a 403 if they clicked it — UI defense-in-depth says don't render
the button. Same canDeleteNode gate the Delete button already uses.

== Files ==

  frontend/src/features/nodes/NodeDetailPage.tsx
    * Archive button wrapped in {canDeleteNode && (...)}

== Verified ==

- npm run check + npm test (104 tests) green
The "Refresh" button was a disabled placeholder with a CSS tooltip
explaining that the API isn't implemented yet — operators flagged
it as confusing (looks broken). Better to hide the affordance
entirely until the endpoint lands than to show a greyed-out button
that needs a hover to reveal it's a stub.

When POST /nodes/{env}/refresh ships, restore the button as a live
control (and gate on the same canDeleteNode permission since
config-refresh is config-write).

== Files ==

  frontend/src/features/nodes/NodeDetailPage.tsx
    * Refresh button + tooltip removed; comment marker left so
      whoever wires the endpoint knows where to put it back

== Verified ==

- npm run check + npm test (104 tests) green
User asked "are there any duplicates in my activity?" and found
near-identical audit-log rows clustered in second-buckets. Root cause:
the SPA's AuditPage was hitting GET /audit-logs every ~1.5s instead
of every 10s, AND every GET wrote a "user X visited /audit-logs"
audit row of its own — i.e. the audit log was self-polluting at
~0.7 rows/sec idle.

Two-front fix:

1. Backend (cmd/api/handlers/audit.go): drop the AuditLog.Visit
   call from AuditLogsHandler. Other GET handlers in this codebase
   log visits for SoC traceability, but for the audit-log endpoint
   the cost-benefit flips — every page open creates rows that
   drown out the state-changing events the table actually exists
   to record. Logging "alice viewed her own activity 50 times"
   is loud noise, not signal.

2. Frontend (AuditPage.tsx): memoize the apiQuery object so its
   reference is stable across renders, and bump staleTime from 10s
   to 30s. TanStack Query 5 hashes queryKey content structurally
   so reference equality shouldn't matter in theory, but in
   practice a parent re-render that rebuilds this object can
   interact badly with React StrictMode double-invoke and trigger
   refetch storms. Belt-and-braces.

Database evidence: 127,387 rows total, of which ~73k were
/audit-logs visits in the last 2 hours alone (>half the entire
table). With this fix the table will stop growing during normal
audit-log viewing.
@javuto javuto self-requested a review May 19, 2026 19:02
@javuto javuto added ⭐️ frontend Frontend related issues osctrl-api osctrl-api related changes osctrl-admin osctrl-admin related changes 🔐 security Security related issues labels May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⭐️ frontend Frontend related issues osctrl-admin osctrl-admin related changes osctrl-api osctrl-api related changes 🔐 security Security related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants