Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 59 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -413,6 +413,65 @@ for that specific tag for the per-commit details.
exists, format-conforms to `<64-hex> <path>`, and excludes itself.
Full suite: 3672 tests / 0 failures / 0 errors.

- **Production-readiness PR 4 of 5 — observability.** Closes the missing-MDC,
hot-path-health-probe, MCP-error-leak, and structured-logging gaps.
- **`RequestIdFilter` (new).** Populates SLF4J `MDC.request_id` FIRST in
the security chain so every downstream filter, controller, MCP tool,
and exception handler sees the same correlation ID. Strict allow-list
on inbound `X-Request-Id` (8–64 hex/dash/underscore chars) prevents
log-forging; bad inputs are replaced with a generated UUID. Echoes
the ID back to the client in the `X-Request-Id` response header. MDC
is cleared in `finally` to prevent leak across pooled threads (both
Tomcat platform and virtual-thread carriers). Pre-PR-4 every
`MDC.get("request_id")` call returned null; the four downstream
consumers (BearerAuthFilter, RateLimitFilter, GraphController,
GlobalExceptionHandler) all generated synthetic UUIDs that never
correlated.
- **JSON-structured logging** (`logback-spring.xml`). Serving profile
switches the encoder from `%msg%n` plaintext to LogstashEncoder
(`logstash-logback-encoder` 9.0 — MIT). One JSON event per log line
with `ts`, `level`, `logger`, `thread`, `msg`, `stack`, all MDC
entries (`request_id`), and a static `application: codeiq` field for
multi-pod ingestion. Indexing/CLI profiles keep plaintext to avoid
JSON noise leaking into `codeiq index` output.
- **`GraphHealthIndicator` 30s TTL cache.** Pre-PR-4 every readiness
probe (k8s default ~1Hz) ran a `MATCH (n) RETURN count(n)` Cypher
query — wakes the page cache, competes with API traffic.
`AtomicReference<CachedHealth>` lock-free cache absorbs the flood;
one underlying probe per 30s regardless of caller concurrency.
Error response sanitized too: pre-PR-4 the `error` detail
surfaced `e.getMessage()` (CodeQL `java/error-message-exposure`,
permitAll endpoint = anonymous probers). Now only an `error_class`
indicator; full stack is logged at WARN.
- **Liveness/readiness groups** (`application.yml`). Pre-PR-4
`GraphHealthIndicator` contributed to BOTH probes — a graph-down
event would flap the pod (k8s killing it) instead of just routing
away. Pinned to readiness only:
`liveness: livenessState`, `readiness: readinessState,
graphHealthIndicator`.
- **Prometheus metrics** (`/actuator/prometheus`). Added
`micrometer-registry-prometheus` dep. Exposed under the bearer-
authenticated `/actuator/**` rule (NOT permitAll — full metrics
tree is reconnaissance data). Application tag `codeiq` for
multi-pod scraping. Step interval 10s.
- **Structured MCP error envelope.** Pre-PR-4 every MCP tool catch
block returned `toJson(Map.of("error", e.getMessage()))` — flat
string, no correlation. Refactored to a centralized
`errorEnvelope(code, e)` helper that returns
`{code, message, request_id, error}` (legacy `error` field
preserved for backwards-compat). Codes assigned per failure
category: `INTERNAL_ERROR`, `INVALID_INPUT`, `FILE_READ_FAILED`,
`SERIALIZATION_FAILED`. Full exception logged server-side with
request_id; only sanitized envelope reaches the client. `readFile`
no longer concatenates `e.getMessage()` into a string (CWE-209).
- **Tests:** new `RequestIdFilterTest` (7 cases — UUID generation,
header pass-through, control-char rejection, length bounds, MDC
clear-in-finally including throw path). `GraphHealthIndicatorTest`
extended with cache-hit assertion (3 calls → 1 underlying
`count()`) and updated for sanitized error fields.
`McpToolsTest#readFileShouldHandleMissingFile` updated for new
envelope contract. Full suite: 3680 tests / 0 failures / 0 errors.

## [0.1.0] - 2026-03-28

First general-availability cut. See the
Expand Down
7 changes: 7 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -452,6 +452,13 @@ bean for code paths that haven't been ported yet.
- **`serve.sh` and `serve.bat` MUST NOT contain network calls.** Audit RAN-46 §3 — air-gapped deploy model. Pre-PR-3 these scripts had `curl -fL https://repo1.maven.org/...` to download the CLI JAR on first run; that's gone. Receivers must `--include-jar` when bundling or stage the JAR from an internal mirror. There's a regression test in `BundleCommandTest#bundleCreatesZipWithCorrectStructure` that asserts `serve.sh` contains neither `curl` nor `maven.org` — keep that test green.
- **`.dockerignore` does NOT inherit `.gitignore`.** Docker resolves COPY against the build context, which includes uncommitted/untracked working-tree files. `.gitignore` only stops things being staged; it has no effect on what `docker build` sees. Mirror the secret-pattern globs explicitly in `.dockerignore` (`.env*`, `*.jks`, `id_rsa`, `credentials.{json,yaml}`, etc.). Pre-PR-3 the `.dockerignore` was 9 lines and would have shipped a `.env.prod` straight into a published image.
- **Semgrep is pinned to `semgrep==1.161.0`** in `.github/workflows/security.yml`. Bumps go through Dependabot's pip ecosystem on a documented cadence — `pip install --upgrade semgrep` (floating) was previously flagged by Scorecard `Pinned-Dependencies`. Don't unpin to "always get latest"; a CI-time auto-bump on a security-scanner can break the build silently when the new release adds rules.
- **`RequestIdFilter` MUST stay outermost in the security chain.** Filter chain order (set in `SecurityConfig#servingFilterChain`): RequestIdFilter → SecurityHeadersFilter → RateLimitFilter → BearerAuthFilter. If you reorder and put RequestIdFilter inside, the rate-limit reject and auth-reject log lines will fire BEFORE MDC is populated, and clients won't get a correlated `X-Request-Id` response header on 401/429. The filter is registered LAST in the `addFilterBefore(..., UPAFilter.class)` sequence because each `addFilterBefore` pushes the previously-inserted filter further from the target — so the LAST registration is the OUTERMOST in the actual chain.
- **Inbound `X-Request-Id` is allow-list validated, NOT escaped.** `RequestIdFilter` accepts only `[A-Za-z0-9_-]{8,64}`; anything else is replaced with a generated UUID. Don't try to "be helpful" by lowercasing/trimming — the allow-list is the defense, not the sanitizer. CWE-117 log-forging via `X-Request-Id: \nINFO: granted access` is impossible because the value never reaches a logger before validation.
- **MDC must be cleared in finally — even on throw.** `RequestIdFilter.doFilterInternal` puts MDC.request_id, runs the chain in a try-block, and clears in finally. There's a regression test (`RequestIdFilterTest#clearsMdcEvenWhenChainThrows`) that asserts the clear runs when the chain throws. Tomcat's NIO worker pool and virtual-thread carriers both reuse threads; a leaked MDC entry from request N is visible to request N+1 if you skip the finally.
- **`GraphHealthIndicator` is on the readiness probe ONLY.** Configured via `management.endpoint.health.group.{liveness,readiness}` in `application.yml`. Liveness = "JVM up; restart on fail". Readiness = "graph loaded; route traffic only when up". Putting `graphHealthIndicator` on the liveness probe means a graph-down event flaps the pod (k8s killing it) instead of just routing away. If you add another HealthIndicator that's about app health (not infra dependencies), it goes on liveness; if it's about a specific dependency (DB, message bus), it goes on readiness.
- **Logback `serving` profile uses JSON, others use plaintext.** `logback-spring.xml` defines a `JSON` appender (LogstashEncoder) used only by the `<springProfile name="serving">` block. CLI / indexing / default profiles use the `CONSOLE` (plaintext `%msg%n`) appender so that `codeiq index` doesn't dump JSON-shaped log lines into the user's terminal. Don't switch the global default appender to JSON — the CLI is interactive and the project deliberately uses `System.out` for status messages alongside the logger.
- **MCP errors return a structured envelope `{code, message, request_id, error}`.** `McpTools#errorEnvelope(code, exception)` is the canonical wrapper. Codes: `INTERNAL_ERROR` (catch-all), `INVALID_INPUT` (IllegalArgumentException), `FILE_READ_FAILED` (file IO), `SERIALIZATION_FAILED` (Jackson). The legacy `error` field is preserved for backwards-compat with older MCP clients reading `error` directly — DON'T remove it without grepping for clients in this org first. Full exception always logged server-side at WARN with request_id; only sanitized envelope reaches the client.
- **`/actuator/prometheus` is bearer-authenticated, NOT permitAll.** It matches the `/actuator/**` rule in `SecurityConfig`. Don't add `prometheus` to the permitAll list "for the scraper" — Prometheus's `bearer_token_file` config field exists for exactly this case. Anonymous metric scraping is reconnaissance data (request rates, error counts, JVM internals). The application tag `codeiq` is set via `management.metrics.tags.application` so multi-pod ingestion is filterable.

## Supply-chain observability (OpenSSF)

Expand Down
18 changes: 18 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,24 @@
<version>8.18.0</version>
</dependency>

<!-- Logstash JSON encoder (MIT). Drops a structured JSON line per log
event with timestamp, level, logger, thread, message, MDC entries
(request_id, etc.), and optional stack trace. Used in the serving
profile only — the indexing/CLI profiles keep the human-readable
%msg%n encoder. ~250 KB. -->
<dependency>
<groupId>net.logstash.logback</groupId>
<artifactId>logstash-logback-encoder</artifactId>
<version>9.0</version>
</dependency>

<!-- Micrometer Prometheus registry. Exposed at /actuator/prometheus
behind bearer auth. Spring Boot manages the version via its BOM. -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- Neo4j Embedded (Community Edition) -->
<dependency>
<groupId>org.neo4j</groupId>
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
package io.github.randomcodespace.iq.config.security;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.slf4j.MDC;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;
import java.util.UUID;
import java.util.regex.Pattern;

/**
* First filter in the {@code serving} chain — populates {@code request_id} in
* the SLF4J MDC for every request so downstream code (filters, controllers,
* MCP tools, exception handlers) can read it via {@link MDC#get(String)} and
* the JSON log encoder includes it on every emitted line.
*
* <p><b>ID source priority:</b>
* <ol>
* <li>Inbound {@code X-Request-Id} header, if it matches a strict UUID-or-
* hex/dash pattern (defense against header injection — see CWE-117).
* This lets upstream load balancers / API gateways propagate trace IDs.</li>
* <li>Generated {@link UUID#randomUUID()} otherwise.</li>
* </ol>
*
* <p>The same value is echoed back to the client in the {@code X-Request-Id}
* response header so a caller seeing a 401/429/500 can quote it in support
* channels and operators can grep their JSON logs for it.
*
* <p><b>MDC discipline.</b> The MDC is request-scoped via SLF4J's
* {@code ThreadLocal} — the {@code finally} block clears it so the next
* request on the same (virtual or platform) thread doesn't inherit a stale
* ID. Without this, a thread-pool reuse leaks IDs across requests.
*
* <p><b>Filter chain position.</b> Registered FIRST in
* {@code SecurityConfig#servingFilterChain} (before SecurityHeadersFilter,
* RateLimitFilter, BearerAuthFilter) so the rate-limiter and auth-rejection
* log lines all include the same {@code request_id} that the client receives
* back in the response. If you reorder filters, this MUST stay outermost.
*/
@Component
@Profile("serving")
public class RequestIdFilter extends OncePerRequestFilter {

static final String HEADER = "X-Request-Id";
static final String MDC_KEY = "request_id";

/**
* Strict allow-list for inbound {@code X-Request-Id} header values.
* Hex digits, dashes, underscores, and length 8–64 — accommodates UUIDs
* (with or without dashes), Stripe-style {@code req_*} (after stripping
* the prefix in upstream gateways), and short hex correlation IDs.
* Anything else gets replaced with a generated UUID — log forging via
* {@code X-Request-Id: \nINFO: granted access} is impossible.
*/
private static final Pattern VALID_REQUEST_ID = Pattern.compile("^[A-Za-z0-9_-]{8,64}$");

@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response,
FilterChain chain) throws ServletException, IOException {
String inbound = request.getHeader(HEADER);
String requestId = (inbound != null && VALID_REQUEST_ID.matcher(inbound).matches())
? inbound
: UUID.randomUUID().toString();
MDC.put(MDC_KEY, requestId);
response.setHeader(HEADER, requestId);
try {
chain.doFilter(request, response);
} finally {
// Always clear — virtual-thread carriers and Tomcat platform
// threads both pool, so a leaked MDC entry from request N is
// visible to request N+1 if we skip this.
MDC.remove(MDC_KEY);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,8 @@ public SecurityFilterChain servingFilterChain(
HttpSecurity http,
BearerAuthFilter bearerAuthFilter,
SecurityHeadersFilter securityHeadersFilter,
RateLimitFilter rateLimitFilter) throws Exception {
RateLimitFilter rateLimitFilter,
RequestIdFilter requestIdFilter) throws Exception {
http
// CSRF is suppressed for ALL paths via ignoringRequestMatchers("/**")
// (functionally equivalent to .csrf().disable() but avoids the literal
Expand Down Expand Up @@ -69,16 +70,22 @@ public SecurityFilterChain servingFilterChain(
.requestMatchers("/api/**", "/mcp/**", "/actuator/**").authenticated()
.anyRequest().denyAll())
// Filter chain order (outermost → innermost):
// 1. SecurityHeadersFilter — adds defensive response headers always.
// 2. RateLimitFilter — 429 before any auth or DB work; throttles
// 1. RequestIdFilter — populates MDC.request_id FIRST so every
// downstream log line (rate-limit reject,
// auth-reject, error envelope) carries the
// same ID and the client gets it back in
// X-Request-Id.
// 2. SecurityHeadersFilter — adds defensive response headers always.
// 3. RateLimitFilter — 429 before any auth or DB work; throttles
// unauthenticated brute-force too.
// 3. BearerAuthFilter — token validation; 401 if missing/wrong.
// 4. BearerAuthFilter — token validation; 401 if missing/wrong.
// Each addFilterBefore(X, UsernamePasswordAuthenticationFilter.class) inserts
// X immediately before UPAFilter, pushing the previously-inserted filter farther
// from the target — so the registration order here IS the chain order.
.addFilterBefore(securityHeadersFilter, UsernamePasswordAuthenticationFilter.class)
.addFilterBefore(rateLimitFilter, UsernamePasswordAuthenticationFilter.class)
.addFilterBefore(bearerAuthFilter, UsernamePasswordAuthenticationFilter.class)
.addFilterBefore(requestIdFilter, UsernamePasswordAuthenticationFilter.class)
.formLogin(AbstractHttpConfigurer::disable)
.httpBasic(AbstractHttpConfigurer::disable)
.anonymous(AbstractHttpConfigurer::disable);
Expand Down
Loading
Loading