Add beta Bazel JVM manifest support#1312

Open
Simon (simonhj) wants to merge 11 commits into v1.x from simon/bazel-manifest-cli

Simon (simonhj) commented May 13, 2026

Summary

Adds beta Bazel JVM SBOM support to Socket CLI.

Bazel is multi-language, but this PR starts with Bazel + Maven: many Bazel JVM repos declare their Maven dependencies through rules_jvm_external in MODULE.bazel or WORKSPACE rather than committing a manifest that Socket can already scan. The extractor asks Bazel which Maven artifacts it resolved, converts them into a maven_install.json-shaped manifest, and sends that manifest through the existing scan pipeline.
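
The normalization step can be illustrated with a small sketch. It assumes the extractor has already collected rules_jvm_external-style coordinate strings (group:artifact[:packaging[:classifier]]:version). The function names and the trimmed-down output shape (an artifacts map keyed by group:artifact plus a version marker) are hypothetical, loosely modeled on the maven_install.json lockfile, which in practice also records checksums, dependency edges, and repositories.

```typescript
// Hypothetical, trimmed-down manifest shape. A real maven_install.json
// also carries checksums, dependency edges, and repository URLs.
export interface MavenManifest {
  version: string
  artifacts: Record<string, { version: string }>
}

export interface Coordinate {
  group: string
  artifact: string
  version: string
  packaging?: string
  classifier?: string
}

// Accepts the three rules_jvm_external coordinate layouts:
//   group:artifact:version
//   group:artifact:packaging:version
//   group:artifact:packaging:classifier:version
export function parseCoordinate(raw: string): Coordinate | undefined {
  const parts = raw.trim().split(':')
  if (parts.length < 3 || parts.length > 5 || parts.some(p => p === '')) {
    return undefined
  }
  const [group, artifact] = parts as [string, string]
  const version = parts[parts.length - 1] as string
  const coord: Coordinate = { group, artifact, version }
  if (parts.length >= 4) coord.packaging = parts[2]
  if (parts.length === 5) coord.classifier = parts[3]
  return coord
}

export function toManifest(coords: Iterable<string>): MavenManifest {
  const artifacts: Record<string, { version: string }> = {}
  for (const raw of coords) {
    const c = parseCoordinate(raw)
    // Skip malformed entries instead of failing the whole scan.
    if (!c) continue
    // Classified artifacts (e.g. sources jars) get their own key so they
    // do not overwrite the main artifact's entry.
    const key = c.classifier
      ? `${c.group}:${c.artifact}:${c.packaging}:${c.classifier}`
      : `${c.group}:${c.artifact}`
    artifacts[key] = { version: c.version }
  }
  // Sort keys so repeated runs produce byte-identical output.
  const sorted = Object.keys(artifacts)
    .sort()
    .map(k => [k, artifacts[k]!] as const)
  // Assumed lockfile format marker.
  return { version: '2', artifacts: Object.fromEntries(sorted) }
}
```

Sorting the keys is a deliberate choice: a deterministic manifest makes diffs between scans meaningful and keeps generated files stable in CI caches.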

What changed

  • Adds socket manifest bazel [beta], a generation-only command for producing Bazel JVM SBOM manifests.
  • Extends socket scan create --auto-manifest so Bazel workspaces are detected automatically and scanned through the normal scan-create flow.
  • Detects Bazel workspaces via MODULE.bazel, WORKSPACE, or WORKSPACE.bazel.
  • Supports Bzlmod and legacy WORKSPACE invocation modes.
  • Discovers Maven repos from Bazel-visible repositories and workspace metadata, including custom repo names other than @maven.
  • Parses jvm_import and aar_import rules from bazel query --output=build.
  • Uses unsorted_deps.json when available as a faster structured source.
  • Writes auto-manifest output under .socket-auto-manifest/maven_install.json so we do not overwrite a repo's checked-in maven_install.json.
  • Adds docs, changelog entry, socket.json defaults, fixtures, and test coverage.
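
The workspace detection described above (MODULE.bazel, WORKSPACE, or WORKSPACE.bazel, plus the Bzlmod vs legacy split) can be sketched as follows. This is an illustration, not the PR's code: detectBazelWorkspace is a hypothetical name, it only inspects the given directory without walking up, and treating MODULE.bazel as Bzlmod even when a WORKSPACE file is also present is an assumed policy for repos that are mid-migration.

```typescript
import { existsSync } from 'node:fs'
import path from 'node:path'

export type BazelMode = 'bzlmod' | 'workspace'

export interface BazelWorkspace {
  root: string
  mode: BazelMode
  // The marker file that triggered detection.
  marker: string
}

// Checked in order. Putting MODULE.bazel first (Bzlmod wins when a repo
// has both files) is an assumed policy; WORKSPACE.bazel before WORKSPACE
// matches Bazel's own preference when both exist.
const MARKERS: ReadonlyArray<readonly [string, BazelMode]> = [
  ['MODULE.bazel', 'bzlmod'],
  ['WORKSPACE.bazel', 'workspace'],
  ['WORKSPACE', 'workspace'],
]

export function detectBazelWorkspace(dir: string): BazelWorkspace | undefined {
  const root = path.resolve(dir)
  for (const [marker, mode] of MARKERS) {
    if (existsSync(path.join(root, marker))) {
      return { root, mode, marker }
    }
  }
  return undefined
}
```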

User flow

Generate only:

socket manifest bazel .

Generate and upload in one step:

socket scan create --auto-manifest .

Testing

Tested with unit and integration coverage for:

  • Bazel binary resolution.
  • Bzlmod vs legacy WORKSPACE detection.
  • Bazel query argv construction.
  • Maven repo discovery, including custom repo names and multi-repo cases.
  • jvm_import / aar_import parsing.
  • unsorted_deps.json parsing.
  • Normalization into maven_install.json.
  • Auto-manifest dispatch and .socket-auto-manifest/ output behavior.

Also tested against a corpus of Bazel repositories covering constructed and real-world Maven extraction cases, Bzlmod, legacy WORKSPACE, custom repo names, pinned and unpinned lockfile flows, and scan-create auto-manifest behavior.

Local checks:

git diff --check origin/v1.x
pnpm check:tsc
pnpm exec eslint --report-unused-disable-directives eslint.config.js src/commands/fix/coana-fix.mts src/commands/manifest src/commands/scan/handle-create-new-scan.mts src/commands/scan/handle-create-new-scan.test.mts src/utils/socket-json.mts vitest.config.mts
pnpm exec vitest run src/commands/manifest/bazel/bazel-bin-detect.test.mts src/commands/manifest/bazel/bazel-build-parser.test.mts src/commands/manifest/bazel/bazel-java-shim.test.mts src/commands/manifest/bazel/bazel-output-base-check.test.mts src/commands/manifest/bazel/bazel-python-shim.test.mts src/commands/manifest/bazel/bazel-query-runner.test.mts src/commands/manifest/bazel/bazel-repo-discovery.test.mts src/commands/manifest/bazel/bazel-workspace-detect.test.mts src/commands/manifest/bazel/cmd-manifest-bazel.test.mts src/commands/manifest/bazel/extract_bazel_to_maven.test.mts src/commands/manifest/bazel/generate_auto_manifest.bazel.constructed.test.mts src/commands/manifest/detect-manifest-actions.test.mts src/commands/manifest/generate_auto_manifest.test.mts src/commands/manifest/cmd-manifest.test.mts src/commands/scan/handle-create-new-scan.test.mts

Note

Medium Risk
Adds a new Bazel-based manifest generation pipeline that shells out to bazel/bazelisk, reads workspace outputs, and feeds generated files into scan create target discovery; failures or environment differences (Java/Python/Bazel setup, permissions, timeouts) could impact scan creation behavior.

Overview
Adds beta Bazel JVM SBOM support via a new socket manifest bazel subcommand that discovers rules_jvm_external Maven repos (Bzlmod and legacy WORKSPACE), runs Bazel queries / reads unsorted_deps.json, and normalizes the results into a maven_install.json-shaped manifest.
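
Several pieces of this overview meet where the Bazel command line is assembled: the optional --bazel-output-base value, the user's --bazel-flags, and the query itself. One detail worth encoding is that --output_base is a Bazel startup option, so it must appear before the query command, while query options such as --output=build come after it. The sketch below is hypothetical: the QueryOptions fields, the kind(...) expression, and the naive whitespace splitting of --bazel-flags are all assumptions, shown only to illustrate that ordering.

```typescript
export interface QueryOptions {
  // rules_jvm_external repo to query, e.g. 'maven' or a custom name.
  repoName: string
  // Value of --bazel-output-base, if the user passed one.
  outputBase?: string
  // Value of --bazel-flags: one string of space-separated flags.
  extraFlags?: string
}

export function buildQueryArgv(opts: QueryOptions): string[] {
  const argv: string[] = []
  // --output_base is a Bazel startup option: it must precede the command.
  if (opts.outputBase) {
    argv.push(`--output_base=${opts.outputBase}`)
  }
  argv.push('query', '--output=build')
  if (opts.extraFlags) {
    // Assumption: forwarded flags are command options, not startup options.
    // Naive whitespace split; quoted values containing spaces are not handled.
    argv.push(...opts.extraFlags.trim().split(/\s+/).filter(Boolean))
  }
  // Hypothetical query expression. The argv array is passed to spawn()
  // without a shell, so the quotes and parentheses need no escaping.
  argv.push(`kind("jvm_import|aar_import", @${opts.repoName}//...)`)
  return argv
}
```

Returning an argv array rather than a command string is what makes the custom repo names safe to interpolate: nothing is ever re-parsed by a shell.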

Extends socket scan create --auto-manifest to detect Bazel workspaces (MODULE.bazel, WORKSPACE, WORKSPACE.bazel), generate the Bazel manifest into a sidecar directory, and include those generated files in subsequent scan file discovery.

Introduces Bazel-specific hardening and plumbing (bazel binary resolution, Java/Python prerequisites, --output_base validation, bounded parsers/DoS guards), plus new socket.json defaults, docs/changelog updates, fixtures, and broad unit/integration test coverage.
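
For the --output_base validation mentioned above, one plausible shape (consistent with the mkdirSync excerpt in the Bugbot findings, but otherwise an assumption) is: resolve the path, create it if missing, and confirm it is a writable directory before handing it to Bazel. validateOutputBase is a hypothetical name, and InputError here is a local stand-in for the CLI's own error class.

```typescript
import { accessSync, constants, mkdirSync, statSync } from 'node:fs'
import path from 'node:path'

// Local stand-in for the CLI's user-facing error class.
export class InputError extends Error {
  override name = 'InputError'
}

// Hypothetical validator for --bazel-output-base. Returns an absolute path
// to an existing, writable directory, or throws InputError.
export function validateOutputBase(dir: string): string {
  const resolved = path.resolve(dir)
  try {
    // Creates missing parents; a no-op if the directory already exists.
    mkdirSync(resolved, { recursive: true })
    if (!statSync(resolved).isDirectory()) {
      throw new Error(`${resolved} is not a directory`)
    }
    // Bazel keeps its server state and fetched external repos here, so a
    // non-writable location would likely fail later, mid-query.
    accessSync(resolved, constants.W_OK)
  } catch (cause) {
    throw new InputError(
      `--bazel-output-base is not a usable directory: ${resolved}`,
      { cause },
    )
  }
  return resolved
}
```

Failing fast here turns a confusing multi-minute Bazel failure into an immediate, actionable flag error.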

Reviewed by Cursor Bugbot for commit 28ad5cb.

Martin Torp (mtorp) left a comment

Looks good 👏

Comment on lines +19 to +25
const FIXTURE = path.join(
  process.env['HOME'] as string,
  'src',
  'bazel-bench',
  'constructed',
  'java-maven',
)

Looks like this may be a reference to a local path. Is that intentional?

Good catch. All three test files referencing ~/src/bazel-bench have been removed — they depended on a local fixture not available to open-source users.

Comment on src/commands/manifest/README.md (outdated)
- `--bazel-flags <str>` — flags forwarded to every bazel invocation (single quoted string).
- `--bazel-output-base <dir>` — Bazel `--output_base` for read-only-cache CI environments.
- `--out <dir>` — output directory; default `./.socket/bazel-manifests/`.
- `--dry-run`, `--verbose`, `--json`, `--markdown` — standard diagnostic flags.

Claude says --json and --markdown aren't implemented for this subcommand. They're probably not necessary, so I suggest removing them from this README.

Removed --json and --markdown from the options list.

Comment on lines +119 to +122
const output = await spawn(opts.bin, argv, {
  cwd: opts.cwd,
  ...(opts.env ? { env: opts.env } : {}),
})

How long do Bazel queries typically take to execute? Would it make sense to add a generous default timeout to this operation, in case it hangs in a CI run for whatever reason?

Added a BAZEL_QUERY_TIMEOUT_MS = 600_000 (10 min) constant and passed it to both spawn() calls. Bazel cold-cache starts can take a few minutes; 10 min should be generous without letting a hung server block CI indefinitely.
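
A minimal sketch of the bounded-spawn idea. The PR's runner uses an async spawn wrapper; this version uses Node's synchronous child_process.spawnSync purely to stay short, and runWithTimeout and RunResult are hypothetical names. It relies on the real timeout option, which kills the child (SIGTERM by default) and reports an ETIMEDOUT error once the limit is reached.

```typescript
import { spawnSync } from 'node:child_process'

// Mirrors the constant described above: 10 minutes.
export const BAZEL_QUERY_TIMEOUT_MS = 600_000

export interface RunResult {
  stdout: string
  exitCode: number | null
  timedOut: boolean
}

// Runs `bin argv...` with a hard wall-clock limit. When the limit is hit,
// Node kills the child and reports ETIMEDOUT instead of waiting forever
// on a hung process.
export function runWithTimeout(
  bin: string,
  argv: string[],
  cwd: string,
  timeoutMs: number = BAZEL_QUERY_TIMEOUT_MS,
): RunResult {
  const result = spawnSync(bin, argv, {
    cwd,
    encoding: 'utf8',
    timeout: timeoutMs,
    // Query output for large repos can exceed the default 1 MiB buffer.
    maxBuffer: 256 * 1024 * 1024,
  })
  const code = (result.error as NodeJS.ErrnoException | undefined)?.code
  return {
    stdout: result.stdout ?? '',
    // null when the child was killed by a signal (including on timeout).
    exitCode: result.status,
    timedOut: code === 'ETIMEDOUT',
  }
}
```

Reporting timedOut separately from exitCode lets the caller print a specific "Bazel query timed out after N minutes" message rather than a generic non-zero-exit error.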

function extractAttr(body: string, attr: string): string | undefined {
  // Match `<attr> = "VALUE"`; quoted-string attrs only.
  // Quoted value capped at 4 KiB; canonical Maven URLs are ~150 bytes.
  const re = new RegExp(`\\b${attr}\\s*=\\s*"([^"\\n]{0,4096})"`)

I suggest moving the regex construction out of the function body to avoid having to recreate it at every call.

Done. Added ATTR_RE_CACHE and TAG_RE_CACHE Maps at module level; both extractAttr and extractTagValue now check the cache before constructing a new regex.
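
A sketch of the cached-regex approach described in the reply, paired with a hypothetical rule-block splitter so the cache is exercised across several blocks. ATTR_RE_CACHE and extractAttr reuse the names and pattern from the excerpt above, but the bodies are illustrative. The splitter's rule (each rule opens with jvm_import( or aar_import( on its own line and closes with a lone ")") is an assumption about bazel query --output=build output, not a documented guarantee.

```typescript
// Module-level cache: one compiled RegExp per attribute name, reused for
// every rule block. Attribute names come from code, not user input, so
// they are interpolated without escaping.
export const ATTR_RE_CACHE = new Map<string, RegExp>()

function attrRegex(attr: string): RegExp {
  let re = ATTR_RE_CACHE.get(attr)
  if (re === undefined) {
    // Same pattern as the excerpt: quoted value capped at 4 KiB.
    // No `g` flag, so exec() keeps no lastIndex state between calls.
    re = new RegExp(`\\b${attr}\\s*=\\s*"([^"\\n]{0,4096})"`)
    ATTR_RE_CACHE.set(attr, re)
  }
  return re
}

export function extractAttr(body: string, attr: string): string | undefined {
  return attrRegex(attr).exec(body)?.[1]
}

// Hypothetical splitter. Assumes each rule opens with `jvm_import(` or
// `aar_import(` on its own line and closes with a line containing only `)`.
export function* ruleBlocks(output: string): Generator<string> {
  let current: string[] | undefined
  for (const line of output.split('\n')) {
    if (current === undefined) {
      if (/^(?:jvm_import|aar_import)\($/.test(line)) current = [line]
      continue
    }
    current.push(line)
    if (line === ')') {
      yield current.join('\n')
      current = undefined
    }
  }
}
```

Because the cached expressions are compiled without the g flag, sharing one RegExp object across calls is safe; a global regex would carry lastIndex from one exec() into the next and silently skip matches.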

- Remove three integration test files that depend on ~/src/bazel-bench,
  a local fixture not available to open-source users
- Drop --json and --markdown from the manifest bazel README options
  since those flags are not implemented for this subcommand
- Add a 10-minute timeout to bazel spawn calls to prevent CI hangs
  on cold-cache or stalled bazel server invocations
- Cache per-attr and per-tag-key regexes at module level in
  bazel-build-parser to avoid recompiling on every rule block

Simon (simonhj) marked this pull request as ready for review May 16, 2026 12:34

Cursor Bugbot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.

Comment @cursor review or bugbot run to trigger another review on this PR

  mkdirSync(resolved, { recursive: true })
} catch (e) {
  throw new InputError(
    `--bazel-output-base could not be created at ${resolved}: ${(e as Error).message}`,

Ad-hoc error cast instead of error helpers

Low Severity

(e as Error).message is used in a catch block to extract the error message. The repo convention requires using helpers from @socketsecurity/lib/errors (e.g., errorMessage(e)) or getErrorCause(e) from utils/errors.mts instead of ad-hoc error type casts.

Triggered by learned rule: Use @socketsecurity/lib/errors helpers — no ad-hoc error type checks
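
Why the cast is more than a style nit: JavaScript code can throw any value, and (e as Error).message silently evaluates to undefined when the thrown value is not an Error. The sketch below uses a hypothetical local describeError purely to show the difference; the fix Bugbot asks for is to call the shared helper from @socketsecurity/lib/errors, not to re-implement it.

```typescript
// Hypothetical local helper, for illustration only. In this repo the
// convention is to call the shared helper from @socketsecurity/lib/errors.
export function describeError(e: unknown): string {
  if (e instanceof Error) return e.message
  if (typeof e === 'string') return e
  try {
    // Plain objects (e.g. { code: 'E1' }) become readable JSON;
    // undefined, functions, and symbols fall back to String().
    return JSON.stringify(e) ?? String(e)
  } catch {
    // JSON.stringify throws on BigInt values and circular structures.
    return String(e)
  }
}
```

For example, if a dependency does throw 'EACCES: permission denied', the cast in the flagged line produces a message ending in ": undefined", while describeError returns the original string.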

// Always surface the error message; users should not have to
// re-run a multi-minute bazel build with --verbose just to see whether
// the failure was a missing dependency, permission error, or network blip.
const msg = e instanceof Error ? e.message : String(e)

Ad-hoc instanceof Error ternary for error message

Low Severity

e instanceof Error ? e.message : String(e) is used in the top-level catch block. The repo convention requires using errorMessage(e) from @socketsecurity/lib/errors or getErrorCause(e) from utils/errors.mts instead of ad-hoc instanceof Error ternaries extracting .message.

Triggered by learned rule: Use @socketsecurity/lib/errors helpers — no ad-hoc error type checks

if (verbose) {
  logger.log(
    `[VERBOSE] discovery: probe @${repoName}: REJECT (probe threw):`,
    e instanceof Error ? e.message : String(e),

Ad-hoc instanceof Error ternary in verbose logging

Low Severity

e instanceof Error ? e.message : String(e) is used in the validateMavenRepo catch block. The repo convention requires using errorMessage(e) from @socketsecurity/lib/errors or getErrorCause(e) from utils/errors.mts instead of ad-hoc instanceof Error ternaries.

Triggered by learned rule: Use @socketsecurity/lib/errors helpers — no ad-hoc error type checks

} catch {
  // Ignore errors.
}
rmSync(tmp, { recursive: true, force: true })

Raw rmSync used instead of safeDelete in tests

Low Severity

Multiple new test files use rmSync for temp directory cleanup. The repo convention requires using safeDelete from @socketsecurity/lib/fs instead of raw fs.rmSync. Since these are async test suites, the async safeDelete() variant is preferred.

Additional Locations (2)

Triggered by learned rule: No raw fs.rm or rm -rf — use safeDelete from @socketsecurity/lib/fs
