Add beta Bazel JVM manifest support #1312
Martin Torp (mtorp)
left a comment
Looks good 👏
| const FIXTURE = path.join( | ||
| process.env['HOME'] as string, | ||
| 'src', | ||
| 'bazel-bench', | ||
| 'constructed', | ||
| 'java-maven', | ||
| ) |
Looks like this may be a reference to a local path. Is that intentional?
Good catch. All three test files referencing ~/src/bazel-bench have been removed — they depended on a local fixture not available to open-source users.
| - `--bazel-flags <str>` — flags forwarded to every bazel invocation (single quoted string). | ||
| - `--bazel-output-base <dir>` — Bazel `--output_base` for read-only-cache CI environments. | ||
| - `--out <dir>` — output directory; default `./.socket/bazel-manifests/`. | ||
| - `--dry-run`, `--verbose`, `--json`, `--markdown` — standard diagnostic flags. |
Claude says --json and --markdown aren't implemented for this subcommand. They're probably not necessary, so I suggest removing them from this readme.
Removed --json and --markdown from the options list.
| const output = await spawn(opts.bin, argv, { | ||
| cwd: opts.cwd, | ||
| ...(opts.env ? { env: opts.env } : {}), | ||
| }) |
How long do Bazel queries typically take to execute? Would it make sense to add a generous default timeout to this operation, in case it hangs in a CI run for whatever reason?
Added a BAZEL_QUERY_TIMEOUT_MS = 600_000 (10 min) constant and passed it to both spawn() calls. Bazel cold-cache starts can take a few minutes; 10 min should be generous without letting a hung server block CI indefinitely.
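A minimal sketch of the timeout idea, assuming Node's built-in `spawnSync` from `node:child_process` rather than the project's own `spawn` wrapper, whose options are not shown in this thread. `runWithTimeout` and `RunResult` are hypothetical names; only `BAZEL_QUERY_TIMEOUT_MS` and its value come from the reply above.

```typescript
import { spawnSync } from 'node:child_process'

// 10 minutes, matching the constant named in the reply above.
const BAZEL_QUERY_TIMEOUT_MS = 600_000

// Hypothetical result shape used only by this sketch.
interface RunResult {
  stdout: string
  status: number | null
  timedOut: boolean
}

// Run a command and kill it if it exceeds timeoutMs.
// Assumption: a synchronous call is acceptable for illustration;
// the PR itself awaits an async spawn().
function runWithTimeout(
  bin: string,
  argv: string[],
  timeoutMs: number = BAZEL_QUERY_TIMEOUT_MS,
): RunResult {
  const res = spawnSync(bin, argv, {
    encoding: 'utf8',
    timeout: timeoutMs,
    killSignal: 'SIGTERM',
  })
  // On timeout, Node sets res.error with code 'ETIMEDOUT'.
  const code = (res.error as { code?: string } | undefined)?.code
  return {
    stdout: res.stdout ?? '',
    status: res.status,
    timedOut: code === 'ETIMEDOUT',
  }
}
```

A caller could check `timedOut` and report a hung Bazel server explicitly instead of letting the CI job stall.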
| function extractAttr(body: string, attr: string): string | undefined { | ||
| // Match `<attr> = "VALUE"` — quoted-string attrs only. | ||
| // Quoted value capped at 4 KiB; canonical Maven URLs are ~150 bytes. | ||
| const re = new RegExp(`\\b${attr}\\s*=\\s*"([^"\\n]{0,4096})"`) |
I suggest moving the regex construction out of the function body to avoid having to recreate it at every call.
Done. Added ATTR_RE_CACHE and TAG_RE_CACHE Maps at module level; both extractAttr and extractTagValue now check the cache before constructing a new regex.
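The caching described above might look roughly like the following sketch. `ATTR_RE_CACHE` and `extractAttr` are named in the thread; `getAttrRegex` and `escapeRegExp` are hypothetical helpers added here for illustration, and the escaping step is an assumption that is not part of the snippet under review.

```typescript
// Module-level cache: one compiled regex per attribute name.
const ATTR_RE_CACHE = new Map<string, RegExp>()

// Hypothetical helper: escape regex metacharacters in the attr name.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: return the cached regex, compiling it on first use.
function getAttrRegex(attr: string): RegExp {
  let re = ATTR_RE_CACHE.get(attr)
  if (!re) {
    // Same pattern as the reviewed code: quoted value capped at 4 KiB.
    re = new RegExp(`\\b${escapeRegExp(attr)}\\s*=\\s*"([^"\\n]{0,4096})"`)
    ATTR_RE_CACHE.set(attr, re)
  }
  return re
}

function extractAttr(body: string, attr: string): string | undefined {
  // The regex has no `g` flag, so sharing it across calls carries no lastIndex state.
  return getAttrRegex(attr).exec(body)?.[1]
}
```

Because the set of attribute names is small and fixed in practice, the Map stays tiny; an unbounded set of dynamic keys would call for a size cap instead.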
- Remove three integration test files that depend on ~/src/bazel-bench, a local fixture not available to open-source users
- Drop --json and --markdown from the manifest bazel README options since those flags are not implemented for this subcommand
- Add a 10-minute timeout to bazel spawn calls to prevent CI hangs on cold-cache or stalled bazel server invocations
- Cache per-attr and per-tag-key regexes at module level in bazel-build-parser to avoid recompiling on every rule block
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.
Reviewed by Cursor Bugbot for commit 28ad5cb.
| mkdirSync(resolved, { recursive: true }) | ||
| } catch (e) { | ||
| throw new InputError( | ||
| `--bazel-output-base could not be created at ${resolved}: ${(e as Error).message}`, |
Ad-hoc error cast instead of error helpers
Low Severity
(e as Error).message is used in a catch block to extract the error message. The repo convention requires using helpers from @socketsecurity/lib/errors (e.g., errorMessage(e)) or getErrorCause(e) from utils/errors.mts instead of ad-hoc error type casts.
Triggered by learned rule: Use @socketsecurity/lib/errors helpers — no ad-hoc error type checks
| // Always surface the error message; users should not have to | ||
| // re-run a multi-minute bazel build with --verbose just to see whether | ||
| // the failure was a missing dependency, permission error, or network blip. | ||
| const msg = e instanceof Error ? e.message : String(e) |
Ad-hoc instanceof Error ternary for error message
Low Severity
e instanceof Error ? e.message : String(e) is used in the top-level catch block. The repo convention requires using errorMessage(e) from @socketsecurity/lib/errors or getErrorCause(e) from utils/errors.mts instead of ad-hoc instanceof Error ternaries extracting .message.
Triggered by learned rule: Use @socketsecurity/lib/errors helpers — no ad-hoc error type checks
| if (verbose) { | ||
| logger.log( | ||
| `[VERBOSE] discovery: probe @${repoName}: REJECT (probe threw):`, | ||
| e instanceof Error ? e.message : String(e), |
Ad-hoc instanceof Error ternary in verbose logging
Low Severity
e instanceof Error ? e.message : String(e) is used in the validateMavenRepo catch block. The repo convention requires using errorMessage(e) from @socketsecurity/lib/errors or getErrorCause(e) from utils/errors.mts instead of ad-hoc instanceof Error ternaries.
Triggered by learned rule: Use @socketsecurity/lib/errors helpers — no ad-hoc error type checks
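The three error-handling findings above ask for the same change: route message extraction through one helper instead of repeating casts and ternaries at each call site. The sketch below uses a local stand-in named `errorMessage`; the real helper in @socketsecurity/lib/errors is not shown in this thread and may behave differently. `describeOutputBaseFailure` is a hypothetical wrapper around the message from the first finding.

```typescript
// Local stand-in for illustration only; the library helper may differ.
function errorMessage(e: unknown): string {
  // The type check lives in one place instead of at every catch site.
  return e instanceof Error ? e.message : String(e)
}

// Hypothetical call site mirroring the --bazel-output-base error text.
function describeOutputBaseFailure(resolved: string, e: unknown): string {
  return `--bazel-output-base could not be created at ${resolved}: ${errorMessage(e)}`
}
```

With a shared helper, non-Error throws such as strings or numbers are handled the same way everywhere, which the `(e as Error).message` cast does not guarantee.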
| } catch { | ||
| // Ignore errors. | ||
| } | ||
| rmSync(tmp, { recursive: true, force: true }) |
Raw rmSync used instead of safeDelete in tests
Low Severity
Multiple new test files use rmSync for temp directory cleanup. The repo convention requires using safeDelete from @socketsecurity/lib/fs instead of raw fs.rmSync. Since these are async test suites, the async safeDelete() variant is preferred.
Additional Locations (2)
Triggered by learned rule: No raw fs.rm or rm -rf — use safeDelete from @socketsecurity/lib/fs


Summary
Adds beta Bazel JVM SBOM support to Socket CLI.
Bazel is multi-language, but this PR starts with Bazel + Maven because many Bazel JVM repos declare Maven dependencies through rules_jvm_external in MODULE.bazel or WORKSPACE instead of committing a manifest Socket can already scan. The extractor asks Bazel what Maven artifacts it resolved, converts that into a maven_install.json-shaped manifest, and sends it through the existing scan pipeline.
What changed
- socket manifest bazel [beta], a generation-only command for producing Bazel JVM SBOM manifests.
- socket scan create --auto-manifest so Bazel workspaces are detected automatically and scanned through the normal scan-create flow.
- MODULE.bazel, WORKSPACE, or WORKSPACE.bazel.
- @maven.
- jvm_import and aar_import rules from bazel query --output=build.
- unsorted_deps.json when available as a faster structured source.
- .socket-auto-manifest/maven_install.json so we do not overwrite a repo's checked-in maven_install.json.
- socket.json defaults, fixtures, and test coverage.
User flow
Generate only:
socket manifest bazel .
Generate and upload in one step:
socket scan create --auto-manifest .
Testing
Tested with unit and integration coverage for:
- jvm_import / aar_import parsing.
- unsorted_deps.json parsing.
- maven_install.json.
- .socket-auto-manifest/ output behavior.
Also tested against a corpus of Bazel repositories covering constructed and real-world Maven extraction cases, Bzlmod, legacy WORKSPACE, custom repo names, pinned and unpinned lockfile flows, and scan-create auto-manifest behavior.
Local checks:
Note
Medium Risk
Adds a new Bazel-based manifest generation pipeline that shells out to bazel/bazelisk, reads workspace outputs, and feeds generated files into scan create target discovery; failures or environment differences (Java/Python/Bazel setup, permissions, timeouts) could impact scan creation behavior.
Overview
Adds beta Bazel JVM SBOM support via a new socket manifest bazel subcommand that discovers rules_jvm_external Maven repos (Bzlmod and legacy WORKSPACE), runs Bazel queries / reads unsorted_deps.json, and normalizes the results into a maven_install.json-shaped manifest.
Extends socket scan create --auto-manifest to detect Bazel workspaces (MODULE.bazel, WORKSPACE, WORKSPACE.bazel), generate the Bazel manifest into a sidecar directory, and include those generated files in subsequent scan file discovery.
Introduces Bazel-specific hardening and plumbing (bazel binary resolution, Java/Python prerequisites, --output_base validation, bounded parsers/DoS guards), plus new socket.json defaults, docs/changelog updates, fixtures, and broad unit/integration test coverage.
Reviewed by Cursor Bugbot for commit 28ad5cb.
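As an illustration of the workspace detection and sidecar output described in the summary, here is a minimal sketch. The marker file names and the .socket-auto-manifest/maven_install.json path come from the description; `isBazelWorkspace`, `sidecarManifestPath`, and the injectable `exists` parameter are hypothetical and are not taken from the PR's code.

```typescript
import { existsSync } from 'node:fs'
import { join } from 'node:path'

// Marker files listed in the PR description.
const BAZEL_WORKSPACE_MARKERS = ['MODULE.bazel', 'WORKSPACE', 'WORKSPACE.bazel']

// Hypothetical helper: a directory counts as a Bazel workspace if any marker exists.
// `exists` is injectable so the check can be exercised without touching the disk.
function isBazelWorkspace(
  dir: string,
  exists: (p: string) => boolean = existsSync,
): boolean {
  return BAZEL_WORKSPACE_MARKERS.some(name => exists(join(dir, name)))
}

// Hypothetical helper: write into a sidecar directory so a checked-in
// maven_install.json at the repo root is never overwritten.
function sidecarManifestPath(dir: string): string {
  return join(dir, '.socket-auto-manifest', 'maven_install.json')
}
```

Keeping the generated file under a dedicated directory means cleanup and scan file discovery can both target one known location.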