Bloodhound v2 – CI/CD, Validation, and Operational Documentation Updates #2
Open
mmccla1n wants to merge 86 commits into
…wn controls

Summary
- Replace v1 script with v2 package architecture (scanner/budget/whitelist/teardown).
- Slack reporting: scan summary (per region + totals, including 0 counts for scanned types), budget summary, teardown plan/results, and a dedicated whitelisted resources list.
- Whitelist: tag-based keep rule (default bloodhound:keep=true) plus optional KEEP_RESOURCE_IDS.
- Teardown: dry-run by default; apply mode is gated by APPLY_CHANGES and supports simulate mode (TEARDOWN_SIMULATE) plus safety rails (TEARDOWN_TARGET_IDS, TEARDOWN_ALLOW_ALL).
- Budgeting: 7-month cohort spend tracking and month-end projection via Cost Explorer.

Operational
- Add Lambda handler entrypoint (lambda_function.lambda_handler) and local runner (run_local.py).
- Add env.example and .env auto-loading for local runs.
- Add .gitignore to prevent committing secrets/venvs/build zips.
- Update requirements to resolve urllib3/botocore conflict.
- Add v2 GitHub Actions workflow (invoke_lambda_v2.yml).
- Add v2 plan doc and split Slack setup into SLACK_SETUP.md.

Notes
- v1 is preserved separately under versions/v1_0/ outside this repo directory; v2 deletes/terminations require explicit env flags.
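The whitelist and teardown rails above decide which scanned resources may ever reach a teardown plan. A minimal sketch of that eligibility filter follows, assuming a hypothetical Resource shape, comma-separated ID lists in KEEP_RESOURCE_IDS / TEARDOWN_TARGET_IDS, and the assumed rule that nothing is planned unless target IDs are given or TEARDOWN_ALLOW_ALL is set. It is an illustration, not the project's scanner or whitelist module.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical resource shape; the real scanner model may differ.
    resource_id: str
    tags: dict = field(default_factory=dict)


def parse_id_list(raw):
    """Split an assumed comma-separated env value into a set of IDs."""
    return {part.strip() for part in (raw or "").split(",") if part.strip()}


def is_whitelisted(res, keep_ids=frozenset(), keep_tag=("bloodhound:keep", "true")):
    """A resource is kept if its ID is listed or it carries the keep tag."""
    if res.resource_id in keep_ids:
        return True
    key, value = keep_tag
    return res.tags.get(key, "").lower() == value.lower()


def deletion_candidates(resources, keep_ids=frozenset(), target_ids=frozenset(), allow_all=False):
    """Return resources eligible for teardown under the (assumed) safety rails."""
    if not target_ids and not allow_all:
        # Neither rail is set: fail closed and plan no deletions.
        return []
    eligible = []
    for res in resources:
        if is_whitelisted(res, keep_ids):
            continue
        if target_ids and res.resource_id not in target_ids:
            continue
        eligible.append(res)
    return eligible
```

Failing closed when neither rail is set mirrors the "explicit env flags" note: an empty configuration produces an empty plan rather than a plan covering everything unwhitelisted.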
- Move Lambda entrypoint into handlers/ and update Terraform handler + build pipeline
- Move docs into docs/ and link from README
- Move local runner + AWS helper JSON into tools/
- Remove empty scripts directory
- Keep functionality unchanged (only paths/organization)
… the demo. Feel free to use this; just update it with your AWS profile. It builds 8 EC2 instances and 2 RDS instances, and half of them get whitelisted. Cleaned up README.
Mmc/bloodhound v2
…re for infra automation
…pendency split, and Slack manifest infrastructure
Key changes:
- Separate runtime and development dependencies
  - requirements.txt now contains only Lambda runtime packages
  - requirements-dev.txt added for local development/testing dependencies
- Document Lambda dependency strategy
  - add docs/lambda_packaging.md explaining:
    - why boto3 should not be bundled
    - Lambda runtime dependency behavior
    - packaging workflow
    - future Docker-based packaging
- Add Lambda packaging flow documentation and diagrams
- Introduce Slack manifest-based configuration
  - infra/slack/bloodhound_v2_manifest.json becomes Slack app source of truth
  - add infra/slack/README.md documenting manifest structure and change policy
- Update docs/SLACK_SETUP.md to support manifest-based setup with manual fallback
- Update main README with improved onboarding flow and Slack setup references
- Improve infra documentation
  - clarify Lambda environment variables and secret handling
  - document Terraform build behavior
- Introduce Lambda alias infrastructure for versioned deployments and safe rollback
  - infra/alias.tf
  - lambda_alias_version_override variable
- Improve Terraform packaging pipeline documentation
- Update .gitignore and repo structure for build artifacts
- Prepare repo for future deterministic Docker-based Lambda packaging
No infrastructure behavior changes yet; Terraform deployment remains ZIP-based.
Docker packaging planned for future phase.
- added Terraform AWS account guard to prevent deploying to the wrong account
- added lifecycle.prevent_destroy to the Lambda IAM role and policy
- confirmed terraform plan is safe (1 add, 2 update, 0 destroy)

Slack integration
- wired /seek and /seek_destroy to the Lambda function URL
- verified Slack -> Lambda -> AWS scan flow
- confirmed async Lambda invocation working
- Slack messages returning scan + budget + teardown plan

Validation
- ran /seek from Slack
- confirmed Lambda invocation in CloudWatch logs
- Lambda run time ~14s, memory usage normal
- dry-run teardown confirmed (no deletes)

Docs
- added Slack validation doc (watching CloudWatch logs)
- added safe operations guide for teardown controls
- added future infra hardening notes

Status
- phase 3 complete (Slack wired)
- phase 4 validation in progress
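Slack expects a slash-command response within 3 seconds, while a full scan here takes about 14 s, which is why the async invocation matters. One common pattern, sketched below under assumptions, is for the function-URL handler to queue a worker invocation with boto3's `invoke(..., InvocationType="Event")` and reply immediately. The payload fields and the `acknowledge_and_dispatch` name are hypothetical; the client is passed in so the sketch runs without AWS.

```python
import json


def acknowledge_and_dispatch(lambda_client, worker_function_name, command, user_id):
    """Hypothetical Slack-facing handler: queue the scan, then reply right away."""
    payload = {"source": "slack_command", "command": command, "user_id": user_id}
    # InvocationType="Event" queues the call asynchronously; Lambda answers
    # with StatusCode 202 without waiting for the worker to finish.
    response = lambda_client.invoke(
        FunctionName=worker_function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if response.get("StatusCode") != 202:
        raise RuntimeError(f"async invoke not accepted: {response.get('StatusCode')}")
    # Function URL response: Slack shows this text to the invoking user only.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"response_type": "ephemeral", "text": f"Running {command}..."}),
    }
```

In a deployed function the client would come from `boto3.client("lambda")`; the worker then posts its scan, budget, and teardown report back to Slack on its own schedule.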
…eployment guard

Infrastructure safety
- Add Terraform apply-mode guard preventing deployment when APPLY_CHANGES=true unless allow_apply_mode=true
- Add deletion cap via TEARDOWN_MAX_DELETE_COUNT to prevent large accidental teardown operations
- Update lambda.tf and variables.tf with documented safety logic and comments

Runtime safety
- Extend TeardownConfig with max_delete_count
- Add executor guard to abort teardown when plan exceeds deletion limit
- Preserve simulate-mode protections and dry-run behavior

Operational validation
- Add docs/validate_teardown.md with full controlled teardown test procedure
- Update Slack/Lambda validation documentation and common failure scenarios
- Document CLI methods for verifying Lambda environment variables and CloudWatch logs

Configuration updates
- Update env.example and terraform.tfvars.example to include TEARDOWN_MAX_DELETE_COUNT
- Clarify apply-mode behavior and Terraform deployment guard

Documentation improvements
- Expand README teardown controls and safety model
- Update infra README with destructive-mode deployment guard
- Improve V2_PLAN and validation guides for operational clarity

These changes introduce defense-in-depth protections for Bloodhound teardown operations while providing reproducible validation workflows for engineers.
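The runtime-safety items describe a TeardownConfig carrying max_delete_count and an executor that aborts oversized plans. A minimal sketch of that guard is below; the field names mirror the env flags, but the class layout, the fail-closed default of 0, and the `execute_plan` signature are assumptions rather than the project's code. The cap is checked before any delete call, so an oversized plan deletes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeardownConfig:
    # Assumed mapping to env flags; the real class may have more fields.
    apply_changes: bool = False   # APPLY_CHANGES
    simulate: bool = False        # TEARDOWN_SIMULATE
    max_delete_count: int = 0     # TEARDOWN_MAX_DELETE_COUNT (0 = no deletes, assumed)


class TeardownLimitExceeded(RuntimeError):
    """Raised before any deletion when the plan is larger than the cap."""


def execute_plan(plan, config, delete_fn):
    """Run a teardown plan (a list of resource IDs) under dry-run/simulate/cap rules."""
    if not config.apply_changes:
        return {"mode": "dry_run", "deleted": [], "planned": list(plan)}
    if len(plan) > config.max_delete_count:
        raise TeardownLimitExceeded(
            f"plan has {len(plan)} deletions; limit is {config.max_delete_count}"
        )
    if config.simulate:
        return {"mode": "simulate", "deleted": [], "planned": list(plan)}
    deleted = []
    for resource_id in plan:
        delete_fn(resource_id)  # e.g. an EC2 terminate or RDS delete wrapper
        deleted.append(resource_id)
    return {"mode": "apply", "deleted": deleted, "planned": list(plan)}
```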
…entation

- Add automated validation tooling:
  - tools/run_validation_workflow.sh
  - tools/smoke_test_lambda.sh
  - tools/validate_teardown.sh
  - tools/show_validation_history.sh
- Implement controlled teardown validation pipeline
- Add Terraform validation resource:
  - infra/test_resource.tf
- Introduce validation logging and history tracking
- Add configuration system documentation:
  - docs/configuration_system.md
  - docs/run_validation.md
- Update teardown validation documentation
- Improve README with configuration safety warning and documentation index
- Rename architecture document to docs/bloodhound_v2_plan.md
- Update env.example with safety guard configuration
- Add Terraform variables for validation resources and safety controls

This commit introduces a full validation framework for Bloodhound v2 including smoke testing, controlled teardown verification, configuration safety guards, and documentation for operational workflows.
…ove teardown validation tooling

Core changes
- Standardized Slack command routing to maintain internal modes and
- Added support for preview mode while preserving existing destructive flow
- Ensured Lambda worker receives correct execution flags (apply_changes / simulate)
- Fixed Slack command handler logic and improved safety gating for destructive operations

Slack integration
- Updated Slack manifest to include , , , , and
- Aligned manifest URLs with Lambda Function URL endpoint
- Updated Slack command documentation and operational guidance

Infrastructure
- Updated Terraform outputs and test resource configuration
- Improved Lambda smoke test tooling

Validation & tooling
- Added automated validation workflow scripts
- Improved teardown validation scripts and history utilities
- Added operational docs for Slack app usage and troubleshooting

Documentation
- Updated Slack setup documentation
- Updated configuration system documentation
- Updated validation workflow documentation
- Added troubleshooting and operational runbooks
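The routing change maps each Slack command to an internal mode plus the apply_changes / simulate flags the worker receives. A hedged sketch of such a mapping follows, using the v2 command names listed in this PR. The mode names and the exact flag values per command are assumptions; in this sketch apply_changes only requests apply mode, and any deployment-level gating (APPLY_CHANGES, the Terraform guard) is outside its scope.

```python
def route_slash_command(command, text=""):
    """Map a Slack slash command to hypothetical internal mode and flags."""
    args = text.split()
    if command == "/v2_seek":
        return {"mode": "scan", "apply_changes": False, "simulate": False}
    if command == "/v2_seek_destroy_plan":
        # Preview: build and report the teardown plan without deleting anything.
        return {"mode": "destroy_plan", "apply_changes": False, "simulate": False}
    if command == "/v2_seek_destroy":
        # The destructive path requires the literal CONFIRM argument.
        if args[:1] != ["CONFIRM"]:
            raise ValueError("destructive run requires: /v2_seek_destroy CONFIRM")
        return {"mode": "destroy", "apply_changes": True, "simulate": False}
    if command == "/v2_status":
        return {"mode": "status", "apply_changes": False, "simulate": False}
    raise ValueError(f"unknown command: {command}")
```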
- Combined header and mode_text generation into a single decision block
- Added explicit DRY RUN banner to prevent operator confusion
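Deriving the header and mode_text from one decision block keeps them from disagreeing, for example a "results" header above a dry-run body. A minimal sketch, assuming the decision depends only on apply_changes and simulate; the header strings and the function name are hypothetical.

```python
def build_report_header(apply_changes, simulate):
    """Choose header and mode banner together so they can never disagree."""
    if apply_changes and not simulate:
        header, mode_text = "Bloodhound Teardown Results", "APPLY: deletions enabled"
    elif apply_changes and simulate:
        header, mode_text = "Bloodhound Teardown Simulation", "SIMULATE: no resources deleted"
    else:
        # Default path: anything not explicitly in apply mode is a dry run.
        header, mode_text = "Bloodhound Teardown Plan", "DRY RUN: no resources deleted"
    return header, mode_text
```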
…face

- Implement /v2_status Slack command for Bloodhound system status
- Add system health indicator (🟢 🟡 🔴) based on teardown configuration and safety limits
- Improve Slack report formatting (section dividers, vertical service/action lists, Top Regions Affected)
- Update Slack manifest to include /v2_status
- Update validation scripts and teardown tooling references
- Synchronize documentation across README and docs/* with full v2 command set

Commands now supported:
/v2_seek
/v2_seek_destroy_plan
/v2_seek_destroy CONFIRM
/v2_status
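The /v2_status health indicator is described as derived from teardown configuration and safety limits. One plausible mapping is sketched below. The inputs are assumed to correspond to APPLY_CHANGES, TEARDOWN_SIMULATE, TEARDOWN_ALLOW_ALL, and TEARDOWN_MAX_DELETE_COUNT, and the `high_cap` threshold of 10 is purely hypothetical.

```python
def system_health(apply_changes, simulate, allow_all, max_delete_count, high_cap=10):
    """Return (emoji, reason) for a status report; thresholds are assumptions."""
    destructive = apply_changes and not simulate
    if not destructive:
        return "🟢", "dry run or simulate: no deletions possible"
    if allow_all or max_delete_count > high_cap:
        return "🔴", "destructive mode with broad scope"
    return "🟡", "destructive mode, capped by max delete count"
```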
Mmc/bloodhound v2
…safety improvements (WIP)

Summary
-------
Refactors Bloodhound pipeline structure to separate event handling and operational services. This keeps the orchestration layer clean and improves maintainability of scan, budget, and teardown logic.

Major Changes
-------------
• Introduced new architecture layers
  - handlers/: event interpretation (Slack, validation harness, scheduled runs)
  - services/: operational pipeline logic (scan, budget, status, teardown)
• Extracted pipeline logic from app.py into service modules:
  - scan_service.py
  - budget_service.py
  - status_service.py
  - teardown_service.py
• Added handler modules:
  - slack_handler.py
  - validation_handler.py
  - scheduled_handler.py
• execute_pipeline() now acts as a clean orchestrator coordinating services.

Validation Safety Improvements
------------------------------
• Added validation-mode safeguards to ensure destructive testing can only affect validation resources.
• Validation runs now enforce:
  - target ID filtering
  - validation tag checks
  - restricted teardown scope

Validation Workflow (WIP)
-------------------------
Validation pipeline currently under active testing:
Terraform -> create validation instance
Validation script -> capture instance ID
Lambda invocation -> validation mode
Teardown restricted via TEARDOWN_TARGET_IDS
Script verifies instance deletion

Status
------
Validation harness still in progress. Destructive validation behavior being verified before finalizing CI/CD integration.
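The validation safeguards combine three checks: the event must identify itself as a validation run, it must name explicit target IDs, and each resource must also carry a validation tag. A sketch of those checks follows. The payload keys, the resource dict shape, and the `bloodhound:validation=true` tag are assumptions for illustration.

```python
VALIDATION_TAG = ("bloodhound:validation", "true")  # hypothetical tag key/value


def validate_validation_event(event):
    """Reject validation payloads that could widen teardown scope."""
    if event.get("source") != "validation":
        raise ValueError("not a validation event")
    target_ids = event.get("target_ids") or []
    if not target_ids:
        raise ValueError("validation runs require explicit target_ids")
    return set(target_ids)


def restrict_to_validation_targets(resources, target_ids, tag=VALIDATION_TAG):
    """Keep only resources that are explicitly targeted AND validation-tagged."""
    key, value = tag
    return [
        r for r in resources
        if r["id"] in target_ids and r.get("tags", {}).get(key) == value
    ]
```

Requiring both conditions means a typo in target_ids cannot reach an untagged production resource, and a stray validation tag cannot pull in an untargeted one.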
- Add explicit event routing in app.py for slack_command, validation, and scheduled sources
- Harden validation_handler with source checks and target_ids enforcement
- Document validation harness architecture and payload model
- Update README to reflect Lambda → app.run() → pipeline execution flow
- Clarify dual execution paths (Slack operator vs validation harness)
- Align documentation with v2 slash commands and current validation workflow
- Fix outdated doc references and legacy command notes
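Explicit routing means each event source maps to exactly one handler, and an unrecognized source fails instead of falling through to a default pipeline. A sketch assuming the payload carries a `source` field with the three values named above; the handler bodies are placeholder stubs, not the project's handlers.

```python
def handle_slack_command(event):
    return {"handler": "slack", "command": event.get("command")}


def handle_validation(event):
    return {"handler": "validation", "target_ids": event.get("target_ids", [])}


def handle_scheduled(event):
    return {"handler": "scheduled"}


# One entry per supported source; anything else is rejected.
HANDLERS = {
    "slack_command": handle_slack_command,
    "validation": handle_validation,
    "scheduled": handle_scheduled,
}


def route_event(event):
    """Dispatch on an explicit 'source' field; unknown sources fail loudly."""
    source = event.get("source")
    handler = HANDLERS.get(source)
    if handler is None:
        raise ValueError(f"unsupported event source: {source!r}")
    return handler(event)
```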
Engineering notes:
Changes made while validating the Bloodhound teardown workflow and debugging Lambda packaging behavior.

Changes:
- Add jq validation check to ensure Lambda response success
- Add scheduled_handler entrypoint for scheduled scans
- Improve Terraform Lambda packaging triggers and debug visibility
- Exclude __pycache__ and .pyc files from Lambda bundle
- Add AWS CLI '--cli-binary-format raw-in-base64-out' to Lambda invocation workflow
- Add Terraform + Lambda troubleshooting documentation

Validation:
Pipeline verified using run_validation_workflow.sh with successful EC2 teardown validation.
This commit introduces the Bloodhound teardown validation system along with several reliability improvements.

Key updates:
- added strict bash mode (set -euo pipefail) to prevent silent script failures
- added Lambda execution metric validation before checking AWS resources
- replaced fixed sleep with a loop that waits until EC2 is fully terminated
- added workflow logging using RUN_ID for each validation run
- added log cleanup to keep only the last 3 validation logs
- limited Lambda rebuilds to actual code changes
- updated troubleshooting documentation for the build pipeline
- confirmed full teardown validation workflow working end-to-end

Validation workflow test:
1. smoke test checks Lambda configuration
2. Terraform creates a disposable EC2 instance
3. Lambda teardown deletes the instance
4. execution metrics are verified
5. EC2 termination is confirmed

Result: PASS

This validation workflow ensures Bloodhound safely deletes targeted resources.
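Replacing a fixed sleep with a wait loop means polling until the instance reports a terminal state or a deadline passes. The validation scripts are bash; the sketch below shows the same poll-with-timeout logic in Python, with the clock and sleep injectable for testing. The default timeout and interval are assumed values.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=10.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True (-> True) or the timeout passes (-> False)."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With boto3, `check` could read `ec2.describe_instances(InstanceIds=[iid])["Reservations"][0]["Instances"][0]["State"]["Name"] == "terminated"`. For EC2 specifically, boto3's built-in `ec2.get_waiter("instance_terminated")` and the CLI's `aws ec2 wait instance-terminated` already implement this loop.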
Mmc/bloodhound v2
- Add structured CI log groups for improved debugging
- Add Lambda error detection and StatusCode validation
- Stream CloudWatch Lambda logs into GitHub Actions output
- Document GitHub automation in docs/github_actions.md
- Update README with GitHub Actions workflow references
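A synchronous Lambda invoke can return StatusCode 200 even when the function itself raised; the failure is reported in the FunctionError field. A CI check therefore needs to inspect both. The sketch below works on a boto3 `invoke` response dict; the function names are hypothetical, and LogResult is only present when invoking with `LogType="Tail"` (it holds the last 4 KB of log output, base64-encoded).

```python
import base64
import json


def check_invoke_response(response):
    """Raise if a synchronous Lambda invoke failed; return the decoded JSON payload."""
    status = response.get("StatusCode")
    if status != 200:
        raise RuntimeError(f"unexpected StatusCode {status}")
    payload = response["Payload"].read()
    # A handler exception still yields 200; FunctionError is how it surfaces.
    if "FunctionError" in response:
        raise RuntimeError(f"function error ({response['FunctionError']}): {payload[:500]!r}")
    return json.loads(payload) if payload else None


def decode_log_tail(response):
    """Decode LogResult (set when invoked with LogType='Tail') into text."""
    raw = response.get("LogResult")
    return base64.b64decode(raw).decode("utf-8", errors="replace") if raw else ""
```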
Mmc/bloodhound v2
… assumption

- Add scripts/bootstrap_github_oidc.sh to configure GitHub OIDC provider and IAM role
- Detect AWS account ID dynamically using STS
- Add IAM resource tagging for governance and ownership tracking
- Add cleanup trap to remove temporary IAM policy artifacts (trust-policy.json, lambda-policy.json)
- Update GitHub Actions workflow to use OIDC role assumption
- Document GitHub automation and OIDC bootstrap process in README
Add GitHub OIDC bootstrap script and switch CI authentication to role…
- Restrict OIDC subject to repo:*/Bloodhound:ref:refs/heads/main
- Update trust policy automatically if role exists
- Improve documentation and security comments
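The trust policy written by the bootstrap script is what restricts which GitHub workflows can assume the role. The sketch below builds an equivalent policy document in Python: the federated principal is the account's `token.actions.githubusercontent.com` OIDC provider, the audience is `sts.amazonaws.com`, and the subject pattern is the one quoted above. The function name is hypothetical; the script itself may assemble the JSON differently.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, subject_pattern="repo:*/Bloodhound:ref:refs/heads/main"):
    """Build an IAM trust policy letting matching GitHub workflows assume the role."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # Only tokens minted for AWS STS are accepted.
                "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                # StringLike allows the '*' owner wildcard in the subject.
                "StringLike": {f"{GITHUB_OIDC_HOST}:sub": subject_pattern},
            },
        }],
    }
```

The account ID can come from `boto3.client("sts").get_caller_identity()["Account"]`, matching the script's STS lookup. Note that the `*` owner segment matches the main branch of any repository named Bloodhound under any GitHub account, which is what allows forks to assume the role.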
- Allow forks of Bloodhound repo to assume role
added temporary debug statements for checking GA output
Bloodhound: add Terraform support for validation workflow
- Removed validation option from workflow_dispatch inputs
- Validation pipeline still exists but requires CI hardening
- Will be re-enabled prior to GA once validation workflow stabilizes
Bloodhound: temporarily disable validation workflow in CI
Mmc/bloodhound v2
…ution troubleshooting

Expanded Lambda packaging documentation and troubleshooting guidance for the Bloodhound Lambda deployment.

Changes include:
- Added explanation of build environment vs Lambda runtime differences
- Documented common dependency resolution failures during packaging
- Added guidance on avoiding transitive dependency pinning
- Expanded Docker-based packaging section for future deterministic builds
- Added troubleshooting section covering pip dependency conflicts
- Linked packaging documentation with Terraform troubleshooting guide

These updates were added after encountering a real dependency conflict between botocore and a manually pinned urllib3 version during Lambda packaging.

The documentation now explains:
- how Lambda packages are built locally
- why boto3 should not be bundled
- how dependency conflicts occur
- recommended dependency management practices
- the long-term plan for Docker-based packaging

This improves maintainability of the infrastructure documentation and provides engineers with clear debugging guidance for Lambda packaging failures.
…ocumentation

* Document script-driven build process (build_lambda.sh)
* Introduce layered build directory model (.build/deps, src, lambda_pkg)
* Clarify Terraform triggers and packaging flow
* Improve troubleshooting for archive/build edge cases
* Add guardrails for modifying build pipeline

Ensures documentation reflects deterministic, cache-aware Lambda packaging architecture.
- routed scheduled events through dedicated handler instead of run()
- added run_scheduled_scan() as explicit scheduled entrypoint
- removed scheduled flow from generic run() path
- updated lambda router to distinguish scheduled vs default invocations
- aligned documentation to reflect actual execution model

This change addresses the suspected recursion issue in scheduled Lambda executions. Validation pending via Terraform deploy, GHA, and Slack testing.
… path

- routed scheduled events through dedicated handler instead of run()
- added validate_scheduler mode to simulate EventBridge scheduled trigger in GHA
- standardized CloudWatch logging across lambda entrypoint
- added request_id tracing for improved log visibility

Validation:
- manual scan/status verified via GHA and CLI
- WIP on scheduler path validation via validate_scheduler mode
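EventBridge scheduled invocations arrive with `"source": "aws.events"` and `"detail-type": "Scheduled Event"`, so the router can recognize them without extra fields and send them straight to the scheduled entry point, never back through the generic path. The sketch below shows that check, a synthetic event of the kind a validate_scheduler run might send, and request_id-tagged logging. `dispatch` and its callable parameters are hypothetical; in a real handler the ID would come from `context.aws_request_id`.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("bloodhound")


def is_scheduled_event(event):
    """True for EventBridge scheduled-rule events."""
    return event.get("source") == "aws.events" and event.get("detail-type") == "Scheduled Event"


def simulated_scheduled_event(region="us-east-1"):
    """Build an EventBridge-shaped payload for exercising the scheduled path."""
    return {
        "version": "0",
        "detail-type": "Scheduled Event",
        "source": "aws.events",
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "region": region,
        "resources": [],
        "detail": {},
    }


def dispatch(event, request_id, run_scheduled_scan, run_default):
    """Route scheduled events to their own entry point so they never re-enter run()."""
    if is_scheduled_event(event):
        logger.info("request_id=%s route=scheduled", request_id)
        return run_scheduled_scan(event)
    logger.info("request_id=%s route=default", request_id)
    return run_default(event)
```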
Mmc/bloodhound v2..fix(lambda): prevent scheduled recursion and add scheduler validation path
…tional and infrastructure guides
Move Lambda packaging out of Terraform and into scripts/build_lambda.sh.

Key changes:
- Introduced scripts/build_lambda.sh to build the Lambda deployment package
- Default build mode uses AWS SAM Docker image for Amazon Linux compatibility
- Added optional local build mode for faster development
- Terraform terraform_data.build_lambda_pkg now invokes the build script
- archive_file continues to package .build/lambda_pkg into the deployment zip
- Added structured build logging and package visibility for debugging

Benefits:
- Ensures dependencies match the AWS Lambda runtime environment
- Keeps Terraform focused strictly on infrastructure
- Produces deterministic and reproducible Lambda packages
- Improves debugging when diagnosing Lambda import errors
- Enables future CI/CD integration

Docker builds are now the default to ensure production-safe artifacts.
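A package is only reproducible if identical inputs produce an identical zip. File modification times, directory-walk order, and stray bytecode caches all change the bytes (and therefore Terraform's source hash) without any code change. The sketch below zips a directory such as .build/lambda_pkg with sorted entries, a fixed timestamp, and __pycache__ / .pyc excluded. It is illustrative only: the project packages via archive_file after build_lambda.sh, not with this function.

```python
import os
import zipfile

FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)  # earliest date the ZIP format supports


def build_deterministic_zip(src_dir, zip_path):
    """Zip src_dir reproducibly: sorted entries, fixed mtimes, no bytecode caches."""
    entries = []
    for root, dirs, files in os.walk(src_dir):
        # Prune __pycache__ in place so os.walk does not descend into it.
        dirs[:] = sorted(d for d in dirs if d != "__pycache__")
        for name in files:
            if name.endswith(".pyc"):
                continue
            full = os.path.join(root, name)
            arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
            entries.append((arcname, full))
    entries.sort()
    with zipfile.ZipFile(zip_path, "w") as zf:
        for arcname, full in entries:
            info = zipfile.ZipInfo(arcname, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Keep permission bits (e.g. executables in bin/) but nothing time-based.
            info.external_attr = (os.stat(full).st_mode & 0o777) << 16
            with open(full, "rb") as fh:
                zf.writestr(info, fh.read())
    return [arcname for arcname, _ in entries]
```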
Mmc/bloodhound v2
updated doc
…mbda build system

- corrected Lambda packaging documentation to reflect real .build structure
- removed outdated deps/src build directory references
- documented final Lambda artifact (.build/bloodhound_lambda_v2.zip)
- clarified Docker build environment using SAM build container (public.ecr.aws/sam/build-python3.10)
- added Python packaging metadata explanation (*.dist-info, bin/)
- improved Lambda packaging troubleshooting guidance
- moved Terraform bootstrap import documentation to infra/README.md
- removed Terraform bootstrap section from Slack documentation
- clarified safe operations and teardown validation documentation
- ensured infrastructure docs accurately reflect current build and deployment pipeline
…g artifacts

- add comprehensive quick demo guide covering 8 operational scenarios
- document local execution, Slack commands, teardown planning, and controlled deletion
- add GitHub Actions automation and manual operations walkthroughs
- document CloudWatch log inspection and teardown validation workflow
- add architecture overview documentation
- include demo artifacts (screenshots and PDF walkthroughs)
Mmc/bloodhound v2
docs: expose quick_demo guide in README and features documentation
moved screenshot pics
corrected text output
Corrected path to render PDFs correctly
corrected doc
This pull request introduces updates to the Bloodhound v2 project related to CI/CD workflows, operational tooling, validation infrastructure, and documentation.

Changes include updates to GitHub Actions workflows such as introducing a dedicated manual operations workflow (bloodhound_ops.yml), adding a concurrency guard to prevent overlapping executions, expanding Lambda execution logging, streaming CloudWatch logs into CI output, and masking sensitive environment variables in logs. GitHub Actions authentication was updated to use AWS OIDC role assumption with a bootstrap script for configuring the IAM provider and role, eliminating the need for static AWS credentials in CI.

A validation harness was added to support controlled teardown testing using Terraform-created resources, with safeguards requiring validation source identification, explicit target IDs, and validation tags on resources. Additional operational configuration options were introduced to define teardown limits and execution conditions, including deletion caps, simulation mode, Terraform deployment guard checks, and AWS account verification. Execution observability was expanded through CI summaries and improved Lambda logging visibility.

Repository documentation was also updated to reflect the current project architecture, operational workflows, validation processes, and configuration system. The validation workflow option has been temporarily removed from CI workflow inputs while stabilization continues, though validation tooling remains available locally via repository scripts.

These updates provide expanded CI/CD workflow capabilities, validation tooling for teardown operations, OIDC-based authentication for CI, improved execution logging, and updated operational documentation.
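Masking in GitHub Actions is done with the `::add-mask::<value>` workflow command, after which the runner replaces that value with `***` in subsequent log output. A sketch of selecting which values to mask by variable name follows. The name pattern is an assumption, not the workflow's actual list, and `redact` is a hypothetical local fallback for text that is handled outside the runner's masking.

```python
import re

# Assumed heuristic: variable names that usually hold credentials.
SENSITIVE_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|WEBHOOK|KEY)", re.IGNORECASE)


def mask_commands(env):
    """Emit one ::add-mask:: line per non-empty value of a sensitive-looking variable."""
    lines = []
    for name in sorted(env):
        value = env[name]
        if value and SENSITIVE_NAME.search(name):
            lines.append(f"::add-mask::{value}")
    return lines


def redact(text, env):
    """Replace sensitive values in text with *** before it is printed anywhere."""
    for name, value in env.items():
        if value and SENSITIVE_NAME.search(name):
            text = text.replace(value, "***")
    return text
```

In a workflow step this could run as `print("\n".join(mask_commands(dict(os.environ))))`. Multi-line secrets need one `::add-mask::` per line, which this sketch does not handle.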