feat: Add queries to pre-compute derived edges and provenance summary #2015

Merged: SandeepTuniki merged 17 commits into master from linked-containment-membership-aggregations on May 20, 2026
Contributor

@SandeepTuniki SandeepTuniki commented May 18, 2026

This PR adds queries that run through BigQuery federation against the Spanner database and pre-compute the following derived information for specific imports:

On Edge table:

  • linkedContainedInPlace
  • linkedMemberOf
  • linkedMember

On Cache table:

  • statvars and their associated info
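
For readers unfamiliar with the derived edges listed above, here is a minimal sketch of one of them. It assumes (the PR description does not spell this out) that `linkedContainedInPlace` is the transitive closure of direct `containedInPlace` edges. The function name and the sample place IDs are hypothetical, and the PR itself computes these edges with SQL rather than Python.

```python
from collections import defaultdict


def linked_contained_in_place(direct_edges):
    """Return every (place, ancestor) pair reachable via direct containment edges.

    `direct_edges` is an iterable of (child, parent) tuples. This is an
    illustrative in-memory version, not the query used in the PR.
    """
    parents = defaultdict(set)
    for child, parent in direct_edges:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already expanded; also guards against cycles
            seen.add(node)
            closure.add((start, node))
            # .get() avoids inserting new keys into the defaultdict
            stack.extend(parents.get(node, ()))
    return closure


# Hypothetical sample data: city -> county -> state -> country.
direct = [
    ("geoId/0649670", "geoId/06085"),
    ("geoId/06085", "geoId/06"),
    ("geoId/06", "country/USA"),
]
linked = linked_contained_in_place(direct)
```

Pre-computing the closure once per import trades storage for read-time simplicity: a lookup such as "all places in country/USA" becomes a single-hop query.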

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the aggregation logic by introducing the BigQueryExecutor and GraphAggregator classes, which handle query execution and global graph aggregations, respectively. The AggregationUtils class was updated to orchestrate these new components. The review recommends against hardcoding environment-specific values such as the Spanner destination URI: the code should read them from environment variables and fail explicitly when they are missing, so that a misconfiguration is caught early.
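
The "fail explicitly" recommendation could look roughly like the sketch below. The variable names (`SPANNER_PROJECT_ID`, `SPANNER_INSTANCE_ID`, `SPANNER_DATABASE_ID`) and the helper names are hypothetical and are not taken from the PR's code.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def require_env(name, environ=None):
    """Return the value of `name`, or raise instead of using a default."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise MissingConfigError(
            f"Required environment variable {name!r} is not set")
    return value


def load_spanner_config(environ=None):
    """Collect the Spanner settings; variable names are assumptions."""
    return {
        "project_id": require_env("SPANNER_PROJECT_ID", environ),
        "instance_id": require_env("SPANNER_INSTANCE_ID", environ),
        "database_id": require_env("SPANNER_DATABASE_ID", environ),
    }
```

Raising at load time, rather than falling back to a hardcoded value, means a deployment missing its configuration stops before any query runs against the wrong database.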

Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py Outdated
…AL_QUERY.

- Updated aggregation logic to pull data from Spanner using Connection ID.
- Switched to per-query temporary tables to avoid shared state.
- Implemented idempotency using FilteredEdges CTE with WHERE NOT EXISTS subquery.
- Added environment variable support for Spanner instance/database configuration.
- Added get_spanner_destination_uri method to BigQueryExecutor to derive URI from metadata.
- Removed redundant destination_uri parameters from GraphAggregator and AggregationUtils.
- Updated all aggregation queries to retrieve the destination URI directly from the executor.
- Explicitly scoped the BigQuery client to the provided project_id for better resource management.
- Added run_all() method to GraphAggregator to manage the sequence of global aggregations.
- Simplified AggregationUtils.run_aggregation() by delegating orchestration to GraphAggregator.
- Improved code structure and separation of concerns.
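
Two of the points above, deriving the destination URI and filtering with a `FilteredEdges` CTE plus `WHERE NOT EXISTS`, can be sketched as follows. This is an illustration under stated assumptions, not the PR's query: the function names, the `CandidateEdges` CTE, the `existing_table` parameter, and the `subject_id` / `predicate` / `object_id` column names are hypothetical. The `EXPORT DATA` options follow BigQuery's documented export-to-Spanner syntax.

```python
def spanner_destination_uri(project_id, instance_id, database_id):
    """Build the Spanner URI from its parts instead of hardcoding it."""
    return (f"https://spanner.googleapis.com/projects/{project_id}"
            f"/instances/{instance_id}/databases/{database_id}")


def build_idempotent_edge_export(destination_uri, candidate_sql,
                                 existing_table, target_table="Edge"):
    """Return BigQuery SQL that exports only edges not already present.

    `candidate_sql` is a SELECT producing the derived edges, and
    `existing_table` is a BigQuery table holding the current edges
    (for example a per-query temporary table). Column names are assumed.
    """
    return f"""
EXPORT DATA OPTIONS (
  uri = "{destination_uri}",
  format = "CLOUD_SPANNER",
  spanner_options = '''{{ "table": "{target_table}" }}'''
) AS
WITH CandidateEdges AS (
  {candidate_sql}
),
FilteredEdges AS (
  SELECT c.*
  FROM CandidateEdges AS c
  WHERE NOT EXISTS (
    SELECT 1
    FROM `{existing_table}` AS e
    WHERE e.subject_id = c.subject_id
      AND e.predicate = c.predicate
      AND e.object_id = c.object_id
  )
)
SELECT * FROM FilteredEdges
"""


uri = spanner_destination_uri("my-project", "my-instance", "my-db")
sql = build_idempotent_edge_export(
    uri,
    "SELECT subject_id, predicate, object_id FROM `tmp.derived_edges`",
    "tmp.existing_edges",
)
```

Because rows that already exist are filtered out before the export, re-running the same aggregation does not write those edges a second time, which is the idempotency property the commit message describes.
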
@SandeepTuniki SandeepTuniki changed the title from "feat: Add queries to pre-compute linkedContainedInPlace, linkedMemberOf, and linkedMember triples" to "feat: Add queries to pre-compute derived edges and Cache table" May 19, 2026
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py Outdated
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Contributor Author

SandeepTuniki commented May 20, 2026

TODO: Pass an environment variable from main.py to the aggregation utils. Also add the environment variable to cloudbuild.yaml and cloudbuild_main.yaml.

Update: This change is now done.

@SandeepTuniki SandeepTuniki marked this pull request as ready for review May 20, 2026 07:55
@SandeepTuniki SandeepTuniki requested a review from vish-cs May 20, 2026 08:10
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py Outdated
@SandeepTuniki SandeepTuniki enabled auto-merge (squash) May 20, 2026 09:56
@SandeepTuniki SandeepTuniki disabled auto-merge May 20, 2026 09:57
@SandeepTuniki SandeepTuniki enabled auto-merge (squash) May 20, 2026 10:01
@SandeepTuniki SandeepTuniki changed the title from "feat: Add queries to pre-compute derived edges and Cache table" to "feat: Add queries to pre-compute derived edges and provenance summary" May 20, 2026
@SandeepTuniki SandeepTuniki merged commit dad55b4 into master May 20, 2026
9 checks passed

3 participants