feat: Add queries to pre-compute derived edges and provenance summary #2015
Merged
SandeepTuniki merged 17 commits on May 20, 2026
…rOf, and linkedMember recursive queries
Code Review
This pull request refactors the aggregation logic by introducing the BigQueryExecutor and GraphAggregator classes, which handle query execution and global graph aggregations, respectively. The AggregationUtils class was updated to orchestrate these new components. The review feedback recommends against hardcoding environment-specific values such as the Spanner destination URI: the code should read them from environment variables and fail explicitly when they are missing, so that a misconfiguration is caught rather than silently replaced by a default.
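The "fail explicitly if missing" recommendation could be sketched as follows. This is a minimal illustration, not the code from this PR: the helper names and the variable names `SPANNER_INSTANCE_ID` and `SPANNER_DATABASE_ID` are assumptions chosen for the example.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def require_env(name, environ=None):
    """Return the value of `name`, or raise instead of using a default."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise MissingConfigError(
            f"Required environment variable {name!r} is not set"
        )
    return value


def load_spanner_config(environ=None):
    """Collect Spanner settings; variable names here are hypothetical."""
    return {
        "instance_id": require_env("SPANNER_INSTANCE_ID", environ),
        "database_id": require_env("SPANNER_DATABASE_ID", environ),
    }
```

Passing `environ` explicitly keeps the helper easy to test; in production the default `os.environ` would be used.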
…AL_QUERY.
- Updated aggregation logic to pull data from Spanner using Connection ID.
- Switched to per-query temporary tables to avoid shared state.
- Implemented idempotency using FilteredEdges CTE with WHERE NOT EXISTS subquery.
- Added environment variable support for Spanner instance/database configuration.
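The idempotency pattern named in this commit (a FilteredEdges CTE with a WHERE NOT EXISTS subquery) might look roughly like the sketch below. It uses the standard-library `sqlite3` module as a local stand-in for BigQuery and Spanner, and the table and column names (`edge`, `candidate_edge`, `subject_id`, `predicate`, `object_id`) are assumptions for illustration, not the schema used in this PR.

```python
import sqlite3

# Insert only candidate rows that are not already present in `edge`,
# so running the statement twice adds nothing the second time.
IDEMPOTENT_INSERT = """
INSERT INTO edge (subject_id, predicate, object_id)
WITH FilteredEdges AS (
    SELECT DISTINCT c.subject_id, c.predicate, c.object_id
    FROM candidate_edge AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM edge AS e
        WHERE e.subject_id = c.subject_id
          AND e.predicate = c.predicate
          AND e.object_id = c.object_id
    )
)
SELECT subject_id, predicate, object_id FROM FilteredEdges
"""


def insert_missing_edges(conn):
    """Run the idempotent insert and return the resulting row count."""
    conn.execute(IDEMPOTENT_INSERT)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]


def demo():
    """Apply the insert twice and report the edge count after each run."""
    conn = sqlite3.connect(":memory:")
    ddl = "CREATE TABLE {} (subject_id TEXT, predicate TEXT, object_id TEXT)"
    conn.execute(ddl.format("edge"))
    conn.execute(ddl.format("candidate_edge"))
    conn.executemany(
        "INSERT INTO candidate_edge VALUES (?, ?, ?)",
        [
            ("city", "linkedContainedInPlace", "state"),
            ("city", "linkedContainedInPlace", "country"),
        ],
    )
    first = insert_missing_edges(conn)
    second = insert_missing_edges(conn)
    conn.close()
    return first, second
```

Calling `demo()` yields the same count after both runs, which is the property the commit describes; the real queries would need to be written in the BigQuery dialect.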
- Added get_spanner_destination_uri method to BigQueryExecutor to derive URI from metadata.
- Removed redundant destination_uri parameters from GraphAggregator and AggregationUtils.
- Updated all aggregation queries to retrieve the destination URI directly from the executor.
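A rough sketch of deriving the destination URI in one place, so callers no longer pass it around. The actual get_spanner_destination_uri reads connection metadata, which is not shown in this conversation; the sketch below instead takes the three identifiers directly. The `SpannerTarget` type is hypothetical, and the URI layout is an assumption that should be checked against the BigQuery export-to-Spanner documentation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpannerTarget:
    """Hypothetical holder for the identifiers that make up the URI."""
    project_id: str
    instance_id: str
    database_id: str


def spanner_destination_uri(target):
    """Build the destination URI, rejecting empty identifiers."""
    for field_name, value in asdict(target).items():
        if not value:
            raise ValueError(f"{field_name} must not be empty")
    # Assumed URI layout; verify against the official documentation.
    return (
        "https://spanner.googleapis.com/"
        f"projects/{target.project_id}/"
        f"instances/{target.instance_id}/"
        f"databases/{target.database_id}"
    )
```

Centralizing the construction means the aggregation queries can ask the executor for the URI instead of receiving it as a parameter, which matches the intent of the commit.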
Explicitly scoping the BigQuery client to the provided project_id for better resource management.
- Added run_all() method to GraphAggregator to manage the sequence of global aggregations.
- Simplified AggregationUtils.run_aggregation() by delegating orchestration to GraphAggregator.
- Improved code structure and separation of concerns.
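The delegation described in this commit could be sketched like this. Only the names GraphAggregator and run_all() come from the PR; the class below is a simplified stand-in, and the executor interface, step order, and SQL placeholders are assumptions made for the example.

```python
class RecordingExecutor:
    """Stand-in for BigQueryExecutor that records SQL instead of running it."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)


class GraphAggregatorSketch:
    """Owns the ordered list of global aggregations (hypothetical order)."""

    def __init__(self, executor):
        self._executor = executor

    def _steps(self):
        # Each entry pairs a step name with a placeholder for its query.
        return [
            ("linkedContainedInPlace", "-- query for linkedContainedInPlace"),
            ("linkedMemberOf", "-- query for linkedMemberOf"),
            ("linkedMember", "-- query for linkedMember"),
        ]

    def run_all(self):
        """Run every aggregation in sequence and return the names completed."""
        completed = []
        for name, sql in self._steps():
            self._executor.execute(sql)
            completed.append(name)
        return completed
```

With this shape, a caller such as AggregationUtils.run_aggregation() only needs to construct the aggregator and call `run_all()`, and a recording executor makes the sequence easy to unit test.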
linkedContainedInPlace, linkedMemberOf, and linkedMember triples
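As an illustration of the recursive-query idea behind these triples, the sketch below computes a transitive closure over direct containment edges with a recursive CTE. It assumes that linkedContainedInPlace is the transitive closure of direct containedInPlace edges, which this conversation does not spell out; it runs on the standard-library `sqlite3` module rather than BigQuery, and the `edge` table layout is hypothetical.

```python
import sqlite3

# UNION (not UNION ALL) removes duplicates, which also stops the
# recursion if the input graph happens to contain a cycle.
CLOSURE_QUERY = """
WITH RECURSIVE linked(subject_id, object_id) AS (
    SELECT subject_id, object_id FROM edge
    UNION
    SELECT l.subject_id, e.object_id
    FROM linked AS l
    JOIN edge AS e ON e.subject_id = l.object_id
)
SELECT subject_id, object_id FROM linked
ORDER BY subject_id, object_id
"""


def linked_contained_in_place(direct_edges):
    """Return every (place, ancestor) pair reachable from direct edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (subject_id TEXT, object_id TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", direct_edges)
    rows = conn.execute(CLOSURE_QUERY).fetchall()
    conn.close()
    return rows
```

For a chain such as city, county, state, country, the result includes the indirect pairs (for example city to country) alongside the direct ones. Pre-computing these pairs is what lets later lookups avoid walking the hierarchy at query time.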
vish-cs reviewed May 20, 2026
vish-cs reviewed May 20, 2026
TODO: Pass an environment variable to the aggregation utils from main.py. Also add the variable to cloudbuild.yaml and cloudbuild_main.yaml.
Update: This change is now done.
…membership-aggregations
vish-cs reviewed May 20, 2026
vish-cs approved these changes May 20, 2026
This PR adds queries that run through BigQuery federation against the Spanner database and pre-compute the following derived information associated with specific imports:
On Edge table:
- linkedContainedInPlace
- linkedMemberOf
- linkedMember
On Cache table: