refactor: extract dereference/validate pipeline from reconcile_hive #707
adwk67 wants to merge 11 commits into
Conversation
Move external resource resolution (product image, S3 connection, metadata database, OPA config) into a controller::dereference module with its own error enum. Extract config validation and merging into validate_cluster(), which produces a ValidatedHiveCluster proving that all product-config validation succeeded before any Kubernetes resources are created. The validated struct owns the resolved product image and the per-role/per-rolegroup merged configs. Existing build functions are unchanged and receive their parameters from the validated structs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jenkins: 🟢 (one failure on NS teardown) https://testing.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/hive-operator-it-custom/52/
Rename FailedToResolveResourceConfig to FailedToResolveConfig and fix OPA error display string to match the convention used across all three dereference/validate extraction PRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Image resolution is a pure computation, not an I/O dereference, so it belongs in validate_cluster alongside the other config validation. This aligns with the pattern used by the trino and airflow operators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```rust
let metadata_database_connection_details = hive
    .spec
    .cluster_config
    .metadata_database
    .jdbc_connection_details("METADATA")
    .context(InvalidMetadataDatabaseConnectionSnafu)?;
```
The database connection string is just validated. No resource is dereferenced. Therefore, I would move this code into the validation step.
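The distinction drawn here can be illustrated with a toy validator; the names below are hypothetical, not the operator's actual API. Validating a connection string is a pure check on data already present in the spec, so it needs no Kubernetes client and fits the validation step:

```rust
// Toy sketch (hypothetical names, not the operator's real API) of the
// distinction above: validating a JDBC connection string is a pure
// computation on the cluster spec; no resource has to be fetched for it.
#[derive(Debug, PartialEq)]
pub struct ValidatedJdbcConnection {
    pub url: String,
}

pub fn validate_jdbc_connection(url: &str) -> Result<ValidatedJdbcConnection, String> {
    // Purely syntactic check: no API server round-trip involved.
    if !url.starts_with("jdbc:") {
        return Err(format!("not a JDBC URL: {url}"));
    }
    Ok(ValidatedJdbcConnection {
        url: url.to_owned(),
    })
}
```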
```rust
/// The validated cluster: proves that product-config validation and config merging
/// succeeded for every role and role group before any resources are created.
#[derive(Clone, Debug)]
pub struct ValidatedHiveCluster {
```
I would not put the product name into the identifiers where it is not necessary. The context of this operator is Hive, therefore the validated cluster struct contains the validated configuration for Hive.
There are often tasks which must be applied to all operators. It is easier to apply them if the identifiers are equal and do not contain the product name.
```diff
-pub struct ValidatedHiveCluster {
+pub struct ValidatedCluster {
```
I'm in two minds about this: on the one hand it makes sense to have it generic (like I commented in the Trino PR), but on the other, I can see the downside of having generically-named structs that have very different fields.
(that's a very minor downside)
```rust
/// The validated cluster: proves that product-config validation and config merging
/// succeeded for every role and role group before any resources are created.
#[derive(Clone, Debug)]
pub struct ValidatedHiveCluster {
```
I would move all structs which are used by other steps, into the controller module, so that all steps depend on the parent module but not on other step modules. The ValidatedCluster struct will e.g. be used in the build step.
See 1758748. Although I would maybe have moved the struct over once dependencies are clear (i.e. when implementing subsequent steps).
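The dependency direction suggested above can be sketched with an inline module mock (in the real crate these would be `controller.rs` and files under `controller/`; all names here are assumptions): shared structs live in the parent module, and each step module depends only on the parent, never on a sibling step.

```rust
// Inline mock of the suggested layout; in the real crate, `controller`
// would be controller.rs (module root) with step submodules in controller/.
mod controller {
    /// Used by the validate step and later by the build step,
    /// so it lives in the shared parent module.
    #[derive(Clone, Debug, PartialEq)]
    pub struct ValidatedCluster {
        pub product_version: String,
    }

    pub mod validate {
        use super::ValidatedCluster; // depends on the parent, not a sibling step

        pub fn validate_cluster(version: &str) -> Result<ValidatedCluster, String> {
            if version.is_empty() {
                return Err("product version must not be empty".to_owned());
            }
            Ok(ValidatedCluster {
                product_version: version.to_owned(),
            })
        }
    }
}
```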
```rust
    .resolve(image_base_name, image_repository, pkg_version)
    .context(ResolveProductImageSnafu)?;

let role = hive.spec.metastore.as_ref().context(NoMetaStoreRoleSnafu)?;
```
This is okay in this pull request, because we do not want to introduce CRD changes. But I wonder why this field is defined as optional in the CRD if it is actually mandatory. If we want to build a lean operator, we should not try to fix problems we never should have created.
I seem to remember that some of the Option-ality was to do with the fragment/merge mechanism, but not sure if that is relevant here.
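The fragment/merge mechanism mentioned here is a plausible source of the Option-ality: merge fragments leave fields unset so defaults can fill them in later. A toy sketch with hypothetical types, not the real CRD structs:

```rust
/// Toy merge fragment: every field is an `Option` so a partial
/// user-supplied fragment can be merged over operator defaults.
#[derive(Clone, Debug, PartialEq)]
pub struct RoleConfigFragment {
    pub replicas: Option<u16>,
}

pub fn merge(user: RoleConfigFragment, defaults: RoleConfigFragment) -> RoleConfigFragment {
    RoleConfigFragment {
        // User-provided values win; unset fields fall back to the defaults.
        replicas: user.replicas.or(defaults.replicas),
    }
}
```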
```rust
pub role_groups: BTreeMap<HiveRole, BTreeMap<String, ValidatedRoleGroupConfig>>,
pub role_configs: BTreeMap<HiveRole, ValidatedRoleConfig>,
```
If there is only one role and we never intend to add another role, wouldn't it be better to skip the indirection with the singleton BTreeMap? Otherwise, all usages are prepended with an `if let Some(role_config) = ...`.
```diff
-pub role_groups: BTreeMap<HiveRole, BTreeMap<String, ValidatedRoleGroupConfig>>,
-pub role_configs: BTreeMap<HiveRole, ValidatedRoleConfig>,
+pub role_groups: BTreeMap<String, ValidatedRoleGroupConfig>,
+pub role_configs: ValidatedRoleConfig,
```
We had new roles "appear" for Kafka (with KRaft), but, yes, I think that is very unlikely for Hive.
See 6db46c1. A single role is now assumed for the validated cluster which means the loops in the reconciler can also be simplified.
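With the single-role assumption, the outer per-role map disappears and consumers iterate role groups directly. A self-contained sketch with assumed types (not the operator's real structs):

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, Default)]
pub struct ValidatedRoleGroupConfig {
    pub replicas: u16,
}

/// Assumed single-role shape: role groups keyed by name only, with no
/// outer `BTreeMap<HiveRole, _>`, so callers never need
/// `if let Some(role_config) = ...` for the one role that always exists.
#[derive(Clone, Debug, Default)]
pub struct ValidatedCluster {
    pub role_groups: BTreeMap<String, ValidatedRoleGroupConfig>,
}

impl ValidatedCluster {
    /// Example consumer: iterates role groups directly, no per-role loop.
    pub fn total_replicas(&self) -> u32 {
        self.role_groups
            .values()
            .map(|rg| u32::from(rg.replicas))
            .sum()
    }
}
```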
```rust
pub fn validate_cluster(
    hive: &v1alpha1::HiveCluster,
    image_base_name: &str,
    image_repository: &str,
    pkg_version: &str,
    product_config_manager: &ProductConfigManager,
) -> Result<ValidatedHiveCluster, Error> {
```
Is it useful to have parameters for `image_base_name` and `pkg_version` which are always populated with constants?

Isn't `pkg_version` only determined at runtime?
```rust
}

/// External references resolved during the dereference step.
pub struct DereferencedObjects {
```
The DereferencedObjects struct usually contains unvalidated resources. For instance, some resources are required to have certain contents depending on the cluster specification. Therefore, the dereferenced objects are validated in the validation step and stored (sometimes converted) in the ValidatedCluster struct. The ValidatedCluster struct is then used in the following steps. The DereferencedObjects struct should only be necessary to hand over the resources to the validation step.
Are you requesting a change here?
The smoke test of the opensearch-operator has an assertion that covers all generated resources (https://github.com/stackabletech/opensearch-operator/blob/26.3.0/tests/templates/kuttl/smoke/10-assert.yaml.j2). The other integration tests only add assertions for the resources specific to the test case (https://github.com/stackabletech/opensearch-operator/blob/26.3.0/tests/templates/kuttl/security-config/11-assert.yaml#L29-L41). That leads to relatively high test coverage.
The pull requests for stackabletech/issues#850 should only perform internal changes. If we also add an assertion to the smoke test with the resources produced before this change, then we could be more confident that no unwanted changes are introduced.
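The dereference-to-validate hand-off described in this thread can be sketched as follows (self-contained, with assumed field names; the real structs carry resolved Kubernetes resources rather than plain strings):

```rust
/// Fetched by the dereference step, not yet checked against the spec.
pub struct DereferencedObjects {
    pub opa_config_map: Option<String>,
}

/// Produced by the validation step; later steps use only this struct.
pub struct ValidatedCluster {
    pub opa_base_url: Option<String>,
}

pub fn validate(
    deref: DereferencedObjects,
    opa_required: bool,
) -> Result<ValidatedCluster, String> {
    let opa_base_url = match (opa_required, deref.opa_config_map) {
        // Required by the spec and present: convert to its final form.
        (true, Some(cm)) => Some(format!("http://{cm}:8081/v1")),
        // Required but missing: fail before any resources are built.
        (true, None) => return Err("OPA enabled but ConfigMap not found".to_owned()),
        // Not required by the spec: nothing to validate.
        (false, _) => None,
    };
    Ok(ValidatedCluster { opa_base_url })
}
```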
Co-authored-by: Siegfried Weber <mail@siegfriedweber.net>
…move derives where they do not exist on the upstream type
…pendency for subsequent steps
Summary
- `Clone`, `Debug`, `Eq`, `Hash`, `Ord`, `PartialEq`, `PartialOrd` on `HiveRole` so it can be used as a `BTreeMap` key and in validated structs
- `controller::dereference` module with its own Snafu error enum
- `validate_cluster()`, producing a `ValidatedHiveCluster` struct that proves all validation succeeded before any Kubernetes resources are created
- `ValidatedHiveCluster` owns the resolved product image and per-role/per-rolegroup merged configs; existing build functions are unchanged and receive parameters from the validated structs

Reviewer notes
- `dereference()` and `validate_cluster()` contain no new logic — they are pure extractions of code that was previously inline in `reconcile_hive()`
- The `ValidatedHiveCluster` struct intentionally has fewer fields than a full ownership model would (no `name`/`namespace`/`uid`/`metadata` validated types). This is a "construct but decompose" fail-fast gate: built early in reconcile to prove validation passes, then its fields feed the existing unchanged build functions
- The `controller/` directory coexists with `controller.rs` — Rust treats `controller.rs` as the module root and looks for submodules in `controller/`. Currently just `dereference`, with `validate` and further stages to follow in later PRs
- A `Dereference` wrapper variant in the controller's `Error` enum replaces 3 individual error variants (`ResolveProductImage`, `InvalidOpaConfig`, `InvalidMetadataDatabaseConnection`) that moved into the new module's own error enum. `ConfigureS3Connection` and `ObjectHasNoNamespace` remain in the controller because they are still used by build functions

Test plan
🤖 Generated with Claude Code