[controller] Consolidate Helix-admin clients into one owner (ZK client re-architecture, PR 1/2) #2872
Open
namithanivead wants to merge 6 commits into
Replace the inline admin.* controller-cluster creation in createControllerClusterIfRequired() with a delegation to helixAdminClient.createVeniceControllerCluster(), so all Helix-admin operations on the controller cluster go through the single ZkHelixAdminClient instead of the duplicate VeniceHelixAdmin.admin.
Step 1 of organizing the controller's ZK clients into one Helix-admin owner (System #1) and one Venice-metadata owner (System #2). No ZK address change.
createVeniceControllerCluster() is a strict superset of the removed inline logic (adds persistBestPossibleAssignment + retry, already used by the HAAS path).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
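The delegation described in this commit can be pictured with a minimal sketch. Every type below is a hypothetical stand-in written for illustration (AdminClientSketch, RecordingAdminClient, ControllerAdminSketch); none of them is the real Venice or Helix class, and the real method bodies are not shown in this PR text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single Helix-admin owner; not the real Venice interface.
interface AdminClientSketch {
  void createVeniceControllerCluster();
}

// Records calls so the delegation can be observed without a ZooKeeper connection.
final class RecordingAdminClient implements AdminClientSketch {
  final List<String> calls = new ArrayList<>();

  @Override
  public void createVeniceControllerCluster() {
    calls.add("createVeniceControllerCluster");
  }
}

// Hypothetical stand-in for the controller side after the change: it holds no admin handle of its own.
final class ControllerAdminSketch {
  private final AdminClientSketch helixAdminClient;

  ControllerAdminSketch(AdminClientSketch helixAdminClient) {
    this.helixAdminClient = helixAdminClient;
  }

  void createControllerClusterIfRequired() {
    // One delegation replaces the inline admin.* calls.
    helixAdminClient.createVeniceControllerCluster();
  }
}
```

Passing the client in through the constructor keeps the controller side free of any second admin handle, which is the property this step is after.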
…Client
Add HelixAdminClient#createVeniceStorageClusterLegacy(clusterName) and delegate createClusterIfRequired() to it. The new method is a verbatim relocation of the inline non-HAAS storage-cluster setup (cluster creation with the same properties + LeaderStandby state model, then controller-cluster resource registration via DelayedAutoRebalancer + CrushRebalanceStrategy), with admin -> helixAdmin.
Step 2 of consolidating all Helix-admin operations behind the single ZkHelixAdminClient (System #1). Removes the now-unused CONTROLLER_CLUSTER_NUMBER_OF_PARTITION constant and three now-unused rebalancer imports from VeniceHelixAdmin. No ZK-address change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
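As a rough illustration of the sequence this commit relocates (create the cluster, add the LeaderStandby state model, register the cluster as a controller-cluster resource, and do nothing if the cluster already exists), here is a sketch against an in-memory fake. FakeHelixAdmin, its method shapes, and the "venice-controllers" cluster name are assumptions made for the example, not the Helix API or the real Venice constant; the cluster properties mentioned in the commit are omitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for the Helix admin handle; it only records operations.
final class FakeHelixAdmin {
  final Set<String> clusters = new HashSet<>();
  final List<String> ops = new ArrayList<>();

  void addCluster(String cluster) {
    clusters.add(cluster);
    ops.add("addCluster:" + cluster);
  }

  void addStateModelDef(String cluster, String stateModel) {
    ops.add("addStateModelDef:" + cluster + ":" + stateModel);
  }

  void addResource(String cluster, String resource, String rebalancer, String strategy) {
    ops.add("addResource:" + cluster + ":" + resource + ":" + rebalancer + ":" + strategy);
  }
}

final class LegacyStorageClusterSketch {
  // Assumed controller-cluster name, for illustration only.
  static final String CONTROLLER_CLUSTER = "venice-controllers";

  private final FakeHelixAdmin helixAdmin;

  LegacyStorageClusterSketch(FakeHelixAdmin helixAdmin) {
    this.helixAdmin = helixAdmin;
  }

  void createVeniceStorageClusterLegacy(String clusterName) {
    if (helixAdmin.clusters.contains(clusterName)) {
      return; // no-op when the cluster already exists
    }
    helixAdmin.addCluster(clusterName);
    helixAdmin.addStateModelDef(clusterName, "LeaderStandby");
    // Register the storage cluster as a resource of the controller cluster.
    helixAdmin.addResource(CONTROLLER_CLUSTER, clusterName, "DelayedAutoRebalancer", "CrushRebalanceStrategy");
  }
}
```

Calling the method twice for the same cluster records three operations in total, because the second call returns early.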
helixAdminClient
Add HelixAdminClient#setupCustomizedStateConfig(clusterName) and route
both remaining storage-Helix reads off the duplicate VeniceHelixAdmin.admin:
- isClusterValid() -> helixAdminClient.isVeniceStorageClusterCreated()
(identical getClusters().contains(...) operation)
- HelixUtils.setupCustomizedStateConfig(admin, ...) ->
helixAdminClient.setupCustomizedStateConfig(...) (uses helixAdmin)
Step 3 of consolidating Helix-admin ops behind ZkHelixAdminClient. After
this, admin is referenced only by its own declaration/construction/close,
which the next commit removes. No ZK-address change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
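The cluster-existence check named in this commit is a plain membership test on the cluster list. A minimal sketch, assuming a hypothetical ClusterListing interface in place of the real admin handle:

```java
import java.util.List;

// Hypothetical stand-in for the one call this check needs from the Helix admin handle.
interface ClusterListing {
  List<String> getClusters();
}

final class StorageClusterCheckSketch {
  private final ClusterListing helixAdmin;

  StorageClusterCheckSketch(ClusterListing helixAdmin) {
    this.helixAdmin = helixAdmin;
  }

  // Same membership test the commit describes: getClusters().contains(clusterName).
  boolean isVeniceStorageClusterCreated(String clusterName) {
    return helixAdmin.getClusters().contains(clusterName);
  }
}
```

Because both the old and the new call sites reduce to this one expression, moving the call from one handle to the other should not change the result as long as both point at the same ZK.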
The VeniceHelixAdmin.admin field was a second ZKHelixAdmin connected to
the same storage ZK as ZkHelixAdminClient's helixAdmin, doing the same
category of work. After Commits 1-3 routed every admin.* operation through
helixAdminClient, the field is removed entirely:
- drop the field, its dedicated ZkClient construction, and admin.close()
- getHelixAdmin() (a raw-Helix test/maintenance seam) now delegates to
HelixAdminClient#getHelixAdmin(), which returns the single helixAdmin
- remove now-unused ZKHelixAdmin / ZNRecordSerializer imports
Step 4 (final) of consolidating all Helix-admin operations behind one
ZkHelixAdminClient (System #1). No ZK-address change. main +
integration-test sources compile; getHelixAdmin() callers in
TestHAASController / TestVeniceHelixAdminWithSharedEnvironment unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
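The getHelixAdmin() seam after this commit can be sketched as a pass-through to the one owned instance. RawAdminHandle, AdminClientOwner, and ControllerSeamSketch are hypothetical names used only for this illustration; the sketch shows the ownership shape, not the real classes.

```java
// Hypothetical stand-in for the raw Helix admin object.
final class RawAdminHandle {
}

// Hypothetical stand-in for the single owner of the raw handle.
final class AdminClientOwner {
  private final RawAdminHandle helixAdmin = new RawAdminHandle();

  RawAdminHandle getHelixAdmin() {
    return helixAdmin;
  }
}

// Hypothetical stand-in for the controller side: no second handle, only delegation.
final class ControllerSeamSketch {
  private final AdminClientOwner helixAdminClient;

  ControllerSeamSketch(AdminClientOwner helixAdminClient) {
    this.helixAdminClient = helixAdminClient;
  }

  RawAdminHandle getHelixAdmin() {
    // Returns the owner's instance rather than constructing or caching another one.
    return helixAdminClient.getHelixAdmin();
  }
}
```

Callers that reach for the raw handle through the seam receive the same object the owner holds, so there is only one handle to manage.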
Cover the new HelixAdminClient#createVeniceStorageClusterLegacy method in
TestZkHelixAdminClient:
- happy path: creates the storage cluster + LeaderStandby state model and
registers it as a controller-cluster resource with DelayedAutoRebalancer
+ CrushRebalanceStrategy (the legacy non-HAAS rebalancer config)
- early-return when the cluster already exists (no addCluster/addResource)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reduce the interface javadoc to the contract (legacy/non-HAAS, creates + registers as a controller-cluster resource, no-op if it already exists), matching the concise style of the sibling methods. Implementation details (rebalancer classes, the VeniceHelixAdmin consolidation history) belong in the impl/commit history, not the interface contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
19f7ae5 to bd8799a
Summary
The end goal (a later PR) is to point the System #2 client at a separate ZK ensemble for backup/HA. That requires first cleanly separating the two systems' clients — this PR is the first step.
What this PR does (Commits 1–4 + tests)
VeniceHelixAdmin held its own ZKHelixAdmin (admin) on zookeeper.address, duplicating the helixAdmin inside ZkHelixAdminClient (same ZK, same work). This PR routes every admin.* call through the single helixAdminClient and deletes the duplicate:
- Commit 1: delegate controller-cluster creation to helixAdminClient.createVeniceControllerCluster().
- Commit 2: add HelixAdminClient#createVeniceStorageClusterLegacy(clusterName) (verbatim relocation, admin → helixAdmin, preserving the non-HAAS DelayedAutoRebalancer + CrushRebalanceStrategy config).
- Commit 3: route isClusterValid → isVeniceStorageClusterCreated, and add HelixAdminClient#setupCustomizedStateConfig(clusterName).
- Commit 4: drop the admin field, its ZkClient construction, and admin.close(). getHelixAdmin() (raw-Helix test seam) now delegates to HelixAdminClient#getHelixAdmin().
- Tests: cover createVeniceStorageClusterLegacy.
After this PR, VeniceHelixAdmin holds exactly three ZK-touching fields with clear ownership: helixAdminClient (all Helix admin, System #1), helixManager (live leader-election session, System #1), and zkClient (Venice metadata, System #2).
Behaviour delta (intentional, benign)
The non-HAAS controller-cluster path now sets persistBestPossibleAssignment=true (via the shared createVeniceControllerCluster), which the old inline code did not. This matches the HAAS path and is benign for stateless controllers.
Testing
- TestZkHelixAdminClient (13 unit tests, incl. 2 new)
- TestHAASController (11 integration tests)
- TestVeniceHelixAdminWithSharedEnvironment#testAddVersionWhenClusterInMaintenanceMode (getHelixAdmin() seam)
- :services:venice-controller:compileJava, :internal:venice-test-common:compileIntegrationTestJava, and spotless all pass.
Follow-up PRs (separate, not blocking this one)
These complete the System #1 / System #2 separation. They are independent follow-ups — this PR does not depend on them and is safe to merge first.
Commit 5: Purify the System #2 client (zkClient) of Helix-data reads
The Venice-metadata zkClient is still (mis)used for a few System #1 (Helix) reads. These must move to the Helix-owned side before the later HA PR can repoint zkClient at another ensemble (otherwise those Helix reads would follow it to the wrong ZK). All exist on main today, unchanged by this PR:
- HelixLiveInstanceMonitor is constructed on the Venice zkClient but watches Helix LIVEINSTANCES: VeniceHelixAdmin.java:769.
- EXTERNALVIEW reads on zkClient: zkClient.exists("/<cluster>/EXTERNALVIEW/<resource>") and zkClient.getChildren("/<cluster>/EXTERNALVIEW") at VeniceHelixAdmin.java:1085 and :1090.
- zkClient.getServers(): VeniceControllerStateModel.java:256 and HelixVeniceClusterResources.java:169.
Commit 6: Make the System #2 boundary structural (explicit metadata client)
Introduce an explicit Venice-metadata client wrapper so the ~50–100 metadata accessors (StoreConfig, Schemas, StoreGraveyard, ExecutionId, OfflinePushStatus, Personas, AdminTopicMetadata, …) across VeniceHelixAdmin, HelixVeniceClusterResources, and VeniceParentHelixAdmin go through one clearly-owned client, turning the #1/#2 split from convention into structure.
Later: the actual HA change
Add a config (e.g. venice.metadata.zk.address) that points only the System #2 client at a separate/backup ZK ensemble. Depends on Commit 5.
🤖 Generated with Claude Code