fix: failed to drain unmanaged clusterqueue #2

Merged: thxCode merged 1 commit into main from fix/managed-toggle-drain on Jun 17, 2026
thxCode (Collaborator) commented Jun 17, 2026

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Signed-off-by: thxCode <thxcode0824@gmail.com>
Copilot AI review requested due to automatic review settings June 17, 2026 13:39

Copilot AI left a comment

Pull request overview

This PR targets a regression where toggling a node out of management (gpustack.ai/managed=false) would orphan single-node scheduling objects without triggering the expected drain/deletion chain in the worker controllers.

Changes:

  • Include systemname.ManagedLabelKey in Node-watch update predicates so managed toggles enqueue reconciles for ResourceFlavors/Cohorts (and related controllers).
  • Add unit test cases that guard the “managed=false nodes are excluded from indexes” behavior for ResourceFlavor/Cohort reconcilers.
  • Remove the --aggressive-event-filtering manager option/flag and its plumbing.
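The first change above can be illustrated with a minimal sketch of the label comparison behind a Node-watch update predicate. This is not the PR's code: the real controllers presumably use controller-runtime predicates and `systemname.ManagedLabelKey`, whereas the function names below (`watchedLabelsChanged`, `nodeUpdateIsRelevant`) are hypothetical and the key's value is assumed from the `gpustack.ai/managed=false` example.

```go
package main

// ManagedLabelKey stands in for systemname.ManagedLabelKey. Its value is an
// assumption taken from the "gpustack.ai/managed=false" example in the overview.
const ManagedLabelKey = "gpustack.ai/managed"

// watchedLabelsChanged reports whether any watched label key differs between
// the old and new label sets: added, removed, or changed value.
func watchedLabelsChanged(oldLabels, newLabels map[string]string, keys ...string) bool {
	for _, k := range keys {
		oldVal, oldOK := oldLabels[k]
		newVal, newOK := newLabels[k]
		if oldOK != newOK || oldVal != newVal {
			return true
		}
	}
	return false
}

// nodeUpdateIsRelevant is a hypothetical stand-in for the update branch of a
// Node-watch predicate: true means the update should enqueue a reconcile.
func nodeUpdateIsRelevant(oldLabels, newLabels map[string]string) bool {
	return watchedLabelsChanged(oldLabels, newLabels, ManagedLabelKey)
}
```

With this shape, a toggle from `gpustack.ai/managed=true` to `false` is reported as relevant, while a change to an unrelated label is filtered out. If the managed key were missing from the watched set, the toggle would be dropped and no drain would be enqueued, which matches the regression described above.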

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/worker/controllers/worker/resourceflavor.go | Node-watch predicate now treats managed-label changes as relevant for enqueuing drain logic. |
| pkg/worker/controllers/worker/resourceflavor_test.go | Adds a case covering unmanaged nodes being excluded by the flavor-profile index (drain expected). |
| pkg/worker/controllers/worker/nodefeature.go | Simplifies Node update predicate path by removing “aggressive” mode plumbing. |
| pkg/worker/controllers/worker/cohort.go | Node-watch predicate now treats managed-label changes as relevant for cohort lifecycle. |
| pkg/worker/controllers/worker/cohort_test.go | Adds a case covering unmanaged nodes being excluded by the cohort-profile index (delete expected). |
| pkg/worker/controllers/worker/clusterqueue.go | Node-watch predicate now includes managed-label changes in label comparison. |
| pkg/manager/option.go | Removes AggressiveEventFiltering option and CLI flag definition. |
| pkg/manager/helper.go | Removes AllowAggressiveEventFiltering() and related state from the manager wrapper. |
| pkg/manager/config.go | Removes config wiring for aggressive event filtering. |
| .claude/skills/gpustack-operator-e2e/SKILL.md | Documents an e2e/manual verification flow for managed-toggle draining behavior. |
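The index behavior that the two new test cases guard can be sketched roughly as follows. This is an illustration under assumptions, not the repository's implementation: the `node` type, `profileLabelKey`, `indexKeys`, `buildIndex`, and `shouldDrain` are all hypothetical names, and treating only the literal value `false` as unmanaged is an assumption.

```go
package main

import "sort"

// managedLabelKey is assumed from the "gpustack.ai/managed=false" example.
const managedLabelKey = "gpustack.ai/managed"

// profileLabelKey is a hypothetical label used only for this illustration.
const profileLabelKey = "example.com/profile"

// node is a simplified stand-in for a Kubernetes Node object.
type node struct {
	Name   string
	Labels map[string]string
}

// indexKeys returns the index keys for a node. Returning nil keeps the node
// out of the index, which is how unmanaged nodes are excluded here.
func indexKeys(n node) []string {
	if n.Labels[managedLabelKey] == "false" {
		return nil
	}
	profile, ok := n.Labels[profileLabelKey]
	if !ok {
		return nil
	}
	return []string{profile}
}

// buildIndex maps each profile to the sorted names of the nodes indexed under it.
func buildIndex(nodes []node) map[string][]string {
	idx := make(map[string][]string)
	for _, n := range nodes {
		for _, k := range indexKeys(n) {
			idx[k] = append(idx[k], n.Name)
		}
	}
	for k := range idx {
		sort.Strings(idx[k])
	}
	return idx
}

// shouldDrain reports whether no indexed node backs the profile any longer,
// the condition under which a reconciler would drain or delete its object.
func shouldDrain(idx map[string][]string, profile string) bool {
	return len(idx[profile]) == 0
}
```

In this sketch, a profile backed only by a node labeled `gpustack.ai/managed=false` has no index entries, so `shouldDrain` returns true for it. The index alone is not enough, though: a reconcile has to be enqueued when the label changes, which is what the predicate changes listed in the table provide.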

Comment thread pkg/worker/controllers/worker/clusterqueue.go
Comment thread pkg/manager/option.go
Comment thread pkg/worker/controllers/worker/resourceflavor.go
Comment thread pkg/worker/controllers/worker/cohort.go
@thxCode thxCode merged commit 1ef027e into main Jun 17, 2026
5 checks passed

2 participants