Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 Priorities machine with remediate-machine anotation when selecting the next machine to be remediated #11495

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Karthik-K-N
Copy link
Contributor

What this PR does / why we need it:

This PR contains the changes to priorities the machine with cluster.x-k8s.io/remediate-machine annotation

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #11385

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

This PR is currently missing an area label, which is used to identify the modified component when generating release notes.

Area labels can be added by org members by writing /area ${COMPONENT} in a comment

Please see the labels list for possible areas.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/needs-area PR is missing an area label label Nov 28, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 28, 2024
@fabriziopandini fabriziopandini changed the title 🌱 Priorities machine with remidiation anotation for remidiation 🌱 Priorities machine with remediate-machine anotation when selecting the next machine to be remediated Dec 2, 2024
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Karthik-K-N thanks for taking up this work!
Let me know if you want some additional clarification

@Karthik-K-N
Copy link
Contributor Author

Thanks for the clarification, I will update it accordingly. Thanks

@Karthik-K-N
Copy link
Contributor Author

Updated with changes for KCP as suggested, Will do for remaining soon.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 17, 2024
@Karthik-K-N
Copy link
Contributor Author

Karthik-K-N commented Dec 17, 2024

Made changes for machine set controller as well, Please take a look when you get some time. Thanks.

@enxebre
Copy link
Member

enxebre commented Dec 17, 2024

can we update the annotation description as well?

current:

// RemediateMachineAnnotation is the annotation used to mark machines that should be remediated by MachineHealthCheck reconciler.

I think it should be:

// RemediateMachineAnnotation request the MachineHealthCheck reconciler to mark a Machine as unhealthy. CAPI builtin remediation will prioritize Machines with the annotation to be remediated.

@Karthik-K-N
Copy link
Contributor Author

RemediateMachineAnnotation

Updated. Thank you.

Comment on lines 1577 to 1595
// Returns the machines to be remediated in the following order
// Machines with RemediateMachineAnnotation annotation if any,
// Machines failing to come up first because
// there is a chance that they are not hosting any workloads (minimize disruption).
func getMachinesToRemediateInOrder(remediateMachines []*clusterv1.Machine) []*clusterv1.Machine {
// From machines to remediate select the machines with RemediateMachineAnnotation annotation.
annotatedMachines := collections.FromMachines(remediateMachines...).Filter(collections.HasAnnotationKey(clusterv1.RemediateMachineAnnotation))

// Filter out the machines which are unique (machines which are not in annotated machines list)
uniqueMachines := collections.FromMachines(remediateMachines...).Difference(annotatedMachines).UnsortedList()

// Sort the machines from newest to oldest.
sort.SliceStable(uniqueMachines, func(i, j int) bool {
return uniqueMachines[i].CreationTimestamp.After(uniqueMachines[j].CreationTimestamp.Time)
})

// combine the annotated machines along with machines to remediate.
return append(annotatedMachines.UnsortedList(), uniqueMachines...)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this alternative implementation

Suggested change
// Returns the machines to be remediated in the following order
// Machines with RemediateMachineAnnotation annotation if any,
// Machines failing to come up first because
// there is a chance that they are not hosting any workloads (minimize disruption).
func getMachinesToRemediateInOrder(remediateMachines []*clusterv1.Machine) []*clusterv1.Machine {
// From machines to remediate select the machines with RemediateMachineAnnotation annotation.
annotatedMachines := collections.FromMachines(remediateMachines...).Filter(collections.HasAnnotationKey(clusterv1.RemediateMachineAnnotation))
// Filter out the machines which are unique (machines which are not in annotated machines list)
uniqueMachines := collections.FromMachines(remediateMachines...).Difference(annotatedMachines).UnsortedList()
// Sort the machines from newest to oldest.
sort.SliceStable(uniqueMachines, func(i, j int) bool {
return uniqueMachines[i].CreationTimestamp.After(uniqueMachines[j].CreationTimestamp.Time)
})
// combine the annotated machines along with machines to remediate.
return append(annotatedMachines.UnsortedList(), uniqueMachines...)
}
// Returns the machines to be remediated in the following order
// - Machines with RemediateMachineAnnotation annotation if any,
// - Machines failing to come up first because
// there is a chance that they are not hosting any workloads (minimize disruption).
func sortMachinesToRemediate(machines []*clusterv1.Machine) {
sort.SliceStable(machines, func(i, j int) bool {
_, iHasRemediateAnnotation := machines[i].Annotations[clusterv1.RemediateMachineAnnotation]
_, jHasRemediateAnnotation := machines[j].Annotations[clusterv1.RemediateMachineAnnotation]
if iHasRemediateAnnotation && !jHasRemediateAnnotation {
return true
}
if !iHasRemediateAnnotation && jHasRemediateAnnotation {
return false
}
return machines[i].CreationTimestamp.After(machines[j].CreationTimestamp.Time)
})
}

(trying to have something easier to read/that sticks to the fact that this is a sorting func)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense, I think I took the complicated way, I just updated as suggested but still kept in a seperate function to easily test, Let me know if I need to make it inplace and enhance the existing test itself.

@@ -1334,10 +1334,10 @@ func (r *Reconciler) reconcileUnhealthyMachines(ctx context.Context, s *scope) (

// Calculates the Machines to be remediated.
// Note: Machines already deleting are not included, there is no need to trigger remediation for them again.
machinesToRemediate := collections.FromMachines(machines...).Filter(collections.IsUnhealthyAndOwnerRemediated, collections.Not(collections.HasDeletionTimestamp)).UnsortedList()
remediateMachines := collections.FromMachines(machines...).Filter(collections.IsUnhealthyAndOwnerRemediated, collections.Not(collections.HasDeletionTimestamp)).UnsortedList()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I would keep the existing name, it seems more clear to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider cluster.x-k8s.io/remediate-machine annotated when selecting the machine to be remediate
4 participants