-
Notifications
You must be signed in to change notification settings - Fork 594
HDDS-13590. Refactor HealthyPipelineSafeModeRule to not use PipelineReportFromDatanode #9651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
sumitagrawl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@priyeshkaratha Have few comments, logs will increase too much with change and rule once satisfied may not be satisfied next time, which is change in behavior. Check the impact.
|
|
||
| LOG.info("Found {} open RATIS/THREE pipelines", openPipelines.size()); | ||
| currentHealthyPipelineCount = (int) openPipelines.stream() | ||
| .filter(this::isPipelineHealthy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier, for every pipeline event, pipeline info is logged once. But now it will be logged for all pipeline for every event, can result in a lot of INFO logs for pipeline only.
Also will be triggered for EC pipeline type to be logged which is not logged earlier.
| // Verify pipeline has all 3 nodes | ||
| List<DatanodeDetails> nodes = pipeline.getNodes(); | ||
| if (nodes.size() != 3) { | ||
| LOG.info("Pipeline {} is not healthy: has {} nodes instead of 3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logging will happen now 3 times for every pipeline event, 2 times in validate() as called pre-process() and post-process(), and one time in getStatus(). And this will be done for every pipeline in the SCM for each event.
| "finalization. Bypassing healthy pipeline safemode rule."); | ||
| return true; | ||
| } | ||
| // Query PipelineManager directly for healthy pipeline count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current implementation, if Rule is satisfied once, its marked exit. Never enter again. But with having dynamic rule check, one time may be success and other time may be failure.
This may not satisfy the purpose.
What changes were proposed in this pull request?
Refactored HealthyPipelineSafeModeRule to make it simpler and more reliable.
Earlier, it depended on events and internal state to track pipelines. Now, it directly checks the current state from the PipelineManager whenever validation is needed.
HealthyPipelineSafeModeRule.java
Pipeline has exactly 3 nodes
All nodes are in healthy state (NodeStatus.inServiceHealthy())
All nodes are registered with the node manager
SafeModeMetrics.java
Replaced incCurrentHealthyPipelinesCount() with setNumCurrentHealthyPipelines(long)
Test Updates
What is the link to the Apache JIRA
HDDS-13590
How was this patch tested?
Tested using existing testcases written for safemode validation rules and Safemode Manager.
Validated using forked CI.