[Fix-17817] [Master] Fix workflow timeout alerts failed #17819
base: dev
...inscheduler/server/master/engine/workflow/lifecycle/event/WorkflowTimeoutLifecycleEvent.java
ruanwenjun
left a comment
LGTM, but it would be better to add an IT case.
| * @param modifyBy modifyBy | ||
| */ | ||
| public void sendWorkflowTimeoutAlert(WorkflowInstance workflowInstance, ProjectUser projectUser) { | ||
| public void sendWorkflowTimeoutAlert(WorkflowInstance workflowInstance, ProjectUser projectUser, String modifyBy) { |
Please don't change this; this PR should only fix the alert bug.
| // Calculate remaining time until timeout: timeout - elapsed time | ||
| long delayTime = TimeUnit.MINUTES.toMillis(timeout) | ||
| - (System.currentTimeMillis() - workflowInstance.getStartTime().getTime()); | ||
| // Ensure delayTime is not negative (trigger immediately if already timeout) |
It is not clear to me in which case delayTime might be negative, since System.currentTimeMillis() - workflowInstance.getStartTime().getTime() should always be > 0.
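To make the arithmetic in question concrete, here is a minimal sketch of the remaining-delay calculation from the diff above. The class and method names (TimeoutDelaySketch, remainingDelayMillis) are hypothetical, and the current time is passed in as a parameter so the result is reproducible; this is not the PR's code. The elapsed time is positive as the comment says, but the difference timeout - elapsed still goes negative whenever the elapsed time already exceeds the timeout, which is the case the clamp covers.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutDelaySketch {

    /**
     * Returns the delay in milliseconds until the workflow timeout fires.
     * Hypothetical helper: clamps to 0 so an already-expired timeout triggers immediately.
     */
    static long remainingDelayMillis(int timeoutMinutes, long startTimeMillis, long nowMillis) {
        long elapsed = nowMillis - startTimeMillis;
        long remaining = TimeUnit.MINUTES.toMillis(timeoutMinutes) - elapsed;
        return Math.max(0L, remaining);
    }

    public static void main(String[] args) {
        // 4 minutes into a 10-minute timeout: 6 minutes remain.
        System.out.println(remainingDelayMillis(10, 0L, TimeUnit.MINUTES.toMillis(4)));
        // 15 minutes into a 10-minute timeout: the raw difference is negative, so the result is clamped.
        System.out.println(remainingDelayMillis(10, 0L, TimeUnit.MINUTES.toMillis(15)));
    }
}
```

When such a case could occur in practice (for example, an instance picked up again long after its start time) is an assumption here and is not stated in the thread.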
| private void doWorkflowTimeoutAlert(final WorkflowInstance workflowInstance) { | ||
| // ProjectUser will be built in WorkflowAlertManager | ||
| workflowAlertManager.sendWorkflowTimeoutAlert(workflowInstance, null); |
This should be fixed like #17818; otherwise it will throw an NPE.
8e10927 to 3de2674
In our environment, we've already resolved the task timeout alerts and workflow timeout alerts. Here's how I determine whether a workflow has completed:
final IWorkflowExecutionGraph workflowExecutionGraph = workflowExecutionRunnable.getWorkflowExecutionGraph();
if (workflowExecutionGraph.isAllTaskExecutionRunnableChainFinish()) {
    // all the TaskExecutionRunnable chain in the graph is finish, means the workflow is already finished.
    return;
}
isFinalState() reflects the persistent final state, making it more reliable. isAllTaskExecutionRunnableChainFinish(), on the other hand, only reflects the task completion state in memory; it may not yet have transitioned to the final workflow state.
isFinalState() reflects the persistent final state, making it more reliable. isAllTaskExecutionRunnableChainFinish(), on the other hand, only reflects the task completion state in memory; it may not yet have transitioned to the final workflow state.
In my opinion, if the workflow is already in a completed state in memory, does that mean timeout handling is no longer meaningful?
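The two positions in this exchange can be compared side by side in a small sketch. Everything below is hypothetical: the WorkflowView interface and its two methods only stand in for the persisted check (isFinalState()) and the in-memory check (isAllTaskExecutionRunnableChainFinish()) discussed above, and combining them is just one possible reading, not what the PR does.

```java
public class TimeoutGuardSketch {

    /** Hypothetical view of a workflow; not a DolphinScheduler interface. */
    interface WorkflowView {
        boolean isFinalState();           // stands in for the persisted final state
        boolean isAllTaskChainFinished(); // stands in for the in-memory graph state
    }

    /** Small factory so the two flags can be set directly in examples. */
    static WorkflowView view(final boolean finalState, final boolean chainFinished) {
        return new WorkflowView() {
            @Override
            public boolean isFinalState() {
                return finalState;
            }

            @Override
            public boolean isAllTaskChainFinished() {
                return chainFinished;
            }
        };
    }

    /** Sends the alert only if neither signal says the workflow is done. */
    static boolean shouldSendTimeoutAlert(WorkflowView workflow) {
        return !workflow.isFinalState() && !workflow.isAllTaskChainFinished();
    }

    public static void main(String[] args) {
        System.out.println(shouldSendTimeoutAlert(view(false, false))); // still running
        System.out.println(shouldSendTimeoutAlert(view(false, true)));  // finished in memory only
        System.out.println(shouldSendTimeoutAlert(view(true, true)));   // finished and persisted
    }
}
```

The middle case is the one under debate: tasks are finished in memory but the final state has not been persisted yet. This sketch suppresses the alert there; checking only the persisted state would send it.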
Being equal to zero doesn't seem reasonable either.
final int timeout = workflowInstance.getTimeout();
checkState(timeout > 0, "The workflow timeout: %s must >0 minutes", timeout);
I think it shouldn't be 0, but I need to double-check.
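For reference, the check suggested above can be sketched without any library dependency. The snippet in the comment appears to use a checkState helper (presumably Guava's Preconditions.checkState, which throws IllegalStateException when the condition is false); the plain-Java version below is an assumed equivalent with a hypothetical class and method name.

```java
public class TimeoutValidationSketch {

    /**
     * Rejects zero and negative timeouts.
     * Hypothetical helper mirroring checkState(timeout > 0, ...) from the comment above.
     */
    static int requirePositiveTimeout(int timeoutMinutes) {
        if (timeoutMinutes <= 0) {
            throw new IllegalStateException(
                    String.format("The workflow timeout: %s must be > 0 minutes", timeoutMinutes));
        }
        return timeoutMinutes;
    }

    public static void main(String[] args) {
        System.out.println(requirePositiveTimeout(30));
        try {
            requirePositiveTimeout(0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether 0 is a legal value (for instance, as a "no timeout" marker) is exactly the open question in this thread, so the strict check is only one option.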
@ruanwenjun I've already modified it as requested. Could you please check it again?
ruanwenjun
left a comment
Please remove the UT; this kind of UT helps little. We should add an IT case instead.
| final WorkflowInstance workflowInstance = workflowExecutionRunnable.getWorkflowInstance(); | ||
| final String workflowName = workflowExecutionRunnable.getName(); | ||
|
|
||
| // Check if workflow is still active (not finished) |
| // Check if workflow is still active (not finished) |
| // Check if warning group is configured | ||
| if (workflowInstance.getWarningGroupId() == null) { | ||
| log.info("Skipped sending timeout alert for workflow {} because warningGroupId is null.", workflowName); | ||
| return; | ||
| } |
| // Check if warning group is configured | |
| if (workflowInstance.getWarningGroupId() == null) { | |
| log.info("Skipped sending timeout alert for workflow {} because warningGroupId is null.", workflowName); | |
| return; | |
| } |
If the warningGroupId is null, don't create the event.
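A rough sketch of what this suggestion could look like: the null check moves from the handler to the place where the timeout event would be created, so no event exists when there is no warning group. The TimeoutEvent class and createTimeoutEvent method are hypothetical placeholders, not the classes touched by this PR.

```java
import java.util.Optional;

public class TimeoutEventCreationSketch {

    /** Hypothetical stand-in for a workflow timeout event. */
    static final class TimeoutEvent {
        final String workflowName;
        final int warningGroupId;

        TimeoutEvent(String workflowName, int warningGroupId) {
            this.workflowName = workflowName;
            this.warningGroupId = warningGroupId;
        }
    }

    /** Creates the event only when a warning group is configured. */
    static Optional<TimeoutEvent> createTimeoutEvent(String workflowName, Integer warningGroupId) {
        if (warningGroupId == null) {
            // No warning group: nothing could be alerted, so no event is created.
            return Optional.empty();
        }
        return Optional.of(new TimeoutEvent(workflowName, warningGroupId));
    }

    public static void main(String[] args) {
        System.out.println(createTimeoutEvent("wf-a", null).isPresent());
        System.out.println(createTimeoutEvent("wf-b", 7).isPresent());
    }
}
```

With the check at creation time, the handler no longer needs its own warningGroupId guard or the "Skipped sending timeout alert" log line.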
| import org.mockito.junit.jupiter.MockitoExtension; | ||
|
|
||
| @ExtendWith(MockitoExtension.class) | ||
| class WorkflowTimeoutLifecycleEventHandlerTest { |
Remove this UT.
| import org.mockito.junit.jupiter.MockitoExtension; | ||
|
|
||
| @ExtendWith(MockitoExtension.class) | ||
| class WorkflowStartLifecycleEventHandlerTest { |
Remove this UT.
| import org.mockito.junit.jupiter.MockitoExtension; | ||
|
|
||
| @ExtendWith(MockitoExtension.class) | ||
| class WorkflowTimeoutLifecycleEventTest { |
Remove this UT.
Purpose of the pull request
fix #17817
Brief change log
Verify this pull request
This pull request is code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(or)
Pull Request Notice
If your pull request contains incompatible change, you should also add it to
docs/docs/en/guide/upgrade/incompatible.md