-
Notifications
You must be signed in to change notification settings - Fork 2
fix: record failed lifecycle operations #253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds an event-driven resource state system: new Event type with Apply handlers for refresh/create/update/delete, propagates resource operation errors into ResourceData.Error, updates planner/tests to emit error events, and integrates event application into workflow activities and logging. Changes
Sequence DiagramsequenceDiagram
participant WA as Workflow Activity
participant Evt as Event
participant RC as Resource Context
participant Res as Resource Implementation
participant RD as ResourceData
WA->>Evt: Apply(ctx, rc)
Evt->>Evt: switch on Type
alt Create
Evt->>RC: load resource handler
Evt->>Res: Create(ctx)
alt success
Res-->>RD: updated state
else failure
Res-->>Evt: error
Evt->>RD: set Error, NeedsRecreate
end
else Update
Evt->>Res: Update(ctx)
alt success
Res-->>RD: updated state
else failure
Res-->>Evt: error
Evt->>RD: set Error
end
else Delete
Evt->>Res: Delete(ctx)
alt success
Res-->>RD: removed
else failure
Res-->>Evt: error (propagate)
end
else Refresh
Evt->>Res: Refresh(ctx)
Evt->>RD: preserve NeedsRecreate/Error or surface refresh error
end
Evt->>RC: ToResourceData(resource)
Evt-->>WA: return error or nil
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/internal/workflows/activities/apply_event.go (1)
136-167: Task logs can show success even when Create/Update fails.
event.Applycaptures Create/Update failures inResource.Errorwithout returning an error, so the current log path marks these as success. This makes task logs misleading while the plan still fails later. Consider loggingResourceError()as an error entry but not returning it, so state persistence isn’t blocked.✅ Suggested fix (log resource error without failing activity)
start := time.Now() - applyErr := event.Apply(ctx, rc) + applyErr := event.Apply(ctx, rc) + resourceErr := event.ResourceError() duration := time.Since(start) fields["duration_ms"] = duration.Milliseconds() if applyErr != nil { fields["success"] = false fields["error"] = applyErr.Error() @@ return applyErr } + + if resourceErr != nil { + fields["success"] = false + fields["error"] = resourceErr.Error() + err := log(task.LogEntry{ + Message: fmt.Sprintf("error while %s resource %s", verb, resourceIdentifier), + Fields: fields, + }) + if err != nil { + return fmt.Errorf("failed to record event error: %w", err) + } + return nil + } fields["success"] = true
🤖 Fix all issues with AI agents
In `@server/internal/resource/event_test.go`:
- Around line 127-133: The success branch in the test currently only checks
resource equality and can miss unexpected errors from event.Apply; update the
else branch to assert.NoError(t, err) before asserting equality so that when
tc.expectedErr == "" you first verify err == nil (call assert.NoError with the
test context t and the err returned by event.Apply) and then check that
event.Resource equals expected; reference event.Apply, err, event.Resource and
tc.expectedErr when locating the change.
🧹 Nitpick comments (1)
server/internal/resource/event.go (1)
45-65: Optional: guard against nil Event/Resource before registry lookup.
This prevents a possible nil dereference and improves error messaging if upstream ever passes a malformed event.🔧 Suggested defensive check
func (e *Event) Apply(ctx context.Context, rc *Context) error { + if e == nil || e.Resource == nil { + return errors.New("event has no resource data") + } resource, err := rc.Registry.Resource(e.Resource) if err != nil { return err }
Adds a new `Event.Apply` method that's responsible for calling the correct lifecycle method on the resource, managing the resource data fields, and determining when to return an error. This change helps to consolidate the resource model logic in the `resource` package. PLAT-212
Adds the ability to record partial failures from resource lifecycle methods by adding a new `ResourceData.Error` field. This field is populated when an error occurs during `Create` and `Update` events. In this case, we'll persist failed resource back to the state before we terminate the plan. Combined with the existing `NeedsCreate` field, we can record when an operation needs to be retried on the next run. Recording partial failures also makes it so resources can be properly deleted if the user opts to perform a delete rather than a retry. PLAT-212
71606db to
95f2bbd
Compare
Summary
Adds the ability to record partial failures from resource lifecycle methods by adding a new
ResourceData.Errorfield. This field is populated when an error occurs duringCreateandUpdateevents. In this case, we'll persist failed resource back to the state before we terminate the plan. Combined with the existingNeedsCreatefield, we can record when an operation needs to be retried on the next run. Recording partial failures also ensures resources can be properly deleted if the user opts to delete rather than retry.Changes
This PR is split into two commits:
workflowspackage into theresourcepackage to consolidate that logic and make it easier to unit test.Errorfield toResourceDataplanCreatesto ensure failed updates are retriedCreateandUpdateeventsWorkflows.applyEventsto propagate lifecycle errors after persisting the resource back to stateTesting
To test the issue from the ticket:
PLAT-212
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.