Skip to content

Conversation

@d-gubert
Copy link
Member

@d-gubert d-gubert commented Nov 13, 2025

Proposed changes (including videos or screenshots)

This change makes the LivenessManager send a ping request to the app subprocess only if it becomes "idle", i.e. there is no message received from the subprocess in a configured interval.

Previously, the ping mechanism would send requests to the app subprocess in a fixed interval. However, everytime the subprocess sends a message to the parent, it is also telling that it is alive - so there is no need to send a ping to check for "liveness".

This could cause issues if the subprocess is busy enough that it fails to respond to the ping, as 4 consecutive ping timeouts makes the LivenessManager force a restart (kill subprocess, spawn a new one). This restart is essentially unnecessary, as we know the subprocess is not "stuck" - it's just not able to respond to the ping as fast as it is required.

Unnecessary kills will cause messages to apps to be dropped / time out, which impacts user experience.

Issue(s)

ARCH-1807

Steps to test or reproduce

Further comments

Summary by CodeRabbit

  • Improvements

    • Emit signals when runtimes finish constructing and on inbound activity to improve heartbeat accuracy.
    • Enhanced liveness tracking (last-heartbeat timestamp, watchdog, start/stop semantics, pending ping) to avoid unnecessary pings and recover stalled runtimes faster.
    • Messaging now supports lightweight ping messages to better indicate activity and reduce false restarts.
  • Tests

    • Added unit tests covering heartbeat updates, ping timing, timeout handling, and restart-on-failure behavior.
  • Chores

    • Added a changeset noting patch bumps and the ping-behavior fix.

@changeset-bot
Copy link

changeset-bot bot commented Nov 13, 2025

🦋 Changeset detected

Latest commit: 67fb2ff

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 42 packages
Name Type
@rocket.chat/apps-engine Patch
@rocket.chat/meteor Patch
@rocket.chat/apps Patch
@rocket.chat/core-services Patch
@rocket.chat/core-typings Patch
@rocket.chat/fuselage-ui-kit Patch
@rocket.chat/rest-typings Patch
@rocket.chat/ddp-streamer Patch
@rocket.chat/presence Patch
rocketchat-services Patch
@rocket.chat/uikit-playground Patch
@rocket.chat/api-client Patch
@rocket.chat/cron Patch
@rocket.chat/ddp-client Patch
@rocket.chat/freeswitch Patch
@rocket.chat/gazzodown Patch
@rocket.chat/http-router Patch
@rocket.chat/livechat Patch
@rocket.chat/model-typings Patch
@rocket.chat/ui-avatar Patch
@rocket.chat/ui-client Patch
@rocket.chat/ui-contexts Patch
@rocket.chat/web-ui-registration Patch
@rocket.chat/account-service Patch
@rocket.chat/authorization-service Patch
@rocket.chat/omnichannel-transcript Patch
@rocket.chat/presence-service Patch
@rocket.chat/queue-worker Patch
@rocket.chat/stream-hub-service Patch
@rocket.chat/federation-matrix Patch
@rocket.chat/license Patch
@rocket.chat/media-calls Patch
@rocket.chat/omnichannel-services Patch
@rocket.chat/pdf-worker Patch
@rocket.chat/models Patch
@rocket.chat/network-broker Patch
@rocket.chat/omni-core-ee Patch
@rocket.chat/mock-providers Patch
@rocket.chat/ui-video-conf Patch
@rocket.chat/ui-voip Patch
@rocket.chat/instance-status Patch
@rocket.chat/omni-core Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 13, 2025

Walkthrough

Adds heartbeat and constructed events to the Deno runtime, implements a watchdog-style LivenessManager with ping/heartbeat tracking and lifecycle methods (attach/start/stop/ping), updates ProcessMessenger to accept and send ping commands, and adds unit tests for ping/timeout and restart behaviors.

Changes

Cohort / File(s) Summary
Runtime events
packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts
Emits 'constructed' after app construction completes and emits 'heartbeat' when parsing subprocess stdout messages.
Liveness manager
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts
Exports COMMAND_PING; renames timing options (pingRequestTimeoutpingTimeoutInMS, pingFrequencyInMSpingIntervalInMS); adds state (lastHeartbeatTimestamp, watchdogTimeout, pendingPing); adds public methods attach(deno), start(), stop(), getPendingPing(); ping() now returns Promise<boolean> with AbortController/timeout handling; getRuntimeData() includes lastHeartbeatTimestamp; wired to controller 'heartbeat' and 'constructed' events; restart/stop logic adjusted.
Process messenger
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts
Introduces internal `Message = JsonRpc
Tests
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
Adds LivenessManagerTestFixture and tests for heartbeat timestamp updates, stale/regular ping behavior, successful ping/pong, consecutive timeouts, and restart triggering using fake timers and mocks.
Metadata
.changeset/kind-mangos-sit.md
Adds a changeset note describing the ping behavior fix (no code change).

Sequence Diagram(s)

sequenceDiagram
  participant Controller as DenoRuntimeSubprocessController
  participant LM as LivenessManager
  participant PM as ProcessMessenger
  participant SP as Subprocess

  Controller->>Controller: emit 'constructed' after app build
  Controller->>LM: attach(deno) (on 'constructed' -> start)

  loop Subprocess messages
    SP->>PM: stdout (JSON-RPC or pong)
    PM->>Controller: parseStdout -> emit 'heartbeat'
    Controller->>LM: 'heartbeat' event
    LM->>LM: update lastHeartbeatTimestamp
  end

  alt heartbeat stale (watchdog fires)
    LM->>PM: send(COMMAND_PING)
    PM->>SP: write ping
    SP->>PM: pong
    PM->>Controller: parseStdout -> emit 'heartbeat'
    Controller->>LM: 'heartbeat' -> reset counters
  else heartbeat recent
    LM->>LM: skip ping
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • LivenessManager.ts: AbortController/timeouts, watchdog lifecycle, restart logic and counter handling.
    • ProcessMessenger.ts: widened Message type and send strategy transitions.
    • AppsEngineDenoRuntime.ts: emission points for 'constructed' and 'heartbeat'.
    • Tests: fake timers, event wiring and fixture correctness.

Possibly related PRs

Suggested labels

stat: ready to merge, stat: QA assured

Suggested reviewers

  • ggazzo
  • sampaiodiego

Poem

🐰 I tapped a tiny ping with care,
A heartbeat echoed through the air,
Constructed signals, watchdog near,
I guard the subprocess, never fear.
Hop on — the liveness dance is here!

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preventing ping requests when apps are active, which aligns with the changeset and PR objectives.
Linked Issues check ✅ Passed The PR implements the core requirements from ARCH-1807: it treats inbound messages as liveness signals via heartbeat events, prevents unnecessary pings when active, and implements idle-based ping behavior to reduce false-positive restarts.
Out of Scope Changes check ✅ Passed All changes are scoped to the ping/liveness mechanism: heartbeat event emission in DenoRuntimeSubprocessController, idle-based ping logic in LivenessManager, ping message support in ProcessMessenger, and corresponding tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/apps-ping-on-idle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dionisio-bot
Copy link
Contributor

dionisio-bot bot commented Nov 13, 2025

Looks like this PR is ready to merge! 🎉
If you have any trouble, please check the PR guidelines

@d-gubert d-gubert added this to the 7.13.0 milestone Nov 13, 2025
@d-gubert
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (1)

47-49: Consider initializing lastHeartbeatTimestamp to 0 instead of NaN.

While start() sets this to Date.now() before first use, initializing to NaN could be confusing when reading runtime data before start() is called. Using 0 or -1 would be more conventional and safer for numeric comparisons.

Apply this diff to improve clarity:

-	private lastHeartbeatTimestamp = NaN;
+	private lastHeartbeatTimestamp = 0;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c390d5c and 1e822ad.

📒 Files selected for processing (4)
  • packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (2 hunks)
  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (10 hunks)
  • packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (2 hunks)
  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure tests run reliably in parallel without shared state conflicts

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (test, page, expect) consistently

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use descriptive test names that clearly communicate expected behavior

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Maintain test isolation between test cases

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Group related tests in the same file

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: When generating tests, provide complete runnable TypeScript test files with proper imports, clear describe/test blocks, and adherence to these guidelines

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use test.beforeAll() and test.afterAll() for setup and teardown

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
📚 Learning: 2025-09-16T22:08:51.490Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-09-16T22:08:51.490Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure a clean state for each test execution

Applied to files:

  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: 📦 Meteor Build (coverage)
🔇 Additional comments (17)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (7)

7-7: LGTM!

Exporting COMMAND_PING enables consistent usage across modules and prevents duplication.


10-11: LGTM!

The renamed options (pingTimeoutInMS, pingIntervalInMS) are more descriptive and explicitly indicate the unit, improving code clarity.

Also applies to: 30-33


75-77: LGTM!

The heartbeat listener correctly updates the timestamp, enabling the watchdog to track subprocess liveness through activity rather than periodic pings.


86-95: LGTM!

Exposing lastHeartbeatTimestamp in runtime data improves observability of the liveness mechanism.


97-106: LGTM!

Deferring start() until the 'constructed' event ensures the watchdog begins only after the subprocess is fully initialized, aligning with the new event-driven lifecycle.


108-120: LGTM! Core feature implemented correctly.

The watchdog mechanism correctly implements idle detection: pings are sent only when no heartbeat has been received within pingIntervalInMS. The unref() call ensures the timer doesn't prevent process exit.


135-187: LGTM!

The refactored ping() method correctly:

  • Updates lastHeartbeatTimestamp on successful pong (line 144)
  • Adds abort handling to enable clean cancellation (lines 155-158)
  • Cleans up listeners in the finally block (lines 180-182)

The changes maintain existing timeout/restart logic while improving resource management.

packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (4)

16-16: LGTM!

Updating _sendStrategy to accept Message type correctly broadens support to include COMMAND_PING while maintaining type safety.


22-24: LGTM!

Replacing the getter-based API with a direct method improves code clarity while maintaining the same call signature for clients.


5-9: The current import syntax is correct and requires no changes.

The code uses import type { COMMAND_PING } properly. Since COMMAND_PING is a const with the value '_zPING', TypeScript infers its type as the literal type "_zPING". The typeof COMMAND_PING operator correctly extracts this type for the Message union. The COMMAND_PING value itself is never used at runtime in this file—only its type is needed for type annotations. This is valid TypeScript and follows best practices for type-only imports. The successful compilation confirms there are no issues.

Likely an incorrect or invalid review comment.


50-56: No changes needed. The @msgpack/msgpack Encoder's encode method signature accepts unknown type, which safely handles both JsonRpc objects and string values like COMMAND_PING. Line 55 is type-safe and will work correctly.

packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (2)

262-264: LGTM!

Emitting 'constructed' after the app construction completes enables the LivenessManager to start the watchdog at the right time, ensuring the liveness mechanism begins only when the subprocess is fully initialized.


665-665: LGTM!

Emitting 'heartbeat' for every JSON-RPC message (but not PONG) correctly signals subprocess activity to the LivenessManager. This allows the watchdog to skip pings when the subprocess is actively communicating, which is the core feature of this PR.

packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (4)

43-146: LGTM!

The test fixture setup is comprehensive and well-structured:

  • Proper use of fake timers for deterministic testing
  • Thorough mocking of dependencies
  • Appropriate setup/teardown lifecycle
  • Fast timing values for quick test execution

148-162: LGTM!

The test correctly verifies that:

  • start() initializes lastHeartbeatTimestamp to the current time (0 in mock time)
  • The 'heartbeat' event updates the timestamp to the current time (50ms)

164-171: LGTM!

The test correctly verifies that a ping is sent when the heartbeat becomes stale (no heartbeat received within pingIntervalInMS).


173-212: LGTM!

The remaining tests correctly verify:

  1. No ping is sent when a heartbeat occurs mid-interval, keeping the subprocess "fresh"
  2. Successful ping/pong cycle resets the consecutive timeout counter and updates the heartbeat timestamp

The test coverage adequately validates the new watchdog behavior.

@d-gubert d-gubert force-pushed the fix/apps-ping-on-idle branch 2 times, most recently from cba3df1 to 13b9a83 Compare November 13, 2025 20:33
@codecov
Copy link

codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.99%. Comparing base (0691514) to head (67fb2ff).
⚠️ Report is 5 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop   #37496      +/-   ##
===========================================
+ Coverage    68.97%   68.99%   +0.02%     
===========================================
  Files         3358     3358              
  Lines       114240   114241       +1     
  Branches     20537    20537              
===========================================
+ Hits         78792    78823      +31     
+ Misses       33359    33329      -30     
  Partials      2089     2089              
Flag Coverage Δ
e2e 57.47% <ø> (+0.02%) ⬆️
e2e-api 43.77% <ø> (+0.96%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@d-gubert d-gubert marked this pull request as ready for review November 13, 2025 21:53
@d-gubert d-gubert requested a review from a team as a code owner November 13, 2025 21:53
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (1)

263-264: Consider adding a blank line for readability.

The emit('constructed') call logically concludes the setup phase after the construction RPC. Adding a blank line before it would visually separate the RPC call from the lifecycle event emission, improving readability.

Apply this diff:

 	await this.sendRequest({ method: 'app:construct', params: [this.appPackage] });
+
 	this.emit('constructed');
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1e822ad and 13b9a83.

📒 Files selected for processing (4)
  • packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (2 hunks)
  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (11 hunks)
  • packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (2 hunks)
  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-13T01:52:59.469Z
Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 37496
File: packages/apps-engine/src/server/runtime/deno/LivenessManager.ts:122-126
Timestamp: 2025-11-13T01:52:59.469Z
Learning: In JavaScript and Node.js, clearTimeout() and clearInterval() are interchangeable because setTimeout() and setInterval() share the same ID pool. Using clearTimeout() on an interval timer or clearInterval() on a timeout timer works correctly, though using matching pairs (setTimeout/clearTimeout, setInterval/clearInterval) is preferred for code clarity.

Applied to files:

  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts
🧬 Code graph analysis (1)
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (3)
packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (1)
  • DenoRuntimeSubprocessController (90-707)
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (1)
  • ProcessMessenger (11-59)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (2)
  • LivenessManager (21-240)
  • COMMAND_PING (7-7)
🔇 Additional comments (15)
packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (1)

665-666: LGTM! Heartbeat emission aligns with PR objectives.

Emitting the heartbeat event for every valid non-PONG message from the subprocess correctly signals ongoing activity and prevents unnecessary pings when the subprocess is actively communicating. This placement before message handling ensures that liveness is recorded regardless of processing outcomes.

packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (4)

41-132: Excellent test fixture design!

The test fixture is well-structured with:

  • Proper lifecycle management (setup/teardown at both fixture and test levels)
  • Comprehensive mocks that faithfully implement the EventEmitter interfaces
  • Fake timers for deterministic timing behavior
  • Fast timeout/interval values to keep tests quick

This design enables thorough testing of the LivenessManager's timing-sensitive behavior.


146-160: LGTM! Heartbeat timestamp tracking is correctly validated.

The test properly verifies that:

  1. The timestamp is initialized to 0 when start() is called (via the constructed event)
  2. The timestamp updates to the current time when a heartbeat event is emitted

162-185: LGTM! Ping behavior tests validate the core PR objective.

These tests correctly verify:

  1. testPingWhenHeartbeatStale: A ping is sent when no heartbeat has been received for the full interval
  2. testNoPingWhenHeartbeatRecent: A recent heartbeat prevents an unnecessary ping

This validates that the watchdog only pings when the subprocess appears idle, as intended by the PR.


187-210: LGTM! Successful ping/pong validation is thorough.

The test correctly validates the complete ping/pong cycle:

  1. Ping is sent after the interval elapses
  2. A simulated pong response resets the consecutive timeout counter
  3. The heartbeat timestamp is updated to reflect the successful round-trip

This ensures that successful pings maintain the liveness state and prevent unnecessary restarts.

packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (10)

7-7: LGTM! Exporting COMMAND_PING improves testability.

Exporting the constant enables tests and other modules (like ProcessMessenger) to reference the canonical ping command value, avoiding duplication and ensuring consistency.


10-11: LGTM! Option names now explicitly include time units.

Renaming pingRequestTimeout to pingTimeoutInMS and pingFrequencyInMS to pingIntervalInMS makes the units explicit and improves code clarity.


47-49: LGTM! New fields support heartbeat-aware liveness tracking.

The additions are well-designed:

  • watchdogTimeout properly tracks the interval for cleanup
  • lastHeartbeatTimestamp initialized to NaN clearly indicates an uninitialized state until start() is called

75-77: LGTM! Heartbeat listener correctly updates liveness timestamp.

Setting up the heartbeat listener in the constructor ensures that every subprocess heartbeat updates the timestamp, allowing the watchdog to detect ongoing activity and skip unnecessary pings.


87-90: LGTM! Runtime data now includes heartbeat timestamp for observability.

Exposing lastHeartbeatTimestamp via getRuntimeData() enables monitoring and debugging of the liveness mechanism, consistent with the existing metrics.


104-106: LGTM! Deferring start until construction completes is correct.

Listening for the constructed event ensures that liveness monitoring only begins after the app subprocess is fully initialized, preventing premature pings during setup.


108-120: LGTM! Watchdog implements smart idle-only pinging.

The watchdog correctly:

  1. Initializes the heartbeat timestamp when monitoring starts
  2. Checks elapsed time since the last heartbeat before pinging
  3. Skips pings when recent activity has been detected
  4. Uses unref() to avoid blocking process shutdown

This implementation aligns perfectly with the PR objective of preventing pings when the subprocess is actively communicating.


122-126: LGTM! Clean shutdown of liveness monitoring.

The stop() method properly:

  1. Aborts any in-flight ping operations
  2. Clears the watchdog interval
  3. Resets the timeout reference

This ensures graceful cleanup of monitoring resources.


138-186: LGTM! Enhanced ping method with proper async handling and cleanup.

The updated ping() method properly:

  1. Returns a Promise<void> for better async coordination
  2. Updates lastHeartbeatTimestamp on successful pong to maintain liveness state
  3. Handles abort signals for graceful cancellation
  4. Cleans up abort listeners in the finally block to prevent memory leaks
  5. Uses the renamed pingTimeoutInMS option

The internal error handling (restart on timeout limit, silent abort on stop) is appropriate for the watchdog pattern.


194-195: LGTM! Using stop() in handleExit ensures comprehensive cleanup.

Calling stop() instead of just emitting abort ensures that both the watchdog interval and any in-flight pings are properly cleaned up when the subprocess exits, improving maintainability.

@d-gubert d-gubert force-pushed the fix/apps-ping-on-idle branch from 6a6505c to c3a50cf Compare November 16, 2025 00:01
@d-gubert d-gubert force-pushed the fix/apps-ping-on-idle branch from c3a50cf to e1c3baa Compare November 16, 2025 00:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6a6505c and e1c3baa.

📒 Files selected for processing (5)
  • .changeset/kind-mangos-sit.md (1 hunks)
  • packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (2 hunks)
  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (11 hunks)
  • packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (2 hunks)
  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts
  • .changeset/kind-mangos-sit.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-13T01:52:59.469Z
Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 37496
File: packages/apps-engine/src/server/runtime/deno/LivenessManager.ts:122-126
Timestamp: 2025-11-13T01:52:59.469Z
Learning: In JavaScript and Node.js, clearTimeout() and clearInterval() are interchangeable because setTimeout() and setInterval() share the same ID pool. Using clearTimeout() on an interval timer or clearInterval() on a timeout timer works correctly, though using matching pairs (setTimeout/clearTimeout, setInterval/clearInterval) is preferred for code clarity.

Applied to files:

  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts
🧬 Code graph analysis (2)
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (1)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (1)
  • COMMAND_PING (7-7)
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (3)
packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (1)
  • DenoRuntimeSubprocessController (90-707)
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (1)
  • ProcessMessenger (11-59)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (2)
  • LivenessManager (21-243)
  • COMMAND_PING (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 📦 Build Packages
  • GitHub Check: CodeQL-Build
  • GitHub Check: CodeQL-Build
🔇 Additional comments (4)
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (2)

10-10: COMMAND_PING import addresses previous review feedback.

The test now correctly imports COMMAND_PING from the LivenessManager module instead of duplicating the constant, which ensures consistency and eliminates the risk of tests using an incorrect value if the constant changes.


146-252: Excellent test coverage of the watchdog behavior.

The test suite comprehensively verifies the new idle-based ping mechanism:

  • testHeartbeatUpdatesTimestamp confirms heartbeat events update the timestamp
  • testPingWhenHeartbeatStale verifies pings are sent when the subprocess is idle
  • testNoPingWhenHeartbeatRecent validates that pings are suppressed when heartbeat is recent (core PR objective)
  • testSuccessfulPingPong tests the full ping/pong cycle
  • testConsecutiveTimeouts verifies timeout tracking and reset behavior

This directly validates the PR's goal of preventing unnecessary pings when apps are active.

packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (2)

112-124: Watchdog implementation correctly prevents unnecessary pings.

The start() method implements the core PR objective: it only sends pings when the subprocess has been idle (no heartbeat within pingIntervalInMS). The check on lines 116-118 compares the time since last heartbeat against the ping interval, skipping the ping if a recent heartbeat was received. This prevents the false-positive restarts described in the PR objectives where busy subprocesses couldn't respond to pings quickly enough.


78-81: Heartbeat tracking reduces false-positive restarts.

The heartbeat event listener updates lastHeartbeatTimestamp and resets the timeout counter, treating any inbound subprocess message as a liveness signal. This is a key improvement that prevents unnecessary restarts of active subprocesses, directly addressing the issue described in ARCH-1807.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1c3baa and 06bb922.

📒 Files selected for processing (3)
  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (11 hunks)
  • packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (2 hunks)
  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-13T01:52:59.469Z
Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 37496
File: packages/apps-engine/src/server/runtime/deno/LivenessManager.ts:122-126
Timestamp: 2025-11-13T01:52:59.469Z
Learning: In JavaScript and Node.js, clearTimeout() and clearInterval() are interchangeable because setTimeout() and setInterval() share the same ID pool. Using clearTimeout() on an interval timer or clearInterval() on a timeout timer works correctly, though using matching pairs (setTimeout/clearTimeout, setInterval/clearInterval) is preferred for code clarity.

Applied to files:

  • packages/apps-engine/src/server/runtime/deno/LivenessManager.ts
🧬 Code graph analysis (1)
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (1)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (1)
  • COMMAND_PING (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 📦 Build Packages
  • GitHub Check: CodeQL-Build
  • GitHub Check: CodeQL-Build
🔇 Additional comments (16)
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (5)

9-9: LGTM: Clean type union for message variants.

The Message type union appropriately models the two message types that can be sent through the messenger. Once the import on line 5 is corrected, this will properly type COMMAND_PING as the literal string "_zPING".


16-16: LGTM: Strategy signature widened correctly.

Updating the _sendStrategy signature to accept Message instead of JsonRpc is the correct approach to support both JSON-RPC messages and ping commands.


22-24: LGTM: Simplified API from getter to direct method.

The change from a getter-based accessor to a direct method call simplifies the API and is more intuitive. The call signature remains compatible (messenger.send(message)), and the new signature properly enforces the Message type constraint.


50-52: LGTM: Strategy method parameter types updated consistently.

The parameter type update to Message maintains consistency with the _sendStrategy signature.


54-56: No action required — encoder compatibility is confirmed.

The @msgpack/msgpack Encoder class natively supports serializing both objects (JsonRpc) and strings (COMMAND_PING). This is standard functionality for the msgpack serialization format. The Message union type correctly defines the expected parameter types, and TypeScript enforces type safety. The change is safe and correct as-is.

packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (11)

7-7: LGTM: Ping command constant properly exported.

The COMMAND_PING constant is appropriately defined and exported for use in ProcessMessenger.ts. The value '_zPING' is distinctive enough to avoid conflicts with regular messages.


10-11: LGTM: Option names clarified and defaults are reasonable.

The renamed options (pingTimeoutInMS and pingIntervalInMS) are more descriptive. The defaults (1s timeout, 10s interval) align well with the new idle-based ping strategy described in the PR objectives.


47-52: LGTM: State fields support watchdog pattern effectively.

The new fields are well-suited for the watchdog implementation:

  • watchdogTimeout properly tracks the interval timer
  • lastHeartbeatTimestamp initialized to NaN ensures initial comparisons fail safely
  • pendingPing enables testability of async ping behavior

78-81: LGTM: Heartbeat handling implements the core optimization.

This is the key improvement from the PR objectives: treating inbound messages as liveness signals. Resetting both the timestamp and the consecutive timeout counter on heartbeat events ensures the watchdog only pings idle processes.


90-99: LGTM: Runtime data expanded to include heartbeat timestamp.

Adding lastHeartbeatTimestamp to the runtime data provides valuable observability for monitoring subprocess liveness.


101-110: LGTM: Watchdog starts only after app construction.

Deferring the watchdog start until the 'constructed' event ensures the liveness checks don't begin until the subprocess is fully initialized, preventing false positives during startup.


112-131: LGTM: Watchdog implements idle-based ping correctly.

The watchdog logic effectively implements the PR's core objective:

  • Line 116-118: Only pings when lastHeartbeatTimestamp indicates the subprocess has been idle for pingIntervalInMS
  • Line 130: unref() prevents the timer from keeping the Node process alive
  • Lines 120-127: Graceful handling of synchronous send failures

This ensures busy subprocesses aren't interrupted with unnecessary pings.


133-138: LGTM: Stop method cleanly tears down watchdog state.

The stop() method properly:

  • Aborts any in-flight ping via the event emitter
  • Clears the watchdog interval (note: clearInterval is preferred over clearTimeout for clarity, though they're interchangeable)
  • Nulls out tracking state

140-142: LGTM: Test accessor for ping state.

The getPendingPing() method appropriately exposes the pending ping promise for test validation.


151-201: LGTM: Comprehensive ping implementation with proper cleanup.

The refactored ping() method correctly:

  • Tracks pending ping state for testability (line 154)
  • Updates lastHeartbeatTimestamp on successful pong (line 160)
  • Handles timeout, abort, and success paths appropriately
  • Cleans up event listeners in the finally block (line 197)
  • Properly sequences the promise setup before sending the ping command (line 200)

The logic correctly implements the watchdog pattern and integrates with the abort controller.


229-230: LGTM: Restart properly stops watchdog first.

Calling stop() at the beginning of restartProcess() ensures:

  • The watchdog interval is cleared
  • Any pending ping is aborted
  • State is properly reset before the restart attempt

This prevents the watchdog from interfering during the restart sequence.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (1)

41-51: Consider fully restoring real timers after the fixture

Using mock.timers.enable() in setupFixture and mock.timers.reset() in teardownFixture is fine, but it may leave timer mocking enabled for other tests in the same process. Consider calling mock.timers.disable() (or equivalent) in teardownFixture to fully restore real timers, if your Node version supports it.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 06bb922 and 67fb2ff.

📒 Files selected for processing (1)
  • packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (3)
packages/apps-engine/src/server/runtime/deno/AppsEngineDenoRuntime.ts (1)
  • DenoRuntimeSubprocessController (90-707)
packages/apps-engine/src/server/runtime/deno/ProcessMessenger.ts (1)
  • ProcessMessenger (11-57)
packages/apps-engine/src/server/runtime/deno/LivenessManager.ts (2)
  • LivenessManager (21-254)
  • COMMAND_PING (7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: 📦 Build Packages
  • GitHub Check: CodeQL-Build
  • GitHub Check: CodeQL-Build
🔇 Additional comments (7)
packages/apps-engine/tests/server/runtime/deno/LivenessManager.spec.ts (7)

53-144: Mock wiring and cleanup for controller/subprocess look solid

The setup/teardown logic for the mock controller, subprocess, messenger, and the send spy is consistent with how LivenessManager is described to behave. Event listener delegation via local EventEmitter instances plus livenessManager.stop() and removeAllListeners() in teardown should prevent leaked timers or listeners.


146-160: Heartbeat timestamp test matches intended semantics

This test correctly asserts that lastHeartbeatTimestamp starts at the initial mocked time (0) and moves forward on heartbeat, aligned with how the manager tracks liveness using heartbeats plus mocked time.


162-169: Ping-on-stale-heartbeat behavior is well covered

Advancing time by exactly one ping interval after constructed and asserting a single COMMAND_PING send verifies the watchdog’s basic “stale heartbeat ⇒ ping” behavior cleanly.


171-185: Recent heartbeat suppressing ping is correctly tested

Ticking half the interval, emitting heartbeat, then ticking the remaining half and asserting no ping ensures the “active subprocess ⇒ no ping” rule is enforced, which directly addresses the PR’s objective of avoiding pings while the app is busy.


187-210: Successful ping/pong flow and counters are validated well

This test robustly verifies that a ping is sent, a delayed pong resets pingTimeoutConsecutiveCount, and lastHeartbeatTimestamp tracks the simulated latency using mocked time. It closely matches the documented ping() behavior.


212-252: Consecutive timeout tracking and reset logic look correct

The test exercises multiple ping timeouts, checks that pingTimeoutConsecutiveCount increments as expected, and validates that a subsequent heartbeat both suppresses the next ping and resets the timeout counter back to 0. This gives good coverage of the timeout bookkeeping.


254-298: Restart-on-timeout-limit behavior is accurately asserted

By lowering consecutiveTimeoutLimit, forcing two timeouts, awaiting getPendingPing() each time, and spying on restartProcess, this test cleanly validates that hitting the configured timeout limit triggers a restart with the expected reason string. Using the public getRuntimeData() for counter checks keeps it aligned with the manager’s exposed diagnostics.

@ggazzo ggazzo added the stat: QA assured Means it has been tested and approved by a company insider label Nov 17, 2025
@dionisio-bot dionisio-bot bot added the stat: ready to merge PR tested and approved waiting for merge label Nov 17, 2025
@ggazzo ggazzo merged commit ec22d9c into develop Nov 17, 2025
87 of 89 checks passed
@ggazzo ggazzo deleted the fix/apps-ping-on-idle branch November 17, 2025 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stat: QA assured Means it has been tested and approved by a company insider stat: ready to merge PR tested and approved waiting for merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants