Skip to content

Conversation

@basvandijk
Copy link
Collaborator

@basvandijk basvandijk commented Dec 30, 2025

What?

Automate the optional setup of the PrometheusVm by either setting enable_metrics = True in the system_test bazel macro or running a test with bazel test //rs/tests/my_test --test_env=ENABLE_METRICS=1.

Why?

Automating the setup of the PrometheusVm:

  • Reduces code since no manual setup code like the following is needed in setup functions anymore:
    PrometheusVm::default()
        .start(&env)
        .expect("failed to start prometheus VM");
    ...
    env.sync_with_prometheus();
  • Reduces setup time since the PrometheusVm is started in parallel with the rest of the testnet setup.
  • The env.sync_with_prometheus() is now done periodically in the background meaning that topology changes are automatically picked up by Prometheus.
  • Since the setup of the PrometheusVm can now be done dynamically via --test_env=ENABLE_METRICS=1 you can quickly enable it for a test where you need some metrics for debugging without touching any code and thus no waiting for a rebuild.

How?

When ENABLE_METRICS=1 the test driver will run a metrics_setup task in parallel with the setup task which will start a PrometheusVm. Additionally a metrics_sync task is run in the background which will periodically call env.sync_with_prometheus() to sync the IC topology with the Prometheus configuration.

New Prometheus target JSON files are only uploaded to the PrometheusVm when they're different (i.e. have a different hash) than the already uploaded files.

Future Work

Automatically execute env.download_prometheus_data_dir_if_exists() on teardown like in rs/tests/consensus/consensus_performance.rs.

@github-actions github-actions bot added the feat label Dec 30, 2025
@basvandijk basvandijk added the CI_ALL_BAZEL_TARGETS Runs all bazel targets and uploads them to S3 label Jan 5, 2026
@basvandijk basvandijk marked this pull request as ready for review January 5, 2026 13:34
@basvandijk basvandijk requested review from a team as code owners January 5, 2026 13:34
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, respond to each of the numbered items in the previous
    section, declare one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.

If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.

GOVERNANCE_CHECKLIST_REMINDER_DEDUP

Copy link
Contributor

@NikolaMilosa NikolaMilosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is quite different to how logs are done:

  1. these rely on an environment variable presence
  2. metrics default to disabled whereas logs default to enabled (no issues here as these serve different purpose)

Should we do some work to move logs to rely on environment variables as well?

@basvandijk
Copy link
Collaborator Author

@NikolaMilosa

Should we do some work to move logs to rely on environment variables as well?

The advantage of environment variables for metrics is that you can enable them dynamically without touching any code. We don't need this for logs since logs should always be enabled. Additionally for logs setting an environment variable is not enough since there we also need to depend on extra runtime dependencies.

Copy link
Contributor

@mbjorkqvist mbjorkqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @basvandijk !

@basvandijk basvandijk removed the CI_ALL_BAZEL_TARGETS Runs all bazel targets and uploads them to S3 label Jan 6, 2026
Copy link
Contributor

@daniel-wong-dfinity-org daniel-wong-dfinity-org left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for rs/tests/nns/sns/lib/src/sns_deployment.rs, the one file owned by Governance team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants