feat: automatically deploy a Prometheus VM in testnets #8201
This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):
1. Update `unreleased_changelog.md` (if there are behavior changes, even if they are non-breaking).
2. Are there BREAKING changes?
3. Is a data migration needed?
4. Security review?
How to Satisfy This Automatic Review
1. Go to the bottom of the pull request page.
2. Look for where it says this bot is requesting changes.
3. Click the three dots to the right.
4. Select "Dismiss review".
5. In the text entry box, respond to each of the numbered items in the previous section, declare one of the following:
   - Done.
   - $REASON_WHY_NO_NEED. E.g. for `unreleased_changelog.md`, "No canister behavior changes.", or for item 2, "Existing APIs behave as before.".
Brief Guide to "Externally Visible" Changes
"Externally visible behavior change" is very often due to some NEW canister API.
Changes to EXISTING APIs are more likely to be "breaking".
If these changes are breaking, make sure that clients know how to migrate and how to
maintain their continuity of operations.
If your changes are behind a feature flag, then do NOT add entries to
unreleased_changelog.md in this PR! Rather, add entries later, in the PR
that enables these changes in production.
Reference(s)
For a more comprehensive checklist, see here.
NikolaMilosa
left a comment
It is quite different from how logs are done:
- metrics rely on the presence of an environment variable
- metrics default to disabled whereas logs default to enabled (no issue here, as these serve different purposes)
Should we do some work to move logs to rely on environment variables as well?
The advantage of environment variables for metrics is that you can enable them dynamically without touching any code. We don't need this for logs, since logs should always be enabled. Additionally, for logs, setting an environment variable is not enough, since there we also need to depend on extra runtime dependencies.
mbjorkqvist
left a comment
Thanks @basvandijk !
daniel-wong-dfinity-org
left a comment
Approving for rs/tests/nns/sns/lib/src/sns_deployment.rs, the one file owned by Governance team.
What?
Automate the optional setup of the PrometheusVm by either setting `enable_metrics = True` in the `system_test` bazel macro or running a test with `bazel test //rs/tests/my_test --test_env=ENABLE_METRICS=1`.
Why?
Automating the setup of the PrometheusVm:
- `env.sync_with_prometheus()` is now done periodically in the background, meaning that topology changes are automatically picked up by Prometheus.
- With `--test_env=ENABLE_METRICS=1` you can quickly enable it for a test where you need some metrics for debugging without touching any code, and thus without waiting for a rebuild.
How?
When `ENABLE_METRICS=1` is set, the test driver runs a `metrics_setup` task, which starts a PrometheusVm, in parallel with the `setup` task. Additionally, a `metrics_sync` task runs in the background and periodically calls `env.sync_with_prometheus()` to sync the IC topology with the Prometheus configuration.
New Prometheus target JSON files are only uploaded to the PrometheusVm when they differ from (i.e. have a different hash than) the already uploaded files.
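For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the two ideas described above: skipping uploads whose content hash is unchanged, and running a sync step periodically in the background. It is not the code in this PR. `FakePrometheusVm`, `sync_target_file`, `spawn_periodic_sync`, and the use of `DefaultHasher` are all hypothetical stand-ins chosen for illustration; the real test driver may hash, store, and schedule things differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the PrometheusVm: it remembers the hash of the
/// last uploaded version of each target file and counts uploads.
#[derive(Default)]
struct FakePrometheusVm {
    uploaded_hashes: HashMap<String, u64>,
    upload_count: usize,
}

/// Hashes file contents. The hash function used here is an assumption.
fn content_hash(contents: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

/// "Uploads" `contents` under `name` only if its hash differs from the
/// previously uploaded version. Returns true if an upload happened.
fn sync_target_file(vm: &mut FakePrometheusVm, name: &str, contents: &str) -> bool {
    let hash = content_hash(contents);
    if vm.uploaded_hashes.get(name) == Some(&hash) {
        return false; // unchanged, nothing to upload
    }
    vm.uploaded_hashes.insert(name.to_string(), hash);
    vm.upload_count += 1;
    true
}

/// Calls `sync` immediately and then once per `interval` on a background
/// thread. The loop stops when the returned sender is dropped or used.
fn spawn_periodic_sync<F>(
    interval: Duration,
    mut sync: F,
) -> (mpsc::Sender<()>, thread::JoinHandle<()>)
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel();
    let handle = thread::spawn(move || loop {
        sync();
        match stop_rx.recv_timeout(interval) {
            // No stop signal yet: run another sync round.
            Err(RecvTimeoutError::Timeout) => continue,
            // Stop message received or sender dropped: exit the loop.
            _ => break,
        }
    });
    (stop_tx, handle)
}
```

In this sketch, calling `sync_target_file` twice with identical contents uploads only once, and dropping the sender returned by `spawn_periodic_sync` ends the background loop so the thread can be joined on teardown.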
Future Work
Automatically execute `env.download_prometheus_data_dir_if_exists()` on teardown, like in `rs/tests/consensus/consensus_performance.rs`.