[WIP] monit: investigating tests again - using copilot on this one #11255
Conversation
Ha, Copilot cannot be used when the PR comes from one repo to another... 🤦
I managed to get Copilot to work locally in VSCode and to analyse the issue. It seems the problem is with a specific range of Monit versions (5.34.x). Here is Copilot's summary:

Investigation Results: Monit 5.34.x Regression

After analyzing the CI failures, I've identified that the root cause is a known bug in monit version 5.34.x.

Test Failure Analysis

Build #165134 (20251203.7) Results:

Upstream Bug Evidence

According to the official monit changelog:

Version 5.34.1 (Oct 4 2024):

Version 5.35.0 (Apr 10 2025) - Fixes:

Conclusion

The test failures are NOT caused by:

The failures are correctly identifying platforms running the broken monit versions (5.34.1, 5.34.2, 5.34.3, 5.34.4). The tests appropriately receive SIGTERM when monit hangs during operation. The tests are marked as

Recommendation

No changes needed to the monit module or tests. The current
Copilot is quick to make strong statements and declarations, so I would take those with a pinch of salt, but it does look like there is something there. Next step is to try blocking those versions from being installed and see what happens.
Force-pushed from 58badc8 to e01a548.
OK, changed the integration test to skip the test when the installed monit is 5.34.x. First run, all checks passed.
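For illustration, such a version-based skip might look roughly like the sketch below; the task layout, file name, and variable names are hypothetical, not the actual test code.

# Hypothetical sketch (not the real test suite): detect the installed
# monit version and skip the test tasks when a 5.34.x release is found.
- name: Determine installed monit version
  ansible.builtin.command: monit -V
  register: monit_version_output
  changed_when: false

- name: Extract the version number from the output
  ansible.builtin.set_fact:
    monit_version: "{{ monit_version_output.stdout | regex_search('[0-9]+\\.[0-9]+(\\.[0-9]+)?') }}"

- name: Run the monit tests unless a 5.34.x release is installed
  ansible.builtin.include_tasks: monit_tests.yml
  when: not monit_version.startswith('5.34')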
Second run, all passed.
I have sampled the logs: the tests run normally when monit is not 5.34.x, for both lower and higher versions. I will close and reopen a couple more times, for the sake of statistics, but it is looking good.
Round 3, all checks passed.
Round 9 (with delay): Ansible devel, Ubuntu 24.04... and there goes our nice theory down the drain. Exact same error, in the exact same place, on Monit 5.33, so it's not the version, though it does look like the 5.34 jobs were big-time offenders. I will resume this tomorrow.
Round 10, all checks passed.
Round 11, all checks passed.
Round 12, FAIL with SIGTERM (rc=-15). So this is most likely NOT a problem with the monit version.
The wait task was checking 'monit status' (general), but the actual failing command is 'monit status -B httpd_echo' (service-specific). This causes a race where general status succeeds but service queries fail. Update to check the exact command format that will be used.
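Based on that commit message, the updated wait task might look roughly like this; httpd_echo is the service name taken from the failing command, while the retry and delay values are illustrative assumptions.

# Hypothetical readiness check: poll the same service-specific command
# the tests will run, rather than the general 'monit status'.
- name: Wait until monit answers service-specific status queries
  ansible.builtin.command: monit status -B httpd_echo
  register: monit_ready
  until: monit_ready.rc == 0
  retries: 30
  delay: 1
  changed_when: false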
Round 13, all checks passed. Again!
Round 14, all checks passed.
The version restriction was based on incorrect diagnosis. The actual issue was the readiness check validating general status instead of service-specific queries. Now that we check the correct command format, the tests should work across all monit versions.
Round 15 failed for Debian 13 and Alpine 3.21 (both running monit 5.34.x).
Round 16 - failed for Debian 13 (Monit 5.34.3).
Round 17 - FAIL, Alpine 3.22 (Monit 5.35).
Round 18 - FAIL for Debian 13 (Monit 5.34).
After the readiness check succeeds, add a 1-second pause before running actual tests. Monit 5.34.x and 5.35 appear to have a concurrency issue where rapid successive 'monit status -B' calls can cause hangs even though the first call succeeds.
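A minimal sketch of that settle delay, placed right after the readiness check; the 1-second value comes from the commit message above, and the task name is illustrative.

# Hypothetical settle delay: give monit a moment before the rapid
# successive 'monit status -B' calls made by the actual tests.
- name: Give monit a moment to settle before the actual tests
  ansible.builtin.pause:
    seconds: 1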
SUMMARY
This PR tries to leverage Copilot to find out why the monit tests fail/pass inconsistently.
References:
ISSUE TYPE
COMPONENT NAME
monit