@ktf
Member

@ktf ktf commented Dec 10, 2025

This moves the forwarding to the earliest possible moment, i.e. when
we are about to insert the messages in a slot. This is the earliest moment
at which we can guarantee that messages will be seen only once.
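
As a rough illustration of the idea (not the actual DPL code), the toy model below forwards each message at the point where it is inserted into a slot. `ToyMessage` and `ToyRelayer` are hypothetical names invented for this sketch; the only assumption carried over from the description is that insertion happens exactly once per message, so forwarding there cannot duplicate it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not DPL types.
struct ToyMessage {
  std::size_t timeslice; // timeslice the message belongs to
  std::string payload;
};

class ToyRelayer {
 public:
  explicit ToyRelayer(std::size_t nSlots) : mSlots(nSlots) { assert(nSlots > 0); }

  // Called once per incoming message, right before it is stored in a slot.
  void relay(const ToyMessage& msg) {
    // Early forward: downstream gets its copy now, before any processing.
    mForwarded.push_back(msg);
    // Slot insertion: the message is kept for local processing.
    mSlots[msg.timeslice % mSlots.size()].push_back(msg);
  }

  // Messages that downstream consumers would have received so far.
  const std::vector<ToyMessage>& forwarded() const { return mForwarded; }

  // Total number of messages currently held in slots.
  std::size_t stored() const {
    std::size_t n = 0;
    for (const auto& slot : mSlots) {
      n += slot.size();
    }
    return n;
  }

 private:
  std::vector<std::vector<ToyMessage>> mSlots;
  std::vector<ToyMessage> mForwarded;
};
```

In this model every relayed message appears exactly once in `forwarded()` and once in a slot, regardless of when the slot is later processed.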

@ktf ktf requested a review from a team as a code owner December 10, 2025 20:07
@github-actions
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5
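
For readers unsure how the `+async-label` comment is meant to be read, here is a small sketch of one way to split such a line into labels to add and labels to remove. It only mirrors the syntax described above (comma-separated labels, `!` prefix for removal); `LabelRequest` and `parseLabelComment` are hypothetical names, and how the bot actually parses comments is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: labels to add and labels to remove.
struct LabelRequest {
  std::vector<std::string> add;
  std::vector<std::string> remove;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos) {
    return "";
  }
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Parse a line of the form "+async-label <l1>, <l2>, !<l3>".
// Returns an empty request if the line does not start with the prefix.
LabelRequest parseLabelComment(const std::string& line) {
  static const std::string prefix = "+async-label";
  LabelRequest req;
  if (line.compare(0, prefix.size(), prefix) != 0) {
    return req;
  }
  const std::string rest = line.substr(prefix.size());
  std::size_t start = 0;
  while (start <= rest.size()) {
    std::size_t comma = rest.find(',', start);
    if (comma == std::string::npos) {
      comma = rest.size();
    }
    const std::string item = trim(rest.substr(start, comma - start));
    if (!item.empty()) {
      if (item[0] == '!') {
        const std::string name = trim(item.substr(1));
        if (!name.empty()) {
          req.remove.push_back(name); // "!label" requests removal
        }
      } else {
        req.add.push_back(item); // plain "label" requests addition
      }
    }
    start = comma + 1;
  }
  return req;
}
```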

@ktf
Member Author

ktf commented Dec 10, 2025

@shahor02 this works in my synthetic tests (stage/bin/o2-testworkflows-early-forwarding -s --severity detail --early-forward-policy=always). In the end I refactored the code to find the earliest spot where messages are guaranteed to be seen only once, and I moved the early forward there.

@davidrohr @shahor02 I have noticed that the early forwarding is disabled by default. Is this expected?

@ktf
Member Author

ktf commented Dec 10, 2025

@jgrosseo @nicolaspoffley I expect this to improve parallelism on hyperloop as well.

@shahor02
Collaborator

@ktf I did not expect the EF to be disabled. When I was debugging the slow turnover of Polaris jobs, I thought the forwarding was done at the beginning of the run method. Wasn't this the intended behaviour of the EF?

@ktf
Member Author

ktf commented Dec 10, 2025

@shahor02 I need to have a better look. Maybe it is just my small reproducer that is wrong.

I also see there are some issues with some of the tests. I will debug further tomorrow morning.

@alibuild
Collaborator

alibuild commented Dec 11, 2025

Error while checking build/O2/fullCI_slc9 for 13018ab at 2025-12-11 02:12:

No log files found

Full log here.

@ktf
Member Author

ktf commented Dec 11, 2025

Ok, fixed the off-by-one issue with multiparts.

@alibuild
Collaborator

alibuild commented Dec 11, 2025

Error while checking build/O2/fullCI_slc9 for f6dfcce at 2026-01-06 21:52:

## sw/BUILD/o2codechecker-latest/log
100% tests passed, 0 tests failed out of 1


## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc9_x86-64/O2/14910-slc9_x86-64-local9/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 137
[ERROR] Workflow crashed - PID 8327 (EMCALRawToCellConverterSpec) did not exit correctly however it's not clear why. Exit code forced to 128.
[ERROR] Unable to pass configuration to children
[8912:TRD-Digits-proxy]: [FATAL] error while setting up workflow in o2-qc: Error while parsing serialised workflow
[ERROR] Workflow crashed - PID 8890 (PHS-ClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8799 (MFT-MFTAsyncTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8785 (GLO-MUONTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8874 (MFT-MFTClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8807 (TRD-PHTrackMatch-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8883 (MID-QcTaskMIDTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8857 (ITS-ITSClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8839 (GLO-MTCITSTPC-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8824 (EMC-RawTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8876 (MID-QcTaskMIDClust-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8901 (TOF-TaskDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8826 (FDD-DigitQcTaskFDD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8868 (ITS-ITSTrackTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8916 (TRD-Tracklets-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8922 (ZDC-QcZDCRecTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8810 (TRD-RawData-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8869 (MCH-QcTaskMCHDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8782 (CPV-PhysicsOnEPNs-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8812 (TRD-Tracking-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8880 (MID-QcTaskMIDDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8907 (TPC-Tracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8851 (GLO-Vertexing-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8892 (TOF-MatchingTOFwTRD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8815 (EMC-CellTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8905 (TPC-Clusters-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8805 (TPC-PID-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8836 (FT0-DigitQcTaskFT0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8838 (FV0-DigitQcTaskFV0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8692 (qc-task-TRD-Digits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8923 (internal-dpl-injected-dummy-sink) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8774 (qc-task-TRD-Tracklets) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8661 (qc-task-PHS-ClusterTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8691 (qc-task-TPC-Tracks) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8676 (qc-task-TOF-MatchingTOFwTRD) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8689 (qc-task-TPC-Clusters) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8645 (qc-task-MID-QcTaskMIDDigits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8658 (qc-task-MID-QcTaskMIDTracks) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8323 (internal-dpl-ccdb-backend) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8777 (qc-task-ZDC-QcZDCRecTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8912 (TRD-Digits-proxy) was killed abnormally with Killed and exited code was set to 137.
[0 more errors; see full log]

Full log here.
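
A note on reading these codes: by the usual shell convention an exit code above 128 encodes "128 + signal number", so 137 corresponds to signal 9 (SIGKILL on Linux). That suggests the many 137 entries are devices that were killed, plausibly as a consequence of an earlier failure, although the log itself does not say who sent the signal. The helper names below are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <csignal>
#include <string>

// Returns the signal number encoded in a shell-style exit code, or 0 if
// the code does not follow the "128 + signal" convention.
int signalFromExitCode(int exitCode) {
  return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : 0;
}

// Human-readable description of an exit code, for log triage.
std::string describeExit(int exitCode) {
  const int sig = signalFromExitCode(exitCode);
  if (sig == SIGKILL) {
    return "killed by SIGKILL";
  }
  if (sig == SIGSEGV) {
    return "killed by SIGSEGV";
  }
  if (sig != 0) {
    return "killed by signal " + std::to_string(sig);
  }
  return "exited with code " + std::to_string(exitCode);
}
```

With this reading, the device reported with the forced code 128 is the one to look at first, since 128 carries no signal number and was assigned because the real cause was unclear.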

@davidrohr
Collaborator

> @shahor02 this works in my synthetic tests (stage/bin/o2-testworkflows-early-forwarding -s --severity detail --early-forward-policy=always). In the end I refactored the code to find the earliest spot where messages are guaranteed to be seen only once and I moved the early forward there.
>
> @davidrohr @shahor02 I have noticed that the early forwarding is disabled by default. Is this expected?

For online and offline reco we enable it here: https://github.com/davidrohr/O2DPG/blob/a5af1be2a96bbe3b2eeb2cf13d41c4afd1b81e4a/DATA/common/getCommonArgs.sh#L12

@shahor02
Collaborator

@ktf this seems to be a genuine crash:

[8369:EMCALRawToCellConverterSpec]: [14:43:54][INFO] Correctly handshaken websocket connection.
[8369:EMCALRawToCellConverterSpec]: [14:43:59][WARN] Timed out sending after 1s. Downstream backpressure detected on from_EMCALRawToCellConverterSpec_to_Dispatcher[0].
[8369:EMCALRawToCellConverterSpec]: [14:44:02][INFO] Downstream backpressure on from_EMCALRawToCellConverterSpec_to_Dispatcher[0] recovered.
[8369:EMCALRawToCellConverterSpec]: *** Program crashed (Segmentation fault)
[8369:EMCALRawToCellConverterSpec]: Backtrace by DPL:
[8369:EMCALRawToCellConverterSpec]: Executable is /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/bin/o2-emcal-reco-workflow
[8369:EMCALRawToCellConverterSpec]:     /lib64/libc.so.6:     ?? ??:0
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: fair::mq::shmem::Message::Copy(fair::mq::Message const&)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: o2::framework::DataProcessingHelpers::routeForwardedMessages(o2::framework::FairMQDeviceProxy&, std::span<std::unique_ptr<fair::mq::Message, std::default_delete<fair::mq::Message> >, 18446744073709551615ul>&, std::vector<fair::mq::Parts, std::allocator<fair::mq::Parts> >&, bool, bool)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so:     ?? ??:0
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so:     ?? ??:0
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: o2::framework::DataRelayer::relay(void const*, std::unique_ptr<fair::mq::Message, std::default_delete<fair::mq::Message> >*, o2::framework::DataRelayer::InputInfo const&, unsigned long, unsigned long, std::function<void (o2::framework::ServiceRegistryRef&, std::span<std::unique_ptr<fair::mq::Message, std::default_delete<fair::mq::Message> >, 18446744073709551615ul>&)>, std::function<void (o2::framework::TimesliceSlot, std::vector<o2::framework::MessageSet, std::allocator<o2::framework::MessageSet> >&, o2::framework::TimesliceIndex::OldestOutputInfo)>)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so:     ?? ??:0
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::handleData(o2::framework::ServiceRegistryRef, o2::framework::InputChannelInfo&)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::doPrepare(o2::framework::ServiceRegistryRef)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: o2::framework::run_callback(uv_work_s*)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: o2::framework::DataProcessingDevice::Run()
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/FairMQ/v1.10.0-7/lib/libfairmq.so.1.10.0: fair::mq::Device::RunWrapper()
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/FairMQ/v1.10.0-7/lib/libfairmq.so.1.10.0: boost::detail::function::void_function_obj_invoker1<std::function<void (fair::mq::State)>, void, fair::mq::State>::invoke(boost::detail::function::function_buffer&, fair::mq::State)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/FairMQ/v1.10.0-7/lib/libfairmq.so.1.10.0: boost::signals2::detail::signal_impl<void (fair::mq::State), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (fair::mq::State)>, boost::function<void (boost::signals2::connection const&, fair::mq::State)>, boost::signals2::mutex>::operator()(fair::mq::State)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/FairMQ/v1.10.0-7/lib/libfairmq.so.1.10.0: fair::mq::fsm::Machine_::ProcessWork()
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/FairMQ/v1.10.0-7/lib/libfairmq.so.1.10.0: fair::mq::StateMachine::ProcessWork()
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/FairMQ/v1.10.0-7/lib/libfairmq.so.1.10.0: fair::mq::DeviceRunner::Run()
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: doChild(int, char**, o2::framework::ServiceRegistry&, o2::framework::RunningWorkflowInfo const&, o2::framework::RunningDeviceRef, o2::framework::DriverConfig const&, o2::framework::ProcessingPolicies, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uv_loop_s*)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: runStateMachine(std::vector<o2::framework::DataProcessorSpec, std::allocator<o2::framework::DataProcessorSpec> > const&, WorkflowInfo const&, std::vector<o2::framework::DataProcessorInfo, std::allocator<o2::framework::DataProcessorInfo> > const&, o2::framework::CommandInfo const&, o2::framework::DriverControl&, o2::framework::DriverInfo&, o2::framework::DriverConfig&, std::vector<o2::framework::DeviceMetricsInfo, std::allocator<o2::framework::DeviceMetricsInfo> >&, std::vector<o2::framework::ConfigParamSpec, std::allocator<o2::framework::ConfigParamSpec> > const&, boost::program_options::variables_map&, std::vector<o2::framework::ServiceSpec, std::allocator<o2::framework::ServiceSpec> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: doMain(int, char**, std::vector<o2::framework::DataProcessorSpec, std::allocator<o2::framework::DataProcessorSpec> > const&, std::vector<o2::framework::ChannelConfigurationPolicy, std::allocator<o2::framework::ChannelConfigurationPolicy> > const&, std::vector<o2::framework::CompletionPolicy, std::allocator<o2::framework::CompletionPolicy> > const&, std::vector<o2::framework::DispatchPolicy, std::allocator<o2::framework::DispatchPolicy> > const&, std::vector<o2::framework::ResourcePolicy, std::allocator<o2::framework::ResourcePolicy> > const&, std::vector<o2::framework::CallbacksPolicy, std::allocator<o2::framework::CallbacksPolicy> > const&, std::vector<o2::framework::SendingPolicy, std::allocator<o2::framework::SendingPolicy> > const&, std::vector<o2::framework::ConfigParamSpec, std::allocator<o2::framework::ConfigParamSpec> > const&, std::vector<o2::framework::ConfigParamSpec, std::allocator<o2::framework::ConfigParamSpec> > const&, o2::framework::ConfigContext&)
[8369:EMCALRawToCellConverterSpec]:     o2-emcal-reco-workflow() [0x407811]:     std::vector<o2::framework::ChannelConfigurationPolicy, std::allocator<o2::framework::ChannelConfigurationPolicy> >::~vector() at stl_vector.h:735
[8369:EMCALRawToCellConverterSpec]:     /sw/slc9_x86-64/O2/14910-slc9_x86-64-local1/lib/libO2Framework.so: callMain(int, char**, int (*)(int, char**))
[8369:EMCALRawToCellConverterSpec]:     o2-emcal-reco-workflow() [0x404c59]:     main at runDataProcessing.h:220
[8369:EMCALRawToCellConverterSpec]:     /lib64/libc.so.6:     ?? ??:0
[8369:EMCALRawToCellConverterSpec]:     /lib64/libc.so.6:     ?? ??:0
[8369:EMCALRawToCellConverterSpec]:     o2-emcal-reco-workflow() [0x404cf5]:     _start at ??:?
[8369:EMCALRawToCellConverterSpec]: Backtrace complete.
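
The backtrace places the fault in `fair::mq::shmem::Message::Copy`, called from `routeForwardedMessages`; the thread does not establish the cause. Purely as an illustration of one generic hazard when forwarding to several destinations (copy for all but the last, move for the last), the toy sketch below guards against copying from an empty handle. `ToyMessagePtr` and `forwardToAll` are hypothetical and do not reflect the FairMQ or DPL APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an owning message handle.
using ToyMessagePtr = std::unique_ptr<std::string>;

// Forwards `msg` to every destination: deep copies for all but the last,
// and the original is moved into the last one.
// Returns false and leaves everything untouched if the handle is empty
// or there is no destination.
bool forwardToAll(ToyMessagePtr& msg,
                  std::vector<std::vector<ToyMessagePtr>>& destinations) {
  if (!msg || destinations.empty()) {
    return false; // nothing to copy from, or nowhere to send
  }
  for (std::size_t i = 0; i + 1 < destinations.size(); ++i) {
    destinations[i].push_back(std::make_unique<std::string>(*msg)); // deep copy
  }
  destinations.back().push_back(std::move(msg)); // hand over the original
  assert(!msg); // the caller's handle is now empty
  return true;
}
```

Calling `forwardToAll` a second time with the same handle returns `false` instead of dereferencing an empty pointer, which is the kind of check that turns a segmentation fault into a diagnosable condition.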

@ktf
Member Author

ktf commented Dec 11, 2025

@shahor02 indeed. I am investigating.

@ktf
Member Author

ktf commented Dec 11, 2025

I suspect it's an issue with the back pressure. I will try to replicate.

@alibuild
Collaborator

alibuild commented Jan 6, 2026

Error while checking build/O2/fullCI_slc9 for 71ab802 at 2026-01-07 11:52:

## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc9_x86-64/O2/14910-slc9_x86-64-local2/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 128
[ERROR] Workflow crashed - PID 8367 (EMCALRawToCellConverterSpec) did not exit correctly however it's not clear why. Exit code forced to 128.
[ERROR] Workflow crashed - PID 8592 (EMC-RawTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8582 (TRD-Tracking-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8629 (PHS-ClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8612 (ITS-ITSTrackTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8613 (MCH-QcTaskMCHDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8616 (MFT-MFTClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8627 (MID-QcTaskMIDTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8577 (TRD-PHTrackMatch-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8644 (TRD-Digits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8647 (ZDC-QcZDCRecTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8598 (FT0-DigitQcTaskFT0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8646 (TRD-Tracklets-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8621 (MID-QcTaskMIDDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8573 (MFT-MFTAsyncTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8620 (MID-QcTaskMIDClust-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8638 (TPC-Clusters-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8599 (FV0-DigitQcTaskFV0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8635 (TOF-MatchingTOFwTRD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8639 (TPC-Tracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8636 (TOF-TaskDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8602 (GLO-MTCITSTPC-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8609 (ITS-ITSClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8603 (GLO-Vertexing-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8649 (internal-dpl-injected-dummy-sink) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8536 (qc-task-TPC-Clusters) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8561 (qc-task-ZDC-QcZDCRecTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR]  - Device EMCALRawToCellConverterSpec: pid 8367 (exit 128)
[ERROR]  - Device qc-task-TPC-Clusters: pid 8536 (exit 137)
[ERROR]  - Device qc-task-ZDC-QcZDCRecTask: pid 8561 (exit 137)
[ERROR]  - Device MFT-MFTAsyncTask-proxy: pid 8573 (exit 137)
[ERROR]  - Device TRD-PHTrackMatch-proxy: pid 8577 (exit 137)
[ERROR]  - Device TRD-Tracking-proxy: pid 8582 (exit 137)
[ERROR]  - Device EMC-RawTask-proxy: pid 8592 (exit 137)
[ERROR]  - Device FT0-DigitQcTaskFT0-proxy: pid 8598 (exit 137)
[ERROR]  - Device FV0-DigitQcTaskFV0-proxy: pid 8599 (exit 137)
[ERROR]  - Device GLO-MTCITSTPC-proxy: pid 8602 (exit 137)
[ERROR]  - Device GLO-Vertexing-proxy: pid 8603 (exit 137)
[ERROR]  - Device ITS-ITSClusterTask-proxy: pid 8609 (exit 137)
[ERROR]  - Device ITS-ITSTrackTask-proxy: pid 8612 (exit 137)
[ERROR]  - Device MCH-QcTaskMCHDigits-proxy: pid 8613 (exit 137)
[ERROR]  - Device MFT-MFTClusterTask-proxy: pid 8616 (exit 137)
[ERROR]  - Device MID-QcTaskMIDClust-proxy: pid 8620 (exit 137)
[ERROR]  - Device MID-QcTaskMIDDigits-proxy: pid 8621 (exit 137)
[ERROR]  - Device MID-QcTaskMIDTracks-proxy: pid 8627 (exit 137)
[ERROR]  - Device PHS-ClusterTask-proxy: pid 8629 (exit 137)
[ERROR]  - Device TOF-MatchingTOFwTRD-proxy: pid 8635 (exit 137)
[0 more errors; see full log]

Full log here.

@alibuild
Collaborator

alibuild commented Jan 7, 2026

Error while checking build/O2/fullCI_slc9 for 4ce2e90 at 2026-01-09 14:24:

## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc9_x86-64/O2/14910-slc9_x86-64-local5/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 128
[ERROR] Workflow crashed - PID 8379 (EMCALRawToCellConverterSpec) did not exit correctly however it's not clear why. Exit code forced to 128.
[ERROR] Workflow crashed - PID 8539 (PHS-ClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8542 (TPC-Clusters-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8546 (TRD-Tracklets-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8545 (TRD-Digits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8549 (ZDC-QcZDCRecTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8541 (TOF-TaskDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8551 (internal-dpl-injected-dummy-sink) was killed abnormally with Killed and exited code was set to 137.
[ERROR]  - Device EMCALRawToCellConverterSpec: pid 8379 (exit 128)
[ERROR]  - Device PHS-ClusterTask-proxy: pid 8539 (exit 137)
[ERROR]  - Device TOF-TaskDigits-proxy: pid 8541 (exit 137)
[ERROR]  - Device TPC-Clusters-proxy: pid 8542 (exit 137)
[ERROR]  - Device TRD-Digits-proxy: pid 8545 (exit 137)
[ERROR]  - Device TRD-Tracklets-proxy: pid 8546 (exit 137)
[ERROR]  - Device ZDC-QcZDCRecTask-proxy: pid 8549 (exit 137)
[ERROR]  - Device internal-dpl-injected-dummy-sink: pid 8551 (exit 137)
[ERROR] SEVERE: Device EMCALRawToCellConverterSpec (8379) returned with 128


## sw/BUILD/o2checkcode-latest/log
--
========== List of errors found ==========
++ GRERR=0
++ grep -v clang-diagnostic-error error-log.txt
++ grep ' error:'
++ GRERR=1
++ [[ 1 == 0 ]]
++ mkdir -p /sw/INSTALLROOT/946b400d5bc2f6a791cd5c4f4bf0df75f6d3eb63/slc9_x86-64/o2checkcode/1.0-local315/etc/modulefiles
++ cat
--

Full log here.

@alibuild
Collaborator

alibuild commented Jan 9, 2026

Error while checking build/O2/fullCI_slc9 for 4b63d04 at 2026-01-12 18:50:

## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc9_x86-64/O2/14910-slc9_x86-64-local2/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 128
[ERROR] Workflow crashed - PID 8325 (EMCALRawToCellConverterSpec) did not exit correctly however it's not clear why. Exit code forced to 128.
[ERROR] Workflow crashed - PID 8828 (TRD-RawData-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8860 (FT0-DigitQcTaskFT0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8863 (FV0-DigitQcTaskFV0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8882 (ITS-ITSClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8858 (FDD-DigitQcTaskFDD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8807 (CPV-PhysicsOnEPNs-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8823 (TPC-PID-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8848 (EMC-RawTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8815 (MFT-MFTAsyncTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8813 (GLO-MUONTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8838 (TRD-Tracking-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8967 (TRD-Tracklets-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8884 (ITS-ITSTrackTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8953 (TOF-TaskDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8971 (ZDC-QcZDCRecTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8912 (MFT-MFTClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8825 (TRD-PHTrackMatch-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8663 (qc-task-MID-QcTaskMIDTracks) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8925 (MID-QcTaskMIDDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8949 (TOF-MatchingTOFwTRD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8878 (GLO-Vertexing-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8789 (qc-task-TRD-Tracklets) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8893 (MCH-QcTaskMCHDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8725 (qc-task-TRD-Digits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8871 (GLO-MTCITSTPC-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8847 (EMC-CellTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8653 (qc-task-MID-QcTaskMIDClust) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8960 (TPC-Clusters-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8654 (qc-task-MID-QcTaskMIDDigits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8649 (qc-task-ITS-ITSTrackTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8650 (qc-task-MCH-QcTaskMCHDigits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8940 (PHS-ClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8933 (MID-QcTaskMIDTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8922 (MID-QcTaskMIDClust-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8961 (TPC-Tracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8966 (TRD-Digits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8711 (qc-task-TPC-Clusters) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8972 (internal-dpl-injected-dummy-sink) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8686 (qc-task-PHS-ClusterTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8699 (qc-task-TOF-TaskDigits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8724 (qc-task-TPC-Tracks) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8694 (qc-task-TOF-MatchingTOFwTRD) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8800 (qc-task-ZDC-QcZDCRecTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8652 (qc-task-MFT-MFTClusterTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8637 (qc-task-FT0-DigitQcTaskFT0) was killed abnormally with Killed and exited code was set to 137.
[ERROR]  - Device EMCALRawToCellConverterSpec: pid 8325 (exit 128)
[0 more errors; see full log]

Full log here.

@alibuild
Collaborator

alibuild commented Jan 12, 2026

Error while checking build/O2/fullCI_slc9 for 7c245c7 at 2026-01-12 20:55:

## sw/BUILD/O2-full-system-test-latest/log
command /sw/slc9_x86-64/O2/14910-slc9_x86-64-local5/prodtests/full-system-test/dpl-workflow.sh had nonzero exit code 128
[ERROR] Workflow crashed - PID 8364 (EMCALRawToCellConverterSpec) did not exit correctly however it's not clear why. Exit code forced to 128.
[ERROR] Workflow crashed - PID 8493 (MFT-MFTAsyncTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8491 (CPV-PhysicsOnEPNs-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8505 (GLO-Vertexing-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8492 (GLO-MUONTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8520 (TPC-Tracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8511 (MID-QcTaskMIDClust-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8498 (EMC-CellTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8496 (TRD-RawData-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8521 (TRD-Digits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8514 (PHS-ClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8526 (ZDC-QcZDCRecTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8503 (FV0-DigitQcTaskFV0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8497 (TRD-Tracking-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8494 (TPC-PID-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8499 (EMC-RawTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8500 (FDD-DigitQcTaskFDD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8507 (ITS-ITSTrackTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8518 (TOF-TaskDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8513 (MID-QcTaskMIDTracks-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8515 (TOF-MatchingTOFwTRD-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8525 (TRD-Tracklets-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8502 (FT0-DigitQcTaskFT0-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8495 (TRD-PHTrackMatch-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8508 (MCH-QcTaskMCHDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8506 (ITS-ITSClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8512 (MID-QcTaskMIDDigits-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8519 (TPC-Clusters-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8509 (MFT-MFTClusterTask-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8504 (GLO-MTCITSTPC-proxy) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8528 (internal-dpl-injected-dummy-sink) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8489 (qc-task-TRD-Tracklets) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8488 (qc-task-TRD-Digits) was killed abnormally with Killed and exited code was set to 137.
[ERROR] Workflow crashed - PID 8490 (qc-task-ZDC-QcZDCRecTask) was killed abnormally with Killed and exited code was set to 137.
[ERROR]  - Device EMCALRawToCellConverterSpec: pid 8364 (exit 128)
[ERROR]  - Device qc-task-TRD-Digits: pid 8488 (exit 137)
[ERROR]  - Device qc-task-TRD-Tracklets: pid 8489 (exit 137)
[ERROR]  - Device qc-task-ZDC-QcZDCRecTask: pid 8490 (exit 137)
[ERROR]  - Device CPV-PhysicsOnEPNs-proxy: pid 8491 (exit 137)
[ERROR]  - Device GLO-MUONTracks-proxy: pid 8492 (exit 137)
[ERROR]  - Device MFT-MFTAsyncTask-proxy: pid 8493 (exit 137)
[ERROR]  - Device TPC-PID-proxy: pid 8494 (exit 137)
[ERROR]  - Device TRD-PHTrackMatch-proxy: pid 8495 (exit 137)
[ERROR]  - Device TRD-RawData-proxy: pid 8496 (exit 137)
[ERROR]  - Device TRD-Tracking-proxy: pid 8497 (exit 137)
[ERROR]  - Device EMC-CellTask-proxy: pid 8498 (exit 137)
[ERROR]  - Device EMC-RawTask-proxy: pid 8499 (exit 137)
[0 more errors; see full log]

Full log here.

4 participants