Contributor

@imawby imawby commented Dec 3, 2025

Hello hello,

In this PR I add the cluster splitting model training scripts to the LArMachineLearningData repository. There are two models used in the cluster splitting algorithm: 1) ContaminationModel, used to determine whether a cluster 'window' is shower-like, contaminated or not contaminated, and 2) SplitPointModel, used to classify each position in a window as a signal/background split point. The scripts should be run in this order:

  1. Windows.ipynb: Used to slim/filter the DLThreeDClusterSplittingAlgorithm training tree.
  2. MergeFiles.ipynb: Used to combine the output files from Windows.ipynb.
  3. EncoderTraining.ipynb: Used to train the ContaminationModel.
  4. EncoderPerformance.ipynb: Used to obtain classification score/confusion matrices for a specified ContaminationModel.
  5. EncoderDecoderTraining.ipynb: Used to train the SplitPointModel.
  6. EncoderDecoderPerformance.ipynb: Used to obtain classification score/confusion matrices for a specified SplitPointModel.

(The encoder and encoder-decoder trainings are independent of each other, so steps 3/4 and 5/6 can be run in either order.)
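
For readers unfamiliar with the performance steps, here is a minimal sketch of how a confusion matrix for a three-class classifier such as the ContaminationModel could be built. It is not taken from the notebooks or from TrainingMetrics.py; the function names, the class indices and the toy labels are all assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(true_labels: Sequence[int],
                     predicted_labels: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Count predictions; rows index the true class, columns the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_class, predicted_class in zip(true_labels, predicted_labels):
        matrix[true_class][predicted_class] += 1
    return matrix


def row_normalise(matrix: List[List[int]]) -> List[List[float]]:
    """Convert counts to per-true-class fractions (an all-zero row stays zero)."""
    normalised = []
    for row in matrix:
        total = sum(row)
        normalised.append([count / total if total else 0.0 for count in row])
    return normalised


# Hypothetical class indices, chosen only for this example.
SHOWER, CONTAMINATED, NOT_CONTAMINATED = 0, 1, 2

# Toy labels, not real model output.
truth = [SHOWER, CONTAMINATED, CONTAMINATED,
         NOT_CONTAMINATED, NOT_CONTAMINATED, NOT_CONTAMINATED]
predictions = [SHOWER, CONTAMINATED, NOT_CONTAMINATED,
               NOT_CONTAMINATED, NOT_CONTAMINATED, CONTAMINATED]

counts = confusion_matrix(truth, predictions, n_classes=3)
fractions = row_normalise(counts)
```

Row-normalising makes each diagonal entry the fraction of that true class that was classified correctly, which is easier to compare across classes of different sizes than raw counts.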

The other files are:

  • Datasets.py: Contains the datasets used in the Encoder and EncoderDecoder trainings.
  • Models.py: Contains the ContaminationModel and SplitPointModel definitions.
  • TrainingMetrics.py: Contains functions used to draw training metric plots, i.e. the confusion matrix and classification score plots.
  • Utilities.py: Contains the functions used to create the time-series windows.
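
As a rough illustration of what "time-series windows" can mean, the sketch below cuts an ordered sequence of values into fixed-length, possibly overlapping windows. This is an assumption about the general technique, not the code in Utilities.py; the function name `make_windows` and the `window_length` and `stride` parameters are hypothetical.

```python
from typing import List, Sequence


def make_windows(values: Sequence[float],
                 window_length: int,
                 stride: int) -> List[List[float]]:
    """Slide a fixed-length window along `values`, stepping by `stride`.

    Only complete windows are returned; a trailing remainder shorter than
    `window_length` is dropped in this sketch.
    """
    if window_length <= 0 or stride <= 0:
        raise ValueError("window_length and stride must be positive")
    windows = []
    for start in range(0, len(values) - window_length + 1, stride):
        windows.append(list(values[start:start + window_length]))
    return windows


# Toy input: ten ordered values standing in for positions along a cluster.
example = make_windows(list(range(10)), window_length=4, stride=3)
```

With a stride smaller than the window length, neighbouring windows overlap, so a feature near a window edge also appears away from the edge in an adjacent window.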

Let me know if you have any questions!

Collaborator

@AndyChappell AndyChappell left a comment

Hi Isobel, thanks for the PR. Mostly formatting and stylistic requests, with a little refactoring and one more substantive point re garbage collection. Otherwise I think this is looking good.

Collaborator

@AndyChappell AndyChappell left a comment

Updates look good, thanks Isobel.

@AndyChappell AndyChappell changed the base branch from master to feature/rsdc_milestone_1_2 January 8, 2026 13:11