Description
It would be ideal for PyEarthTools to support generation of standardised metric scorecards for both training and evaluation. It would be very helpful to have a basket of metrics generated per epoch, or per *n* training steps, so that the effect of training on verification performance could be visualised and understood. Such a scorecard approach might also help point the way towards multi-objective loss functions in future.
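As a rough illustration of the per-epoch idea, a training hook could compute a small basket of metrics against a benchmark on a held-out batch after each epoch. Everything below (the function name, the metric choices, the skill-score formulation) is a hypothetical sketch, not existing PyEarthTools API:

```python
import numpy as np

def scorecard_metrics(prediction, target, baseline):
    """Compute a basket of headline metrics for one validation batch.

    `baseline` is a reference forecast such as persistence or climatology,
    evaluated against the same targets so that skill scores can be formed.
    """
    err = prediction - target
    rmse = float(np.sqrt(np.mean(err ** 2)))
    baseline_rmse = float(np.sqrt(np.mean((baseline - target) ** 2)))
    return {
        "rmse": rmse,
        "bias": float(np.mean(err)),
        "mae": float(np.mean(np.abs(err))),
        # Skill relative to the benchmark: 1 is perfect, 0 matches the benchmark.
        "rmse_skill": 1.0 - rmse / baseline_rmse,
    }
```

Logged once per epoch (or per *n* steps), a dictionary like this would be enough to plot metric trajectories over the course of training.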
There are a number of questions to be dealt with here, including:
- Setting appropriate comparison benchmarks, like persistence, climatology, and a comparison model (if available)
- What metrics to include by default for various model types
- Whether to generate just raw data for each epoch, or also an artefact such as images, a PDF, or HTML for posterity
It would be useful to identify the most common headline metrics and the most common stratifications (e.g. performance in particular geographic regions), and write those up as a scorecard spec, which we can then implement and hook into the training cycle.
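For discussion, the spec itself could start as a simple declarative config object; the field names and defaults below are placeholders for the sort of things a spec might carry, not a proposal for a final API:

```python
from dataclasses import dataclass, field

@dataclass
class ScorecardSpec:
    """Declarative description of what a scorecard run should produce."""
    # Headline metrics to compute.
    metrics: list[str] = field(default_factory=lambda: ["rmse", "bias", "acc"])
    # Comparison benchmarks to score against.
    benchmarks: list[str] = field(default_factory=lambda: ["persistence", "climatology"])
    # Stratifications, e.g. geographic regions or lead times.
    regions: list[str] = field(default_factory=lambda: ["global", "tropics"])
    # Cadence: evaluate every `every_n_steps` training steps; None means per epoch.
    every_n_steps: int | None = None
    # Artefacts to emit alongside the raw numbers, e.g. "csv", "png", "html".
    artefacts: list[str] = field(default_factory=lambda: ["csv"])
```

A run could then be configured as, say, `ScorecardSpec(every_n_steps=500, artefacts=["csv", "html"])` and handed to whatever hooks into the training cycle.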
If anyone has any opinions, please feel free to add them to this issue.