Hi, thank you for the great work and for open-sourcing the code.
I'm currently trying to reproduce your baseline results on an H20 GPU. I'm aware the original paper used an A100, and because of driver and compatibility constraints on my side, I'm also running a slightly different PyTorch version than the one specified in your environment.
The issue I'm encountering is that training proceeds normally for the first few hundred iterations but then hangs (freezes) somewhere around iteration 1000–2000, with no error message or crash. GPU utilization flatlines and no further forward/backward steps complete. I've verified that this is not caused by out-of-memory issues or a data-loading bottleneck.
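To narrow down where it stalls, one thing I plan to try next is a small watchdog around the training loop so the Python stacks get dumped automatically once a step takes too long. This is just a sketch using the standard-library `faulthandler` module; `train_one_iteration` and `data_loader` are placeholders, not the repo's actual names:

```python
import faulthandler
import sys

HANG_TIMEOUT_S = 600  # assume no completed step within 10 minutes means we're hung


def arm_hang_watchdog() -> None:
    """(Re)arm a one-shot dump of every Python thread's stack to stderr."""
    faulthandler.cancel_dump_traceback_later()
    faulthandler.dump_traceback_later(HANG_TIMEOUT_S, repeat=False, file=sys.stderr)


# Usage inside the training loop (placeholder names for the repo's real loop):
#
#   arm_hang_watchdog()
#   for step, batch in enumerate(data_loader):
#       train_one_iteration(batch)
#       arm_hang_watchdog()  # step finished in time, push the deadline forward
```

When it next freezes I'll also attach `py-spy dump --pid <pid>` to the stuck process to see which call it is blocked in.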
I suspect this may be due to hardware- or framework-level differences (e.g., a CUDA kernel incompatibility between the A100 and the H20), but I wanted to check whether anyone else has run into similar behavior, or whether you have any suggestions (known incompatibilities, possible workarounds, etc.).
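In the meantime, here is what I plan to try on my side to localize it; just a sketch, nothing taken from your code. `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous so a stack dump points at the offending op, and the arch-list check is to confirm the installed wheel actually ships kernels for the H20 (Hopper, sm_90, if I'm not mistaken):

```python
import os

# Force synchronous CUDA kernel launches (slower, but a hang then points at
# the actual op instead of a later synchronization point). Set before any
# CUDA work starts, so it goes before importing torch here.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

print("compiled arch list :", torch.cuda.get_arch_list())
print("device capability  :", torch.cuda.get_device_capability(0))
```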
Here’s a brief summary of my setup:
GPU: NVIDIA H20
PyTorch: 2.6.0+cu122
CUDA: 12.2
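For completeness, a minimal snippet (plain PyTorch calls, nothing repo-specific) to reproduce the version info above, in case it helps with comparison against your environment:

```python
import torch

print("PyTorch:", torch.__version__)           # reported build string
print("CUDA   :", torch.version.cuda)          # CUDA version the wheel was built with
print("cuDNN  :", torch.backends.cudnn.version())
print("GPU    :", torch.cuda.get_device_name(0))
```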
Thanks in advance!