Thank you for your research contributions in this field and for open-sourcing the code. When we ran “python run.py --cfg configs/liger_gla.yaml” and “python run.py --cfg configs/liger_gsa.yaml” (https://huggingface.co/linear-moe-hub/Liger-GLA-8B), both commands failed with the following error:
“
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for Embedding:
size mismatch for weight: copying a param with shape torch.Size([128256, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).”
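This suggests that the model configuration provided by the project does not match the checkpoint weights: the embedding in the checkpoint has a vocabulary size of 128256, while the model built from the config expects 32000 (the hidden size, 4096, is the same in both). Could you please advise how to resolve this?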
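To make the mismatch concrete, here is a minimal sketch of the comparison, using only the two shapes reported in the error message. The function name find_embedding_mismatch is hypothetical and is not part of the project's code; it only illustrates which dimension disagrees.

```python
def find_embedding_mismatch(checkpoint_shape, model_shape):
    """Compare two (vocab_size, hidden_size) embedding shapes.

    Hypothetical helper for illustration only.
    Returns a short description of the first mismatch found,
    or None if the two shapes agree.
    """
    ckpt_vocab, ckpt_hidden = checkpoint_shape
    model_vocab, model_hidden = model_shape
    if ckpt_hidden != model_hidden:
        return f"hidden size differs: checkpoint {ckpt_hidden}, model {model_hidden}"
    if ckpt_vocab != model_vocab:
        return f"vocab size differs: checkpoint {ckpt_vocab}, model {model_vocab}"
    return None


# Shapes copied from the RuntimeError above:
# checkpoint weight is [128256, 4096], current model expects [32000, 4096].
print(find_embedding_mismatch((128256, 4096), (32000, 4096)))
```

Only the first dimension differs, so the vocabulary size used to build the model appears to be the setting that does not match the checkpoint.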
Looking forward to your reply. Best wishes for your work!