The pre-trained weight file (checkpoint) does not match the model architecture defined by your current code.

Thank you for your research contributions and for open-sourcing the code. When we ran "python run.py --cfg configs/liger_gla.yaml" and "python run.py --cfg configs/liger_gsa.yaml" (https://huggingface.co/linear-moe-hub/Liger-GLA-8B), both commands failed with the following error:
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for Embedding:
        size mismatch for weight: copying a param with shape torch.Size([128256, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

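This suggests that the model built from the provided configuration expects an embedding table with 32000 rows, while the checkpoint's embedding table has 128256 rows, so the vocabulary size in the configuration does not appear to match the checkpoint weights. Could you please advise how to resolve this, for example which configuration values the released checkpoints expect?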
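
For reference, here is a minimal sketch of how such a mismatch could be confirmed before loading the weights. It is not code from this repository: the helper find_shape_mismatches and the parameter name "embeddings.weight" are hypothetical, and the shapes are typed in by hand from the error message. With PyTorch, the two dictionaries could instead be built from the checkpoint and from model.state_dict() with {k: tuple(v.shape) for k, v in state_dict.items()}.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter
    that exists in both mappings but has a different shape."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Shapes copied from the error message; the parameter name is hypothetical.
checkpoint_shapes = {"embeddings.weight": (128256, 4096)}
model_shapes = {"embeddings.weight": (32000, 4096)}

for name, ckpt, model in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

Running a check like this over the full state dict would show whether the embedding is the only mismatched tensor or whether other layers differ as well.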

Looking forward to your reply. Best wishes for your work!
