

Test-Time Learning for Large Language Models


Jinwu Hu, Zitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, Mingkui Tan
South China University of Technology, Pazhou Laboratory, Zhejiang University, South China Agricultural University, Chongqing University of Posts and Telecommunications

🔥News

  • 2025-07-31: Updated the AdaptEval benchmark and models.
  • 2025-05-27: We released our paper on arXiv.
  • 2025-05-01: TLM was accepted at ICML 2025.

🚀Quick Start

## clone our repo
git clone https://github.com/Fhujinwu/TLM.git
cd TLM
## install TLM environment
conda create --name tlm --yes python=3.10
conda activate tlm
pip install -e ".[torch,metrics]" --no-build-isolation

🗂 Benchmarks and models

🔨 Training

All datasets in AdaptEval and their contents are defined in the dataset_info.json file included in this repository. To use one, you only need to specify the desired dataset in your configuration file.
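
For reference, an entry in dataset_info.json typically looks like the sketch below. This is a hypothetical example assuming the repository keeps LLaMA-Factory's dataset_info.json schema; the file name and column mapping are placeholders, so check the actual file for the exact entries.

"geography": {
  "file_name": "geography.json",
  "columns": {
    "prompt": "instruction",
    "response": "output"
  }
}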

For example, to adapt to the geography dataset:

  • For offline test-time learning, you can start training with the following command:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/offline_ttl.yaml
  • For online test-time learning, use:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/online_ttl.yaml

The offline_ttl.yaml and online_ttl.yaml files provide example configurations for fine-tuning with test-time learning. They specify the model, fine-tuning method, dataset, TTL method, and other parameters. Please customize these files according to your own requirements.
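
As a rough sketch of what such a configuration contains (field names follow LLaMA-Factory conventions and the values are placeholders, not the exact contents of offline_ttl.yaml; the TTL-specific options are defined in the provided files):

## model (placeholder checkpoint)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
## method
finetuning_type: lora
## dataset (name from dataset_info.json)
dataset: geography
## output
output_dir: saves/geography-offline-ttl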

⚖️ Evaluation

After running the above training commands, you will obtain the model inference results in the specified output_dir. You can then evaluate these results.

First, install the required dependencies:

pip install rouge_score rouge-chinese bert_score git+https://github.com/google-research/bleurt.git

All evaluation-related scripts are located in the scripts/eval folder:

  • For datasets in DomainBench and InstructionBench, set the path to your model inference results in eval_simility.py and run the script.
  • For datasets in ReasoningBench, set the path to your model inference results in eval_accuracy.py and run the script (see the example below).
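
For example, after editing the result path inside each script (assuming they are invoked directly with Python):

## DomainBench / InstructionBench
python scripts/eval/eval_simility.py
## ReasoningBench
python scripts/eval/eval_accuracy.py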

💬 Citation

Thanks to LLaMA-Factory for their open-source code, on which this repository is built.

If you find our work interesting and meaningful, please consider giving our repo a 🌟 and citing our paper.

@inproceedings{hutest,
  title={Test-Time Learning for Large Language Models},
  author={Hu, Jinwu and Zhang, Zitian and Chen, Guohao and Wen, Xutao and Shuai, Chao and Luo, Wei and Xiao, Bin and Li, Yuanqing and Tan, Mingkui},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}

