Jinwu Hu, Zitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, Mingkui Tan
South China University of Technology, Pazhou Laboratory, Zhejiang University, South China Agricultural University, Chongqing University of Posts and Telecommunications
- 2025-07-31: Updated the AdaptEval benchmark and models.
- 2025-05-27: We released our paper on arXiv.
- 2025-05-01: TLM has been accepted by ICML 2025.
```bash
## clone our repo
git clone https://github.com/Fhujinwu/TLM.git
cd TLM

## install TLM environment
conda create --name tlm --yes python=3.10
conda activate tlm
pip install -e ".[torch,metrics]" --no-build-isolation
```

- Benchmarks: https://huggingface.co/datasets/Jinwu01/AdaptEval
- Models: https://huggingface.co/Jinwu01/TLM
All datasets in AdaptEval and their contents are defined in the dataset_info.json file included in this repository. To use one, you only need to specify the desired dataset in your configuration file (see the configuration sketch after the commands below). For example, to adapt to the geography dataset:
- For offline test-time learning, you can start training with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/offline_ttl.yaml
```

- For online test-time learning, use:

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/online_ttl.yaml
```

The offline_ttl.yaml and online_ttl.yaml files provide example configurations for fine-tuning with test-time learning; they specify the model, fine-tuning method, dataset, TTL method, and other parameters. Please customize these files according to your own requirements.
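For reference, here is a minimal sketch of what such a configuration might look like for the geography example. The field names follow LLaMA-Factory's YAML conventions; the specific values (base model, dataset key, hyperparameters) are illustrative assumptions, so please consult examples/train_lora/offline_ttl.yaml for the actual TTL-specific options.

```yaml
### model (hypothetical base model; replace with the checkpoint you want to adapt)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft                 # assumption: TTL reuses the LoRA fine-tuning pipeline
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: geography         # dataset key as registered in dataset_info.json
template: llama3
cutoff_len: 2048

### output and training hyperparameters (illustrative values)
output_dir: saves/ttl_geography
per_device_train_batch_size: 1
learning_rate: 1.0e-5
num_train_epochs: 1.0
```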
After running the above training commands, you will obtain the model inference results in the specified output_dir. You can then evaluate these results.
First, install the required dependencies:
```bash
pip install rouge_score rouge-chinese bert_score git+https://github.com/google-research/bleurt.git
```

All evaluation-related scripts are located in the scripts/eval folder:
- For datasets in DomainBench and InstructionBench, copy the path to your model inference results into eval_simility.py and run the script.
- For datasets in ReasoningBench, copy the path to your model inference results into eval_accuracy.py and run the script.
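For example, assuming the scripts read the result path from the variable you edited and require no additional command-line arguments, they can be run from the repository root as follows:

```bash
# similarity-based evaluation for DomainBench / InstructionBench results
python scripts/eval/eval_simility.py

# accuracy evaluation for ReasoningBench results
python scripts/eval/eval_accuracy.py
```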
Thanks to LLaMA-Factory for its open-source code.
If you find our work interesting and useful, we welcome you to give our repo a 🌟 and cite our paper.
```bibtex
@inproceedings{hutest,
  title={Test-Time Learning for Large Language Models},
  author={Hu, Jinwu and Zhang, Zitian and Chen, Guohao and Wen, Xutao and Shuai, Chao and Luo, Wei and Xiao, Bin and Li, Yuanqing and Tan, Mingkui},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```