
Conversation

@avazr (Contributor) commented Nov 18, 2025

Description
Enable distributed multi-node execution for LLaMA benchmarks in SuperBench using torchrun. Includes configuration for multiple nodes and GPUs, dynamic rendezvous setup, and documentation updates to guide users on running multi-node benchmarks.

Files Changed

  • docs/user-tutorial/benchmarks/model-benchmarks.md

    • Added a Multi-node LLaMA Benchmarks section.
    • Instructions for setting up multi-node experiments with torchrun.
    • Example YAML configuration showing node_num, proc_num, MASTER_ADDR, and MASTER_PORT.
    • Added prerequisites for passwordless SSH and NVIDIA IMEX service.
  • superbench/benchmarks/model_benchmarks/pytorch_base.py

    • Added detection of multi-node execution (_multi_node).
    • Initialized torch.distributed process group for multi-node training.
    • Added debug logging for rank, world_size, and master node info.
    • Support for both single-node and multi-node distributed execution using TCPStore (a minimal sketch of this initialization follows the list).
  • superbench/runner/runner.py

    • Updated runner to pass --nnodes and --rdzv-endpoint arguments to torchrun if multi-node mode is enabled.
    • Added randomized rendezvous ID for each run.
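
For context, here is a minimal sketch of the kind of multi-node detection and initialization described for pytorch_base.py above, using only the environment variables that torchrun exports. The variable names and the extra store port are illustrative assumptions, not the PR's actual code:

```python
import os
from datetime import timedelta

import torch.distributed as dist

# torchrun exports these for every worker process.
rank = int(os.environ['RANK'])
world_size = int(os.environ['WORLD_SIZE'])
local_world_size = int(os.environ['LOCAL_WORLD_SIZE'])
master_addr = os.environ['MASTER_ADDR']
master_port = int(os.environ['MASTER_PORT'])

# Multi-node execution if the global world size exceeds the per-node world size.
multi_node = world_size != local_world_size

# Rendezvous through a TCPStore hosted by rank 0; a separate port is assumed here
# so the store does not collide with torchrun's own c10d rendezvous listener.
store = dist.TCPStore(
    host_name=master_addr,
    port=master_port + 1,
    world_size=world_size,
    is_master=(rank == 0),
    timeout=timedelta(minutes=5),
)
dist.init_process_group(backend='nccl', store=store, rank=rank, world_size=world_size)

print(f'rank={rank}/{world_size} master={master_addr} multi_node={multi_node}')
```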

@avazr self-assigned this Nov 18, 2025
@avazr requested a review from a team as a code owner Nov 18, 2025
@avazr added the documentation, benchmarks, runner, and configuration labels Nov 18, 2025
@avazr (Contributor, Author) commented Nov 18, 2025

@microsoft-github-policy-service agree company="Microsoft"

@guoshzhao changed the title from "Add Multi-Node Support for LLaMA Benchmarks" to "Benchmark: Model benchmark - Add Multi-Node Support for LLaMA Benchmarks" Nov 19, 2025

docs/user-tutorial/benchmarks/model-benchmarks.md (excerpt under review):

> ## Multi-node LLaMA Benchmarks
>
> SuperBench uses [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html) for multi-node LLaMA benchmarks based on PyTorch. Follow the steps below.
Review comment (Contributor):
Why are you saying it is "for multi-node LLaMA benchmarks"? It looks like the change should target all models that run in torch.distributed mode.

Example YAML under review:

```yaml
NCCL_SOCKET_IFNAME: 'eth0'
NCCL_IB_DISABLE: '1'
NCCL_IGNORE_DISABLED_P2P: '0'
MASTER_ADDR: '10.0.0.6' # Example of rank 0 node IP
```
Review comment from @guoshzhao (Contributor), Nov 20, 2025:

Can we make these two parameters, MASTER_ADDR and MASTER_PORT, optional? For example, give the port a default value and choose the first node as the default address. The reason is that in some automated cases we'd like to use the existing configuration file directly and not have to change it for each run.
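
One way to read that suggestion (purely illustrative; the helper name, default port, and dict-based config access below are assumptions, not SuperBench code): fall back to the first node in the inventory and a fixed port when the YAML omits the two values.

```python
DEFAULT_MASTER_PORT = 29500    # assumed default; 29500 is PyTorch's conventional choice


def resolve_master(env_config, node_hostnames):
    """Return (addr, port), defaulting to the first node and a fixed port."""
    addr = env_config.get('MASTER_ADDR', node_hostnames[0])
    port = int(env_config.get('MASTER_PORT', DEFAULT_MASTER_PORT))
    return addr, port


# With an empty env section, the first node in the host list becomes the master.
print(resolve_master({}, ['10.0.0.6', '10.0.0.7']))    # ('10.0.0.6', 29500)
```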

superbench/benchmarks/model_benchmarks/pytorch_base.py (excerpt under review):

```python
self._local_world_size = int(os.environ['LOCAL_WORLD_SIZE'])
self._multi_node = True if self._world_size != self._local_world_size else False

if self._multi_node:
```
Review comment (Contributor):
Do we really need to distinguish between multi_node and single_node here? The previous implementation should already work for multiple nodes; it handles both multi-node distributed benchmarking and single-node (multiple-GPU) parallel benchmarking. Have you tried whether the existing implementation works?
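
For reference, torchrun already sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT for every worker, so an env://-based initialization covers single-node and multi-node launches with the same call. A minimal sketch (neither the existing SuperBench code nor the PR's):

```python
import os

import torch
import torch.distributed as dist

# env:// reads RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT set by torchrun,
# so no explicit multi-node flag is needed for initialization itself.
dist.init_process_group(backend='nccl', init_method='env://')

# Each worker drives the GPU matching its local rank on its own node.
torch.cuda.set_device(int(os.environ['LOCAL_RANK']))
```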

superbench/runner/runner.py, comment on lines +156 to +157:

```python
f'--nnodes={mode.node_num} --rdzv-endpoint=$MASTER_ADDR:$MASTER_PORT '
f'--rdzv-id={random.randint(100, 999)} --rdzv-backend=c10d ' if
```
Review comment (Contributor):
What happens if we use --nnodes=$NNODES --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT instead of the "rdzv"-related parameters? Does it still work?
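
Both launch styles use standard torchrun options. Written in the same f-string style as the runner.py excerpt above (the node count and variable names here are illustrative), the two alternatives compare roughly like this:

```python
import random

node_num = 2    # hypothetical node count; runner.py takes this from mode.node_num

# Dynamic c10d rendezvous, as in this PR: every node runs the same command.
rdzv_args = (
    f'--nnodes={node_num} --rdzv-backend=c10d '
    f'--rdzv-endpoint=$MASTER_ADDR:$MASTER_PORT --rdzv-id={random.randint(100, 999)} '
)

# Static rendezvous, as the reviewer asks about: each node must also be told
# its own rank through NODE_RANK.
static_args = (
    f'--nnodes={node_num} --node_rank=$NODE_RANK '
    f'--master_addr=$MASTER_ADDR --master_port=$MASTER_PORT '
)

print(rdzv_args)
print(static_args)
```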
