
Conversation

@lancelly (Collaborator) commented Dec 24, 2025

As titled. This PR is the first step in refactoring the scheduler; we will use it to analyze the host overhead of the Python scheduler.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced Python-based unified scheduler option (enabled via TLLM_USE_PYTHON_SCHEDULER=1 environment variable) with policy-based scheduling strategies.
    • Exposed new methods for querying unique tokens and encoder tokens.
    • Added cache block scheduling utilities for improved memory management.
  • Chores

    • Updated initialization to support Python scheduler configuration.
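The release note above mentions the `TLLM_USE_PYTHON_SCHEDULER=1` environment variable. A minimal sketch of how a process might opt in — the variable name comes from this PR, but the helper function and the flag-check shape here are illustrative assumptions, not documented API:

```python
import os

# Opt in to the Python-based unified scheduler (disabled by default).
# Only the variable name is taken from the PR; the rest of this snippet
# is an illustrative assumption about how the flag would be consumed.
os.environ["TLLM_USE_PYTHON_SCHEDULER"] = "1"

def python_scheduler_enabled() -> bool:
    """Mirror the kind of flag check the executor setup would perform."""
    return os.environ.get("TLLM_USE_PYTHON_SCHEDULER", "0") == "1"
```

The flag would typically need to be set before the executor is created, since the scheduler is selected at initialization time.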


QiJune and others added 22 commits December 17, 2025 13:37
Signed-off-by: junq <[email protected]>
@lancelly lancelly requested review from a team as code owners December 24, 2025 09:20
@lancelly lancelly requested a review from HuiGao-NV December 24, 2025 09:20
@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

@lancelly lancelly requested review from QiJune and litaotju December 24, 2025 09:22
@tensorrt-cicd (Collaborator)

PR_Github #29796 [ run ] triggered by Bot. Commit: 411c254

@coderabbitai bot (Contributor) commented Dec 24, 2025

📝 Walkthrough

These changes extend Python bindings for GenLlmReq and KVCacheManager C++ classes, add an environment variable to enable Python-based scheduling, and introduce a comprehensive Python scheduling framework with capacity and micro-batch scheduling policies as an alternative to C++ scheduler components.

Changes

Cohort / File(s) Change Summary
C++ Pybind/Nanobind Bindings
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp, cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
Expose new GenLlmReq Python methods: get_unique_tokens(beam) and get_unique_tokens() overloads, plus get_encoder_unique_tokens() returning optional VecUniqueTokens. Adjust binding chain on use_draft_model to enable additional chained bindings.
KV Cache Manager Bindings
cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
Add Python bindings for find_new_context_block(unique_tokens, llm_request) on BaseKVCacheManager and scheduling_has_free_blocks(num_required, window_size) on KVCacheManager, delegating to underlying C++ implementations.
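To make the two new KV cache bindings concrete, here is a hypothetical Python-side stub with the same call shapes. The method names come from the change summary above; the semantics coded here (simple free-block counting, integer block ids) are assumptions for illustration — the real methods delegate to the C++ implementations:

```python
from typing import Optional, Sequence

class StubKVCacheManager:
    """Hypothetical stand-in for the bound C++ KVCacheManager. Only the
    method names match the summary; the behavior here is assumed."""

    def __init__(self, free_blocks: int):
        self._free_blocks = free_blocks

    def scheduling_has_free_blocks(self, num_required: int, window_size: int) -> bool:
        # Assumed semantics: are there enough free blocks for this
        # attention window? (window_size is ignored in this toy version.)
        return self._free_blocks >= num_required

    def find_new_context_block(self, unique_tokens: Sequence[int], llm_request) -> Optional[int]:
        # Assumed semantics: return a block id if one can be allocated,
        # otherwise None.
        return 0 if self._free_blocks > 0 else None
```

A Python capacity scheduler would call these during its fitting loop to decide whether a candidate request can be admitted without evicting others.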
Scheduler Initialization & Configuration
tensorrt_llm/__init__.py, tensorrt_llm/_torch/pyexecutor/_util.py
Set TLLM_USE_PYTHON_SCHEDULER=1 environment variable on startup. Add conditional logic in create_py_executor_instance to select SimpleUnifiedScheduler when flag is enabled; otherwise retain existing C++ scheduler selection logic.
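The conditional selection described above might look like the following sketch. The class names echo the summary, but the stub bodies and the exact branch placement inside `create_py_executor_instance` are assumptions:

```python
import os

class _CppSchedulerStub:
    """Stand-in for the existing C++ scheduler path."""
    name = "cpp"

class _SimpleUnifiedSchedulerStub:
    """Stand-in for the new Python SimpleUnifiedScheduler."""
    name = "python"

def select_scheduler():
    # Hypothetical shape of the branch this PR adds: the Python
    # scheduler is chosen only when the environment flag is set;
    # otherwise the existing C++ selection logic is retained.
    if os.environ.get("TLLM_USE_PYTHON_SCHEDULER") == "1":
        return _SimpleUnifiedSchedulerStub()
    return _CppSchedulerStub()
```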
Python Scheduling Framework
tensorrt_llm/_torch/pyexecutor/scheduler.py
Introduce comprehensive Python-based scheduling system: PyCapacityScheduler (orchestrator with policy-based fitting), PyMicroBatchScheduler (encoder/context/generation batching), and SimpleUnifiedScheduler (composite runner). Add SchedulerPolicyBase with MaxRequestsPolicy, GuaranteedNoEvictPolicy, MaxUtilizationPolicy implementations; block-tracking managers; ChunkingPolicy enum; and state/prioritization logic mirroring C++ behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Executor as Executor
    participant Sched as SimpleUnifiedScheduler
    participant Capacity as PyCapacityScheduler
    participant MicroBatch as PyMicroBatchScheduler
    participant KVCache as KVCacheManager
    participant Policy as SchedulerPolicy

    Executor->>Sched: schedule(pending_requests, running_requests, kv_cache_manager)
    activate Sched
    
    Sched->>Capacity: schedule(pending, running, kv_cache_manager)
    activate Capacity
    
    Capacity->>Policy: get_new_request_ids(pending)
    activate Policy
    Policy->>Capacity: filtered_request_ids
    deactivate Policy
    
    loop For each candidate request
        Capacity->>KVCache: find_new_context_block(unique_tokens, request)
        KVCache->>Capacity: context_block_info
        Capacity->>Capacity: fit_request_to_blocks()
    end
    
    Capacity->>Sched: scheduled_requests, paused_requests
    deactivate Capacity
    
    Sched->>MicroBatch: schedule(scheduled_requests, kv_cache_manager)
    activate MicroBatch
    
    MicroBatch->>MicroBatch: compute_chunk_sizes()
    
    rect rgb(200, 220, 255)
        note right of MicroBatch: Encoder phase
        MicroBatch->>KVCache: scheduling_has_free_blocks()
        KVCache->>MicroBatch: has_free
    end
    
    rect rgb(220, 240, 220)
        note right of MicroBatch: Context phase
        MicroBatch->>MicroBatch: select_requests_for_context()
    end
    
    rect rgb(255, 240, 200)
        note right of MicroBatch: Generation phase
        MicroBatch->>MicroBatch: select_requests_for_generation()
    end
    
    MicroBatch->>Sched: SchedulerOutput (batches, tokens)
    deactivate MicroBatch
    
    Sched->>Executor: SchedulerOutput
    deactivate Sched

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning: The PR description is minimal and vague ('As titled. This PR is our first step to refactor the scheduler. We will analyze the host overhead of python scheduler based on this PR.'), missing required sections like Description, Test Coverage, and PR Checklist. Resolution: expand the description to explain the problem being solved, the solution approach, affected components, and test coverage, and confirm completion of the PR checklist items.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 52.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The title clearly and specifically describes the main change: re-implementing MicroBatchScheduler and CapacityScheduler in Python, with the JIRA ticket properly referenced.

@tensorrt-cicd (Collaborator)

PR_Github #30026 [ run ] triggered by Bot. Commit: 4b65790

@tensorrt-cicd (Collaborator)

PR_Github #30026 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23104 completed with status: 'SUCCESS'

@nvpohanh (Collaborator)

@peaceh-nv is this related to what you are working on?

@eopXD (Collaborator) commented Dec 30, 2025

@lancelly To speed up the review, could you give more detail on the logic of your change?

My first questions are:

  • Do we have functional change in this MR?
  • What is the motivation/problem of the change and how is it resolved?
  • What is the roadmap of re-implementation?

@lancelly (Collaborator, Author) commented Jan 5, 2026

@lancelly To speed up the review, could you give more detail on the logic of your change?

My first questions are:

  • Do we have functional change in this MR?
  • What is the motivation/problem of the change and how is it resolved?
  • What is the roadmap of re-implementation?

@QiJune has a proposal for refactoring the scheduler: [RFC] Unified Python SPMD Scheduler for TRT-LLM. Our first step is to reimplement the C++ scheduler in Python. The goal of this PR is to be fully consistent with the C++ scheduler functionality (with code basically aligned line by line). Based on this PR, we will test whether the host overhead of the Python scheduler is acceptable (host overhead has been found to be a bottleneck under certain workloads). This PR will not affect the current main branch functionality, as the Python scheduler is disabled by default and can be enabled via an environment variable for testing purposes. @nvpohanh @eopXD

@lancelly (Collaborator, Author) commented Jan 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30697 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30697 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23686 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30750 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30750 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23734 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30797 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30797 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23779 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30950 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30950 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23914 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #31048 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #31048 [ run ] completed with state FAILURE. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23989 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #31078 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #31078 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 AM PST on 1/8.

@lancelly (Collaborator, Author) commented Jan 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #31167 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #31167 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #24082 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
