
Conversation

@lancelly (Collaborator) commented Dec 24, 2025

As titled. This PR is the first step in refactoring the scheduler; we will use it to analyze the host overhead of the Python scheduler.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced Python-based unified scheduler option (enabled via TLLM_USE_PYTHON_SCHEDULER=1 environment variable) with policy-based scheduling strategies.
    • Exposed new methods for querying unique tokens and encoder tokens.
    • Added cache block scheduling utilities for improved memory management.
  • Chores

    • Updated initialization to support Python scheduler configuration.
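The release note above mentions the `TLLM_USE_PYTHON_SCHEDULER=1` environment variable. A minimal sketch of how a process might opt in — the variable name comes from this PR, but the helper function and the flag-check shape here are illustrative assumptions, not documented API:

```python
import os

# Opt in to the Python-based unified scheduler (disabled by default).
# Only the variable name is taken from the PR; the rest of this snippet
# is an illustrative assumption about how the flag would be consumed.
os.environ["TLLM_USE_PYTHON_SCHEDULER"] = "1"

def python_scheduler_enabled() -> bool:
    """Mirror the kind of flag check the executor setup would perform."""
    return os.environ.get("TLLM_USE_PYTHON_SCHEDULER", "0") == "1"
```

The flag would typically need to be set before the executor is created, since the scheduler is selected at initialization time.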


QiJune and others added 22 commits December 17, 2025 13:37
Signed-off-by: junq <[email protected]>
@lancelly lancelly requested review from a team as code owners December 24, 2025 09:20
@lancelly lancelly requested a review from HuiGao-NV December 24, 2025 09:20
@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

@lancelly lancelly requested review from QiJune and litaotju December 24, 2025 09:22
@tensorrt-cicd (Collaborator)

PR_Github #29796 [ run ] triggered by Bot. Commit: 411c254

@coderabbitai bot (Contributor) commented Dec 24, 2025

📝 Walkthrough

These changes extend Python bindings for GenLlmReq and KVCacheManager C++ classes, add an environment variable to enable Python-based scheduling, and introduce a comprehensive Python scheduling framework with capacity and micro-batch scheduling policies as an alternative to C++ scheduler components.

Changes

Cohort / File(s) Change Summary
C++ Pybind/Nanobind Bindings
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp, cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
Expose new GenLlmReq Python methods: get_unique_tokens(beam) and get_unique_tokens() overloads, plus get_encoder_unique_tokens() returning optional VecUniqueTokens. Adjust binding chain on use_draft_model to enable additional chained bindings.
KV Cache Manager Bindings
cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
Add Python bindings for find_new_context_block(unique_tokens, llm_request) on BaseKVCacheManager and scheduling_has_free_blocks(num_required, window_size) on KVCacheManager, delegating to underlying C++ implementations.
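To make the two new KV cache bindings concrete, here is a hypothetical Python-side stub with the same call shapes. The method names come from the change summary above; the semantics coded here (simple free-block counting, integer block ids) are assumptions for illustration — the real methods delegate to the C++ implementations:

```python
from typing import Optional, Sequence

class StubKVCacheManager:
    """Hypothetical stand-in for the bound C++ KVCacheManager. Only the
    method names match the summary; the behavior here is assumed."""

    def __init__(self, free_blocks: int):
        self._free_blocks = free_blocks

    def scheduling_has_free_blocks(self, num_required: int, window_size: int) -> bool:
        # Assumed semantics: are there enough free blocks for this
        # attention window? (window_size is ignored in this toy version.)
        return self._free_blocks >= num_required

    def find_new_context_block(self, unique_tokens: Sequence[int], llm_request) -> Optional[int]:
        # Assumed semantics: return a block id if one can be allocated,
        # otherwise None.
        return 0 if self._free_blocks > 0 else None
```

A Python capacity scheduler would call these during its fitting loop to decide whether a candidate request can be admitted without evicting others.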
Scheduler Initialization & Configuration
tensorrt_llm/__init__.py, tensorrt_llm/_torch/pyexecutor/_util.py
Set TLLM_USE_PYTHON_SCHEDULER=1 environment variable on startup. Add conditional logic in create_py_executor_instance to select SimpleUnifiedScheduler when flag is enabled; otherwise retain existing C++ scheduler selection logic.
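The conditional selection described above might look like the following sketch. The class names echo the summary, but the stub bodies and the exact branch placement inside `create_py_executor_instance` are assumptions:

```python
import os

class _CppSchedulerStub:
    """Stand-in for the existing C++ scheduler path."""
    name = "cpp"

class _SimpleUnifiedSchedulerStub:
    """Stand-in for the new Python SimpleUnifiedScheduler."""
    name = "python"

def select_scheduler():
    # Hypothetical shape of the branch this PR adds: the Python
    # scheduler is chosen only when the environment flag is set;
    # otherwise the existing C++ selection logic is retained.
    if os.environ.get("TLLM_USE_PYTHON_SCHEDULER") == "1":
        return _SimpleUnifiedSchedulerStub()
    return _CppSchedulerStub()
```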
Python Scheduling Framework
tensorrt_llm/_torch/pyexecutor/scheduler.py
Introduce comprehensive Python-based scheduling system: PyCapacityScheduler (orchestrator with policy-based fitting), PyMicroBatchScheduler (encoder/context/generation batching), and SimpleUnifiedScheduler (composite runner). Add SchedulerPolicyBase with MaxRequestsPolicy, GuaranteedNoEvictPolicy, MaxUtilizationPolicy implementations; block-tracking managers; ChunkingPolicy enum; and state/prioritization logic mirroring C++ behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Executor as Executor
    participant Sched as SimpleUnifiedScheduler
    participant Capacity as PyCapacityScheduler
    participant MicroBatch as PyMicroBatchScheduler
    participant KVCache as KVCacheManager
    participant Policy as SchedulerPolicy

    Executor->>Sched: schedule(pending_requests, running_requests, kv_cache_manager)
    activate Sched
    
    Sched->>Capacity: schedule(pending, running, kv_cache_manager)
    activate Capacity
    
    Capacity->>Policy: get_new_request_ids(pending)
    activate Policy
    Policy->>Capacity: filtered_request_ids
    deactivate Policy
    
    loop For each candidate request
        Capacity->>KVCache: find_new_context_block(unique_tokens, request)
        KVCache->>Capacity: context_block_info
        Capacity->>Capacity: fit_request_to_blocks()
    end
    
    Capacity->>Sched: scheduled_requests, paused_requests
    deactivate Capacity
    
    Sched->>MicroBatch: schedule(scheduled_requests, kv_cache_manager)
    activate MicroBatch
    
    MicroBatch->>MicroBatch: compute_chunk_sizes()
    
    rect rgb(200, 220, 255)
        note right of MicroBatch: Encoder phase
        MicroBatch->>KVCache: scheduling_has_free_blocks()
        KVCache->>MicroBatch: has_free
    end
    
    rect rgb(220, 240, 220)
        note right of MicroBatch: Context phase
        MicroBatch->>MicroBatch: select_requests_for_context()
    end
    
    rect rgb(255, 240, 200)
        note right of MicroBatch: Generation phase
        MicroBatch->>MicroBatch: select_requests_for_generation()
    end
    
    MicroBatch->>Sched: SchedulerOutput (batches, tokens)
    deactivate MicroBatch
    
    Sched->>Executor: SchedulerOutput
    deactivate Sched

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning: The PR description is minimal and vague ('As titled. This PR is our first step to refactor the scheduler. We will analyze the host overhead of python scheduler based on this PR.'), missing required sections like Description, Test Coverage, and PR Checklist. Resolution: expand the description to explain the problem being solved, the solution approach, affected components, and test coverage, and confirm completion of the PR checklist items.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 52.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The title clearly and specifically describes the main change: re-implementing MicroBatchScheduler and CapacityScheduler in Python, with the JIRA ticket properly referenced.

@tensorrt-cicd (Collaborator)

PR_Github #30026 [ run ] triggered by Bot. Commit: 4b65790

@tensorrt-cicd (Collaborator)

PR_Github #30026 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23104 completed with status: 'SUCCESS'

@nvpohanh (Collaborator)

@peaceh-nv is this related to what you are working on?

@eopXD (Collaborator) commented Dec 30, 2025

@lancelly To speed up the review, could you give more detail on the logic of your change?

My first questions are:

  • Do we have functional change in this MR?
  • What is the motivation/problem of the change and how is it resolved?
  • What is the roadmap of re-implementation?

@lancelly (Collaborator, Author) commented Jan 5, 2026

@lancelly To speed up the review, could you give more detail on the logic of your change?

My first questions are:

  • Do we have functional change in this MR?
  • What is the motivation/problem of the change and how is it resolved?
  • What is the roadmap of re-implementation?

@QiJune has a proposal for refactoring the scheduler: [RFC] Unified Python SPMD Scheduler for TRT-LLM. Our first step is to reimplement the C++ scheduler in Python. The goal of this PR is to be fully consistent with the C++ scheduler functionality (with code basically aligned line by line). Based on this PR, we will test whether the host overhead of the Python scheduler is acceptable (host overhead has been found to be a bottleneck under certain workloads). This PR will not affect the current main branch functionality, as the Python scheduler is disabled by default and can be enabled via an environment variable for testing purposes. @nvpohanh @eopXD

@lancelly (Collaborator, Author) commented Jan 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30697 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30697 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23686 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30750 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30750 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23734 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30797 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30797 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23779 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #30950 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #30950 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23914 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #31048 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #31048 [ run ] completed with state FAILURE. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #23989 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #31078 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #31078 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 AM PST on 1/8.

@lancelly (Collaborator, Author) commented Jan 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #31167 [ run ] triggered by Bot. Commit: 6af3e00

@tensorrt-cicd (Collaborator)

PR_Github #31167 [ run ] completed with state SUCCESS. Commit: 6af3e00
/LLM/main/L0_MergeRequest_PR pipeline #24082 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
