[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python #10273
base: main
Signed-off-by: junq <[email protected]>
Signed-off-by: Lanyu Liao <[email protected]>
|
/bot run --disable-fail-fast |
|
PR_Github #29796 [ run ] triggered by Bot. Commit: |
📝 Walkthrough
These changes extend Python bindings for GenLlmReq and KVCacheManager C++ classes, add an environment variable to enable Python-based scheduling, and introduce a comprehensive Python scheduling framework with capacity and micro-batch scheduling policies as an alternative to C++ scheduler components.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Executor as Executor
participant Sched as SimpleUnifiedScheduler
participant Capacity as PyCapacityScheduler
participant MicroBatch as PyMicroBatchScheduler
participant KVCache as KVCacheManager
participant Policy as SchedulerPolicy
Executor->>Sched: schedule(pending_requests, running_requests, kv_cache_manager)
activate Sched
Sched->>Capacity: schedule(pending, running, kv_cache_manager)
activate Capacity
Capacity->>Policy: get_new_request_ids(pending)
activate Policy
Policy->>Capacity: filtered_request_ids
deactivate Policy
loop For each candidate request
Capacity->>KVCache: find_new_context_block(unique_tokens, request)
KVCache->>Capacity: context_block_info
Capacity->>Capacity: fit_request_to_blocks()
end
Capacity->>Sched: scheduled_requests, paused_requests
deactivate Capacity
Sched->>MicroBatch: schedule(scheduled_requests, kv_cache_manager)
activate MicroBatch
MicroBatch->>MicroBatch: compute_chunk_sizes()
rect rgb(200, 220, 255)
note right of MicroBatch: Encoder phase
MicroBatch->>KVCache: scheduling_has_free_blocks()
KVCache->>MicroBatch: has_free
end
rect rgb(220, 240, 220)
note right of MicroBatch: Context phase
MicroBatch->>MicroBatch: select_requests_for_context()
end
rect rgb(255, 240, 200)
note right of MicroBatch: Generation phase
MicroBatch->>MicroBatch: select_requests_for_generation()
end
MicroBatch->>Sched: SchedulerOutput (batches, tokens)
deactivate MicroBatch
Sched->>Executor: SchedulerOutput
deactivate Sched
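The two-stage flow in the diagram (capacity scheduling first, then micro-batch scheduling) can be sketched in a few lines of Python. This is only a toy illustration, not the PR's implementation: `Request`, `ToyKVCache`, `ToyCapacityScheduler`, `ToyMicroBatchScheduler`, and `schedule_step` are hypothetical names, the block accounting is a plain counter rather than the real KVCacheManager, and the encoder/context/generation phases are collapsed into a single token budget.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for an LLM request (not the real binding)."""
    request_id: int
    num_tokens: int      # tokens this request would add to the next step
    blocks_needed: int   # KV cache blocks it needs to make progress


@dataclass
class ToyKVCache:
    """Hypothetical block counter standing in for a KV cache manager."""
    free_blocks: int


class ToyCapacityScheduler:
    """Stage 1: admit requests while KV cache blocks remain."""

    def __init__(self, max_num_requests: int):
        self.max_num_requests = max_num_requests

    def schedule(self, active, kv_cache):
        scheduled, paused = [], []
        remaining = kv_cache.free_blocks
        for req in active:
            fits = req.blocks_needed <= remaining
            if fits and len(scheduled) < self.max_num_requests:
                remaining -= req.blocks_needed
                scheduled.append(req)
            else:
                paused.append(req)
        return scheduled, paused


class ToyMicroBatchScheduler:
    """Stage 2: trim the admitted set to a per-step batch and token budget."""

    def __init__(self, max_batch_size: int, max_num_tokens: int):
        self.max_batch_size = max_batch_size
        self.max_num_tokens = max_num_tokens

    def schedule(self, candidates):
        batch, tokens = [], 0
        for req in candidates:
            over_batch = len(batch) >= self.max_batch_size
            over_tokens = tokens + req.num_tokens > self.max_num_tokens
            if over_batch or over_tokens:
                break  # toy choice: stop at the first request that does not fit
            batch.append(req)
            tokens += req.num_tokens
        return batch, tokens


def schedule_step(active, kv_cache, capacity, micro_batch):
    """Run both stages and return (batch, paused, total_tokens)."""
    scheduled, paused = capacity.schedule(active, kv_cache)
    batch, tokens = micro_batch.schedule(scheduled)
    return batch, paused, tokens
```

The split mirrors the diagram: the first stage answers "which requests can hold KV cache at all", and the second answers "which of those fit into this forward pass". Requests admitted by the first stage but trimmed by the second simply wait for a later step.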
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
|
PR_Github #30026 [ run ] triggered by Bot. Commit: |
|
PR_Github #30026 [ run ] completed with state |
|
@peaceh-nv is this related to what you are working on? |
|
@lancelly To speed up the review, could you give more detail on the logic of your change? My first questions are:
|
@QiJune has a proposal for refactoring the scheduler: [RFC] Unified Python SPMD Scheduler for TRT-LLM. Our first step is to reimplement the C++ scheduler in Python. The goal of this PR is full functional parity with the C++ scheduler, with the code aligned essentially line by line. Based on this PR, we will test whether the host overhead of the Python scheduler is acceptable (host overhead has been found to be a bottleneck under certain workloads). This PR does not affect current main-branch functionality: the Python scheduler is disabled by default and can be enabled via an environment variable for testing purposes. @nvpohanh @eopXD
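For readers unfamiliar with this kind of opt-in switch, here is a minimal sketch of an environment-variable gate that defaults to the existing path. The variable name `TLLM_USE_PYTHON_SCHEDULER` comes from the release notes in this PR; everything else (the helper names `use_python_scheduler` and `pick_scheduler`, and the assumption that only the exact value `"1"` enables the Python path) is hypothetical and may differ from how the PR actually reads the variable.

```python
import os


def use_python_scheduler(environ=None) -> bool:
    """Return True only when the opt-in variable is set to "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get("TLLM_USE_PYTHON_SCHEDULER", "0") == "1"


def pick_scheduler(make_cpp, make_python, environ=None):
    """Hypothetical factory: default to the C++ path, switch to Python on opt-in."""
    factory = make_python if use_python_scheduler(environ) else make_cpp
    return factory()
```

Passing the environment as an argument keeps the gate easy to test without mutating the process environment; with no argument it falls back to `os.environ`.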
Signed-off-by: Lance Liao <[email protected]>
|
/bot run --disable-fail-fast |
|
PR_Github #30697 [ run ] triggered by Bot. Commit: |
|
PR_Github #30697 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #30750 [ run ] triggered by Bot. Commit: |
|
PR_Github #30750 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #30797 [ run ] triggered by Bot. Commit: |
|
PR_Github #30797 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #30950 [ run ] triggered by Bot. Commit: |
|
PR_Github #30950 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #31048 [ run ] triggered by Bot. Commit: |
|
PR_Github #31048 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #31078 [ run ] triggered by Bot. Commit: |
|
PR_Github #31078 [ run ] completed with state |
|
/bot run --disable-fail-fast |
|
PR_Github #31167 [ run ] triggered by Bot. Commit: |
|
PR_Github #31167 [ run ] completed with state
|
As titled. This PR is our first step toward refactoring the scheduler. We will analyze the host overhead of the Python scheduler based on this PR.
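One simple way to approach the host-overhead analysis is to time the scheduling call in isolation. The harness below is a generic sketch under stated assumptions, not the methodology used for this PR: `measure_host_overhead`, `schedule_fn`, and `make_inputs` are hypothetical names, and the same function would be called once with the C++-backed scheduler and once with the Python one to compare per-call latency.

```python
import statistics
import time


def measure_host_overhead(schedule_fn, make_inputs, iterations=1000, warmup=50):
    """Time repeated calls to schedule_fn; return per-call stats in microseconds."""
    for _ in range(warmup):
        schedule_fn(*make_inputs())  # warm caches before measuring

    samples = []
    for _ in range(iterations):
        args = make_inputs()  # build inputs outside the timed region
        start = time.perf_counter_ns()
        schedule_fn(*args)
        samples.append((time.perf_counter_ns() - start) / 1_000)

    samples.sort()
    return {
        "mean_us": statistics.fmean(samples),
        "p50_us": samples[len(samples) // 2],
        "p99_us": samples[min(len(samples) - 1, int(len(samples) * 0.99))],
    }
```

Reporting a tail percentile alongside the mean matters here because scheduling runs on the host once per step, so occasional slow calls can stall the GPU even when the average looks fine.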
Summary by CodeRabbit
Release Notes
New Features
TLLM_USE_PYTHON_SCHEDULER=1 environment variable) with policy-based scheduling strategies.
Chores