Pull requests: NVIDIA/TransformerEngine
Pull requests list
[PyTorch] Support user-defined op fusions
enhancement
New feature or request
#2597
opened Jan 14, 2026 by
timmoon10
7 of 13 tasks
fix: enable opt for cutlass sources to avoid infinite compile time
build
Build system
#2595
opened Jan 14, 2026 by
kainzhong
5 of 13 tasks
Initial commit to pass scale as Tensor for multi_tensor_scale op
#2594
opened Jan 13, 2026 by
vasunvidia
Draft
13 tasks
Revert adding pytorch-triton as a build requirement
2.12.0
#2592
opened Jan 13, 2026 by
tdophung
5 of 13 tasks
[Draft] [JAX] Custom partitioning for Permutation primitives
MoE
#2591
opened Jan 13, 2026 by
tdophung
13 tasks
(Bug fix) Fix accuracy issue for blockwise scaling+E8 scale on Blackwell
#2589
opened Jan 13, 2026 by
lhb8125
13 tasks
[Common] MXFP8 kernel for grouped tensors
#2586
opened Jan 12, 2026 by
Oleg-Goncharov
Draft
13 tasks
[Common] Enable determinism for cuDNN >= 9.18 on Blackwell
2.12.0
#2584
opened Jan 12, 2026 by
cyanguwa
8 of 13 tasks
Make router_fusion to adapt for the large num_of_expert(>2048)
#2582
opened Jan 9, 2026 by
Autumn1998
13 tasks
fix(build): Handle namespace packages for PyPI CUDA detection
#2580
opened Jan 9, 2026 by
sbhavani
6 of 13 tasks
fix(examples): te_llama compatibility with transformers >= 4.57
#2572
opened Jan 7, 2026 by
sbhavani
6 of 13 tasks
[PyT] Update THD sink attention logic for cudnn >=9.18.0
2.12.0
#2568
opened Jan 6, 2026 by
cuichenx
13 tasks
[NVFP4][Dense/MoE] Integrate Cutlass NVFP4 Row-Cast-Col-RHT-Transpose-Cast Fusion Kernel
fp4
MoE
#2555
opened Jan 3, 2026 by
zhongbozhu
3 of 16 tasks
[Pytorch] Enhance bf16 precision optimizer performance with memory buffer
#2551
opened Dec 31, 2025 by
Baidu-AIAK
[PyTorch] Remove unnecessary save of weights
#2549
opened Dec 30, 2025 by
pggPL
8 of 13 tasks
[PyTorch]Add Casting-Free FP8-Flow-MoE Blockwise Optimizations
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#2544
opened Dec 26, 2025 by
xiaoxi-wangfj
4 of 13 tasks
[PyT] Plumbing correct bias dims from TE to cudnn
attention
bug
Something isn't working
pytorch
#2537
opened Dec 20, 2025 by
KshitijLakhani
5 of 11 tasks
[DO NOT MERGE] Get seqlens and offsets in O(N) space instead of O(N*N) space
do not merge
#2530
opened Dec 17, 2025 by
KshitijLakhani
Draft
13 tasks
[JAX] Calculate seqlens and offsets in O(N) space instead of O(N*N) space for THD sequences
attention
#2522
opened Dec 16, 2025 by
KshitijLakhani
Draft
13 tasks