[BUG] Cute-dsl always uses ptxas 12.9 to compile instead of a newer version #2981

@tridao

Which component has the problem?

CuTe DSL

Bug Report

Describe the bug
Even in the newest environment (CTK 13.1, driver 590), cute-dsl still compiles with ptxas 12.9, which leads to suboptimal code.
This matters because CTK 12.9 and 13.0 generate suboptimal SASS for MMA, while CTK 13.1 might generate better SASS (#2408 (comment)).
Is there a way for users to force compilation with a newer version of ptxas?
Right now I'm guessing ptxas is embedded in _cutlass_ir.cpython-312-x86_64-linux-gnu.so. If so, is there a way to pass the path to the system's ptxas instead?

I compiled the Blackwell GEMM example (https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/blackwell/dense_gemm_persistent.py) with CUTE_DSL_KEEP_CUBIN=1 and then looked at the SASS. The .note.nv.tkinfo section shows:

//--------------------- .note.nv.tkinfo           --------------------------
        .section        .note.nv.tkinfo,"",@"SHT_NOTE"
        .sectionflags   @"SHF_NOTE_NV_TKINFO"
        .tkinfo
        /*0018*/        .word   0x00000081
        /*0030*/        .string ""
        /*0030*/        .string "ptxas"
        /*0030*/        .string "Cuda compilation tools, release 12.9, V12.9.83"
        /*0030*/        .string "Build system must define TOOLS_VERSION_EXTENDED"
        /*0030*/        .string "-O 3 -arch sm_100a "

Steps/Code to reproduce bug

CUTE_DSL_KEEP_CUBIN=1 python dense_gemm_persistent.py --mma_tiler_mn 256,256 --cluster_shape_mn 2,1 --mnkl 8192,8192,8192,1 --use_tma_store --use_2cta_instrs --benchmark --warmup_iterations=1 --iterations=30 --ab_dtype BFloat16
nvdisasm cutlass_bmm___main__PersistentDenseGemmKernelobjectat_Tensorgmemoi64i641_Tensorgmemoi641i64_Tensorgmemoi64i641_74_CUstream0x0_functionrunlocalslambdaat.sm_100a.cubin > sm100_dense_gemm.sass
vim sm100_dense_gemm.sass
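
Instead of reading the dump by eye in vim, the ptxas release can be pulled out of the nvdisasm output with a short script. This is only a sketch: the helper names `tkinfo_strings` and `ptxas_release` are made up for illustration, and it assumes the .note.nv.tkinfo section is laid out like the excerpt above (a `.section` line followed by `.string` entries).

```python
import re


def tkinfo_strings(sass_text):
    """Collect the .string payloads that belong to the .note.nv.tkinfo section."""
    strings = []
    in_tkinfo = False
    for line in sass_text.splitlines():
        stripped = line.strip()
        # A ".section <name>" line opens a new section; ".sectionflags" does not.
        if re.match(r"\.section\s", stripped):
            in_tkinfo = ".note.nv.tkinfo" in stripped
            continue
        if in_tkinfo:
            m = re.search(r'\.string\s+"(.*)"', stripped)
            if m:
                strings.append(m.group(1))
    return strings


def ptxas_release(sass_text):
    """Return (release, build) such as ('12.9', 'V12.9.83'), or None if absent."""
    for s in tkinfo_strings(sass_text):
        m = re.search(r"release (\d+\.\d+), (V[\d.]+)", s)
        if m:
            return m.group(1), m.group(2)
    return None


# Sample lines copied from the dump in this report.
SAMPLE = '''\
        .section        .note.nv.tkinfo,"",@"SHT_NOTE"
        .sectionflags   @"SHF_NOTE_NV_TKINFO"
        .tkinfo
        /*0030*/        .string "ptxas"
        /*0030*/        .string "Cuda compilation tools, release 12.9, V12.9.83"
'''

print(ptxas_release(SAMPLE))
```

To check a real dump, read `sm100_dense_gemm.sass` into a string and pass it to `ptxas_release`.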

Environment details (please complete the following information):
Driver Version: 590.48.01
CUDA Version: 13.1
nvidia-cutlass-dsl: 4.3.5
nvcc: V13.1.80
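
To make the mismatch explicit, the version embedded in the cubin can be compared with the installed toolkit version. A minimal sketch, assuming both strings carry a `V<major>.<minor>.<patch>` token as in the output above; `parse_version` is a hypothetical helper, not part of cute-dsl:

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from a string containing e.g. 'V13.1.80'."""
    m = re.search(r"V(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no version token found in {text!r}")
    return tuple(int(part) for part in m.groups())


# Strings taken from this report: the tkinfo note and the nvcc version.
embedded = parse_version("Cuda compilation tools, release 12.9, V12.9.83")
installed = parse_version("V13.1.80")

if embedded < installed:
    print(f"cubin was built with ptxas {embedded}, older than installed CTK {installed}")
```

Tuples compare element by element, so the major version decides first, then minor, then patch.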
