ROCm 7.0.1 build errors

**Describe the Issue**
Unable to compile with ROCm 7.0.1 on Ubuntu 24.04.3. It looks similar to #136, though this is a released version, not a release candidate.

```
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/hip_assert.h:29:6: error: declaration of 'abort' has a different language linkage
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/hip_assert.h:46:6: error: redefinition of '__assert_fail'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/hip_assert.h:88:6: error: redefinition of '__assertfail'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:496:12: error: no template named 'conditional' in namespace '__hip_internal'; did you mean 'rocwmma::conditional'?
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:498:16: error: no template named 'conditional' in namespace '__hip_internal'; did you mean 'rocwmma::conditional'?
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:817:36: error: unknown type name '__hip_bfloat16_raw'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:824:9: error: no viable conversion from '__hip_bfloat16' to 'float'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:843:39: error: unknown type name '__hip_bfloat162_raw'; did you mean '__hip_bfloat162'?
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:850:10: error: no viable conversion from '__hip_bfloat162' to 'float2' (aka 'HIP_vector_type<float, 2>')
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1052:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1089:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1610:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1660:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2167:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2205:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2729:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2779:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
ggml/src/ggml-cuda/fattn-wmma-f16.cu:418:45: error: call to '__hadd' is ambiguous
ggml/src/ggml-cuda/fattn-wmma-f16.cu:418:45: error: call to '__hadd' is ambiguous
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1100.

```

**Additional Information:**

```
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.3 LTS"
```

```
$ head /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 33
model name	: AMD Ryzen 9 5900XT 16-Core Processor
stepping	: 2
microcode	: 0xa201213
cpu MHz		: 2188.078
cache size	: 512 KB
physical id	: 0
```

```
$ rocm-smi --showhw
====================================== ROCm System Management Interface ======================================
=========================================== Concise Hardware Info ============================================
GPU  NODE  DID     GUID   GFX VER  GFX RAS  SDMA RAS  UMC RAS  VBIOS             BUS           PARTITION ID  
0    1     0x744c  10429  gfx1100  N/A      N/A       N/A      113-D70401XT-P11  0000:2D:00.0  0             
==============================================================================================================
============================================ End of ROCm SMI Log =============================================

```

```
$ git log | head -5
commit 03eabefd615ad23d65924eb2e341f3ed61f02c1f
Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Date:   Wed Aug 27 12:01:38 2025 -0500

    workflow
```

ROCm 7.0.1 was installed for Ubuntu 24.04 following the quick-start guide at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html

<details><summary>Build Log</summary>
<code>
$ make LLAMA_HIPBLAS=1
I koboldcpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I UNAME_O:  GNU/Linux
I CFLAGS:   -I. -Iggml/include -Iggml/src -Iggml/src/ggml-cpu -Iinclude -Isrc -I./common -I./vendor -I./vendor/stb -I./include -I./include/CL -I./otherarch -I./otherarch/tools -I./otherarch/sdcpp -I./otherarch/ttscpp/include -I./otherarch/ttscpp/src -I./otherarch/sdcpp/thirdparty -I./include/vulkan -O3 -fno-finite-math-only -std=c11 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -DGGML_USE_CPU -DGGML_USE_CPU_REPACK -DNDEBUG -s -DGGML_USE_LLAMAFILE -pthread -Wno-deprecated -Wno-deprecated-declarations -Wno-unused-variable -pthread -march=native -mtune=native
I CXXFLAGS: -I. -Iggml/include -Iggml/src -Iggml/src/ggml-cpu -Iinclude -Isrc -I./common -I./vendor -I./vendor/stb -I./include -I./include/CL -I./otherarch -I./otherarch/tools -I./otherarch/sdcpp -I./otherarch/ttscpp/include -I./otherarch/ttscpp/src -I./otherarch/sdcpp/thirdparty -I./include/vulkan -O3 -fno-finite-math-only -std=c++17 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -DGGML_USE_CPU -DGGML_USE_CPU_REPACK -DNDEBUG -s -DGGML_USE_LLAMAFILE -pthread -Wno-multichar -Wno-write-strings -Wno-deprecated -Wno-deprecated-declarations -Wno-unused-variable -pthread
I LDFLAGS:  -ldl
I CC:       cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
I CXX:      g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
I HIP CC:       AMD clang version 20.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-7.0.1 25314 f4087f6b428f0e6f575ebac8a8a724dab123d06e)
I HIP CXX:      AMD clang version 20.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-7.0.1 25314 f4087f6b428f0e6f575ebac8a8a724dab123d06e)

g++ -I. -Iggml/include -Iggml/src -Iggml/src/ggml-cpu -Iinclude -Isrc -I./common -I./vendor -I./vendor/stb -I./include -I./include/CL -I./otherarch -I./otherarch/tools -I./otherarch/sdcpp -I./otherarch/ttscpp/include -I./otherarch/ttscpp/src -I./otherarch/sdcpp/thirdparty -I./include/vulkan -O3 -fno-finite-math-only -std=c++17 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -DGGML_USE_CPU -DGGML_USE_CPU_REPACK -DNDEBUG -s -DGGML_USE_LLAMAFILE -pthread -Wno-multichar -Wno-write-strings -Wno-deprecated -Wno-deprecated-declarations -Wno-unused-variable -pthread ggml.o ggml-cpu.o ggml-ops.o ggml-vec.o ggml-binops.o ggml-unops.o ggml_v3.o ggml_v2.o ggml_v1.o expose.o gpttype_adapter.o sdcpp_default.o whispercpp_default.o tts_default.o embeddings_default.o llavaclip_default.o llava.o ggml-backend_default.o ggml-backend-reg_default.o ggml-repack.o ggml-alloc.o ggml-cpu-traits.o ggml-quants.o ggml-cpu-quants.o kcpp-quantmapper.o kcpp-repackmapper.o unicode.o unicode-data.o ggml-threading.o ggml-cpu-cpp.o gguf.o sgemm.o common.o sampling.o kcpputils.o mtmdaudio.o -shared -o koboldcpp_default.so -ldl
/opt/rocm/llvm/bin/clang++ -I. -Iggml/include -Iggml/src -Iinclude -Isrc -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -I./otherarch/sdcpp -I./otherarch/sdcpp/thirdparty -I./include/vulkan -O3 -fno-finite-math-only  -DNDEBUG -std=c++17 -fPIC -DLOG_DISABLE_LOGS -D_GNU_SOURCE -DGGML_USE_LLAMAFILE -DGGML_HIP_ROCWMMA_FATTN -I/opt/rocm/include/rocwmma/ -DGGML_USE_HIPBLAS -DGGML_USE_HIP -DGGML_HIP_NO_VMM -DGGML_USE_CUDA -DSD_USE_CUDA -DSD_USE_CUBLAS  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-7.0.1/include -I/include  --offload-arch=gfx1100 -DGGML_CUDA_DMMV_X=32  -DGGML_CUDA_MMV_Y=2 -DK_QUANTS_PER_ITERATION=2 -x hip -c -o ggml/src/ggml-cuda/fattn-wmma-f16.o ggml/src/ggml-cuda/fattn-wmma-f16.cu
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:77:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/hip_assert.h:29:6: error: declaration of 'abort' has a different language linkage
   29 | void abort() {
      |      ^
/usr/include/hip/amd_detail/amd_device_functions.h:805:6: note: previous definition is here
  805 | void abort() {
      |      ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:77:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/hip_assert.h:46:6: error: redefinition of '__assert_fail'
   46 | void __assert_fail(const char *assertion,
      |      ^
/usr/include/hip/amd_detail/amd_device_functions.h:822:6: note: previous definition is here
  822 | void __assert_fail(const char *assertion,
      |      ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:77:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/hip_assert.h:88:6: error: redefinition of '__assertfail'
   88 | void __assertfail()
      |      ^
/usr/include/hip/amd_detail/amd_device_functions.h:864:6: note: previous definition is here
  864 | void __assertfail()
      |      ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:496:12: error: no template named 'conditional' in namespace '__hip_internal'; did you mean 'rocwmma::conditional'?
  496 |   typename __hip_internal::conditional<
      |            ^~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../utility/type_traits.hpp:109:16: note: 'rocwmma::conditional' declared here
  109 |     using std::conditional;
      |                ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:498:16: error: no template named 'conditional' in namespace '__hip_internal'; did you mean 'rocwmma::conditional'?
  498 |       typename __hip_internal::conditional<sizeof(T) == 4, unsigned int,
      |                ^~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../utility/type_traits.hpp:109:16: note: 'rocwmma::conditional' declared here
  109 |     using std::conditional;
      |                ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:817:36: error: unknown type name '__hip_bfloat16_raw'
  817 | __hip_cvt_bfloat16raw_to_fp8(const __hip_bfloat16_raw hr, const __hip_saturation_t sat,
      |                                    ^
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:824:9: error: no viable conversion from '__hip_bfloat16' to 'float'
  824 |   float fval = __hip_bfloat16(hr);
      |         ^      ~~~~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:843:39: error: unknown type name '__hip_bfloat162_raw'; did you mean '__hip_bfloat162'?
  843 | __hip_cvt_bfloat16raw2_to_fp8x2(const __hip_bfloat162_raw hr, const __hip_saturation_t sat,
      |                                       ^
/usr/include/hip/amd_detail/amd_hip_bf16.h:113:8: note: '__hip_bfloat162' declared here
  113 | struct __hip_bfloat162 {
      |        ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:850:10: error: no viable conversion from '__hip_bfloat162' to 'float2' (aka 'HIP_vector_type<float, 2>')
  850 |   float2 f2 = __hip_bfloat162(hr);
      |          ^    ~~~~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_vector_types.h:470:9: note: candidate constructor not viable: no known conversion from '__hip_bfloat162' to 'const HIP_vector_type<float, 2> &' for 1st argument
  470 |         HIP_vector_type(const HIP_vector_type&) = default;
      |         ^               ~~~~~~~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_vector_types.h:473:9: note: candidate constructor not viable: no known conversion from '__hip_bfloat162' to 'HIP_vector_type<float, 2> &&' for 1st argument
  473 |         HIP_vector_type(HIP_vector_type&&) = default;
      |         ^               ~~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_vector_types.h:465:9: note: candidate template ignored: requirement 'sizeof...(Us) == 2U' was not satisfied [with Us = <__hip_bfloat162>]
  465 |         HIP_vector_type(Us... xs) noexcept
      |         ^
/usr/include/hip/amd_detail/amd_hip_vector_types.h:456:9: note: explicit constructor is not a candidate
  456 |         HIP_vector_type(U x_) noexcept
      |         ^
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1052:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
 1052 |       : __x(__hip_cvt_float_to_fp8(static_cast<float>(f), __default_saturation,
      |                                    ^~~~~~~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1089:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
 1089 |     return __hip_bfloat16(f);
      |            ^~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'float' to 'const __hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'float' to '__hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 1 was provided
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1610:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
 1610 |       : __x(__hip_cvt_float_to_fp8(static_cast<float>(f), __default_saturation,
      |                                    ^~~~~~~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:1660:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
 1660 |     return __hip_bfloat16(f);
      |            ^~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'float' to 'const __hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'float' to '__hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 1 was provided
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2167:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
 2167 |       : __x(__hip_cvt_float_to_fp8(static_cast<float>(f), __default_saturation,
      |                                    ^~~~~~~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2205:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
 2205 |     return __hip_bfloat16(f);
      |            ^~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'float' to 'const __hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'float' to '__hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 1 was provided
In file included from ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:31:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/fragment_traits_impl.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/layout.hpp:30:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../mapping_util.hpp:29:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../types.hpp:88:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/rocwmma/internal/layout/../float8.hpp:118:
In file included from /opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/hip_fp8.h:30:
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2729:36: error: cannot convert 'const __hip_bfloat16' to 'float' without a conversion operator
 2729 |       : __x(__hip_cvt_float_to_fp8(static_cast<float>(f), __default_saturation,
      |                                    ^~~~~~~~~~~~~~~~~~~~~
/opt/rocm-7.0.1/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp8.h:2779:12: error: no matching conversion for functional-style cast from 'float' to '__hip_bfloat16'
 2779 |     return __hip_bfloat16(f);
      |            ^~~~~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'float' to 'const __hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'float' to '__hip_bfloat16' for 1st argument
  108 | struct __hip_bfloat16 {
      |        ^~~~~~~~~~~~~~
/usr/include/hip/amd_detail/amd_hip_bf16.h:108:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 1 was provided
ggml/src/ggml-cuda/fattn-wmma-f16.cu:418:45: error: call to '__hadd' is ambiguous
  418 |                 KQ_rowsum_h2[j0/nwarps].x = __hadd(KQ_rowsum_h2[j0/nwarps].x, val);
      |                                             ^~~~~~
ggml/src/ggml-cuda/fattn-wmma-f16.cu:526:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  526 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
ggml/src/ggml-cuda/fattn-wmma-f16.cu:548:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  548 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/usr/include/hip/amd_detail/amd_device_functions.h:257:39: note: candidate function
  257 | __device__ static inline unsigned int __hadd(int x, int y) {
      |                                       ^
/usr/include/hip/amd_detail/amd_hip_fp16.h:1368:20: note: candidate function
 1368 |             __half __hadd(__half x, __half y)
      |                    ^
ggml/src/ggml-cuda/fattn-wmma-f16.cu:418:45: error: call to '__hadd' is ambiguous
  418 |                 KQ_rowsum_h2[j0/nwarps].x = __hadd(KQ_rowsum_h2[j0/nwarps].x, val);
      |                                             ^~~~~~
ggml/src/ggml-cuda/fattn-wmma-f16.cu:530:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, true>' requested here
  530 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
ggml/src/ggml-cuda/fattn-wmma-f16.cu:548:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  548 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/usr/include/hip/amd_detail/amd_device_functions.h:257:39: note: candidate function
  257 | __device__ static inline unsigned int __hadd(int x, int y) {
      |                                       ^
/usr/include/hip/amd_detail/amd_hip_fp16.h:1368:20: note: candidate function
 1368 |             __half __hadd(__half x, __half y)
      |                    ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx1100.
make: *** [Makefile:303: ggml/src/ggml-cuda/fattn-wmma-f16.o] Error 1


</code>
</details>

<details><summary>rocminfo</summary>
<code>

ROCk module version 6.14.14 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.18
Runtime Ext Version:     1.11
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
XNACK enabled:           NO
DMAbuf Support:          YES
VMM Support:             YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 5900XT 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 5900XT 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3300                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    63664968(0x3cb7348) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    63664968(0x3cb7348) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    63664968(0x3cb7348) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 4                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    63664968(0x3cb7348) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-d8607413b0e3a90b               
  Marketing Name:          Radeon RX 7900 XT                  
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      81920(0x14000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2075                               
  BDFID:                   11520                              
  Internal Node ID:        1                                  
  Compute Unit:            84                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        2147483647(0x7fffffff)             
    y                        65535(0xffff)                      
    z                        65535(0xffff)                      
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 542                                
  SDMA engine uCode::      24                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    20955136(0x13fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    20955136(0x13fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
    ISA 2                    
      Name:                    amdgcn-amd-amdhsa--gfx11-generic   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)             
        y                        65535(0xffff)                      
        z                        65535(0xffff)                      
      FBarrier Max Size:       32                                 
*** Done ***             


</code>
</details> 
