-
Notifications
You must be signed in to change notification settings - Fork 115
fix: add talos support #695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I can independently confirm that this works on my Talos cluster. After installing with For repeatability, you will need a container image to be built. I have pushed one to asymingt/k8s-dra-driver-gpu. You will need to modify this line to To optionally rebuild the container image, install docker + qemu-binfmt + buildx, checkout this code and run: |
|
I believe this is set to be fixed on the Talos side, by them installing the nvidia stuff where this expects it to go, not the other way around |
|
While we wait for Talos to update its driver install location, I've been trying to get MPS working on Talos using this PR branch and the following helm values. gpuResourcesEnabledOverride: true
resources:
gpus:
enabled: true
computeDomains:
enabled: false
featureGates:
MPSSupport: trueLooks like the It's probably related to this issue: #469 I've opened a PR to fix it on your branch: hydazz#1 |
|
@hydazz given your comment about Talos adjusting themselves to accommodate the existing search paths, how would you propose moving forward with this PR? |
@klueska I don't have definitive knowledge, I just inferred that conclusion based on:
(I could not find such referenced discussion) siderolabs/extensions#836 I don't know if there is talks between nvidia/talos, or whats outside of linked above, but it could easily be fixed here, just with something better than Perhaps @frezbo would have more insight? |
| func getTalosLibrarySearchPaths() []string { | ||
| return []string{ | ||
| "/driver-root/usr/local/glibc/usr/lib", | ||
| "/driver-root/usr/local/glibc/lib", | ||
| "/driver-root/usr/local/glibc/lib64", | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar is this something we would want to add directly to nvcdi in the nvidia-container-toolkit as a standard search path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a problem in adding this to the toolkit. At the moment the defaults are defined at quite a low level (which is where @hydazz has added them in NVIDIA/nvidia-container-toolkit#1621) and we may want to consider making these easier to specify at a higher level.
It would be nice if these paths are supported on the nvidia side, we're (SideroLabs) is open to using better paths, but we have a constraint that it cannot be standard |
|
@klueska what do you think: can we still do something here for the next release? I feel like we should. But now it's rather tight again. |
Signed-off-by: hydazz <[email protected]>
Signed-off-by: hydazz <[email protected]>
|
opened PR in the toolkit to remove the overwrite here please review and lmk if any changes are needed, keen to jump onto the DRA train 🙂 |
|
@hydazz @jgehrcke if we're expecting the toolkit to be updated to include this change, I don't think that's something that can be done for the upcoming release. Would a middle ground be adding a configurable option (envvar / config file option) that allows a user to specify the paths in the container to be searched for libraries explicilty? This can be passed to the CDI library on construction and used in the prestart scripts. Note that although NVIDIA/nvidia-container-toolkit#1621 gets us some of the way there, we may need to also update the detection logic to also be able to locate |
This is a starter PR to add support for Talos OS's different nvidia paths.
Tested the gpu component with the changes here in my environment and it works.
Feedback is needed, as i'm unsure how to add the
usr/local/glibcpath to CDI nicely, I don't believegetTalosLibrarySearchPathswill cut it globally...