Post History

Current version by Nick Antonaccio

Current Version: Apr 12, 2026 at 18:18

A few things about the basics of using Strix Halo:

  • Memory is shared between the CPU and GPU. Depending on your system, you can set the GPU allocation in the BIOS and/or in software that ships with your unit. I typically use the ASUS Armoury Crate software to set the GPU allocation to 96GB (this shows up as ~108GB GPU VRAM in LM Studio).

  • MOST IMPORTANT: If you have any problems with Vulkan, use the ROCm llama.cpp runtime instead. In my experience, ROCm typically runs much faster, can access more GPU memory, and has fewer issues in general. In LM Studio, click Settings -> Runtime, download the ROCm runtime, and select it to be used for GGUF.

  • If a model doesn't load, reduce the number of layers offloaded to the GPU below the default (I tend to limit GPU use to 66GB or less).

  • I've heard that disabling 'Try mmap()' is sometimes necessary if you have issues loading a model.

  • I also updated drivers from https://www.amd.com/en/support/download/drivers.html (not sure if this is needed, but the system sent me a notification to do that)
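
To make the layer-offload tip concrete, here is a rough Python sketch of the arithmetic. It is not how LM Studio or llama.cpp actually decides anything: it assumes every layer is about the same size, and the 4GB overhead allowance for context and buffers is a guess you should adjust. The function names and the model path are made up for illustration. The second helper just builds a command line for llama.cpp's llama-server using its --n-gpu-layers and --no-mmap options, which correspond to the GPU offload and 'Try mmap()' settings in LM Studio.

```python
import math


def layers_that_fit(model_size_gb: float, n_layers: int,
                    gpu_budget_gb: float, overhead_gb: float = 4.0) -> int:
    """Estimate how many layers fit in a GPU memory budget.

    Assumption: all layers are roughly equal in size, and overhead_gb
    (a guessed allowance for context/KV cache and buffers) is reserved first.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(gpu_budget_gb - overhead_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


def llama_server_args(model_path: str, gpu_layers: int,
                      use_mmap: bool = True) -> list[str]:
    """Build a llama-server argument list (model_path is a placeholder)."""
    args = ["llama-server", "-m", model_path,
            "--n-gpu-layers", str(gpu_layers)]
    if not use_mmap:
        # Equivalent in spirit to turning off 'Try mmap()' in LM Studio.
        args.append("--no-mmap")
    return args


if __name__ == "__main__":
    # Hypothetical 80GB model with 80 layers and a 66GB GPU budget.
    n = layers_that_fit(model_size_gb=80.0, n_layers=80, gpu_budget_gb=66.0)
    print(" ".join(llama_server_args("model.gguf", n, use_mmap=False)))
```

If the estimate still fails to load, lowering the layer count further or raising the overhead allowance is the obvious next thing to try.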

Previous Versions
Version 2: Apr 12, 2026 at 18:18

A few things about the basics of using Strix Halo:

  • Memory usage is shared between CPU and GPU. Depending on your system, you can set these settings in BIOS and/or in software that ships with your unit. I typically use the ASUS Armoury Crate software to set GPU usage to 96GB (this shows up as ~108GB GPU VRAM in LM Studio).

  • Use ROCm llama.cpp instead of Vulkan. It works faster and can access more of the memory. In LM Studio, click Settings -> Runtime -> download the ROCm runtime, and select it to be used for GGUF.

  • Disable Try mmap() if you have issues loading a model

  • I also updated drivers from https://www.amd.com/en/support/download/drivers.html (not sure if this is needed, but the system sent me a notification to do that)

Version 1: Apr 12, 2026 at 17:59

A few things about the basics of using Strix Halo:

  • Memory usage is shared between CPU and GPU. Depending on your system, you can set these settings in BIOS and/or in software that ships with your unit. I typically use the ASUS Armoury Crate software to set GPU usage to 96GB (this shows up as ~108GB GPU VRAM in LM Studio).

  • Use ROCm llama.cpp instead of Vulkan. It works faster and can access more of the memory. In LM Studio, click Settings -> Runtime -> download the ROCm runtime, and select it to be used for GGUF.

  • Disable Try mmap() if you have issues loading a model