A few things about the basics of using Strix Halo:
- Memory is shared between the CPU and GPU. Depending on your system, you can set the split in the BIOS and/or in software that ships with your unit. I typically use the ASUS Armoury Crate software to set the GPU allocation to 96 GB (this shows up as ~108 GB of GPU VRAM in LM Studio).
- MOST IMPORTANT: If you have any problems with Vulkan, use the ROCm llama.cpp runtime instead. I've noticed that ROCm typically runs much faster, can access more GPU memory, and has fewer issues in general. In LM Studio, click Settings -> Runtime, download the ROCm runtime, and select it for GGUF.
- If a model doesn't load, reduce the number of layers offloaded to the GPU below the default (I tend to limit GPU use to 66 GB or less).
- I've heard that disabling 'Try mmap()' is sometimes necessary if you have issues loading a model.
- I also updated my drivers from https://www.amd.com/en/support/download/drivers.html (not sure if this is needed, but the system sent me a notification to do that).
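For anyone running llama.cpp directly rather than through LM Studio, here is a rough sketch of the layer-offload and mmap tips above. The helper names (`layers_for_budget`, `llama_cpp_args`), the model path, and the example sizes are made up for illustration, and the estimate assumes all layers are the same size and ignores the KV cache and context buffers, so treat the result as a starting point to tune down from. The `-m`, `-ngl`, and `--no-mmap` flags are llama.cpp's options for model path, GPU layer count, and disabling mmap.

```python
def layers_for_budget(model_size_gb: float, n_layers: int, budget_gb: float) -> int:
    """Estimate how many layers fit in budget_gb.

    Assumption: every layer takes an equal share of the model file size.
    KV cache and context buffers are NOT counted, so real usage is higher.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    fits = int(budget_gb // per_layer_gb)
    # Clamp to the valid range: no fewer than 0, no more than all layers.
    return max(0, min(n_layers, fits))


def llama_cpp_args(model_path: str, gpu_layers: int, use_mmap: bool = True) -> list[str]:
    """Build an argument list for llama.cpp's llama-cli binary."""
    args = ["llama-cli", "-m", model_path, "-ngl", str(gpu_layers)]
    if not use_mmap:
        # Equivalent in spirit to turning off 'Try mmap()' in LM Studio.
        args.append("--no-mmap")
    return args


if __name__ == "__main__":
    # Hypothetical 80 GB model with 80 layers, kept under a 66 GB GPU budget.
    n = layers_for_budget(model_size_gb=80, n_layers=80, budget_gb=66)
    print(" ".join(llama_cpp_args("model.gguf", n, use_mmap=False)))
```

If the model still fails to load with the estimated layer count, lower it further; the estimate deliberately leaves out everything except the weights.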