Comparing Strix Halo vs DGX Spark

Nick Antonaccio (Admin)
May 09, 2026 at 12:16 (edited, 3 revisions)
#1

TL;DR: Apple Mac laptops cost at least twice as much as Strix Halo laptops for the same 128GB of shared memory, so the price/performance of the ASUS ROG Flow Z13 is hard to beat. But if you're building a stationary self-hosted GPU server for long-context agentic workflows, the prefill (prompt processing) speed of the Nvidia GB10 DGX Spark machines, the full CUDA stack, and the ability to cluster multiple machines easily are absolutely worth the extra $1000 (for my needs).

This week, I bought an ASUS Ascent GX10, which has the exact same NVIDIA GB10 chip as the Nvidia DGX Spark (with 128GB of LPDDR5x). I chose the 1TB PCIe Gen4 NVMe SSD model, which cost $3,498.99 on Amazon. That's about $1000 more than my ASUS ROG Flow Z13 (Strix Halo) laptops.

I've been testing it extensively, and will soon post lots of specific numbers for all the different models and quantizations that work best with it. For the moment, the TL;DR is that it edges out the Strix Halo in most performance tests, but not by a tremendous margin for token generation.

The main benefits out of the box, compared to the Strix Halo machines, are:

  • By far the biggest difference is that prefill computation is dramatically faster on the GB10 than on the Strix Halo, so processing input prompts before inference output begins moves noticeably quicker (it's blindingly fast on the GX10). You'll see a big performance improvement when computing huge prompt contexts, which can grow gigantic during long agentic sessions, and prefill speed matters even more when you run an agent harness that has a large default prompt. This is true despite the fact that Strix Halo is reported to be about 50% faster at prompt processing than the M4 Max (those Macs must really suffer when processing very large agentic contexts, large input documents, etc.).

  • You can run slightly bigger quants at the same speed, and get slightly better inference output speed for the same size model, on the DGX Spark architecture. As a rough ballpark, a Q4 or Q5 quant of a 100B model on the Strix Halo runs at approximately the same speed as a Q6 quant of the same model on the DGX Spark (see the rough memory math after this list). This can push some of the 100GB+ class models from too sluggish to acceptably usable.

  • Most models run well on Strix Halo, but certain video, image, and other specialized model categories are optimized for the Nvidia CUDA stack, and some require CUDA outright. This isn't the case for most Transformer-based LLMs, but if CUDA is required for your intended use case, the DGX Spark is the winner (Vulkan and ROCm runtimes continue to improve for the Strix Halo, so be aware that reviews of Strix Halo usability from last year may be outdated).

  • The DGX Spark platform includes two QSFP ports driven by NVIDIA ConnectX-7 network interfaces. This is a killer built-in feature, and one of the reasons I'm starting to invest in this Nvidia hardware platform. Just buy an $80 QSFP cable and you can cluster two DGX Spark machines to run models that are twice the size. That pushes the DGX Spark platform into a much higher echelon of potential usefulness, without much technical trouble. With a bunch of clustered DGX Spark machines, you can even run trillion-parameter-class models like Kimi, GLM, Deepseek, etc.
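
To put rough numbers on the quant-size point above, here's a minimal sketch of the memory math. The bits-per-weight figures are assumed typical averages for llama.cpp-style quants, not exact values, and real GGUF files add some overhead for embeddings and metadata, so treat these as ballpark numbers:

```python
# Rough GGUF quant memory estimate: params (billions) x bits-per-weight / 8.
# Bits-per-weight values are assumed typical effective averages for
# llama.cpp-style quants; actual files vary by a few percent.

QUANT_BITS = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def weights_gb(params_b: float, quant: str) -> float:
    """Approximate weight memory in GB for a model of params_b billion params."""
    return params_b * QUANT_BITS[quant] / 8

if __name__ == "__main__":
    for quant in QUANT_BITS:
        print(f"100B model at {quant}: ~{weights_gb(100, quant):.0f} GB weights")
    # Both machines have 128 GB of shared memory, but the OS and KV cache
    # need room too, so ~Q6 of a 100B model is near the practical ceiling.
```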

All this said, there are still some great benefits to the ASUS ROG Flow Z13 laptop, which I absolutely love. First, it's a portable laptop, which means not only can you use it for inference while traveling or away from Internet connectivity, you also get the monitor and everything else needed to run, for about $1000 less than the Ascent GX10 ($2,572.59 vs $3,498.99).

Another really big difference to be keenly aware of is that the Ascent GX10 runs on an Arm CPU and comes with an Nvidia-modified version of Ubuntu. So far, I haven't had any problems getting all the software I need running on the Arm CPU, but there are definitely some limitations. Chrome, for example, is not yet available for Ubuntu on Arm. I was able to install Chromium and copy my bookmarks over, so that hasn't been a problem. I'm waiting to see how well Nvidia supports this OS in the future - I've read that they've had some issues providing ongoing support for some of their previous OSs.

I love Ubuntu and Linux in general, and all the software I want to use (including LM Studio, Rustdesk for desktop sharing instead of Chrome Remote Desktop, etc.) has worked fine on the GX10, so this isn't an issue for me. But if you require anything in the Microsoft ecosystem, or if you need the x86-64 platform for any particular purpose, that requirement may make the whole DGX Spark platform a no-go for your needs.

One really surprising exception among all the software I've successfully installed on the GX10 has been the Pi coding agent (which I prefer to all other agents at the moment). For some reason I haven't worked out yet, Pi can't properly access its read, write, and other tools on the GX10. I've built some workarounds, and I've also built software using self-hosted models on the GX10 with Nanobot, which I also like as an agent harness (I'll test Hermes and others too...). But this is a caveat that may point to other potential software issues with the Arm CPU platform of the GX10. Stay tuned for more testing... (UPDATE: I've since experienced this same issue with the Strix Halo, so it isn't Arm-specific - it's easily fixable with a brief instruction in the agents.md file.)

Finally, many of the out-of-the-box advantages of the GX10 may be erased by upcoming improvements in the whole ROCm/Vulkan/llama.cpp ecosystem. AMD is a big company, and there's massive competition in China to eliminate reliance on the Nvidia CUDA stack. In just the last year, Strix Halo has gone from problematic driver and framework support/performance to being very effective, without any changes to the hardware platform. As the software improves, the Strix Halo platform just keeps becoming a more solid alternative to Nvidia hardware.

Along these same lines, Strix Halo currently performs marginally better on Linux, so simply installing Ubuntu in place of the default Windows 11 that ships on the ASUS ROG Flow Z13 may close some of the LLM inference performance gap between it and the GX10 (though such improvements probably won't affect prefill performance much).

Also, the DGX Spark platform comes with super fast networking built in, so clustering is a first-class use case. But for the $1000 extra cost per machine, you could buy fast network hardware for the Strix Halo, so that might be a wash in terms of price/performance.

Finally, the power demands of the GX10 and the ROG Flow Z13 differ. The GX10 comes with a 240W power supply designed for sustained AI workloads, while the ROG Flow Z13 typically operates in a 100W-200W range, depending on whether it's running portable or desktop-docked (both are powered through USB-C inputs). Neither unit uses a ton of power, but when clustering many machines (to run Kimi, Deepseek, GLM, etc.), the GX10's higher draw may be just enough that you'd need to upgrade a home electrical circuit.
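
As a quick sanity check on that circuit question, here's the back-of-envelope arithmetic, assuming a standard US 120V/15A branch circuit derated to 80% for continuous load (the wattages are the rated maximums above, not measured draw):

```python
# Back-of-envelope circuit loading for a cluster. Assumes a 120 V / 15 A US
# branch circuit with the usual 80% continuous-load derating; actual draw
# under inference load will be lower than the rated maximums used here.
CIRCUIT_WATTS = 120 * 15 * 0.8  # = 1440 W usable for continuous load

for name, watts in [("GX10 (240 W PSU)", 240), ("Z13 (~200 W max)", 200)]:
    n = int(CIRCUIT_WATTS // watts)
    print(f"{name}: ~{n} machines per circuit")
# GX10: ~6 machines, Z13: ~7, so it's really only a large cluster where
# the GX10's higher ceiling could force a circuit upgrade.
```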

There's a lot more to test, and clustering will be one of my biggest goals this year, especially since Apple has said we shouldn't expect any high-RAM models for many months (which has pushed prices on even used M3 Ultra Studio machines with 512GB RAM into the $25,000-$30,000 range).

The big takeaway is that both these platforms are useful. The DGX wins for large context pre-processing (long agentic tasks, processing large input documents, etc.).

I think the bigger thing to pay attention to, though, is improvements in LLM software. Just in the last few weeks, new open source LLMs have dramatically increased the usefulness of machines in these existing hardware classes - and I fully expect that trend to continue. Already, the newest Qwen3.6 MOE and dense models absolutely crush previous model options on small-to-medium sized GPU hardware, for coding at least. I'm even using the Qwen3.6 MOE models to write useful production code on a much less expensive laptop with an RTX 3080, which has only 16GB VRAM (and I've seen people use that model successfully on even smaller GPUs, with caveats). The Gemma 4 models suggest that the trend of small models becoming significantly more powerful should be expected to continue. If research such as Turboquant, 1-bit model architectures, and other small-hardware approaches is successful, we could see capabilities like those of current frontier models coming from much smaller machines within the next year.

So get the GX10 or any DGX Spark variant if you need CUDA support, or if prefill speed is critical for agentic work and large input document processing - that Nvidia prefill speed really does make a big difference in performance if using an agent is your primary use case. Stay away from the DGX Spark if anything in your stack requires x86-64 or Windows. The portability, low power draw, and inference output speed of the ASUS ROG Flow Z13 are its main benefits, along with its lower cost and better availability. That laptop is a great machine for running significant inference loads on battery power, away from the Internet. We'll see how both platforms cluster, soon.

Nick Antonaccio (Admin)
May 09, 2026 at 14:06 (edited, 3 revisions)
#2

Using LM Studio on the GX10 has really nudged me to use the LM Link feature, which allows it to be accessed as a secure API server from any remote machine that also has LM Studio installed (without having to forward any ports on your router). LM Link lets me use all the models on my GX10, with inference processed by the GX10's GPU, and results delivered on any other machine that has LM Studio installed.

To be clear: you can run Pi, for example, on several remote machines, and each of those Pi instances connects to the API server of the LM Studio instance installed on that individual machine. The model list shown in each machine's LM Studio instance (connected to your LM Link account) includes all the models on the GX10 - those models appear as if they're installed directly in the local instance of LM Studio, so Pi connects to the API served by the local LM Studio instance and the inference runs on the GX10. That's pretty slick.
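
For anyone wiring this up themselves: LM Studio's local server exposes an OpenAI-compatible API (http://localhost:1234/v1 by default), so a minimal client looks like the sketch below. The model id is a placeholder - use whatever appears in your model list, including the LM Link-shared models hosted on the GX10:

```python
# Minimal client against the local LM Studio server (OpenAI-compatible API,
# default http://localhost:1234/v1). With LM Link, models hosted on the
# remote GX10 appear in this machine's model list like any local model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# List every model the local instance can serve (local and LM Link-shared).
for model in client.models.list():
    print(model.id)

# "your-model-id" is a placeholder - use any id printed above. If it's a
# GX10-hosted model, the inference itself runs on the GX10's GPU.
resp = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Say hello from the GX10."}],
)
print(resp.choices[0].message.content)
```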

UPDATE: I've been running LM Link with one of the Strix Halo machines acting as server, and that's also working reliably.

Nick Antonaccio (Admin)
May 09, 2026 at 14:11 (edited, 1 revision)
#3

The prefill (prompt processing) speed is dramatically better on the GX10 compared to the Strix Halo. This can have a significant impact on the overall performance of agentic workflows, especially the time required to start a new session. It doesn't make as much of a difference in chat, or for incrementally processed prompts in a longer session, but when you're dealing with agents that send huge default prompt contexts, or when you're processing large input documents, the GX10 gets cranking much faster. Pi's smaller default prompt contexts really help this performance on the Strix Halo.
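
If you want to measure this yourself, time-to-first-token on a streamed request is a usable proxy for prefill speed, since it's dominated by prompt processing when the prompt is large. Here's a minimal sketch against a local LM Studio endpoint (the model id is a placeholder):

```python
# Rough prefill benchmark: time-to-first-token on a streamed completion is
# mostly prompt-processing time when the prompt is large. The endpoint and
# model id assume a local LM Studio server; adjust for your setup.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

prompt = "word " * 8000  # big synthetic context so prefill dominates

start = time.perf_counter()
stream = client.chat.completions.create(
    model="your-model-id",  # placeholder: any model in your list
    messages=[{"role": "user", "content": prompt + "\nSummarize in one line."}],
    stream=True,
    max_tokens=32,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"time to first token: {time.perf_counter() - start:.2f}s")
        break
```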

