Post History

Current version by swampstream

Current Version: Apr 21, 2026 at 09:59

I find myself idealizing TPS (tokens per second) and gear.

And then on the other hand looking towards clusters to run bigger models.

And then I run into clusters just running different MoEs / experts on different PCs, having them talk to each other and exchange things. These seem to be solving problems quicker than big singular models, even more so than when a single big model is spread across a cluster.

https://x.com/AustinBaggio/status/2039043909142065426?s=20

It's a bit of a head spin at the moment, as you can all imagine. It's multivariate, maybe non-linear, I don't know. Bit beyond me.

So it got me thinking: what's the first-principles analysis for this whole AI thing we're in?

The question reminded me of exergy. Exergy is a concept I used to study; it describes the useful energy in a system. Roughly: the amount of energy produced by a product/service (say, an oil well or a solar cell) relative to the energy put in to make it. Joules produced / Joules in. There are different ways to spin the exergy concept in the energy field, but it got me wondering about AI performance criteria.

What are we performing, what is the goal? Get an answer to a problem, in the quickest time? And, ideally, with the least power consumed and money spent?

(Tokens to get to answer)/(time to get to answer)

So just to think it through a little...

- Big smart model, fast hardware: 100 tokens in 1 sec = 100
- Small model, fast hardware: 300 tokens in 5 sec = 60
- Cluster of small models, different MoEs, slower hardware: 500 tokens in 10 sec = 50
- Cluster of one big smart model, slower hardware: 100 tokens in 20 sec = 5
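To make the throughput metric concrete, here's a minimal sketch. The numbers are the illustrative figures from the list above, not real benchmarks, and the setup names are just labels.

```python
# Throughput score for each setup: tokens to reach an answer / seconds taken.
# These are the post's illustrative numbers, not measured benchmarks.
setups = {
    "big smart model, fast hardware": (100, 1),
    "small model, fast hardware": (300, 5),
    "cluster of small MoEs, slower hardware": (500, 10),
    "one big model spread over a slower cluster": (100, 20),
}

def throughput(tokens, seconds):
    """Tokens generated per second while solving the problem."""
    return tokens / seconds

for name, (tokens, seconds) in setups.items():
    print(f"{name}: {throughput(tokens, seconds):.0f} tok/s")
```

Higher is better here, so the big model on fast hardware wins on raw throughput alone.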

I guess this says something about the efficiency of getting to an answer. More oriented toward judging a system's efficiency?

And then...

(Cost of hardware + power) divided by ((Tokens to get to answer)/(time to get to answer))

- Big smart model, fast: 5000 / 100 = 50
- Small, fast: 5000 / 60 = 83
- Cluster of small models, different MoEs, cheap: 3000 / 50 = 60
- Cluster of one big smart model, cheap: 3000 / 5 = 600
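The cost-adjusted version can be sketched the same way. Hardware prices here are the placeholder figures from the list above (power cost left out, as in the post), so lower means more answer throughput per dollar.

```python
# Cost-effectiveness score: hardware cost divided by throughput (tokens/sec).
# Lower is better. Prices and token counts are the post's placeholder
# figures, not real data; power cost is omitted for now.
def cost_per_throughput(hardware_cost, tokens, seconds):
    """Dollars of hardware per (token/sec) of answer throughput."""
    return hardware_cost / (tokens / seconds)

scenarios = [
    ("big smart model, fast hardware", 5000, 100, 1),
    ("small model, fast hardware", 5000, 300, 5),
    ("cluster of small MoEs, cheap hardware", 3000, 500, 10),
    ("one big model over a cheap cluster", 3000, 100, 20),
]

for name, cost, tokens, seconds in scenarios:
    print(f"{name}: {cost_per_throughput(cost, tokens, seconds):.0f}")
```

By this measure the big fast setup still comes out ahead, and the big model spread over a cheap cluster looks worst by a wide margin.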

Hmm, it sort of corrects for hardware expense; I left out power costs for now. A cost-of-token effectiveness for a system?

It's a bit late and I need a rest. But maybe someone can pitch in; perhaps it's useful to think about for normies like me who like to investigate the parameters of performance before spending hard-earned cash.
