This is a copy of several related posts from the old Rebolforum, about the Jan inference engine:
2026-03-19 23:07:39
I've really come to like the Jan AI app. It's a 54MB download, compared to 582MB for LM Studio and 1690MB (1.6GB) for Ollama, and it's even half the size of the smallest Koboldcpp build, the one without CUDA support (and Jan does support CUDA). Jan works really well, out of the box, for inference with local models, and it's super simple to connect to all the most common online LLM APIs. I use it with OpenRouter on a bunch of small netbooks. It takes literally just a few seconds to download the .exe and pop in your OpenRouter API key, and you've got an instant local chat interface to hundreds of LLM models of every type (OpenRouter provides access to basically every common commercial and open source model, the instant it's publicly available, all with a single API key). And if your local machine has a GPU, you can make full use of your hardware for local inference with any common model you can download - Jan has a 'hub' UI that lets you download all the most common OSS models right in the app (just like in LM Studio, but I actually prefer Jan's simple interface).
2026-03-22 19:43:30
I've gotta plug Jan a bit more. What a cool little app for local inference. It's improved tremendously in recent months, and I find myself using it more and more, over LM Studio, Ollama, Koboldcpp, etc.
It only takes a few seconds to install, and is the smallest download I've seen yet for any inference app.
Aside from the built-in hub for model downloads, Jan also includes a simple point and click UI to instantly import any existing .gguf files that are on your hard drive (so, if you've got a bunch of models already downloaded in LM Studio, for example, you can import and use them immediately without having to download again for use in Jan).
The interface for connecting with third-party APIs is especially simple - just pop in your API key. Do this with OpenRouter and you've got immediate local chat access to basically every commercial and open source model in common use (you can also use any API key you have set up with individual services such as OpenAI, Gemini, Grok, etc., but all of those are included in OpenRouter anyway).
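For the curious, there's nothing exotic about that connection - OpenRouter speaks the standard OpenAI-style chat completions protocol, so here's a rough Python sketch of the kind of request Jan is effectively sending once your key is in (the model slug is just an example; pick any from OpenRouter's list):

```
# Sketch of a chat request against OpenRouter's OpenAI-compatible API.
# The model slug below is just an example - OpenRouter lists hundreds.
import requests

OPENROUTER_KEY = "sk-or-..."  # your OpenRouter API key

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {OPENROUTER_KEY}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",  # example slug
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```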
I've started using Jan with OpenRouter to quickly test every new model, and to do actual chat work with multiple models, even on low power netbooks that I keep scattered at various geographical locations (different offices at home/work, in my camper, vehicles, girlfriend's place, etc.). It's a super slick, quick little app that works beautifully for this purpose.
Jan also wires up web search and tool calls automatically for models you've downloaded, so your local LLMs can do useful work right away. Wiring up web search in LM Studio is a pain, and that little addition turns small models, which can run on inexpensive consumer GPUs, into much more capable tools. Even some of the tiniest models (like the default Jan model) can perform effective research, and work with files and the command console on your machine, even if you only have a CPU. CPU-only inference runs at just a few tokens per second, but it's pretty freakin cool to see a netbook accomplishing actually useful intelligent AI work. I've run GPT-OSS:20b on a netbook with 8GB of RAM (around 5 tps), and the newest qwen3.5 models are looking even more capable, with even faster token rates.
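To give a sense of what that tool wiring looks like under the hood, here's a sketch of one tool-call round trip in the OpenAI-compatible chat format that Jan and most local servers speak. The web_search function, the canned result, the model name, and the localhost port are all made up for illustration - Jan handles this whole loop for you:

```
# One tool-call round trip, OpenAI-compatible style. Everything named
# here (tool, model, port, canned result) is hypothetical - Jan wires
# real tools up automatically so you never write this yourself.
import json
import requests

BASE = "http://localhost:1337/v1"  # assumed local server address

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool for illustration
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What's new in llama.cpp this week?"}]
reply = requests.post(f"{BASE}/chat/completions",
                      json={"model": "local-model", "messages": messages,
                            "tools": tools}, timeout=120).json()
msg = reply["choices"][0]["message"]

if msg.get("tool_calls"):
    call = msg["tool_calls"][0]
    args = json.loads(call["function"]["arguments"])
    # A real client would run the search here; we fake the result.
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call["id"],
                     "content": f"Example results for '{args['query']}'."})
    final = requests.post(f"{BASE}/chat/completions",
                          json={"model": "local-model", "messages": messages,
                                "tools": tools}, timeout=120).json()
    print(final["choices"][0]["message"]["content"])
```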
The newest version of Jan has Claude Code and Openclaw integrations built right in. You can also use Jan as an HTTP API server, use MCP servers (some come built into Jan, ready to use), build your own assistant definitions, etc.
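As a quick illustration of the HTTP server mode, here's a minimal streaming request against Jan's local endpoint. The port is the default from my install's local-server settings, and the model name is a placeholder - check your own setup:

```
# Streaming a reply from Jan's local OpenAI-compatible server.
# Port and model name are assumptions - check your Jan settings.
import json
import requests

with requests.post(
    "http://localhost:1337/v1/chat/completions",
    json={"model": "llama3.2-1b-instruct",  # placeholder model name
          "messages": [{"role": "user", "content": "Say hi in five words."}],
          "stream": True},
    stream=True,
    timeout=300,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if chunk["choices"]:
            delta = chunk["choices"][0]["delta"]
            print(delta.get("content") or "", end="", flush=True)
print()
```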
The newest killer feature of LM Studio is LM Link, which lets you load models on remote machines and use them as if they were local. This can be useful, for example, if you have one huge server machine with lots of expensive hardware that you aren't going to duplicate, and you want to instantly share that machine's inference capabilities between all your installations of LM Studio.
But if you're not going to use LM Link, Jan really covers the majority of bases. Besides being a much smaller installation than any other option, it's also very pleasant to use.
It's a complete, instant solution for on-device API access to hosted models - install it on an old PC, netbook, etc., pop in an OpenRouter API key, and you've got a local AI powerhouse ready to go in a minute, with access to every new bleeding-edge model the moment it comes out. Mix that with everything needed to download, import, and work with local models, and it does everything most users need for LLM inference.
2026-03-22 20:10:11
To be clear about Jan, if you hook up an LLM API such as OpenRouter, that connected inference runs at whatever speed the chosen hosted service provides - so you'll see the typical 50-100 tokens per second those APIs deliver, even if you're running Jan on a machine that only has a CPU (not the few tokens per second you'd expect from local model inference on a CPU). And if you have a local GPU, you'll get whatever inference speed that hardware can deliver for any local models you choose to download and run.
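If you want to put numbers on that difference yourself, a rough throughput check is easy against any OpenAI-compatible endpoint, hosted or local. This sketch just times one request and divides by the completion token count reported in the response - endpoint, key, and model are placeholders, and the figure is rough because the total time includes network latency and prompt processing:

```
# Rough tokens-per-second check against any OpenAI-compatible endpoint.
# Endpoint, key, and model below are placeholders - point it at
# OpenRouter or at a local server and compare the numbers.
import time
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
KEY = "sk-or-..."

start = time.time()
resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {KEY}"},
    json={"model": "meta-llama/llama-3.1-8b-instruct",
          "messages": [{"role": "user",
                        "content": "Write a short paragraph about netbooks."}]},
    timeout=300,
)
elapsed = time.time() - start
tokens = resp.json()["usage"]["completion_tokens"]
# Rough figure: elapsed includes latency and prompt processing time.
print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / elapsed:.1f} tok/s")
```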
2026-03-25 18:39:15
When it comes to versions of Linux that can be run from a live USB drive on little netbooks that aren't ancient, my favorite is Q4OS. It's dead simple to set up, has all the features I need in a typical little daily-driver machine, runs quickly from a small drive, and feels very familiar to anyone used to Windows.
Today I tried running Jan on it. It took less than 2 minutes to set up, and ran perfectly :) What a cool way to use a cheap old machine, without even a hard drive, as a mean little AI monster.