https://sleepingrobots.com/dreams/stop-using-ollama/
“Ollama wrapped that work in a nice CLI, raised VC money on the back of it, spent over a year refusing to credit it, forked it badly, shipped a closed-source app alongside it, and then pivoted the whole thing toward cloud services. At every decision point where they could have been good open-source citizens, they chose the path that made them look more self-sufficient to investors.”
总之就是开源小偷,还尝试锁死用户
原文建议用这些:
llama.cpp is the engine. It has an OpenAI-compatible API server (llama-server), a built-in web UI, full control over context windows and sampling parameters, and consistently better throughput than Ollama. In February 2026, Gerganov’s ggml.ai joined Hugging Face to ensure the long-term sustainability of the project. It’s truly community-driven, MIT-licensed, and under active development with 450+ contributors.
llama-swap handles multi-model orchestration, loading, unloading, and hot-swapping models on demand behind a single API endpoint. Pair it with LiteLLM and you get a unified OpenAI-compatible proxy that routes across multiple backends with proper model aliasing.
LM Studio gives you a GUI if that’s what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in. Jan is another open-source desktop app with a clean chat interface and local-first design. Msty offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.
Red Hat’s ramalama is worth a look too, a container-native model runner that explicitly credits its upstream dependencies front and center. Exactly what Ollama should have done from the start.

