Running LLMs Locally with LM Studio and Jan
Running large language models (LLMs) locally offers distinct advantages over cloud-based alternatives, primarily in terms of privacy, cost, and latency.
By processing data directly on your own hardware, you gain complete control over sensitive information, eliminating concerns about data exposure to third-party servers.
Local LLMs do have limitations: they often lack the sophistication of leading models like ChatGPT, Gemini, or Claude Sonnet, and they benefit from powerful machines. But not every task demands the most advanced LLM, and even consumer-grade laptops from a few years ago can now run smaller local models adequately.
This short article introduces two popular desktop apps to run LLMs locally: LM Studio and Jan.
The Polished Studio
LM Studio is a popular choice among local LLM enthusiasts thanks to its polished, user-friendly interface, which appeals to all experience levels. It supports macOS, Windows, and Linux, works with NVIDIA, AMD, and Intel GPUs, and can even run models on the CPU alone (though with reduced performance).
Getting started is easy: install the software and pick a model to download. LM Studio offers a wizard-like interface to guide you, but manual selection is also straightforward. New users should begin with smaller models such as Llama-3.2 3B, Qwen-3 4B, or Gemma-3 4B.
LLMs are packaged as model weights in specific formats (GGUF or MLX), depending on your hardware and inference engine. These weights are typically quantized, a process that significantly shrinks them with minimal loss of quality. Once a model is loaded, you can chat with it immediately, much as you would with ChatGPT, Gemini, or Claude.
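To get a feel for how much quantization helps, here is a rough back-of-the-envelope calculation for a 4B-parameter model. The bits-per-weight figures are approximations (real GGUF files mix quantization levels across layers and add metadata), so actual downloads will differ somewhat:

```python
# Approximate size of a 4B-parameter model at different precisions.
# Rough estimates only: real GGUF files mix quantization levels and
# include metadata, so actual file sizes will vary.

PARAMS = 4e9  # a Qwen-3 4B / Gemma-3 4B class model

def approx_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk (and in-memory) size in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for label, bits in [("FP16 (unquantized)", 16), ("Q8", 8), ("Q4 (common default)", 4.5)]:
    print(f"{label:22s} ~{approx_size_gb(PARAMS, bits):.1f} GB")
# Prints roughly 8.0, 4.0, and 2.2 GB respectively.
```

A 4-bit model of around 2 GB fits comfortably in the memory of most modern laptops, which is why the smaller models mentioned above are a good starting point.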
Additionally, LM Studio supports MCP (Model Context Protocol), giving you access to thousands of MCP servers. These enhance chat sessions with tools for tasks like reading the file system, checking GitHub issues, or even looking up weather forecasts.
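If you'd rather script against a model than chat in the app, LM Studio can also expose it through a local, OpenAI-compatible server (enabled from within the app, typically at http://localhost:1234/v1). A minimal sketch, assuming that default address and an example model identifier; match it to whatever model you actually loaded:

```python
# Minimal sketch: chatting with a model served by LM Studio's local server.
# Assumes the server is running (typically http://localhost:1234/v1) and the
# `openai` package is installed (pip install openai). The model name is an
# example -- use the identifier LM Studio shows for your loaded model.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's OpenAI-compatible endpoint
    api_key="lm-studio",                  # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="qwen3-4b",  # example identifier
    messages=[{"role": "user", "content": "Explain GGUF quantization in two sentences."}],
)
print(response.choices[0].message.content)
```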
Jan: A Flexible Open-Source Alternative
For those interested in understanding the underlying mechanisms or extending the tool themselves, Jan presents a compelling open-source alternative to LM Studio. Its code is readily available on GitHub (github.com/menloresearch/jan).
The process for using Jan mirrors that of LM Studio and other local LLM desktop applications: install, launch, and select a model. Given that most of these applications share the same inference engines (MLX for Apple Silicon, llama.cpp with CUDA/Vulkan for others), they all access the same model repositories. Therefore, choosing Jan doesn't limit your model options.
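Concretely, the weights these apps load are ordinary files, most commonly GGUF models hosted on Hugging Face, and any compatible tool can fetch them. Here is a minimal sketch using the huggingface_hub package; the repository and file names are illustrative examples, so browse Hugging Face for current ones:

```python
# Minimal sketch: downloading GGUF weights that LM Studio, Jan, or any other
# llama.cpp-based app can load. The repo and file names are illustrative
# examples -- substitute the model you actually want.
# Requires: pip install huggingface_hub

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",  # example GGUF repository
    filename="Llama-3.2-3B-Instruct-Q4_K_M.gguf",    # example 4-bit quantization
)
print(f"Downloaded to: {path}")
```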
While Jan's user interface may not be as polished as its proprietary counterparts, it remains intuitive and also supports MCP, offering a helpful dialog for server configuration.
Unlike LM Studio, Jan handles both running LLMs locally and connecting to paid LLM providers such as OpenAI, Hugging Face, Anthropic, OpenRouter, and Gemini (supplying API keys where required). This lets you centralize your chats in Jan and switch seamlessly between local and remote models.
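Part of what makes this switching painless is that Jan's local server and many remote providers speak the same OpenAI-style API, so only the endpoint, key, and model name change. A rough sketch of that pattern follows; the local port is an assumed default from Jan's settings, and the model identifiers are examples:

```python
# Sketch: the same client pattern works for a local Jan server and a remote
# provider such as OpenRouter -- only the base URL, API key, and model change.
# The local port (1337) is an assumed default; check Jan's server settings.

import os
from openai import OpenAI

local = OpenAI(base_url="http://localhost:1337/v1", api_key="jan")  # key often not enforced locally
remote = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

for name, client, model in [("local", local, "llama3.2-3b"),            # example local model id
                            ("remote", remote, "openai/gpt-4o-mini")]:   # example OpenRouter model id
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(name, "->", reply.choices[0].message.content)
```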
Hardware Requirements
Running LLM inference locally at a satisfactory speed requires capable hardware. Apple Silicon, even older chips like the M1, proves quite capable thanks to its unified memory; a 2021 MacBook with 16GB of RAM or more, for instance, is enough to start learning about local LLMs. Consumer GPUs from NVIDIA, particularly recent RTX-series cards, offer a good balance of speed and cost, and older NVIDIA GPUs are affordable as well, though a desktop GPU is neither as power-efficient nor as portable as a MacBook.
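A useful rule of thumb for why memory matters so much: generating each new token requires streaming roughly the entire set of model weights through memory, so generation speed is bounded by memory bandwidth divided by model size. The bandwidth figures below are approximate published specs, and the results ignore compute, context length, and other overheads, so treat them as ballpark ceilings rather than benchmarks:

```python
# Ballpark decode-speed ceilings: tokens/s <= memory bandwidth / model size.
# Bandwidth numbers are approximate published specs; real-world throughput
# will be lower once compute, KV cache, and software overhead are included.

MODEL_GB = 2.2  # ~4B parameters at 4-bit quantization (see the earlier estimate)

bandwidth_gb_per_s = {
    "Apple M1 (unified memory)": 68,
    "Apple M1 Max": 400,
    "NVIDIA RTX 3060 12GB": 360,
    "NVIDIA RTX 4090": 1008,
}

for name, bw in bandwidth_gb_per_s.items():
    print(f"{name:27s} ~{bw / MODEL_GB:4.0f} tokens/s (theoretical ceiling)")
```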
Your ideal hardware setup for running local LLMs depends on factors like your intended use, budget, and need for portability. For example, a local LLM used for coding assistance demands fast prompt processing, an area where Apple Silicon lags. AMD GPUs are a more budget-friendly option than NVIDIA's, but they may not match the performance or software maturity of NVIDIA's CUDA ecosystem.
That said, even with a budget of $300 to $1,000 you can assemble a capable rig for enjoying local LLMs. Future posts will explore specific hardware configurations in more detail and help you find the setup that suits you best, perhaps even a favorite.