Private chat with AI models on your network

Everything the server can do

A full chat surface,
nothing held back.

Whatever your model supports, LLMster renders it natively — with the same care Apple gives its own apps.

Streaming responses

Answers appear as the model generates them, formatted as you'd expect — paragraphs, lists and syntax-highlighted code — with no wait for the full reply.

Visible reasoning

For models that think before answering, the reasoning is kept in its own collapsible section — expand it to follow the logic, or leave it tucked away.

Models that take action

Connect your tools and the model can actually do things — look something up, query a database, file a ticket — and show its work as it goes.

Ask about what you see

Hand the model a screenshot, a photo, or a page of handwriting and just ask. The answer comes straight back in the same conversation.

Local-first, by architecture

Your conversations live on your server. Zero telemetry by design.

Nothing leaves the network

LLMster talks only to the LM Studio server you point it at. No telemetry, no accounts, no third-party cloud in the path.

No account, no copies

There's no LLMster sign-in and no copy of your conversations sitting on our servers. Your history stays between your device and the server you chose.

On-device speech

Voice chat uses on-device speech recognition. Talk to a model that's running on the machine in the next room — and nowhere else.

Headless-ready

No desktop app
required.

Keep your models on the machine that can handle them — a server in the closet or a workstation that never sleeps — and reach them from your phone. Run LM Studio, or just the standalone llmster daemon. Point the app at the host and start chatting.

Runs as a service Any host on your network

# Install the headless daemon — no GUI needed $ curl -fsSL https://lmstudio.ai/install.sh | bash $ lms get gemma-4-e4b $ lms server start ✓ Server listening on 0.0.0.0:1234 # Open LLMster → enter the host → connect

Done Choose Model

Sort & Filters

SortGB Size

FormatAll Formats

CapabilityAll Capabilities

Gemma 4 31B

19.9 GB GGUF

Qwen3 VL 30B

18.3 GB MLX

Olmo 3 32B Think

18.1 GB MLX

Mistral Small 24B

13.4 GB GGUF

Phi-4 14B

8.4 GB GGUF

Search Models

Model picker

The right model
for the moment.

A fast 4B for quick questions, a 30B that reasons, a vision model for screenshots — switch between them in a tap, without ever leaving the conversation.

Find it fast — search or sort by size so the model you want is one tap away, even with dozens installed.
Know before you load — see at a glance which models can see images, reason, or call your tools.
Swap mid-chat — change models without losing the thread; pick up right where you left off.
No surprises — watch a model load in real time so you know the second it's ready.

Vision models

Show it what
you mean.

Attach an image and ask. LLMster sends it to any vision-capable model on your server and renders the answer right in the thread — pictures are downscaled and kept on your device.

Up to four images — drop in screenshots, photos or scanned pages per message.
Zoom right in — open any image full-screen to read the small print or check a detail before you ask.
Never leaves your device — images are resized on your phone, so the originals stay with you.
Any vision model — routed to whichever vision-capable model you've loaded.

Dialog Mode

Talk to your model.
Out loud.

Start a hands-free conversation and trade turns with your local model in real time. LLMster listens with on-device speech recognition and speaks the reply back with on-device text-to-speech — the whole exchange stays on your hardware.

On-device STT — your voice is transcribed locally and sent as text. No audio leaves the device.
On-device TTS — replies are spoken back with the system voice, no cloud speech service in the path.
Natural turn-taking — it listens, you speak, it answers, then waits for you again — fully hands-free.
Haptic cues — a subtle tap when your turn is sent and when the reply lands.

5:48

Gemma 4 E4B

Listening…

On-device speech · real-time

Per-chat control

Tune every
conversation.

Every chat carries its own settings. Switch integrations on for the task at hand, give the model a role, and decide exactly how much of its work you want to see.

MCP integrations — flip servers like Linear, Notion or Figma on per chat.
Custom system prompt — set the model's role; it's sent with every later message.
Show or hide tool calls — see exactly what the model did, or keep the conversation clean and conclusions-only.
Reading, your way — hide the model's thinking and keep the view steady while it writes, so nothing jumps around as you read.

Per-chat model settings: render tool calls and system prompt

Live Activity in the Dynamic Island showing the model thinking

Lock-screen notification with the model's reply

LM Studio Response

Receiving response…

Notifications & Live Activities

Know the moment
it's done.

Kick off a long reply, then go do something else. LLMster tracks it from the Dynamic Island and taps you on the shoulder the second your model finishes — fired right on your phone, with no push servers in between.

Local-only alerts — notifications come from your phone, not through a push cloud that sees your chats.
Live in the Dynamic Island — glance up to see it thinking, then receiving, without opening the app.
Lands on your lock screen — the finished reply is waiting for you, ready to read at a tap.

The native chat client for local models.

A full chat surface,
nothing held back.

Streaming responses

Visible reasoning

Models that take action

Ask about what you see

Your conversations live on your server. Zero telemetry by design.

Nothing leaves the network

No account, no copies

On-device speech

No desktop app
required.

The right model
for the moment.

Show it what
you mean.

Talk to your model.
Out loud.

Tune every
conversation.

Know the moment
it's done.

Bring your models
to your pocket.

A full chat surface,nothing held back.

Streaming responses

Visible reasoning

Models that take action

Ask about what you see

Your conversations live on your server. Zero telemetry by design.

Nothing leaves the network

No account, no copies

On-device speech

No desktop apprequired.

The right modelfor the moment.

Show it whatyou mean.

Talk to your model.Out loud.

Tune everyconversation.

Know the momentit's done.

Bring your modelsto your pocket.

A full chat surface,
nothing held back.

No desktop app
required.

The right model
for the moment.

Show it what
you mean.

Talk to your model.
Out loud.

Tune every
conversation.

Know the moment
it's done.

Bring your models
to your pocket.