LLM Model Groups
How Cremind splits work between an expensive "high" reasoning model and a cheap "low" per-tool model to lower token cost.
Cremind does not call one model for everything. It sorts the providers and models you configure into two model groups — high and low — and routes work to whichever group fits. The expensive high model does the thinking; the cheap low model does the busywork of individual tool calls. The payoff is lower token cost without dumbing down the reasoning.
This page explains the two groups, how a model is resolved at call time, and how to configure it.
The two groups
| Group | Used for | Typical choice |
|---|---|---|
| high | Reasoning and planning: the main agent loop | A strong, more expensive model |
| low | Individual tool calls | A cheaper, faster model |
The reasoning loop runs on the high model: it decides what to do, reads tool results, and plans the next step. But when the agent fires off an individual tool call — reading a file, driving a browser, searching docs — that work is dispatched to the low model. Most tool calls are mechanical and don't need a frontier reasoner, so paying frontier prices for them is waste. Splitting the two means you pay the high rate only where it matters.
Why this lowers cost
Tool calls vastly outnumber reasoning turns in a typical task. Sending each of those to a cheaper model — while keeping the planning brain expensive — gives you lower overall token cost without trading away plan quality.
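To make the arithmetic concrete, here is a rough sketch of the comparison. Every number in it is an assumption chosen for illustration: the prices and token counts are hypothetical, not Cremind measurements or any provider's real rates, and the `Workload` and `task_cost` names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Token volume for one task, split by who handles it (hypothetical)."""
    reasoning_tokens: int  # tokens processed by the planning loop
    tool_tokens: int       # tokens processed while carrying out tool calls


def task_cost(work: Workload, reasoning_price: float, tool_price: float) -> float:
    """Cost in dollars, given per-million-token prices for each kind of work."""
    total = work.reasoning_tokens * reasoning_price + work.tool_tokens * tool_price
    return total / 1_000_000


# Illustrative values only: not real prices or measured token counts.
HIGH_PRICE = 10.0  # dollars per million tokens, hypothetical
LOW_PRICE = 1.0    # dollars per million tokens, hypothetical
work = Workload(reasoning_tokens=200_000, tool_tokens=800_000)

single_model = task_cost(work, HIGH_PRICE, HIGH_PRICE)  # everything on the high model
split_groups = task_cost(work, HIGH_PRICE, LOW_PRICE)   # tool work on the low model

print(f"single model: ${single_model:.2f}")
print(f"split groups: ${split_groups:.2f}")
```

Under these assumed numbers the planning tokens cost the same in both setups; the saving comes entirely from the tool-call tokens. The actual ratio depends on your workload and the models you pick.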
Per-tool defaults
A model group isn't only chosen by the loop — each built-in tool can declare which group it prefers. Every built-in tool's TOOL_CONFIG carries a default_model_group key. For example, the browser tool declares "low", because driving a page doesn't need the reasoning model.
So a tool call resolves its model in this order:
- Tool-level override: an explicit `llm_model` set for that specific tool (stored in SQLite per profile).
- The tool's `default_model_group`, taken from its built-in `TOOL_CONFIG`.
- The `low` group: the ultimate fallback if neither of the above is set.
This is exactly what ModelGroupManager.create_llm_for_tool() does in app/lib/llm/model_groups.py.
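The resolution order above can be sketched as a small standalone function. This is not the body of `create_llm_for_tool()`: the `resolve_for_tool` name, the example tool configs (including the `planner` tool), and the assumption that the override names a model directly are all hypothetical. Only the `default_model_group` key and the three-step order come from this page.

```python
from typing import Optional

# Hypothetical tool configs; only the default_model_group key is documented.
BROWSER_TOOL_CONFIG = {"name": "browser", "default_model_group": "low"}
PLANNER_TOOL_CONFIG = {"name": "planner", "default_model_group": "high"}
LEGACY_TOOL_CONFIG = {"name": "legacy"}  # declares no preference


def resolve_for_tool(
    tool_config: dict, llm_model_override: Optional[str] = None
) -> tuple[str, str]:
    """Return ("model", value) for an explicit override, else ("group", name)."""
    # 1. A tool-level override wins over everything else.
    if llm_model_override:
        return ("model", llm_model_override)
    # 2. Otherwise use the group the tool declares for itself.
    group = tool_config.get("default_model_group")
    if group:
        return ("group", group)
    # 3. Fall back to the low group when nothing is set.
    return ("group", "low")


print(resolve_for_tool(BROWSER_TOOL_CONFIG))
print(resolve_for_tool(PLANNER_TOOL_CONFIG, "groq/openai/gpt-oss-120b"))
print(resolve_for_tool(LEGACY_TOOL_CONFIG))
```

Returning a tagged pair keeps the two cases apart: a caller would still have to turn a group name into a provider and model, while an override is assumed to be usable as is.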
How a group resolves to a model
Each group resolves to a concrete provider/model pair. The group value is a string whose first segment before / is the provider and whose remainder is the model identifier. For example:

```text
groq/openai/gpt-oss-120b
└──┘ └─────────────────┘
provider   model
```

Resolution follows a clear priority: SQLite dynamic config overrides the TOML defaults.

- `ModelGroupManager.get_provider_and_model("high")` first asks the dynamic config storage (SQLite) for `model_group.high`.
- If SQLite has no value, it falls back to the TOML default at `llm.model_groups.high` in `app/config/settings.toml`.
- If neither is set, it raises a setup error pointing you at Settings → LLM Providers.

```python
# app/lib/llm/model_groups.py (abridged)
def get_provider_and_model(self, group: str, profile=None) -> tuple[str, str]:
    # Try SQLite first
    group_value = self.config_storage.get("llm_config", f"model_group.{group}", ...)
    # Fall back to TOML
    if not group_value:
        group_value = dynaconf_settings.get(f"llm.model_groups.{group}")
    ...
    return self._parse_group_value(group_value)  # "provider/model" -> (provider, model)
```

Each group can also carry a configured reasoning effort, looked up per profile from SQLite and passed through when the provider is created.
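For experimenting with the priority rule and the `provider/model` split outside the app, a self-contained approximation might look like the following. It assumes plain dictionaries in place of the SQLite storage and the TOML settings. The `SetupError`, `parse_group_value`, and `resolve_group` names and the `example-*` model identifiers are placeholders invented for this sketch, not Cremind's own.

```python
class SetupError(RuntimeError):
    """Hypothetical error type for a group with no provider assigned."""


def parse_group_value(value: str) -> tuple[str, str]:
    """Split "provider/model" on the first "/" only; the model may contain more."""
    provider, sep, model = value.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    return provider, model


def resolve_group(group: str, dynamic: dict, toml_defaults: dict) -> tuple[str, str]:
    """The dynamic (SQLite-like) value wins; the TOML-like default is the fallback."""
    value = dynamic.get(f"model_group.{group}") or toml_defaults.get(group)
    if not value:
        raise SetupError(
            f"model group {group!r} is not configured; "
            "open Settings > LLM Providers"
        )
    return parse_group_value(value)


# Stand-ins for the two config sources (placeholder model names).
dynamic = {"model_group.low": "groq/openai/gpt-oss-120b"}
toml_defaults = {"high": "anthropic/example-model", "low": "openai/example-small"}

print(resolve_group("low", dynamic, toml_defaults))   # dynamic value wins
print(resolve_group("high", dynamic, toml_defaults))  # falls back to the default
```

Splitting on the first `/` only is what lets a model identifier such as `openai/gpt-oss-120b` keep its own slash while `groq` is read as the provider.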
Configuring the groups
You assign a provider and model to each group in the Setup Wizard, under Settings → LLM Providers. Because the configuration is per profile, different profiles can use different models for the same group — a coding profile might point high at one model while a home profile points it at another.
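As a rough picture of what per-profile configuration means, the sketch below keys each setting by profile name. The dictionary layout, the `group_value` helper, and the `example-large` model names are assumptions for illustration; the real SQLite schema is not described on this page.

```python
from typing import Optional

# Hypothetical per-profile store keyed by (profile, setting).
profile_config = {
    ("coding", "model_group.high"): "anthropic/example-large",
    ("home", "model_group.high"): "openai/example-large",
    ("home", "model_group.low"): "groq/openai/gpt-oss-120b",
}


def group_value(profile: str, group: str) -> Optional[str]:
    """Look up the raw "provider/model" string for one profile, if any."""
    return profile_config.get((profile, f"model_group.{group}"))


# The same group name resolves differently depending on the profile.
print(group_value("coding", "high"))
print(group_value("home", "high"))
# A missing entry returns None, leaving the caller to apply a default.
print(group_value("coding", "low"))
```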
Cremind installs LLM providers on demand; the available providers include Anthropic, OpenAI, and Groq.
If you see a "model group is not configured" error, one of the two groups has no provider assigned in either SQLite or TOML. Open Settings → LLM Providers and pick a model for the group named in the message.
Mental model
Think of it as a senior engineer and an intern. The senior (the high model) reads the situation, makes the plan, and reviews results. The intern (the low model) carries out the individual, well-scoped tasks the senior hands off. You get the senior's judgment on the decisions that matter and the intern's price on the rest.