LLM Model Groups

How Cremind splits work between an expensive "high" reasoning model and a cheap "low" per-tool model to lower token cost.

Cremind does not call one model for everything. It sorts the providers and models you configure into two model groups — high and low — and routes work to whichever group fits. The expensive high model does the thinking; the cheap low model does the busywork of individual tool calls. The payoff is lower token cost without dumbing down the reasoning.

This page explains the two groups, how a model is resolved at call time, and how to configure it.

The two groups

Group	Used for	Typical choice
`high`	Reasoning and planning — the main agent loop	A strong, more expensive model
`low`	Individual tool calls	A cheaper, faster model

The reasoning loop runs on the high model: it decides what to do, reads tool results, and plans the next step. But when the agent fires off an individual tool call (reading a file, driving a browser, searching docs), that work is dispatched to the low model by default. Most tool calls are mechanical and don't need a frontier reasoner, so paying frontier prices for them is waste. Splitting the two means you pay the high rate only where it matters.

Why this lowers cost

Tool calls often outnumber reasoning turns in a typical task, and their inputs and outputs account for much of the token traffic. Sending each of those calls to a cheaper model, while keeping the planning brain expensive, gives you lower overall token cost without trading away plan quality.
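To make the saving concrete, here is a back-of-the-envelope calculation. Every number in it is hypothetical: the per-token prices, the token counts, and the split between reasoning and tool work are invented for illustration and are neither Cremind measurements nor any provider's actual rates.

```python
# Hypothetical prices in dollars per million tokens; not real provider rates.
HIGH_PRICE = 10.00
LOW_PRICE = 0.50

# Hypothetical workload for one task, in tokens.
REASONING_TOKENS = 200_000  # main agent loop
TOOL_TOKENS = 800_000       # individual tool calls


def cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Everything runs on the high model.
single_model = cost(REASONING_TOKENS + TOOL_TOKENS, HIGH_PRICE)

# Reasoning runs on high, tool calls run on low.
split = cost(REASONING_TOKENS, HIGH_PRICE) + cost(TOOL_TOKENS, LOW_PRICE)

print(f"single model: ${single_model:.2f}")
print(f"split groups: ${split:.2f}")
```

With these made-up figures the split comes to $2.40 against $10.00 for the single-model run. The actual ratio depends entirely on your models' prices and on how much of a task's traffic is tool work.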

Per-tool defaults

A model group isn't only chosen by the loop — each built-in tool can declare which group it prefers. Every built-in tool's TOOL_CONFIG carries a default_model_group key. For example, the browser tool declares "low", because driving a page doesn't need the reasoning model.

So a tool call resolves its model in this order:

1. Tool-level override: an explicit llm_model set for that specific tool (stored in SQLite per profile).
2. The tool's default_model_group: taken from its built-in TOOL_CONFIG.
3. The low group: the ultimate fallback if neither of the above is set.

This is exactly what ModelGroupManager.create_llm_for_tool() does in app/lib/llm/model_groups.py.
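The resolution order above can be sketched as a small standalone function. This is an illustration, not the body of create_llm_for_tool(): the function name pick_model_source, its return shape, and the treatment of TOOL_CONFIG as a plain dict are all assumptions made for the example.

```python
from typing import Optional

FALLBACK_GROUP = "low"


def pick_model_source(tool_override: Optional[str], tool_config: dict) -> tuple[str, str]:
    """Decide where a tool call gets its model from (hypothetical helper).

    Returns ("model", <explicit model>) or ("group", <group name>).
    """
    # 1. An explicit per-tool llm_model override wins.
    if tool_override:
        return ("model", tool_override)
    # 2. Otherwise use the group the tool declares in its TOOL_CONFIG.
    declared = tool_config.get("default_model_group")
    if declared:
        return ("group", declared)
    # 3. Otherwise fall back to the low group.
    return ("group", FALLBACK_GROUP)


browser_config = {"default_model_group": "low"}   # the browser tool declares "low"
planner_config = {"default_model_group": "high"}  # hypothetical tool
bare_config = {}                                  # hypothetical tool with no default

print(pick_model_source(None, browser_config))
print(pick_model_source("groq/openai/gpt-oss-120b", planner_config))
print(pick_model_source(None, bare_config))
```

Checking the override first means a per-profile choice always beats what the tool ships with, and the hard-coded fallback guarantees every tool call resolves to some group.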

How a group resolves to a model

Each group resolves to a concrete provider/model pair. The group value is a string: everything before the first / is the provider, and everything after it is the model identifier, which may itself contain a /. For example:

groq/openai/gpt-oss-120b
└──┘ └─────────────────┘
provider    model
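Because the model identifier can contain its own slash, the split has to happen on the first / only. The following is a minimal sketch of such a parser; the function name and the error handling are assumptions for illustration and are not copied from Cremind's source.

```python
def parse_group_value(group_value: str) -> tuple[str, str]:
    """Split "provider/model" on the first slash only (illustrative helper)."""
    provider, sep, model = group_value.partition("/")
    # Reject values with no slash, or with an empty provider or model part.
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {group_value!r}")
    return provider, model


print(parse_group_value("groq/openai/gpt-oss-120b"))
# ('groq', 'openai/gpt-oss-120b')
```

Using str.partition rather than str.split("/") keeps the remainder intact, so openai/gpt-oss-120b survives as a single model identifier.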

Resolution follows a clear priority: SQLite dynamic config overrides the TOML defaults.

1. ModelGroupManager.get_provider_and_model("high") first asks the dynamic config storage (SQLite) for model_group.high.
2. If SQLite has no value, it falls back to the TOML default at llm.model_groups.high in app/config/settings.toml.
3. If neither is set, it raises a setup error pointing you at Settings → LLM Providers.

# app/lib/llm/model_groups.py (abridged)
def get_provider_and_model(self, group: str, profile=None) -> tuple[str, str]:
    # Try SQLite first
    group_value = self.config_storage.get("llm_config", f"model_group.{group}", ...)
    # Fall back to TOML
    if not group_value:
        group_value = dynaconf_settings.get(f"llm.model_groups.{group}")
    ...
    return self._parse_group_value(group_value)  # "provider/model" -> (provider, model)
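
For experimenting with the lookup order outside Cremind, here is a self-contained sketch that swaps both stores for plain dictionaries. The exception name, the function name lookup_group, the dict layout, and the example/... model strings are hypothetical stand-ins, not Cremind's actual types or defaults.

```python
class ModelGroupNotConfigured(Exception):
    """Raised when a group has no value in either store (hypothetical name)."""


def lookup_group(group: str, sqlite_values: dict, toml_defaults: dict) -> str:
    """Return the raw "provider/model" string for a group."""
    # Dynamic config (a stand-in for SQLite) takes priority.
    value = sqlite_values.get(f"model_group.{group}")
    # Fall back to the static defaults (a stand-in for settings.toml).
    if not value:
        value = toml_defaults.get("llm", {}).get("model_groups", {}).get(group)
    # Neither store has a value: surface a setup error.
    if not value:
        raise ModelGroupNotConfigured(
            f"model group {group!r} is not configured; "
            "set it under Settings → LLM Providers"
        )
    return value


sqlite_values = {"model_group.low": "groq/openai/gpt-oss-120b"}
toml_defaults = {
    "llm": {"model_groups": {"low": "example/default-low", "high": "example/default-high"}}
}

print(lookup_group("low", sqlite_values, toml_defaults))   # dynamic value wins
print(lookup_group("high", sqlite_values, toml_defaults))  # falls back to the default
```

Calling lookup_group with a group that appears in neither dictionary raises ModelGroupNotConfigured, which mirrors the "model group is not configured" case.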

Each group can also carry a configured reasoning effort, looked up per profile from SQLite and passed through when the provider is created.

Configuring the groups

You assign a provider and model to each group in the Setup Wizard, under Settings → LLM Providers. Because the configuration is per profile, different profiles can use different models for the same group — a coding profile might point high at one model while a home profile points it at another.

Cremind installs LLM providers on demand; the available providers include Anthropic, OpenAI, and Groq.

If you see a "model group is not configured" error, one of the two groups has no provider assigned in either SQLite or TOML. Open Settings → LLM Providers and pick a model for the group named in the message.

Mental model

Think of it as a senior engineer and an intern. The senior (the high model) reads the situation, makes the plan, and reviews results. The intern (the low model) carries out the individual, well-scoped tasks the senior hands off. You get the senior's judgment on the decisions that matter and the intern's price on the rest.

See also

The Tool Plane: where tool calls, the work the low model handles, come from.

Profiles: why model configuration is per profile.