Trillim's Tokens

Trillim v0.7.1 Release Notes

March 27, 2026

v0.7.1 is a significant redesign of Trillim's SDK, server, and CLI around clearer contracts, better extensibility, and a more intuitive local AI workflow.

This is not a small incremental release: the product surface has been rethought end to end. The goal was simple: make Trillim easier to use on the common path, more extensible where it matters, and more truthful about what the runtime is actually doing.

What changed

The new shape of the system is more explicit:

  • Runtime is the supported sync pipeline over async-native components
  • Server is the small HTTP surface for local integration
  • LLM, STT, and TTS are the main component boundaries
  • The CLI is narrower and more intuitive instead of exposing every low-level knob

That redesign matters because it gives us clearer contracts for sessions, swaps, limits, admission control, and model metadata without turning the public API into a grab bag of legacy behavior.
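To make the component boundaries concrete, here is a minimal sketch of what LLM, STT, and TTS contracts could look like as Python protocols. All class and method names below are illustrative assumptions for this post, not Trillim's actual API:

```python
from typing import Iterator, Protocol


class LLM(Protocol):
    """Hypothetical text-generation boundary (names are assumptions)."""
    def generate(self, prompt: str) -> Iterator[str]: ...


class STT(Protocol):
    """Hypothetical speech-to-text boundary."""
    def transcribe(self, audio: bytes) -> str: ...


class TTS(Protocol):
    """Hypothetical text-to-speech boundary."""
    def synthesize(self, text: str) -> bytes: ...


class EchoLLM:
    """Toy implementation, only here to show the structural contract."""
    def generate(self, prompt: str) -> Iterator[str]:
        yield prompt


llm: LLM = EchoLLM()  # structural typing: no inheritance required
print("".join(llm.generate("hello")))
```

The point of small protocol-style boundaries is that a custom component only has to match the method shape, which keeps extension points narrow and testable.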

The main CLI flow

For most users, the happy path is now straightforward:

uv add trillim
uv run trillim pull Trillim/BitNet-TRNQ
uv run trillim chat Trillim/BitNet-TRNQ
uv run trillim serve Trillim/BitNet-TRNQ

That is the right starting point for v0.7.1. Pull a managed bundle, use chat when you want a quick local shell, and use serve when you want the OpenAI-compatible HTTP surface.
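Once serve is running, any OpenAI-compatible client can talk to it. As a sketch using only the standard library, here is how a chat-completion request could be shaped; the host, port, and route below are assumptions about typical OpenAI-compatible servers, so check the output of trillim serve for the real address:

```python
import json
from urllib.request import Request

# Assumed local address -- the actual host/port printed by
# `trillim serve` may differ.
BASE_URL = "http://127.0.0.1:8000"

payload = {
    "model": "Trillim/BitNet-TRNQ",
    "messages": [{"role": "user", "content": "Say hello."}],
}

req = Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the server running, urllib.request.urlopen(req) would return
# an OpenAI-style chat completion response.
print(req.full_url)
```

Because the surface is OpenAI-compatible, existing client libraries should also work by pointing their base URL at the local server.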

What this release is optimizing for

Three themes drove the rewrite:

  • better extensibility through smaller, more concrete component boundaries
  • more intuitive day-to-day use through a simpler CLI and clearer model-store rules
  • more truthful runtime behavior around state, sessions, cleanup, timeouts, and swap semantics

In practice, that means fewer ambiguous behaviors, less configuration drift between surfaces, and a cleaner path from local testing to embedding Trillim inside an application.
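To illustrate what "truthful" admission control means in practice, here is a toy sketch of bounded admission that rejects explicitly instead of queueing forever. Everything here is hypothetical illustration, not Trillim's internals:

```python
import threading


class AdmissionGate:
    """Toy bounded-admission sketch: callers get an explicit yes/no
    within a timeout, never a silent unbounded wait."""

    def __init__(self, max_sessions: int) -> None:
        self._slots = threading.BoundedSemaphore(max_sessions)

    def try_admit(self, timeout: float = 0.0) -> bool:
        # True if a session slot was granted within `timeout` seconds,
        # False otherwise -- the outcome is always explicit.
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        self._slots.release()


gate = AdmissionGate(max_sessions=1)
first = gate.try_admit()   # first session gets the slot
second = gate.try_admit()  # second is rejected immediately
gate.release()
print(first, second)
```

The design choice being named here is simply that limits surface as explicit outcomes at the boundary rather than as invisible queueing behavior.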

How to get the most out of it

Start with the docs that match the way you actually want to use Trillim:

  • use Python Components if you want to embed the runtime directly in Python
  • use API Server if you want local HTTP routes and OpenAI-compatible clients
  • use Advanced SDK and Server Notes if you need the deeper operational details behind sessions, swaps, search orchestration, limits, and runtime behavior

v0.7.1 is the version where the SDK and server surfaces start to feel deliberately engineered instead of merely assembled. That gives us a better foundation for the releases that follow.