Trillim's Tokens

Trillim Adds Support for Ternary Bonsai

April 17, 2026

On April 16, 2026, PrismML released the ternary variants of Bonsai. One day later, Trillim v0.9.0 added support for running them on x86 and Arm CPUs, with optimized AVX2 and NEON kernels.

That turnaround matters to us. Trillim is supposed to be the place where new efficient model families land quickly, without forcing users onto a separate local stack or waiting through a long integration cycle. If a model is interesting on real consumer hardware, we want it available through the same runtime, the same bundle flow, and the same SDK surface as everything else we ship.

Ternary Bonsai support ships now in v0.9.0. You can install Trillim with pip install trillim or uv add trillim, then use the same workflow you already know from the CLI, Python components, and local server.
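For concreteness, here is a minimal sketch of that flow. The install commands are the ones above; the chat and serve subcommands mirror the interface names used later in this post, but the exact invocation syntax and the bundle name are assumptions for illustration, not documented usage.

  pip install trillim               # or: uv add trillim
  trillim chat Bonsai-4BT-TRNQ      # hypothetical invocation: interactive chat with a Ternary Bonsai bundle
  trillim serve Bonsai-4BT-TRNQ     # hypothetical invocation: expose the same bundle via the local server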

Why we moved quickly

Last week, we added support for 1-bit Bonsai in Trillim. That work was not meant to be a one-off. The broader goal is to make our inference engine, DarkNet, a serious CPU runtime for low-bit local inference, whether the model family is BitNet-style ternary or PrismML’s Bonsai line.

So when PrismML published Ternary Bonsai on April 16, 2026, the question for us was not whether we should support it. The question was how fast we could get support into users’ hands without compromising the runtime.

We spent the following day doing exactly that: wiring Ternary Bonsai into the existing Trillim flow, then optimizing the CPU kernels. The result is support for both AVX2 and Arm NEON in under a day.

What ships in v0.9.0

This release adds Ternary Bonsai support directly to Trillim and DarkNet.

  • Ternary Bonsai bundles run through the same Trillim-managed workflow as other supported models
  • DarkNet includes optimized inference paths for x86 AVX2 and Arm NEON
  • The feature is available immediately through the Trillim SDK, CLI, and local server surfaces

The practical point is simple: if you want to try Ternary Bonsai locally on CPU, you can do it today without changing tools.
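As a rough sketch of what that looks like from the Python SDK side, here is a minimal example. The module name trillim, the load and generate calls, and the bundle identifier are hypothetical placeholders for illustration; consult the SDK documentation for the actual API.

  # Minimal sketch of running a Ternary Bonsai bundle on CPU through the Trillim SDK.
  # NOTE: load() and generate() are hypothetical placeholder names, not the documented API.
  import trillim

  # Placeholder bundle identifier for one of the Ternary Bonsai models.
  model = trillim.load("Bonsai-4BT-TRNQ")

  # Text generation runs on CPU; the AVX2 / NEON paths described in this post sit underneath.
  print(model.generate("Summarize ternary quantization in one paragraph."))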

Benchmarks

x86 AVX2 Results

These runs were collected on an Intel i7 Alder Lake laptop. Throughput is reported in tokens per second (TPS): pp=512 is prompt processing (prefill) of a 512-token prompt, and tg=256 is generation of 256 tokens.

Model                 pp=512 TPS    tg=256 TPS
Bonsai-1.7BT-TRNQ     169.34        36.62
Bonsai-4BT-TRNQ       66.95         17.26
Bonsai-8BT-TRNQ       34.13         8.76

As explained in previous benchmark blogs, the laptop is a noisy benchmark machine, and this one showed noticeable thermal and system-load swings during testing. We wanted to ship support quickly rather than wait for a perfectly isolated run, so treat these numbers as indicative rather than precise. The overall performance picture is still clear: the AVX2 path is real, fast, and immediately usable.

Arm NEON Results

These Arm runs were collected on a Mac Studio using the NEON dot-product instruction path.

Bonsai-1.7BT-TRNQ

Threads    pp=512 TPS    tg=256 TPS
1          71.27         28.80
4          252.62        92.67
8          496.88        145.66
10         581.56        150.93
20         1016.69       155.53

Bonsai-4BT-TRNQ

Threads    pp=512 TPS    tg=256 TPS
1          27.93         11.82
4          99.31         38.87
8          194.85        65.74
10         229.57        72.49
20         423.91        87.97

Bonsai-8BT-TRNQ

Threads    pp=512 TPS    tg=256 TPS
1          16.16         7.24
4          57.69         24.22
8          113.52        42.07
10         136.54        46.46
20         250.60        61.35

The Arm story is especially strong on prefill. As thread counts increase, the 1.7B, 4B, and 8B models all scale into throughput that makes local CPU inference feel much more practical on modern Apple Silicon-class hardware.

Why this release matters

Shipping Ternary Bonsai support in under a day is not just a speed milestone for us. It is a signal about the shape of Trillim.

We want the product surface to stay stable even as efficient model families evolve underneath it. That means new low-bit formats should land in the same runtime, under the same bundle flow, with the same chat, serve, and SDK interfaces instead of fragmenting the user experience.

That is what v0.9.0 delivers. Ternary Bonsai is available now in Trillim, with optimized AVX2 and Arm NEON support from day one.