GENERAL INTELLIGENCE · Technology preview

State-of-the-art image generation, in your browser.

Bonsai Image 4B is built to run entirely in the browser, on your own GPU over WebGPU: nothing leaves the device, there is no per-prompt server cost, and iteration is fast because there is no server round-trip. It delivers 88–95% of FLUX.2 [klein] 4B benchmark quality from a diffusion transformer of about 1 GB (0.93 GB binary, 1.21 GB ternary). Open weights, Apache-2.0.

Built from FLUX.2 [klein] 4B and re-quantized to binary / ternary weights, Bonsai is compact enough to download once, cache on the device, and run locally — laptops and phones included.

Requires a WebGPU browser. Chrome / Edge 113+ are the primary verified path. Safari 18+ is a target; on-device verification is pending.

Model / readout: bonsai-image-4b

[Generation canvas: interface visualization, not a generated image]

Base: FLUX.2 [klein] 4B
License: Apache-2.0
Runtime: WebGPU · on-device
Ternary: 1.21 GB · 95% quality vs [klein]
Binary: 0.93 GB · 88% quality vs [klein]
First-run download: 3.43–3.89 GB · cached locally

Technology preview. The in-browser studio currently runs a reference mock while we vendor and verify the WebGPU runtime on-device. The technology is real today — see PrismML's live WebGPU demo.

01 What it is

A studio for a model that fits on your device

General Intelligence is a studio for Bonsai Image 4B — an open-weight, 4B-parameter diffusion model built from FLUX.2 [klein] 4B and re-quantized to binary or ternary weights so it is compact enough to run locally in a WebGPU browser. You pick a variant and write a prompt; once the WebGPU runtime is in place, generation runs on your own hardware — nothing uploaded, nothing metered. When a result is worth selling, you publish it to the marketplace with its full provenance attached.

[Interface visualization: prompt · variant · on-device · provenance. Not generated images.]

See real generated work: gallery · PrismML WebGPU demo

02 How it works

Three steps, all on your machine

No account, no upload, no queue. The model loads into your browser and stays there.

01 Pick a variant
Ternary for higher quality, or binary for the smallest footprint. Each shows its exact download size up front.
02 Type a prompt
Describe what you want. Set steps and a seed, or leave the defaults; the seed actually used is shown, so any result can be reproduced.
03 Generate on your GPU
Once the WebGPU runtime is in place, generation runs locally: no server round-trip, and nothing leaves your device until you publish.
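
The seed handling described in step 02 can be sketched as follows. This is a minimal illustration rather than the studio's implementation: the function name `resolve_seed` and the 32-bit seed range are assumptions, since the page does not specify either.

```python
import secrets
from typing import Optional

SEED_BITS = 32  # assumed seed width; the source does not state one


def resolve_seed(requested: Optional[int] = None) -> int:
    """Return the seed to use for a run.

    A user-supplied seed is used unchanged; otherwise a random one is
    drawn. The returned value is the one to display and store, so the
    same run can be repeated later.
    """
    if requested is not None:
        if not 0 <= requested < 2**SEED_BITS:
            raise ValueError("seed out of range")
        return requested
    return secrets.randbits(SEED_BITS)
```

Calling `resolve_seed(1234)` keeps the chosen seed, while `resolve_seed()` draws a fresh one; feeding that resolved value back in later reproduces the same starting point.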

First run downloads the model once (~3.4–3.9 GB) and caches it on your device; later runs load from cache.

Preview

In this preview the studio runs a reference mock so you can walk through the full flow; real generation will be enabled once the WebGPU runtime has been verified on-device.

03 Model variants

Two ways to trade footprint for fidelity

Bonsai re-quantizes the FLUX.2 [klein] diffusion transformer to binary or ternary weights. The result moves the quality–footprint frontier: 88–95% of the base model's benchmark quality with a transformer 6.4× to 8.3× smaller.
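
To make "binary or ternary weights" concrete, here is a generic textbook-style sketch of both mappings. It is not PrismML's procedure, which the page does not describe: the function names are hypothetical, the shared scale is a plain mean of magnitudes, and the 0.7 threshold factor is a commonly used heuristic assumed here for illustration.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[List[int], float]:
    """Map weights to {-1, +1} plus one shared scale (mean absolute value)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w >= 0 else -1 for w in weights]
    return codes, scale


def ternarize(weights: List[float], factor: float = 0.7) -> Tuple[List[int], float]:
    """Map weights to {-1, 0, +1} plus one shared scale.

    Weights whose magnitude is at most factor * mean(|w|) become 0; the
    scale is the mean magnitude of the weights that stay non-zero.
    """
    threshold = factor * sum(abs(w) for w in weights) / len(weights)
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from the codes and the shared scale."""
    return [c * scale for c in codes]
```

For the weights `[0.9, -0.1, 0.4, -0.8]`, the binary mapping forces the small `-0.1` to a full-strength `-1`, whereas the ternary mapping can set it to `0`. That is the extra state the ternary variant has available.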

Ternary Bonsai Image 4B · Quality

{−1, 0, +1} · 1.71 effective bits / weight

Quality vs [klein]: 95%

Transformer footprint: 1.21 GB · 6.4× smaller

The quality default. The extra zero state buys representational flexibility: better visual quality and prompt fidelity.

First-run download: 3.89 GB

1-bit Bonsai Image 4B · Footprint

{−1, +1} · 1.125 effective bits / weight

Quality vs [klein]: 88%

Transformer footprint: 0.93 GB · 8.3× smaller

The footprint default. Brings the diffusion transformer below 1 GB, the right fit when memory and bandwidth are the constraint.

First-run download: 3.43 GB
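
The two effective-bits figures are consistent with a simple accounting: the information content of each code plus a per-group scale amortized over the group. The sketch below assumes one 16-bit scale shared by every 128 weights. Both numbers are illustrative assumptions; the page states neither the group size nor the scale precision, and `log2(3)` is an idealized packing density for ternary codes.

```python
import math


def effective_bits(states: int, group_size: int = 128, scale_bits: int = 16) -> float:
    """Bits per weight = information per code + amortized per-group scale.

    group_size and scale_bits are assumptions chosen for illustration.
    """
    return math.log2(states) + scale_bits / group_size


binary_bits = effective_bits(2)   # log2(2) + 16/128
ternary_bits = effective_bits(3)  # log2(3) + 16/128
```

Under these assumptions the binary figure is exactly 1.125 and the ternary figure rounds to 1.71, matching the values quoted for the two variants.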

Benchmarks

GenEval (object composition & attribute binding); HPSv3 (human-preference & aesthetic quality); DPG-Bench (dense prompt following & semantic faithfulness).

Model	Footprint	GenEval	HPSv3	DPG-Bench	vs [klein]
Ternary Bonsai Image 4B	1.21 GB	0.723	12.22	0.851	95%
1-bit Bonsai Image 4B	0.93 GB	0.671	11.15	0.822	88%
FLUX.2 [klein] 4B	7.75 GB	0.819	12.84	0.853	100%
SDXL	5.14 GB	0.300	10.05	0.740	67%
Stable Diffusion 1.5	1.72 GB	0.396	4.20	0.601	51%

Source: PrismML launch benchmarks (GenEval / HPSv3 / DPG-Bench). Higher is better.
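
The derived figures can be checked against the table with a few lines of arithmetic. The footprint ratio is unambiguous. The reading of "vs [klein]" as the plain average of the three per-benchmark ratios is an assumption, since the source does not say how that column is computed, and the function names are hypothetical.

```python
BASELINE = "FLUX.2 [klein] 4B"

# (footprint GB, GenEval, HPSv3, DPG-Bench), copied from the table above
ROWS = {
    "Ternary Bonsai Image 4B": (1.21, 0.723, 12.22, 0.851),
    "1-bit Bonsai Image 4B": (0.93, 0.671, 11.15, 0.822),
    "FLUX.2 [klein] 4B": (7.75, 0.819, 12.84, 0.853),
    "SDXL": (5.14, 0.300, 10.05, 0.740),
    "Stable Diffusion 1.5": (1.72, 0.396, 4.20, 0.601),
}


def shrink_factor(model: str) -> float:
    """How many times smaller the model's footprint is than the baseline's."""
    return ROWS[BASELINE][0] / ROWS[model][0]


def mean_score_ratio(model: str) -> float:
    """Average of the three per-benchmark ratios against the baseline.

    Assumption: one plausible reading of the "vs [klein]" column.
    """
    base = ROWS[BASELINE][1:]
    scores = ROWS[model][1:]
    return sum(s / b for s, b in zip(scores, base)) / len(base)


for name in ROWS:
    print(f"{name}: {shrink_factor(name):.1f}x smaller, "
          f"{mean_score_ratio(name):.1%} of baseline")
```

The footprint ratios come out at 6.4× and 8.3×, as stated. The averaged ratio reproduces the 88%, 67% and 51% rows; the ternary row comes out at about 94.4%, slightly below the listed 95%, so the published column may use a different weighting or rounding.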

Mean-active memory

After the text encoder is offloaded.

512 × 512: binary 1.5 GB · ternary 1.96 GB

1024 × 1024: binary 1.95 GB · ternary 2.38 GB
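
These figures can drive a simple variant choice. The sketch below prefers ternary when it fits a memory budget and falls back to binary otherwise. The function `pick_variant` and its `budget_gb` input are hypothetical: the caller has to estimate the budget, and mean-active memory is not the same as peak memory, so a real check would need headroom.

```python
from typing import Optional

# Mean-active memory in GB, copied from the figures above
MEAN_ACTIVE_GB = {
    512: {"binary": 1.5, "ternary": 1.96},
    1024: {"binary": 1.95, "ternary": 2.38},
}


def pick_variant(budget_gb: float, side: int) -> Optional[str]:
    """Prefer ternary (higher quality), fall back to binary, else None."""
    usage = MEAN_ACTIVE_GB[side]
    for variant in ("ternary", "binary"):
        if usage[variant] <= budget_gb:
            return variant
    return None
```

With a 2 GB budget, for example, this picks ternary at 512 × 512 but binary at 1024 × 1024.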

Generation speed

512² image, 4 denoising steps.

512² on Mac M4 Pro: ≈ 6 s

512² on iPhone 17 Pro Max: ≈ 9.4 s

vs stock FP16 MFLUX (M4 Pro): up to 5.6× faster

PrismML reports that, to its knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.

04 Why local

Image-making is iterative. The model should be too.

Cloud generation turns every prompt into a remote request — metered, billed, and gated behind latency. Once the model fits on the device, that friction disappears.

Private by default

Prompts and generated images stay on the device. Generation happens in the page, not on a server.

No marginal cost

Cloud generation makes every prompt a remote request with a per-prompt serving cost. On-device, there is no per-prompt server cost.

Instant iteration

Image-making is iterative — revise, compare, re-roll, discard. With no server round-trip per attempt, the creative loop sits inside the product.

05 Marketplace

Publish what you make. License it.

Non-exclusive

Keep everything local while you iterate. When a result is worth selling, publish it to the marketplace and offer it under a non-exclusive license — the full provenance of how it was generated travels with it.

06 Pricing & licensing

Commercial use of generated images

Apache-2.0

Our model-stack finding: Bonsai, its FLUX.2 [klein] 4B base, and its Qwen3-4B text encoder are all licensed under Apache-2.0, which places no non-commercial restriction on generated outputs. The model stack therefore imposes no license restriction on generated images.

Listings are sold under a non-exclusive license — the creator keeps the right to relist or reuse.
Each image ships with its full provenance (model, variant, base, seed, sampler) recorded at generation time.
The model-stack finding above is informational, not legal advice.
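
A provenance record with the five listed fields could look like the sketch below. The class name, the JSON encoding, and the example values are assumptions made for illustration; the page lists the fields but not their format, and it names no specific sampler.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    model: str
    variant: str
    base: str
    seed: int
    sampler: str


def to_json(record: Provenance) -> str:
    """Serialize with sorted keys so the same record always yields the same text."""
    return json.dumps(asdict(record), sort_keys=True)


record = Provenance(
    model="bonsai-image-4b",
    variant="ternary",
    base="FLUX.2 [klein] 4B",
    seed=1234,                  # example value
    sampler="example-sampler",  # placeholder; the source names no sampler
)
```

Freezing the dataclass keeps a record from being altered after generation, and the sorted-key JSON form gives a stable string that can be stored alongside the listing.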
