feat: SeFi-Image support by fszontagh · Pull Request #1707 · leejet/stable-diffusion.cpp

fszontagh · 2026-06-24T20:01:27Z

Summary

Adds inference support for SeFi-Image, a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: arXiv:2606.22568. See docs/sefi_image.md.

What's in:

VERSION_SEFI_IMAGE + version detection
Dual-time embedding block (semantic_embedder + texture_embedder, concat)
Per-stream Euler sampler with alpha-shift + delta_t
SeFi-aware Qwen3-VL conditioning (chat template, layers 9/18/27)
VAE BN normalization on packed texture latents
script/convert_sefi.py for converting diffusers checkpoint to single sd.cpp safetensors
--extra-sample-args sefi_alpha=0.3 / sefi_delta_t=0.1 overrides
Filename heuristic: turbo in path => alpha=1.0, else alpha=0.3
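
The last two items interact: an explicit `sefi_alpha` override should take precedence over the filename heuristic. A minimal sketch of that resolution order follows, in Python for brevity. The function names and the `overrides` dict are hypothetical and do not come from the PR, and case-insensitive matching is an assumption. Note that the heuristic was later removed during review.

```python
def default_sefi_alpha(model_path: str) -> float:
    """Filename heuristic from the list above: 'turbo' in the path => 1.0, else 0.3."""
    # Assumption: matching is case-insensitive; the PR only says "turbo in path".
    return 1.0 if "turbo" in model_path.lower() else 0.3


def resolve_sefi_alpha(model_path: str, overrides: dict) -> float:
    """An explicit sefi_alpha override wins over the filename heuristic."""
    if "sefi_alpha" in overrides:
        return float(overrides["sefi_alpha"])
    return default_sefi_alpha(model_path)


# Hypothetical usage: the override values arrive as strings from the command line.
print(resolve_sefi_alpha("/path/to/sefi_1b_turbo.safetensors", {}))
print(resolve_sefi_alpha("/path/to/sefi_5b_base.safetensors", {"sefi_alpha": "0.3"}))
```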

Related Issue / Discussion

Closes #1702.

Additional Information

Example

./build/bin/sd-cli \
  --model /path/to/sefi_1b_turbo.safetensors \
  --llm   /path/to/qwen3_vl_2b.safetensors \
  -p "a photograph of an orange tabby cat sitting on a couch" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --diffusion-fa --offload-to-cpu \
  -o out.png

Tested variants (all 7 from huggingface.co/SeFi-Image)

Variant	Encoder	Baseline (12GB VRAM)	`--max-vram 8 --stream-layers`
1B-Base	qwen3_vl_2b	ok 109s	ok 172s
1B-turbo	qwen3_vl_2b	ok 14s	ok 17s
2B-Base	qwen3_vl_2b	ok 229s	ok 296s
2B-turbo	qwen3_vl_2b	ok 29s	ok 25s
5B-Base	qwen3_vl_4b	OOM	ok 563s
5B-turbo	qwen3_vl_4b	OOM	ok 170s
5B-RL	qwen3_vl_4b	OOM	ok 587s

5B variants use Qwen3-VL-4B-Instruct as the text encoder (1B/2B use 2B). 5B needs streaming on 12GB-class GPUs.
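
For scripting a test matrix like the one above, the encoder pairing can be captured as a small lookup. This is only a convenience sketch: the values are copied from the table, while the helper names and the "size-flavour" variant string format are assumptions.

```python
# Encoder per model size, copied from the table above.
ENCODER_BY_SIZE = {
    "1B": "qwen3_vl_2b",
    "2B": "qwen3_vl_2b",
    "5B": "qwen3_vl_4b",
}


def model_size(variant: str) -> str:
    """Extract the size prefix from a variant name such as '5B-turbo' (assumed format)."""
    return variant.split("-")[0].upper()


def encoder_for(variant: str) -> str:
    """Return the text encoder listed for this variant."""
    return ENCODER_BY_SIZE[model_size(variant)]


def needs_streaming_on_12gb(variant: str) -> bool:
    """Per the table, only the 5B variants ran out of memory at baseline on 12GB."""
    return model_size(variant) == "5B"


print(encoder_for("5B-RL"), needs_streaming_on_12gb("5B-RL"))
```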

Checklist

I have read and confirmed this PR follows the contribution guidelines.

GreenShadows · 2026-06-24T20:07:56Z

The quality seems surprisingly good for such a small model.

sz1kormar · 2026-06-25T15:30:26Z

Speeds are amazing on turbo, given that you only need 4 steps and a cfg-scale of 1 to pull it off. @fszontagh could you attach some of your images here to show off SeFi-Image as an independent tester?

fszontagh · 2026-06-25T15:56:32Z

> Speeds are amazing on turbo given that you only need 4 steps and 1 cfg to pull it off. @fszontagh could you attach some of your images here to show off SeFi-Image as an independent tester ?

I have started on the changes leejet requested. Once they are done, I will post some images here.

JohnLoveJoy · 2026-06-25T17:55:27Z

It looks really good. These are images I got from Reddit.

fszontagh · 2026-06-25T18:15:00Z

./build/bin/sd-cli \
  --diffusion-model /data/SD_MODELS/diffusion_models/sefi/sefi_5b_turbo.safetensors \
  --vae /data/SD_MODELS/vae/flux2_ae_from_sefi.safetensors \
  --llm /data/SD_MODELS/Text-encoder/sefi/qwen3_vl_4b.safetensors \
  -p "a photograph of an orange tabby cat sitting on a couch" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --extra-sample-args sefi_alpha=1.0 \
  --diffusion-fa --max-vram 8 --stream-layers --offload-to-cpu \
  -o /tmp/sefi_refactored.png

fszontagh · 2026-06-25T18:46:10Z

Pushed the rework. Per-comment:

auto_encoder_kl.hpp:516 (bn) - confirmed SeFi uses the standard Flux2 VAE. The bn.running_mean / bn.running_var weights match the hardcoded get_latents_mean_std constants within bf16. Dropped the SeFi-specific bn.* params, sefi_bn_apply, and the encode/decode branches. SeFi now uses the Flux2 VAE file directly (flux2_ae.safetensors); convert_sefi.py no longer emits a VAE file. diffusion_to_vae_latents still slices the 16 semantic channels before the standard Flux2 denormalize.
stable-diffusion.cpp:1341 (denoiser fork) - added SEFI_FLOW_PRED to the prediction_t enum. SeFi version maps to it. FLUX2_FLOW_PRED case is now version-agnostic; SEFI_FLOW_PRED case constructs SefiFlowDenoiser.
stable-diffusion.cpp:1348 (turbo knob) - removed the filename heuristic. Default timestep_shift_alpha = 1.0 (identity); base/RL pass --extra-sample-args sefi_alpha=0.3. No kAlphaTurbo / kAlphaBase constants left.
stable-diffusion.cpp:2084 (dual-time override) - moved into process_timesteps. process_timesteps now takes a step arg; the SeFi branch returns {sem_timesteps[step], tex_timesteps[step]}. Sample loop is back to a single process_timesteps call.
README.md:18 (Important news) - removed the SeFi-Image line.
name_conversion.cpp:1210 (prefixes) - removed the backbone. / dual_time_embed. prefix injection. convert_sefi.py already emits canonical model.diffusion_model.* keys.
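
The dual-time lookup and the "alpha = 1.0 is identity" default described above can be illustrated with a small sketch. The thread does not give the shift formula or say how delta_t enters, so both are assumptions here: a common shift of the form t' = alpha * t / (1 + (alpha - 1) * t), which is the identity at alpha = 1.0, and delta_t applied as a fixed offset between the two streams. It is written in Python rather than the project's C++ and is not the PR's implementation.

```python
from typing import List, Tuple


def shift_time(t: float, alpha: float) -> float:
    """Assumed shift: t' = alpha * t / (1 + (alpha - 1) * t). alpha = 1.0 leaves t unchanged."""
    return alpha * t / (1.0 + (alpha - 1.0) * t)


def build_schedules(steps: int, alpha: float, delta_t: float) -> Tuple[List[float], List[float]]:
    """Precompute one timestep per stream for every sampling step."""
    base = [1.0 - i / steps for i in range(steps)]  # uniform grid from 1.0 towards 0.0
    sem = [shift_time(t, alpha) for t in base]
    # Assumption: the texture stream runs delta_t behind the semantic one, clamped to 1.0.
    tex = [min(1.0, t + delta_t) for t in sem]
    return sem, tex


def process_timesteps(step: int, sem: List[float], tex: List[float]) -> Tuple[float, float]:
    """Analogue of the described SeFi branch: return the (semantic, texture) pair for this step."""
    return sem[step], tex[step]


sem_ts, tex_ts = build_schedules(steps=4, alpha=1.0, delta_t=0.1)
for step in range(4):
    print(step, process_timesteps(step, sem_ts, tex_ts))
```

Keeping both schedules precomputed means the sample loop only needs the step index, which matches the single `process_timesteps` call described in the rework.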

Smoke matrix after rework: 5B-turbo (82 s), 5B-Base (510 s), 5B-RL (567 s) all produce the same orange tabby as before. Within ~3% of pre-rework wallclock.

Adds inference support for SeFi-Image (https://huggingface.co/SeFi-Image), a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: https://arxiv.org/abs/2606.22568.

- VERSION_SEFI_IMAGE + SEFI_FLOW_PRED + version detection from weights
- Dual-time embedding block (semantic + texture, concat)
- SefiFlowDenoiser with alpha-shift + delta_t, dual-time override fed via process_timesteps; alpha exposed as --extra-sample-args sefi_alpha
- Qwen3-VL conditioning (chat template, layers 9/18/27)
- Reuses standard Flux2 VAE; semantic channels sliced in diffusion_to_vae_latents before the existing get_latents_mean_std path
- script/convert_sefi.py emits transformer-only safetensors with canonical model.diffusion_model.* keys; VAE comes from flux2_ae
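
The VAE hand-off (drop the 16 semantic channels, then apply the standard per-channel denormalize) can be sketched as below. This is an illustration under stated assumptions, not the code in the PR: the thread does not say which end of the channel axis holds the semantic channels (the last 16 is assumed here), the denormalize is assumed to be `x * std + mean`, and the function name is hypothetical. Plain Python lists stand in for tensors.

```python
from typing import List

SEMANTIC_CHANNELS = 16  # channel count stated in the PR discussion


def slice_and_denormalize(latents: List[List[float]],
                          mean: List[float],
                          std: List[float]) -> List[List[float]]:
    """latents is channel-major: one flat list of values per channel."""
    # Assumption: the semantic channels are the LAST 16 on the channel axis.
    texture = latents[:-SEMANTIC_CHANNELS]
    if not (len(texture) == len(mean) == len(std)):
        raise ValueError("mean/std must have one entry per texture channel")
    # Assumed denormalize: x * std + mean, applied per channel.
    return [[v * s + m for v in channel]
            for channel, m, s in zip(texture, mean, std)]


# Toy example: 2 texture channels followed by 16 semantic channels.
toy = [[1.0, 2.0], [1.0, 2.0]] + [[9.0, 9.0] for _ in range(SEMANTIC_CHANNELS)]
print(slice_and_denormalize(toy, mean=[0.5, -0.5], std=[2.0, 1.0]))
```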

fszontagh · 2026-06-25T19:06:45Z

5b rl

5b base

Prompt "a lovely cat holding a sign says 'SeFi.cpp'" with 5B-turbo (~148 s):

./build/bin/sd-cli \
  --diffusion-model /data/SD_MODELS/diffusion_models/sefi/sefi_5b_turbo.safetensors \
  --vae /data/SD_MODELS/vae/flux2_ae_from_sefi.safetensors \
  --llm /data/SD_MODELS/Text-encoder/sefi/qwen3_vl_4b.safetensors \
  -p "a lovely cat holding a sign says 'SeFi.cpp'" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --extra-sample-args sefi_alpha=1.0 \
  --diffusion-fa --max-vram 8 --stream-layers --offload-to-cpu \
  -o /tmp/sefi_sign.png

Prompt "a lovely cat holding a sign says 'SeFi.cpp'" with 5B-Base (~540 s):

./build/bin/sd-cli \
  --diffusion-model /data/SD_MODELS/diffusion_models/sefi/sefi_5b_base.safetensors \
  --vae /data/SD_MODELS/vae/flux2_ae_from_sefi.safetensors \
  --llm /data/SD_MODELS/Text-encoder/sefi/qwen3_vl_4b.safetensors \
  -p "a lovely cat holding a sign says 'SeFi.cpp'" \
  --cfg-scale 4.0 --steps 50 -W 1024 -H 1024 -s 42 \
  --extra-sample-args sefi_alpha=0.3 \
  --diffusion-fa --max-vram 8 --stream-layers --offload-to-cpu \
  -o /tmp/sefi_sign_base.png

leejet requested changes Jun 25, 2026

Comment thread src/model/vae/auto_encoder_kl.hpp Outdated

Comment thread src/stable-diffusion.cpp Outdated

Comment thread src/stable-diffusion.cpp Outdated

Comment thread src/stable-diffusion.cpp Outdated

Comment thread README.md Outdated

Comment thread src/name_conversion.cpp Outdated

fszontagh force-pushed the feat/sefi-image-prototype branch from 63ef957 to f27063d Compare June 25, 2026 18:58

fszontagh wants to merge 1 commit into leejet:master from fszontagh:feat/sefi-image-prototype
