feat: SeFi-Image support #1707
Open
fszontagh wants to merge 1 commit into
The quality seems surprisingly good for such a small model.
leejet requested changes on Jun 25, 2026
Speeds are amazing on turbo, given that you only need 4 steps and a CFG scale of 1 to pull it off. @fszontagh, could you attach some of your images here to show off SeFi-Image as an independent tester?
Contributor
Author
I've started on the changes leejet requested. Afterwards I'll drop some images here.
Contributor
Author
Pushed the rework. Per-comment:
Smoke matrix after rework: 5B-turbo (82 s), 5B-Base (510 s), and 5B-RL (567 s) all produce the same orange tabby as before. Wall-clock time is within ~3% of the pre-rework runs.
Adds inference support for SeFi-Image (https://huggingface.co/SeFi-Image), a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: https://arxiv.org/abs/2606.22568.

- VERSION_SEFI_IMAGE + SEFI_FLOW_PRED + version detection from weights
- Dual-time embedding block (semantic + texture, concat)
- SefiFlowDenoiser with alpha-shift + delta_t, dual-time override fed via process_timesteps; alpha exposed as --extra-sample-args sefi_alpha
- Qwen3-VL conditioning (chat template, layers 9/18/27)
- Reuses standard Flux2 VAE; semantic channels sliced in diffusion_to_vae_latents before the existing get_latents_mean_std path
- script/convert_sefi.py emits transformer-only safetensors with canonical model.diffusion_model.* keys; VAE comes from flux2_ae
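The last bullet says the converter emits transformer-only tensors under canonical `model.diffusion_model.*` keys. A minimal sketch of that renaming step follows. It is not the contents of `script/convert_sefi.py`: the function name `to_canonical_keys`, the skipped prefixes `vae.` and `text_encoder.`, and the plain-dict stand-in for a state dict are all assumptions made for illustration.

```python
# Hypothetical sketch of the key-renaming idea; NOT script/convert_sefi.py.
TARGET_PREFIX = "model.diffusion_model."

# Assumption: non-transformer tensors can be recognised by these prefixes.
SKIPPED_PREFIXES = ("vae.", "text_encoder.")


def to_canonical_keys(state_dict, skipped_prefixes=SKIPPED_PREFIXES):
    """Return a new dict holding only transformer tensors under TARGET_PREFIX."""
    out = {}
    for name, tensor in state_dict.items():
        # Drop anything that does not belong to the transformer.
        if name.startswith(skipped_prefixes):
            continue
        # Leave already-canonical keys alone so the function is idempotent.
        if name.startswith(TARGET_PREFIX):
            out[name] = tensor
        else:
            out[TARGET_PREFIX + name] = tensor
    return out
```

Keeping the function idempotent means that running it on an already converted file changes nothing, which makes re-running a conversion harmless.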
63ef957 to f27063d
Summary
Adds inference support for SeFi-Image, a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: arXiv:2606.22568. See docs/sefi_image.md.
What's in:
- VERSION_SEFI_IMAGE + version detection
- Dual-time embedding block (semantic_embedder + texture_embedder, concat)
- script/convert_sefi.py for converting a diffusers checkpoint to a single sd.cpp safetensors file
- --extra-sample-args sefi_alpha=0.3 / sefi_delta_t=0.1 overrides
- turbo in path => alpha=1.0, else alpha=0.3

Related Issue / Discussion
Closes #1702.
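The alpha rule listed under "What's in" (`turbo` in path => alpha=1.0, else 0.3, with `sefi_alpha` / `sefi_delta_t` overrides) can be sketched as below. This is an illustration only, not the PR's C++ code. The function names are hypothetical, the comma-separated `key=value` format and the case-sensitive `turbo` match are assumptions, and no default is invented for `sefi_delta_t` because the description does not state one.

```python
# Hypothetical sketch of the default/override rule; not the sd.cpp implementation.

def default_sefi_alpha(model_path):
    """'turbo' in the model path => 1.0, otherwise 0.3 (rule from the PR text).

    Assumption: the match is a case-sensitive substring test.
    """
    return 1.0 if "turbo" in model_path else 0.3


def parse_extra_sample_args(text):
    """Parse 'key=value' pairs. Assumption: pairs are comma-separated."""
    parsed = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        parsed[key.strip()] = float(value)
    return parsed


def resolve_sefi_params(model_path, extra_sample_args=""):
    """Apply explicit overrides on top of the path-based alpha default."""
    overrides = parse_extra_sample_args(extra_sample_args)
    return {
        "sefi_alpha": overrides.get("sefi_alpha", default_sefi_alpha(model_path)),
        # None means "no override given"; the built-in default is not stated in the PR.
        "sefi_delta_t": overrides.get("sefi_delta_t"),
    }
```

For example, `resolve_sefi_params("/path/to/sefi_1b_turbo.safetensors")` picks alpha 1.0 from the path, while passing `"sefi_alpha=0.5"` as the second argument takes precedence over that default.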
Additional Information
Example
    ./build/bin/sd-cli \
      --model /path/to/sefi_1b_turbo.safetensors \
      --llm /path/to/qwen3_vl_2b.safetensors \
      -p "a photograph of an orange tabby cat sitting on a couch" \
      --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
      --diffusion-fa --offload-to-cpu \
      -o out.png

Tested variants (all 7 from huggingface.co/SeFi-Image)
--max-vram 8 --stream-layers
5B variants use Qwen3-VL-4B-Instruct as the text encoder (1B/2B use 2B). 5B needs streaming on 12GB-class GPUs.
Checklist