feat: Add inference support for MiniT2I model. by KenForever1 · Pull Request #1683 · leejet/stable-diffusion.cpp

KenForever1 · 2026-06-19T11:02:08Z

Summary

Add inference support for MiniT2I in stable-diffusion.cpp.

This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.

Changes

Add MiniT2I model type detection and loading path.
Add MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
Add MiniT2I conditioner path using google/flan-t5-large.
Add MiniT2I sampling path with conditional/unconditional forward and CFG update.
Add backend support needed by MiniT2I graph execution.
Cache MiniT2I positional embeddings, text RoPE, and vision/joint RoPE in runner-level backend buffers.
Remove unused t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
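
For readers unfamiliar with the CFG update mentioned in the sampling item above, here is a minimal sketch of the standard classifier-free guidance formula. The helper name `apply_cfg` and the use of plain `std::vector<float>` buffers are assumptions for illustration; the PR's actual sampling code operates on ggml tensors and is not reproduced here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of stable-diffusion.cpp): combines the
// conditional and unconditional model outputs with classifier-free guidance.
//   out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i])
std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float cfg_scale) {
    if (cond.size() != uncond.size()) {
        throw std::invalid_argument("cond and uncond must have the same size");
    }
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    }
    return out;
}
```

With `cfg_scale` set to 1 the result equals the conditional output; larger values such as the `--cfg-scale 6` used in the test commands push the result further from the unconditional prediction.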

Commits

b9493fa Add MiniT2I inference support
8de8f95 Optimize MiniT2I position cache
dfb6ca2 Remove unused MiniT2I conditioning branch

Models Used

MiniT2I diffusion model:

Model: MiniT2I/minit2i-b-16
Weight: transformer/diffusion_pytorch_model.safetensors

Text encoder:

Model: google/flan-t5-large
Weight: model.safetensors

Test Commands

Mac Metal test:

cd stable-diffusion.cpp

./build/bin/sd-cli \
  --backend metal \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /private/tmp/minit2i_metal.png \
  --threads 8

CUDA with diffusion flash attention:

cd stable-diffusion.cpp

./build-cuda/bin/sd-cli \
  --backend cuda \
  --diffusion-fa \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /tmp/minit2i_cuda_diffusion_fa.png \
  --threads 8

Validation Notes

MiniT2I generation succeeds on CUDA and Metal.
Position/RoPE cache optimization preserves model batch semantics.
Removing the unused conditioning branch produced identical output in local validation.
CUDA --diffusion-fa works with MiniT2I and significantly reduces the diffusion forward time.

Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
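
The caching idea can be sketched roughly as follows. This is a generic illustration, not the PR's code: `RopeTable`, `RopeCache`, the interleaved cos/sin layout, and the default `theta` are all assumptions, and the real implementation keeps its tensors in a ggml backend buffer rather than a host-side vector.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container for a precomputed rotary table.
// Layout (an assumption): for each position, head_dim/2 (cos, sin) pairs.
struct RopeTable {
    std::vector<float> cos_sin;
    int seq_len  = 0;
    int head_dim = 0;
};

// Hypothetical runner-level cache: the table is rebuilt only when the
// shape or theta changes, so repeated denoise steps reuse the same data.
class RopeCache {
public:
    const RopeTable& get(int seq_len, int head_dim, float theta = 10000.0f) {
        if (seq_len <= 0 || head_dim <= 0 || head_dim % 2 != 0) {
            throw std::invalid_argument("seq_len > 0 and even head_dim > 0 required");
        }
        if (!valid_ || table_.seq_len != seq_len ||
            table_.head_dim != head_dim || theta_ != theta) {
            build(seq_len, head_dim, theta);
        }
        return table_;
    }

    // Number of times the table was actually computed.
    int build_count() const { return build_count_; }

private:
    void build(int seq_len, int head_dim, float theta) {
        const int half = head_dim / 2;
        table_.seq_len  = seq_len;
        table_.head_dim = head_dim;
        table_.cos_sin.assign(static_cast<std::size_t>(seq_len) * half * 2, 0.0f);
        for (int pos = 0; pos < seq_len; ++pos) {
            for (int i = 0; i < half; ++i) {
                // Standard RoPE frequency: theta^(-2i / head_dim).
                const float freq  = std::pow(theta, -2.0f * i / head_dim);
                const float angle = pos * freq;
                const std::size_t base = (static_cast<std::size_t>(pos) * half + i) * 2;
                table_.cos_sin[base]     = std::cos(angle);
                table_.cos_sin[base + 1] = std::sin(angle);
            }
        }
        theta_ = theta;
        valid_ = true;
        ++build_count_;
    }

    RopeTable table_;
    float theta_      = 0.0f;
    bool valid_       = false;
    int build_count_  = 0;
};
```

Calling `get` once per denoise step with an unchanged shape computes the table a single time, which is the saving the commit message describes.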

Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.
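
A before/after hash comparison of this kind might look like the sketch below. The PR does not say which hash was used; FNV-1a is swapped in here purely for illustration, and `fnv1a64` and `same_output` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over a byte buffer (illustrative choice of hash;
// the PR does not state which one was used in local validation).
std::uint64_t fnv1a64(const std::vector<std::uint8_t>& data) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (std::uint8_t byte : data) {
        h ^= byte;
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Hypothetical check: two output buffers (for example decoded image bytes)
// are treated as identical when their sizes and hashes match.
bool same_output(const std::vector<std::uint8_t>& before,
                 const std::vector<std::uint8_t>& after) {
    return before.size() == after.size() && fnv1a64(before) == fnv1a64(after);
}
```

Matching hashes with a fixed seed and `--rng cpu` give reasonable evidence that removing the branch did not change the generated image, although a byte-for-byte comparison would be the stricter test.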

leejet

Please follow the existing model docs and examples to add documentation and examples for MiniT2I.

leejet · 2026-06-26T13:57:22Z

+        }
+    };
+
+    inline std::string resolve_prefix(const String2TensorStorage& tensor_storage_map, const std::string& requested) {


Please use --diffusion-model to specify the model path instead of handling the prefix here.

leejet · 2026-06-26T13:59:03Z

            tensor_storage_map.find("model.diffusion_model.transformer_blocks.0.img_mlp.w1.weight") != tensor_storage_map.end()) {
            return VERSION_LENS;
        }
+        if ((tensor_storage_map.find("model.net.img_embedder.proj1.weight") != tensor_storage_map.end() &&


Checking for one representative weight name should be enough; there’s no need to check too many.
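
One possible reading of this suggestion is sketched below. A plain `std::map` stands in for `String2TensorStorage`, and `looks_like_minit2i` is a hypothetical helper; whether `model.net.img_embedder.proj1.weight` (the name from the diff above) is the best representative weight is an assumption.

```cpp
#include <map>
#include <string>

// Stand-in for the project's String2TensorStorage (assumption: it behaves
// like a map keyed by tensor name; the mapped type is irrelevant here).
using TensorNameMap = std::map<std::string, int>;

// Hypothetical detection helper: a single representative tensor name
// decides whether the checkpoint is treated as MiniT2I.
bool looks_like_minit2i(const TensorNameMap& tensor_storage_map) {
    return tensor_storage_map.find("model.net.img_embedder.proj1.weight") !=
           tensor_storage_map.end();
}
```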

leejet · 2026-06-26T14:12:00Z

        int64_t last_progress_us     = ggml_time_us();
+        SamplePreviewContext preview = prepare_sample_preview_context();
+
+        if (sd_version_is_minit2i(version)) {


The generic sampling flow should be used, rather than adding a separate sampling branch.

KenForever1 added 3 commits June 18, 2026 15:49

Add MiniT2I inference support

b9493fa

Optimize MiniT2I position cache

8de8f95


leejet requested changes Jun 26, 2026

KenForever1 wants to merge 3 commits into leejet:master from KenForever1:master
