
feat: Add inference support for MiniT2I model. #1683

Open
KenForever1 wants to merge 3 commits into leejet:master from KenForever1:master


@KenForever1


Summary

Add inference support for MiniT2I in stable-diffusion.cpp.

This PR adds a MiniT2I diffusion runner, T5/flan-t5 text conditioning integration, model detection/loading support, and MiniT2I-specific sampling flow. It also caches step-invariant positional embeddings/RoPE tensors and removes an unused conditioning branch after validating output consistency.

Changes

  • Add MiniT2I model type detection and loading path.
  • Add MiniT2I::MiniT2IRunner implementation for MMJiT-style diffusion inference.
  • Add MiniT2I conditioner path using google/flan-t5-large.
  • Add MiniT2I sampling path with conditional/unconditional forward and CFG update.
  • Add backend support needed by MiniT2I graph execution.
  • Cache MiniT2I positional embeddings, text RoPE, and vision/joint RoPE in runner-level backend buffers.
  • Remove unused t_vec + pooled_text conditioning branch that is not consumed by the current MiniT2I graph.
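For readers unfamiliar with the CFG update mentioned in the sampling bullet, here is a minimal sketch of the standard classifier-free guidance combination of a conditional and an unconditional forward pass. `apply_cfg` and its flat `std::vector<float>` inputs are hypothetical and are not taken from this PR; the actual code presumably operates on ggml tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the PR: combines the conditional and
// unconditional model outputs with the standard CFG formula
//   out = uncond + cfg_scale * (cond - uncond)
std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float cfg_scale) {
    // Both forward passes must produce outputs of the same shape.
    assert(cond.size() == uncond.size());
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    }
    return out;
}
```

With `cfg_scale` equal to 1 the result is just the conditional output; larger values, such as the `--cfg-scale 6` used in the test commands, push the result further from the unconditional prediction.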

Commits

  • b9493fa Add MiniT2I inference support
  • 8de8f95 Optimize MiniT2I position cache
  • dfb6ca2 Remove unused MiniT2I conditioning branch

Models Used

MiniT2I diffusion model:

  • Model: MiniT2I/minit2i-b-16
  • Weight: transformer/diffusion_pytorch_model.safetensors

Text encoder:

  • Model: google/flan-t5-large
  • Weight: model.safetensors

Test Commands

Mac Metal test:

cd stable-diffusion.cpp

./build/bin/sd-cli \
  --backend metal \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /private/tmp/minit2i_metal.png \
  --threads 8

CUDA with diffusion flash attention:

cd stable-diffusion.cpp

./build-cuda/bin/sd-cli \
  --backend cuda \
  --diffusion-fa \
  --model MiniT2I/MiniT2I/minit2i-b-16/transformer/diffusion_pytorch_model.safetensors \
  --t5xxl google/flan-t5-large/model.safetensors \
  --prompt "a cat" \
  --steps 100 \
  --cfg-scale 6 \
  --width 512 \
  --height 512 \
  --seed 42 \
  --sampling-method euler \
  --rng cpu \
  --output /tmp/minit2i_cuda_diffusion_fa.png \
  --threads 8

Validation Notes

  • MiniT2I generation succeeds on CUDA and Metal.
  • Position/RoPE cache optimization preserves model batch semantics.
  • Removing the unused conditioning branch produced identical output in local validation.
  • CUDA --diffusion-fa works with MiniT2I and significantly reduces the diffusion model's forward-pass time.

Cache MiniT2I positional embeddings and text/vision RoPE tensors in a runner-level backend buffer. This avoids regenerating and uploading the same step-invariant constants for every denoise graph while preserving model batch semantics.
Drop the unused timestep and pooled-text vec path from MiniT2I graph construction. The Python reference currently passes this vec through unused block/final-layer parameters, and local validation produced identical output hashes before and after the cleanup.
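The position/RoPE caching described above can be illustrated with a small sketch. It assumes the cached data depends only on shape parameters and therefore stays constant across denoise steps. `PositionCache`, its `Builder` callback, and the `tokens`/`dim` key are hypothetical names, and a CPU-side `std::vector<float>` stands in for the runner-level backend buffer used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cache for a step-invariant tensor, keyed by the shape
// parameters it depends on. None of these names come from the PR.
class PositionCache {
public:
    using Builder = std::function<std::vector<float>(int, int)>;

    explicit PositionCache(Builder builder) : builder_(std::move(builder)) {}

    // Returns the cached data when the shape is unchanged; rebuilds otherwise.
    const std::vector<float>& get(int tokens, int dim) {
        assert(tokens > 0 && dim > 0);
        if (!valid_ || tokens != tokens_ || dim != dim_) {
            data_   = builder_(tokens, dim);
            tokens_ = tokens;
            dim_    = dim;
            valid_  = true;
            ++build_count_;
        }
        return data_;
    }

    // Number of times the builder actually ran (useful for checking reuse).
    int build_count() const { return build_count_; }

private:
    Builder builder_;
    std::vector<float> data_;
    int tokens_      = 0;
    int dim_         = 0;
    bool valid_      = false;
    int build_count_ = 0;
};
```

In this sketch, calling `get` once per denoise step with the same shape runs the builder only on the first step, which is the saving the commit message describes; a change of resolution invalidates the entry and triggers a rebuild.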

@leejet left a comment

Owner


Please follow the existing model docs and examples to add documentation and examples for MiniT2I.

}
};

inline std::string resolve_prefix(const String2TensorStorage& tensor_storage_map, const std::string& requested) {

Owner


Please use --diffusion-model to specify the model path instead of handling the prefix here.

Comment thread src/model_loader.cpp
tensor_storage_map.find("model.diffusion_model.transformer_blocks.0.img_mlp.w1.weight") != tensor_storage_map.end()) {
return VERSION_LENS;
}
if ((tensor_storage_map.find("model.net.img_embedder.proj1.weight") != tensor_storage_map.end() &&

Owner


Checking for one representative weight name should be enough; there’s no need to check too many.
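A minimal sketch of what single-marker detection could look like. `TensorMap` is a stand-in for the loader's `String2TensorStorage`, and the marker string, `ModelKind`, and `detect_model` are hypothetical names chosen for illustration; the weight name MiniT2I would actually be detected by is not stated in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the loader's tensor map; the real String2TensorStorage
// type is defined by the project and stores tensor metadata, not ints.
using TensorMap = std::map<std::string, int>;

enum class ModelKind { Unknown, MiniT2I };

// Hypothetical marker name, used only for illustration.
const std::string kMiniT2IMarker = "example.minit2i.marker.weight";

// True if the tensor map contains the given weight name.
bool has_tensor(const TensorMap& tensors, const std::string& name) {
    assert(!name.empty());
    return tensors.find(name) != tensors.end();
}

// One representative weight name decides the model kind.
ModelKind detect_model(const TensorMap& tensors) {
    if (has_tensor(tensors, kMiniT2IMarker)) {
        return ModelKind::MiniT2I;
    }
    return ModelKind::Unknown;
}
```

A single lookup keeps the detection branch short and easy to audit, provided the chosen name is unique to the architecture.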

Comment thread src/stable-diffusion.cpp
int64_t last_progress_us = ggml_time_us();
SamplePreviewContext preview = prepare_sample_preview_context();

if (sd_version_is_minit2i(version)) {

Owner


The generic sampling flow should be used, rather than adding a separate sampling branch.
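One possible reading of this suggestion, sketched under stated assumptions: a generic sampler takes the model as a denoise callback, so model-specific behavior lives in the callback rather than in a separate sampling branch. The code below is a k-diffusion-style Euler loop; `sample_euler`, `Latent`, and `DenoiseFn` are hypothetical names, and this is not claimed to match the project's sampler or the schedule MiniT2I uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Latent = std::vector<float>;

// Hypothetical callback: returns the model's denoised estimate of x at
// noise level sigma. Model-specific logic would live behind this interface.
using DenoiseFn = std::function<Latent(const Latent&, float)>;

// Generic Euler sampler: knows nothing about the model behind `denoise`.
Latent sample_euler(const DenoiseFn& denoise, Latent x,
                    const std::vector<float>& sigmas) {
    assert(sigmas.size() >= 2);
    for (std::size_t i = 0; i + 1 < sigmas.size(); ++i) {
        assert(sigmas[i] > 0.0f);
        const Latent denoised = denoise(x, sigmas[i]);
        assert(denoised.size() == x.size());
        const float dt = sigmas[i + 1] - sigmas[i];
        for (std::size_t j = 0; j < x.size(); ++j) {
            // Derivative estimate, then one Euler step toward the next sigma.
            const float d = (x[j] - denoised[j]) / sigmas[i];
            x[j] += d * dt;
        }
    }
    return x;
}
```

Under this design, supporting a new model means supplying a different callback, and the loop itself stays unchanged.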
