PSA: Use the official LTX 2.3 workflow, not the ComfyUI included one. It's significantly better.

Most of the time I rely on the default ComfyUI workflows. They produce results just as good as 90% of the overly-complicated workflows I see floating around online. But I was fighting with the default Comfy LTX 2.3 template for a while and just not getting anything good. Then I saw someone mention the official LTX workflows and figured I'd give them a try.

Yeah, huge difference. Easily makes LTX blow past WAN 2.2 into SOTA territory for me. So something's up with the Comfy default workflow.

If you're having issues with weird LTX 2 or LTX 2.3 generations, use the official workflow instead:

https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json

This workflow runs the distilled and non-distilled models at the same time. I find they trade blows pretty evenly for what I'm looking for, so I just left it generating both.

https://redd.it/1rz1u3j
@rStableDiffusion
ComfyUI Nodes for Filmmaking (LTX 2.3 Shot Sequencing, Keyframing, First Frame/Last Frame)

https://redd.it/1rz355d
@rStableDiffusion
Nvidia SANA Video 2B



https://www.youtube.com/watch?list=TLGG-iNIhzqJ0OgyMDAzMjAyNg&v=7eNfDzA4yBs

Efficient-Large-Model/SANA-Video_2B_720p · Hugging Face

SANA-Video is a small, ultra-efficient diffusion model designed for rapid generation of high-quality, minute-long videos at resolutions up to 720×1280.

Key innovations and efficiency drivers include:

(1) Linear DiT: Uses linear attention as the core operation, making it significantly more efficient than vanilla attention when processing the massive number of tokens required for video generation.

(2) Constant-Memory KV Cache for Block Linear Attention: Implements a block-wise autoregressive approach that uses the cumulative properties of linear attention to maintain global context at a fixed memory cost, eliminating the traditional KV cache bottleneck and enabling efficient, minute-long video synthesis.
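SANA-Video's actual kernel and feature map aren't given in this summary, but the reason linear attention admits a constant-memory "KV cache" is easy to show. In the recurrent form, all past keys and values collapse into a fixed-size running state, so memory no longer grows with the token stream. A minimal numpy sketch (the `phi` feature map here is an arbitrary stand-in, not the one the model uses):

```python
import numpy as np

def linear_attention_stream(qs, ks, vs, eps=1e-6):
    """Causal linear attention computed recurrently.

    Instead of caching every past key/value (a KV cache that grows
    with sequence length), we keep a fixed-size state: S accumulates
    outer(phi(k), v) and z accumulates phi(k). Memory stays O(d*d)
    no matter how long the video token stream gets.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-3  # toy positive feature map
    d = qs.shape[-1]
    S = np.zeros((d, d))  # running sum of outer(phi(k), v)
    z = np.zeros(d)       # running sum of phi(k), for normalization
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        S += np.outer(fk, v)  # fold this step's key/value into the state
        z += fk
        fq = phi(q)
        outs.append(fq @ S / (fq @ z + eps))  # attend using only the state
    return np.stack(outs)
```

The recurrent result is mathematically identical to the quadratic causal formulation; the cumulative state is exactly the "constant-memory KV cache" property the post describes.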

SANA-Video achieves exceptional efficiency and cost savings: its training cost is only 1% of MovieGen's (12 days on 64 H100 GPUs). Compared to modern state-of-the-art small diffusion models (e.g., Wan 2.1 and SkyReel-V2), SANA-Video maintains competitive performance while being 16× faster in measured latency. SANA-Video is deployable on RTX 5090 GPUs, accelerating the inference speed for a 5-second 720p video from 71s down to 29s (2.4× speedup), setting a new standard for low-cost, high-quality video generation.


More comparison samples here: SANA Video

https://redd.it/1rz153l
@rStableDiffusion
Training a LoRA with AI Toolkit (about resolution)
https://redd.it/1rz5ifb
@rStableDiffusion
Have you tried Fish Audio S2Pro?

What is your experience with it? Do you think it can compete with ElevenLabs?
I have tried it and it is 80% as good as ElevenLabs.

https://redd.it/1rz7wjh
@rStableDiffusion
I built a tool that creates LoRAs from images without any training — no gradient descent, no loss curves, no hyperparameters. Dataset in, LoRA out, 1-5 minutes.

I've been building an AI video production pipeline on 4×RTX 4090s and got frustrated with how long LoRA training takes. So I built NeuralGraft, which includes a new feature called LoRA Forge that constructs LoRAs from a folder of images using pure linear algebra — no training loop at all.

**How it works in 30 seconds:**

You give it a folder of images (10-100) and a base model checkpoint. It:

1. Extracts a "concept signature" from your images (81 visual features: color palette, texture, spatial frequency, contrast, structure)

2. Projects your images through each transformer block's weights

3. Discovers which activation directions encode your concept via closed-form regression

4. Constructs standard LoRA matrices (B @ A) from those directions via SVD

5. Outputs a standard .safetensors LoRA you can use in ComfyUI, diffusers, A1111 — anywhere
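The repo's actual implementation isn't shown in the post, but steps 3–4 can be sketched in a few lines of numpy. This is a hypothetical toy (the function name `forge_lora` and the activation matrices `X`/`Y` are my own stand-ins, not NeuralGraft's API): a closed-form ridge regression finds the weight delta that maps base activations onto concept-shifted activations, and an SVD truncates it into the standard low-rank pair:

```python
import numpy as np

def forge_lora(X, Y, rank, lam=1e-3):
    """Toy sketch of steps 3-4: closed-form regression, then SVD split.

    X: (n, d_in)  base-model activations for the concept images
    Y: (n, d_out) activations shifted toward the concept
    Returns lora_down (d_in, rank) and lora_up (rank, d_out)
    such that dW ~= lora_down @ lora_up.
    """
    d_in = X.shape[1]
    # Step 3: ridge regression in closed form -- no gradient descent.
    dW = np.linalg.solve(X.T @ X + lam * np.eye(d_in), X.T @ Y)
    # Step 4: SVD, keep the top-`rank` directions, split sqrt(s) evenly.
    U, s, Vt = np.linalg.svd(dW, full_matrices=False)
    root_s = np.sqrt(s[:rank])
    lora_down = U[:, :rank] * root_s       # (d_in, rank)
    lora_up = root_s[:, None] * Vt[:rank]  # (rank, d_out)
    return lora_down, lora_up
```

If the underlying concept direction really is low-rank and roughly linear in activation space, this recovers it exactly, which also explains the stated failure modes: highly non-linear concepts like face identity don't fit a single linear map.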

**CLI is one command:**

```
neuralgraft forge \
  --base model.safetensors \
  --images ./my_cinematic_shots/ \
  --output cinematic-lora.safetensors \
  --rank 16 \
  --trigger-word "cinematic"
```

**What it's actually good for:**

- Art style transfer (give it 20 frames from a film → get its visual style as a LoRA)

- Color grading (reference color-graded images → color grading LoRA)

- Texture/material quality (skin texture, fabric, surfaces)

- Lighting mood (warm sunset, cold blue, neon)

- Camera characteristics (specific lens look, DoF style)

**What it honestly struggles with (not trying to oversell):**

- Specific face identity — faces are highly non-linear; use DreamBooth for that

- Very fine character details (specific clothing patterns, logos)

- Concepts the base model has never seen at all

**The math (for the curious):**

LoRA training discovers weight modification directions via gradient descent over thousands of steps. NeuralGraft discovers the same directions via closed-form linear regression on SVD-decomposed weights. Same result, different path — seconds of math instead of hours of training.

LoRA training: ΔW = B @ A (rank r, learned over thousands of gradient steps)

NeuralGraft: ΔW = U @ diag(d) @ V^T (rank k, computed in one closed-form SVD)
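These two parameterizations describe the same object: any truncated-SVD update U diag(d) V^T can be rewritten as a conventional B @ A pair (and vice versa) just by absorbing the singular values into one factor. A quick numpy check (shapes here are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 24, 16, 4

# A weight delta truncated to rank k via one SVD (the NeuralGraft form):
dW = rng.normal(size=(d_out, d_in))
U, d, Vt = np.linalg.svd(dW, full_matrices=False)
dW_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k]  # ΔW = U diag(d) V^T

# The identical update in the conventional trained-LoRA form B @ A:
B = U[:, :k] * d[:k]  # (d_out, k) -- singular values absorbed into B
A = Vt[:k]            # (k, d_in)
```

Whether the SVD-computed directions match what gradient descent would have found is the empirical claim of the post; the algebraic equivalence of the two forms is what makes the output a drop-in standard LoRA.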

**Other things NeuralGraft can do:**

- Permanently bake LoRAs into model weights (zero runtime overhead)

- Graft capabilities from one model architecture into another (e.g., WAN 2.2 motion quality → LTX 2.3)

- Spectral amplification (boost LoRA-improved directions in base weights)
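Baking a LoRA into the base weights is the standard merge operation (not specific to NeuralGraft): add the low-rank product into W once, so inference pays no extra matmul. A minimal sketch with assumed shapes:

```python
import numpy as np

def bake_lora(W, lora_up, lora_down, alpha=1.0):
    """Merge a LoRA into the base weight matrix in place of runtime adapters.

    W:         (d_out, d_in) base weight
    lora_up:   (d_out, r)    up-projection
    lora_down: (r, d_in)     down-projection
    alpha:     LoRA strength multiplier
    """
    return W + alpha * (lora_up @ lora_down)  # one-time rank-r update
```

After baking, `x @ W_merged.T` produces the same result as running the base layer plus the LoRA branch at strength `alpha`, with zero runtime overhead.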

Works with any DiT-based model: LTX Video, FLUX, SD3, HunyuanVideo, WAN, PixArt.

**Repo:** https://github.com/alokickstudios-coder/neuralgraft

**License:** Apache 2.0 (fully open source)

Built this primarily for video generation (LTX 2.3) but it works for image models too. Happy to answer questions about the approach or limitations.

https://redd.it/1rza04z
@rStableDiffusion