PSA: Use the official LTX 2.3 workflow, not the ComfyUI included one. It's significantly better.

Most of the time I rely on the default ComfyUI workflows. They produce results just as good as 90% of the overly-complicated workflows I see floating around online. But I was fighting with the default Comfy LTX 2.3 template for a while and just not getting anything good. Then I saw someone mention the official LTX workflows and figured I'd give them a try.

Yeah, huge difference. Easily makes LTX blow past WAN 2.2 into SOTA territory for me. So something's up with the Comfy default workflow.

If you're having issues with weird LTX 2 or LTX 2.3 generations, use the official workflow instead:

https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json

This workflow runs the distilled and non-distilled models at the same time. I find they trade blows pretty evenly for what I'm looking for, so I just left it generating both.

https://redd.it/1rz1u3j
@rStableDiffusion
ComfyUI Nodes for Filmmaking (LTX 2.3 Shot Sequencing, Keyframing, First Frame/Last Frame)

https://redd.it/1rz355d
@rStableDiffusion
Nvidia SANA Video 2B



https://www.youtube.com/watch?list=TLGG-iNIhzqJ0OgyMDAzMjAyNg&v=7eNfDzA4yBs

Efficient-Large-Model/SANA-Video_2B_720p · Hugging Face

SANA-Video is a small, ultra-efficient diffusion model designed for rapid generation of high-quality, minute-long videos at resolutions up to 720×1280.

Key innovations and efficiency drivers include:

(1) Linear DiT: Uses linear attention as the core operation, making it significantly more efficient than vanilla attention when processing the massive number of tokens required for video generation.

(2) Constant-Memory KV Cache for Block Linear Attention: Implements a block-wise autoregressive approach that uses the cumulative properties of linear attention to maintain global context at a fixed memory cost, eliminating the traditional KV cache bottleneck and enabling efficient, minute-long video synthesis.
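SANA-Video's actual kernel and feature map aren't given in this summary, but the reason linear attention admits a constant-memory "KV cache" is easy to show. In the recurrent form, all past keys and values collapse into a fixed-size running state, so memory no longer grows with the token stream. A minimal numpy sketch (the `phi` feature map here is an arbitrary stand-in, not the one the model uses):

```python
import numpy as np

def linear_attention_stream(qs, ks, vs, eps=1e-6):
    """Causal linear attention computed recurrently.

    Instead of caching every past key/value (a KV cache that grows
    with sequence length), we keep a fixed-size state: S accumulates
    outer(phi(k), v) and z accumulates phi(k). Memory stays O(d*d)
    no matter how long the video token stream gets.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-3  # toy positive feature map
    d = qs.shape[-1]
    S = np.zeros((d, d))  # running sum of outer(phi(k), v)
    z = np.zeros(d)       # running sum of phi(k), for normalization
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        S += np.outer(fk, v)  # fold this step's key/value into the state
        z += fk
        fq = phi(q)
        outs.append(fq @ S / (fq @ z + eps))  # attend using only the state
    return np.stack(outs)
```

The recurrent result is mathematically identical to the quadratic causal formulation; the cumulative state is exactly the "constant-memory KV cache" property the post describes.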

SANA-Video achieves exceptional efficiency and cost savings: its training cost is only 1% of MovieGen's (12 days on 64 H100 GPUs). Compared to modern state-of-the-art small diffusion models (e.g., Wan 2.1 and SkyReel-V2), SANA-Video maintains competitive performance while being 16× faster in measured latency. SANA-Video is deployable on RTX 5090 GPUs, accelerating the inference speed for a 5-second 720p video from 71s down to 29s (2.4× speedup), setting a new standard for low-cost, high-quality video generation.


More comparison samples here: SANA Video

https://redd.it/1rz153l
@rStableDiffusion
Training a LoRA with AI Toolkit (about resolution)
https://redd.it/1rz5ifb
@rStableDiffusion
Have you tried Fish Audio S2Pro?

What is your experience with it? Do you think it can compete with ElevenLabs?
I have tried it and it is 80% as good as ElevenLabs.

https://redd.it/1rz7wjh
@rStableDiffusion
I built a tool that creates LoRAs from images without any training — no gradient descent, no loss curves, no hyperparameters. Dataset in, LoRA out, 1-5 minutes.

I've been building an AI video production pipeline on 4×RTX 4090s and got frustrated with how long LoRA training takes. So I built NeuralGraft, which includes a new feature called LoRA Forge that constructs LoRAs from a folder of images using pure linear algebra — no training loop at all.

**How it works in 30 seconds:**

You give it a folder of images (10-100) and a base model checkpoint. It:

1. Extracts a "concept signature" from your images (81 visual features: color palette, texture, spatial frequency, contrast, structure)

2. Projects your images through each transformer block's weights

3. Discovers which activation directions encode your concept via closed-form regression

4. Constructs standard LoRA matrices (B @ A) from those directions via SVD

5. Outputs a standard .safetensors LoRA you can use in ComfyUI, diffusers, A1111 — anywhere
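The repo's actual implementation isn't shown in the post, but steps 3–4 can be sketched in a few lines of numpy. This is a hypothetical toy (the function name `forge_lora` and the activation matrices `X`/`Y` are my own stand-ins, not NeuralGraft's API): a closed-form ridge regression finds the weight delta that maps base activations onto concept-shifted activations, and an SVD truncates it into the standard low-rank pair:

```python
import numpy as np

def forge_lora(X, Y, rank, lam=1e-3):
    """Toy sketch of steps 3-4: closed-form regression, then SVD split.

    X: (n, d_in)  base-model activations for the concept images
    Y: (n, d_out) activations shifted toward the concept
    Returns lora_down (d_in, rank) and lora_up (rank, d_out)
    such that dW ~= lora_down @ lora_up.
    """
    d_in = X.shape[1]
    # Step 3: ridge regression in closed form -- no gradient descent.
    dW = np.linalg.solve(X.T @ X + lam * np.eye(d_in), X.T @ Y)
    # Step 4: SVD, keep the top-`rank` directions, split sqrt(s) evenly.
    U, s, Vt = np.linalg.svd(dW, full_matrices=False)
    root_s = np.sqrt(s[:rank])
    lora_down = U[:, :rank] * root_s       # (d_in, rank)
    lora_up = root_s[:, None] * Vt[:rank]  # (rank, d_out)
    return lora_down, lora_up
```

If the underlying concept direction really is low-rank and roughly linear in activation space, this recovers it exactly, which also explains the stated failure modes: highly non-linear concepts like face identity don't fit a single linear map.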

**CLI is one command:**

```
neuralgraft forge \
  --base model.safetensors \
  --images ./my_cinematic_shots/ \
  --output cinematic-lora.safetensors \
  --rank 16 \
  --trigger-word "cinematic"
```

**What it's actually good for:**

- Art style transfer (give it 20 frames from a film → get its visual style as a LoRA)

- Color grading (reference color-graded images → color grading LoRA)

- Texture/material quality (skin texture, fabric, surfaces)

- Lighting mood (warm sunset, cold blue, neon)

- Camera characteristics (specific lens look, DoF style)

**What it honestly struggles with (not trying to oversell):**

- Specific face identity — faces are highly non-linear; use DreamBooth for that

- Very fine character details (specific clothing patterns, logos)

- Concepts the base model has never seen at all

**The math (for the curious):**

LoRA training discovers weight modification directions via gradient descent over thousands of steps. NeuralGraft discovers the same directions via closed-form linear regression on SVD-decomposed weights. Same result, different path — seconds of math instead of hours of training.

LoRA training: ΔW = B @ A (rank r, learned over thousands of gradient steps)

NeuralGraft: ΔW = U @ diag(d) @ V^T (rank k, computed in one closed-form SVD)
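These two parameterizations describe the same object: any truncated-SVD update U diag(d) V^T can be rewritten as a conventional B @ A pair (and vice versa) just by absorbing the singular values into one factor. A quick numpy check (shapes here are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 24, 16, 4

# A weight delta truncated to rank k via one SVD (the NeuralGraft form):
dW = rng.normal(size=(d_out, d_in))
U, d, Vt = np.linalg.svd(dW, full_matrices=False)
dW_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k]  # ΔW = U diag(d) V^T

# The identical update in the conventional trained-LoRA form B @ A:
B = U[:, :k] * d[:k]  # (d_out, k) -- singular values absorbed into B
A = Vt[:k]            # (k, d_in)
```

Whether the SVD-computed directions match what gradient descent would have found is the empirical claim of the post; the algebraic equivalence of the two forms is what makes the output a drop-in standard LoRA.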

**Other things NeuralGraft can do:**

- Permanently bake LoRAs into model weights (zero runtime overhead)

- Graft capabilities from one model architecture into another (e.g., WAN 2.2 motion quality → LTX 2.3)

- Spectral amplification (boost LoRA-improved directions in base weights)
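Baking a LoRA into the base weights is the standard merge operation (not specific to NeuralGraft): add the low-rank product into W once, so inference pays no extra matmul. A minimal sketch with assumed shapes:

```python
import numpy as np

def bake_lora(W, lora_up, lora_down, alpha=1.0):
    """Merge a LoRA into the base weight matrix in place of runtime adapters.

    W:         (d_out, d_in) base weight
    lora_up:   (d_out, r)    up-projection
    lora_down: (r, d_in)     down-projection
    alpha:     LoRA strength multiplier
    """
    return W + alpha * (lora_up @ lora_down)  # one-time rank-r update
```

After baking, `x @ W_merged.T` produces the same result as running the base layer plus the LoRA branch at strength `alpha`, with zero runtime overhead.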

Works with any DiT-based model: LTX Video, FLUX, SD3, HunyuanVideo, WAN, PixArt.

**Repo:** https://github.com/alokickstudios-coder/neuralgraft

**License:** Apache 2.0 (fully open source)

Built this primarily for video generation (LTX 2.3) but it works for image models too. Happy to answer questions about the approach or limitations.

https://redd.it/1rza04z
@rStableDiffusion