Have you tried fish audio S2Pro?
What is your experience with it? Do you think it can compete with ElevenLabs?
I have tried it and it is 80% as good as ElevenLabs.
https://redd.it/1rz7wjh
@rStableDiffusion
I built a tool that creates LoRAs from images without any training — no gradient descent, no loss curves, no hyperparameters. Dataset in, LoRA out, 1-5 minutes.
I've been building an AI video production pipeline on 4×RTX 4090s and got frustrated with how long LoRA training takes. So I built NeuralGraft, which includes a new feature called LoRA Forge that constructs LoRAs from a folder of images using pure linear algebra — no training loop at all.
**How it works in 30 seconds:**
You give it a folder of images (10-100) and a base model checkpoint. It:
1. Extracts a "concept signature" from your images (81 visual features: color palette, texture, spatial frequency, contrast, structure)
2. Projects your images through each transformer block's weights
3. Discovers which activation directions encode your concept via closed-form regression
4. Constructs standard LoRA matrices (B @ A) from those directions via SVD
5. Outputs a standard .safetensors LoRA you can use in ComfyUI, diffusers, A1111 — anywhere
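To make steps 2-4 concrete, here is a rough per-layer sketch of the general idea (closed-form least-squares regression followed by a truncated SVD). The function and variable names are hypothetical illustrations, not NeuralGraft's actual API:

import torch

def forge_lora_for_layer(W, concept_acts, neutral_acts, rank=16):
    """Hypothetical closed-form LoRA construction for one linear layer.
    W            : (out_dim, in_dim) base weight of the block
    concept_acts : (n, in_dim) activations produced by the concept images
    neutral_acts : (n, in_dim) activations produced by reference images
    """
    # Step 2: project the image activations through the block's weight.
    concept_out = concept_acts @ W.T                    # (n, out_dim)
    neutral_out = neutral_acts @ W.T                    # (n, out_dim)

    # Step 3: closed-form regression. Solve for a weight delta dW such that
    # concept_acts @ dW.T reproduces the output shift separating the concept
    # from the baseline. One least-squares solve, no gradient descent.
    target = concept_out - neutral_out                  # (n, out_dim)
    dW = torch.linalg.lstsq(concept_acts, target).solution.T   # (out_dim, in_dim)

    # Step 4: compress dW to rank-r with a truncated SVD and split it into
    # the standard LoRA pair (B up-projection, A down-projection).
    U, S, Vh = torch.linalg.svd(dW, full_matrices=False)
    B = U[:, :rank] * S[:rank]                          # (out_dim, rank)
    A = Vh[:rank, :]                                    # (rank, in_dim)
    return B, A                                         # dW ≈ B @ A

Looping that over every attention and MLP projection and saving the B/A pairs with safetensors would yield the kind of drop-in .safetensors LoRA that step 5 describes.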
**CLI is one command:**
neuralgraft forge \
  --base model.safetensors \
  --images ./my_cinematic_shots/ \
  --output cinematic-lora.safetensors \
  --rank 16 \
  --trigger-word "cinematic"
**What it's actually good for:**
- Art style transfer (give it 20 frames from a film → get its visual style as a LoRA)
- Color grading (reference color-graded images → color grading LoRA)
- Texture/material quality (skin texture, fabric, surfaces)
- Lighting mood (warm sunset, cold blue, neon)
- Camera characteristics (specific lens look, DoF style)
**What it honestly struggles with (not trying to oversell):**
- Specific face identity (faces are highly non-linear; use DreamBooth for that)
- Very fine character details (specific clothing patterns, logos)
- Concepts the base model has never seen at all
**The math (for the curious):**
LoRA training discovers weight modification directions via gradient descent over thousands of steps. NeuralGraft discovers the same directions via closed-form linear regression on SVD-decomposed weights. Same result, different path — seconds of math instead of hours of training.
LoRA training: ΔW = B @ A (rank-r, learned over thousands of steps)
NeuralGraft: ΔW = U @ diag(d) @ V^T (rank-k, computed in one SVD)
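A tiny self-contained illustration of the "same result, different path" claim (generic math, not code from the repo): on a linear objective, gradient descent converges to the same update that one closed-form least-squares solve returns immediately.

import torch

torch.manual_seed(0)
X = torch.randn(64, 32)                   # activations (n, in_dim)
Y = torch.randn(64, 16)                   # desired output change (n, out_dim)

# Path 1: closed-form least-squares solve (one shot, no training loop)
W_closed = torch.linalg.lstsq(X, Y).solution              # (in_dim, out_dim)

# Path 2: gradient descent on the same objective (what a training loop does)
W_gd = torch.zeros(32, 16, requires_grad=True)
opt = torch.optim.SGD([W_gd], lr=0.5)
for _ in range(5000):
    opt.zero_grad()
    ((X @ W_gd - Y) ** 2).mean().backward()
    opt.step()

print(torch.allclose(W_closed, W_gd, atol=1e-2))           # should print True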
**Other things NeuralGraft can do:**
- Permanently bake LoRAs into model weights (zero runtime overhead; see the merge sketch below)
- Graft capabilities from one model architecture into another (e.g., WAN 2.2 motion quality → LTX 2.3)
- Spectral amplification (boost LoRA-improved directions in base weights)
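For reference, baking a LoRA into the base checkpoint is just a weight merge. A generic sketch, assuming the common lora_up/lora_down key naming (an assumption, not NeuralGraft's actual merge code):

from safetensors.torch import load_file, save_file

base = load_file("model.safetensors")
lora = load_file("cinematic-lora.safetensors")
scale = 0.8                                # LoRA strength to bake in permanently

# Fold scale * (B @ A) into each matching base weight. Assumes the widespread
# "<name>.lora_up.weight" / "<name>.lora_down.weight" key convention.
for key in list(lora.keys()):
    if key.endswith(".lora_up.weight"):
        name = key[: -len(".lora_up.weight")]
        B = lora[key].float()                              # (out_dim, rank)
        A = lora[name + ".lora_down.weight"].float()       # (rank, in_dim)
        base_key = name + ".weight"
        if base_key in base:
            W = base[base_key]
            base[base_key] = (W.float() + scale * (B @ A)).to(W.dtype)

save_file(base, "model-baked.safetensors")

The merged file behaves like the original checkpoint with the LoRA permanently applied, so no LoRA loader is needed at inference time.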
Works with any DiT-based model: LTX Video, FLUX, SD3, HunyuanVideo, WAN, PixArt.
**Repo:** https://github.com/alokickstudios-coder/neuralgraft
**License:** Apache 2.0 (fully open source)
Built this primarily for video generation (LTX 2.3) but it works for image models too. Happy to answer questions about the approach or limitations.
https://redd.it/1rza04z
@rStableDiffusion
SAMA 14B (Factorized Semantic Anchoring and Motion Alignment) - Video Editing Model based on Wan 2.1 (Apache 2.0)
https://github.com/Cynthiazxy123/SAMA
https://huggingface.co/syxbb/SAMA-14B
https://redd.it/1rzauw4
@rStableDiffusion
GPU Temps for Local Gen
What sort of temps are acceptable for local image generation? I generate images at 832x1216 and upscale by 1.5x, and I'm seeing hot-spot temps on my RTX 4080 peak at 103°C.
Is it time to replace the thermal paste on my GPU, or are these temps expected? I'm worried they will cause damage and turn into a costly replacement.
https://redd.it/1rz9je1
@rStableDiffusion
What's the best pipeline to uniformize and upscale a large collection of old book cover scans?
https://redd.it/1rzbpeg
@rStableDiffusion
[Video]
How is this done? Are we going to live in a world of catfishing?
https://redd.it/1rzicfw
@rStableDiffusion
WTF is WanToDance? Are we getting a new toy soon?
https://github.com/modelscope/DiffSynth-Studio/pull/1361
https://redd.it/1rzgm6e
@rStableDiffusion
LTX 2.2 was nice but just not good enough. But I really think LTX 2.3 has finally gotten me to where I've basically stopped using WAN 2.2
For a long time, I considered LTX to be the worst of all the models. I've tried each release they've come out with. Some of the earlier ones were downright horrible, especially for their time.
But my God have they turned things around.
LTX 2.3 is by no means better than WAN 2.2 in every single way. But one thing that (in my humble opinion) can be said about LTX 2.3 is that, when you consider all factors, it is now overall the best video model that can be locally run, and it has reduced the need to fall back on WAN in a way that LTX 2.2 could not. Especially since I2V in 2.2 was an absolute nightmare to work with.
Things WAN 2.2 still has over LTX:
* Slightly better prompt comprehension and prompt following (whereas it was WAY better than LTX 2.2)
* Moderately better picture/video quality
* A larger LoRA ecosystem, due to its age
On the flipside: having used LTX 2.3 a great deal since its release, it's painful to go back to WAN now.
* WAN ideally tops out at 5 seconds before it starts to break apart.
* WAN is dramatically slower than distilled LTX 2.3, or LTX 2.3 with the distill LoRA.
* WAN cannot do sound on its own (14B version).
* WAN is therefore more useful now as a base building block that passes its output along to something else.
When you're making 15-second videos with highly convincing sound in about a minute, it really starts to highlight how far WAN is falling behind, especially since 2.5 and 2.6 will likely never be local.
TL;DR:
Generating T2V might still hold some advantage for WAN, but for I2V it's basically obsolete now compared to LTX 2.3, and even on T2V, LTX 2.3 has made many gains. Since LTX is all we're likely to get, as open-source releases seem to be drying up, it's good that the company behind it has gotten over a lot of its growing pains and is now putting up some seriously amazing tech.
https://redd.it/1rzjel2
@rStableDiffusion