Custom face detection + segmentation models with dedicated ComfyUI nodes
https://redd.it/1rrlh4o
@rStableDiffusion
Last week in Image & Video Generation
I curate a weekly multimodal AI roundup, here are the open-source image & video highlights from last week:
LTX-2.3 — Lightricks
Better prompt following, native portrait mode up to 1080x1920. Community moved incredibly fast on this one — see below.
Model | HuggingFace
https://reddit.com/link/1rr9iwd/video/8quo4o9mxhog1/player
Helios — PKU-YuanGroup
14B video model running in real time on a single GPU. Supports t2v, i2v, and v2v up to a minute long. Worth testing yourself.
HuggingFace | GitHub
https://reddit.com/link/1rr9iwd/video/ciw3y2vmxhog1/player
Kiwi-Edit
Video editing from text or image prompts, with temporal consistency. Style swaps, object removal, background changes.
HuggingFace | Project | Demo
https://preview.redd.it/dx8lm1uoxhog1.png?width=1456&format=png&auto=webp&s=25d8c82bac43d01f4e425179cd725be8ac542938
CubeComposer — TencentARC
Converts regular video to 4K 360° seamlessly. Output quality is genuinely surprising.
Project | HuggingFace
https://preview.redd.it/rqds7zvpxhog1.png?width=1456&format=png&auto=webp&s=24de8610bc84023c30ac5574cbaf7b06040c29a0
HY-WU — Tencent
Training-free personalized image edits. Face swaps and style transfer on the fly, no fine-tuning needed.
Project | HuggingFace
https://preview.redd.it/l9p8ahrqxhog1.png?width=1456&format=png&auto=webp&s=63f78ee94170afcca6390a35c50539a8e40d025b
Spectrum
3–5x diffusion speedup via Chebyshev polynomial step prediction. No retraining required; it plugs into existing image and video pipelines.
GitHub
https://preview.redd.it/htdch9trxhog1.png?width=1456&format=png&auto=webp&s=41100093cedbeba7843e90cd36ce62e08841aabc
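The one-liner above compresses the idea quite a bit. As a rough illustration only (this is not Spectrum's actual code), polynomial step prediction amounts to fitting a low-degree Chebyshev polynomial to the recent, smoothly varying denoising trajectory and extrapolating the next step instead of paying for another solver call. A toy scalar sketch with numpy, using a smooth stand-in trajectory:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Stand-in for a denoising trajectory: each solver step yields a latent
# quantity that evolves smoothly over the timestep schedule.
timesteps = np.linspace(0.0, 1.0, 9)   # 9 solver steps already taken
latents = np.exp(-2.0 * timesteps)     # hypothetical smooth trajectory

# Fit a low-degree Chebyshev polynomial to the last few steps...
coeffs = C.chebfit(timesteps[-5:], latents[-5:], deg=3)

# ...and extrapolate the next step instead of running the solver for it.
t_next = 1.125
predicted = C.chebval(t_next, coeffs)
actual = np.exp(-2.0 * t_next)         # what the "real" step would give
print(abs(predicted - actual))         # small extrapolation error
```

In a real pipeline the extrapolated latent would replace the model forward pass for some fraction of steps, which is where the claimed speedup comes from; the smoother the trajectory, the further you can extrapolate safely.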
LTX Desktop — Community
Free local video editor built on LTX-2.3. Just works out of the box.
Reddit
LTX Desktop Linux Port — Community
Someone ported LTX Desktop to Linux. Didn't take long.
Reddit
LTX-2.3 Workflows — Community
12GB GGUF workflows covering i2v, t2v, v2v and more.
Reddit
https://reddit.com/link/1rr9iwd/video/westyyf3yhog1/player
LTX-2.3 Prompting Guide — Community
Community-written guide that gets into the specifics of prompting LTX-2.3 well.
Reddit
Check out the full roundup for more demos, papers, and resources.
https://redd.it/1rr9iwd
@rStableDiffusion
Anima Preview 2 posted on Hugging Face
https://huggingface.co/circlestone-labs/Anima/tree/main/splitfiles/diffusionmodels
https://redd.it/1rqy92r
@rStableDiffusion
New Image Edit model? HY-WU
Why is there no mention of HY-WU here? https://huggingface.co/tencent/HY-WU
Has anyone actually used it?
https://redd.it/1rrdpya
@rStableDiffusion
So... turns out Z-Image Base is really good at inpainting realism. Workflow + info in the comments!
https://redd.it/1rrqrpf
@rStableDiffusion
Anima-Preview2-8-Step-Turbo-Lora
https://preview.redd.it/g15ojf2bgmog1.png?width=1024&format=png&auto=webp&s=e3e102e7f73329c100f48632e56fd8caa1e48c05
I'm happy to share with you my **Anima-Preview2-8-Step-Turbo-LoRA**.
You can download the model and find example workflows in the gallery/files sections here:
* [https://civitai.com/models/2460007?modelVersionId=2766518](https://civitai.com/models/2460007?modelVersionId=2766518)
* [https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA](https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA)
Recommended Settings
* **Steps:** 6–8
* **CFG Scale:** 1
* **Samplers:** `dpmpp_sde`, `dpmpp_2m_sde`, or `dpmpp_multistep`
This LoRA was trained using renewable energy.
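For readers wiring this up by hand, the recommended settings map onto a ComfyUI API-format workflow roughly as below. This is a hypothetical fragment, not from the post: the node IDs, the scheduler choice, and the upstream connections are placeholders, and the LoRA itself would be applied via a LoraLoader node feeding the `model` input.

```python
# Hypothetical ComfyUI API-format KSampler node using the recommended
# settings; IDs and connections are placeholders for illustration.
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,
            "steps": 8,                   # recommended range: 6-8
            "cfg": 1.0,                   # CFG scale 1
            "sampler_name": "dpmpp_sde",  # or dpmpp_2m_sde
            "scheduler": "normal",
            "denoise": 1.0,
            "model": ["10", 0],           # e.g. output of a LoraLoader node
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
        },
    }
}
print(ksampler_node["3"]["inputs"]["steps"])  # 8
```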
https://redd.it/1rrs5u0
@rStableDiffusion
LTX-2.3: 30-second clips in 6.5 minutes with 16 GB VRAM. The settings work for all kinds of clips: no janky animation and high detail throughout. Try out the workflow.
https://redd.it/1rrq33f
@rStableDiffusion
I built a free local video captioner specifically tuned for LTX-2.3 training
https://redd.it/1rrsd9i
@rStableDiffusion
New FLUX.2 Klein 9b models have been released.
https://huggingface.co/black-forest-labs/FLUX.2-klein-9b-kv-fp8
https://redd.it/1rrw4lx
@rStableDiffusion
Flux 2 Klein 9B is now up to 2× faster with multiple reference images (new model)
https://x.com/bfl_ml/status/2032110512381837735
https://redd.it/1rrvnu2
@rStableDiffusion
Down to 32 s generation time for 10 seconds of video + audio using DeepBeepMeep's UI. LTX-2.3 on a 4090 (24 GB).
https://redd.it/1rrre4d
@rStableDiffusion