Custom face detection + segmentation models with dedicated ComfyUI nodes
https://redd.it/1rrlh4o
@rStableDiffusion
Last week in Image & Video Generation
I curate a weekly multimodal AI roundup; here are the open-source image & video highlights from last week:
LTX-2.3 – Lightricks
Better prompt following, native portrait mode up to 1080x1920. Community moved incredibly fast on this one – see below.
Model | HuggingFace
https://reddit.com/link/1rr9iwd/video/8quo4o9mxhog1/player
Helios – PKU-YuanGroup
14B video model running in real time on a single GPU. t2v, i2v, and v2v up to a minute long. Worth testing yourself.
HuggingFace | GitHub
https://reddit.com/link/1rr9iwd/video/ciw3y2vmxhog1/player
Kiwi-Edit
Text or image prompt video editing with temporal consistency. Style swaps, object removal, background changes.
HuggingFace | Project | Demo
https://preview.redd.it/dx8lm1uoxhog1.png?width=1456&format=png&auto=webp&s=25d8c82bac43d01f4e425179cd725be8ac542938
CubeComposer – TencentARC
Converts regular video to 4K 360° seamlessly. Output quality is genuinely surprising.
Project | HuggingFace
https://preview.redd.it/rqds7zvpxhog1.png?width=1456&format=png&auto=webp&s=24de8610bc84023c30ac5574cbaf7b06040c29a0
HY-WU – Tencent
No-training personalized image edits. Face swaps and style transfer on the fly without fine-tuning.
Project | HuggingFace
https://preview.redd.it/l9p8ahrqxhog1.png?width=1456&format=png&auto=webp&s=63f78ee94170afcca6390a35c50539a8e40d025b
Spectrum
3–5x diffusion speedup via Chebyshev polynomial step prediction. No retraining required; plug into existing image and video pipelines.
GitHub
https://preview.redd.it/htdch9trxhog1.png?width=1456&format=png&auto=webp&s=41100093cedbeba7843e90cd36ce62e08841aabc
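Spectrum's actual implementation is in its GitHub repo; purely to illustrate the general idea (a hypothetical toy, not Spectrum's code), here's a sketch of how fitting a Chebyshev polynomial to a smooth denoising trajectory can let a sampler extrapolate some steps instead of calling the model:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def model_step(t):
    """Stand-in for an expensive denoiser evaluation (one latent scalar)."""
    return np.exp(-2 * t)

timesteps = np.linspace(1.0, 0.0, 11)   # t=1 (pure noise) -> t=0 (clean)
true_t, true_x = [], []                 # history of genuine model evaluations
trajectory = []
skipped = 0

for i, t in enumerate(timesteps):
    if len(true_t) >= 4 and i % 2 == 1:
        # Fit a cubic Chebyshev polynomial to the last 4 genuine steps
        # and extrapolate this step, skipping one model call.
        poly = C.Chebyshev.fit(true_t[-4:], true_x[-4:], deg=3)
        x = float(poly(t))
        skipped += 1
    else:
        x = model_step(t)               # expensive "real" evaluation
        true_t.append(t)
        true_x.append(x)
    trajectory.append(x)

err = max(abs(x - model_step(t)) for t, x in zip(timesteps, trajectory))
print(f"skipped {skipped}/{len(timesteps)} model calls, max abs error {err:.4f}")
```

On this toy trajectory the extrapolated steps stay very close to the true ones; the real method has to cope with high-dimensional latents and less smooth trajectories.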
LTX Desktop – Community
Free local video editor built on LTX-2.3. Just works out of the box.
Reddit
LTX Desktop Linux Port – Community
Someone ported LTX Desktop to Linux. Didn't take long.
Reddit
LTX-2.3 Workflows – Community
12GB GGUF workflows covering i2v, t2v, v2v and more.
Reddit
https://reddit.com/link/1rr9iwd/video/westyyf3yhog1/player
LTX-2.3 Prompting Guide – Community
Community-written guide that gets into the specifics of prompting LTX-2.3 well.
Reddit
Check out the full roundup for more demos, papers, and resources.
https://redd.it/1rr9iwd
@rStableDiffusion
ltx.io
LTX-2.3: Introducing LTX's Latest AI Video Model | LTX Model
LTX-2.3 upgrades every dimension of AI video: sharper detail, cleaner audio, stronger motion, and native portrait, all in one generation model.
Anima Preview 2 posted on Hugging Face
https://huggingface.co/circlestone-labs/Anima/tree/main/splitfiles/diffusionmodels
https://redd.it/1rqy92r
@rStableDiffusion
New Image Edit model? HY-WU
Why is there no mention of HY-WU here? https://huggingface.co/tencent/HY-WU
Has anyone actually used it?
https://redd.it/1rrdpya
@rStableDiffusion
So... turns out Z-Image Base is really good at inpainting realism. Workflow + info in the comments!
https://redd.it/1rrqrpf
@rStableDiffusion
Anima-Preview2-8-Step-Turbo-Lora
https://preview.redd.it/g15ojf2bgmog1.png?width=1024&format=png&auto=webp&s=e3e102e7f73329c100f48632e56fd8caa1e48c05
I'm happy to share with you my **Anima-Preview2-8-Step-Turbo-LoRA**.
You can download the model and find example workflows in the gallery/files sections here:
* [https://civitai.com/models/2460007?modelVersionId=2766518](https://civitai.com/models/2460007?modelVersionId=2766518)
* [https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA](https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA)
Recommended Settings
* **Steps:** 6–8
* **CFG Scale:** 1
* **Samplers:** `dpmpp_sde`, `dpmpp_2m_sde`, or `dpmpp_multistep`
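For ComfyUI users, these settings map onto the KSampler node's inputs. A minimal sketch in ComfyUI's API workflow format, queued over the local HTTP `/prompt` endpoint (node IDs, upstream loader nodes, and filenames here are placeholders, not a complete workflow):

```python
import json
import urllib.request

# Placeholder workflow in ComfyUI's API format: only the sampler node is
# shown; the checkpoint, LoRA, CLIP, and latent nodes it references are
# hypothetical and must exist in a real workflow.
workflow = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "steps": 8,               # recommended range: 6-8
            "cfg": 1.0,               # CFG scale 1
            "sampler_name": "dpmpp_2m_sde",
            "scheduler": "normal",
            "denoise": 1.0,
            "seed": 42,
            "model": ["10", 0],       # output of a (hypothetical) LoRA loader
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
        },
    },
    # ... checkpoint, LoRA, CLIP, and latent nodes omitted ...
}

def queue_prompt(wf, host="127.0.0.1:8188"):
    """POST the workflow to a locally running ComfyUI instance."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps({"prompt": wf}).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# queue_prompt(workflow)  # uncomment with ComfyUI running locally
```

The same steps/cfg/sampler values can of course be set directly in the KSampler node in the ComfyUI graph editor.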
This LoRA was trained using renewable energy.
https://redd.it/1rrs5u0
@rStableDiffusion
LTX 2.3: 30-second clips in 6.5 minutes with 16 GB VRAM. The settings work for all kinds of clips: no janky animation, high detail. Try out the workflow.
https://redd.it/1rrq33f
@rStableDiffusion
I built a free local video captioner specifically tuned for LTX-2.3 training
https://redd.it/1rrsd9i
@rStableDiffusion