Last week in Image & Video Generation
I curate a weekly multimodal AI roundup; here are the open-source image & video highlights from last week:
LTX-2.3 — Lightricks
Better prompt following, native portrait mode up to 1080x1920. Community moved incredibly fast on this one — see below.
Model | HuggingFace
https://reddit.com/link/1rr9iwd/video/8quo4o9mxhog1/player
Helios — PKU-YuanGroup
14B video model running real-time on a single GPU. t2v, i2v, v2v up to a minute long. Worth testing yourself.
HuggingFace | GitHub
https://reddit.com/link/1rr9iwd/video/ciw3y2vmxhog1/player
Kiwi-Edit
Text or image prompt video editing with temporal consistency. Style swaps, object removal, background changes.
HuggingFace | Project | Demo
https://preview.redd.it/dx8lm1uoxhog1.png?width=1456&format=png&auto=webp&s=25d8c82bac43d01f4e425179cd725be8ac542938
CubeComposer — TencentARC
Converts regular video to 4K 360° seamlessly. Output quality is genuinely surprising.
Project | HuggingFace
https://preview.redd.it/rqds7zvpxhog1.png?width=1456&format=png&auto=webp&s=24de8610bc84023c30ac5574cbaf7b06040c29a0
HY-WU — Tencent
No-training personalized image edits. Face swaps and style transfer on the fly without fine-tuning.
Project | HuggingFace
https://preview.redd.it/l9p8ahrqxhog1.png?width=1456&format=png&auto=webp&s=63f78ee94170afcca6390a35c50539a8e40d025b
Spectrum
3–5x diffusion speedup via Chebyshev polynomial step prediction. No retraining required, plug into existing image and video pipelines.
GitHub
https://preview.redd.it/htdch9trxhog1.png?width=1456&format=png&auto=webp&s=41100093cedbeba7843e90cd36ce62e08841aabc
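Spectrum's exact predictor isn't spelled out in the post, but the general idea behind polynomial step prediction can be sketched: fit a low-order Chebyshev polynomial to the last few sampler states and extrapolate the next state instead of running the denoiser again. Everything below (the function name, the toy trajectory) is illustrative, not Spectrum's implementation:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def predict_next_state(ts, xs, t_next, degree=3):
    """Fit a Chebyshev polynomial per latent dimension over recent
    timesteps and extrapolate the state at t_next, skipping one
    (expensive) denoiser evaluation. Illustrative sketch only.

    ts: recent timesteps, shape (n_hist,)
    xs: matching flattened latent states, shape (n_hist, dim)
    """
    xs = np.asarray(xs)
    preds = np.empty(xs.shape[1])
    for d in range(xs.shape[1]):  # each latent dim fitted independently
        poly = Chebyshev.fit(ts, xs[:, d], deg=degree)
        preds[d] = poly(t_next)
    return preds

# Toy trajectory: a 2-dim "latent" evolving smoothly as t decreases.
ts = np.linspace(1.0, 0.4, 5)
xs = np.stack([np.array([np.exp(-2 * t), np.cos(t)]) for t in ts])
pred = predict_next_state(ts, xs, t_next=0.3)
```

Because diffusion trajectories are smooth in t, a short extrapolation like this stays close to the true state, which is why such predictors can skip steps without retraining; a real integration would fall back to the denoiser periodically to correct drift.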
LTX Desktop — Community
Free local video editor built on LTX-2.3. Just works out of the box.
Reddit
LTX Desktop Linux Port — Community
Someone ported LTX Desktop to Linux. Didn't take long.
Reddit
LTX-2.3 Workflows — Community
12GB GGUF workflows covering i2v, t2v, v2v and more.
Reddit
https://reddit.com/link/1rr9iwd/video/westyyf3yhog1/player
LTX-2.3 Prompting Guide — Community
Community-written guide that gets into the specifics of prompting LTX-2.3 well.
Reddit
Check out the full roundup for more demos, papers, and resources.
https://redd.it/1rr9iwd
@rStableDiffusion
ltx.io
LTX-2.3: Introducing LTX's Latest AI Video Model
LTX-2.3 upgrades every dimension of AI video: sharper detail, cleaner audio, stronger motion, and native portrait — one generation model.
Anima Preview 2 posted on Hugging Face
https://huggingface.co/circlestone-labs/Anima/tree/main/splitfiles/diffusionmodels
https://redd.it/1rqy92r
@rStableDiffusion
New Image Edit model? HY-WU
Why is there no mention of HY-WU here? https://huggingface.co/tencent/HY-WU
Has anyone actually used it?
https://redd.it/1rrdpya
@rStableDiffusion
So... turns out Z-Image Base is really good at inpainting realism. Workflow + info in the comments!
https://redd.it/1rrqrpf
@rStableDiffusion
Anima-Preview2-8-Step-Turbo-Lora
https://preview.redd.it/g15ojf2bgmog1.png?width=1024&format=png&auto=webp&s=e3e102e7f73329c100f48632e56fd8caa1e48c05
I’m happy to share with you my **Anima-Preview2-8-Step-Turbo-LoRA**.
You can download the model and find example workflows in the gallery/files sections here:
* [https://civitai.com/models/2460007?modelVersionId=2766518](https://civitai.com/models/2460007?modelVersionId=2766518)
* [https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA](https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA)
Recommended Settings
* **Steps:** 6–8
* **CFG Scale:** 1
* **Samplers:** `dpmpp_sde`, `dpmpp_2m_sde`, or `dpmpp_multistep`
This LoRA was trained using renewable energy.
https://redd.it/1rrs5u0
@rStableDiffusion
LTX-2.3: 30-second clips in 6.5 minutes with 16 GB VRAM. The settings work for all kinds of clips: no janky animation and high detail throughout. Try out the workflow.
https://redd.it/1rrq33f
@rStableDiffusion
I built a free local video captioner specifically tuned for LTX-2.3 training.
https://redd.it/1rrsd9i
@rStableDiffusion
New FLUX.2 Klein 9B models have been released.
https://huggingface.co/black-forest-labs/FLUX.2-klein-9b-kv-fp8
https://redd.it/1rrw4lx
@rStableDiffusion
Flux 2 Klein 9B is now up to 2× faster with multiple reference images (new model)
https://x.com/bfl_ml/status/2032110512381837735
https://redd.it/1rrvnu2
@rStableDiffusion
Down to 32 s generation time for 10 seconds of video+audio using DeepBeepMeep's UI. LTX-2.3 on a 4090 (24 GB).
https://redd.it/1rrre4d
@rStableDiffusion
Can't believe I can create 4K videos with a crap 12 GB VRAM card in 20 minutes
https://redd.it/1rya228
@rStableDiffusion
ComfyUI Tutorial: First Last Frame Animation LTX 2.3 Workflow
https://youtu.be/O1gUVbfC2tI
https://redd.it/1rxw0k3
@rStableDiffusion