Last week in Image & Video Generation

I curate a weekly multimodal AI roundup; here are the open-source image & video highlights from last week:

LTX-2.3 — Lightricks

Better prompt following, native portrait mode up to 1080x1920. Community moved incredibly fast on this one — see below.
Model | HuggingFace

https://reddit.com/link/1rr9iwd/video/8quo4o9mxhog1/player

Helios — PKU-YuanGroup

14B video model running in real time on a single GPU. Supports t2v, i2v, and v2v, up to a minute long. Worth testing yourself.
HuggingFace | GitHub

https://reddit.com/link/1rr9iwd/video/ciw3y2vmxhog1/player

Kiwi-Edit

Text or image prompt video editing with temporal consistency. Style swaps, object removal, background changes.
HuggingFace | Project | Demo

https://preview.redd.it/dx8lm1uoxhog1.png?width=1456&format=png&auto=webp&s=25d8c82bac43d01f4e425179cd725be8ac542938

CubeComposer — TencentARC

Converts regular video to 4K 360° seamlessly. Output quality is genuinely surprising.
Project | HuggingFace

https://preview.redd.it/rqds7zvpxhog1.png?width=1456&format=png&auto=webp&s=24de8610bc84023c30ac5574cbaf7b06040c29a0

HY-WU — Tencent

No-training personalized image edits. Face swaps and style transfer on the fly without fine-tuning.
Project | HuggingFace

https://preview.redd.it/l9p8ahrqxhog1.png?width=1456&format=png&auto=webp&s=63f78ee94170afcca6390a35c50539a8e40d025b

Spectrum

3–5x diffusion speedup via Chebyshev polynomial step prediction. No retraining required; it plugs into existing image and video pipelines (rough sketch of the idea below).
GitHub

https://preview.redd.it/htdch9trxhog1.png?width=1456&format=png&auto=webp&s=41100093cedbeba7843e90cd36ce62e08841aabc

LTX Desktop — Community

Free local video editor built on LTX-2.3. Just works out of the box.
Reddit

LTX Desktop Linux Port — Community

Someone ported LTX Desktop to Linux. Didn't take long.
Reddit

LTX-2.3 Workflows — Community

12GB GGUF workflows covering i2v, t2v, v2v and more.
Reddit

https://reddit.com/link/1rr9iwd/video/westyyf3yhog1/player

LTX-2.3 Prompting Guide — Community

A community-written guide to prompting LTX-2.3 effectively.
Reddit


Check out the full roundup for more demos, papers, and resources.



https://redd.it/1rr9iwd
Anima-Preview2-8-Step-Turbo-LoRA

https://preview.redd.it/g15ojf2bgmog1.png?width=1024&format=png&auto=webp&s=e3e102e7f73329c100f48632e56fd8caa1e48c05

I’m happy to share with you my **Anima-Preview2-8-Step-Turbo-LoRA**.

You can download the model and find example workflows in the gallery/files sections here:

* [https://civitai.com/models/2460007?modelVersionId=2766518](https://civitai.com/models/2460007?modelVersionId=2766518)
* [https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA](https://huggingface.co/Einhorn/Anima-Preview2-Turbo-LoRA)

**Recommended Settings**

* **Steps:** 6–8
* **CFG Scale:** 1
* **Samplers:** `dpmpp_sde`, `dpmpp_2m_sde`, or `dpmpp_multistep`

This LoRA was trained using renewable energy.

https://redd.it/1rrs5u0
LTX-2.3: 30-second clips in 6.5 minutes on 16 GB of VRAM. The settings work for all kinds of clips, with no janky animation and high detail throughout. Try out the workflow.

https://redd.it/1rrq33f
I built a free local video captioner specifically tuned for LTX-2.3 training
https://redd.it/1rrsd9i
Down to 32s gen time for 10 seconds of video+audio using DeepBeepMeep's UI: LTX-2.3 on a 4090 with 24 GB of VRAM.

https://redd.it/1rrre4d