r/StableDiffusion
Preview with Flux Klein models in ComfyUI?

I tried to search for it, but haven't really found much info. Does anyone know if there's a way to make previews in ComfyUI work properly with Klein models? With the TAESD method, the preview always lags a step behind (after the first step it even shows the image from the previous generation), and the image it does show looks like it's not decoded properly: kind of noisy, with the colors off. Like so:

https://preview.redd.it/rd28puh7y0sg1.png?width=1000&format=png&auto=webp&s=6ccd0141d7c0afcd2fe525afa146c9253f3de0f2

latent2rgb looks basically the same. Is there any way to get a normal preview?
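For context on why the colors can come out wrong: latent2rgb-style previews are just a fixed linear map from latent channels to RGB, with no real decoding. Here's a minimal sketch of the idea; the 4-channel coefficients are illustrative placeholders, not the actual matrix for Flux/Klein or any specific model family:

```python
# Minimal latent2rgb-style preview sketch.
# Each latent "pixel" has C channels; a fixed C x 3 matrix maps it to a color.
# These weights are illustrative placeholders, NOT a real model's matrix.
LATENT_TO_RGB = [  # one row per latent channel: (r, g, b) weights
    ( 0.30,  0.19,  0.21),
    ( 0.19,  0.28,  0.17),
    (-0.16,  0.08,  0.26),
    ( 0.07, -0.11, -0.12),
]

def latent_pixel_to_rgb(latent):
    """Map one latent vector (len == len(LATENT_TO_RGB)) to an RGB triple in 0..255."""
    rgb = [0.0, 0.0, 0.0]
    for value, weights in zip(latent, LATENT_TO_RGB):
        for i in range(3):
            rgb[i] += value * weights[i]
    # squash roughly [-1, 1] into [0, 255]
    return tuple(max(0, min(255, int((c + 1.0) * 127.5))) for c in rgb)
```

If the coefficient matrix (or a TAESD decoder) was built for a different latent space than the model actually produces, you get exactly this kind of noisy, off-color preview, which may be what's happening here.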

https://redd.it/1s72dtm
@stablediffusion_r
What's the verdict on Sage Attention 3 now? Or stick with Sage 2.2?

I use Image Z Turbo, Wan 2.2 and LTX 2.3

I noticed that Sage Attention 3 altered the dress in a video of a dancing woman into trousers when using LTX 2.3. I switched to Sage 2.2, and also tried disabling it entirely, and the issue was fixed.

I actually thought it was the GGUF text encoder that caused the dress to turn into pants, but to my surprise it was Sage 3.

I went back to 2.2 and only lost a few seconds of speed, and the quality was very good, just as if it were disabled.

https://redd.it/1s73r4e
@stablediffusion_r
I went from being a total dummy at ComfyUI to generating this I2V using LTX 2.3. I feel so proud of myself.

https://redd.it/1s76eod
@stablediffusion_r
What can you do if your hardware can generate 15,000 token/s?

[https://taalas.com/](https://taalas.com/)

Demo:

[https://chatjimmy.ai/](https://chatjimmy.ai/)

Saw this posted on r/Qwen_AI and r/LocalLLM today. I also remember seeing this a few years ago when they first published their studies, but I completely forgot about it.

Basically, instead of running inference on a graphics card where models are loaded into memory, the model is burned into the hardware. Remember CDs? It's cheap to build compared to GPUs; they're using 6nm chips instead of the latest process, and no external memory is needed! The biggest downside is that you can't swap models; there's no flexibility.

Thoughts? Would this make live-streamed AI movies and games possible? You could have an MMO where every single NPC has their own unique dialogue, with no delay, for thousands of players.
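Some rough arithmetic on the NPC idea, taking the 15,000 token/s figure from the post at face value (the function name and per-line token count are my own assumptions for illustration):

```python
# Back-of-envelope latency at the claimed throughput.
TOKENS_PER_SECOND = 15_000  # figure claimed in the post

def line_latency_ms(tokens: int) -> float:
    """Time in milliseconds to generate one NPC line of `tokens` tokens."""
    return tokens / TOKENS_PER_SECOND * 1000

# A ~100-token dialogue line would take about 6.7 ms per player served
# sequentially, i.e. well under a frame at 60 fps.
print(f"{line_latency_ms(100):.1f} ms")
```

At that rate, even serving players one after another, a single chip could emit hundreds of short dialogue lines per second, which is what makes the "every NPC talks" scenario at least plausible on paper.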

What a crazy world we live in.

https://redd.it/1s77t1e
@stablediffusion_r
I see many people praising Klein, Z-Image (Turbo, Base), and other models, but few examples. Please post here what you consider to represent the pinnacle of each model, especially for photorealism.
https://redd.it/1s7ahcc
@stablediffusion_r
For the many of you who claim to be getting very poor results/eyes/faces with LTX 2.3 I2V: do you have your distillation set too high? (First video: 0.6. Second video: 1.0.)

https://redd.it/1s77hzb
@stablediffusion_r
I developed an LTX 2.3 program based on the desktop version of LTX, with optimizations that bypass the 32GB VRAM limitation. It integrates features such as start/end frames, text-to-video, image-to-video, lip-sync, and video enhancement. The links are in the comments.
https://redd.it/1s7g50w
@stablediffusion_r
Z-Image character LoRA: great success with OneTrainer using these settings.

For Z-Image Base.

OneTrainer GitHub: https://github.com/Nerogar/OneTrainer

Go here https://civitai.com/articles/25701 and grab the file named z-image-base-onetrainer.json from the resources section. I can't share the results for reasons, but give it a try; it blew my mind. I put it together from random tips I read on multiple subs, so I thought I'd share it back.

I used around 50 images, captioned briefly (trigger. Expression. Pose. Angle. Clothes. Background, 2-3 words each), e.g.: "Natasha. Neutral expression. Reclined on sofa. Low angle handheld selfie. Wearing blue dress. Living room background."

Poses, long shots, low angles, high angles, selfies, positions, expressions: everything works like a charm (provided you captioned for them in your dataset).
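The caption pattern described above is easy to script across a dataset; here's a small helper sketch (the function name and field order are my own, matching the example format, not part of OneTrainer):

```python
def build_caption(trigger, expression, pose, angle, clothes, background):
    """Join short caption fields into the 'trigger. Expression. Pose. ...' format."""
    parts = [trigger, expression, pose, angle, clothes, background]
    # strip any trailing period, then re-add one, skipping empty fields
    return " ".join(p.rstrip(".") + "." for p in parts if p)

caption = build_caption(
    "Natasha", "Neutral expression", "Reclined on sofa",
    "Low angle handheld selfie", "Wearing blue dress", "Living room background",
)
```

Looping this over a metadata file gives every image a consistently structured caption, which is what makes the pose/angle/expression prompting controllable later.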

Would be great if I found something similar for Chroma next.

My contribution is configuring it to work with 1024-res images, since most of the guides I see are for 512.

It works incredibly well when generating at FHD; I use the distill LoRA with 8 steps, so it's reasonably fast. Workflow: https://pastebin.com/UacpHZUG

I found that euler_cfg_pp with beta33 works really well if you want the Instagram aesthetic; you can get the beta33 scheduler from this node: https://github.com/silveroxides/ComfyUI_PowerShiftScheduler

What other samplers/schedulers have you found work well for realism?

https://redd.it/1s7fr2b
@stablediffusion_r