Hugging Face (Twitter)
RT @RisingSayak: Fast LoRA inference for Flux with Diffusers and PEFT 🚨
There are great materials that demonstrate how to optimize inference for popular image generation models such as Flux. However, very few cover how to serve LoRAs fast, even though LoRAs are central to how these models are adopted.
In our latest post, @BenjaminBossan and I show different techniques to optimize LoRA inference for the Flux family of models from @bfl_ml for image generation. Our recipe includes the use of:
1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯
We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090, achieving at least a 2x speedup on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served, so we hope it will be beneficial to the community 🤗
Even...
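As a rough illustration of the compile-plus-hotswap part of the recipe, here is a minimal Diffusers sketch (not the authors' code; the LoRA repo ids and the `target_rank` value are placeholders, and the FP8/FA3 steps are omitted):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Enable hotswapping BEFORE loading/compiling; target_rank pads LoRA ranks so
# later adapters of equal or smaller rank fit without recompilation.
pipe.enable_lora_hotswap(target_rank=128)  # rank value is an assumption
pipe.load_lora_weights("user/lora-one", adapter_name="default")  # placeholder repo id

# Compile once; hotswapped LoRAs reuse the compiled graph.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

# Swap in a new LoRA without triggering recompilation.
pipe.load_lora_weights("user/lora-two", hotswap=True, adapter_name="default")
image = pipe("a photo of a forest", num_inference_steps=28).images[0]
```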
Hugging Face (Twitter)
RT @reach_vb: NEW: GLM-4.5 & GLM-4.5-Air from @Zai_org - competitive w/ Claude 4 Opus and beats Gemini 2.5 Pro, MIT license 🔥
> GLM-4.5: 355B total params, 32B active (MoE)
> GLM-4.5-Air: 106B total params, 12B active (MoE)
> "Thinking mode" (complex tasks) + "Non-thinking mode" (instant responses)
> 128K context length + native function calling
Impressive benchmarks:
> AIME24: 91.0 (vs. Claude 4 Opus’s 75.7)
> MATH 500: 98.2 (vs. GPT-4.1’s 96.7)
> GPQA: 79.1 (vs. Gemini 2.5 Pro’s 84.4)
> SWE-bench Verified: 64.2 (vs. Claude 4 Sonnet’s 70.4)
> Terminal-Bench: 37.5 (vs. Claude 4 Opus’s 43.2)
> MoE - Loss-free balance routing + sigmoid gates
> Deeper, narrower - More layers, fewer experts (better reasoning).
> GQA: Partial RoPE + 96 attention heads
> 15T general + 7T code/reasoning tokens
Pretty solid model. Looking forward to inference providers stepping up and serving this model! 🤗
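For readers who want to poke at it, a minimal transformers sketch (the repo id is assumed from the announcement, the checkpoints are assumed to load through the standard causal-LM classes, and the thinking/non-thinking toggle is model-specific and not shown):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.5-Air"  # repo id assumed from the announcement
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the GLM-4.5 architecture."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```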
Hugging Face (Twitter)
RT @vanstriendaniel: HF Jobs just launched! 🚀
One-command VLM-based OCR with uv scripts:
hf jobs uv run [script] ufo-images ufo-text
Classified UFO docs → clean markdown. Zero setup!
Try it → huggingface.co/uv-scripts
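The trick behind the one-liner is uv's inline script metadata (PEP 723): dependencies are declared at the top of the file and resolved at run time, locally or on HF Jobs. A skeleton of what such a script might look like (the actual script URL is elided in the tweet; the dataset names are just the CLI arguments shown above, and the OCR body is a placeholder):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["datasets", "pillow"]
# ///
"""Sketch of a uv-ready OCR script: run locally as `uv run script.py ufo-images ufo-text`,
or on HF infra as `hf jobs uv run script.py ufo-images ufo-text`."""
import sys

from datasets import load_dataset

src_repo, dst_repo = sys.argv[1], sys.argv[2]
ds = load_dataset(src_repo, split="train")
# ... run a vision-language model over ds["image"] to extract markdown here ...
ds.push_to_hub(dst_repo)
```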
Hugging Face (Twitter)
RT @charliermarsh: The new Hugging Face jobs CLI is powered by uv 🤗
You can use `hf jobs uv run` to initiate a job from a standalone Python script.
Hugging Face (Twitter)
RT @business: Zhipu is releasing its biggest open-source model to date, joining a growing number of Chinese firms ramping up their free artificial intelligence offerings
Bloomberg.com
Chinese OpenAI Challenger Zhipu to Unveil New Open-Source Model
Hugging Face (Twitter)
RT @ClementDelangue: The GSPO paper by @Alibaba_Qwen is already the third most popular one on @huggingface for the month of July.
I suspect this will have a massive impact on the field! https://huggingface.co/papers/month/2025-07
Also, let's get back to celebrating research papers as massive contributions to the field?
Hugging Face (Twitter)
RT @ClementDelangue: We thought we would get xAI open-source but got zAI so even better 😅😅😅
Hugging Face (Twitter)
RT @roo_code: Roo Code now supports @huggingface🤗
Fast config. No extra hosting. And the ability to bring a whopping 91 models directly into your editor. Try it now!
Hugging Face (Twitter)
RT @ivanfioravanti: GLM-4.5-Air-3bit for anybody out there with a 64GB Mac who wants to try it, while DWQ is cooking 🔥
https://huggingface.co/mlx-community/GLM-4.5-Air-3bit
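A minimal mlx-lm sketch for trying it (the Python API mirrors the mlx_lm CLI; prompt and max_tokens are arbitrary):

```python
# Requires an Apple-silicon Mac; the 3-bit quant targets ~64 GB unified memory.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/GLM-4.5-Air-3bit")
print(generate(model, tokenizer, prompt="Hello! Introduce yourself.", max_tokens=128))
```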
Hugging Face (Twitter)
RT @ClementDelangue: How much are you using @huggingface's CLI? Mostly to upload and download models and datasets?
We just revamped it (welcome to `hf`!) and added the capability to run jobs directly on our infra. Useful?
Hugging Face (Twitter)
RT @HuggingPapers: TencentARC unveils ARC-Hunyuan-Video-7B on Hugging Face.
A compact 7B multimodal model designed for deep, structured comprehension of real-world short videos, processing visual, audio, & text signals end-to-end.
Hugging Face (Twitter)
RT @Hesamation: A few months ago I shared this interactive blog post, “LLM embeddings explained”, on @huggingface, and it gives me chills that people have actually found it helpful.
Yesterday someone posted about it on LinkedIn, which got me thinking about it again after all this time!
Hugging Face (Twitter)
RT @reach_vb: BOOM! Latest Qwen 30B A3B 2507 running blazingly fast on Mac powered by MLX 💥
mlx_lm.chat --model "lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-4bit"
That's it, try it out today!
Hugging Face (Twitter)
RT @roo_code: Got a favorite @huggingface model? Now it lives in your editor. 🤗
Roo Code makes it easy to connect your API key, choose from 90+ models, and select your preferred inference provider in just a few clicks.
Watch the quick tutorial and explore more: https://docs.roocode.com/providers/huggingface
Hugging Face (Twitter)
RT @vanstriendaniel: I just processed 1000s of prompts using Qwen3-235B-A22B-Instruct-2507 across 4 GPUs!
How? Everyone plays their part:
@astral_sh uv handles dependencies
@huggingface Jobs handles GPUs
@Alibaba_Qwen handles the model
@vllm_project handles inference
One command. Zero complexity!
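A hedged sketch of the vLLM side of that pipeline: offline batch generation sharded across 4 GPUs with tensor parallelism (prompts and sampling settings are placeholders, not the author's actual script):

```python
from vllm import LLM, SamplingParams

# Shard the MoE model across 4 GPUs via tensor parallelism.
llm = LLM(model="Qwen/Qwen3-235B-A22B-Instruct-2507", tensor_parallel_size=4)
params = SamplingParams(temperature=0.7, max_tokens=512)

prompts = ["Prompt one...", "Prompt two..."]  # thousands of prompts in the real run
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```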
Hugging Face (Twitter)
RT @lhoestq: > hf jobs is just out and damnnnn I love the uv integration 💛
@huggingface made their scripts uv-ready to run them on HF infra without setting up docker or dependencies.
E.g.
run DPO locally > uv run dpo.py
run DPO on HF > hf jobs uv run dpo.py
Bonus: --flavor for GPUs🔥
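What "uv-ready" means in practice: the script carries its own dependency metadata, so the same file runs locally or on HF GPUs. A sketch of such a dpo.py header (the dependency list and the flavor name are assumptions, not the actual script):

```python
# /// script
# dependencies = ["trl", "peft", "transformers", "datasets", "accelerate"]
# ///
# Run locally:      uv run dpo.py
# Run on HF infra:  hf jobs uv run --flavor a10g-small dpo.py  (flavor name assumed)
from trl import DPOConfig, DPOTrainer  # actual training code would follow
```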
Hugging Face (Twitter)
RT @NielsRogge: Efficient LoFTR was just integrated into @huggingface Transformers!
It improves upon LoFTR, a detector-free image matcher, by being 2.5x faster. It can even surpass the SOTA efficient sparse matching pipeline SuperPoint + LightGlue.
Now available in a few lines of code!
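A hedged usage sketch of those "few lines" (the checkpoint id and the post-processing call are assumptions based on how Transformers exposes its other keypoint-matching models such as LightGlue):

```python
import torch
from transformers import AutoImageProcessor, AutoModelForKeypointMatching
from transformers.image_utils import load_image

ckpt = "zju-community/efficientloftr"  # checkpoint id assumed
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForKeypointMatching.from_pretrained(ckpt)

image1 = load_image("https://example.com/view_a.jpg")  # placeholder URLs
image2 = load_image("https://example.com/view_b.jpg")
inputs = processor([image1, image2], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Recover matched keypoint coordinates per image pair (helper name assumed to
# mirror the LightGlue/SuperGlue integrations).
matches = processor.post_process_keypoint_matching(
    outputs, target_sizes=[[(image1.height, image1.width), (image2.height, image2.width)]]
)
```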
Hugging Face (Twitter)
RT @jandotai: Jan v0.6.6 is out: Jan now runs fully on llama.cpp.
- Cortex is gone, local models now run on @ggerganov's llama.cpp
- Toggle between llama.cpp builds
- @huggingface added as a model provider
- Hub enhanced
- Images from MCPs render inline in chat
Update Jan or grab the latest.
Hugging Face (Twitter)
RT @sleenyre: Happy to have released my first model. A lot of work went into making Krea 1 open source. https://twitter.com/krea_ai/status/1950921488871408075#m
Hugging Face (Twitter)
RT @NVSWSourcer: 👀 We just opened over 26M lines of synthetic data that was used to train the Llama Nemotron Super v1.5 model.
🔢 Find them on Hugging Face 🤗 bit.ly/4l8DIc7
huggingface.co
nvidia/Nemotron-Post-Training-Dataset-v1 · Datasets at Hugging Face
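A quick way to peek at the dataset named in the card above without downloading all ~26M rows is streaming (the split name is an assumption):

```python
from datasets import load_dataset

ds = load_dataset("nvidia/Nemotron-Post-Training-Dataset-v1", split="train", streaming=True)
for row in ds.take(3):  # stream a few rows instead of downloading everything
    print(row)
```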