Hugging Face
Hugging Face (Twitter)

RT @RisingSayak: Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminBossan and I show different techniques to optimize LoRA inference for the Flux family of models from @bfl_ml for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping for avoiding recompilation during swapping new LoRAs 🤯

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090. We achieve at least a 2x speedup on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served. So, we hope this will be beneficial to the community 🤗

Even...
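
A minimal sketch of the hotswapping step (item 4 in the list above), assuming a recent Diffusers release with PEFT installed and a CUDA GPU. The function name, the two LoRA repository arguments, the adapter name "style", and the `max_rank` default are hypothetical placeholders, not values from the post. Flash Attention 3 and FP8 quantization are left out.

```python
def generate_with_hotswapped_loras(first_lora, second_lora, prompt, max_rank=128):
    """Run Flux.1-Dev with two LoRAs in turn, swapping the second in place."""
    # Imported lazily so the sketch can be loaded on a machine without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Called before the first LoRA is loaded; target_rank should cover the
    # largest rank among the LoRAs that will be swapped in later.
    pipe.enable_lora_hotswap(target_rank=max_rank)
    pipe.load_lora_weights(first_lora, adapter_name="style")

    # Compile once, after the first LoRA is in place.
    pipe.transformer.compile(fullgraph=True)
    first_image = pipe(prompt).images[0]

    # hotswap=True overwrites the existing adapter's weights in place,
    # which is meant to avoid triggering a recompilation.
    pipe.load_lora_weights(second_lora, hotswap=True, adapter_name="style")
    second_image = pipe(prompt).images[0]
    return first_image, second_image
```

Reusing the same adapter name for the second load is what makes it a swap rather than an additional adapter.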

Hugging Face (Twitter)

RT @reach_vb: NEW: GLM-4.5 & GLM-4.5-Air from @Zai_org - competitive w/ Claude 4 Opus and beats Gemini 2.5 Pro, MIT license 🔥

> GLM-4.5: 355B total params, 32B active (MoE)
> GLM-4.5-Air: 106B total params, 12B active (MoE)
> "Thinking mode" (complex tasks) + "Non-thinking mode" (instant responses)
> 128K context length + native function calling

Impressive benchmarks:
> AIME24: 91.0 (vs. Claude 4 Opus’s 75.7)
> MATH 500: 98.2 (vs. GPT-4.1’s 96.7)
> GPQA: 79.1 (vs. Gemini 2.5 Pro’s 84.4)
> SWE-bench Verified: 64.2 (vs. Claude 4 Sonnet’s 70.4)
> Terminal-Bench: 37.5 (vs. Claude 4 Opus’s 43.2)

> MoE - Loss-free balance routing + sigmoid gates
> Deeper, narrower - More layers, fewer experts (better reasoning).
> GQA: Partial RoPE + 96 attention heads
> 15T general + 7T code/reasoning tokens

Pretty solid model, looking forward to Inference providers rising up and starting to serve this model! 🤗
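
As a quick worked example of what "total" versus "active" means for these MoE models, the sketch below computes the share of parameters used per token from the counts quoted in the post. The dictionary and function names are illustrative only.

```python
# Parameter counts in billions, as quoted in the post above.
MODELS = {
    "GLM-4.5": {"total_b": 355, "active_b": 32},
    "GLM-4.5-Air": {"total_b": 106, "active_b": 12},
}


def active_fraction(total_b, active_b):
    """Fraction of all parameters that are active for a single token."""
    return active_b / total_b


for name, counts in MODELS.items():
    print(f"{name}: {active_fraction(**counts):.1%} of parameters active per token")
# GLM-4.5: 9.0% of parameters active per token
# GLM-4.5-Air: 11.3% of parameters active per token
```
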
Hugging Face (Twitter)

RT @vanstriendaniel: HF Jobs just launched! 🚀

One-command VLM-based OCR with uv Scripts:

hf jobs uv run [script] ufo-images ufo-text

Classified UFO docs → clean markdown. Zero setup!

Try it → huggingface.co/uv-scripts
Hugging Face (Twitter)

RT @charliermarsh: The new Hugging Face jobs CLI is powered by uv 🤗

You can use `hf jobs uv run` to initiate a job from a standalone Python script.
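
For readers unfamiliar with the format, here is a small sketch of what a standalone script can look like, assuming the inline metadata convention from PEP 723 that uv reads. The function and its behavior are made up for illustration, and the dependency list is left empty; third-party packages would be listed there.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Standalone script: the comment block above is inline metadata that uv reads."""
import sys


def shout(text):
    """Return the text upper-cased with surrounding whitespace removed."""
    return text.strip().upper()


if __name__ == "__main__":
    # Join all command-line arguments into one message, with a fallback.
    message = " ".join(sys.argv[1:]) or "hello from a uv script"
    print(shout(message))
```

Saved under a hypothetical name such as `hello.py`, it should run locally with `uv run hello.py`, and, going by the post, on Hugging Face infrastructure with `hf jobs uv run hello.py`.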
Hugging Face (Twitter)

RT @ClementDelangue: The GSPO paper by @Alibaba_Qwen is already the third most popular one on @huggingface for the month of July.

I suspect this will have a massive impact on the field! https://huggingface.co/papers/month/2025-07

Also, let's get back to celebrating research papers as massive contributions to the field?
Hugging Face (Twitter)

RT @ClementDelangue: We thought we would get xAI open-source but got zAI so even better 😅😅😅
Hugging Face (Twitter)

RT @roo_code: Roo Code now supports @huggingface🤗

Fast config. No extra hosting. And the ability to bring a whopping 91 models directly into your editor. Try it now!
Hugging Face (Twitter)

RT @ivanfioravanti: GLM-4.5-Air-3bit for anybody out there with a Mac with 64GB that wants to try it, while DWQ is cooking 🔥

https://huggingface.co/mlx-community/GLM-4.5-Air-3bit
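
A rough back-of-the-envelope check of why a 64GB Mac is plausible, using the 106B total parameter count quoted for GLM-4.5-Air earlier in this feed. This is only a lower bound under a simplifying assumption: it counts 3 bits per weight and ignores quantization scales, the KV cache, and activations. The function name is illustrative.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


air_3bit = weight_memory_gb(106, 3)
print(f"GLM-4.5-Air at 3 bits per weight: about {air_3bit:.2f} GB")
# GLM-4.5-Air at 3 bits per weight: about 39.75 GB
```
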
Hugging Face (Twitter)

RT @ClementDelangue: How much are you using @huggingface's CLI? Mostly to upload and download models and datasets?

We just revamped it (welcome to `hf`!) and added the capability to run jobs directly on our infra. Useful?
Hugging Face (Twitter)

RT @HuggingPapers: TencentARC unveils ARC-Hunyuan-Video-7B on Hugging Face.

A compact 7B multimodal model designed for deep, structured comprehension of real-world short videos, processing visual, audio, & text signals end-to-end.
Hugging Face (Twitter)

RT @Hesamation: a few months ago i shared this interactive blog post “LLM embeddings explained” on @huggingface and it gives me chills that people have actually found it helpful.

yesterday someone posted about it on LinkedIn, made me think about it after a while!
Hugging Face (Twitter)

RT @reach_vb: BOOM! Latest Qwen 30B A3B 2507 running blazingly fast on Mac powered by MLX 💥

mlx_lm.chat --model "lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-4bit"

That's it, try it out today!
Hugging Face (Twitter)

RT @roo_code: Got a favorite @huggingface model? Now it lives in your editor. 🤗

Roo Code makes it easy to connect your API key, choose from 90+ models, and select your preferred inference provider in just a few clicks.

Watch the quick tutorial and explore more: https://docs.roocode.com/providers/huggingface
Hugging Face (Twitter)

RT @vanstriendaniel: I just processed 1000s of prompts using Qwen3-235B-A22B-Instruct-2507 across 4 GPUs!

How? Everyone plays their part:
@astral_sh UV handles dependencies
@huggingface Jobs handles GPUs
@Alibaba_Qwen handles the model
@vllm_project handles inference

One command. Zero complexity!
Hugging Face (Twitter)

RT @lhoestq: > hf jobs is just out and damnnnn I love the uv integration 💛

@huggingface made their scripts uv-ready to run them on HF infra without setting up docker or dependencies.

E.g.
run DPO locally > uv run dpo.py
run DPO on HF > hf jobs uv run dpo.py

Bonus: --flavor for GPUs🔥
Hugging Face (Twitter)

RT @NielsRogge: Efficient LoFTR was just integrated into @huggingface Transformers!

It improves upon LoFTR, a detector-free image matcher, by being 2.5x faster. It can even surpass the SOTA efficient sparse matching pipeline SuperPoint + LightGlue.

Now available in a few lines of code!
Hugging Face (Twitter)

RT @jandotai: Jan v0.6.6 is out: Jan now runs fully on llama.cpp.

- Cortex is gone, local models now run on @ggerganov's llama.cpp
- Toggle between llama.cpp builds
- @huggingface added as a model provider
- Hub enhanced
- Images from MCPs render inline in chat

Update Jan or grab the latest.