Hugging Face
Hugging Face (Twitter)

RT @eliebakouch: The technical report of @Meituan_LongCat LongCat-Flash is crazy good and full of novelty.
The model is a 560B-total, ~27B-active-parameter MoE with an adaptive number of active parameters depending on the context, thanks to the zero-computation experts.

1) New architecture
> Layers have 2 attention blocks plus both a dense FFN and an MoE, so the two all-to-all comms can be overlapped with computation (it's only 28 layers, but you have to account for the 2 attention blocks per layer).
> They add zero-computation experts that tokens can route to and do nothing, kind of like a "sink" for easy tokens (sketched below).
> For load balancing, they use a DeepSeek-V3-style aux-loss-free bias to control the average number of real vs. zero-computation experts per token, with a decay schedule on the bias update, plus an additional loss-based balance control.

2) Scaling
> They made changes to MLA/MoE to get variance alignment at init. The gains are pretty impressive in Figure 5, but I don't know to what extent this has an impact later in training.
> Model...
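A minimal sketch of the zero-computation ("sink") expert routing from point 1, assuming a standard top-k MoE layer. The expert sizes, sigmoid gating, and sign-based bias update are illustrative assumptions, and the decay schedule on the bias update is omitted; this is not LongCat's actual implementation.

```python
# Sketch of top-k MoE routing with zero-computation experts and an
# aux-loss-free load-balancing bias (illustrative, not LongCat's code).
import torch
import torch.nn as nn


class ZeroComputeMoE(nn.Module):
    """Top-k MoE where some "experts" do no computation at all (pass-through)."""

    def __init__(self, dim, n_real=8, n_zero=2, top_k=2, bias_lr=1e-3):
        super().__init__()
        self.n_real, self.n_zero, self.top_k = n_real, n_zero, top_k
        self.router = nn.Linear(dim, n_real + n_zero, bias=False)
        # Only real experts have parameters; zero-computation experts are implicit.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_real)
        )
        # Routing-only bias, updated outside backprop (aux-loss-free balancing).
        self.register_buffer("route_bias", torch.zeros(n_real + n_zero))
        self.bias_lr = bias_lr

    def forward(self, x):                          # x: (tokens, dim)
        gates = torch.sigmoid(self.router(x))      # gate values used for weighting
        topk = torch.topk(gates + self.route_bias, self.top_k, dim=-1).indices
        out = torch.zeros_like(x)
        for e in range(self.n_real):                # zero experts contribute nothing
            mask = (topk == e).any(dim=-1)
            if mask.any():
                out[mask] += gates[mask, e].unsqueeze(-1) * self.experts[e](x[mask])
        # Aux-loss-free balancing: nudge the bias toward a uniform per-expert load.
        with torch.no_grad():
            load = torch.zeros(self.n_real + self.n_zero, device=x.device)
            load.scatter_add_(0, topk.flatten(),
                              torch.ones(topk.numel(), device=x.device))
            target = topk.numel() / (self.n_real + self.n_zero)
            self.route_bias -= self.bias_lr * torch.sign(load - target)
        return x + out  # tokens routed only to zero experts just pass through
```

Under this scheme the number of FFN experts actually executed varies per token: an easy token that routes most of its top-k slots to zero-computation experts costs almost nothing beyond attention, which is where the 18.6B-31.3B dynamic activation range comes from.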

Hugging Face (Twitter)

RT @Meituan_LongCat: 🚀 LongCat-Flash-Chat Launches!

▫️ 560B Total Params | 18.6B-31.3B Dynamic Activation
▫️ Trained on 20T Tokens | 100+ tokens/sec Inference
▫️ High Performance: TerminalBench 39.5 | τ²-Bench 67.7

🔗 Model: https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
💻 Try Now: longcat.ai
Hugging Face (Twitter)

RT @HuggingPapers: ByteDance Seed and Stanford introduce Mixture of Contexts (MoC) for long video generation, tackling the memory bottleneck with a novel sparse attention routing module.

It enables minute-long consistent videos with short-video cost.
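A rough sketch of chunk-level sparse attention routing in the spirit described above (not the exact MoC module): each query scores the context chunks via their mean-pooled keys and attends only over its top-k selected chunks, so cost grows with the number of routed chunks rather than with full sequence length. Chunk size and top-k are illustrative assumptions.

```python
# Illustrative chunk-level sparse attention routing (not the actual MoC code).
import torch
import torch.nn.functional as F


def routed_sparse_attention(q, k, v, chunk=256, top_k=4):
    """q, k, v: (seq, dim). Each query attends only to its top-k routed chunks."""
    seq, dim = k.shape
    n_chunks = (seq + chunk - 1) // chunk
    pad = n_chunks * chunk - seq
    k_pad = F.pad(k, (0, 0, 0, pad)).view(n_chunks, chunk, dim)
    v_pad = F.pad(v, (0, 0, 0, pad)).view(n_chunks, chunk, dim)
    # Routing: score every chunk by the query's similarity to the chunk's mean key.
    chunk_keys = k_pad.mean(dim=1)                                       # (n_chunks, dim)
    sel = (q @ chunk_keys.T).topk(min(top_k, n_chunks), dim=-1).indices  # (seq, top_k)
    # Gather only the selected chunks per query and attend over them.
    k_sel = k_pad[sel].reshape(seq, -1, dim)                             # (seq, top_k*chunk, dim)
    v_sel = v_pad[sel].reshape(seq, -1, dim)
    attn = torch.softmax(q.unsqueeze(1) @ k_sel.transpose(1, 2) / dim ** 0.5, dim=-1)
    return (attn @ v_sel).squeeze(1)                 # padding is not masked, for brevity
```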
Hugging Face (Twitter)

RT @AdinaYakup: Hunyuan-MT-7B 🔥 open translation model released by @TencentHunyuan

https://huggingface.co/collections/tencent/hunyuan-mt-68b42f76d473f82798882597

Supports 33 languages, including 5 ethnic minority languages in China 👀
Also includes a translation ensemble model: Chimera-7B
Full pipeline: pretrain > CPT > SFT > enhancement > ensemble refinement > SOTA performance at a similar scale
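For reference, inference with the released checkpoints should look like standard transformers causal-LM generation. The repo id and prompt wording below are assumptions based on the collection naming, not verified usage docs; check the model card for the recommended prompt format.

```python
# Hedged example: translation with Hunyuan-MT via plain transformers generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-MT-7B"  # assumed repo id from the collection above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user",
             "content": "Translate the following segment into English:\n海内存知己，天涯若比邻。"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```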
Hugging Face (Twitter)

RT @multimodalart: a mysterious new button appeared on the @huggingface Spaces Nano Banana app 👀
Hugging Face (Twitter)

RT @reach_vb: that's a Chinese food delivery company absolutely mogging the competition https://twitter.com/reach_vb/status/1961833208737103997#m
Hugging Face (Twitter)

RT @MaziyarPanahi: need your help! list your top 5 datasets on @huggingface for rl training with verified answers.

- math
- code
- everyday stuff
Hugging Face (Twitter)

RT @MaziyarPanahi: 1/ shipping two synthetic med qa sets from @OpenMed_AI community, made by @mkurman88 (core contributor):

• med-synth qwen3-235b-a22b (2507)
• med-synth gemma 3 (27b-it)

datasets on @huggingface 👇
Hugging Face (Twitter)

RT @reach_vb: BOOM! Microsoft just released an upgraded VibeVoice Large ~10B Text to Speech model - MIT licensed 🔥

> Generate multi-speaker podcasts in minutes
> Works blazingly fast on ZeroGPU with H200 (FREE)

Try it out today! https://twitter.com/reach_vb/status/1960064616278417826#m
Hugging Face (Twitter)

RT @ClementDelangue: If you think @Apple is not doing much in AI, you're getting blindsided by the chatbot hype and not paying enough attention!

They just released FastVLM and MobileCLIP2 on @huggingface. The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time vision language model (VLM) applications! They can even do live video captioning 100% locally in your browser 🤯🤯🤯
Hugging Face (Twitter)

RT @eliebakouch: Super excited to announce that our research team at @huggingface will be doing an AMA on r/LocalLLaMA.

Come ask any questions to the team behind SmolLM, FineWeb and more! And who knows, maybe there’ll be a shiny new release to talk about?

Thursday 4th September, 8AM-11AM PST 🤗
Hugging Face (Twitter)

RT @reach_vb: 🎬 One prompt → a full video

GPT-5 + open models, stitched together with @OpenAI Codex + HF MCP Server 🤯
Hugging Face (Twitter)

RT @RisingSayak: ZeroGPU on 🤗 HF Spaces enables anyone to build delightful ML demos, benefiting from powerful compute. But due to its serverless nature, it's hard to optimize these demos.

That CHANGES today 🪖

Use AoT compilation to melt our ZeroGPU servers 🔥

Details ⬇️
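As a rough illustration of what ahead-of-time compilation looks like in plain PyTorch (the ZeroGPU-specific helpers in the spaces package are not shown here, so treat this only as a generic AoT sketch assuming a recent PyTorch with torch.export and AOTInductor packaging): export the model once with representative inputs, compile it ahead of time, save the artifact, and reload it for inference so cold starts skip JIT compilation.

```python
# Generic PyTorch AoT sketch (not the ZeroGPU-specific helper API). Requires a GPU.
import torch


class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.gelu(x @ x.transpose(-1, -2))


model = TinyModel().cuda().eval()
example = (torch.randn(8, 64, 64, device="cuda"),)

# 1) Capture the model as an ExportedProgram with representative inputs.
exported = torch.export.export(model, example)

# 2) Compile ahead of time and save a self-contained package to disk.
package_path = torch._inductor.aoti_compile_and_package(
    exported, package_path="tiny_model.pt2"
)

# 3) Later (e.g. inside the GPU-backed request handler), load and run the artifact.
compiled = torch._inductor.aoti_load_package(package_path)
print(compiled(*example).shape)
```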