Hugging Face (Twitter)
RT @elonmusk: It’s a good model, sir https://twitter.com/victormustar/status/1960613514562752685#m
Hugging Face (Twitter)
RT @ArtificialAnlys: NVIDIA has released Nemotron Nano 9B V2, a small 9B reasoning model that scores 43 on the Artificial Analysis Intelligence Index, the highest yet for <10B models
Nemotron 9B V2 is the first Nemotron model pre-trained by @NVIDIA. Previous Nemotron models have been developed by post-training on Meta Llama models.
Architecture & Training: The model uses a hybrid Mamba-Transformer architecture. NVIDIA pre-trained a 12B parameter base model and applied post-training with a range of techniques including RLHF and GRPO. The final 9B size was pruned from this model and re-trained with the base model as a teacher.
Small-model frontier: with only 9B parameters, Nemotron Nano 9B V2 places ahead of Llama 4 Maverick on our leaderboard, matches Solar Pro 2 (with reasoning), and trails just behind gpt-oss-20B (high).
Along with this model, NVIDIA rele...
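For readers who want to try it, here is a minimal, hedged usage sketch with transformers. The repo id is an assumption (check NVIDIA's Hugging Face org for the exact name), and the hybrid Mamba-Transformer layers may need extra packages (e.g. mamba-ssm, causal-conv1d) per the model card.

```python
# Hedged sketch: repo id is assumed, extra Mamba dependencies may be required.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"   # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Briefly explain hybrid Mamba-Transformer models."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```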
Hugging Face (Twitter)
RT @rohanpaul_ai: 🖼️ MiniCPM-V 4.5 just dropped on @huggingface
Apache 2.0 licensed, with free commercial use.
With only 8B parameters, it surpasses many SOTA models like GPT-4o-latest, Gemini-2.0 Pro, and Qwen2.5-VL 72B on vision-language capabilities, making it the most performant MLLM under 30B parameters.
- Combines strong vision, fast video handling, and robust OCR, so the headline is real capability on a small compute budget.
- High-resolution images up to 1.8M pixels pass through an LLaVA-UHD-style path that uses 4x fewer visual tokens, which is why reading small text and dense PDFs holds up.
- The model pairs Qwen3-8B as the language core with a SigLIP2-400M vision tower, giving it a compact but capable backbone.
- On public leaderboards it posts 77.0 on OpenCompass, hits 2500 on MME, and leads document tasks like OCRBench 89.0, with strong video numbers on Video-MME, LVBench, and MLVU.
- A new unified 3D-Resampler packs 6 consecutive 448x448 frames into 64...
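A hedged usage sketch for trying the document/OCR claims above: it follows the chat() API of earlier MiniCPM-V releases, and both the repo id and the exact message format are assumptions to verify against the model card.

```python
# Hedged sketch based on prior MiniCPM-V usage; check the 4.5 model card.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4_5"   # assumed repo id
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("dense_scan.png").convert("RGB")   # e.g. a dense PDF page
msgs = [{"role": "user", "content": [image, "Transcribe all text on this page."]}]
print(model.chat(msgs=msgs, tokenizer=tokenizer))
```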
Hugging Face (Twitter)
RT @ramin_m_h: Over 1 million Liquid foundation models downloaded through @huggingface! The community realized how far we can push with tiny models when they are designed from first principles. Proud of my team at @LiquidAI_!
Liquid Discord community: discord.com/invite/liquid-ai
Play with our models in Apollo: https://apps.apple.com/us/app/apollo-powered-by-liquid/id6448019325
Build with Liquid models in LEAP: leap.liquid.ai/
Hugging Face (Twitter)
RT @dylan_ebert_: These are the current best generative 3D models
Render:
#1 - CSM
#2 - TRELLIS (open-source)
#3 - Zaohaowu3D
Topology:
#1 - Hunyuan3D-2
#2 - TRELLIS (open-source)
#3 - Hunyuan3D-2.1
as voted/submitted openly on 3D Arena
Hugging Face (Twitter)
RT @Xianbao_QIAN: 500+ hours of real-world manipulation data, covering residential, kitchen, retail, and office settings. An important step towards generalized manipulation models!
Great work Galaxea team!
https://huggingface.co/datasets/OpenGalaxea/Galaxea-Open-World-Dataset
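A minimal sketch for pulling the dataset files locally with huggingface_hub; the repo id comes from the link above, but the on-disk format isn't described in the post, so the file patterns below are placeholders to adjust after inspecting the repo.

```python
# Download (part of) the dataset repo; patterns are illustrative placeholders.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenGalaxea/Galaxea-Open-World-Dataset",
    repo_type="dataset",
    allow_patterns=["README*", "*.json"],   # start small; drop this to fetch everything
)
print("Downloaded to:", local_dir)
```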
Hugging Face (Twitter)
RT @DataChaz: This is wild.
A real-time webcam demo using SmolVLM from @huggingface and llama.cpp! 🤯
Running fully local on a MacBook M3.
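The demo itself runs SmolVLM through llama.cpp; as a hedged alternative, here is a minimal single-frame sketch with transformers (the checkpoint name is an assumption) that could be wrapped in a webcam capture loop.

```python
# Hedged transformers sketch (not the llama.cpp pipeline from the demo).
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"   # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

frame = Image.open("webcam_frame.jpg")        # one grabbed webcam frame
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is happening in this frame?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[frame], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```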
Hugging Face (Twitter)
RT @Tu7uruu: Just dropped on HF! HunyuanVideo-Foley from Tencent AI Lab, an end-to-end Text-Video-to-Audio (TV2A) model that turns silent videos into lifelike soundscapes.
> 100k-hour curated TV2A dataset via automated pipeline
> Modality-balanced MMDiT: dual-stream audio-video fusion + text cross-attention
> REPA loss: aligns internal states with self-supervised audio features → higher fidelity & stability
> DAC-VAE audio codec: 48kHz, continuous latents, strong reconstruction across speech/music/sfx
> SOTA on Kling-Audio-Eval, VGGSound, and MovieGen-Audio-Bench (audio quality, semantic + temporal alignment)
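To make the REPA bullet concrete, here is an illustrative, non-official PyTorch sketch of a representation-alignment loss: project an intermediate generator state onto the space of frozen self-supervised audio features and penalize their cosine distance. All shapes, the projection head, and the loss weight are assumptions.

```python
# Illustrative REPA-style alignment loss (not Tencent's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationAlignment(nn.Module):
    def __init__(self, hidden_dim=1024, ssl_dim=768):
        super().__init__()
        # Small MLP that maps generator states into the SSL feature space.
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, ssl_dim), nn.SiLU(), nn.Linear(ssl_dim, ssl_dim)
        )

    def forward(self, diffusion_hidden, ssl_features):
        # diffusion_hidden: (B, T, hidden_dim) intermediate transformer states
        # ssl_features:     (B, T, ssl_dim) features from a frozen SSL audio encoder
        pred = F.normalize(self.proj(diffusion_hidden), dim=-1)
        target = F.normalize(ssl_features.detach(), dim=-1)
        return 1.0 - (pred * target).sum(-1).mean()   # 1 - cosine similarity

repa = RepresentationAlignment()
h = torch.randn(2, 250, 1024)      # hypothetical shapes
s = torch.randn(2, 250, 768)
loss = 0.1 * repa(h, s)            # weight is an assumption; added to the main training loss
print(loss.item())
```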
Hugging Face (Twitter)
RT @TencentHunyuan: Today we're announcing the open-source release of HunyuanVideo-Foley, our new end-to-end Text-Video-to-Audio (TV2A) framework for generating high-fidelity audio.🚀
This tool empowers creators in video production, filmmaking, and game development to generate professional-grade audio that precisely aligns with visual dynamics and semantic context, addressing key challenges in V2A generation.🔊
Key Innovations:
🔹Exceptional Generalization: Trained on a massive 100k-hour multimodal dataset, the model generates contextually-aware soundscapes for a wide range of scenes, from natural landscapes to animated shorts.
🔹Balanced Multimodal Response: Our innovative multimodal diffusion transformer (MMDiT) architecture ensures the model balances video and text cues, generating rich, layered sound effects that capture every detail—from the main subject to subtle background elements.
🔹High-Fidelity Audio: Using a Representation Alignment...
Hugging Face (Twitter)
RT @pollenrobotics: Two Reachy 2 robots setting and clearing the table, all in real-time teleoperation!
Shot in a single take with all the successes... and a small fail👀
One example of what Reachy 2 can do: efficient, versatile object manipulation, with the precision needed for delicate or fragile tasks
Hugging Face (Twitter)
RT @reach_vb: 🚨 Apple just released FastVLM on Hugging Face - 0.5B, 1.5B and 7B real-time VLMs with WebGPU support 🤯
> 85x faster and 3.4x smaller than comparable sized VLMs
> 7.9x faster TTFT for larger models
> designed to emit fewer output tokens and reduce encoding time for high-resolution images
Bonus: works in REALTIME directly in your browser powered by transformers.js and WebGPU 🔥
Try it out on the demo below 👇
Hugging Face (Twitter)
RT @xenovacom: NEW: Apple releases FastVLM and MobileCLIP2 on Hugging Face! 🤗
The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time VLM applications! 🤯
It can even do live video captioning 100% locally in your browser (zero install). Huge for accessibility!
Hugging Face (Twitter)
RT @RisingSayak: Lovely time presenting at #AIDev Amsterdam today ❤️
We explored some 📹 models (Wan, LTX, etc.), their existing capabilities, and limitations.
I am glad that the attendees found my presentation to be an enjoyable experience 🫡
Find the slides here ⬇️
bit.ly/open-vid-gen
Hugging Face (Twitter)
RT @Xianbao_QIAN: Meituan just open-sourced their new MoE LLM, LongCat, on @huggingface
It's exciting to see new players! The model looks very interesting too, and it comes with a technical report.
https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
Hugging Face (Twitter)
RT @NielsRogge: GLM-4.5 is beating Claude-4 Opus on the Berkeley Function Calling benchmark while costing 70x less
Hugging Face (Twitter)
RT @eliebakouch: The technical report of @Meituan_LongCat LongCat-Flash is crazy good and full of novelty.
The model is a 560B-total, ~27B-active MoE with an adaptive number of active parameters per token thanks to the zero-computation expert.
1) New architecture
> Layers have 2 attention blocks and both an FFN and an MoE, so the 2 all-to-all comms can be overlapped (it's also only 28 layers, but you have to account for the 2 attention blocks per layer).
> They add a zero-computation expert that tokens can route to in order to do nothing, kind of like a "sink" for easy tokens.
> For load balancing, they use a DeepSeek-V3-style aux-loss-free bias to set the average number of real vs. zero-computation experts per token. They apply a decay schedule to this bias update, and they also do loss balance control.
2) Scaling
> They made changes to MLA/MoE to have variance alignment at init. The gains are pretty impressive in Figure 5, but I don't know to what extent this has an impact later on.
> Model...
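To make the zero-computation expert concrete, here is an illustrative PyTorch sketch (not LongCat's code) of an MoE layer whose router can also pick identity "do-nothing" experts. Treating "do nothing" as an identity pass-through, and the bias buffer that only influences selection, are assumptions based on the description above.

```python
# Illustrative sketch: MoE router with extra "zero-computation" experts.
# The routing_bias buffer mimics aux-loss-free balancing: it biases expert
# *selection* only and would be nudged outside the forward pass toward a
# target real/zero ratio.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEWithZeroExperts(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_real=8, n_zero=2, top_k=2):
        super().__init__()
        self.n_real, self.top_k = n_real, top_k
        self.router = nn.Linear(d_model, n_real + n_zero, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_real)
        ])
        self.register_buffer("routing_bias", torch.zeros(n_real + n_zero))

    def forward(self, x):                                   # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)            # routing probabilities
        _, idx = torch.topk(gate + self.routing_bias, self.top_k, dim=-1)
        weights = torch.gather(gate, 1, idx)                # weights come from unbiased gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.router.out_features):
                mask = idx[:, slot] == e
                if not mask.any():
                    continue
                # Zero-computation experts act as an identity (one plausible
                # reading of "do nothing"): the token is passed through as-is.
                y = self.experts[e](x[mask]) if e < self.n_real else x[mask]
                out[mask] += weights[mask, slot].unsqueeze(-1) * y
        return out

print(MoEWithZeroExperts()(torch.randn(16, 512)).shape)     # torch.Size([16, 512])
```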
Hugging Face (Twitter)
RT @Meituan_LongCat: 🚀 LongCat-Flash-Chat Launches!
▫️ 560B Total Params | 18.6B-31.3B Dynamic Activation
▫️ Trained on 20T Tokens | 100+ tokens/sec Inference
▫️ High Performance: TerminalBench 39.5 | τ²-Bench 67.7
🔗 Model: https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
💻 Try Now: longcat.ai
Hugging Face (Twitter)
RT @HuggingPapers: ByteDance Seed and Stanford introduce Mixture of Contexts (MoC) for long video generation, tackling the memory bottleneck with a novel sparse attention routing module.
It enables minute-long consistent videos with short-video cost.
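The tweet only names the mechanism, so here is a rough, hedged sketch of the general idea behind sparse context routing: chunk the key/value sequence, score each chunk per query, and attend only to the top-k chunks. This is a toy reading of "sparse attention routing", not the paper's implementation, and all sizes are placeholders.

```python
# Toy sketch of chunked top-k attention routing (not the MoC code).
import torch
import torch.nn.functional as F

def chunked_topk_attention(q, k, v, chunk=64, top_k=4):
    # q, k, v: (B, T, D); assumes T is a multiple of `chunk` for brevity.
    B, T, D = k.shape
    n_chunks = T // chunk
    k_chunks = k.view(B, n_chunks, chunk, D)
    v_chunks = v.view(B, n_chunks, chunk, D)
    chunk_keys = k_chunks.mean(dim=2)                        # one descriptor key per chunk
    route = torch.einsum("btd,bcd->btc", q, chunk_keys)      # query-to-chunk scores
    sel = route.topk(min(top_k, n_chunks), dim=-1).indices   # (B, T, top_k)
    out = torch.zeros_like(q)
    for b in range(B):                                       # slow loops, for clarity only
        for t in range(q.shape[1]):
            ks = k_chunks[b, sel[b, t]].reshape(-1, D)       # keys of the selected chunks
            vs = v_chunks[b, sel[b, t]].reshape(-1, D)
            attn = F.softmax(q[b, t] @ ks.T / D**0.5, dim=-1)
            out[b, t] = attn @ vs
    return out

q = k = v = torch.randn(1, 256, 32)
print(chunked_topk_attention(q, k, v).shape)   # torch.Size([1, 256, 32])
```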