Hugging Face (Twitter)
RT @rohanpaul_ai: MASSIVE. THE LARGEST open-sourced PDF dataset just dropped on @huggingface: FinePDFs
3 trillion tokens across 475 million documents in 1733 languages.
This is the largest publicly available corpus sourced exclusively from PDFs.
The data was sourced from 105 CommonCrawl snapshots spanning summer 2013 to February 2025, with additional documents refetched from the web, and processed using 🏭 datatrove, Hugging Face's large-scale data processing library.
This carefully deduplicated and filtered dataset comprises roughly 3.65 terabytes covering about 3T tokens. For PII and opt-out, see the Personal and Sensitive Information and opt-out sections.
The dataset is fully reproducible and released under the ODC-By 1.0 license. You will be able to access the reproduction code, ablation and evaluation setup in this GitHub repository soon 👷.
Compared to HTML datasets, despite being only mildly filtered, it achieves results nearly on par with...
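The headline figures can be cross-checked with simple arithmetic (the input numbers come from the post itself; the derived averages are mine):

```python
# Cross-check the FinePDFs headline numbers from the announcement.
total_tokens = 3e12   # ~3 trillion tokens
documents = 475e6     # ~475 million documents
size_bytes = 3.65e12  # ~3.65 terabytes on disk

tokens_per_doc = total_tokens / documents
bytes_per_token = size_bytes / total_tokens

print(f"~{tokens_per_doc:,.0f} tokens per document")  # ~6,316
print(f"~{bytes_per_token:.2f} bytes per token")      # ~1.22
```

At roughly 1.2 bytes per token, the 3.65 TB figure presumably refers to compressed files, since raw UTF-8 text typically runs several bytes per token.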
Hugging Face (Twitter)
RT @iScienceLuvr: If you need to know how much time you have left to submit your paper, you can check "AI Conference Deadlines"
This used to be a separate website maintained by PapersWithCode, but since PapersWithCode was shut down, it's now hosted on Hugging Face
Hugging Face (Twitter)
RT @mervenoyann: upgrade your transformers 🔥
it comes with insanely capable models like SAM2, KOSMOS2.5, Florence-2 and more 🤝
I built a notebook you can run with free Colab T4 to walk through the API for the new models 🙋🏻♀️ fine-tuning will follow soon!
Hugging Face (Twitter)
RT @MaziyarPanahi: Introducing MultiCaRe, open-source, multimodal clinical case datasets on @HuggingFace by @OpenMed_AI Community. Public and ready for load_dataset.
Images: 160K+ figures/subimages
Cases: 85K de-identified narratives + demographics
Articles: 85K metadata + abstracts
🧵 (1/7)
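The three record types (articles, cases, figures) presumably link on shared identifiers; a minimal sketch of that structure (field names are hypothetical, not the actual MultiCaRe schema):

```python
from dataclasses import dataclass

# Illustrative record types for a multimodal clinical-case dataset like
# MultiCaRe. Field names here are hypothetical, not the published schema.
@dataclass
class Article:
    article_id: str
    title: str
    abstract: str

@dataclass
class Case:
    case_id: str
    article_id: str   # links the case to its source article
    narrative: str    # de-identified case text
    demographics: dict

@dataclass
class Figure:
    figure_id: str
    case_id: str      # links the image back to its case
    caption: str

def figures_for_case(figures, case_id):
    """Join figures to a case on the shared case_id."""
    return [f for f in figures if f.case_id == case_id]

figs = [Figure("f1", "c1", "CT scan"), Figure("f2", "c2", "X-ray"),
        Figure("f3", "c1", "MRI")]
print([f.figure_id for f in figures_for_case(figs, "c1")])  # ['f1', 'f3']
```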
Hugging Face (Twitter)
RT @Tim_Dettmers: It feels like the coding agent frontier is now open-weights:
GLM 4.5 costs only $3/month and is on par with Sonnet
Kimi K2.1 Turbo is 3x the speed and 7x cheaper vs Opus 4.1, but just as good
Kimi K2.1 feels clean. The best model for me. GPT-5 is only good for complicated specs -- too slow.
Hugging Face (Twitter)
RT @HuggingPapers: Meta researchers just unveiled Set Block Decoding on Hugging Face.
It's a game-changer for language model inference, delivering a 3-5x speedup in token generation with existing models.
No architectural changes needed, matches previous performance.
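The claimed 3-5x speedup comes from committing several tokens per forward pass instead of one; a toy step-count model of that idea (the real method accepts variable-sized token sets per step, which is why the reported factor is a range):

```python
import math

def decode_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Forward passes needed if each pass commits a fixed number of tokens."""
    return math.ceil(num_tokens / tokens_per_step)

seq_len = 512
baseline = decode_steps(seq_len, 1)   # standard one-token-at-a-time decoding
blockwise = decode_steps(seq_len, 4)  # block decoding, 4 tokens per pass

print(baseline, blockwise, baseline / blockwise)  # 512 128 4.0
```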
Hugging Face (Twitter)
RT @Xianbao_QIAN: The new @TencentHunyuan image 2.1 model is really cool.
It reminds me of @Zai_org GLM 4.1. I love how humble these researchers are, calling a great improvement just 0.1
Both model & demo released on @huggingface
Hugging Face (Twitter)
RT @tomaarsen: ModernBERT goes MULTILINGUAL!
One of the most requested models I've seen, @jhuclsp has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT.
Stronger than existing models at their sizes, while also much faster!
Details in 🧵
Hugging Face (Twitter)
RT @adrgrondin: I gave SmolLM3 by @huggingface a voice 🗣️
Here’s a demo of me talking with the model hands-free on iPhone, thanks to built-in voice activity detection
Everything runs fully on-device, powered by Apple MLX
Hugging Face (Twitter)
RT @daftengine: aaaaand we're live on @huggingface documentation! Thank you to @lhoestq, @vanstriendaniel and the Hugging Face team for all their help pushing this through and excited for our continued collaboration!
na2.hubs.ly/H010TDt0
#Daft #HuggingFace #Multimodal #OpenSource
Hugging Face (Twitter)
RT @vanstriendaniel: Visual-TableQA: Complex Table Reasoning Benchmark
- 2.5K tables with 6K QA pairs
- Multi-step reasoning over visual structures
- 92% human validation agreement
- Under $100 generation cost
Hugging Face (Twitter)
Our free new experiment tracking library now supports logging images, videos, tables, and of course metrics. https://twitter.com/abidlabs/status/1965828375681142903#m
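The post doesn't show the API, so here is a stdlib-only mock of what logging metrics, images, and tables to an experiment tracker looks like (an illustration of the concept, not the library's actual interface):

```python
import json
from collections import defaultdict

class MiniTracker:
    """Toy experiment tracker: logs typed records per step, stdlib only."""
    def __init__(self, project: str):
        self.project = project
        self.history = defaultdict(list)  # kind -> list of (step, payload)

    def log(self, step: int, *, metrics=None, image_path=None, table=None):
        if metrics is not None:
            self.history["metrics"].append((step, metrics))
        if image_path is not None:
            self.history["images"].append((step, image_path))
        if table is not None:             # table given as list-of-dicts rows
            self.history["tables"].append((step, table))

    def export(self) -> str:
        """Serialize the whole run for inspection or upload."""
        return json.dumps(self.history, indent=2)

run = MiniTracker("demo")
run.log(1, metrics={"loss": 0.92})
run.log(2, metrics={"loss": 0.71}, image_path="samples/epoch2.png")
print(len(run.history["metrics"]))  # 2
```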
Hugging Face (Twitter)
RT @ClementDelangue: Super excited to bring hundreds of state-of-the-art open models (Kimi K2, Qwen3 Next, gpt-oss, Aya, GLM 4.5, Deepseek 3.1, Hermes 4, and dozens new ones every day) directly into @code & @Copilot, thanks to @huggingface inference providers!
This is powered by our amazing partners @CerebrasSystems, @FireworksAI_HQ, @Cohere_Labs, @GroqInc, @novita_labs, @togethercompute, and others who make this possible. 💪
Here’s why this is different than other APIs:
🧠 Open weights - models you can truly own, so they’ll never get nerfed or taken away from you
⚡ Multiple providers - automatically routing to get you the best speed, latency, and reliability
💸 Fair pricing - competitive rates with generous free tiers to experiment and build
🔁 Seamless switching - swap models on the fly without touching your code
🧩 Full transparency - know exactly what’s running and customize it however you want
The future of AI copilots is open and this is a big first step! 🚀
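The "swap models without touching your code" point typically rests on an OpenAI-compatible request format in which the model is just a string field; a minimal sketch (the endpoint URL and model ids below are illustrative assumptions, not taken from the post):

```python
# Assumed OpenAI-compatible routing endpoint; illustrative only.
ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload; only `model` varies."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Swapping models is a one-string change; the rest of the code is untouched.
a = chat_payload("zai-org/GLM-4.5", "Write a haiku about open weights.")
b = chat_payload("moonshotai/Kimi-K2-Instruct", "Write a haiku about open weights.")
print(a["model"], b["model"])
```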
Hugging Face (Twitter)
RT @_akhaliq: Qwen3-Next-80B-A3B is out
80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (especially at 32K+ context!)
Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship.
Qwen3-Next-80B-A3B-Thinking outperforms Gemini-2.5-Flash-Thinking
both now available in anycoder for vibe coding
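The economics in the post follow from sparse mixture-of-experts activation: per-token compute scales roughly with active parameters, not total. The arithmetic from the stated figures:

```python
total_params = 80e9    # 80B total parameters
active_params = 3e9    # ~3B activated per token
dense_baseline = 32e9  # Qwen3-32B, fully dense

active_fraction = active_params / total_params
# Rough per-token compute ratio vs the dense model (ignores attention, etc.).
flops_ratio = dense_baseline / active_params

print(f"{active_fraction:.1%} of parameters active per token")        # 3.8%
print(f"~{flops_ratio:.1f}x fewer per-token FLOPs vs the dense 32B")  # ~10.7x
```

That ~10x per-token compute gap is presumably where the "10x cheaper training, 10x faster inference" claim comes from.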
Hugging Face (Twitter)
RT @reach_vb: You DO NOT want to miss this - All the tricks and optimisations used to make gpt-oss blazingly fast, all of it - in a blogpost (with benchmarks)! 🔥
We cover details ranging from MXFP4 quantisation to pre-built kernels, Tensor/Expert Parallelism, Continuous Batching and much more
Bonus: We add extensive benchmarks (along with reproducible scripts)! ⚡
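Of the techniques listed, continuous batching is the easiest to illustrate: instead of waiting for an entire batch to finish, completed requests are replaced from the queue immediately. A toy step-count comparison (a simplification that ignores prefill and kernel details):

```python
def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Finished requests are replaced immediately from the queue."""
    queue = list(lengths)
    active = [queue.pop(0) for _ in range(min(batch_size, len(queue)))]
    steps = 0
    while active:
        steps += 1
        # Each active request decodes one token; drop the ones that finish.
        active = [r - 1 for r in active if r > 1]
        # Refill freed slots from the queue right away.
        while queue and len(active) < batch_size:
            active.append(queue.pop(0))
    return steps

lengths = [8, 2, 2, 2, 8, 2, 2, 2]  # decode lengths per request
print(static_batching_steps(lengths, 4))      # 16
print(continuous_batching_steps(lengths, 4))  # 10
```

With a mix of long and short requests, the static scheduler idles three slots while each long request drags its batch out, which is exactly the waste continuous batching removes.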
Hugging Face (Twitter)
RT @reach_vb: BOOM! Starting today you can use open source frontier LLMs in @code with HF Inference Providers! 🔥
Use your inference credits on SoTA LLMs like GLM 4.5, Qwen3 Coder, DeepSeek 3.1 and more
All of it packaged in one simple extension - try it out today 🤗
Hugging Face (Twitter)
RT @hanouticelina: Starting today, you can use Hugging Face Inference Providers directly in GitHub Copilot Chat on @code! 🔥
This means you can access frontier open-source LLMs like Qwen3-Coder, gpt-oss and GLM-4.5 directly in VS Code, powered by our world-class inference partners - @CerebrasSystems, @Cohere_Labs, @FireworksAI_HQ, @GroqInc, @novita_labs, @togethercompute & more!
give it a try today! 🧵👇