Hugging Face (Twitter)
RT @rohanpaul_ai: MASSIVE. THE LARGEST open-sourced PDF dataset just dropped on @huggingface: FinePDFs
3 trillion tokens across 475 million documents in 1733 languages.
This is the largest publicly available corpus sourced exclusively from PDFs.
The data was sourced from 105 CommonCrawl snapshots spanning summer 2013 to February 2025, with additional documents refetched from the web, and processed using 🏭 datatrove, Hugging Face's large-scale data processing library.
This carefully deduplicated and filtered dataset comprises roughly 3.65 terabytes covering about 3T tokens. For PII and opt-out, see the Personal and Sensitive Information and opt-out sections.
The dataset is fully reproducible and released under the ODC-By 1.0 license. You will be able to access the reproduction code, ablation and evaluation setup in this GitHub repository soon 👷.
Compared to HTML datasets, despite being only mildly filtered, it achieves results nearly on par with...
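The headline figures can be cross-checked with simple arithmetic (the input numbers come from the post itself; the derived averages are mine):

```python
# Cross-check the FinePDFs headline numbers from the announcement.
total_tokens = 3e12   # ~3 trillion tokens
documents = 475e6     # ~475 million documents
size_bytes = 3.65e12  # ~3.65 terabytes on disk

tokens_per_doc = total_tokens / documents
bytes_per_token = size_bytes / total_tokens

print(f"~{tokens_per_doc:,.0f} tokens per document")  # ~6,316
print(f"~{bytes_per_token:.2f} bytes per token")      # ~1.22
```

At roughly 1.2 bytes per token, the 3.65 TB figure presumably refers to compressed files, since raw UTF-8 text typically runs several bytes per token.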
Hugging Face (Twitter)
RT @iScienceLuvr: If you need to know how much time you have left to submit your paper, you can check "AI Conference Deadlines"
This used to be a separate website maintained by PapersWithCode, but since PapersWithCode was shut down, it's now hosted on Hugging Face
Hugging Face (Twitter)
RT @mervenoyann: upgrade your transformers 🔥
it comes with insanely capable models like SAM2, KOSMOS2.5, Florence-2 and more 🤝
I built a notebook you can run with free Colab T4 to walk through the API for the new models 🙋🏻♀️ fine-tuning will follow soon!
Hugging Face (Twitter)
RT @MaziyarPanahi: Introducing MultiCaRe, open-source, multimodal clinical case datasets on @HuggingFace by @OpenMed_AI Community. Public and ready for load_dataset.
Images: 160K+ figures/subimages
Cases: 85K de-identified narratives + demographics
Articles: 85K metadata + abstracts
🧵 (1/7)
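The three record types (articles, cases, figures) presumably link on shared identifiers; a minimal sketch of that structure (field names are hypothetical, not the actual MultiCaRe schema):

```python
from dataclasses import dataclass

# Illustrative record types for a multimodal clinical-case dataset like
# MultiCaRe. Field names here are hypothetical, not the published schema.
@dataclass
class Article:
    article_id: str
    title: str
    abstract: str

@dataclass
class Case:
    case_id: str
    article_id: str   # links the case to its source article
    narrative: str    # de-identified case text
    demographics: dict

@dataclass
class Figure:
    figure_id: str
    case_id: str      # links the image back to its case
    caption: str

def figures_for_case(figures, case_id):
    """Join figures to a case on the shared case_id."""
    return [f for f in figures if f.case_id == case_id]

figs = [Figure("f1", "c1", "CT scan"), Figure("f2", "c2", "X-ray"),
        Figure("f3", "c1", "MRI")]
print([f.figure_id for f in figures_for_case(figs, "c1")])  # ['f1', 'f3']
```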
Hugging Face (Twitter)
RT @Tim_Dettmers: It feels like the coding agent frontier is now open-weights:
GLM 4.5 costs only $3/month and is on par with Sonnet
Kimi K2.1 Turbo is 3x the speed and 7x cheaper vs Opus 4.1, but just as good
Kimi K2.1 feels clean. The best model for me. GPT-5 is only good for complicated specs -- too slow.
Hugging Face (Twitter)
RT @HuggingPapers: Meta researchers just unveiled Set Block Decoding on Hugging Face.
It's a game-changer for language model inference, delivering a 3-5x speedup in token generation with existing models.
No architectural changes needed, matches previous performance.
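The claimed 3-5x speedup comes from committing several tokens per forward pass instead of one; a toy step-count model of that idea (the real method accepts variable-sized token sets per step, which is why the reported factor is a range):

```python
import math

def decode_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Forward passes needed if each pass commits a fixed number of tokens."""
    return math.ceil(num_tokens / tokens_per_step)

seq_len = 512
baseline = decode_steps(seq_len, 1)   # standard one-token-at-a-time decoding
blockwise = decode_steps(seq_len, 4)  # block decoding, 4 tokens per pass

print(baseline, blockwise, baseline / blockwise)  # 512 128 4.0
```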
Hugging Face (Twitter)
RT @Xianbao_QIAN: The new @TencentHunyuan image 2.1 model is really cool.
It reminds me of @Zai_org GLM 4.1. I love how humble these researchers are, calling a great improvement just 0.1
Both model & demo released on @huggingface
Hugging Face (Twitter)
RT @tomaarsen: ModernBERT goes MULTILINGUAL!
One of the most requested models I've seen, @jhuclsp has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT.
Stronger than existing models at their sizes, while also much faster!
Details in 🧵
Hugging Face (Twitter)
RT @adrgrondin: I gave SmolLM3 by @huggingface a voice 🗣️
Here’s a demo of me talking with the model hands-free on iPhone, thanks to built-in voice activity detection
Everything runs fully on-device, powered by Apple MLX
Hugging Face (Twitter)
RT @daftengine: aaaaand we're live on @huggingface documentation! Thank you to @lhoestq, @vanstriendaniel and the Hugging Face team for all their help pushing this through and excited for our continued collaboration!
na2.hubs.ly/H010TDt0
#Daft #HuggingFace #Multimodal #OpenSource
Hugging Face (Twitter)
RT @vanstriendaniel: Visual-TableQA: Complex Table Reasoning Benchmark
- 2.5K tables with 6K QA pairs
- Multi-step reasoning over visual structures
- 92% human validation agreement
- Under $100 generation cost
Hugging Face (Twitter)
Our free new experiment tracking library now supports logging images, videos, tables, and of course metrics. https://twitter.com/abidlabs/status/1965828375681142903#m
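The post doesn't show the API, so here is a stdlib-only mock of what logging metrics, images, and tables to an experiment tracker looks like (an illustration of the concept, not the library's actual interface):

```python
import json
from collections import defaultdict

class MiniTracker:
    """Toy experiment tracker: logs typed records per step, stdlib only."""
    def __init__(self, project: str):
        self.project = project
        self.history = defaultdict(list)  # kind -> list of (step, payload)

    def log(self, step: int, *, metrics=None, image_path=None, table=None):
        if metrics is not None:
            self.history["metrics"].append((step, metrics))
        if image_path is not None:
            self.history["images"].append((step, image_path))
        if table is not None:             # table given as list-of-dicts rows
            self.history["tables"].append((step, table))

    def export(self) -> str:
        """Serialize the whole run for inspection or upload."""
        return json.dumps(self.history, indent=2)

run = MiniTracker("demo")
run.log(1, metrics={"loss": 0.92})
run.log(2, metrics={"loss": 0.71}, image_path="samples/epoch2.png")
print(len(run.history["metrics"]))  # 2
```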
Hugging Face (Twitter)
RT @ClementDelangue: Super excited to bring hundreds of state-of-the-art open models (Kimi K2, Qwen3 Next, gpt-oss, Aya, GLM 4.5, Deepseek 3.1, Hermes 4, and dozens new ones every day) directly into @code & @Copilot, thanks to @huggingface inference providers!
This is powered by our amazing partners @CerebrasSystems, @FireworksAI_HQ, @Cohere_Labs, @GroqInc, @novita_labs, @togethercompute, and others who make this possible. 💪
Here’s why this is different than other APIs:
🧠 Open weights - models you can truly own, so they’ll never get nerfed or taken away from you
⚡ Multiple providers - automatically routing to get you the best speed, latency, and reliability
💸 Fair pricing - competitive rates with generous free tiers to experiment and build
🔁 Seamless switching - swap models on the fly without touching your code
🧩 Full transparency - know exactly what’s running and customize it however you want
The future of AI copilots is open and this is a big first step! 🚀
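The "swap models without touching your code" point typically rests on an OpenAI-compatible request format in which the model is just a string field; a minimal sketch (the endpoint URL and model ids below are illustrative assumptions, not taken from the post):

```python
# Assumed OpenAI-compatible routing endpoint; illustrative only.
ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload; only `model` varies."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Swapping models is a one-string change; the rest of the code is untouched.
a = chat_payload("zai-org/GLM-4.5", "Write a haiku about open weights.")
b = chat_payload("moonshotai/Kimi-K2-Instruct", "Write a haiku about open weights.")
print(a["model"], b["model"])
```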
Hugging Face (Twitter)
RT @_akhaliq: Qwen3-Next-80B-A3B is out
80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (especially at 32K+ context!)
Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship.
Qwen3-Next-80B-A3B-Thinking outperforms Gemini-2.5-Flash-Thinking
both now available in anycoder for vibe coding
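The economics in the post follow from sparse mixture-of-experts activation: per-token compute scales roughly with active parameters, not total. The arithmetic from the stated figures:

```python
total_params = 80e9    # 80B total parameters
active_params = 3e9    # ~3B activated per token
dense_baseline = 32e9  # Qwen3-32B, fully dense

active_fraction = active_params / total_params
# Rough per-token compute ratio vs the dense model (ignores attention, etc.).
flops_ratio = dense_baseline / active_params

print(f"{active_fraction:.1%} of parameters active per token")        # 3.8%
print(f"~{flops_ratio:.1f}x fewer per-token FLOPs vs the dense 32B")  # ~10.7x
```

That ~10x per-token compute gap is presumably where the "10x cheaper training, 10x faster inference" claim comes from.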
Hugging Face (Twitter)
RT @reach_vb: You DO NOT want to miss this - All the tricks and optimisations used to make gpt-oss blazingly fast, all of it - in a blogpost (with benchmarks)! 🔥
We cover details ranging from MXFP4 quantisation to pre-built kernels, Tensor/Expert Parallelism, Continuous Batching and much more
Bonus: We add extensive benchmarks (along with reproducible scripts)! ⚡
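Of the techniques listed, continuous batching is the easiest to illustrate: instead of waiting for an entire batch to finish, completed requests are replaced from the queue immediately. A toy step-count comparison (a simplification that ignores prefill and kernel details):

```python
def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Finished requests are replaced immediately from the queue."""
    queue = list(lengths)
    active = [queue.pop(0) for _ in range(min(batch_size, len(queue)))]
    steps = 0
    while active:
        steps += 1
        # Each active request decodes one token; drop the ones that finish.
        active = [r - 1 for r in active if r > 1]
        # Refill freed slots from the queue right away.
        while queue and len(active) < batch_size:
            active.append(queue.pop(0))
    return steps

lengths = [8, 2, 2, 2, 8, 2, 2, 2]  # decode lengths per request
print(static_batching_steps(lengths, 4))      # 16
print(continuous_batching_steps(lengths, 4))  # 10
```

With a mix of long and short requests, the static scheduler idles three slots while each long request drags its batch out, which is exactly the waste continuous batching removes.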
Hugging Face (Twitter)
RT @reach_vb: BOOM! Starting today you can use open source frontier LLMs in @code with HF Inference Providers! 🔥
Use your inference credits on SoTA LLMs like GLM 4.5, Qwen3 Coder, DeepSeek 3.1 and more
All of it packaged in one simple extension - try it out today 🤗
Hugging Face (Twitter)
RT @hanouticelina: Starting today, you can use Hugging Face Inference Providers directly in GitHub Copilot Chat on @code! 🔥
This means you can access frontier open-source LLMs like Qwen3-Coder, gpt-oss and GLM-4.5 directly in VS Code, powered by our world-class inference partners - @CerebrasSystems, @Cohere_Labs, @FireworksAI_HQ, @GroqInc, @novita_labs, @togethercompute & more!
give it a try today! 🧵👇