Hugging Face
Hugging Face (Twitter)

RT @lhoestq: AI engineers don't have to struggle to get the datasets ready for training anymore

1/ Prepare your raw data (from database/crawls) into AI ready datasets

2/ Publish on @huggingface so your team can look at the data and train/eval easily

+uploads are crazy fast with Xet!
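A minimal sketch of step 2 with the `datasets` library; the records, column names, and repo id below are placeholders:

```python
from datasets import Dataset

# Placeholder records standing in for rows from a database or crawl.
records = {
    "text": ["first processed document", "second processed document"],
    "source": ["database", "crawl"],
}

ds = Dataset.from_dict(records)

# Repo id is a placeholder; private=True keeps the dataset team-only.
# Uploads are chunk-deduplicated through Xet-backed storage on the Hub.
ds.push_to_hub("your-org/ai-ready-dataset", private=True)
```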
Hugging Face (Twitter)

RT @moby763canary21: I'm finally having my Twitter moment 😂

Thanks @ClementDelangue and @huggingface
Hugging Face (Twitter)

RT @reach_vb: We just added OpenAI Codex CLI formal support in Hugging Face MCP Server - go play with it now!! 🔥
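For reference, a hypothetical Codex CLI hookup, assuming its TOML MCP config format and the `mcp-remote` stdio bridge to reach the public endpoint at https://huggingface.co/mcp; the entry name is arbitrary and a Hugging Face token may be required for auth:

```toml
# Hypothetical entry in ~/.codex/config.toml; the server name is arbitrary.
# mcp-remote bridges the remote HF MCP endpoint to stdio for clients that
# spawn MCP servers as subprocesses.
[mcp_servers.huggingface]
command = "npx"
args = ["-y", "mcp-remote", "https://huggingface.co/mcp"]
```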
Hugging Face (Twitter)

RT @AdinaYakup: Latest update from @Kimi_Moonshot
Kimi K2 >>> Kimi K2-Instruct-0905🔥

https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905

32B activated / 1T total parameters
Enhanced agentic coding intelligence
Better frontend coding experience
256K context window for long horizon tasks
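A minimal sketch of querying the model through `huggingface_hub`'s `InferenceClient`; provider availability for this model and the generation parameters are assumptions:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up HF_TOKEN from the environment

# The prompt and max_tokens are illustrative only.
resp = client.chat_completion(
    model="moonshotai/Kimi-K2-Instruct-0905",
    messages=[{"role": "user", "content": "Write a function that parses ISO-8601 dates."}],
    max_tokens=512,
)
print(resp.choices[0].message.content)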
Hugging Face (Twitter)

RT @steren: What I really like about @Gradio is that you focus on your app's inputs, outputs, and logic, and then the framework derives a UI for these.
Because the UI is a derivative, Gradio was also able to generate an API for the same inputs, and can now generate an MCP server. No change required.
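A minimal sketch of that idea, assuming a recent Gradio release where `launch(mcp_server=True)` is available; the function itself is a toy example:

```python
import gradio as gr

def letter_count(text: str, letter: str) -> int:
    """Count how many times `letter` appears in `text` (case-insensitive)."""
    return text.lower().count(letter.lower())

# One definition of inputs/outputs/logic; Gradio derives the web UI,
# exposes a REST API, and (with mcp_server=True) serves an MCP tool
# whose description comes from the function's docstring.
demo = gr.Interface(fn=letter_count, inputs=["text", "text"], outputs="number")
demo.launch(mcp_server=True)
```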
Hugging Face (Twitter)

RT @ClementDelangue: We’re doing the work that nobody else wants to do! Welcome to FineVision, the best free open dataset to train vision language models. Let’s go open-source! https://twitter.com/andimarafioti/status/1963610118165000479#m
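A hedged sketch of peeking at FineVision with `datasets` streaming; the repo id and subset name are assumptions, so check the dataset card for the actual configs:

```python
from datasets import load_dataset

# Repo id ("HuggingFaceM4/FineVision") and subset name ("ai2d") are
# assumptions; the dataset card lists the real configs. Streaming
# avoids downloading the full image corpus up front.
stream = load_dataset("HuggingFaceM4/FineVision", "ai2d", split="train", streaming=True)
sample = next(iter(stream))
print(sample.keys())
```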
Hugging Face (Twitter)

RT @crystalsssup: landing on 🤗
> 256k context
> 60–100 TPS
> perfect for claude code/codex/roo etc. https://twitter.com/Kimi_Moonshot/status/1963802687230947698#m
Hugging Face (Twitter)

RT @ADarmouni: Honestly FineVision is a pretty impressive work of aggregation

200 training sets condensed into a dataset of 18B images, segmented into 9 different subcategories, multi-turn, with quality ratings and well-documented ablation studies?

As always, @huggingface delivers in open data
Hugging Face (Twitter)

RT @antoine_chaffin: Today is a big day
Today is Silksong day
But most importantly, today is the day I finally got HF socks!!!
Hugging Face (Twitter)

RT @maximelabonne: Liquid AI Japan cooked with this 350M-param model on par with GPT-4o for English-Japanese translation

That's a really nice example of fine-tuning done right 👌
Hugging Face (Twitter)

RT @mirkokiefer: My 2.5-year-old son controlling a robotic arm for the first time — and he genuinely picked it up faster than I did. He absolutely loves robots. The next generation will take over faster than we can blink.

That’s the @LeRobotHF so101, by the way.
Hugging Face (Twitter)

RT @LeRobotHF: 🚀 Big news: we just added Reachy 2 to LeRobot!

Huge thanks to our friends at @pollenrobotics 💛🤗
Reachy 2 is also available in simulation, so you can try it out right away.
🎥 Check out the teleop & autonomous demo below!
Hugging Face (Twitter)

RT @QuixiAI: Cannot wait to try the new Kimi K2! @Kimi_Moonshot
Hugging Face (Twitter)

RT @Laz4rz: Brand new, fresh out of a French printer
Hugging Face (Twitter)

RT @Thom_Wolf: This is huge

Continuing our foundational work to enable anyone to train state-of-the-art AI models, we’re thrilled to release « FinePDFs »

3T tokens of textual data that until now was locked away in PDFs, arguably some of the highest quality publicly available data out there.

We gathered FinePDFs to create the largest permissively licensed corpus sourced exclusively from PDFs.

Amazingly challenging infra and processing work, h/t to the fineweb team https://twitter.com/HKydlicek/status/1964584936524124645#m
Hugging Face (Twitter)

RT @HKydlicek: We are releasing 📄 FinePDFs:
the largest PDF dataset spanning over half a billion documents!

- Long context: documents are 2x longer than web text.
- 3T tokens from high-demand domains like legal and science.
- Heavily improves over SoTA when mixed with FineWeb-Edu & DCLM web corpora.
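A hedged sketch of streaming a slice of FinePDFs; the repo id, the `eng_Latn` language config, and the `text` column are assumptions based on FineWeb-family conventions:

```python
from datasets import load_dataset

# Repo id, language config, and "text" column are assumptions; check the
# dataset card for the real values. Streaming avoids a full download.
docs = load_dataset("HuggingFaceFW/finepdfs", "eng_Latn", split="train", streaming=True)
for doc in docs.take(2):
    print(doc["text"][:200])
```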
Hugging Face (Twitter)

RT @gpj: Released a new synthetic dataset: 1.5k [human] → 10k [synthetic] children’s stories.

Pipeline generated by @Kilo_Code and model switching from @poe_platform API 🙏🤗

https://huggingface.co/datasets/garethpaul/children-stories-dataset
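A quick, hedged look at the linked dataset; the split and column layout are assumptions, only the repo id comes from the link above:

```python
from datasets import load_dataset

# Split name is an assumption; inspect the dataset card for details.
stories = load_dataset("garethpaul/children-stories-dataset", split="train")
print(stories.column_names, len(stories))
```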
Hugging Face (Twitter)

RT @maximelabonne: Pheww, another banger dataset from @huggingface!

> 3T tokens, 475M PDFs, 1733 languages

> Close to Nemotron-CC v2 and FineWeb-Edu+DCLM on its own (‼️)

> Greatly boosts perf when combined, likely because it provides high diversity that complements the other datasets well