ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
LongCat-Flash-Thinking-2601 Technical Report

📝 Summary:
LongCat-Flash-Thinking-2601 is a 560B-parameter Mixture-of-Experts (MoE) reasoning model that achieves state-of-the-art results on agentic benchmarks. Its capabilities stem from a unified training framework, robust tool interaction, and a Heavy Thinking mode for complex reasoning. A minimal sketch of the top-k expert routing at the heart of MoE models follows below.
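
The sketch below illustrates generic top-k expert routing, the core MoE mechanism; layer sizes, expert count, and k are toy values for illustration, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy MoE layer: each token is routed to its k highest-scoring experts.
    Illustrative only; not LongCat's actual architecture or sizes."""
    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # dispatch tokens expert by expert
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(TopKMoE()(torch.randn(4, 64)).shape)         # torch.Size([4, 64])
```

Only k of the n experts run per token, which is why MoE models can scale total parameters far beyond the compute used per forward pass.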

🔹 Publication Date: Jan 23, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16725
• PDF: https://arxiv.org/pdf/2601.16725

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MoE #ReasoningModels #AgentAI #LLM #AI
DeepSeek-V3 Technical Report

📝 Summary:
DeepSeek-V3 is an efficient Mixture-of-Experts (MoE) language model with 671B parameters that uses Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture. It achieves performance comparable to leading models, with highly stable and cost-effective training on 14.8T tokens. A sketch of the MLA idea follows below.
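
The sketch below shows the core MLA idea: keys and values are down-projected to a small shared latent that is what the KV cache stores, then up-projected per head at attention time. Dimensions are toy values, and the real model's decoupled RoPE branch and causal mask are omitted for brevity:

```python
import torch
import torch.nn.functional as F

d_model, d_latent, n_heads, d_head = 64, 16, 4, 16  # toy sizes, not DeepSeek-V3's
W_dkv = torch.randn(d_model, d_latent) / d_model**0.5   # shared KV down-projection
W_uk  = torch.randn(d_latent, n_heads * d_head) / d_latent**0.5
W_uv  = torch.randn(d_latent, n_heads * d_head) / d_latent**0.5
W_q   = torch.randn(d_model, n_heads * d_head) / d_model**0.5

def mla(x):                                   # x: (seq, d_model)
    seq = x.shape[0]
    c_kv = x @ W_dkv                          # (seq, d_latent): all the KV cache keeps
    k = (c_kv @ W_uk).view(seq, n_heads, d_head).transpose(0, 1)
    v = (c_kv @ W_uv).view(seq, n_heads, d_head).transpose(0, 1)
    q = (x @ W_q).view(seq, n_heads, d_head).transpose(0, 1)
    attn = F.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
    return (attn @ v).transpose(0, 1).reshape(seq, -1)

print(mla(torch.randn(8, 64)).shape)          # torch.Size([8, 64])
```

Caching the small latent instead of full per-head keys and values is what shrinks the KV cache relative to standard multi-head attention.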

🔹 Publication Date: Dec 27, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2412.19437
• PDF: https://arxiv.org/pdf/2412.19437
• GitHub: https://github.com/deepseek-ai/deepseek-v3

🔹 Models citing this paper:
https://huggingface.co/deepseek-ai/DeepSeek-V3
https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
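
A hedged usage sketch for loading the checkpoints above with the Hugging Face transformers library; the full model is ~671B parameters, so in practice this requires sharding across many GPUs or a hosted endpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # use the checkpoint's native precision
    device_map="auto",        # shard layers across available devices
    trust_remote_code=True,   # the repo ships custom modeling code
)

inputs = tokenizer("Explain Mixture-of-Experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```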

🔹 Spaces citing this paper:
https://huggingface.co/spaces/nanotron/ultrascale-playbook
https://huggingface.co/spaces/Ki-Seki/ultrascale-playbook-zh-cn
https://huggingface.co/spaces/weege007/ultrascale-playbook

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DeepSeekV3 #MoE #LLM #AI #MachineLearning