✨LongCat-Flash-Thinking-2601 Technical Report
📝 Summary:
LongCat-Flash-Thinking-2601 is a 560B MoE reasoning model that achieves state-of-the-art performance on agentic benchmarks. Its capabilities stem from a unified training framework, robust tool interaction, and a Heavy Thinking mode for complex reasoning.
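The digest does not explain how Heavy Thinking mode works internally. As a rough, hypothetical illustration of one common way to spend a larger reasoning budget, the sketch below samples several independent reasoning traces and aggregates the final answers by majority vote; `generate` and `HEAVY_SAMPLES` are placeholders and not the LongCat API.
```python
# Hypothetical sketch: spend a larger "thinking" budget by sampling several
# reasoning traces and taking a majority vote over their final answers.
from collections import Counter

HEAVY_SAMPLES = 8  # assumed budget; the report's actual mechanism may differ


def generate(prompt: str, temperature: float) -> str:
    """Placeholder for a call to a reasoning model; returns a final answer string."""
    raise NotImplementedError("plug in your model client here")


def heavy_think(prompt: str) -> str:
    # Sample several independent traces, then keep the most common final answer.
    answers = [generate(prompt, temperature=0.8) for _ in range(HEAVY_SAMPLES)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```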
🔹 Publication Date: Published on Jan 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16725
• PDF: https://arxiv.org/pdf/2601.16725
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MoE #ReasoningModels #AgentAI #LLM #AI
✨DeepSeek-V3 Technical Report
📝 Summary:
DeepSeek-V3 is an efficient Mixture-of-Experts language model with 671B total parameters, of which 37B are activated per token, built on Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. It achieves strong performance, comparable to leading models, with highly stable and cost-effective training on 14.8T tokens.
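The digest does not reproduce the DeepSeekMoE layer details, so here is a minimal, generic sketch of a top-k routed MoE layer in PyTorch, only to illustrate how a small fraction of the total parameters (e.g., 37B of 671B) is active for any given token. Names like `TopKMoE`, `num_experts`, and `top_k` are illustrative assumptions; DeepSeek-V3's actual design additionally uses shared experts and an auxiliary-loss-free load-balancing strategy.
```python
# Minimal, illustrative top-k routed MoE layer (not DeepSeek-V3's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.ff(x)


class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(d_model, d_ff) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # dispatch each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)
print(TopKMoE()(tokens).shape)                   # torch.Size([16, 64])
```
Routing each token to only `top_k` of `num_experts` experts is what keeps per-token compute roughly constant while the total parameter count scales with the number of experts.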
🔹 Publication Date: Published on Dec 27, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2412.19437
• PDF: https://arxiv.org/pdf/2412.19437
• Github: https://github.com/deepseek-ai/deepseek-v3
🔹 Models citing this paper:
• https://huggingface.co/deepseek-ai/DeepSeek-V3
• https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
• https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
✨ Spaces citing this paper:
• https://huggingface.co/spaces/nanotron/ultrascale-playbook
• https://huggingface.co/spaces/Ki-Seki/ultrascale-playbook-zh-cn
• https://huggingface.co/spaces/weege007/ultrascale-playbook
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DeepSeekV3 #MoE #LLM #AI #MachineLearning