🚀 Sber has released two open-source MoE models: GigaChat-3.1 Ultra and Lightning
Both code and weights are available under the MIT license on HuggingFace.
👉 Key details:
• Trained from scratch (not a fine-tune) on proprietary data and infrastructure
• Mixture-of-Experts (MoE) architecture, so only a subset of parameters is active per token (see the routing sketch after this list)
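For context, an MoE layer routes each token to a few "expert" sub-networks instead of running the whole model. Below is a minimal PyTorch sketch of top-k routing, purely illustrative: the dimensions, expert count, and routing details are invented and don't reflect GigaChat's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks top-k experts per token.
    Illustrative only -- sizes and routing are made up, not GigaChat's design."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)    # routing probabilities
        top_w, top_idx = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                 # only k experts run per token,
            for e in range(len(self.experts)):         # which is why "active params"
                mask = top_idx[:, slot] == e           # can be far below total params
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

x = torch.randn(10, 64)
print(TinyMoE()(x).shape)  # torch.Size([10, 64])
```

This is the mechanism behind numbers like "10B total, 1.8B active" further down: the full parameter pool is large, but each token only pays for the experts it is routed to.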
Models:
🧠 GigaChat-3.1 Ultra
• 702B MoE model for high-performance environments
• Outperforms DeepSeek-V3-0324 and Qwen3-235B on math and reasoning benchmarks
• Supports FP8 training and MTP (multi-token prediction)
⚡️ GigaChat-3.1 Lightning
• 10B model (1.8B active parameters)
• Outperforms Qwen3-4B and Gemma-3-4B on Sber benchmarks
• Efficient local inference (see the loading sketch after this list)
• Up to 256k context
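Here is a minimal local-inference sketch using the Hugging Face transformers library. The repo id below is a guess from the names in this post, not a verified path, so check the actual model card on the Hub before running.

```python
# Minimal local-inference sketch with Hugging Face transformers.
# NOTE: the repo id "ai-sage/GigaChat-3.1-Lightning" is hypothetical,
# guessed from the names in this post -- look up the real model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sage/GigaChat-3.1-Lightning"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # place layers on available GPU/CPU
    trust_remote_code=True,  # MoE models often ship custom modeling code
)

messages = [{"role": "user", "content": "Summarize what an MoE model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```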
Engineering highlights:
• Custom metric to detect and reduce generation loops (an illustrative version is sketched after this list)
• DPO training moved to native FP8
• Improvements in post-training pipeline
• Identified and fixed a critical issue affecting evaluation quality
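The post doesn't define the loop metric, but a common way to flag looping is the fraction of repeated n-grams in a generation; here's a tiny illustration of that idea (not Sber's actual metric):

```python
from collections import Counter

def repeated_ngram_fraction(token_ids, n=4):
    """Fraction of n-grams that appear more than once in a generation.
    Values near 1.0 suggest the model is looping. Generic illustration --
    Sber's actual loop metric is not described in the post."""
    if len(token_ids) < n:
        return 0.0
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

print(repeated_ngram_fraction([1, 2, 3, 4] * 8))  # 1.0: pure loop
print(repeated_ngram_fraction(list(range(32))))   # 0.0: no repetition
```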
🌍 Trained on data in 14 languages (optimized for English and Russian)
Use cases:
• chatbots
• AI assistants
• copilots
• internal ML systems
Sber provides a solid open foundation for developers to build production-ready AI systems with lower infrastructure costs.