ML Research Hub
32.6K subscribers
3.83K photos
197 videos
23 files
4.1K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
@CodeProgrammer Data Science Cheat Sheets.zip
596.3 MB
Data Science Cheat Sheets
Quick help to make a data scientist's life easier

https://t.iss.one/codeprogrammer 🔒

💡 #deeplearning #AI #ML #python
👍4
🎉💯2024 Highly demanded Top 100+ IT Training courses FREE Giveaway in Networking, Project Management, Cloud and Cyber security including #CCNA 200-301, #CCNP 350-401 #Comptia, #PMP, #AWS, #Azure #Python, #Excel, #AI, #Google courses...... ⬇️📕

Get now & start whenever you want! Don't miss this chance to kickstart your IT career in 2024!

🔗👨‍💻Free CCNA Training Course: https://bit.ly/3BoYEdH
🔗🗒️Enroll Free Online Course: https://bit.ly/4dru404
🔗📝Download Free #IT Study Materials: https://bit.ly/3Y213Uj

🔗📲Contact for 1v1 IT Certs Exam Help: https://wa.link/k0vy3x
🌐📚 JOIN IT Study GROUP to Get Madness Discount 👇: https://chat.whatsapp.com/HqzBlMaOPci0wYvkEtcCDa

🔎Follow Social Media for Free e-Book:
https://linktr.ee/SPOTOSocialMedia
👍21
DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in #DeepSeek V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
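To make the sparse-activation and load-balancing ideas in the abstract more concrete, here is a toy Python sketch. It is not taken from the DeepSeek-V3 code or report. It illustrates one way to balance expert load without an auxiliary loss term: each expert carries a routing bias that is nudged down when the expert is overloaded and up when it is underloaded. The expert count, `k`, the score skew, the step size, and all function names are assumptions chosen for illustration; only the 671B and 37B figures come from the abstract.

```python
import random

TOTAL_PARAMS_B = 671   # total parameters, from the abstract
ACTIVE_PARAMS_B = 37   # parameters activated per token, from the abstract


def route_top_k(scores, bias, k):
    """Return the k expert indices with the highest biased score."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e] + bias[e], reverse=True)
    return ranked[:k]


def update_bias(bias, load, step):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)
        elif n < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


def imbalance(load):
    """Gap between the busiest and the idlest expert in one batch."""
    return max(load) - min(load)


def simulate(num_experts=8, k=2, tokens=512, batches=200, step=0.01, seed=0):
    """Route random tokens batch by batch; return the first and last per-expert loads."""
    rng = random.Random(seed)
    # Hypothetical skew: low-index experts get systematically higher scores,
    # so unbiased top-k routing would overload them.
    skew = [0.3 * (num_experts - e) / num_experts for e in range(num_experts)]
    bias = [0.0] * num_experts
    loads = []
    for _ in range(batches):
        load = [0] * num_experts
        for _ in range(tokens):
            scores = [rng.random() + skew[e] for e in range(num_experts)]
            for e in route_top_k(scores, bias, k):
                load[e] += 1  # only k of num_experts experts run for this token
        loads.append(load)
        bias = update_bias(bias, load, step)
    return loads[0], loads[-1]


if __name__ == "__main__":
    print(f"active fraction per token: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")
    first, last = simulate()
    print("first batch load:", first, "imbalance:", imbalance(first))
    print("last batch load: ", last, "imbalance:", imbalance(last))
```

With the figures from the abstract, roughly 5.5% of the parameters are active for any one token. In the toy run, the gap between the busiest and idlest expert shrinks as the biases adapt, with no extra loss term involved.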

Paper: https://arxiv.org/pdf/2412.19437v1.pdf

Code: https://github.com/deepseek-ai/deepseek-v3

#aiagents #ai #llm #ml #machinelearning #python

https://t.iss.one/DataScienceT 💚
👍21
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.
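As a rough illustration of what "1.8M pixel perception at any aspect ratio" implies for input handling, the Python sketch below downscales an image size to fit a pixel budget while preserving its aspect ratio. This is not MiniCPM-V's actual preprocessing, which the abstract does not describe. Treating 1.8M as a hard cap of 1,800,000 pixels is an assumption, and `fit_to_pixel_budget` is a hypothetical helper.

```python
import math


def fit_to_pixel_budget(width, height, max_pixels=1_800_000):
    """Scale (width, height) down so width * height <= max_pixels.

    The aspect ratio is preserved up to integer rounding. Images already
    within the budget are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Area scales with the square of the linear scale factor.
    scale = math.sqrt(max_pixels / (width * height))
    # Floor both sides so the result never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    for size in [(640, 480), (4000, 3000), (1000, 6000)]:
        print(size, "->", fit_to_pixel_budget(*size))
```

Flooring both sides keeps the result under the budget at the cost of a tiny aspect-ratio drift, which is usually acceptable for a sketch like this.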

Paper: https://arxiv.org/pdf/2408.01800v1.pdf

Code:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v

Datasets: Video-MME

#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialIntelligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics

https://t.iss.one/DataScienceT ❤️
👍3
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems.
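The workflow described in the abstract (reason, search when knowledge is missing, refine the retrieved text, inject it back into the chain) can be sketched as a small loop in Python. This is a toy illustration under stated assumptions, not the Search-o1 implementation: the `SEARCH:` and `ANSWER:` markers, the dictionary corpus, the keyword-overlap `refine` function, and the scripted reasoner are all hypothetical stand-ins for a real model, retriever, and Reason-in-Documents module.

```python
from typing import Callable, Dict, List

SEARCH_PREFIX = "SEARCH:"   # hypothetical marker: the reasoner asks for a search
ANSWER_PREFIX = "ANSWER:"   # hypothetical marker: the reasoner is done
EVIDENCE_PREFIX = "Evidence:"


def refine(query: str, document: str, max_sentences: int = 2) -> str:
    """Stand-in for document refinement: keep sentences that share a word with the query."""
    terms = {w.lower() for w in query.split()}
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    kept = [s for s in sentences if terms & {w.lower() for w in s.split()}]
    return ". ".join(kept[:max_sentences])


def run_agentic_search(
    question: str,
    reasoner: Callable[[List[str]], str],
    corpus: Dict[str, str],
    max_steps: int = 8,
) -> List[str]:
    """Alternate reasoning steps with on-demand retrieval; return the full chain."""
    chain = [f"Question: {question}"]
    for _ in range(max_steps):
        step = reasoner(chain)
        chain.append(step)
        if step.startswith(SEARCH_PREFIX):
            query = step[len(SEARCH_PREFIX):].strip()
            document = corpus.get(query, "")  # toy retriever: exact-match lookup
            # Inject only the refined evidence, not the raw document.
            chain.append(f"{EVIDENCE_PREFIX} {refine(query, document)}")
        elif step.startswith(ANSWER_PREFIX):
            break
    return chain


def scripted_reasoner(chain: List[str]) -> str:
    """Toy reasoner: search once, then answer with the refined evidence."""
    evidence = [s for s in chain if s.startswith(EVIDENCE_PREFIX)]
    if not evidence:
        return f"{SEARCH_PREFIX} capital of France"
    return f"{ANSWER_PREFIX} {evidence[-1][len(EVIDENCE_PREFIX):].strip()}"


TOY_CORPUS = {
    "capital of France": "Paris is the capital of France. The Seine runs through Paris. Croissants are popular.",
}

if __name__ == "__main__":
    for line in run_agentic_search("What is the capital of France?", scripted_reasoner, TOY_CORPUS):
        print(line)
```

In this toy setup, the refinement step drops the two sentences unrelated to the query before anything is appended to the chain. That mirrors the abstract's stated motivation of keeping noise from retrieved documents out of the reasoning flow.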

Paper: https://arxiv.org/pdf/2501.05366v1.pdf

Code: https://github.com/sunnynexus/search-o1

Datasets: Natural Questions - TriviaQA - MATH - HotpotQA - GPQA - Bamboogle

#Search_o1 #LargeReasoningModels #AgenticRAG #ReasonInDocuments #DynamicKnowledgeRetrieval #ComplexReasoning #ScienceMathCoding #OpenDomainQA #TrustworthyAI #IntelligentSystems #python

https://t.iss.one/DataScienceT 😱
👍31
🚀 Boost Your IT Exam Prep with SPOTO's FREE Study Materials! 🎉

💡 Ready to Pass Your IT Exam?
SPOTO is here to help you succeed! Get SPOTO FREE IT study materials to jumpstart your certification journey. Whether you're preparing for #Cisco, #AWS, #PMP, #Python, #Excel, #Google, #Microsoft, or other certifications, we've got you covered.

🔗🎒Download Free IT Certs Exam E-book: https://bit.ly/4fJSoLP

🔗👩‍💻Test Your IT Skills for Free: https://bit.ly/3PoKH39

🔗📝Download Free Cloud Certs Study Materials: https://bit.ly/4gI4KWk

🔗📲Contact for 1v1 IT Certs Exam Help: https://wa.link/k0vy3x
🌐📚 JOIN IT Study GROUP👇: https://chat.whatsapp.com/E3Vkxa19HPO9ZVkWslBO8s
Some people asked me about a resource for learning about Transformers.

Here's a good one I am sharing again -- it covers just about everything you need to know.

brandonrohrer.com/transformers

Amazing stuff. It's totally worth your weekend.

#Transformers #DeepLearning #NLP #AI #MachineLearning #SelfAttention #DataScience #Technology #Python #LearningResource


https://t.iss.one/CodeProgrammer
👍5