Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

🤖🧠 DeepEval: The Ultimate LLM Evaluation Framework for AI Developers

🗓️ 07 Oct 2025
📚 AI News & Trends

In today’s AI-driven world, large language models (LLMs) have become central to modern applications, from chatbots to intelligent AI agents. However, ensuring the accuracy, reliability and safety of these models is a significant challenge. Even small errors, biases or hallucinations can result in misleading information, frustrated users or business setbacks. This is where DeepEval, an ...

#DeepEval #LLM #AIDevelopment #LanguageModels #ModelEvaluation #ArtificialIntelligence
❀2
✨CodeClash: Benchmarking Goal-Oriented Software Engineering

πŸ“ Summary:
CodeClash is a benchmark evaluating language models on open-ended, goal-oriented code development through competitive tournaments. It shows LMs struggle with strategic reasoning and long-term codebase maintenance, performing poorly against human experts.

🔹 Publication Date: Published on Nov 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00839
• PDF: https://arxiv.org/pdf/2511.00839

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LanguageModels #SoftwareEngineering #AIEvaluation #CodeDevelopment #Benchmarking
❀1
✨Diffusion Language Models are Super Data Learners

πŸ“ Summary:
Diffusion Language Models DLMs consistently outperform autoregressive models, especially in low-data settings. This is due to any-order modeling, iterative bidirectional denoising, and Monte Carlo augmentation. DLMs maintain advantages at scale, achieving strong performance even by repeating limi...

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03276
• PDF: https://arxiv.org/pdf/2511.03276
• Project Page: https://github.com/JinjieNi/dlms-are-super-data-learners
• Github: https://github.com/JinjieNi/OpenMoE2

#DiffusionModels #LanguageModels #MachineLearning #LowDataLearning #AI
✨Dense Motion Captioning

πŸ“ Summary:
The paper introduces Dense Motion Captioning, a new task for 3D human motion understanding. It presents CompMo, a large dataset with complex, temporally annotated motions, and DEMO, a model combining a language model with a motion adapter to generate detailed, grounded captions.

🔹 Publication Date: Published on Nov 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05369
• PDF: https://arxiv.org/pdf/2511.05369
• Project Page: https://xusy2333.com/demo/
• Github: https://github.com/41xu/DEMO

#MotionCaptioning #3DMotion #ComputerVision #LanguageModels #AIResearch
✨Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

πŸ“ Summary:
Llama-Embed-Nemotron-8B is an open-source text embedding model achieving state-of-the-art performance, especially in multilingual tasks. Its success comes from a novel data mix and detailed ablation studies, making it a universal solution.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07025
• PDF: https://arxiv.org/pdf/2511.07025

🔹 Models citing this paper:
• https://huggingface.co/nvidia/llama-embed-nemotron-8b

#TextEmbeddings #MultilingualNLP #CrossLingual #LanguageModels #AIResearch
✨Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models

πŸ“ Summary:
This paper proposes an AI agent framework for adaptive long-form writing. It uses recursive task decomposition and dynamically integrates retrieval, reasoning, and composition, overcoming rigid outline-based methods. The framework consistently outperforms state-of-the-art approaches.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.08275
• PDF: https://arxiv.org/pdf/2503.08275
• Github: https://github.com/principia-ai/WriteHERE

#AI #LanguageModels #LongformWriting #NLP #GenerativeAI
❀1
✨AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

πŸ“ Summary:
AraLingBench is a human-annotated benchmark evaluating Arabic LLM linguistic competence using expert-designed questions. It reveals models achieve surface proficiency but lack deep understanding, often relying on memorization rather than true comprehension.

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14295
• PDF: https://arxiv.org/pdf/2511.14295

✨ Datasets citing this paper:
• https://huggingface.co/datasets/hammh0a/AraLingBench

#ArabicNLP #LLMEvaluation #AIResearch #LanguageModels #NLPBenchmarking