ML Research Hub

✨FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

📝 Summary:
FutureOmni is the first benchmark evaluating multimodal models' ability to forecast future events from audio-visual data. Current models struggle, particularly with speech-heavy scenarios. The paper proposes an improved training strategy, Omni-Modal Future Forecasting, which enhances performance a...

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13836
• PDF: https://arxiv.org/pdf/2601.13836
• Project Page: https://openmoss.github.io/FutureOmni
• Github: https://openmoss.github.io/FutureOmni

✨ Datasets citing this paper:
• https://huggingface.co/datasets/OpenMOSS-Team/FutureOmni

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#MultimodalLLMs #FutureForecasting #AIResearch #DeepLearning #Benchmarking

145 views 08:02

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Agentic-R: Learning to Retrieve for Agentic Search

📝 Summary:
This paper introduces Agentic-R, a new retriever training framework for agentic search. It uses both local relevance and global answer correctness metrics, with iterative optimization between the agent and retriever. Agentic-R consistently outperforms strong baselines on QA benchmarks.

🔹 Publication Date: Published on Jan 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11888
• PDF: https://arxiv.org/pdf/2601.11888

🔹 Models citing this paper:
• https://huggingface.co/liuwenhan/Agentic-R_e5

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AgenticSearch #InformationRetrieval #MachineLearning #QuestionAnswering #AIResearch

172 views 08:03

ML Research Hub

✨SciCoQA: Quality Assurance for Scientific Paper-Code Alignment

📝 Summary:
SciCoQA is a dataset containing 611 paper-code discrepancies for identifying mismatches between scientific publications and code. It shows that even advanced language models struggle significantly to detect these issues, with the best model finding fewer than half of real-world discrepancies.

🔹 Publication Date: Published on Jan 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.12910
• PDF: https://arxiv.org/pdf/2601.12910
• Project Page: https://ukplab.github.io/scicoqa
• Github: https://github.com/UKPLab/scicoqa

✨ Datasets citing this paper:
• https://huggingface.co/datasets/UKPLab/scicoqa

✨ Spaces citing this paper:
• https://huggingface.co/spaces/UKPLab/scicoqa

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#SciCoQA #AcademicIntegrity #CodeQuality #NLP #ResearchData

182 views 08:03

ML Research Hub

✨Think3D: Thinking with Space for Spatial Reasoning

📝 Summary:
Think3D improves vision-language models' 3D reasoning by enabling interactive spatial exploration using 3D reconstruction and camera operations. This training-free framework significantly boosts performance on spatial reasoning tasks for models like GPT-4.1 and Gemini 2.5 Pro, offering a path to ...

🔹 Publication Date: Published on Jan 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13029
• PDF: https://arxiv.org/pdf/2601.13029
• Github: https://github.com/zhangzaibin/spagent

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#3DReasoning #SpatialAI #VisionLanguageModels #MachineLearning #ComputerVision

178 views 09:03

ML Research Hub

✨On the Evidentiary Limits of Membership Inference for Copyright Auditing

📝 Summary:
Membership inference attacks fail to reliably detect copyrighted text usage in large language models when training data is paraphrased using structure-aware methods that preserve semantic content. This suggests such attacks are brittle in adversarial settings and insufficient for standalone copyr...
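
As a rough illustration of the kind of attack the summary refers to, here is a minimal sketch of a classic loss-threshold membership inference test. It is not the paper's method: the per-token probabilities, the threshold value, and the function names are all hypothetical, chosen only to show why a paraphrased training copy can push the original text's loss above the decision threshold.

```python
import math

def token_nll(token_probs):
    """Average negative log-likelihood of a text, given per-token probabilities."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def loss_threshold_mia(token_probs, threshold):
    """Flag a text as a likely training member when its loss is below the threshold."""
    return token_nll(token_probs) < threshold

# Hypothetical probabilities a model assigns to the tokens of the SAME audited text.
trained_on_verbatim = [0.9, 0.8, 0.95, 0.85]    # model saw the text word for word
trained_on_paraphrase = [0.3, 0.2, 0.4, 0.25]   # model only saw a reworded version

THRESHOLD = 0.5  # hypothetical; in practice calibrated on known non-members

print(loss_threshold_mia(trained_on_verbatim, THRESHOLD))
print(loss_threshold_mia(trained_on_paraphrase, THRESHOLD))
```

In this toy setup the content was used for training in both cases, yet only the verbatim case is flagged, which is the brittleness the summary describes.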

🔹 Publication Date: Published on Jan 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.12937
• PDF: https://arxiv.org/pdf/2601.12937

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#MembershipInference #Copyright #LLM #AISecurity #AI

238 views 09:03

ML Research Hub

Forwarded from Machine Learning with Python

https://t.iss.one/DataScienceC

Udemy Coupons

ads: @HusseinSheikho

The first channel on Telegram that offers free Udemy coupons

176 views 10:00

ML Research Hub

✨Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

📝 Summary:
This paper shows fundamental privacy-utility limitations for DP-SGD with shuffled sampling. Strong privacy requires substantial noise, severely limiting utility and causing significant accuracy degradation in practice. This holds even with many updates.
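
To make the noise-versus-utility tension concrete, here is a minimal sketch of one standard DP-SGD update (per-example clipping followed by Gaussian noise). It does not model the shuffled sampling the paper analyzes, and the toy gradients, learning rate, and parameter names are assumptions for illustration only.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each gradient, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients

# Without noise the step follows the clipped average gradient exactly.
clean = dp_sgd_step(params, grads, lr=0.1, max_norm=1.0, noise_multiplier=0.0, rng=rng)
# With a large noise multiplier the same step is perturbed by Gaussian noise.
noisy = dp_sgd_step(params, grads, lr=0.1, max_norm=1.0, noise_multiplier=2.0, rng=rng)
print(clean, noisy)
```

A larger `noise_multiplier` gives a stronger privacy guarantee but perturbs every update more, which is the trade-off the summary refers to.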

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10237
• PDF: https://arxiv.org/pdf/2601.10237

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#DifferentialPrivacy #MachineLearning #Privacy #DataScience #AIResearch

231 views 10:04

ML Research Hub

✨LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

📝 Summary:
LightOnOCR-2-1B is a 1B-parameter end-to-end multilingual vision-language model for OCR. It converts document images to text, achieving state-of-the-art results while being smaller and faster. It also features improved image localization and robustness.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14251
• PDF: https://arxiv.org/pdf/2601.14251

🔹 Models citing this paper:
• https://huggingface.co/lightonai/LightOnOCR-1B-1025
• https://huggingface.co/lightonai/LightOnOCR-2-1B
• https://huggingface.co/lightonai/LightOnOCR-0.9B-32k-1025

✨ Datasets citing this paper:
• https://huggingface.co/datasets/lightonai/LightOnOCR-mix-0126
• https://huggingface.co/datasets/lightonai/LightOnOCR-bbox-mix-0126

✨ Spaces citing this paper:
• https://huggingface.co/spaces/lightonai/LightOnOCR-2-1B-Demo
• https://huggingface.co/spaces/lightonai/LightOnOCR-1B-Demo
• https://huggingface.co/spaces/lightonai/LightOnOCR-1B-Demo-zero

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#OCR #VisionLanguageModel #AI #DeepLearning #MultilingualAI

arXiv.org

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for...

We present LightOnOCR-2-1B, a 1B-parameter end-to-end multilingual vision-language model that converts document images (e.g., PDFs) into clean, naturally ordered text without brittle OCR...

195 views 11:04

ML Research Hub

✨KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

📝 Summary:
KAGE-Bench introduces KAGE-Env, a fast JAX 2D platformer that isolates visual shifts to systematically study RL generalization. It reveals strong failures for agents facing background or photometric changes, but less impact from agent appearance shifts.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14232
• PDF: https://arxiv.org/pdf/2601.14232
• Github: https://avanturist322.github.io/KAGEBench/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ReinforcementLearning #RLGeneralization #MachineLearning #ComputerVision #AIResearch

175 views 12:04

ML Research Hub

✨FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

📝 Summary:
FantasyVLN proposes an implicit reasoning framework for vision-language navigation, overcoming the real-time issues of explicit Chain-of-Thought methods. It encodes imagined visual observations into a compact latent space during training. This enables real-time, reasoning-aware navigation with im...

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13976
• PDF: https://arxiv.org/pdf/2601.13976
• Project Page: https://fantasy-amap.github.io/fantasy-vln/
• Github: https://fantasy-amap.github.io/fantasy-vln/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VisionLanguageNavigation #ChainOfThought #MultimodalAI #EmbodiedAI #AIResearch

197 views 12:04

ML Research Hub

✨METIS: Mentoring Engine for Thoughtful Inquiry & Solutions

📝 Summary:
METIS is an AI mentor for undergraduate research writing, outperforming GPT-5 and Claude Sonnet 4.5. It yields higher student scores and better document-grounded outputs, despite minor tool routing challenges.

🔹 Publication Date: Published on Jan 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13075
• PDF: https://arxiv.org/pdf/2601.13075

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #EdTech #LLM #ResearchWriting #Mentoring

218 views 13:05

ML Research Hub

✨Ark: An Open-source Python-based Framework for Robot Learning

📝 Summary:
ARK is a Python-first, open-source framework simplifying robotics development by integrating modern imitation learning and seamless simulation-to-physical robot interactions. It provides a Gym-style interface and reusable modules, lowering entry barriers for autonomous robot deployment.

🔹 Publication Date: Published on Jun 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.21628
• PDF: https://arxiv.org/pdf/2506.21628
• Project Page: https://robotics-ark.github.io/ark_robotics.github.io/
• Github: https://github.com/orgs/Robotics-Ark/repositories

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

193 views 13:05

ML Research Hub

✨Finally Outshining the Random Baseline: A Simple and Effective Solution for Active Learning in 3D Biomedical Imaging

📝 Summary:
Class-stratified Scheduled Power Predictive Entropy (ClaSP PE) is a novel active learning strategy that improves 3D biomedical image segmentation by addressing class imbalance and selection redundancy...
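
For readers unfamiliar with the "predictive entropy" part of the name, here is a minimal sketch of ranking unlabeled samples by the entropy of their predicted class distribution. It covers only that generic building block; the class stratification, scheduling, and power components of ClaSP PE are not shown because the summary above is truncated. The patch names and probabilities are hypothetical.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_uncertainty(predictions):
    """Return sample ids sorted from most to least uncertain."""
    return sorted(
        predictions,
        key=lambda name: predictive_entropy(predictions[name]),
        reverse=True,
    )

# Hypothetical softmax outputs for three candidate image patches.
predictions = {
    "patch_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "patch_b": [0.40, 0.35, 0.25],  # near-uniform prediction, high entropy
    "patch_c": [0.70, 0.20, 0.10],
}

ranking = rank_by_uncertainty(predictions)
print(ranking)
```

An active learning loop would send the top-ranked patches to annotators first; plain entropy ranking like this tends to pick redundant, majority-class samples, which is the weakness the summary says ClaSP PE addresses.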

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13677
• PDF: https://arxiv.org/pdf/2601.13677
• Github: https://github.com/MIC-DKFZ/nnActive/tree/nnActive_v2

🔹 Models citing this paper:
• https://huggingface.co/nnActive/Liver
• https://huggingface.co/nnActive/ToothFairy2_All
• https://huggingface.co/nnActive/word

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

❤1

208 views 14:05

ML Research Hub

✨Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

📝 Summary:
Researchers developed the Rank-Surprisal Ratio (RSR) metric to better select reasoning trajectories for teaching student LLMs. RSR balances alignment and informativeness, strongly correlating with improved student performance and outperforming prior methods in distillation.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14249
• PDF: https://arxiv.org/pdf/2601.14249
• Github: https://github.com/UmeanNever/RankSurprisalRatio

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

❤1

170 views 15:05

ML Research Hub

✨Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning

📝 Summary:
GRADFILTERING is an uncertainty-aware data selection framework for instruction tuning that uses gradient signal-to-noise ratio to improve LLM adaptation efficiency and performance.
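
To give a feel for what a gradient signal-to-noise ratio can look like, here is a minimal sketch using a generic definition: the absolute mean of several gradient estimates divided by their standard deviation. This is an assumption for illustration, not the formula from the paper, and the example names, gradient values, and `select_top_k` helper are hypothetical.

```python
import statistics

def gradient_snr(grad_samples, eps=1e-12):
    """Generic SNR: |mean| of repeated gradient estimates over their std."""
    mean = statistics.fmean(grad_samples)
    std = statistics.pstdev(grad_samples)
    return abs(mean) / (std + eps)

def select_top_k(candidates, k):
    """Keep the k training examples whose gradient estimates are most consistent."""
    ranked = sorted(candidates, key=lambda name: gradient_snr(candidates[name]), reverse=True)
    return ranked[:k]

# Hypothetical scalar gradient estimates per example (e.g. from repeated noisy passes).
candidates = {
    "ex1": [1.0, 1.1, 0.9],    # strong, stable signal
    "ex2": [1.0, -1.0, 0.5],   # estimates disagree in sign: noisy
    "ex3": [0.2, 0.25, 0.15],  # weak but stable signal
}

selected = select_top_k(candidates, k=2)
print(selected)
```

Under this definition an example is kept when its gradient estimates agree with each other, regardless of how large they are, so the noisy `ex2` is dropped even though its individual gradients are large.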

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13697
• PDF: https://arxiv.org/pdf/2601.13697

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

❤1

200 views 15:06

ML Research Hub

Forwarded from Data Analytics

This repository collects everything you need to use AI and LLMs in your projects.

120+ libraries, organized by development stages:

→ Model training, fine-tuning, and evaluation
→ Deploying applications with LLMs and RAG
→ Fast and scalable model launch
→ Data extraction, crawlers, and scrapers
→ Creating autonomous LLM agents
→ Prompt optimization and security

Repo: https://github.com/KalyanKS-NLP/llm-engineer-toolkit

🥺

https://t.iss.one/DataAnalyticsX

❤3

137 views 16:10

ML Research Hub

✨Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

📝 Summary:
This survey presents 'Locate, Steer, and Improve' as an actionable framework for mechanistic interpretability in LLMs. It shifts MI from an observational science to a systematic methodology for optimizing LLMs, leading to tangible improvements in their alignment, capability, and efficiency.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14004
• PDF: https://arxiv.org/pdf/2601.14004

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLM #MechanisticInterpretability #AI #AIAlignment #MachineLearning

178 views 17:06

ML Research Hub

✨InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

📝 Summary:
Intervention Training improves large language model reasoning by enabling fine-grained credit assignment through targeted corrections that localize errors and enhance reinforcement learning performanc...

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14209
• PDF: https://arxiv.org/pdf/2601.14209

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

176 views 17:06

ML Research Hub

✨DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems

📝 Summary:
A comprehensive benchmark for evaluating LLM-based data agents across diverse data science tasks demonstrates superior performance for multimodal agents while highlighting persistent challenges in uns...

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.13591
• PDF: https://arxiv.org/pdf/2601.13591

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

248 views 17:06

ML Research Hub

✨RemoteVAR: Autoregressive Visual Modeling for Remote Sensing Change Detection

📝 Summary:
RemoteVAR is a visual autoregressive framework for remote sensing change detection that improves upon existing methods through multi-resolution feature fusion and autoregressive training tailored for ...

🔹 Publication Date: Published on Jan 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11898
• PDF: https://arxiv.org/pdf/2601.11898
• Github: https://github.com/yilmazkorkmaz1/RemoteVAR

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

268 views 18:06
