ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
CPPO: Contrastive Perception for Vision Language Policy Optimization

📝 Summary:
CPPO improves vision-language model fine-tuning by detecting perception tokens through entropy shifts. It then applies a Contrastive Perception Loss to those tokens to enhance multimodal reasoning, outperforming prior methods while being more efficient.
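
The entropy-shift idea in the summary can be illustrated with a toy sketch. This is not the paper's implementation. It assumes the shift is measured per token position between the model's output distribution on the original image and on a perturbed image. The function names, the absolute-value comparison, and the 0.5-nat threshold are all hypothetical choices made for illustration.

```python
import math

def entropy(logits):
    """Shannon entropy (in nats) of softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def perception_token_mask(clean_logits, perturbed_logits, threshold=0.5):
    """Flag positions whose entropy changes by more than `threshold` nats.

    Assumption: a large change under image perturbation marks a token
    that depends on visual input. The threshold is a hypothetical value.
    """
    shifts = [abs(entropy(p) - entropy(c))
              for c, p in zip(clean_logits, perturbed_logits)]
    return [s > threshold for s in shifts], shifts

# Toy data: 3 token positions, vocabulary of 4.
clean = [[8.0, 0.0, 0.0, 0.0], [5.0, 5.0, 0.0, 0.0], [0.0, 6.0, 0.0, 0.0]]
perturbed = [[8.0, 0.0, 0.0, 0.0], [5.0, 5.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

mask, shifts = perception_token_mask(clean, perturbed)
print(mask)  # [False, False, True]
```

Only the third position is flagged: its distribution goes from confident to uniform once the image is perturbed, while the first two are unchanged.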

🔹 Publication Date: Published on Jan 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00501
• PDF: https://arxiv.org/pdf/2601.00501

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModels #MultimodalAI #ContrastiveLearning #DeepLearning #AIResearch
Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping

📝 Summary:
Prithvi-CAFE improves flood mapping by integrating a pretrained Geo-Foundation Model encoder with a parallel CNN branch featuring attention modules. This hybrid approach effectively captures both global context and critical local details, achieving state-of-the-art results on Sen1Flood11 and Floo...

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02315
• PDF: https://arxiv.org/pdf/2601.02315
• Github: https://github.com/Sk-2103/Prithvi-CAFE

#FloodMapping #DeepLearning #GeoAI #RemoteSensing #ComputerVision
Unified Thinker: A General Reasoning Modular Core for Image Generation

📝 Summary:
Unified Thinker introduces a modular reasoning core for image generation, decoupling a Thinker from the generator. It uses reinforcement learning to optimize visual correctness, substantially improving image reasoning and generation quality.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03127
• PDF: https://arxiv.org/pdf/2601.03127

#ImageGeneration #AIResearch #ReinforcementLearning #DeepLearning #GenerativeAI
SimpleMem: Efficient Lifelong Memory for LLM Agents

📝 Summary:
SimpleMem is an efficient memory framework for LLM agents that uses semantic lossless compression. It employs a three-stage pipeline to distill, consolidate, and retrieve historical experiences efficiently. SimpleMem significantly improves accuracy and reduces token consumption by up to 30-fold c...

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02553
• PDF: https://arxiv.org/pdf/2601.02553
• Project Page: https://aiming-lab.github.io/SimpleMem-Page/

#LLM #AIAgents #LifelongLearning #AI #DeepLearning
RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization

📝 Summary:
RGS-SLAM is a robust Gaussian-splatting SLAM framework that uses a one-shot, correspondence-to-Gaussian initialization with DINOv3 descriptors. This method improves stability, accelerates convergence, and yields higher rendering fidelity and accuracy compared to existing systems.

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00705
• PDF: https://arxiv.org/pdf/2601.00705

#SLAM #GaussianSplatting #ComputerVision #Robotics #DeepLearning
Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction

📝 Summary:
Gen3R combines reconstruction and video diffusion models to generate 3D scenes. It produces RGB videos and 3D geometry by aligning geometric and appearance latents. This achieves state-of-the-art results and improves reconstruction robustness.

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04090
• PDF: https://arxiv.org/pdf/2601.04090
• Project Page: https://xdimlab.github.io/Gen3R/

#3DGeneration #DiffusionModels #ComputerVision #3DReconstruction #DeepLearning
VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding

📝 Summary:
VERSE analyzes Vision-Language Models by visualizing latent representations to find error-prone clusters. It guides synthetic data generation to boost performance in these areas. This significantly improves F1 scores, allowing on-premise models to match or exceed top SaaS solutions.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05125
• PDF: https://arxiv.org/pdf/2601.05125
• Project Page: https://huggingface.co/spaces/de-Rodrigo/Embeddings
• Github: https://github.com/nachoDRT/VrDU-Doctor

#VisionLanguageModels #DeepLearning #EmbeddingVisualization #SyntheticData #DocumentUnderstanding
ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting

📝 Summary:
ProFuse enhances open-vocabulary 3DGS understanding via an efficient, context-aware framework. It uses a pre-registration phase to fuse semantic features onto Gaussians for cross-view coherence, completing semantic attachment twice as fast as SOTA.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04754
• PDF: https://arxiv.org/pdf/2601.04754
• Project Page: https://chiou1203.github.io/ProFuse/

#3DGaussianSplatting #ComputerVision #OpenVocabulary #3DReconstruction #DeepLearning
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

📝 Summary:
Targeting high-entropy tokens in vision-language models causes significant semantic degradation with reduced budgets. This attack strategy reveals critical transferable safety risks across different VLM architectures.
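
To make "targeting high-entropy tokens" concrete, here is a minimal sketch of the selection step only, not the attack from the paper. It assumes we already have a probability distribution for each decoding step and a fixed budget of positions. The helper names and the toy distributions are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_high_entropy_positions(step_probs, budget):
    """Return the `budget` most uncertain decoding steps, in ascending order."""
    ranked = sorted(range(len(step_probs)),
                    key=lambda i: token_entropy(step_probs[i]),
                    reverse=True)
    return sorted(ranked[:budget])

# Toy data: 4 decoding steps, vocabulary of 4.
step_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident step, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform step, maximum entropy
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.40, 0.10, 0.10],
]

targets = select_high_entropy_positions(step_probs, budget=2)
print(targets)  # [1, 3]
```

A smaller budget means fewer positions to act on, which matches the summary's point that concentrating on a few uncertain tokens can be enough.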

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21815
• PDF: https://arxiv.org/pdf/2512.21815

#VisionLanguageModels #AdversarialAI #AIsecurity #MachineLearning #DeepLearning
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. Training on a single, carefully designed sample, an approach called polymath learning, significantly enhances multidisciplinary reasoning and outperforms traditional methods that rely on large datasets. The findings suggest sample quality and design...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111

#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning