ML Research Hub
32.3K subscribers
6.74K photos
473 videos
24 files
7.35K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors

📝 Summary:
AnomalyVFM is a framework that enhances vision foundation models for zero-shot anomaly detection through synthetic dataset generation and parameter-efficient adaptation, achieving superior performance...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.20524
• PDF: https://arxiv.org/pdf/2601.20524
• Project Page: https://maticfuc.github.io/anomaly_vfm/
• Github: https://github.com/MaticFuc/AnomalyVFM

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

📝 Summary:
VRAG-RL introduces a reinforcement learning framework to empower vision-language models for understanding visually rich information. It uses adaptive visual perception and query optimization to enhance retrieval and reasoning, overcoming limitations of current RAG methods.

🔹 Publication Date: Published on May 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.22019
• PDF: https://arxiv.org/pdf/2505.22019
• Github: https://github.com/Alibaba-NLP/VRAG

🔹 Models citing this paper:
https://huggingface.co/Qiuchen-Wang/Qwen2.5-VL-7B-VRAG

==================================

#RAG #ReinforcementLearning #VisionLanguageModels #ComputerVision #AI
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization

📝 Summary:
Researchers investigate how reinforcement learning with verifiable rewards can improve visual reasoning accuracy while maintaining logical consistency and visual grounding in multimodal reasoning mode...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08476
• PDF: https://arxiv.org/pdf/2604.08476

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Small Vision-Language Models are Smart Compressors for Long Video Understanding

📝 Summary:
Tempo is an efficient framework that compresses long videos for multimodal understanding by using a small vision-language model for temporal compression and adaptive token allocation to maintain inten...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08120
• PDF: https://arxiv.org/pdf/2604.08120
• Project Page: https://feielysia.github.io/tempo-page/
• Github: https://feielysia.github.io/tempo-page/

🔹 Models citing this paper:
https://huggingface.co/Vision-CAIR/Tempo-6B

Spaces citing this paper:
https://huggingface.co/spaces/Vision-CAIR/Tempo

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation

📝 Summary:
A geometry-guided method for multi-camera depth estimation that improves consistency across overlapping images using cylindrical spatial attention mechanisms.

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16428
• PDF: https://arxiv.org/pdf/2511.16428
• Project Page: https://abualhanud.github.io/CylinderDepthPage/
• Github: https://abualhanud.github.io/CylinderDepthPage/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Training a Student Expert via Semi-Supervised Foundation Model Distillation

📝 Summary:
A semi-supervised distillation framework compresses vision foundation models into compact experts for instance segmentation. It uses limited labeled and abundant unlabeled data, employing a novel instance-aware contrastive loss. The student models outperform their teachers and state-of-the-art semi-supervised knowledge distillation (SSKD) methods.

🔹 Publication Date: Published on Apr 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.03841
• PDF: https://arxiv.org/pdf/2604.03841

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

📝 Summary:
Post-trained model capabilities can be transferred across different model scales through linear alignment of latent subspace directions without requiring retraining.

🔹 Publication Date: Published on Apr 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06377
• PDF: https://arxiv.org/pdf/2604.06377

==================================

#MachineLearning #AI #DeepLearning #ModelTransfer #SubspaceAlignment
QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration

📝 Summary:
QEIL v2 improves energy efficiency and performance of large language model inference on edge devices through physics-based adaptive optimization and workload-aware resource allocation.

🔹 Publication Date: Published on Apr 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06057
• PDF: https://arxiv.org/pdf/2602.06057

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images

📝 Summary:
Current vision-language models struggle to infer structured cultural metadata from images consistently across cultures. This paper introduces a new cross-cultural benchmark for this task. Results show models give fragmented, inconsistent, and weakly grounded predictions, revealing significant lim...

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07338
• PDF: https://arxiv.org/pdf/2604.07338
• Project Page: https://huggingface.co/datasets/Carolyn-Jiang/Metadata-Inference

Datasets citing this paper:
https://huggingface.co/datasets/Carolyn-Jiang/Metadata-Inference

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
📝 12 Essential Articles for Data Scientists

🏷 Article: Seq2Seq Learning with NN
https://arxiv.org/pdf/1409.3215
An introduction to sequence-to-sequence (Seq2Seq) models, the foundation of neural machine translation.

🏷 Article: GANs
https://arxiv.org/pdf/1406.2661
An introduction to Generative Adversarial Networks (GANs) and the concept of generating synthetic data. This forms the basis for creating images and videos with artificial intelligence.

🏷 Article: Attention is All You Need
https://arxiv.org/pdf/1706.03762
This paper revolutionized natural language processing. It introduced the Transformer architecture, which underlies GPT, BERT, and modern large language models.
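As an illustration, here is a minimal pure-Python sketch of the scaled dot-product attention at the heart of the Transformer (plain lists instead of tensors, a single head, no masking; real implementations use batched matrix multiplies):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: lists of vectors (lists of floats); K and V have equal length.
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of the query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```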

🏷 Article: Deep Residual Learning
https://arxiv.org/pdf/1512.03385
This work introduced the ResNet model, enabling neural networks to achieve greater depth and accuracy without compromising the learning process.
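The core idea fits in a few lines: the block computes a residual F(x) and adds the input back, so an identity mapping is trivial to represent. A toy fully-connected sketch (the paper's blocks are convolutional):

```python
def relu(x):
    return max(0.0, x)

def residual_block(x, w1, w2):
    # x: input vector; w1, w2: square weight matrices (lists of rows).
    # Inner transform F(x): two linear layers with a ReLU in between.
    h = [relu(sum(wij * xj for wij, xj in zip(row, x))) for row in w1]
    f = [sum(wij * hj for wij, hj in zip(row, h)) for row in w2]
    # Skip connection: the block learns a residual added to x, so an
    # all-zero F leaves the input unchanged (identity mapping).
    return [fi + xi for fi, xi in zip(f, x)]
```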

🏷 Article: Batch Normalization
https://arxiv.org/pdf/1502.03167
This paper introduced a technique that facilitates faster and more stable training of neural networks.
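A sketch of the normalization step for one feature across a mini-batch (training-time statistics only; a real layer also keeps running estimates for inference):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # batch: scalar activations of one feature across a mini-batch.
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    # Normalize to zero mean / unit variance, then apply the learnable
    # scale (gamma) and shift (beta).
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```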

🏷 Article: Dropout
https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
A straightforward method designed to prevent overfitting in neural networks.
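The method itself fits in a few lines. This sketch uses the common "inverted dropout" convention, scaling survivors at train time so inference needs no change:

```python
import random

def dropout(xs, p, training=True, rng=random):
    # At train time, zero each unit with probability p and scale the
    # survivors by 1/(1-p) so the expected activation is unchanged.
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```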

🏷 Article: ImageNet Classification with DCNN
https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
The landmark AlexNet paper, whose deep convolutional network decisively won the 2012 ImageNet challenge and sparked the modern deep learning boom in image recognition.

🏷 Article: Support-Vector Machines
https://link.springer.com/content/pdf/10.1007/BF00994018.pdf
This seminal work introduced the Support Vector Machine (SVM) algorithm, a widely utilized method for data classification.
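A toy illustration of the soft-margin idea: one subgradient step on the regularized hinge loss. This is a linear, primal-form sketch only; the paper's formulation solves the dual and supports kernels:

```python
def hinge_subgradient_step(w, b, x, y, lr=0.1, lam=0.01):
    # One subgradient step on lam/2 * ||w||^2 + max(0, 1 - y*(w.x + b)),
    # with label y in {-1, +1}.
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    if margin < 1:  # point inside the margin: hinge term is active
        w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
        b = b + lr * y
    else:           # correctly classified with margin: only regularize
        w = [wi - lr * lam * wi for wi in w]
    return w, b
```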

🏷 Article: A Few Useful Things to Know About ML
https://homes.cs.washington.edu/~pedro/papers/cacm12.pdf
A comprehensive collection of practical and empirical insights regarding machine learning.

🏷 Article: Gradient Boosting Machine
https://www.cse.iitb.ac.in/~soumen/readings/papers/Friedman1999GreedyFuncApprox.pdf
This paper introduced the "Gradient Boosting" method, which serves as the foundation for many modern machine learning models, including XGBoost and LightGBM.
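A minimal sketch of the idea for squared loss, where the negative gradient is simply the residual, so each round fits a small learner (here a 1-D decision stump) to the current residuals:

```python
def fit_stump(xs, residuals):
    # Find the threshold split minimizing squared error against the
    # residuals, predicting the mean residual on each side.
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - (lmean if x <= t else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def gradient_boost(xs, ys, n_rounds=20, lr=0.5):
    # Each round fits a stump to the residuals of the current ensemble,
    # then adds it with a shrinkage factor lr.
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)
```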

🏷 Article: Latent Dirichlet Allocation
https://jmlr.org/papers/volume3/blei03a/blei03a.pdf
This work introduced a model for text analysis capable of identifying the topics discussed within an article.

🏷 Article: Random Forests
https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
This paper introduced the "Random Forest" algorithm, a powerful ensemble method that aggregates many decorrelated decision trees to achieve higher accuracy than any single tree.
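A toy sketch of the two ingredients, bootstrap sampling and majority voting, using depth-1 "trees" on scalar inputs (real forests also subsample features at each split):

```python
import random

def bootstrap_sample(data, rng):
    # Sample n points with replacement (the "bagging" step).
    return [rng.choice(data) for _ in data]

def fit_stump(data):
    # data: list of (x, label) with scalar x; pick the threshold whose
    # majority-vote split misclassifies the fewest points.
    best = None
    for t, _ in data:
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        lvote = max(set(left), key=left.count) if left else 0
        rvote = max(set(right), key=right.count) if right else 0
        err = sum((lvote if x <= t else rvote) != y for x, y in data)
        if best is None or err < best[0]:
            best = (err, t, lvote, rvote)
    _, t, lvote, rvote = best
    return lambda x: lvote if x <= t else rvote

def random_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]
    def predict(x):
        votes = [tree(x) for tree in trees]
        return max(set(votes), key=votes.count)  # majority vote
    return predict
```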

https://t.iss.one/CodeProgrammer 🌟
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

📝 Summary:
RefineAnything is a multimodal diffusion model for region-specific image refinement. It fixes local detail collapse while strictly preserving backgrounds using a Focus-and-Refine strategy and boundary-aware loss. This provides a practical solution for high-precision local editing.

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06870
• PDF: https://arxiv.org/pdf/2604.06870
• Project Page: https://limuloo.github.io/RefineAnything/
• Github: https://github.com/limuloo/RefineAnything

==================================

#DiffusionModels #ImageEditing #ComputerVision #DeepLearning #GenerativeAI
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

📝 Summary:
Matrix-Game 3.0 is a memory-augmented diffusion model achieving real-time 720p interactive video generation with long-term temporal consistency. It uses an advanced data engine, a self-correction training framework with memory, and efficient inference strategies. This enables practical, industria...

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08995
• PDF: https://arxiv.org/pdf/2604.08995
• Project Page: https://matrix-game-v3.github.io/

==================================

#DiffusionModels #VideoGeneration #RealTimeAI #GenerativeAI #MachineLearning
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation

📝 Summary:
CT-1 is a Vision-Language-Camera model that improves camera-controllable video generation. It uses a Diffusion Transformer and Wavelet Regularization Loss to accurately estimate camera trajectories, enabling precise video synthesis. This achieves 25.7% better accuracy than prior methods.

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09201
• PDF: https://arxiv.org/pdf/2604.09201
• Project Page: https://gulucaptain.github.io/Camera-Transformer-1/
• Github: https://github.com/gulucaptain/Camera-Transformer-1

==================================

#AI #VideoGeneration #ComputerVision #DiffusionModels #VisionLanguageModels
ELT: Elastic Looped Transformers for Visual Generation

📝 Summary:
Elastic Looped Transformers utilize recurrent transformer architecture with weight-sharing and intra-loop self-distillation to achieve parameter-efficient visual generation with adjustable computation...

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09168
• PDF: https://arxiv.org/pdf/2604.09168

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

📝 Summary:
VisionFoundry creates synthetic visual question answering data using LLMs and text-to-image models to improve VLM visual perception. Training with this targeted data significantly boosts model performance on visual perception benchmarks like MMVP and CV-Bench-3D.

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09531
• PDF: https://arxiv.org/pdf/2604.09531
• Project Page: https://zlab-princeton.github.io/VisionFoundry/

==================================

#VLM #VisualPerception #SyntheticData #LLM #AI
EXAONE 4.5 Technical Report

📝 Summary:
EXAONE 4.5 is LG AI Research's first open-weight vision language model, integrating a visual encoder into EXAONE 4.0. It enhances document understanding and general language capabilities through targeted data and extended context, outperforming similar models in document tasks.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08644
• PDF: https://arxiv.org/pdf/2604.08644
• Github: https://github.com/LG-AI-EXAONE/EXAONE-4.5

==================================

#VisionLanguageModel #AI #DocumentUnderstanding #MultimodalAI #OpenSourceAI
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

📝 Summary:
FORGE introduces a multimodal manufacturing dataset, revealing that MLLM performance is limited by domain-specific knowledge, not visual grounding. Fine-tuning on FORGE's annotations significantly improves accuracy, offering a path for domain-adapted MLLMs.

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07413
• PDF: https://arxiv.org/pdf/2604.07413
• Project Page: https://ai4manufacturing.github.io/forge-web/
• Github: https://github.com/AI4Manufacturing/FORGE

Datasets citing this paper:
https://huggingface.co/datasets/AI4Manufacturing/forge

==================================

#FORGE #MLLM #ManufacturingAI #MultimodalAI #DomainAdaptation
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video

📝 Summary:
A novel cross-modal emotion transfer approach generates expressive talking face videos by modeling emotion semantic vectors between speech and visual feature spaces, achieving superior emotion accurac...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07786
• PDF: https://arxiv.org/pdf/2604.07786
• Project Page: https://chanhyeok-choi.github.io/C-MET/
• Github: https://github.com/ChanHyeok-Choi/C-MET

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
WildDet3D: Scaling Promptable 3D Detection in the Wild

📝 Summary:
WildDet3D is a unified architecture for open-world 3D object detection, accepting multiple prompt types and integrating geometric cues. It leverages WildDet3D-Data, the largest 3D dataset, to achieve state-of-the-art performance across benchmarks, with significant gains from incorporating depth i...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08626
• PDF: https://arxiv.org/pdf/2604.08626
• Project Page: https://allenai.github.io/WildDet3D/
• Github: https://github.com/allenai/WildDet3D

==================================

#3DObjectDetection #ComputerVision #DeepLearning #AI #Datasets
Structured Causal Video Reasoning via Multi-Objective Alignment

📝 Summary:
This paper introduces Structured Event Facts for explicit causal video reasoning, moving beyond unstructured methods. It uses a multi-objective reinforcement learning pipeline to balance training goals, leading to Factum-4B. This model achieves reliable, stronger performance on complex temporal v...

🔹 Publication Date: Published on Apr 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.04415
• PDF: https://arxiv.org/pdf/2604.04415

==================================

#CausalAI #VideoReasoning #ReinforcementLearning #ComputerVision #AIResearch
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

📝 Summary:
ECHO is an efficient diffusion model for chest X-ray report generation. It achieves fast one-step-per-block inference using Direct Conditional Distillation and Response-Asymmetric Diffusion. ECHO delivers an 8x speedup and improved accuracy over state-of-the-art methods.

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09450
• PDF: https://arxiv.org/pdf/2604.09450
• Project Page: https://echo-midea-airc.github.io/
• Github: https://github.com/clf28/ECHO

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research