✨Video-CoE: Reinforcing Video Event Prediction via Chain of Events
📝 Summary:
Video-CoE introduces a Chain of Events (CoE) paradigm to improve video event prediction. It addresses MLLM limitations in logical reasoning and visual utilization by constructing temporal event chains and using enhanced training. CoE achieves state-of-the-art performance on video event prediction (VEP) benchmarks.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14935
• PDF: https://arxiv.org/pdf/2603.14935
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoEventPrediction #ChainOfEvents #MLLM #ComputerVision #AI
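Below is a minimal, illustrative sketch of the chain-of-events idea from the summary above: representing a clip's observed events as an ordered temporal chain and formatting a next-event prediction prompt from it. The Event fields and prompt wording are assumptions for illustration, not the paper's actual pipeline or training setup.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    start_s: float          # event start time in seconds
    end_s: float            # event end time in seconds
    description: str        # short natural-language description of the event

def build_coe_prompt(events: List[Event], question: str) -> str:
    """Format observed events as an ordered temporal chain, then ask the
    model to continue the chain by predicting the next event."""
    chain = " -> ".join(
        f"[{e.start_s:.1f}-{e.end_s:.1f}s] {e.description}"
        for e in sorted(events, key=lambda e: e.start_s)
    )
    return (
        "Observed chain of events:\n"
        f"{chain}\n\n"
        f"Question: {question}\n"
        "Reason over the chain step by step, then predict the next event."
    )

if __name__ == "__main__":
    events = [
        Event(0.0, 3.2, "a person fills a kettle with water"),
        Event(3.2, 7.5, "the kettle is placed on the stove"),
        Event(7.5, 12.0, "the person takes a mug from the cupboard"),
    ]
    print(build_coe_prompt(events, "What is the person most likely to do next?"))
```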
✨Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass
📝 Summary:
CHROMM is a unified framework that jointly reconstructs cameras, scene point clouds, and human meshes from multi-person multi-view videos. It integrates strong priors, handles scale discrepancies, and uses multi-view fusion for faster, more robust human-scene reconstruction.
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12789
• PDF: https://arxiv.org/pdf/2603.12789
• Project Page: https://nstar1125.github.io/chromm
• Github: https://nstar1125.github.io/chromm/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#3DReconstruction #ComputerVision #HumanSceneReconstruction #MultiViewVideo #AIResearch
✨V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
📝 Summary:
V-JEPA 2.1 is a self-supervised model learning dense visual representations for images and videos. It combines dense predictive loss, deep self-supervision, multi-modal tokenizers, and scaling to achieve state-of-the-art performance across various benchmarks, significantly advancing visual understanding.
🔹 Publication Date: Published on Mar 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14482
• PDF: https://arxiv.org/pdf/2603.14482
• Project Page: https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/
• Github: https://github.com/facebookresearch/vjepa2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SelfSupervisedLearning #ComputerVision #DeepLearning #AI #VideoUnderstanding
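A toy PyTorch sketch of a JEPA-style dense predictive objective as described above: a context encoder sees only unmasked tokens, an EMA target encoder sees everything, and a predictor regresses the target embeddings at the masked positions (a latent-space loss, no pixel reconstruction). The tiny transformer, dimensions, and momentum value are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_TOKENS, BATCH = 64, 16, 2

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True), num_layers=2)
target_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True), num_layers=2)
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)                           # target is updated by EMA, not gradients
predictor = nn.Linear(DIM, DIM)

tokens = torch.randn(BATCH, N_TOKENS, DIM)            # stand-in for patch/tubelet embeddings
mask = torch.zeros(BATCH, N_TOKENS, dtype=torch.bool)
mask[:, N_TOKENS // 2:] = True                        # mask the second half of the tokens

ctx = encoder(tokens * (~mask).unsqueeze(-1))         # context path: masked tokens zeroed out
with torch.no_grad():
    tgt = target_encoder(tokens)                      # target path: full input, gradient-free

pred = predictor(ctx)
loss = F.smooth_l1_loss(pred[mask], tgt[mask])        # dense loss only on masked positions
loss.backward()
print(f"dense predictive loss: {loss.item():.4f}")

# EMA update of the target encoder (momentum 0.99), as in JEPA-style training.
with torch.no_grad():
    for p_t, p_s in zip(target_encoder.parameters(), encoder.parameters()):
        p_t.mul_(0.99).add_(p_s, alpha=0.01)
```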
✨Prompt-Free Universal Region Proposal Network
📝 Summary:
PF-RPN is a novel network that identifies potential objects without needing external prompts, improving flexibility. It uses Sparse Image-Aware Adapters and Cascade Self-Prompting to localize objects, validated across 19 datasets. This method works across diverse domains with limited data.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17554
• PDF: https://arxiv.org/pdf/2603.17554
• Github: https://github.com/tangqh03/PF-RPN
🔹 Models citing this paper:
• https://huggingface.co/tangqh/PF-RPN
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ObjectDetection #ComputerVision #DeepLearning #RPN #PromptFreeAI
✨EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
📝 Summary:
EffectErase is a new video object removal method that effectively erases dynamic objects and their visual effects. It introduces VOR, a large dataset for training, and uses reciprocal learning with task-aware guidance for high-quality results.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19224
• PDF: https://arxiv.org/pdf/2603.19224
• Project Page: https://henghuiding.com/EffectErase/
• Github: https://github.com/FudanCVL/EffectErase
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoEditing #ComputerVision #ObjectRemoval #DeepLearning #AI
✨VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction
📝 Summary:
VID-AD is a dataset for logical anomaly detection in industrial inspection, specifically addressing challenges from visual distractions. A new language-based framework is also proposed, which uses text descriptions and contrastive learning to capture logical attributes.
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13964
• PDF: https://arxiv.org/pdf/2603.13964
• Github: https://github.com/nkthiroto/VID-AD
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AnomalyDetection #IndustrialInspection #ComputerVision #MachineLearning #Datasets
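A rough sketch of the language-based idea described above: aligning image embeddings with text descriptions of logical attributes (e.g. "two screws present", "cable routed left") via a symmetric contrastive loss. The random embeddings stand in for a pretrained vision/text encoder such as CLIP; this is not the paper's exact framework.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: the i-th image should match the i-th attribute text."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

batch, dim = 8, 256
image_features = torch.randn(batch, dim, requires_grad=True)   # stand-in image embeddings
attribute_text_features = torch.randn(batch, dim)               # stand-in attribute-text embeddings
loss = contrastive_loss(image_features, attribute_text_features)
loss.backward()
print(f"contrastive alignment loss: {loss.item():.4f}")
```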
✨DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
📝 Summary:
DreamPartGen generates 3D objects by modeling part geometry and appearance with Duplex Part Latents. It captures inter-part relationships using Relational Semantic Latents for improved text-shape alignment. A co-denoising process ensures consistency and achieves state-of-the-art results.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19216
• PDF: https://arxiv.org/pdf/2603.19216
• Project Page: https://plan-lab.github.io/dreampartgen
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#3DGeneration #GenerativeAI #DeepLearning #ComputerVision #TextTo3D
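A heavily simplified sketch of co-denoising several part latents jointly, so that each part's update is conditioned on all the others and the parts stay mutually consistent. The tiny MLP denoiser and linear schedule are placeholders; the paper's Duplex Part Latents and Relational Semantic Latents are not modeled here.

```python
import torch
import torch.nn as nn

N_PARTS, DIM, STEPS = 4, 32, 10

# Placeholder denoiser: sees all part latents plus the shared timestep at once.
denoiser = nn.Sequential(
    nn.Linear(N_PARTS * DIM + 1, 128), nn.GELU(), nn.Linear(128, N_PARTS * DIM))

latents = torch.randn(N_PARTS, DIM)              # one latent per object part, starting from noise
for step in reversed(range(STEPS)):
    t = torch.tensor([step / STEPS])             # normalized timestep shared by all parts
    joint = torch.cat([latents.flatten(), t])    # condition every part on every other part
    eps = denoiser(joint).view(N_PARTS, DIM)     # predicted noise for all parts at once
    latents = latents - (1.0 / STEPS) * eps      # crude shared update step

print("denoised part latents:", tuple(latents.shape))
```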
✨WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
📝 Summary:
This research investigates if 2D foundation image models inherently possess 3D world modeling capabilities. It proposes an agentic framework to leverage this, demonstrating that 2D models can synthesize expansive, consistent 3D worlds.
🔹 Publication Date: Published on Mar 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19708
• PDF: https://arxiv.org/pdf/2603.19708
• Project Page: https://ziyaerkoc.com/worldagents/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #ComputerVision #3DWorldModels #GenerativeAI #FoundationModels
✨LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
📝 Summary:
LumosX enhances text-to-video generation by improving face-attribute alignment and subject consistency. It uses a new data pipeline to infer subject dependencies and Relational Attention mechanisms to explicitly link subjects with attributes, achieving state-of-the-art personalized multi-subject video generation.
🔹 Publication Date: Published on Mar 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20192
• PDF: https://arxiv.org/pdf/2603.20192
• Project Page: https://jiazheng-xing.github.io/lumosx-home/
• Github: https://github.com/alibaba-damo-academy/Lumos-Custom
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#TextToVideo #VideoGeneration #PersonalizedAI #ComputerVision #DeepLearning
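A small illustration of the relate-each-subject-to-its-own-attributes idea above: an attention mask that lets each subject's query token attend only to that subject's attribute tokens, so attributes cannot leak across identities. The token contents and shapes are invented for the demo and are not the paper's Relational Attention implementation.

```python
import torch
import torch.nn.functional as F

dim = 32
subjects = {"person_A": 3, "person_B": 2}            # number of attribute tokens per subject

queries = torch.randn(len(subjects), dim)            # one query token per subject
keys = torch.randn(sum(subjects.values()), dim)      # attribute tokens, grouped per subject
values = torch.randn_like(keys)

# Build a (num_subjects, num_attributes) mask: True where attention is allowed.
mask = torch.zeros(len(subjects), keys.size(0), dtype=torch.bool)
offset = 0
for row, n_attr in enumerate(subjects.values()):
    mask[row, offset:offset + n_attr] = True
    offset += n_attr

scores = queries @ keys.t() / dim ** 0.5
scores = scores.masked_fill(~mask, float("-inf"))    # block subject-to-foreign-attribute links
attn = F.softmax(scores, dim=-1)
out = attn @ values
print("attention pattern (rows = subjects):")
print(attn.round(decimals=2))
```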
✨Teaching an Agent to Sketch One Part at a Time
📝 Summary:
Researchers developed an agent that generates vector sketches incrementally, one part at a time. It uses a multi-modal language model and process-reward reinforcement learning with a new part-annotated dataset. This enables controllable and editable text-to-vector sketch generation.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19500
• PDF: https://arxiv.org/pdf/2603.19500
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #GenerativeAI #MachineLearning #ComputerVision #ReinforcementLearning
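A toy sketch of process-reward reinforcement learning for part-by-part generation, as in the summary above: the policy emits one "part" at a time and receives a reward after every step (not only at the end), updated here with plain REINFORCE. The part vocabulary, state encoding, and reward are placeholders, not the paper's model or dataset.

```python
import torch
import torch.nn as nn

PARTS = ["head", "body", "leg", "tail", "<stop>"]
policy = nn.Sequential(nn.Linear(len(PARTS), 64), nn.Tanh(), nn.Linear(64, len(PARTS)))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def step_reward(history: list, part: str) -> float:
    """Placeholder process reward: +1 for a new, non-stop part, else 0."""
    return 1.0 if part != "<stop>" and part not in history else 0.0

state = torch.zeros(len(PARTS))                    # crude summary of parts drawn so far
history, log_probs, rewards = [], [], []
for _ in range(6):                                 # draw up to six parts
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    part = PARTS[action.item()]
    log_probs.append(dist.log_prob(action))
    rewards.append(step_reward(history, part))     # per-step (process) reward
    history.append(part)
    state = state.clone()
    state[action] += 1.0
    if part == "<stop>":
        break

returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)   # reward-to-go per step
loss = -(torch.stack(log_probs) * returns).sum()             # REINFORCE objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("parts drawn:", history, "| loss:", round(loss.item(), 3))
```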
✨HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
📝 Summary:
HopChain is a framework that synthesizes multi-hop vision-language reasoning data to improve VLMs. This data features logically dependent reasoning chains, addressing VLMs' struggle with complex reasoning. Training with HopChain data significantly enhances generalizable VLM performance across diverse benchmarks.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17024
• PDF: https://arxiv.org/pdf/2603.17024
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLMs #DataSynthesis #MultiHopReasoning #AIResearch #ComputerVision
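A minimal sketch of how single-hop facts could be chained into a multi-hop question with logically dependent steps, mirroring the idea above. The facts and question templates are invented for illustration; the paper's synthesis pipeline operates on real vision-language data.

```python
# Two single-hop facts: (entity, relation) -> answer. Placeholders only.
facts = {
    ("the red car", "parked next to"): "a fire hydrant",
    ("a fire hydrant", "painted"): "yellow",
}

def compose_two_hop(entity: str) -> dict:
    """Chain two single-hop facts: hop 2 asks about hop 1's answer."""
    (e1, rel1), mid = next((k, v) for k, v in facts.items() if k[0] == entity)
    (e2, rel2), final = next((k, v) for k, v in facts.items() if k[0] == mid)
    question = f"What is the object {rel1} {e1} {rel2}?"
    reasoning = [f"{e1} is {rel1} {mid}.", f"{mid} is {rel2} {final}."]
    return {"question": question, "reasoning_chain": reasoning, "answer": final}

print(compose_two_hop("the red car"))
```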
✨TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
📝 Summary:
TerraScope is a new VLM for Earth Observation enabling pixel-grounded geospatial reasoning. It offers modality-flexible and multi-temporal capabilities, outperforming existing models on a new benchmark for accurate and interpretable results.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19039
• PDF: https://arxiv.org/pdf/2603.19039
• Project Page: https://shuyansy.github.io/terrascope/
• Github: https://github.com/shuyansy/Earth-Observation-VLMs
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#EarthObservation #VLM #Geospatial #RemoteSensing #ComputerVision
✨HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering
📝 Summary:
HiMu is a training-free framework for long video QA. It efficiently selects relevant frames using hierarchical query decomposition with lightweight multimodal experts, preserving temporal and cross-modal structure. HiMu advances the efficiency-accuracy Pareto front.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18558
• PDF: https://arxiv.org/pdf/2603.18558
• Project Page: https://danbenami.github.io/HiMu.io/
• Github: https://github.com/DanBenAmi/HiMu
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoQA #MultimodalAI #ComputerVision #MachineLearning #AI
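A rough, training-free sketch of the frame-selection idea above: decompose the question into sub-queries, score every frame against each sub-query, and keep the top frames per sub-query so each reasoning step gets visual evidence. The hard-coded decomposition and random similarity scores are stand-ins for the paper's lightweight multimodal experts (e.g. CLIP-style scoring).

```python
import numpy as np

rng = np.random.default_rng(0)

def decompose(query: str) -> list:
    """Placeholder decomposition; the paper uses multimodal experts for this."""
    return ["who is present?", "what action happens?", "what happens afterwards?"]

def select_frames(query: str, num_frames: int = 300, per_subquery: int = 4) -> list:
    sub_queries = decompose(query)
    scores = rng.random((len(sub_queries), num_frames))       # stand-in frame/sub-query similarity
    keep = set()
    for row in scores:
        keep.update(np.argsort(row)[-per_subquery:].tolist())  # top frames for this sub-query
    return sorted(keep)

print(select_frames("Why does the chef leave the kitchen at the end?"))
```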
✨Versatile Editing of Video Content, Actions, and Dynamics without Training
📝 Summary:
DynaEdit is a training-free method for versatile video editing using pretrained text-to-video models. It addresses limitations in handling complex edits, actions, and object interactions by solving technical issues like misalignment and jitter, achieving state-of-the-art results.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17989
• PDF: https://arxiv.org/pdf/2603.17989
• Project Page: https://dynaedit.github.io
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoEditing #TextToVideo #GenerativeAI #ComputerVision #AIResearch
✨From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
📝 Summary:
This paper shifts VLM image tampering detection from coarse object masks to pixel-level analysis with semantic understanding. It introduces a new taxonomy, benchmark, and metrics to evaluate both localization accuracy and the meaning of image modifications. This offers a more rigorous standard for evaluating tampering detection.
🔹 Publication Date: Published on Mar 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20193
• PDF: https://arxiv.org/pdf/2603.20193
• Github: https://github.com/VILA-Lab/PIXAR
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLM #ImageTampering #DeepfakeDetection #ComputerVision #AIResearch
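A small sketch of pixel-level tamper-localization scoring (IoU and F1 between a predicted binary mask and the ground-truth modified region). Only the localization side is illustrated here; the paper's metrics additionally assess the semantics of the modification.

```python
import numpy as np

def localization_scores(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Pixel-level IoU and F1 between predicted and ground-truth tamper masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + 1e-9)
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return {"iou": float(iou), "f1": float(f1)}

gt_mask = np.zeros((64, 64), dtype=bool); gt_mask[10:30, 10:30] = True      # ground-truth tampered region
pred_mask = np.zeros((64, 64), dtype=bool); pred_mask[12:32, 12:32] = True  # model's predicted region
print(localization_scores(pred_mask, gt_mask))
```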