ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Prompt-Free Universal Region Proposal Network

📝 Summary:
PF-RPN is a novel network that identifies potential objects without needing external prompts, improving flexibility. It uses Sparse Image-Aware Adapters and Cascade Self-Prompting to localize objects, validated across 19 datasets. This method works across diverse domains with limited data.

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17554
• PDF: https://arxiv.org/pdf/2603.17554
• Github: https://github.com/tangqh03/PF-RPN

🔹 Models citing this paper:
https://huggingface.co/tangqh/PF-RPN

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ObjectDetection #ComputerVision #DeepLearning #RPN #PromptFreeAI
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

📝 Summary:
EffectErase is a new video object removal method that effectively erases dynamic objects and their visual effects. It introduces VOR, a large dataset for training, and uses reciprocal learning with task-aware guidance for high-quality results.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19224
• PDF: https://arxiv.org/pdf/2603.19224
• Project Page: https://henghuiding.com/EffectErase/
• Github: https://github.com/FudanCVL/EffectErase

==================================

#VideoEditing #ComputerVision #ObjectRemoval #DeepLearning #AI
Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

📝 Summary:
LLMs struggle with low-resource language translation due to data scarcity. WALAR, a novel RL method, uses only monolingual text to improve LLM translation by mitigating reward hacking in quality estimation models. This significantly outperforms existing multilingual LLMs.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13045
• PDF: https://arxiv.org/pdf/2603.13045
• Github: https://github.com/LeiLiLab/WALAR

🔹 Models citing this paper:
https://huggingface.co/lyf07/LLaMAX3-8B-Alpaca-WALAR
https://huggingface.co/lyf07/Translategemma-4B-it-WALAR
https://huggingface.co/lyf07/Qwen3-8B-WALAR

==================================

#ReinforcementLearning #LLM #MultilingualTranslation #NLP #LowResourceLanguages
ReactMotion: Generating Reactive Listener Motions from Speaker Utterance

📝 Summary:
This paper introduces ReactMotion, a framework for generating natural listener body motions that react appropriately to speaker utterances. It uses a large dataset and preference-based training to create diverse, realistic responses, outperforming prior methods.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15083
• PDF: https://arxiv.org/pdf/2603.15083

==================================

#AI #MachineLearning #HumanComputerInteraction #GenerativeAI #ComputerAnimation
SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation

📝 Summary:
SimulU proposes a training-free policy for long-form simultaneous speech-to-speech translation (SimulS2S). It uses history management and cross-attention from pre-trained models to regulate input and output, achieving a good quality-latency trade-off without dedicated training.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16924
• PDF: https://arxiv.org/pdf/2603.16924

==================================

#SpeechToSpeech #SimultaneousTranslation #NLP #AI #DeepLearning
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

📝 Summary:
The paper presents AndroTMem, a framework and benchmark for diagnosing interaction memory failures in long-horizon GUI agents. It proposes Anchored State Memory (ASM), which uses causally linked intermediate-state anchors to overcome this bottleneck, improving task completion rates by up to 30%.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18429
• PDF: https://arxiv.org/pdf/2603.18429

==================================

#GUIAgents #AIMemory #AIAgents #AIResearch #HumanComputerInteraction
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark

📝 Summary:
PARSA-Bench is the first benchmark for Persian audio-language models, featuring 16 tasks covering speech, paralinguistics, and cultural audio comprehension. It reveals current models struggle with Persian's unique audio challenges like poetry and music, performing poorly on culturally-grounded ta...

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14456
• PDF: https://arxiv.org/pdf/2603.14456

🔹 Datasets citing this paper:
https://huggingface.co/datasets/MohammadJRanjbar/PARSA-Bench

==================================

#PersianAI #AudioLanguageModels #NLP #Benchmarking #SpeechProcessing
Tinted Frames: Question Framing Blinds Vision-Language Models

📝 Summary:
Vision-language models suffer from selective blindness, where linguistic framing degrades visual attention and performance. Constrained framings reduce focus on relevant image regions. A new prompt-tuning method improves visual grounding and performance across different framings.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19203
• PDF: https://arxiv.org/pdf/2603.19203
• Project Page: https://davidhalladay.github.io/tinted_frames_demo/
• Github: https://github.com/davidhalladay/Tinted-Frames

==================================

#VisionLanguageModels #PromptEngineering #AIAttention #DeepLearning #AIResearch
VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

📝 Summary:
VID-AD is a dataset for logical anomaly detection in industrial inspection, specifically addressing challenges from visual distractions. A new language-based framework is also proposed, which uses text descriptions and contrastive learning to capture logical attributes.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13964
• PDF: https://arxiv.org/pdf/2603.13964
• Github: https://github.com/nkthiroto/VID-AD

==================================

#AnomalyDetection #IndustrialInspection #ComputerVision #MachineLearning #Datasets
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?

📝 Summary:
MultiTempBench evaluates LLMs' multilingual temporal reasoning across various calendars and languages. It finds that tokenization quality, specifically fragmentation of temporal data, is a major bottleneck that severely reduces accuracy in low-resource languages and less common calendar formats.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19017
• PDF: https://arxiv.org/pdf/2603.19017
• Github: https://github.com/gagan3012/mtb

==================================

#LLM #TemporalReasoning #Tokenization #MultilingualAI #NLP
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

📝 Summary:
DreamPartGen generates 3D objects by modeling part geometry and appearance with Duplex Part Latents. It captures inter-part relationships using Relational Semantic Latents for improved text-shape alignment. A co-denoising process ensures consistency and achieves state-of-the-art results.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19216
• PDF: https://arxiv.org/pdf/2603.19216
• Project Page: https://plan-lab.github.io/dreampartgen

==================================

#3DGeneration #GenerativeAI #DeepLearning #ComputerVision #TextTo3D