✨Prompt-Free Universal Region Proposal Network
📝 Summary:
PF-RPN is a novel network that identifies potential objects without needing external prompts, improving flexibility. It uses Sparse Image-Aware Adapters and Cascade Self-Prompting to localize objects, validated across 19 datasets. This method works across diverse domains with limited data.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17554
• PDF: https://arxiv.org/pdf/2603.17554
• GitHub: https://github.com/tangqh03/PF-RPN
🔹 Models citing this paper:
• https://huggingface.co/tangqh/PF-RPN
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ObjectDetection #ComputerVision #DeepLearning #RPN #PromptFreeAI
✨EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
📝 Summary:
EffectErase is a new video object removal method that effectively erases dynamic objects and their visual effects. It introduces VOR, a large dataset for training, and uses reciprocal learning with task-aware guidance for high-quality results.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19224
• PDF: https://arxiv.org/pdf/2603.19224
• Project Page: https://henghuiding.com/EffectErase/
• GitHub: https://github.com/FudanCVL/EffectErase
==================================
#VideoEditing #ComputerVision #ObjectRemoval #DeepLearning #AI
✨Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation
📝 Summary:
LLMs struggle with low-resource language translation due to data scarcity. WALAR, a novel RL method, uses only monolingual text to improve LLM translation by mitigating reward hacking in quality estimation models. The resulting models significantly outperform existing multilingual LLMs.
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13045
• PDF: https://arxiv.org/pdf/2603.13045
• GitHub: https://github.com/LeiLiLab/WALAR
🔹 Models citing this paper:
• https://huggingface.co/lyf07/LLaMAX3-8B-Alpaca-WALAR
• https://huggingface.co/lyf07/Translategemma-4B-it-WALAR
• https://huggingface.co/lyf07/Qwen3-8B-WALAR
==================================
#ReinforcementLearning #LLM #MultilingualTranslation #NLP #LowResourceLanguages
✨ReactMotion: Generating Reactive Listener Motions from Speaker Utterance
📝 Summary:
This paper introduces ReactMotion, a framework for generating natural listener body motions that react appropriately to speaker utterances. It uses a large dataset and preference-based training to create diverse, realistic responses, outperforming prior methods.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15083
• PDF: https://arxiv.org/pdf/2603.15083
==================================
#AI #MachineLearning #HumanComputerInteraction #GenerativeAI #ComputerAnimation
✨SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation
📝 Summary:
SimulU proposes a training-free policy for long-form simultaneous speech-to-speech translation (SimulS2S). It uses history management and cross-attention from pre-trained models to regulate input and output, achieving a good quality-latency trade-off without task-specific training.
🔹 Publication Date: Published on Mar 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16924
• PDF: https://arxiv.org/pdf/2603.16924
==================================
#SpeechToSpeech #SimultaneousTranslation #NLP #AI #DeepLearning
✨AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
📝 Summary:
The paper presents AndroTMem, a framework and benchmark for diagnosing interaction memory failures in long-horizon GUI agents. It proposes Anchored State Memory (ASM), which uses causally linked intermediate-state anchors to overcome this bottleneck, improving task completion rates by up to 30%.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18429
• PDF: https://arxiv.org/pdf/2603.18429
==================================
#GUIAgents #AIMemory #AIAgents #AIResearch #HumanComputerInteraction
✨PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
📝 Summary:
PARSA-Bench is the first benchmark for Persian audio-language models, featuring 16 tasks covering speech, paralinguistics, and cultural audio comprehension. It reveals current models struggle with Persian's unique audio challenges like poetry and music, performing poorly on culturally-grounded ta...
🔹 Publication Date: Published on Mar 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14456
• PDF: https://arxiv.org/pdf/2603.14456
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/MohammadJRanjbar/PARSA-Bench
==================================
#PersianAI #AudioLanguageModels #NLP #Benchmarking #SpeechProcessing
✨Tinted Frames: Question Framing Blinds Vision-Language Models
📝 Summary:
Vision-language models suffer from selective blindness, where linguistic framing degrades visual attention and performance. Constrained framings reduce focus on relevant image regions. A new prompt-tuning method improves visual grounding and performance across different framings.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19203
• PDF: https://arxiv.org/pdf/2603.19203
• Project Page: https://davidhalladay.github.io/tinted_frames_demo/
• GitHub: https://github.com/davidhalladay/Tinted-Frames
==================================
#VisionLanguageModels #PromptEngineering #AIAttention #DeepLearning #AIResearch
✨VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction
📝 Summary:
VID-AD is a dataset for logical anomaly detection in industrial inspection, specifically addressing challenges from visual distractions. A new language-based framework is also proposed, which uses text descriptions and contrastive learning to capture logical attributes.
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13964
• PDF: https://arxiv.org/pdf/2603.13964
• GitHub: https://github.com/nkthiroto/VID-AD
==================================
#AnomalyDetection #IndustrialInspection #ComputerVision #MachineLearning #Datasets
✨What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?
📝 Summary:
MultiTempBench evaluates LLMs' multilingual temporal reasoning across various calendars and languages. It finds that tokenization quality, specifically fragmentation of temporal data, is a major bottleneck that severely reduces accuracy in low-resource languages and less common calendar formats.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19017
• PDF: https://arxiv.org/pdf/2603.19017
• Github: https://github.com/gagan3012/mtb
==================================
#LLM #TemporalReasoning #Tokenization #MultilingualAI #NLP
✨DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
📝 Summary:
DreamPartGen generates 3D objects by modeling part geometry and appearance with Duplex Part Latents. It captures inter-part relationships using Relational Semantic Latents for improved text-shape alignment. A co-denoising process ensures consistency and achieves state-of-the-art results.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19216
• PDF: https://arxiv.org/pdf/2603.19216
• Project Page: https://plan-lab.github.io/dreampartgen
==================================
#3DGeneration #GenerativeAI #DeepLearning #ComputerVision #TextTo3D