✨Reasoning over mathematical objects: on-policy reward modeling and test-time aggregation
📝 Summary:
This paper introduces Principia, a new dataset for deriving mathematical objects, together with training recipes that use on-policy LLM judges. These methods significantly improve model performance, enable cross-format generalization in reasoning tasks, and scale with test-time compute.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18886
• PDF: https://arxiv.org/pdf/2603.18886
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
📝 Summary:
Reinforcement learning infrastructure for multi-turn LLM agents that provides scalable rollout services and standardized sandbox environments for complex interactive tasks.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18815
• PDF: https://arxiv.org/pdf/2603.18815
• Github: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨COT-FM: Cluster-wise Optimal Transport Flow Matching
📝 Summary:
COT-FM enhances Flow Matching by clustering target samples and assigning dedicated source distributions. This creates straighter probability paths, enabling faster and more reliable generation with improved quality across diverse tasks.
🔹 Publication Date: Published on Mar 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13395
• PDF: https://arxiv.org/pdf/2603.13395
• Project Page: https://embodiedai-ntu.github.io/cotfm/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
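The cluster-wise pairing idea behind COT-FM can be illustrated with a toy sketch (not the paper's implementation; the two-mode target, k-means assignment, and Gaussian sources are all illustrative assumptions): cluster the target samples, place a dedicated source distribution near each cluster, and build flow-matching regression pairs along the resulting short, straight paths.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, centers, iters=20):
    # Plain k-means from given initial centers, to cluster target samples.
    centers = centers.copy()
    for _ in range(iters):
        d = np.linalg.norm(x[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers

# Toy two-mode target distribution.
targets = np.concatenate([
    rng.normal([-4, 0], 0.3, size=(256, 2)),
    rng.normal([+4, 0], 0.3, size=(256, 2)),
])
labels, centers = kmeans(targets, np.array([[-1.0, 0.0], [1.0, 0.0]]))

# Dedicated source per cluster: a narrow Gaussian at the cluster center,
# instead of one shared N(0, I) source for every target.
sources = centers[labels] + 0.5 * rng.normal(size=targets.shape)

# Flow-matching regression pairs: a point on the straight path and its velocity.
t = rng.uniform(size=(len(targets), 1))
x_t = (1 - t) * sources + t * targets    # position along the path at time t
v_target = targets - sources             # constant velocity of a straight path

# Cluster-wise pairing keeps paths much shorter than a single shared source.
shared_source = rng.normal(size=targets.shape)
print(np.linalg.norm(v_target, axis=1).mean(),
      np.linalg.norm(targets - shared_source, axis=1).mean())
```

Shorter, straighter paths mean the learned velocity field is easier to regress and can be integrated with fewer solver steps, which is the claimed source of faster and more reliable generation.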
✨Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
📝 Summary:
A three-stage framework bridges semantic and kinematic conditions using discrete tokens and diffusion synthesis. Its core MoTok tokenizer achieves compact high-fidelity tokens, significantly boosting controllability, fidelity, and reducing token usage under strong kinematic constraints.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19227
• PDF: https://arxiv.org/pdf/2603.19227
• Project Page: https://rheallyc.github.io/projects/motok/
• Github: https://github.com/rheallyc/MoTok
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
📝 Summary:
Top-tier MLLMs demonstrate limited capability in processing discrete symbols despite strong performance in complex reasoning, revealing a cognitive mismatch between visual perception and symbolic understanding.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18472
• PDF: https://arxiv.org/pdf/2603.18472
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
📝 Summary:
A new benchmark called VisualToolChain-Bench is introduced to evaluate the tool-use capabilities of multimodal large language models in complex visual tasks requiring multi-step planning and diverse tool use.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15030
• PDF: https://arxiv.org/pdf/2603.15030
• Github: https://github.com/zhuzil/VTC-Bench
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zzzhu/VTC-Bench
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
📝 Summary:
Loc3R-VLM enhances 2D Vision-Language Models with 3D understanding capabilities through spatial supervision from monocular video input, achieving superior performance in language-based localization and 3D reasoning.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18002
• PDF: https://arxiv.org/pdf/2603.18002
• Project Page: https://kevinqu7.github.io/loc3r-vlm/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
📝 Summary:
This novel framework enables 3D-aware video customization by decoupling spatial geometry from temporal motion using 1-frame optimization to build robust 3D priors. It also incorporates a visual conditioning module for enhanced texture generation and faster convergence.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18524
• PDF: https://arxiv.org/pdf/2603.18524
• Project Page: https://ko-lani.github.io/3DreamBooth
• Github: https://github.com/Ko-Lani/3DreamBooth
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
📝 Summary:
MonoArt presents a unified framework for reconstructing articulated 3D objects from single images through progressive structural reasoning that enables stable articulation inference without external t...
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19231
• PDF: https://arxiv.org/pdf/2603.19231
• Project Page: https://lihaitian.com/MonoArt/
• Github: https://github.com/Quest4Science/MonoArt
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MOSS-TTS Technical Report
📝 Summary:
MOSS-TTS is a speech generation model using discrete audio tokens and autoregressive modeling, with capabilities for voice cloning, pronunciation control, and long-form generation across multiple languages.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18090
• PDF: https://arxiv.org/pdf/2603.18090
• Project Page: https://mosi.cn/models/moss-tts
• Github: https://github.com/OpenMOSS/MOSS-TTS
🔹 Models citing this paper:
• https://huggingface.co/OpenMOSS-Team/MOSS-TTS
• https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime
• https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer
✨ Spaces citing this paper:
• https://huggingface.co/spaces/OpenMOSS-Team/MOSS-TTS
• https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
• https://huggingface.co/spaces/JymNils/MOSS-TTS
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
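The "discrete audio tokens + autoregressive modeling" recipe can be sketched in miniature (this is a generic illustration, not MOSS-TTS itself; the bigram table stands in for a trained transformer, and the tiny codebook is an assumption for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
CODEBOOK_SIZE = 8  # real neural audio codecs use 1024+ codes per codebook

# Stand-in "model": a fixed table of next-token logits per previous token.
logits_table = rng.normal(size=(CODEBOOK_SIZE, CODEBOOK_SIZE))

def sample_next(prev_token, temperature=1.0):
    # Softmax over logits, then sample one discrete audio code.
    logits = logits_table[prev_token] / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(CODEBOOK_SIZE, p=probs))

def generate(prompt, n_steps):
    # Autoregressive decoding: each audio token is conditioned on the prefix
    # (here only the previous token, for brevity).
    tokens = list(prompt)
    for _ in range(n_steps):
        tokens.append(sample_next(tokens[-1]))
    return tokens

# In this regime, voice cloning amounts to prefixing the decoder with tokens
# from a reference utterance before generating the continuation.
reference = [3, 1, 4]
out = generate(reference, n_steps=10)
print(out)
```

A codec decoder would then map the generated token sequence back to a waveform; long-form generation falls out of the same loop run for more steps.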
✨Prompt-Free Universal Region Proposal Network
📝 Summary:
PF-RPN is a novel network that identifies potential objects without needing external prompts, improving flexibility. It uses Sparse Image-Aware Adapters and Cascade Self-Prompting to localize objects, validated across 19 datasets. This method works across diverse domains with limited data.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17554
• PDF: https://arxiv.org/pdf/2603.17554
• Github: https://github.com/tangqh03/PF-RPN
🔹 Models citing this paper:
• https://huggingface.co/tangqh/PF-RPN
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ObjectDetection #ComputerVision #DeepLearning #RPN #PromptFreeAI
✨EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
📝 Summary:
EffectErase is a new video object removal method that effectively erases dynamic objects and their visual effects. It introduces VOR, a large dataset for training, and uses reciprocal learning with task-aware guidance for high-quality results.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19224
• PDF: https://arxiv.org/pdf/2603.19224
• Project Page: https://henghuiding.com/EffectErase/
• Github: https://github.com/FudanCVL/EffectErase
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoEditing #ComputerVision #ObjectRemoval #DeepLearning #AI
✨Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation
📝 Summary:
LLMs struggle with low-resource language translation due to data scarcity. WALAR, a novel RL method, uses only monolingual text to improve LLM translation by mitigating reward hacking in quality estimation models. This significantly outperforms existing multilingual LLMs.
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13045
• PDF: https://arxiv.org/pdf/2603.13045
• Github: https://github.com/LeiLiLab/WALAR
🔹 Models citing this paper:
• https://huggingface.co/lyf07/LLaMAX3-8B-Alpaca-WALAR
• https://huggingface.co/lyf07/Translategemma-4B-it-WALAR
• https://huggingface.co/lyf07/Qwen3-8B-WALAR
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLM #MultilingualTranslation #NLP #LowResourceLanguages
✨ReactMotion: Generating Reactive Listener Motions from Speaker Utterance
📝 Summary:
This paper introduces ReactMotion, a framework for generating natural listener body motions that react appropriately to speaker utterances. It uses a large dataset and preference-based training to create diverse, realistic responses, outperforming prior methods.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15083
• PDF: https://arxiv.org/pdf/2603.15083
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #MachineLearning #HumanComputerInteraction #GenerativeAI #ComputerAnimation
✨SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation
📝 Summary:
SimulU proposes a training-free policy for long-form simultaneous speech-to-speech translation (SimulS2S). It uses history management and cross-attention from pre-trained models to regulate input and output, achieving a good quality-latency trade-off without task-specific training.
🔹 Publication Date: Published on Mar 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16924
• PDF: https://arxiv.org/pdf/2603.16924
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpeechToSpeech #SimultaneousTranslation #NLP #AI #DeepLearning
✨AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
📝 Summary:
The paper presents AndroTMem, a framework and benchmark for diagnosing interaction-memory failures in long-horizon GUI agents. It proposes Anchored State Memory (ASM), which uses causally linked intermediate-state anchors to overcome this bottleneck, improving task completion rates by up to 30%.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18429
• PDF: https://arxiv.org/pdf/2603.18429
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GUIAgents #AIMemory #AIAgents #AIResearch #HumanComputerInteraction
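"Causally linked intermediate-state anchors" suggests a simple data structure: each memory entry records which state an action started from and which state it produced, so the agent can trace back the chain of actions behind any state. A minimal sketch (the anchor strings, entry fields, and `trace_back` query are illustrative assumptions, not the paper's actual ASM design):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    action: str
    pre_anchor: str   # hash/summary of the UI state before the action
    post_anchor: str  # hash/summary of the UI state after the action

@dataclass
class AnchoredMemory:
    entries: list = field(default_factory=list)

    def record(self, action, pre, post):
        self.entries.append(MemoryEntry(action, pre, post))

    def trace_back(self, anchor):
        # Follow causal links backwards: which chain of actions produced
        # the state identified by `anchor`?
        chain = []
        while True:
            step = next((e for e in self.entries
                         if e.post_anchor == anchor), None)
            if step is None:
                return list(reversed(chain))
            chain.append(step.action)
            anchor = step.pre_anchor

mem = AnchoredMemory()
mem.record("open settings", "home", "settings")
mem.record("tap wifi", "settings", "wifi_menu")
mem.record("toggle on", "wifi_menu", "wifi_on")
print(mem.trace_back("wifi_on"))  # actions that led to this state
```

Anchoring memory to intermediate states, rather than storing raw trajectories, is what lets a long-horizon agent recover "how did I get here?" without replaying the whole interaction history.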
✨PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
📝 Summary:
PARSA-Bench is the first benchmark for Persian audio-language models, featuring 16 tasks covering speech, paralinguistics, and cultural audio comprehension. It reveals that current models struggle with Persian's unique audio challenges, such as poetry and music, performing poorly on culturally grounded tasks.
🔹 Publication Date: Published on Mar 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14456
• PDF: https://arxiv.org/pdf/2603.14456
✨ Datasets citing this paper:
• https://huggingface.co/datasets/MohammadJRanjbar/PARSA-Bench
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#PersianAI #AudioLanguageModels #NLP #Benchmarking #SpeechProcessing
✨Tinted Frames: Question Framing Blinds Vision-Language Models
📝 Summary:
Vision-language models suffer selective blindness, where linguistic framing degrades visual attention and performance. Constrained framings reduce focus on relevant image regions. A new prompt-tuning method improves visual grounding and performance across different framings.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19203
• PDF: https://arxiv.org/pdf/2603.19203
• Project Page: https://davidhalladay.github.io/tinted_frames_demo/
• Github: https://github.com/davidhalladay/Tinted-Frames
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VisionLanguageModels #PromptEngineering #AIAttention #DeepLearning #AIResearch
✨VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction
📝 Summary:
VID-AD is a dataset for logical anomaly detection in industrial inspection, specifically addressing challenges from visual distractions. A new language-based framework is also proposed, which uses text descriptions and contrastive learning to capture logical attributes.
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13964
• PDF: https://arxiv.org/pdf/2603.13964
• Github: https://github.com/nkthiroto/VID-AD
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AnomalyDetection #IndustrialInspection #ComputerVision #MachineLearning #Datasets
✨What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?
📝 Summary:
MultiTempBench evaluates LLMs' multilingual temporal reasoning across various calendars and languages. It finds that tokenization quality, specifically fragmentation of temporal data, is a major bottleneck that severely reduces accuracy in low-resource languages and less common calendar formats.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19017
• PDF: https://arxiv.org/pdf/2603.19017
• Github: https://github.com/gagan3012/mtb
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #TemporalReasoning #Tokenization #MultilingualAI #NLP
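The fragmentation effect can be demonstrated with a toy tokenizer (this is not the paper's measurement pipeline; the stand-in vocabulary and the fragmentation metric are illustrative assumptions). A BPE-style vocabulary covers common English date words as single tokens, while rarer calendar formats shatter into characters:

```python
import re

def toy_tokenize(text):
    # Stand-in tokenizer: common English pieces survive as single tokens,
    # everything else falls apart into characters (mimicking how BPE
    # vocabularies fragment rare scripts and date formats).
    vocab = {"march", "19", "20", "2026", "the", "of"}
    tokens = []
    for piece in re.findall(r"\w+|\S", text.lower()):
        if piece in vocab:
            tokens.append(piece)
        else:
            tokens.extend(piece)  # character-level fallback
    return tokens

def fragmentation(text):
    # Tokens per whitespace-delimited word: higher means more fragmented.
    return len(toy_tokenize(text)) / len(text.split())

print(fragmentation("march 19 2026"))     # well-covered Gregorian format
print(fragmentation("19-03-1447 hijri"))  # rarer calendar format
```

A date that costs several times more tokens also spreads its information across more positions, which is one plausible mechanism for the accuracy drop the benchmark reports on low-resource languages and uncommon calendars.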