✨AVControl: Efficient Framework for Training Audio-Visual Controls
📝 Summary:
AVControl efficiently enables modular audio-visual generation by training diverse controls as separate LoRA adapters on a parallel canvas in LTX-2. It achieves superior performance across tasks such as depth and pose guidance while requiring minimal computational resources.
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24793
• PDF: https://arxiv.org/pdf/2603.24793
• Project Page: https://matanby.github.io/AVControl/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AudioVisualAI #GenerativeAI #LoRA #EfficientAI #DeepLearning
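As a rough illustration of the LoRA mechanism the summary refers to (frozen base weights plus small, swappable low-rank adapters per control type), here is a minimal pure-Python sketch. All names and values are illustrative; this is not AVControl's actual code.

```python
# Minimal sketch of the LoRA idea: a frozen base weight plus small low-rank
# adapters that can be swapped per control type (e.g. depth vs. pose).
# Matrices are plain lists of rows; everything here is a toy example.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matadd(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

class LoRALinear:
    """y = (W + scale * B @ A) x, with W frozen and (A, B) a named adapter."""
    def __init__(self, W, scale=1.0):
        self.W = W              # frozen base weight, d_out x d_in
        self.scale = scale
        self.adapters = {}      # name -> (A, B)
        self.active = None      # which control's adapter is switched in

    def add_adapter(self, name, A, B):
        # A: rank x d_in, B: d_out x rank
        self.adapters[name] = (A, B)

    def forward(self, x):       # x: d_in x 1 column vector
        y = matmul(self.W, x)
        if self.active is not None:
            A, B = self.adapters[self.active]
            delta = matmul(B, matmul(A, x))
            y = matadd(y, [[self.scale * v for v in row] for row in delta])
        return y

# Two "controls" trained as separate adapters on one frozen layer:
W = [[1.0, 0.0], [0.0, 1.0]]
layer = LoRALinear(W)
layer.add_adapter("depth", A=[[1.0, 0.0]], B=[[0.0], [1.0]])
layer.add_adapter("pose",  A=[[0.0, 1.0]], B=[[1.0], [0.0]])

x = [[2.0], [3.0]]
layer.active = "depth"
print(layer.forward(x))   # base output plus the depth adapter's update
layer.active = "pose"
print(layer.forward(x))   # same frozen base, different adapter
```

Swapping `layer.active` switches controls without touching the frozen base, which is what makes this style of training modular and cheap.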
✨PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders
📝 Summary:
PMT introduces a Plain Mask Decoder for fast image and video segmentation using frozen Vision Foundation Model encoders. This preserves VFM multi-task sharing, achieving competitive accuracy and significant speed improvements over prior methods.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25398
• PDF: https://arxiv.org/pdf/2603.25398
• Github: https://github.com/tue-mps/pmt
==================================
#ImageSegmentation #VideoSegmentation #Transformers #ComputerVision #DeepLearning
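A toy sketch of the multi-task sharing the summary highlights: because the encoder stays frozen, its features can be computed once and reused by several lightweight task heads. All functions below are hypothetical stand-ins, not PMT's architecture.

```python
# Frozen-encoder feature sharing, at toy scale: encode once, reuse everywhere.

def frozen_encoder(image):
    """Stand-in for a frozen VFM encoder: fixed, never trained.
    Produces one scalar feature per image row."""
    return [sum(row) / len(row) for row in image]

def mask_head(features, threshold=0.5):
    """Toy 'mask decoder': threshold the shared features into a binary mask."""
    return [1 if f > threshold else 0 for f in features]

def depth_head(features, scale=2.0):
    """A second task head consuming the very same frozen features."""
    return [scale * f for f in features]

image = [[0.2, 0.4], [0.8, 1.0], [0.1, 0.1]]
features = frozen_encoder(image)   # computed once...
mask = mask_head(features)         # ...shared across tasks
depth = depth_head(features)
print(mask, depth)
```

The design point: since no gradients flow into the encoder, adding a task costs only a small head, and the expensive features are shared rather than recomputed per task.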
✨IQuest-Coder-V1 Technical Report
📝 Summary:
The IQuest-Coder-V1 series presents new code LLMs using a multi-stage training paradigm to capture dynamic software logic. This approach achieves state-of-the-art performance in agentic software engineering and competitive programming tasks. The Loop variant also optimizes deployment efficiency.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16733
• PDF: https://arxiv.org/pdf/2603.16733
• Project Page: https://iquestlab.github.io/release-1.0-2603/index.html
• Github: https://github.com/IQuestLab/IQuest-Coder-V1
==================================
#CodeLLM #SoftwareEngineering #LargeLanguageModels #AIResearch #MachineLearning
✨Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models
📝 Summary:
This paper introduces training-free, inference-time model steering to enhance Chain-of-Thought reasoning in Large Audio-Language Models. It achieves accuracy gains of up to 4.4% and demonstrates cross-modal transfer: text-derived steering vectors efficiently guide speech reasoning.
🔹 Publication Date: Published on Mar 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14636
• PDF: https://arxiv.org/pdf/2603.14636
==================================
#AI #MachineLearning #LALMs #ChainOfThought #ModelSteering
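The general activation-steering recipe behind summaries like this one: derive a steering vector as the difference of mean hidden states between two contrastive prompt sets, then add a scaled copy of it to new hidden states at inference. A minimal sketch with toy vectors and hypothetical data, not the paper's exact method:

```python
# Training-free activation steering, at toy scale.

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def steering_vector(pos_states, neg_states):
    """Difference of mean hidden states: points from 'no CoT' toward 'CoT'."""
    mp, mn = mean_vec(pos_states), mean_vec(neg_states)
    return [p - q for p, q in zip(mp, mn)]

def nudge(hidden, v, alpha=0.5):
    """Inference-time steering: add the scaled vector to a hidden state."""
    return [h + alpha * s for h, s in zip(hidden, v)]

# Hidden states collected (hypothetically) from prompts with and without
# chain-of-thought reasoning:
cot_states    = [[1.0, 0.2], [0.8, 0.4]]
no_cot_states = [[0.1, 0.3], [0.1, 0.1]]

v = steering_vector(cot_states, no_cot_states)
h = nudge([0.5, 0.5], v, alpha=1.0)
print(v, h)
```

Because nothing is trained, the same vector can in principle be derived on one modality (text) and applied on another (speech), which is the cross-modal transfer the summary describes.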
✨Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition
📝 Summary:
CroBo is a visual state representation framework that learns what-is-where composition for robotics. It uses global-to-local reconstruction to encode scene element identities and spatial locations in a compact token. This enables tracking scene dynamics for sequential decision making.
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13904
• PDF: https://arxiv.org/pdf/2603.13904
• Project Page: https://seokminlee-chris.github.io/CroBo-ProjectPage/
• Github: https://github.com/SeokminLee-Chris/CroBo
==================================
#Robotics #ComputerVision #SceneUnderstanding #AI #StateRepresentation
✨VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
📝 Summary:
VFIG is a vision-language model that converts raster images into scalable vector graphics (SVG). It employs a 66K-example dataset and hierarchical training for high-fidelity conversion, outperforming open-source models and matching proprietary ones.
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24575
• PDF: https://arxiv.org/pdf/2603.24575
• Project Page: https://vfig-proj.github.io/
• Github: https://github.com/RAIVNLab/VFig
🔹 Models citing this paper:
• https://huggingface.co/XunmeiLiu/VFIG-4B
✨ Spaces citing this paper:
• https://huggingface.co/spaces/allenai/VFig-Image2SVG-Demo
==================================
#VisionLanguageModels #SVG #VectorGraphics #AI #ComputerVision
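To make the output format concrete: the vectorization task ends in SVG markup like the snippet this toy serializer emits. The box-detection input is hypothetical upstream data; this illustrates only the target representation, not VFIG's model.

```python
# Toy raster-to-vector output stage: serialize detected shapes as SVG.

def boxes_to_svg(boxes, width, height):
    """Render (x, y, w, h, color) boxes as an SVG document string."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for x, y, w, h, color in boxes:
        parts.append(f'  <rect x="{x}" y="{y}" width="{w}" '
                     f'height="{h}" fill="{color}"/>')
    parts.append('</svg>')
    return "\n".join(parts)

svg = boxes_to_svg([(10, 10, 40, 20, "red")], width=100, height=60)
print(svg)
```

Unlike raster pixels, this output scales losslessly to any resolution, which is why figure vectorization is worth a dedicated model.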
✨Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
📝 Summary:
ScratchMath introduces a benchmark for analyzing errors in student handwritten math. It reveals MLLMs significantly lag human experts in visual and logical reasoning, but proprietary models show potential for error explanation.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24961
• PDF: https://arxiv.org/pdf/2603.24961
• Project Page: https://bbsngg.github.io/ScratchMath/
• Github: https://github.com/ai-for-edu/ScratchMath
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
📝 Summary:
Language models typically give one answer, but many tasks have multiple solutions. This paper presents multi-answer RL, allowing LMs to generate multiple plausible answers with confidence in a single pass, improving diversity, accuracy, and computational efficiency.
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24844
• PDF: https://arxiv.org/pdf/2603.24844
• Project Page: https://multi-answer-rl.github.io/
• Github: https://github.com/ishapuri/multi_answer_rl
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
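To illustrate the output interface only: a set of (answer, confidence) pairs instead of a single mode. The sketch below builds that set by aggregating samples, which is the naive multi-pass baseline; the paper's contribution is training the model to emit such a set in a single pass. Sample data is hypothetical.

```python
from collections import Counter

def answer_set(samples, top_k=3):
    """Collapse sampled answers into (answer, confidence) pairs, where
    confidence is empirical frequency. A sampling baseline that produces the
    same set-valued output format the summary describes."""
    counts = Counter(samples)
    total = len(samples)
    return [(ans, n / total) for ans, n in counts.most_common(top_k)]

# Hypothetical samples for "name a prime between 10 and 20":
samples = ["11", "13", "11", "17", "13", "11", "19", "13"]
print(answer_set(samples))
```

The efficiency argument in the summary follows directly: the baseline above needs many forward passes to estimate confidences, whereas a model trained to emit the whole set needs one.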
✨AVO: Agentic Variation Operators for Autonomous Evolutionary Search
📝 Summary:
Agentic variation operators enable autonomous discovery of performance-critical micro-architectural optimizations for attention kernels, outperforming state-of-the-art implementations on advanced GPU ...
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24517
• PDF: https://arxiv.org/pdf/2603.24517
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching
📝 Summary:
WAFT-Stereo achieves state-of-the-art stereo matching performance by replacing cost volumes with warping techniques, demonstrating superior efficiency and accuracy on major benchmarks.
🔹 Publication Date: Published on Mar 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24836
• PDF: https://arxiv.org/pdf/2603.24836
• Github: https://github.com/princeton-vl/WAFT-Stereo
🔹 Models citing this paper:
• https://huggingface.co/MemorySlices/WAFT-Stereo
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
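For intuition on "warping instead of cost volumes": one view is resampled toward the other according to a disparity estimate, and the residual drives refinement. A 1-D toy with linear interpolation (real stereo warps 2-D images along epipolar lines; everything here is illustrative):

```python
import math

def warp_1d(signal, disparity):
    """Warp a 1-D signal by per-pixel disparity with linear interpolation.
    Each output pixel i samples the input at position i - disparity[i];
    out-of-range samples use border replication."""
    out = []
    n = len(signal)
    for i, d in enumerate(disparity):
        x = i - d
        x0 = math.floor(x)
        t = x - x0
        a = signal[min(max(x0, 0), n - 1)]
        b = signal[min(max(x0 + 1, 0), n - 1)]
        out.append((1 - t) * a + t * b)
    return out

# Integer disparity of 1 shifts the signal by one pixel:
print(warp_1d([0, 1, 2, 3, 4], [1, 1, 1, 1, 1]))
# Fractional disparity interpolates between neighbors:
print(warp_1d([0.0, 2.0, 4.0], [0.5, 0.5, 0.5]))
```

Compared with a cost volume, which stores a matching score for every pixel at every candidate disparity, warping touches each pixel once per iteration, which is where the efficiency gain comes from.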
✨QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
📝 Summary:
QuantAgent is a multi-agent LLM framework for high-frequency trading. It uses specialized agents for indicators, patterns, trends, and risk to make rapid decisions. It outperforms existing neural and rule-based systems in accuracy and returns.
🔹 Publication Date: Published on Sep 12, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.09995
• PDF: https://arxiv.org/pdf/2509.09995
• Project Page: https://Y-Research-SBU.github.io/QuantAgent/
• Github: https://github.com/Y-Research-SBU/QuantAgent
==================================
#LLM #MultiAgent #HighFrequencyTrading #FinTech #AlgorithmicTrading
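A toy sketch of how specialized agent outputs might be combined into one trading action: average the directional signals and let the risk agent veto. The weighting, threshold, and veto rule here are all illustrative assumptions, not QuantAgent's actual architecture.

```python
def decide(signals, risk_ok, threshold=0.5):
    """Combine per-agent signals (each in [-1, 1], +1 = strong buy) into one
    action, with a risk-agent veto. Illustrative only."""
    if not risk_ok:
        return "hold"          # risk agent vetoes any position
    score = sum(signals.values()) / len(signals)
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    return "hold"

# Hypothetical agent outputs for one tick:
signals = {"indicator": 0.8, "pattern": 0.6, "trend": 0.9}
print(decide(signals, risk_ok=True))    # strong consensus -> "buy"
print(decide(signals, risk_ok=False))   # risk veto -> "hold"
```

Giving risk its own agent with veto power, rather than folding it into the score, is one simple way to keep a single bad signal from forcing a trade.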