✨Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs
📝 Summary:
A hardware-software co-design framework is proposed for on-device LLMs. It models training loss and uses roofline analysis to link accuracy and latency, speeding up architecture selection. This yields better performance on target hardware.
🔹 Publication Date: Published on Feb 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10377
• PDF: https://arxiv.org/pdf/2602.10377
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
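The roofline idea behind the paper can be sketched in a few lines: latency is bounded by whichever of compute or memory traffic is the bottleneck. The peak numbers below are made-up placeholders, not figures from the paper.

```python
def roofline_latency(flops, bytes_moved, peak_flops, peak_bw):
    """Latency lower bound: a kernel is either compute-bound or memory-bound."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Toy decode step for a ~1B-parameter model in fp16 (2 bytes/weight):
# ~2 FLOPs per parameter per token, and every weight is read once.
flops = 2e9
bytes_moved = 2e9
t = roofline_latency(flops, bytes_moved, peak_flops=4e12, peak_bw=50e9)
# memory-bound here: 2e9 / 50e9 s per token dominates the compute term
```

On typical mobile SoCs, single-token decoding lands in the memory-bound regime, which is exactly where roofline analysis ties architecture choices directly to latency.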
✨Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
📝 Summary:
Fish-Speech is a novel TTS framework using a Dual-AR architecture with GFSQ for efficient codebook processing and high-fidelity speech. It leverages LLMs for linguistic feature extraction and streamlines multilingual support by eliminating grapheme-to-phoneme (G2P) conversion. This significantly improves TTS for complex scenarios ...
🔹 Publication Date: Published on Nov 2, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2411.01156
• PDF: https://arxiv.org/pdf/2411.01156
• Github: https://github.com/fishaudio/fish-speech
🔹 Models citing this paper:
• https://huggingface.co/fishaudio/fish-speech-1.5
• https://huggingface.co/fishaudio/fish-speech-1.4
• https://huggingface.co/ModelsLab/fish-speech-1.5
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
• https://huggingface.co/spaces/mediverseai/mediverse.ai
• https://huggingface.co/spaces/fishaudio/fish-agent
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#TextToSpeech #LLM #SpeechSynthesis #Multilingual #AI
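As a rough illustration of the FSQ-style quantization that GFSQ builds on (the grouping here is a naive sketch; the paper's exact formulation may differ):

```python
import numpy as np

def fsq(z, levels=5):
    """Finite scalar quantization: bound each dim, snap it to a uniform grid."""
    half = (levels - 1) / 2
    return np.round(np.tanh(z) * half) / half   # grid points in [-1, 1]

def grouped_fsq(z, groups=2, levels=5):
    """Quantize channel groups independently, GFSQ-style: each group acts
    as its own implicit codebook, with no learned codebook to collapse."""
    chunks = np.split(z, groups, axis=-1)
    return np.concatenate([fsq(c, levels) for c in chunks], axis=-1)

z = np.random.default_rng(0).normal(size=(4, 8))
q = grouped_fsq(z)   # every entry lies on the 5-point grid per dimension
```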
✨GPT-4 Technical Report
📝 Summary:
GPT-4 is a multimodal Transformer model accepting image and text inputs. It achieves human-level performance on professional and academic benchmarks through pre-training and post-training alignment. Its development prioritized predictable scaling.
🔹 Publication Date: Published on Mar 15, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2303.08774
• PDF: https://arxiv.org/pdf/2303.08774
• Github: https://github.com/openai/evals
🔹 Models citing this paper:
• https://huggingface.co/openchat/openchat_3.5
• https://huggingface.co/openchat/openchat-3.5-0106
• https://huggingface.co/openchat/openchat-3.5-1210
✨ Datasets citing this paper:
• https://huggingface.co/datasets/m-a-p/CHC-Bench
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ludwigstumpp/llm-leaderboard
• https://huggingface.co/spaces/dingliyu/skillmix
• https://huggingface.co/spaces/SSGHJKKNBVCXZWQ134578000JJBBBBNNNMKLL/AGI-Framework
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GPT4 #AI #LLM #MultimodalAI #DeepLearning
✨TimeGPT-1
📝 Summary:
TimeGPT is the first foundation model for time series analysis, leveraging deep learning to achieve superior zero-shot prediction accuracy and efficiency. It outperforms traditional methods, making precise predictions more accessible.
🔹 Publication Date: Published on Oct 5, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2310.03589
• PDF: https://arxiv.org/pdf/2310.03589
• Github: https://github.com/Nixtla/nixtla
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#TimeGPT #TimeSeries #FoundationModels #DeepLearning #DataScience
✨LLM Agent Operating System
📝 Summary:
AIOS is an LLM agent operating system that addresses key challenges in agent deployment, including resource allocation, context switching, and concurrent execution. It optimizes these processes to enhance the reliability and efficiency of LLM agents.
🔹 Publication Date: Published on Mar 25, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2403.16971
• PDF: https://arxiv.org/pdf/2403.16971
• Github: https://github.com/agiresearch/aios
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMAgents #AIOS #OperatingSystems #AIResearch #ResourceManagement
✨Adapting Web Agents with Synthetic Supervision
📝 Summary:
Web agents struggle to adapt to new websites due to limited data and poor synthetic data quality. SynthAgent is a framework that refines AI-generated tasks and collected trajectories to create high-quality synthetic supervision. This approach significantly improves web agent adaptation.
🔹 Publication Date: Published on Nov 8, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06101
• PDF: https://arxiv.org/pdf/2511.06101
• Project Page: https://github.com/aiming-lab/SynthAgent
• Github: https://github.com/aiming-lab/SynthAgent
🔹 Models citing this paper:
• https://huggingface.co/ChilleD/SynthAgent-SFT-Qwen2.5-VL-7B
• https://huggingface.co/ChilleD/SynthAgent-SFT-UI-TARS-1.5-7B
• https://huggingface.co/ChilleD/SynthAgent
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#WebAgents #SyntheticData #MachineLearning #AIResearch #DeepLearning
✨Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
📝 Summary:
This paper introduces a human-centric video world model for extended reality, using tracked head and hand poses for dexterous interaction. This system generates egocentric virtual environments, significantly improving user task performance and perceived control.
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18422
• PDF: https://arxiv.org/pdf/2602.18422
• Project Page: https://codeysun.github.io/generated-reality/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ExtendedReality #VideoGeneration #HumanComputerInteraction #VirtualEnvironments #AIResearch
✨Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty
📝 Summary:
This paper proposes using an action Jacobian penalty to remove unrealistic high-frequency signals from reinforcement learning policies without tuning. It introduces a Linear Policy Net architecture to reduce computational overhead, enabling faster convergence and efficient inference for learning ...
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18312
• PDF: https://arxiv.org/pdf/2602.18312
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #MachineLearning #PolicyLearning #DeepLearning #AI
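The core idea can be sketched as penalizing the policy's sensitivity to its input; the finite-difference estimator and the numbers here are illustrative, not the paper's implementation:

```python
import numpy as np

def action_jacobian_penalty(policy, s, eps=1e-4):
    """Finite-difference estimate of ||da/ds||_F^2: penalizing this input
    sensitivity smooths the action signal across nearby states."""
    a0 = policy(s)
    jac_sq = 0.0
    for i in range(s.size):
        sp = s.copy()
        sp[i] += eps
        jac_sq += np.sum(((policy(sp) - a0) / eps) ** 2)
    return jac_sq

# For a linear policy a = K s, the Jacobian is the constant matrix K,
# so the penalty reduces to ||K||_F^2 (0.25 + 0.04 + 0.01 + 0.09 = 0.39).
K = np.array([[0.5, -0.2], [0.1, 0.3]])
pen = action_jacobian_penalty(lambda s: K @ s, np.zeros(2))
```

A linear policy head keeps the penalty (and inference) cheap, which is one reason a Linear Policy Net pairs naturally with this regularizer.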
✨EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
📝 Summary:
EgoPush allows mobile robots to rearrange multiple objects in cluttered spaces using a single egocentric camera. It uses an object-centric latent space and stage-decomposed rewards for long-horizon tasks, outperforming end-to-end baselines and demonstrating sim-to-real transfer.
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18071
• PDF: https://arxiv.org/pdf/2602.18071
• Project Page: https://ai4ce.github.io/EgoPush/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #ComputerVision #AI #MachineLearning #RobotManipulation
✨Does Your Reasoning Model Implicitly Know When to Stop Thinking?
📝 Summary:
Large reasoning models implicitly know when to stop thinking, a capability obscured by current sampling. SAGE, a novel sampling paradigm, uncovers this efficient reasoning potential. Integrating SAGE into SAGE-RL boosts reasoning accuracy and efficiency on math benchmarks.
🔹 Publication Date: Published on Feb 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.08354
• PDF: https://arxiv.org/pdf/2602.08354
• Project Page: https://hzx122.github.io/sage-rl/
• Github: https://hzx122.github.io/sage-rl/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #LLMs #Reasoning #MachineLearning #Efficiency
✨SARAH: Spatially Aware Real-time Agentic Humans
📝 Summary:
SARAH provides real-time, spatially-aware conversational motion for VR agents. It uses a causal transformer VAE and flow matching to generate natural full-body movement responsive to user position and audio, achieving state-of-the-art quality at 300+ FPS.
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18432
• PDF: https://arxiv.org/pdf/2602.18432
• Project Page: https://evonneng.github.io/sarah/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VirtualReality #AI #GenerativeAI #HumanMotion #DeepLearning
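Flow matching itself, independent of SARAH's specific audio and position conditioning, reduces to a simple regression objective:

```python
import numpy as np

def flow_matching_loss(v_theta, x0, x1, t):
    """Conditional flow matching with a linear path: regress the model's
    velocity at the interpolant x_t onto the straight-line target x1 - x0."""
    x_t = (1 - t) * x0 + t * x1
    return float(np.mean((v_theta(x_t, t) - (x1 - x0)) ** 2))

# A model that already predicts the true velocity field has zero loss.
loss = flow_matching_loss(lambda x, t: np.ones_like(x),
                          np.zeros(3), np.ones(3), t=0.3)
```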
✨VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
📝 Summary:
VESPO addresses LLM RL training instability by using a variational formulation with variance reduction. It provides a sequence-level correction without length normalization, ensuring stable training and consistent gains even with high policy staleness.
🔹 Publication Date: Published on Feb 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10693
• PDF: https://arxiv.org/pdf/2602.10693
• Github: https://github.com/FloyedShen/VESPO
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #ReinforcementLearning #DeepLearning #AI #NLP
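The sequence-level (rather than per-token) importance weighting that VESPO argues for can be sketched as below; the clipping constant is a placeholder, and VESPO's actual variational correction is more involved:

```python
import numpy as np

def sequence_level_ratio(logp_new, logp_old, clip=5.0):
    """One importance ratio per sequence, from summed token log-probs,
    with no division by sequence length."""
    log_ratio = np.sum(logp_new) - np.sum(logp_old)
    return float(np.exp(np.clip(log_ratio, -clip, clip)))

# Two tokens, each twice as likely under the new policy -> ratio 4.
r = sequence_level_ratio(np.log([0.5, 0.5]), np.log([0.25, 0.25]))
```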
✨AudioX: Diffusion Transformer for Anything-to-Audio Generation
📝 Summary:
AudioX is a unified Diffusion Transformer for high-quality audio and music generation with natural language control. It processes diverse modalities using a novel multi-modal masked training strategy. This model outperforms specialized systems while offering remarkable versatility.
🔹 Publication Date: Published on Mar 13, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.10522
• PDF: https://arxiv.org/pdf/2503.10522
• Project Page: https://zeyuet.github.io/AudioX/
• Github: https://github.com/ZeyueT/AudioX
🔹 Models citing this paper:
• https://huggingface.co/HKUSTAudio/AudioX
• https://huggingface.co/HKUSTAudio/AudioX-MAF-MMDiT
• https://huggingface.co/Zeyue7/AudioX
✨ Datasets citing this paper:
• https://huggingface.co/datasets/HKUSTAudio/AudioX-IFcaps
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Zeyue7/AudioX
• https://huggingface.co/spaces/Napawit/AudioX
• https://huggingface.co/spaces/ar93092/atai
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AudioGeneration #DiffusionModels #Transformers #AI #MultimodalAI
✨Selective Training for Large Vision Language Models via Visual Information Gain
📝 Summary:
This paper proposes Visual Information Gain (VIG) to quantify a visual input's contribution to prediction uncertainty in Large Vision Language Models. VIG enables selective training, improving visual grounding and reducing language bias with less supervision.
🔹 Publication Date: Published on Feb 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.17186
• PDF: https://arxiv.org/pdf/2602.17186
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LVLMs #SelectiveTraining #VisualInformationGain #ComputerVision #AIResearch
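One natural way to formalize "visual information gain" is the drop in predictive entropy once the image is provided. This is a hypothetical reading for illustration, not necessarily the paper's definition:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + 1e-12)))

def visual_info_gain(p_with_image, p_text_only):
    """Entropy reduction attributable to the image: high gain means the
    image actually drives the prediction (a good training candidate)."""
    return entropy(p_text_only) - entropy(p_with_image)

# Image resolves a 3-way ambiguity that text alone leaves uniform.
gain = visual_info_gain([0.9, 0.05, 0.05], [1/3, 1/3, 1/3])
```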
✨DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
📝 Summary:
To address limitations in existing datasets, DeepVision-103K offers a comprehensive and visually diverse mathematical dataset for multimodal reasoning. It enhances model performance, visual perception, and reasoning in large multimodal models.
🔹 Publication Date: Published on Feb 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.16742
• PDF: https://arxiv.org/pdf/2602.16742
• Github: https://github.com/SKYLENAGE-AI/DeepVision-103K
✨ Datasets citing this paper:
• https://huggingface.co/datasets/skylenage/DeepVision-103K
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MultimodalAI #ComputerVision #Datasets #AIResearch #DeepLearning
✨Mobile-Agent-v3: Foundamental Agents for GUI Automation
📝 Summary:
This paper introduces GUI-Owl and Mobile-Agent-v3, open-source GUI agent models and frameworks. Mobile-Agent-v3 achieves new state-of-the-art performance on GUI automation benchmarks like AndroidWorld and OSWorld by building on GUI-Owl's innovations in environment infrastructure, agent capabiliti...
🔹 Publication Date: Published on Aug 21, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Models citing this paper:
• https://huggingface.co/mPLUG/GUI-Owl-7B
• https://huggingface.co/mPLUG/GUI-Owl-32B
• https://huggingface.co/mPLUG/GUI-Owl-7B-Desktop-RL
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GUIAgent #Automation #AI #OpenSource #MachineLearning
✨VidEoMT: Your ViT is Secretly Also a Video Segmentation Model
📝 Summary:
VidEoMT is a video segmentation model that eliminates complex tracking modules by using a Vision Transformer encoder with query propagation and fusion. This enables efficient temporal modeling, achieving competitive accuracy and 5-10x faster processing speeds.
🔹 Publication Date: Published on Feb 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.17807
• PDF: https://arxiv.org/pdf/2602.17807
• Project Page: https://www.tue-mps.org/videomt/
• Github: https://github.com/tue-mps/videomt
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoSegmentation #VisionTransformers #ComputerVision #DeepLearning #AIResearch
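Query propagation, the mechanism that replaces an explicit tracker, can be sketched as warm-starting each frame's object queries from the previous frame (the fusion weight here is a made-up placeholder):

```python
import numpy as np

def propagate_queries(prev_out_queries, learned_init_queries, alpha=0.5):
    """Fuse last frame's output queries with the learned initial queries:
    query slot i keeps attending to object i across frames, so identity
    comes from the decoder itself rather than a tracking module."""
    return alpha * prev_out_queries + (1 - alpha) * learned_init_queries

fused = propagate_queries(np.ones((2, 4)), np.zeros((2, 4)))
```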