✨TAPS: Task Aware Proposal Distributions for Speculative Sampling
📝 Summary:
Speculative decoding quality depends on matching draft model training data to the downstream task. Task-specific training yields specialized drafters that are best combined at inference time using confidence-based routing, outperforming averaging. Confidence is a more effective routing signal tha...
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27027
• PDF: https://arxiv.org/pdf/2603.27027
• Github: https://github.com/Moe-Zbeeb/TAPS
🔹 Models citing this paper:
• https://huggingface.co/zbeeb/Hass-MathInstruct_20epochs
• https://huggingface.co/zbeeb/Hass-ShareGPT_20epochs
• https://huggingface.co/zbeeb/Hass-Sharegpt-Mathinstruct-20epochs
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zbeeb/TAPS-Datasets
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpeculativeDecoding #LLM #MachineLearning #AIResearch #NLP
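The difference between confidence-based routing and averaging mentioned in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes "confidence" means each drafter's top-1 next-token probability, and the drafter names, logits, and helper functions are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_by_confidence(drafter_probs):
    """Pick the drafter whose top-1 probability is highest (assumed confidence signal)."""
    best = max(range(len(drafter_probs)), key=lambda i: max(drafter_probs[i]))
    return best, drafter_probs[best]

def average_drafters(drafter_probs):
    """Baseline: average the proposal distributions token by token."""
    n = len(drafter_probs)
    return [sum(col) / n for col in zip(*drafter_probs)]

# Hypothetical next-token logits from two specialized drafters over a 3-token vocabulary.
math_drafter = softmax([4.0, 1.0, 0.5])   # sharply peaked: confident on this prompt
chat_drafter = softmax([1.0, 1.2, 0.9])   # nearly flat: unsure on this prompt

chosen, routed = route_by_confidence([math_drafter, chat_drafter])
averaged = average_drafters([math_drafter, chat_drafter])

print("routed to drafter", chosen)
print("top-1 prob after routing:  ", round(max(routed), 3))
print("top-1 prob after averaging:", round(max(averaged), 3))
```

In this toy case, routing keeps the confident drafter's sharp proposal intact, while averaging dilutes it with the unsure drafter's flat distribution.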
✨KAT-Coder-V2 Technical Report
📝 Summary:
KAT-Coder-V2 is an agentic coding model that uses a 'Specialize-then-Unify' approach across five expert domains. It employs novel training methods and infrastructure, achieving strong performance on SWE-bench, PinchBench, and other coding benchmarks.
🔹 Publication Date: Published on Mar 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27703
• PDF: https://arxiv.org/pdf/2603.27703
#AI #Coding #LLM #MachineLearning #Research
✨Unified Number-Free Text-to-Motion Generation Via Flow Matching
📝 Summary:
Existing text-to-motion models struggle with a variable number of agents, leading to inefficiency and errors. This paper proposes Unified Motion Flow (UMF), a two-stage approach (prior and reaction) that uses P-Flow and S-Flow in a unified latent space. UMF effectively generates multi-person motion from text, mi...
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27040
• PDF: https://arxiv.org/pdf/2603.27040
• Project Page: https://githubhgh.github.io/umf/
• Github: https://github.com/Githubhgh/UMF_CVPR
#TextToMotion #FlowMatching #GenerativeAI #MotionSynthesis #DeepLearning
✨Text Data Integration
📝 Summary:
This paper argues for integrating textual data into data integration systems, as current approaches largely focus on structured data. It explores the challenges, the state of the art, and open problems in utilizing unstructured text.
🔹 Publication Date: Published on Mar 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27055
• PDF: https://arxiv.org/pdf/2603.27055
• Project Page: https://dtim.upc.edu/en
• Github: https://github.com/dtim-upc/THOR
#DataIntegration #UnstructuredData #TextData #NLP #DataScience
✨A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI
📝 Summary:
This paper finds that even state-of-the-art multi-billion-parameter AI models struggle with surgical tool detection, a seemingly simple task. Scaling models further offers diminishing returns, suggesting fundamental limitations for current Vision Language Models in surgical use cases beyond just ...
🔹 Publication Date: Published on Mar 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27341
• PDF: https://arxiv.org/pdf/2603.27341
#SurgicalAI #MedicalAI #FoundationModels #VisionLanguageModels #AIHealthcare
✨HandX: Scaling Bimanual Motion and Interaction Generation
📝 Summary:
HandX presents a new foundation for bimanual hand motion synthesis, offering a high-fidelity dataset, an LLM-driven annotation method, and new evaluation metrics. It enables high-quality dexterous motion generation, with scaling trends observed.
🔹 Publication Date: Published on Mar 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28766
• PDF: https://arxiv.org/pdf/2603.28766
• Project Page: https://github.com/handx-project/HandX
• Github: https://github.com/handx-project/HandX
#MotionSynthesis #BimanualInteraction #DexterousManipulation #AIResearch #LLM
✨AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding
📝 Summary:
AdaptToken enables efficient long video understanding for MLLMs by using model uncertainty to dynamically select relevant tokens. It allocates a global token budget and supports early stopping, significantly improving accuracy and reducing inference time across benchmarks.
🔹 Publication Date: Published on Mar 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28696
• PDF: https://arxiv.org/pdf/2603.28696
• Project Page: https://haozheqi.github.io/adapt-token
• Github: https://github.com/HaozheQi/AdaptToken
#MLLM #VideoUnderstanding #MachineLearning #AIResearch #TokenSelection
✨LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
📝 Summary:
This paper introduces KITScenes LongTail, a new dataset for long-tail driving events. It offers multi-view video, trajectories, and multilingual expert reasoning traces. This resource improves few-shot generalization and evaluates multimodal models' instruction-following capabilities.
🔹 Publication Date: Published on Mar 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23607
• PDF: https://arxiv.org/pdf/2603.23607
• Project Page: https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail
✨ Datasets citing this paper:
• https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail
#AutonomousDriving #ComputerVision #Datasets #LongTailLearning #MultimodalAI
✨ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
📝 Summary:
ChartNet is a 1.5 million-sample multimodal dataset. It improves AI models' chart understanding by providing diverse charts with aligned visual, textual, and numerical data. Fine-tuning on this open-source dataset significantly enhances performance in data visualization.
🔹 Publication Date: Published on Mar 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27064
• PDF: https://arxiv.org/pdf/2603.27064
• Project Page: https://huggingface.co/datasets/ibm-granite/ChartNet
🔹 Models citing this paper:
• https://huggingface.co/ibm-granite/granite-4.0-3b-vision
• https://huggingface.co/beaupi/granite-4.0-3b-vision-oQ8
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ibm-granite/ChartNet
#AI #DataScience #MachineLearning #HuggingFace #Research
✨STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding
📝 Summary:
STRIDE enables proactive video understanding by modeling temporal activation patterns through iterative denoising within sliding windows, improving timing decisions in streaming scenarios.
🔹 Publication Date: Published on Mar 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27593
• PDF: https://arxiv.org/pdf/2603.27593
• Project Page: https://interlive-team.github.io/STRIDE/
• Github: https://interlive-team.github.io/STRIDE/
🔹 Models citing this paper:
• https://huggingface.co/interlive/STRIDE-2B
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SpatialLM: Training Large Language Models for Structured Indoor Modeling
📝 Summary:
SpatialLM, a multimodal large language model, processes 3D point cloud data to generate structured scene understanding outputs, achieving state-of-the-art performance in layout estimation and competit...
🔹 Publication Date: Published on Jun 9, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.07491
• PDF: https://arxiv.org/pdf/2506.07491
• Project Page: https://manycore-research.github.io/SpatialLM
• Github: https://github.com/manycore-research/SpatialLM
🔹 Models citing this paper:
• https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B
• https://huggingface.co/manycore-research/SpatialLM1.1-Llama-1B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Voxel51/spatial_lm_dataset
• https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset
• https://huggingface.co/datasets/manycore-research/SpatialLM-Testset
#AI #DataScience #MachineLearning #HuggingFace #Research
✨INSID3: Training-Free In-Context Segmentation with DINOv3
📝 Summary:
INSID3 demonstrates training-free in-context segmentation using only frozen DINOv3 features. This minimalist approach achieves state-of-the-art results across various segmentation tasks with fewer parameters and no supervision.
🔹 Publication Date: Published on Mar 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28480
• PDF: https://arxiv.org/pdf/2603.28480
• Project Page: https://visinf.github.io/INSID3/
• Github: https://github.com/visinf/INSID3
#AI #DataScience #MachineLearning #HuggingFace #Research
✨LongCat-Next: Lexicalizing Modalities as Discrete Tokens
📝 Summary:
LongCat-Next introduces DiNA, a unified framework for native multimodal processing. It represents text, vision, and audio as discrete tokens in a shared space, enabling consistent autoregressive modeling. This reconciles understanding-generation conflicts and excels across many multimodal tasks.
🔹 Publication Date: Published on Mar 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27538
• PDF: https://arxiv.org/pdf/2603.27538
• Project Page: https://longcat.chat/longcat-next/intro
• Github: https://github.com/meituan-longcat/LongCat-Next
🔹 Models citing this paper:
• https://huggingface.co/meituan-longcat/LongCat-Next
#AI #DataScience #MachineLearning #HuggingFace #Research
✨A Neural Score-Based Particle Method for the Vlasov-Maxwell-Landau System
📝 Summary:
This paper introduces neural score-based transport modeling (SBTM) for the Vlasov-Maxwell-Landau system. SBTM replaces the less efficient blob method, providing linear cost, greater accuracy, faster runtime, and lower memory, correctly simulating long-term plasma relaxation.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25832
• PDF: https://arxiv.org/pdf/2603.25832
• Github: https://github.com/Vilin97/Vlasov-Landau-SBTM
#AI #DataScience #MachineLearning #HuggingFace #Research
✨CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
📝 Summary:
CARLA-Air unifies high-fidelity urban driving and multirotor flight simulation within a single Unreal Engine framework. This addresses the demand for joint air-ground agent modeling in photorealistic environments, supporting embodied intelligence research.
🔹 Publication Date: Published on Mar 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28032
• PDF: https://arxiv.org/pdf/2603.28032
• Project Page: https://github.com/louiszengCN/CarlaAir
• Github: https://github.com/louiszengCN/CarlaAir
🔹 Models citing this paper:
• https://huggingface.co/tianlezeng/CarlaAIr-v0.1.7
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SeGPruner: Semantic-Geometric Visual Token Pruner for 3D Question Answering
📝 Summary:
A semantic-aware and geometry-guided token pruning framework is presented for efficient 3D question answering with multi-view images, achieving significant reductions in token budget and inference lat...
🔹 Publication Date: Published on Mar 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29437
• PDF: https://arxiv.org/pdf/2603.29437
• Project Page: https://github.com/intcomp/SegPruner
• Github: https://github.com/intcomp/SegPruner
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells
📝 Summary:
Lingshu-Cell is a masked discrete diffusion model that learns transcriptomic state distributions and enables conditional simulation of cellular perturbations across diverse tissues and species.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25240
• PDF: https://arxiv.org/pdf/2603.25240
#AI #DataScience #MachineLearning #HuggingFace #Research
✨VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
📝 Summary:
VGGRPO is a latent geometry-guided framework that enhances video diffusion models' geometric consistency through a Latent Geometry Model and latent-space reinforcement learning with camera motion and ...
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26599
• PDF: https://arxiv.org/pdf/2603.26599
• Project Page: https://zhaochongan.github.io/projects/VGGRPO/
#AI #DataScience #MachineLearning #HuggingFace #Research