✨Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR
📝 Summary:
LLM-based ASR improves with multimodal conversational context, especially for entities. Raw audio context is costly, so Abstract Compression replaces prior-turn audio with fixed latent tokens, retaining transcripts. This reduces computational cost while recovering some performance gains.
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26246
• PDF: https://arxiv.org/pdf/2603.26246
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #ASR #SpeechRecognition #NLP #AI
✨It Takes Two: A Duet of Periodicity and Directionality for Burst Flicker Removal
📝 Summary:
Flicker artifacts in short-exposure photos are addressed by Flickerformer, a transformer-based architecture. It leverages flicker's intrinsic periodicity and directionality to effectively remove artifacts without introducing ghosting, outperforming existing methods.
🔹 Publication Date: Published on Mar 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22794
• PDF: https://arxiv.org/pdf/2603.22794
==================================
#ImageProcessing #DeepLearning #ComputerVision #Transformers #FlickerRemoval
✨Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
📝 Summary:
Medical imaging datasets are fragmented and small, limiting foundation model development. This survey of 1000+ open-access datasets proposes a metadata-driven fusion paradigm to integrate them, creating larger resources. This scales medical imaging data for more capable foundation models.
🔹 Publication Date: Published on Mar 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27460
• PDF: https://arxiv.org/pdf/2603.27460
• Project Page: https://huggingface.co/datasets/General-Medical-AI/Project-Imaging-X
• Github: https://github.com/uni-medical/Project-Imaging-X
✨ Datasets citing this paper:
• https://huggingface.co/datasets/General-Medical-AI/Project-Imaging-X
==================================
#MedicalImaging #FoundationModels #AI #DataScience #OpenData
✨Falcon Perception
📝 Summary:
Falcon Perception introduces a unified early-fusion Transformer that processes images and text within a single architecture from the first layer. This simplifies perception systems and achieves improved mask prediction and OCR performance, outperforming traditional modular designs.
🔹 Publication Date: Published on Mar 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27365
• PDF: https://arxiv.org/pdf/2603.27365
✨ Datasets citing this paper:
• https://huggingface.co/datasets/tiiuae/PBench
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions
📝 Summary:
A fully automated question-answer based evaluation pipeline and comprehensive benchmark are introduced for assessing creative image manipulation tasks under complex instructions, demonstrating strong ...
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26174
• PDF: https://arxiv.org/pdf/2603.26174
• Github: https://github.com/ChonghuinanWang/CREval
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ChonghuinanWang/CREval
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
📝 Summary:
A new benchmark called BizGenEval is introduced to evaluate image generation models on commercial visual content creation tasks across multiple document types and capability dimensions.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25732
• PDF: https://arxiv.org/pdf/2603.25732
• Project Page: https://microsoft.github.io/BizGenEval/
• Github: https://github.com/microsoft/BizGenEval
✨ Datasets citing this paper:
• https://huggingface.co/datasets/microsoft/BizGenEval
✨ Spaces citing this paper:
• https://huggingface.co/spaces/microsoft/BizGenEval-Leaderboard
• https://huggingface.co/spaces/clarence-stark/BizGenEval-Leaderboard
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨TrajectoryMover: Generative Movement of Object Trajectories in Videos
📝 Summary:
TrajectoryAtlas enables generative video editing by generating large-scale synthetic paired video data and training a video generator to move object 3D motion trajectories while preserving plausibilit...
🔹 Publication Date: Published on Mar 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29092
• PDF: https://arxiv.org/pdf/2603.29092
• Project Page: https://chhatrekiran.github.io/trajectorymover/
• Github: https://github.com/kiranchhatre/TrajectoryMover
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos
📝 Summary:
Colon-Bench is a new comprehensive benchmark dataset for colonoscopy AI, created using an agentic workflow. It features 528 full-procedure videos with dense annotations for 14 lesion types, enabling MLLM evaluation and performance improvements.
🔹 Publication Date: Published on Mar 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25645
• PDF: https://arxiv.org/pdf/2603.25645
• Project Page: https://abdullahamdi.com/colon-bench
• Github: https://github.com/ajhamdi/colon-bench-eval
✨ Datasets citing this paper:
• https://huggingface.co/datasets/ajhamdi/colon-bench
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation
📝 Summary:
WorldFlow3D generates unbounded 3D worlds by modeling 3D data distributions as a flow matching problem. This latent-free approach achieves rapid convergence and high-quality generation with controllable geometric and texture properties. It outperforms existing methods on both real and synthetic s...
🔹 Publication Date: Published on Mar 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29089
• PDF: https://arxiv.org/pdf/2603.29089
• Project Page: https://princeton-computational-imaging.github.io/WorldFlow3D/
==================================
#3DGeneration #GenerativeAI #FlowMatching #ComputerGraphics #AIResearch
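For readers new to flow matching, the idea named in the WorldFlow3D summary above can be sketched in a few lines. This is a generic illustration of the common linear-path formulation, not the paper's method: the straight-line interpolation, the function names, and the toy 3-value "sample" are all assumptions made for the example.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight line from a noise sample x0 (t=0) to a data sample x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(
    predict_velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    x1: Vector,
    t: float,
) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


# Toy data (hypothetical): x0 stands in for noise, x1 for a data sample.
noise = [0.0, 0.0, 0.0]
data = [1.0, 2.0, 3.0]


def oracle(x_t: Vector, t: float) -> Vector:
    """Stand-in for a trained network; here it already knows the true velocity."""
    return [1.0, 2.0, 3.0]


print(interpolate(noise, data, 0.5))
print(flow_matching_loss(oracle, noise, data, 0.5))
```

In a real system the oracle would be a neural network trained to minimize this loss over many sampled pairs and times; generation then integrates the learned velocity from noise toward data.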
✨The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
📝 Summary:
Large language models exhibit systematic reasoning failures when surface cues conflict with feasibility constraints, demonstrating consistent heuristic biases that can be measured and partially mitiga...
🔹 Publication Date: Published on Mar 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29025
• PDF: https://arxiv.org/pdf/2603.29025
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models
📝 Summary:
Patient education materials for solid-organ transplantation vary substantially across U.S. centers, yet no system ...
🔹 Publication Date: Published on Mar 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21460
• PDF: https://arxiv.org/pdf/2603.21460
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data
📝 Summary:
A domain-adapted tabular large language model framework demonstrates improved few-shot Alzheimer's disease classification performance over traditional methods while maintaining stability under missing...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17191
• PDF: https://arxiv.org/pdf/2603.17191
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model
📝 Summary:
This paper introduces MPDiT, a multi-patch transformer for diffusion models. It processes larger patches in early layers for global context and smaller patches later for local details, reducing computation by up to fifty percent while maintaining generative performance.
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26357
• PDF: https://arxiv.org/pdf/2603.26357
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
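To see why the larger early-layer patches described in the MPDiT summary above save compute, a quick token-count sketch helps. The 32x32 grid, the patch sizes 4 and 2, and the quadratic attention-cost approximation are assumptions for illustration only; they are not taken from the paper and do not reproduce its reported savings.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Number of tokens when a grid is cut into patch x patch squares."""
    if height % patch or width % patch:
        raise ValueError("grid must be divisible by the patch size")
    return (height // patch) * (width // patch)


def relative_attention_cost(tokens: int) -> int:
    """Rough proxy: self-attention cost grows with the square of the token count."""
    return tokens * tokens


# Hypothetical 32x32 latent grid with a coarse and a fine patch size.
coarse = token_count(32, 32, 4)  # 64 tokens for the early, global layers
fine = token_count(32, 32, 2)    # 256 tokens for the later, local layers

print(coarse, fine)
print(relative_attention_cost(coarse) / relative_attention_cost(fine))  # 0.0625
```

Under these assumptions, doubling the patch side cuts the token count by 4x and the attention proxy by 16x for the layers that use it; the overall saving depends on how many layers run at each patch size.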
✨TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets
📝 Summary:
TokenDial enables precise attribute control in text-to-video models by using additive offsets in spatiotemporal token space for coherent edits without retraining.
🔹 Publication Date: Published on Mar 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27520
• PDF: https://arxiv.org/pdf/2603.27520
• Project Page: https://tokendial.github.io/
• Github: https://github.com/ariannaliu/TokenDial
==================================
#TextToVideo #GenerativeAI #AIControl #VideoGeneration #DeepLearning
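The "additive offsets" idea in the TokenDial summary above can be pictured as shifting every token by a scaled direction vector. The sketch below is only a toy illustration under that reading: the function name, the flat list-of-vectors layout, and the single shared direction are assumptions, not the paper's implementation.

```python
from typing import List

Vector = List[float]


def apply_offset(tokens: List[Vector], direction: Vector, strength: float) -> List[Vector]:
    """Add strength * direction to every token; strength acts as the continuous dial."""
    return [[v + strength * d for v, d in zip(token, direction)] for token in tokens]


# Hypothetical 2-dimensional tokens and attribute direction.
tokens = [[1.0, 2.0], [3.0, 4.0]]
direction = [0.5, -1.0]

print(apply_offset(tokens, direction, 0.0))  # strength 0 leaves the tokens unchanged
print(apply_offset(tokens, direction, 2.0))  # [[2.0, 0.0], [4.0, 2.0]]
```

Because the edit is a plain addition, the dial can be set to any real value, and the underlying model weights stay untouched.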
✨OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
📝 Summary:
OmniRoam generates long-horizon panoramic videos using a two-stage approach for improved scene completeness and consistency. It first previews a trajectory-controlled video, then refines and extends it to high-resolution, long-range panoramas, enabling high-fidelity world wandering.
🔹 Publication Date: Published on Mar 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.30045
• PDF: https://arxiv.org/pdf/2603.30045
• Project Page: https://yuheng.ink/project-page/omniroam/
• Github: https://github.com/yuhengliu02/OmniRoam
🔹 Models citing this paper:
• https://huggingface.co/Yuheng02/OmniRoam
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
📝 Summary:
CALM is a unified model bridging the gap between multi-turn conversations and tool use in language agents. Trained on a new multi-task dataset CALM-IT, it integrates both capabilities. CALM outperforms specialized models, including GPT-4o, across various benchmarks.
🔹 Publication Date: Published on Feb 12, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.08820
• PDF: https://arxiv.org/pdf/2502.08820
• Project Page: https://emrecanacikgoz.github.io/CoALM/
• Github: https://github.com/oumi-ai/oumi
🔹 Models citing this paper:
• https://huggingface.co/uiuc-convai/CoALM-8B
• https://huggingface.co/uiuc-convai/CoALM-405B
• https://huggingface.co/uiuc-convai/CoALM-70B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/uiuc-convai/CoALM-IT
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Terminal Agents Suffice for Enterprise Automation
📝 Summary:
Simple terminal-based coding agents interacting directly with platform APIs, powered by foundation models, are highly effective for enterprise automation. These low-level agents match or outperform complex tool-augmented systems, demonstrating that elaborate agent architectures are often unnecess...
🔹 Publication Date: Published on Mar 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00073
• PDF: https://arxiv.org/pdf/2604.00073
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
📝 Summary:
A dynamic agentic framework called TAB addresses 3D visual grounding by decoupling spatial semantics resolution from 3D structure instantiation through 2D VLMs and multi-view geometry, achieving super...
🔹 Publication Date: Published on Apr 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00528
• PDF: https://arxiv.org/pdf/2604.00528
• Github: https://github.com/WHB139426/TAB-Agent
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
📝 Summary:
GaussianGPT uses a transformer-based autoregressive approach with 3D rotary positional embeddings to generate 3D scenes by predicting Gaussian primitives, offering advantages over diffusion methods in...
🔹 Publication Date: Published on Mar 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26661
• PDF: https://arxiv.org/pdf/2603.26661
• Project Page: https://nicolasvonluetzow.github.io/GaussianGPT/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Universal YOCO for Efficient Depth Scaling
📝 Summary:
Universal YOCO (YOCO-U) merges the YOCO architecture with recursive computation for efficient LLM depth scaling. It uses iterative processing in shallow attention layers, offering a constant KV cache and better token utility.
🔹 Publication Date: Published on Apr 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01220
• PDF: https://arxiv.org/pdf/2604.01220
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research