✨ Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
📝 Summary:
Orion is a visual agent framework that orchestrates specialized computer vision tools to execute complex visual workflows. It achieves competitive performance on benchmarks and enables autonomous, tool-driven visual reasoning.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14210
• PDF: https://arxiv.org/pdf/2511.14210
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#ComputerVision #AIagents #VisualReasoning #MultimodalAI #DeepLearning
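The summary describes Orion as orchestrating specialized vision tools, but the paper's actual tool interface isn't given here; the snippet below is only a minimal sketch of that general agent pattern, with hypothetical tool names and a planner callable standing in for the underlying model.

```python
# Minimal sketch of a tool-orchestrating visual agent loop (hypothetical tool
# names and planner; not Orion's actual API). The planner either picks a tool
# or returns a final answer; tool outputs are fed back as observations.
def detect_objects(image):
    return [{"label": "person", "box": [12, 30, 140, 220]}]   # placeholder detector

def read_text(image):
    return "EXIT 12"                                           # placeholder OCR tool

TOOLS = {"detect_objects": detect_objects, "read_text": read_text}

def run_visual_agent(image, question, planner, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = planner(question, history)        # e.g. {"tool": "read_text"} or {"answer": "..."}
        if "answer" in step:
            return step["answer"]
        observation = TOOLS[step["tool"]](image)
        history.append({"tool": step["tool"], "observation": observation})
    return None                                   # give up after max_steps
```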
✨ A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
📝 Summary:
CoTyle introduces code-to-style image generation, creating consistent visual styles from numerical codes. It is the first open-source academic method for this task, using a discrete style codebook and a text-to-image diffusion model for diverse, reproducible styles.
🔹 Publication Date: Published on Nov 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10555
• PDF: https://arxiv.org/pdf/2511.10555
• Project Page: https://Kwai-Kolors.github.io/CoTyle/
• Github: https://github.com/Kwai-Kolors/CoTyle
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Kwai-Kolors/CoTyle
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#ImageGeneration #DiffusionModels #NeuralStyle #ComputerVision #DeepLearning
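Since a discrete style codebook conditioning a text-to-image diffusion model is the core idea, here is a hedged sketch of how a numeric code could map to a reusable style embedding; the class, sizes, and names are assumptions for illustration, not CoTyle's released code.

```python
# Hedged sketch of code-to-style conditioning (assumed sizes and names, not
# CoTyle's actual implementation): a numeric code indexes a learned codebook,
# and the resulting embedding conditions the diffusion model alongside the prompt.
import torch
import torch.nn as nn

class StyleCodebook(nn.Module):
    def __init__(self, num_styles=65536, dim=768):
        super().__init__()
        self.table = nn.Embedding(num_styles, dim)   # the discrete style space

    def forward(self, style_code: torch.Tensor) -> torch.Tensor:
        return self.table(style_code)                # (batch, dim) style embedding

codebook = StyleCodebook()
style_emb = codebook(torch.tensor([1234]))           # the same code always yields the same style
# style_emb would then be injected into the text-to-image model's conditioning
# (e.g. alongside the prompt embedding), making styles reproducible by number.
```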
✨ MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
📝 Summary:
MVI-Bench introduces a new benchmark for evaluating the robustness of Large Vision-Language Models (LVLMs) against misleading visual inputs. It uses a hierarchical taxonomy and a novel metric to uncover significant vulnerabilities in state-of-the-art LVLMs.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14159
• PDF: https://arxiv.org/pdf/2511.14159
• Github: https://github.com/chenyil6/MVI-Bench
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#LVLMs #ComputerVision #AIrobustness #MachineLearning #AI
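The summary names a novel metric without defining it; purely as an illustration of how robustness to misleading inputs is often quantified, the sketch below computes a clean-vs-misleading accuracy gap (a generic measure, not necessarily the paper's own metric).

```python
# Generic robustness-gap sketch (not necessarily MVI-Bench's actual metric):
# compare accuracy on clean images with accuracy on their misleading counterparts.
def robustness_gap(results):
    # results: list of {"clean_correct": bool, "misleading_correct": bool}
    clean_acc = sum(r["clean_correct"] for r in results) / len(results)
    misled_acc = sum(r["misleading_correct"] for r in results) / len(results)
    return clean_acc - misled_acc        # smaller gap = more robust LVLM

example = [{"clean_correct": True, "misleading_correct": False}] * 7 + \
          [{"clean_correct": True, "misleading_correct": True}] * 3
print(robustness_gap(example))           # 0.7
```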
✨ REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
📝 Summary:
Text-only self-reflection is insufficient for long-form video understanding. REVISOR is a new framework enabling MLLMs to perform multimodal introspective reflection across text and visual modalities. This significantly enhances reasoning for long videos without extra fine-tuning, achieving stron...
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13026
• PDF: https://arxiv.org/pdf/2511.13026
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#MultimodalAI #VideoUnderstanding #MLLMs #AIResearch #ComputerVision
✨ Φeat: Physically-Grounded Feature Representation
📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity like reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
✨ VIDEOP2R: Video Understanding from Perception to Reasoning
📝 Summary:
VideoP2R is a novel reinforcement fine-tuning framework for video understanding. It separately models perception and reasoning processes, using a new chain-of-thought (CoT) dataset and a process-aware RL algorithm. This approach achieves state-of-the-art results on video reasoning benchmarks.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11113v1
• PDF: https://arxiv.org/pdf/2511.11113
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#VideoUnderstanding #ReinforcementLearning #AIResearch #ComputerVision #Reasoning
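The process-aware RL algorithm is only named in the summary; as a hedged sketch of what "process-aware" could mean, the reward below scores the perception output and the final answer separately instead of rewarding the answer alone (the weights and scoring are illustrative assumptions, not the paper's design).

```python
# Illustrative process-aware reward (an assumption, not VideoP2R's actual
# algorithm): score the perception step (e.g. a generated video description)
# and the reasoning step (final answer correctness) separately, then combine.
def process_aware_reward(perception_score: float, answer_correct: bool,
                         w_perception: float = 0.3) -> float:
    # perception_score in [0, 1], e.g. similarity of the description to a reference
    return w_perception * perception_score + (1.0 - w_perception) * float(answer_correct)

print(process_aware_reward(0.8, True))    # 0.94: good perception, correct answer
print(process_aware_reward(0.8, False))   # 0.24: good perception still earns partial credit
```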
✨ Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
📝 Summary:
VR-Bench evaluates video models' spatial reasoning using maze-solving tasks. It demonstrates that video models excel in spatial perception and reasoning, outperforming VLMs, and benefit from diverse sampling during inference. These findings show the strong potential of reasoning via video for spa...
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15065
• PDF: https://arxiv.org/pdf/2511.15065
• Project Page: https://imyangc7.github.io/VRBench_Web/
• Github: https://github.com/ImYangC7/VR-Bench
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#VideoModels #AIReasoning #SpatialAI #ComputerVision #MachineLearning
✨ ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
📝 Summary:
ARC-Chapter is a large-scale video chaptering model trained on millions of long video chapters, using a new bilingual and hierarchical dataset. It introduces a novel evaluation metric, GRACE, to better reflect real-world chaptering. The model achieves state-of-the-art performance and demonstrates...
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14349
• PDF: https://arxiv.org/pdf/2511.14349
• Project Page: https://arcchapter.github.io/index_en.html
• Github: https://github.com/TencentARC/ARC-Chapter
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#VideoChaptering #AI #MachineLearning #VideoSummarization #ComputerVision
✨ Medal S: Spatio-Textual Prompt Model for Medical Segmentation
📝 Summary:
Medal S is a medical segmentation foundation model using spatio-textual prompts for efficient, high-accuracy multi-class segmentation across diverse modalities. It uniquely aligns volumetric prompts with text embeddings and processes masks in parallel, significantly outperforming prior methods.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13001
• PDF: https://arxiv.org/pdf/2511.13001
• Github: https://github.com/yinghemedical/Medal-S
🔹 Models citing this paper:
• https://huggingface.co/spc819/Medal-S-V1.0
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#MedicalSegmentation #FoundationModels #AI #DeepLearning #ComputerVision
✨ OmniParser for Pure Vision Based GUI Agent
📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.
🔹 Publication Date: Published on Aug 1, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• Github: https://github.com/microsoft/omniparser
🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser
✨ Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k
✨ Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning
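Given that the summary describes a detector for interactable icons plus a semantic captioner feeding GPT-4V, the sketch below shows that parse-then-act pattern with placeholder helpers; the function names are hypothetical and not OmniParser's released modules.

```python
# Hedged sketch of the parse-then-act pattern (hypothetical helpers, not
# OmniParser's actual modules): detect interactable elements, caption each one,
# and hand the structured screen description to the agent LLM, which can then
# refer to elements by id instead of raw pixel coordinates.
def detect_interactable_elements(screenshot):
    return [{"id": 0, "box": [10, 10, 60, 40]},      # placeholder detector output
            {"id": 1, "box": [80, 10, 160, 40]}]

def describe_element(screenshot, box):
    return "blue 'Submit' button"                     # placeholder local-semantics captioner

def parse_screen(screenshot):
    elements = detect_interactable_elements(screenshot)
    for el in elements:
        el["description"] = describe_element(screenshot, el["box"])
    return elements

# An agent model consuming parse_screen(...) can output actions such as
# {"action": "click", "element_id": 1}, which an executor maps back to coordinates.
```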
✨ Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
📝 Summary:
VANS is a new model for Video-Next-Event Prediction (VNEP) that generates dynamic, visually and semantically accurate video responses. It uses reinforcement learning to align a Vision-Language Model with a Video Diffusion Model, achieving state-of-the-art performance.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16669
• PDF: https://arxiv.org/pdf/2511.16669
• Project Page: https://video-as-answer.github.io/
• Github: https://github.com/KlingTeam/VANS
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#VideoAI #GenerativeAI #MachineLearning #ComputerVision #DeepLearning
✨ Scaling Spatial Intelligence with Multimodal Foundation Models
📝 Summary:
SenseNova-SI is a new scaled multimodal foundation model that achieves superior spatial intelligence. Trained on 8 million diverse data samples, it achieves unprecedented performance on various spatial benchmarks. The models are publicly released to foster further research.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13719
• PDF: https://arxiv.org/pdf/2511.13719
• Project Page: https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-8B
• Github: https://github.com/OpenSenseNova/SenseNova-SI
🔹 Models citing this paper:
• https://huggingface.co/sensenova/SenseNova-SI-InternVL3-8B
• https://huggingface.co/sensenova/SenseNova-SI-InternVL3-2B
• https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-2B
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#MultimodalAI #FoundationModels #SpatialIntelligence #ComputerVision #AI
✨ First Frame Is the Place to Go for Video Content Customization
📝 Summary:
The first frame in video generation models functions as a conceptual memory buffer, storing visual elements for later reuse. This enables robust video content customization with minimal training examples, without major model changes.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15700
• PDF: https://arxiv.org/pdf/2511.15700
• Project Page: https://firstframego.github.io
• Github: https://firstframego.github.io
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #ComputerVision #DeepLearning #AICustomization
✨ SAM 3D: 3Dfy Anything in Images
📝 Summary:
SAM 3D reconstructs 3D objects from single images, predicting geometry, texture, and layout. It uses a multi-stage training framework with synthetic pretraining and real-world alignment, breaking the 3D data barrier and achieving high human preference.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16624
• PDF: https://arxiv.org/pdf/2511.16624
• Project Page: https://ai.meta.com/sam3d/
• Github: https://github.com/facebookresearch/sam-3d-objects
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#3DReconstruction #ComputerVision #AI #DeepLearning #SingleImage3D
✨ Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
📝 Summary:
Thinking-while-Generating (TwiG) interleaves textual reasoning throughout the visual generation process. This on-the-fly multimodal interaction guides and reflects on visual content as it is created, resulting in more context-aware and semantically rich outputs.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16671
• PDF: https://arxiv.org/pdf/2511.16671
• Project Page: https://think-while-gen.github.io/
• Github: https://github.com/ZiyuGuo99/Thinking-while-Generating
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#GenerativeAI #MultimodalAI #ComputerVision #NLP #AIResearch
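To make the interleaving concrete, here is a hedged sketch of a think-then-refine loop under assumed interfaces; the reasoner and generator callables are placeholders, not TwiG's actual components.

```python
# Hedged sketch of interleaved reasoning and generation (assumed interfaces,
# not TwiG's implementation): a textual "thought" is produced between partial
# generation stages and fed back as extra conditioning for the next stage.
def generate_with_interleaved_thinking(prompt, reasoner, generator, num_stages=4):
    canvas = None          # partial image or latent state
    thoughts = []
    for _ in range(num_stages):
        thought = reasoner(prompt, canvas, thoughts)   # reflect on what has been generated so far
        thoughts.append(thought)
        canvas = generator(prompt, canvas, thought)    # refine, conditioned on the new thought
    return canvas, thoughts
```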
✨ SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
📝 Summary:
SAM2S is a foundation model enhancing interactive video object segmentation in surgery. It leverages a new large benchmark, robust memory, and temporal learning to achieve superior accuracy (80.42 J&F) and real-time performance in surgical video analysis.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16618
• PDF: https://arxiv.org/pdf/2511.16618
• Project Page: https://jinlab-imvr.github.io/SAM2S
• Github: https://github.com/jinlab-imvr/SAM2S
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#SurgicalAI #MedicalImaging #ComputerVision #FoundationModels #DeepLearning
✨ NaTex: Seamless Texture Generation as Latent Color Diffusion
📝 Summary:
NaTex directly generates 3D textures using latent color diffusion and geometry-aware models. It predicts texture color in 3D space, outperforming prior methods in coherence and alignment by avoiding 2D multi-view limitations.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16317
• PDF: https://arxiv.org/pdf/2511.16317
• Project Page: https://natex-ldm.github.io/
• Github: https://natex-ldm.github.io/
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#TextureGeneration #DiffusionModels #3DGraphics #ComputerVision #DeepLearning
✨ Draft and Refine with Visual Experts
📝 Summary:
The Draft-and-Refine (DnR) framework improves visual grounding in LVLMs. It uses a novel question-conditioned utilization metric to measure reliance on visual evidence. DnR then refines responses with external visual experts, reducing hallucinations and boosting accuracy.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11005
• PDF: https://arxiv.org/pdf/2511.11005
• Github: https://github.com/EavnJeong/Draft-and-Refine-with-Visual-Experts
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#LVLMs #VisualGrounding #AIHallucinations #ComputerVision #DeepLearning
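The question-conditioned utilization metric is not spelled out in this summary; as one hedged illustration of the general idea, the sketch below compares the model's answer distribution with and without the image, so a near-zero divergence flags a draft answer that ignored the visual evidence.

```python
# Illustrative visual-utilization measure (an assumption, not DnR's exact
# metric): if the answer distribution barely changes when the image is removed
# or ablated, the draft answer likely did not rely on visual evidence.
import math

def kl_divergence(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def visual_utilization(probs_with_image, probs_without_image):
    return kl_divergence(probs_with_image, probs_without_image)   # higher = more image-reliant

print(visual_utilization([0.9, 0.1], [0.5, 0.5]))   # ~0.37: answer depends on the image
print(visual_utilization([0.5, 0.5], [0.5, 0.5]))   # 0.0: answer ignores the image
```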
✨ BioBench: A Blueprint to Move Beyond ImageNet for Scientific ML Benchmarks
📝 Summary:
ImageNet accuracy poorly predicts performance on scientific imagery. BioBench is a new ecology vision benchmark unifying diverse tasks, kingdoms, and modalities with 3.1M images, offering a better evaluation for scientific ML.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16315
• PDF: https://arxiv.org/pdf/2511.16315
• Project Page: https://samuelstevens.me/biobench
• Github: https://github.com/samuelstevens/biobench
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#BioBench #MachineLearning #ComputerVision #ScientificML #Ecology
✨ Boosting Medical Visual Understanding From Multi-Granular Language Learning
📝 Summary:
MGLL enhances visual understanding by improving multi-label and cross-granularity alignment in image-text pretraining, outperforming existing methods in complex domains like medical imaging.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15943
• PDF: https://arxiv.org/pdf/2511.15943
• Project Page: https://github.com/HUANGLIZI/MGLL
• Github: https://github.com/HUANGLIZI/MGLL
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#MedicalAI #ComputerVision #DeepLearning #NLP #ImageTextPretraining
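The exact MGLL objective isn't given here; as a hedged sketch of multi-granular image-text alignment, the snippet below sums CLIP-style contrastive losses computed against text embeddings at several granularities (the levels and weighting are assumptions for illustration, not the paper's released code).

```python
# Hedged sketch of multi-granular image-text alignment (assumed design, not
# MGLL's implementation): apply a CLIP-style contrastive loss at several text
# granularities (e.g. report-, sentence-, phrase-level) and sum the losses,
# so one image can align with multiple labels and descriptions.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def multi_granular_loss(img_emb, txt_embs_per_level, weights=None):
    weights = weights or [1.0] * len(txt_embs_per_level)
    return sum(w * clip_contrastive_loss(img_emb, t) for w, t in zip(weights, txt_embs_per_level))

# Example with random embeddings: one image batch aligned to two text granularities.
img = torch.randn(4, 256)
loss = multi_granular_loss(img, [torch.randn(4, 256), torch.randn(4, 256)])
```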