Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers.
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

📝 Summary:
OmniVinci is an open-source omni-modal LLM that improves cross-modal understanding for audio, vision, and robotics. It features an innovative architecture for better embedding alignment and temporal capture, along with efficient data curation. OmniVinci outperforms competitors while using significantly less training data.
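
A minimal sketch of the kind of cross-modal alignment the summary describes: audio and vision features projected into one shared space with explicit temporal embeddings, trained so that co-occurring timesteps agree. Module names, dimensions, and the contrastive objective are assumptions for illustration, not OmniVinci's actual architecture.

```python
# Illustrative sketch only: shared-space alignment of audio and vision
# features with explicit temporal embeddings. Names, dimensions, and the
# contrastive objective are assumptions, not OmniVinci's implementation.
import torch
import torch.nn as nn

class SharedSpaceAligner(nn.Module):
    def __init__(self, vision_dim=1024, audio_dim=768, shared_dim=512):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, shared_dim)
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.time_embed = nn.Embedding(1024, shared_dim)  # one vector per timestep

    def forward(self, vision_feats, audio_feats, timestamps):
        # vision_feats: (B, T, vision_dim); audio_feats: (B, T, audio_dim)
        t = self.time_embed(timestamps)           # (B, T, shared_dim)
        v = self.vision_proj(vision_feats) + t    # both modalities carry time
        a = self.audio_proj(audio_feats) + t
        # Contrastive-style objective: matching timesteps should agree.
        sim = torch.einsum("btd,bsd->bts", v, a)  # (B, T, T) similarities
        labels = torch.arange(v.size(1)).expand(v.size(0), -1)
        loss = nn.functional.cross_entropy(sim.flatten(0, 1), labels.flatten())
        return v, a, loss

aligner = SharedSpaceAligner()
v, a = torch.randn(2, 16, 1024), torch.randn(2, 16, 768)
_, _, loss = aligner(v, a, torch.arange(16).expand(2, -1))
print(loss.item())
```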

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Explained: https://arxivexplained.com/papers/omnivinci-enhancing-architecture-and-data-for-omni-modal-understanding-llm
• PDF: https://arxiv.org/pdf/2510.15870
• Project Page: https://nvlabs.github.io/OmniVinci/
• Github: https://github.com/NVlabs/OmniVinci

🔹 Models citing this paper:
https://huggingface.co/nvidia/omnivinci

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #MultimodalAI #Robotics #DeepLearning #OpenSource
🤖🧠 Pixeltable: The Future of Declarative Data Infrastructure for Multimodal AI Workloads

🗓️ 08 Nov 2025
📚 AI News & Trends

In the rapidly evolving AI landscape, building intelligent applications is no longer just about having powerful models. The real challenge lies in handling complex data pipelines, integrating multiple systems and scaling multimodal workloads efficiently. Traditional AI app development stacks involve databases, vector stores, ETL pipelines, model serving layers, orchestration tools, caching systems and lineage tracking ...

#Pixeltable #DeclarativeDataInfrastructure #MultimodalAI #AIDevelopment #DataPipelines #AIWorkloads
DeepEyesV2: Toward Agentic Multimodal Model

📝 Summary:
DeepEyesV2 is an agentic multimodal model that uses a two-stage training pipeline for robust tool integration. This method, combining a cold-start stage and reinforcement learning, effectively enables task-adaptive tool invocation for real-world reasoning tasks.
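
As a hedged illustration of the "task-adaptive tool invocation" the RL stage targets, here is one way a reward could be shaped so tool calls only pay off when they help. The constants and the shaping itself are assumptions for illustration, not the paper's reward.

```python
# Illustrative reward shaping only: task success pays, and each tool call
# carries a small cost, so calls survive RL only when they change the
# outcome. Constants and shaping are assumptions, not DeepEyesV2's reward.
def tool_use_reward(answer_correct: bool, num_tool_calls: int,
                    call_cost: float = 0.05) -> float:
    base = 1.0 if answer_correct else 0.0
    return base - call_cost * num_tool_calls

print(tool_use_reward(True, 1))   # 0.95: one useful call
print(tool_use_reward(False, 3))  # -0.15: tool spam without success
```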

🔹 Publication Date: Published on Nov 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05271
• PDF: https://arxiv.org/pdf/2511.05271
• Project Page: https://visual-agent.github.io/
• Github: https://github.com/Visual-Agent/DeepEyes

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #AgenticAI #ReinforcementLearning #DeepLearning #AIResearch
🤖🧠 Pico-Banana-400K: The Breakthrough Dataset Advancing Text-Guided Image Editing

🗓️ 09 Nov 2025
📚 AI News & Trends

Text-guided image editing has rapidly evolved with powerful multimodal models capable of transforming images using simple natural-language instructions. These models can change object colors, modify lighting, add accessories, adjust backgrounds or even convert real photographs into artistic styles. However, the progress of research has been limited by one crucial bottleneck: the lack of large-scale, high-quality, ...

#TextGuidedEditing #MultimodalAI #ImageEditing #AIResearch #ComputerVision #DeepLearning
MPJudge: Towards Perceptual Assessment of Music-Induced Paintings

📝 Summary:
MPJudge is a new framework for assessing music-induced paintings. It integrates music features into a visual encoder via a modulation-based fusion mechanism, outperforming existing emotion models by directly modeling perceptual coherence. It also identifies music-relevant image regions more accurately.
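
A minimal sketch of one common modulation-based fusion pattern (FiLM-style scale-and-shift), assuming a pooled music embedding modulates visual patch tokens; the dimensions and the FiLM choice are illustrative, not MPJudge's exact mechanism.

```python
# Hedged sketch: music features produce a per-channel scale and shift
# (FiLM-style) applied to visual encoder features. Dimensions are
# illustrative; this is not MPJudge's code.
import torch
import torch.nn as nn

class MusicModulatedFusion(nn.Module):
    def __init__(self, music_dim=128, visual_dim=512):
        super().__init__()
        self.to_scale_shift = nn.Linear(music_dim, 2 * visual_dim)

    def forward(self, visual_feats, music_feats):
        # visual_feats: (B, N, visual_dim) patch tokens
        # music_feats:  (B, music_dim) pooled music embedding
        scale, shift = self.to_scale_shift(music_feats).chunk(2, dim=-1)
        # Broadcast over patch tokens: the music signal reweights visual
        # channels, which is one way music-relevant regions can emerge.
        return visual_feats * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

fusion = MusicModulatedFusion()
out = fusion(torch.randn(2, 196, 512), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 196, 512])
```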

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07137
• PDF: https://arxiv.org/pdf/2511.07137

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MusicAndArt #ComputerVision #MachineLearning #DeepLearning #MultimodalAI
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

📝 Summary:
Omni-AVSR is a unified audio-visual LLM that efficiently supports ASR, VSR, and AVSR. It uses multi-granularity training and parameter-efficient adaptation to achieve high accuracy while significantly reducing resource use compared to separate models.
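
For the parameter-efficient adaptation piece, here is a minimal LoRA-style adapter sketch: a frozen base layer plus a small trainable low-rank update, so one backbone can serve ASR, VSR, and AVSR with little extra memory. The rank, scale, and the LoRA choice itself are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch of parameter-efficient adaptation via a LoRA-style
# adapter. Ranks and names are illustrative, not Omni-AVSR's.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # frozen backbone weights
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable weights vs. 262,656 frozen base weights
```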

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07253
• PDF: https://arxiv.org/pdf/2511.07253
• Project Page: https://umbertocappellazzo.github.io/Omni-AVSR
• Github: https://github.com/umbertocappellazzo/Omni-AVSR

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SpeechRecognition #LLM #MultimodalAI #DeepLearning #AIResearch
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

📝 Summary:
Ovi is a unified audio-video generation model using twin-DiT modules with blockwise cross-modal fusion. This innovative design ensures natural synchronization and high-quality multimodal outputs, simplifying previous multi-stage approaches.
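
A toy sketch of blockwise cross-modal fusion between twin backbones, assuming each block lets the audio stream attend to video tokens and vice versa so the two generations stay synchronized; layer sizes and the exact exchange are illustrative, not Ovi's twin-DiT design.

```python
# Hedged sketch: two streams with per-block bidirectional cross-attention.
# Illustrative only; not Ovi's actual modules.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, v, a):
        v = v + self.self_v(v, v, v)[0]
        a = a + self.self_a(a, a, a)[0]
        # Blockwise exchange: each stream queries the other's tokens.
        v = v + self.cross_v(v, a, a)[0]
        a = a + self.cross_a(a, v, v)[0]
        return v, a

blocks = nn.ModuleList(FusionBlock() for _ in range(4))
v, a = torch.randn(1, 64, 256), torch.randn(1, 32, 256)
for blk in blocks:
    v, a = blk(v, a)
print(v.shape, a.shape)
```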

🔹 Publication Date: Published on Sep 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.01284
• PDF: https://arxiv.org/pdf/2510.01284
• Project Page: https://aaxwaz.github.io/Ovi
• Github: https://github.com/character-ai/Ovi

🔹 Models citing this paper:
https://huggingface.co/chetwinlow1/Ovi
https://huggingface.co/rkfg/Ovi-fp8_quantized

🔹 Spaces citing this paper:
https://huggingface.co/spaces/akhaliq/Ovi
https://huggingface.co/spaces/deddytoyota/Ovi
https://huggingface.co/spaces/alexnasa/Ovi-ZEROGPU

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AudioVideoGeneration #MultimodalAI #DeepLearning #CrossModalFusion #AIResearch
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

📝 Summary:
Researchers developed a new framework to generate over 1M high-quality synthetic vision-centric reasoning questions with complex traces. Finetuning models on this data significantly improves vision-centric performance and surprisingly boosts text and audio reasoning, demonstrating strong cross-modal transfer.

🔹 Publication Date: Published on Nov 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05705
• PDF: https://arxiv.org/pdf/2511.05705

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisualReasoning #AI #MachineLearning #MultimodalAI #ComputerVision
Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora

📝 Summary:
Wasm is a pipeline that builds a new structured Arabic multimodal dataset from Common Crawl. It preserves document structure and supports both text-only and multimodal pre-training, addressing the lack of high-quality Arabic datasets.
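
A minimal sketch of the interleaved-extraction idea, assuming an HTML walk that emits text and image records in document order instead of flattening to plain text; BeautifulSoup and the tag set are illustrative choices, not the Wasm pipeline itself.

```python
# Hedged sketch: walk an HTML document in order and emit a
# structure-preserving sequence of text and image records.
from bs4 import BeautifulSoup

def interleave(html: str):
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for node in soup.find_all(["h1", "h2", "p", "img"]):
        if node.name == "img":
            records.append({"type": "image", "src": node.get("src")})
        else:
            records.append({"type": "text", "tag": node.name,
                            "content": node.get_text(strip=True)})
    return records

doc = "<h1>عنوان</h1><p>نص تمهيدي</p><img src='fig1.png'><p>تعليق</p>"
for r in interleave(doc):
    print(r)
```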

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07080
• PDF: https://arxiv.org/pdf/2511.07080

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ArabicNLP #MultimodalAI #DatasetCreation #Corpora #DataScience
MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique

📝 Summary:
MM-CRITIC is a new benchmark evaluating Large Multimodal Models' critique abilities across various dimensions and tasks. It uses expert-informed ground answers and GPT-4o for reliable scoring, providing a comprehensive assessment of leading LMMs' critique capabilities.
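
A hedged sketch of reference-guided judge scoring as the summary describes, with GPT-4o grading a critique against an expert answer; the rubric and prompt are assumptions, not MM-CRITIC's actual protocol.

```python
# Hedged sketch: GPT-4o rates a model's critique against an
# expert-informed ground answer. The prompt/rubric is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def score_critique(question: str, critique: str, expert_answer: str) -> str:
    prompt = (
        f"Question: {question}\n"
        f"Expert reference answer: {expert_answer}\n"
        f"Model critique to grade: {critique}\n"
        "On a 1-10 scale, how well does the critique identify the real "
        "errors implied by the reference? Reply with the number only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```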

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09067
• PDF: https://arxiv.org/pdf/2511.09067

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LMMs #MultimodalAI #AIEvaluation #Benchmarking #AIResearch
EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation

📝 Summary:
EmoVid is a new multimodal, emotion-annotated video dataset designed for creative media like cartoons and movies. It bridges emotion understanding with video generation, significantly improving emotional expression and quality in generated videos. EmoVid establishes a new benchmark for affective video understanding and generation.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11002
• PDF: https://arxiv.org/pdf/2511.11002

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#EmoVid #MultimodalAI #EmotionAI #VideoGeneration #VideoUnderstanding
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

📝 Summary:
GGBench is a new benchmark for evaluating geometric generative reasoning in unified multimodal models. It addresses a critical gap by assessing integrated cognitive processes, requiring language comprehension and precise visual generation to actively construct solutions. This sets a rigorous standard for unified multimodal models.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11134
• PDF: https://arxiv.org/pdf/2511.11134

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GGBench #MultimodalAI #GeometricReasoning #GenerativeAI #AIResearch
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

📝 Summary:
WEAVE introduces a suite with a large dataset and benchmark to assess multi-turn context-dependent image generation and editing in multimodal models. It enables new capabilities like visual memory in models while exposing current limitations in these complex tasks.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11434
• PDF: https://arxiv.org/pdf/2511.11434
• Project Page: https://weichow23.github.io/weave/
• Github: https://github.com/weichow23/weave

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #ImageGeneration #GenerativeAI #ComputerVision #AIResearch
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

📝 Summary:
A parallel multimodal diffusion framework, MMaDA-Parallel, enhances cross-modal alignment and semantic consistency in thinking-aware image synthesis by addressing the error propagation issues of sequential approaches.
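
A toy sketch of the parallel idea: text and image tokens are refined together over shared steps, each conditioning on the other modality's current state, rather than one modality waiting on a finished pass of the other. The model and the greedy update rule are illustrative stand-ins, not the MMaDA-Parallel sampler.

```python
# Toy sketch only: joint refinement of text and image tokens, with
# cross-modal conditioning via shared attention. Sizes and the commit
# rule are illustrative assumptions.
import torch
import torch.nn as nn

class ToyJointModel(nn.Module):
    def __init__(self, vocab=32, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.mix = nn.MultiheadAttention(dim, 2, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, text_tokens, image_tokens):
        joint = torch.cat([self.emb(text_tokens), self.emb(image_tokens)], dim=1)
        joint = joint + self.mix(joint, joint, joint)[0]  # cross-modal context
        logits = self.head(joint)
        return logits[:, :text_tokens.size(1)], logits[:, text_tokens.size(1):]

def parallel_refine(model, text_tokens, image_tokens, steps=4):
    for _ in range(steps):
        text_logits, image_logits = model(text_tokens, image_tokens)
        # Both modalities update in the same step (a real sampler would
        # commit only the most confident positions per step).
        text_tokens = text_logits.argmax(-1)
        image_tokens = image_logits.argmax(-1)
    return text_tokens, image_tokens

model = ToyJointModel()
t, i = torch.randint(0, 32, (1, 6)), torch.randint(0, 32, (1, 10))
t, i = parallel_refine(model, t, i)
print(t.shape, i.shape)  # torch.Size([1, 6]) torch.Size([1, 10])
```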

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09611
• PDF: https://arxiv.org/pdf/2511.09611
• Project Page: https://tyfeld.github.io/mmadaparellel.github.io/
• Github: https://github.com/tyfeld/MMaDA-Parallel

🔹 Models citing this paper:
https://huggingface.co/tyfeld/MMaDA-Parallel-A
https://huggingface.co/tyfeld/MMaDA-Parallel-M

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #DiffusionModels #ImageSynthesis #LLM #AIResearch
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization

📝 Summary:
SafeGRPO introduces a self-rewarded, rule-governed framework for multimodal safety alignment in MLLMs. It integrates verifiable reward construction and step-guided safety thinking to improve robustness against compositional risks and enhance reasoning stability.
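
A minimal sketch of the two ingredients the summary names: a verifiable, rule-governed reward and GRPO's group-relative advantage normalization. The concrete refusal rules below are placeholder assumptions, not SafeGRPO's rule set.

```python
# Hedged sketch: rule-checkable reward plus GRPO-style group-relative
# advantages. The rules are illustrative placeholders.
import statistics

def rule_reward(response: str, is_request_harmful: bool) -> float:
    refused = response.strip().lower().startswith("i can't")
    if is_request_harmful:
        return 1.0 if refused else -1.0   # must refuse unsafe requests
    return 1.0 if not refused else -0.5   # must not over-refuse safe ones

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]  # normalize within the group

group = ["I can't help with that.", "Sure, here is how...", "I can't."]
rewards = [rule_reward(r, is_request_harmful=True) for r in group]
print(group_relative_advantages(rewards))  # refusals rank above compliance
```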

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12982
• PDF: https://arxiv.org/pdf/2511.12982

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MLLMs #AISafety #MultimodalAI #ReinforcementLearning #AIResearch
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

📝 Summary:
Orion is a visual agent framework that orchestrates specialized computer vision tools to execute complex visual workflows. It achieves competitive performance on benchmarks and enables autonomous, tool-driven visual reasoning.
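
A hedged sketch of tool orchestration in a visual agent: a planner invokes specialized vision tools and accumulates observations. The tool names and the fixed plan are illustrative stand-ins; Orion's actual toolbox and planning loop are not detailed in the summary.

```python
# Hedged sketch: dispatch loop over specialized vision tools. Tools here
# are stubs for illustration, not Orion's components.
from typing import Callable

TOOLS: dict[str, Callable] = {
    "detect": lambda image: [{"label": "cat", "box": [10, 20, 80, 90]}],
    "ocr": lambda image: "SALE 50%",
    "caption": lambda image: "a cat sitting beside a sale sign",
}

def run_agent(image, plan: list[str]) -> dict:
    # A real agent would let an LLM pick the next tool from prior
    # observations; here the plan is fixed for illustration.
    observations = {}
    for tool_name in plan:
        observations[tool_name] = TOOLS[tool_name](image)
    return observations

print(run_agent(image=None, plan=["detect", "ocr"]))
```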

🔹 Publication Date: Published on Nov 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14210
• PDF: https://arxiv.org/pdf/2511.14210

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ComputerVision #AIagents #VisualReasoning #MultimodalAI #DeepLearning
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

📝 Summary:
Text-only self-reflection is insufficient for long-form video understanding. REVISOR is a new framework enabling MLLMs to perform multimodal introspective reflection across text and visual modalities. This significantly enhances reasoning over long videos without extra fine-tuning, achieving strong results.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13026
• PDF: https://arxiv.org/pdf/2511.13026

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #VideoUnderstanding #MLLMs #AIResearch #ComputerVision
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

📝 Summary:
Uni-MoE introduces a sparse Multimodal Mixture of Experts LLM efficiently handling diverse data types. It uses modality-specific encoders and a progressive training strategy, reducing performance bias and improving collaboration across modalities.
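
A minimal sketch of the core mechanism, a sparse Mixture-of-Experts layer with top-k routing, so only a few experts run per token; the expert count, sizes, and routing details are illustrative, not Uni-MoE's configuration.

```python
# Hedged sketch: sparse MoE layer with top-k routing. Illustrative only.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, dim=256, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts))
        self.k = k

    def forward(self, x):                      # x: (tokens, dim)
        weights = self.router(x).softmax(-1)   # routing probabilities
        topw, topi = weights.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):             # only k experts run per token
            for e in range(len(self.experts)):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topw[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(10, 256)).shape)  # torch.Size([10, 256])
```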

🔹 Publication Date: Published on May 18, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2405.11273
• PDF: https://arxiv.org/pdf/2405.11273
• Github: https://github.com/hitsz-tmg/umoe-scaling-unified-multimodal-llms

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #LLMs #MixtureOfExperts #DeepLearning #AIResearch
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

📝 Summary:
This paper improves Extreme Multi-label Classification (XMC) by using larger decoder-only models and introduces ViXML, a vision-enhanced framework. ViXML efficiently integrates visual information, significantly outperforming text-only models and achieving a new state of the art.
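
A hedged sketch of vision-enhanced scoring over an extreme label space: fuse text and image embeddings, score against a large label-embedding matrix, and keep the top-k labels. The fusion and dimensions are assumptions, not ViXML's architecture.

```python
# Hedged sketch: fuse text + image embeddings, score a huge label matrix.
import torch
import torch.nn as nn

num_labels, dim = 100_000, 256                  # "extreme" label space
fuse = nn.Linear(2 * dim, dim)
label_embeddings = nn.Parameter(torch.randn(num_labels, dim))

def score(text_emb, image_emb, k=5):
    joint = fuse(torch.cat([text_emb, image_emb], dim=-1))   # (B, dim)
    logits = joint @ label_embeddings.T                      # (B, num_labels)
    return logits.topk(k, dim=-1).indices                    # top-k label ids

print(score(torch.randn(2, 256), torch.randn(2, 256)))
```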

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13189
• PDF: https://arxiv.org/pdf/2511.13189

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #XMC #MultiModalAI #MachineLearning #AIResearch