Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions

📝 Summary:
This paper introduces FIBO, a text-to-image model trained on long structured captions to enhance prompt alignment and controllability. It proposes DimFusion for efficient processing and the TaBR evaluation protocol, achieving state-of-the-art results.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06876
• PDF: https://arxiv.org/pdf/2511.06876

🔹 Models citing this paper:
https://huggingface.co/briaai/FIBO

🔹 Spaces citing this paper:
https://huggingface.co/spaces/galdavidi/FIBO-Mashup
https://huggingface.co/spaces/briaai/FIBO
https://huggingface.co/spaces/briaai/Fibo-local

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#TextToImage #GenerativeAI #DiffusionModels #AI #MachineLearning
KLASS: KL-Guided Fast Inference in Masked Diffusion Models

📝 Summary:
KLASS accelerates masked diffusion model inference by using KL divergence to identify stable, high-confidence predictions. It unmasks multiple tokens per iteration, significantly speeding up generation and improving quality across text, image, and molecular tasks.

🔹 Publication Date: Published on Nov 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05664
• PDF: https://arxiv.org/pdf/2511.05664
• Github: https://github.com/shkim0116/KLASS
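
The selection rule described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based inputs, and both thresholds (`kl_threshold`, `conf_threshold`) are assumptions made for the example. A masked position is unmasked when its predicted distribution barely changes between two consecutive iterations (low KL divergence) and its top probability is high.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def select_positions_to_unmask(prev_probs, curr_probs, masked,
                               kl_threshold=0.01, conf_threshold=0.9):
    """Return masked positions whose prediction is both stable and confident.

    prev_probs / curr_probs map a position to its probability list over the
    vocabulary at two consecutive iterations (hypothetical inputs).
    Both thresholds are illustrative values, not taken from the paper.
    """
    chosen = []
    for pos in sorted(masked):
        curr = curr_probs[pos]
        stable = kl_divergence(curr, prev_probs[pos]) < kl_threshold
        confident = max(curr) >= conf_threshold
        if stable and confident:
            chosen.append(pos)
    return chosen


# Toy example with a 3-token vocabulary and 4 masked positions.
prev = {
    0: [0.95, 0.03, 0.02],
    1: [0.40, 0.35, 0.25],
    2: [0.10, 0.85, 0.05],
    3: [0.02, 0.02, 0.96],
}
curr = {
    0: [0.96, 0.02, 0.02],  # stable and confident
    1: [0.92, 0.05, 0.03],  # confident, but changed a lot since last step
    2: [0.50, 0.45, 0.05],  # neither stable nor confident
    3: [0.01, 0.02, 0.97],  # stable and confident
}
unmask_now = select_positions_to_unmask(prev, curr, masked={0, 1, 2, 3})
print(unmask_now)
```

In this toy input, positions 0 and 3 pass both checks, so two tokens are unmasked in the same iteration while positions 1 and 2 stay masked for another step.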

#DiffusionModels #GenerativeAI #MachineLearning #AIResearch #ModelAcceleration
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

📝 Summary:
Ming-UniAudio introduces a unified speech LLM and tokenizer for joint understanding, generation, and instruction-based free-form editing. It overcomes token representation issues, achieves state-of-the-art results, and establishes a new benchmark for editing.

🔹 Publication Date: Published on Oct 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05516
• PDF: https://arxiv.org/pdf/2511.05516
• Project Page: https://xqacmer.github.io/Ming-Unitok-Audio.github.io/
• Github: https://github.com/inclusionAI/Ming-UniAudio

#SpeechLLM #AI #NLP #GenerativeAI #MachineLearning
Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces

📝 Summary:
The Generative Semantic Workspace (GSW) enhances LLMs for long-context reasoning and episodic memory. This neuro-inspired framework builds structured representations of evolving situations, outperforming RAG baselines by 20% and reducing context tokens by 51%. GSW provides human-like episodic memor...

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07587
• PDF: https://arxiv.org/pdf/2511.07587

#LLMs #RAG #EpisodicMemory #GenerativeAI #NeuroAI
TiDAR: Think in Diffusion, Talk in Autoregression

📝 Summary:
TiDAR is a hybrid diffusion-autoregressive model achieving high throughput and AR-level quality. It drafts tokens with diffusion and samples autoregressively in a single pass, outperforming existing methods and delivering 4.71x to 5.91x faster generation.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08923
• PDF: https://arxiv.org/pdf/2511.08923

#AI #MachineLearning #DiffusionModels #AutoregressiveModels #GenerativeAI
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising

📝 Summary:
Time-to-Move (TTM) is a training-free framework for precise motion- and appearance-controlled video generation using image-to-video (I2V) diffusion models. It employs crude reference animations as motion cues and introduces dual-clock denoising for flexible alignment, outperforming training-based methods.

🔹 Publication Date: Published on Nov 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08633
• PDF: https://arxiv.org/pdf/2511.08633
• Project Page: https://time-to-move.github.io/
• Github: https://github.com/time-to-move/TTM

#VideoGeneration #DiffusionModels #GenerativeAI #MotionControl #ComputerVision
Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance

📝 Summary:
ASAG is a novel diffusion guidance method that uses optimal transport and the Sinkhorn algorithm to adversarially disrupt attention scores. It weakens misleading attention alignments by injecting an adversarial cost, improving sample quality, controllability, and fidelity without model retraining.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07499
• PDF: https://arxiv.org/pdf/2511.07499
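
For readers unfamiliar with the Sinkhorn algorithm mentioned in the summary, here is a minimal sketch of the generic iteration for entropy-regularized optimal transport between two uniform marginals. It does not reproduce ASAG itself: the adversarial cost and the way the resulting plan is applied to attention scores are not shown, and the toy cost matrix, `reg`, and `n_iters` are assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, reg=0.5, n_iters=200):
    """Entropic optimal transport plan between uniform marginals.

    cost: n x m list of lists of transport costs.
    reg: entropic regularization strength (illustrative value).
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]  # Gibbs kernel
    a = [1.0 / n] * n  # source marginal (uniform)
    b = [1.0 / m] * m  # target marginal (uniform)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy cost: moving mass between nearby indices is cheap.
cost = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
plan = sinkhorn(cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

Each row and column of `plan` sums to roughly 1/3, and most mass sits on low-cost pairs. Raising the cost of a particular pairing pushes mass away from it, which is the general lever an injected cost would have on such a plan.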

#DiffusionModels #AdversarialAI #OptimalTransport #GenerativeAI #DeepLearning
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

📝 Summary:
UniVA is an open-source multi-agent framework that unifies video understanding, segmentation, editing, and generation. It uses a Plan-and-Act architecture with hierarchical memory to enable complex, iterative video workflows. This system aims to advance agentic video intelligence.

🔹 Publication Date: Published on Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08521
• PDF: https://arxiv.org/pdf/2511.08521
• Project Page: https://univa.online/
• Github: https://github.com/univa-agent/univa

#VideoAI #AIagents #GenerativeAI #ComputerVision #OpenSource
Black-Box On-Policy Distillation of Large Language Models

📝 Summary:
Generative Adversarial Distillation (GAD) is a new black-box on-policy method for distilling LLMs. GAD trains a student generator and a discriminator for adaptive feedback, surpassing traditional distillation. It enables student LLMs to perform comparably to proprietary teachers.

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10643
• PDF: https://arxiv.org/pdf/2511.10643

#LLMs #AIDistillation #MachineLearning #GenerativeAI #DeepLearning
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

📝 Summary:
PAN is a general, interactable world model that predicts future states through high-quality action-conditioned video simulation. It uses a GLP architecture that combines LLM-based latent dynamics with a video diffusion decoder, producing detailed, coherent long-horizon simulations that support reasoning and acting.

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09057
• PDF: https://arxiv.org/pdf/2511.09057

#WorldModels #AI #Simulation #GenerativeAI #Robotics
Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models

📝 Summary:
This paper proposes an AI agent framework for adaptive long-form writing. It uses recursive task decomposition and dynamically integrates retrieval, reasoning, and composition, overcoming rigid outline-based methods. The framework consistently outperforms state-of-the-art approaches.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.08275
• PDF: https://arxiv.org/pdf/2503.08275
• Github: https://github.com/principia-ai/WriteHERE

#AI #LanguageModels #LongformWriting #NLP #GenerativeAI
Transformer Explainer: Interactive Learning of Text-Generative Models

📝 Summary:
Transformer Explainer is an interactive web tool for non-experts to understand the GPT-2 model. It allows real-time experimentation with user input, visualizing how internal components predict text. This broadens access to education about modern generative AI.

🔹 Publication Date: Published on Aug 8, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.04619
• PDF: https://arxiv.org/pdf/2408.04619
• Project Page: https://poloclub.github.io/transformer-explainer/
• Github: https://github.com/helblazer811/ManimML

#AI #GenerativeAI #Transformers #AIeducation #ExplainableAI
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

📝 Summary:
GGBench is a new benchmark for evaluating geometric generative reasoning in unified multimodal models. It addresses a critical gap by assessing integrated cognitive processes, requiring language comprehension and precise visual generation to actively construct solutions. This sets a rigorous stan...

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11134
• PDF: https://arxiv.org/pdf/2511.11134

#GGBench #MultimodalAI #GeometricReasoning #GenerativeAI #AIResearch
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

📝 Summary:
WEAVE introduces a suite with a large dataset and benchmark to assess multi-turn context-dependent image generation and editing in multimodal models. It enables new capabilities like visual memory in models while exposing current limitations in these complex tasks.

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11434
• PDF: https://arxiv.org/pdf/2511.11434
• Project Page: https://weichow23.github.io/weave/
• Github: https://github.com/weichow23/weave

#MultimodalAI #ImageGeneration #GenerativeAI #ComputerVision #AIResearch
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

📝 Summary:
Part-X-MLLM is a 3D multimodal large language model that unifies diverse 3D tasks by generating structured programs from RGB point clouds and language prompts. It outputs part-level data and edit commands, enabling state-of-the-art 3D generation and editing through one interface.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13647
• PDF: https://arxiv.org/pdf/2511.13647
• Project Page: https://chunshi.wang/Part-X-MLLM/

#3D #MLLM #GenerativeAI #ComputerVision #AIResearch
Back to Basics: Let Denoising Generative Models Denoise

📝 Summary:
Denoising diffusion models should predict clean images directly, not noise, leveraging the data manifold assumption. The paper introduces JiT, a model using simple, large-patch Transformers that achieves competitive generative results on ImageNet.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13720
• PDF: https://arxiv.org/pdf/2511.13720
• Github: https://github.com/LTH14/JiT
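
To make the contrast between predicting clean images and predicting noise concrete, the sketch below assumes the common forward process x_t = alpha * x0 + sigma * eps. This parameterization and all function names are assumptions for illustration; the paper's own formulation may differ. Under this assumption the two targets are algebraically interchangeable once x_t is known, so the choice changes what the network has to output, not what information is available.

```python
def noisy_sample(x0, eps, alpha, sigma):
    """Forward process (assumed form): x_t = alpha * x0 + sigma * eps."""
    return [alpha * x + sigma * e for x, e in zip(x0, eps)]


def x0_from_eps(x_t, eps_pred, alpha, sigma):
    """Recover the clean sample implied by a noise prediction."""
    return [(xt - sigma * e) / alpha for xt, e in zip(x_t, eps_pred)]


def eps_from_x0(x_t, x0_pred, alpha, sigma):
    """Recover the noise implied by a clean-sample prediction."""
    return [(xt - alpha * x) / sigma for xt, x in zip(x_t, x0_pred)]


# Toy 3-dimensional "image" and noise vector.
x0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.3, -0.2]
alpha, sigma = 0.8, 0.6

x_t = noisy_sample(x0, eps, alpha, sigma)
recovered_x0 = x0_from_eps(x_t, eps, alpha, sigma)   # perfect noise prediction
recovered_eps = eps_from_x0(x_t, x0, alpha, sigma)   # perfect clean prediction
print(recovered_x0, recovered_eps)
```

With perfect predictions either target recovers the other up to floating-point error. The summary's argument is about which target is easier to learn: clean images are assumed to lie on a low-dimensional data manifold, while noise does not.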

#DiffusionModels #GenerativeAI #DeepLearning #ComputerVision #AIResearch
A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain

📝 Summary:
This paper proposes a decentralized RAG system using a blockchain-based mechanism to score data source reliability. It dynamically evaluates sources, boosting performance by 10.7% compared to centralized systems and achieving 56% cost savings in unreliable environments.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07577
• PDF: https://arxiv.org/pdf/2511.07577
• Github: https://github.com/yining610/Reliable-dRAG

#RAG #Blockchain #DecentralizedAI #GenerativeAI #AIResearch
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

📝 Summary:
Current video model benchmarks fail to assess Chain-of-Frames (CoF) reasoning, which is crucial for world simulators. Gen-ViRe is a new benchmark that decomposes CoF reasoning into cognitive subtasks, offering the first quantitative assessment. It reveals poor reasoning depth despite impressive visual quali...

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13853
• PDF: https://arxiv.org/pdf/2511.13853

#AI #WorldSimulators #VisualReasoning #GenerativeAI #Benchmarks
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

📝 Summary:
UniMoE-Audio unifies speech and music generation using a novel Dynamic-Capacity Mixture-of-Experts framework. It addresses data imbalance and task conflicts through a hybrid expert design and a three-stage training curriculum, achieving state-of-the-art performance and synergistic cross-domain learning.

🔹 Publication Date: Published on Oct 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13344
• PDF: https://arxiv.org/pdf/2510.13344
• Project Page: https://mukioxun.github.io/Uni-MoE-site/home.html
• Github: https://github.com/HITsz-TMG/Uni-MoE/blob/master/UniMoE-Audio

🔹 Models citing this paper:
https://huggingface.co/HIT-TMG/UniMoE-Audio-Preview

#SpeechGeneration #MusicGeneration #MixtureOfExperts #GenerativeAI #DeepLearning