✨ VideoSSR: Video Self-Supervised Reinforcement Learning
📝 Summary:
VideoSSR is a novel self-supervised reinforcement learning framework that leverages intrinsic video information to generate high-quality training data. It uses three pretext tasks and the VideoSSR-30K dataset, improving MLLM performance across 17 benchmarks by over 5%.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06281
• PDF: https://arxiv.org/pdf/2511.06281
• Project Page: https://github.com/lcqysl/VideoSSR
• Github: https://github.com/lcqysl/VideoSSR
🔹 Models citing this paper:
• https://huggingface.co/yhx12/VideoSSR
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#ReinforcementLearning #SelfSupervisedLearning #VideoAI #MachineLearning #DeepLearning
✨ The Path Not Taken: RLVR Provably Learns Off the Principals
📝 Summary:
RLVR learns by modifying parameters off principal directions in low-curvature subspaces, appearing sparse due to optimization bias. This distinct optimization regime contrasts with SFT, meaning SFT-era fine-tuning methods are flawed for RLVR.
🔹 Publication Date: Published on Nov 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08567
• PDF: https://arxiv.org/pdf/2511.08567
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#RLVR #MachineLearning #Optimization #DeepLearning #AIResearch
✨ TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
📝 Summary:
TimeSearch-R improves long-form video understanding by optimizing temporal search with reinforcement learning. It uses GRPO-CSV to verify searched frame completeness, leading to improved reasoning. This achieves state-of-the-art performance on multiple video benchmarks.
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05489
• PDF: https://arxiv.org/pdf/2511.05489
• Github: https://github.com/Time-Search/TimeSearch-R
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#VideoUnderstanding #ReinforcementLearning #DeepLearning #AIResearch #ComputerVision
✨ Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
📝 Summary:
ASAG is a novel diffusion guidance method that uses optimal transport and the Sinkhorn algorithm to adversarially disrupt attention scores. It weakens misleading attention alignments by injecting an adversarial cost, improving sample quality, controllability, and fidelity without model retraining.
🔹 Publication Date: Published on Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07499
• PDF: https://arxiv.org/pdf/2511.07499
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#DiffusionModels #AdversarialAI #OptimalTransport #GenerativeAI #DeepLearning
✨ Efficient Guided Generation for Large Language Models
📝 Summary:
This paper introduces an efficient method to guide large language model text generation. It uses regular expressions and context-free grammars with minimal added overhead, making guided generation practical.
🔹 Publication Date: Published on Jul 19, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2307.09702
• PDF: https://arxiv.org/pdf/2307.09702
• Github: https://github.com/normal-computing/outlines
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#LLMs #TextGeneration #NLP #AI #DeepLearning
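The core trick behind this line of work is to view regex-guided generation as finite-state machine transitions, so each decoding step only needs a cheap vocabulary mask. A toy character-level sketch (a hand-built DFA for `[0-9]+\.[0-9]+`, not the paper's precomputed-index implementation):

```python
# Toy FSM-guided decoding sketch. The DFA accepts strings matching
# [0-9]+\.[0-9]+ : state 0 = start, 1 = integer digits, 2 = just after
# the dot, 3 = fractional digits (the only accepting state).
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}

def classify(ch):
    if ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    return None  # any other character is illegal everywhere

def advance(state, token):
    """Run every character of `token` through the DFA.
    Returns the resulting state, or None if the token is illegal."""
    for ch in token:
        kind = classify(ch)
        if kind is None:
            return None
        state = TRANSITIONS.get((state, kind))
        if state is None:
            return None
    return state

def allowed_tokens(vocab, state):
    """The per-step mask: tokens that keep the output a valid prefix."""
    return [t for t in vocab if advance(state, t) is not None]

vocab = ["3", "14", ".", "ab", "!", "59"]
print(allowed_tokens(vocab, 0))  # from the start, only digit tokens
state = advance(0, "3")          # after generating "3"
print(allowed_tokens(vocab, state))  # now "." becomes legal too
```

At each decoding step the language model's logits would be masked to the allowed set before sampling, which is why the overhead is minimal: the automaton does all the pattern bookkeeping.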
✨ Motif 2 12.7B technical report
📝 Summary:
Motif-2-12.7B is an efficient LLM combining Grouped Differential Attention and system-level optimizations. It achieves competitive performance across diverse benchmarks with a smaller model size.
🔹 Publication Date: Published on Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07464
• PDF: https://arxiv.org/pdf/2511.07464
🔹 Models citing this paper:
• https://huggingface.co/Motif-Technologies/optimizer
• https://huggingface.co/Motif-Technologies/Motif-2-12.7B-Instruct
• https://huggingface.co/Motif-Technologies/Motif-2-12.7B-Base
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#LLM #AI #DeepLearning #EfficientAI #AttentionMechanisms
✨ Black-Box On-Policy Distillation of Large Language Models
📝 Summary:
Generative Adversarial Distillation (GAD) is a new black-box, on-policy method for distilling LLMs. GAD trains a student generator against a discriminator that provides adaptive feedback, surpassing traditional distillation. It enables student LLMs to perform comparably to proprietary teachers.
🔹 Publication Date: Published on Nov 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10643
• PDF: https://arxiv.org/pdf/2511.10643
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#LLMs #AIDistillation #MachineLearning #GenerativeAI #DeepLearning
✨ Virtual Width Networks
📝 Summary:
Virtual Width Networks (VWN) enhance model efficiency by expanding representational width without increasing computational cost. VWN accelerates optimization and improves loss reduction, showing a log-linear scaling relation between virtual width and loss.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11238
• PDF: https://arxiv.org/pdf/2511.11238
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#NeuralNetworks #DeepLearning #ModelEfficiency #MachineLearning #AI
✨ DoPE: Denoising Rotary Position Embedding
📝 Summary:
DoPE improves Transformer length generalization by detecting and mitigating noisy frequency bands in positional embeddings. This training-free method enhances retrieval accuracy and reasoning stability across extended contexts up to 64K tokens.
🔹 Publication Date: Published on Nov 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09146
• PDF: https://arxiv.org/pdf/2511.09146
• Project Page: https://The-physical-picture-of-LLMs.github.io
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#Transformers #PositionalEmbedding #LLMs #DoPE #AIResearch
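For context, DoPE operates on the frequency bands of rotary position embeddings (RoPE). A minimal NumPy sketch of plain RoPE, using the common rotate-half pairing, shows where those bands come from; this is background, not the paper's denoising step:

```python
import numpy as np

# Minimal rotary position embedding (RoPE). Each feature pair (i, i + d/2)
# is rotated by angle pos * base**(-2i/d): small i gives high-frequency
# bands, large i gives low-frequency bands. These per-pair frequencies are
# the "frequency bands" that DoPE analyzes for noise.
def rope(x, base=10000.0):
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)   # one frequency per pair
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

x = np.random.default_rng(0).normal(size=(8, 16))
q = rope(x)
# Rotations preserve each vector's norm, and position 0 is left unchanged.
print(np.allclose(np.linalg.norm(q, axis=-1), np.linalg.norm(x, axis=-1)))
```

Pairing conventions vary between implementations (interleaved vs. rotate-half); the rotation-by-frequency idea is the same either way.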
✨ Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
📝 Summary:
VLMs degrade under test-time domain shifts. Spectrum-Aware Test-Time Steering (STS) is a lightweight method that adapts VLM latent representations by steering them along textual embedding subspaces, without backpropagation. STS surpasses the state of the art while offering faster inference and a smaller memory footprint.
🔹 Publication Date: Published on Nov 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09809
• PDF: https://arxiv.org/pdf/2511.09809
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#VisionLanguageModels #ZeroShotGeneralization #DomainAdaptation #DeepLearning #AI
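As a rough illustration of the general idea (a hedged sketch, not the paper's actual algorithm), steering a visual feature toward the subspace spanned by text embeddings needs only a least-squares projection and no backpropagation; the blend weight `alpha` here is an assumption, not a parameter from the paper:

```python
import numpy as np

def steer(feature, text_embeds, alpha=0.5):
    """Nudge `feature` toward its orthogonal projection onto
    span(text_embeds). `alpha` = 0 leaves it unchanged, 1 projects fully."""
    # Solve text_embeds.T @ coeffs ~= feature in the least-squares sense.
    coeffs, *_ = np.linalg.lstsq(text_embeds.T, feature, rcond=None)
    projection = text_embeds.T @ coeffs
    return (1 - alpha) * feature + alpha * projection

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 32))   # 4 class-prompt embeddings, dim 32
f = rng.normal(size=32)        # one visual feature
f_steered = steer(f, T)        # closer to the text subspace than f
```

The appeal of this family of methods is that everything is a small linear solve at inference time, which is why no gradients or extra memory for activations are needed.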
✨ TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
📝 Summary:
TiViBench is a new benchmark assessing image-to-video models' reasoning across four dimensions and 24 tasks. Commercial models show stronger reasoning potential. VideoTPO, a test-time strategy, significantly enhances performance, advancing reasoning in video generation.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13704
• PDF: https://arxiv.org/pdf/2511.13704
• Project Page: https://haroldchen19.github.io/TiViBench-Page/
• Github: https://haroldchen19.github.io/TiViBench-Page/
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#VideoGeneration #AIBenchmark #ComputerVision #DeepLearning #AIResearch
✨ Back to Basics: Let Denoising Generative Models Denoise
📝 Summary:
Denoising diffusion models should predict clean images directly, not noise, leveraging the data manifold assumption. The paper introduces JiT, a model using simple, large-patch Transformers that achieves competitive generative results on ImageNet.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13720
• PDF: https://arxiv.org/pdf/2511.13720
• Github: https://github.com/LTH14/JiT
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#DiffusionModels #GenerativeAI #DeepLearning #ComputerVision #AIResearch
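The clean-image and noise parameterizations are related by simple per-step algebra, so the paper's argument is about which target is easier to learn near the data manifold, not about expressivity. A quick NumPy check of the standard identity (illustrative numbers only):

```python
import numpy as np

# Standard DDPM forward process: x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps.
# A model that predicts eps implicitly predicts x0, and vice versa.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)        # "clean image" (toy vector)
eps = rng.normal(size=4)       # Gaussian noise
abar = 0.7                     # cumulative signal fraction at some step t

x_t = np.sqrt(abar) * x0 + np.sqrt(1 - abar) * eps

# Recover x0 from a (perfect) eps prediction, and eps from an x0 prediction:
x0_from_eps = (x_t - np.sqrt(1 - abar) * eps) / np.sqrt(abar)
eps_from_x0 = (x_t - np.sqrt(abar) * x0) / np.sqrt(1 - abar)
print(np.allclose(x0_from_eps, x0), np.allclose(eps_from_x0, eps))
```

Equivalent in exact arithmetic, the two targets behave very differently as regression problems, which is the distinction the paper exploits.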
✨ UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity
📝 Summary:
UnSAMv2 enables continuous segmentation granularity control for the SAM model without human annotations. It uses self-supervised learning on unlabeled data to discover mask-granularity pairs and a novel granularity control embedding. UnSAMv2 significantly enhances SAM-2's performance across various segmentation tasks.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13714
• PDF: https://arxiv.org/pdf/2511.13714
• Project Page: https://yujunwei04.github.io/UnSAMv2-Project-Page/
• Github: https://github.com/yujunwei04/UnSAMv2
✨ Spaces citing this paper:
• https://huggingface.co/spaces/yujunwei04/UnSAMv2
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#AI #ComputerVision #SelfSupervisedLearning #ImageSegmentation #DeepLearning
✨ Error-Driven Scene Editing for 3D Grounding in Large Language Models
📝 Summary:
DEER-3D improves 3D LLM grounding by iteratively editing and retraining models. It diagnoses predicate-level errors, then generates targeted 3D scene edits as counterfactuals to enhance spatial understanding and accuracy.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14086
• PDF: https://arxiv.org/pdf/2511.14086
• Github: https://github.com/zhangyuejoslin/Deer-3D
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#LLMs #3DGrounding #ComputerVision #DeepLearning #AIResearch
✨ Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
📝 Summary:
Orion is a visual agent framework that orchestrates specialized computer vision tools to execute complex visual workflows. It achieves competitive performance on benchmarks and enables autonomous, tool-driven visual reasoning.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14210
• PDF: https://arxiv.org/pdf/2511.14210
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#ComputerVision #AIagents #VisualReasoning #MultimodalAI #DeepLearning
✨ A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
📝 Summary:
CoTyle introduces code-to-style image generation, creating consistent visual styles from numerical codes. It is the first open-source academic method for this task, using a discrete style codebook and a text-to-image diffusion model for diverse, reproducible styles.
🔹 Publication Date: Published on Nov 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10555
• PDF: https://arxiv.org/pdf/2511.10555
• Project Page: https://Kwai-Kolors.github.io/CoTyle/
• Github: https://github.com/Kwai-Kolors/CoTyle
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Kwai-Kolors/CoTyle
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#ImageGeneration #DiffusionModels #NeuralStyle #ComputerVision #DeepLearning
✨ Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
📝 Summary:
This paper clarifies RL for LLM agents by extending the MDP framework. It introduces Agent-R1, a modular and flexible training framework, demonstrating its effectiveness on multihop QA tasks.
🔹 Publication Date: Published on Nov 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14460
• PDF: https://arxiv.org/pdf/2511.14460
• Github: https://github.com/0russwest0/Agent-R1
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#LLMAgents #ReinforcementLearning #AI #DeepLearning #NLP
✨ UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
📝 Summary:
UniMoE-Audio unifies speech and music generation using a novel Dynamic-Capacity Mixture-of-Experts framework. It addresses data imbalance and task conflicts through a hybrid expert design and a three-stage training scheme, achieving state-of-the-art performance and synergistic cross-domain learning.
🔹 Publication Date: Published on Oct 15
🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/unimoe-audio-unified-speech-and-music-generation-with-dynamic-capacity-moe
• PDF: https://arxiv.org/pdf/2510.13344
• Project Page: https://mukioxun.github.io/Uni-MoE-site/home.html
• Github: https://github.com/HITsz-TMG/Uni-MoE/blob/master/UniMoE-Audio
🔹 Models citing this paper:
• https://huggingface.co/HIT-TMG/UniMoE-Audio-Preview
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#SpeechGeneration #MusicGeneration #MixtureOfExperts #GenerativeAI #DeepLearning
✨ Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
📝 Summary:
Uni-MoE introduces a sparse multimodal Mixture-of-Experts (MoE) LLM that efficiently handles diverse data types. It uses modality-specific encoders and a progressive training strategy, reducing performance bias and improving collaboration across modalities.
🔹 Publication Date: Published on May 18, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2405.11273
• PDF: https://arxiv.org/pdf/2405.11273
• Github: https://github.com/hitsz-tmg/umoe-scaling-unified-multimodal-llms
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#MultimodalAI #LLMs #MixtureOfExperts #DeepLearning #AIResearch
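The sparse MoE backbone behind models like this can be illustrated with a minimal top-k softmax router (a generic sketch, not Uni-MoE's exact modality-aware routing):

```python
import numpy as np

# Minimal top-k Mixture-of-Experts layer: each token is routed to its k
# highest-scoring experts, and their outputs are combined with gate
# weights renormalized over the chosen k. Only k of num_experts expert
# networks run per token, which is the source of the compute savings.
def moe_layer(x, gate_w, experts, k=2):
    logits = x @ gate_w                        # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]  # indices of top-k experts
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        scores = logits[i, top[i]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over the chosen k
        for w, e in zip(weights, top[i]):
            out[i] += w * experts[e](token)
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
# Each "expert" is just a linear map here; default arg pins its own W.
experts = [lambda t, W=rng.normal(size=(d, d)) / np.sqrt(d): t @ W
           for _ in range(n_exp)]
x = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_exp))
y = moe_layer(x, gate_w, experts)
print(y.shape)   # (3, 8)
```

Production routers add load-balancing losses and capacity limits so experts receive comparable traffic; Uni-MoE additionally ties routing to modality, per the summary above.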
✨ Φeat: Physically-Grounded Feature Representation
📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity, such as reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270
==================================
For more data science resources:
✅ https://t.iss.one/DataScienceT
#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI