✨In-Video Instructions: Visual Signals as Generative Control
📝 Summary:
This paper introduces In-Video Instruction for controllable image-to-video generation. It embeds visual signals like text or arrows directly into frames as instructions, offering precise, spatial-aware control over object actions. Experiments show video models reliably execute these visual cues.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19401
• PDF: https://arxiv.org/pdf/2511.19401
• Project Page: https://fangggf.github.io/In-Video/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #ComputerVision #AIResearch #DeepLearning
✨DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
📝 Summary:
DeCo is a frequency-decoupled pixel diffusion framework that improves image generation by separating high-frequency details and low-frequency semantics. It uses a lightweight pixel decoder for details and a DiT for semantics, achieving superior efficiency and quality over existing pixel diffusion...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19365
• PDF: https://arxiv.org/pdf/2511.19365
• Project Page: https://zehong-ma.github.io/DeCo/
• Github: https://github.com/Zehong-Ma/DeCo
🔹 Models citing this paper:
• https://huggingface.co/zehongma/DeCo
✨ Spaces citing this paper:
• https://huggingface.co/spaces/zehongma/DeCo
==================================
#ImageGeneration #DiffusionModels #ComputerVision #DeepLearning #DeCo
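As a rough intuition for the low/high-frequency split DeCo builds on, here is a toy, exactly reversible decomposition via a box blur (a sketch with made-up names; the paper uses a learned DiT plus a lightweight pixel decoder, not a blur):

```python
import numpy as np

def decompose_frequencies(img, ksize=5):
    """Split an image into a low-frequency (semantics-like) and a
    high-frequency (detail-like) component with a box blur.
    Hypothetical toy analogue of DeCo's decoupling, not its model."""
    pad = ksize // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    low = np.zeros_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            low[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    high = img - low  # residual carries edges and texture
    return low, high

img = np.arange(64, dtype=float).reshape(8, 8)
low, high = decompose_frequencies(img)
assert np.allclose(low + high, img)  # the split is exactly reversible
```

The point of the sketch: the two components carry complementary information and sum back to the input, which is the property the paper's two branches exploit.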
✨Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
📝 Summary:
Controllable Layer Decomposition (CLD) enables fine-grained, controllable separation of raster images into editable RGBA layers, overcoming traditional compositing limitations. Using LD-DiT and MLCA, CLD surpasses existing methods in quality and control. It produces layers directly usable in design...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16249
• PDF: https://arxiv.org/pdf/2511.16249
• Github: https://github.com/monkek123King/CLD
==================================
#ImageGeneration #DeepLearning #ComputerVision #ImageEditing #LayerDecomposition
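For context, the classic forward operation that CLD aims to invert is alpha compositing; a minimal Porter-Duff "over" sketch (illustrative only, not the paper's model):

```python
def composite_over(layers):
    """Porter-Duff 'over' compositing of RGBA layers, back to front.
    Each layer is (r, g, b, a) with channels in [0, 1]. Toy sketch of
    the forward pass whose inverse CLD learns."""
    out_rgb, out_a = (0.0, 0.0, 0.0), 0.0
    for r, g, b, a in layers:  # back-to-front order
        out_rgb = tuple(c * a + oc * (1 - a) for c, oc in zip((r, g, b), out_rgb))
        out_a = a + out_a * (1 - a)
    return (*out_rgb, out_a)

# An opaque front layer fully hides what is behind it:
assert composite_over([(1, 0, 0, 1.0), (0, 1, 0, 1.0)]) == (0.0, 1.0, 0.0, 1.0)
```

Recovering the per-layer RGBA values from the composited output is underdetermined, which is why decomposition needs learned priors.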
✨Plan-X: Instruct Video Generation via Semantic Planning
📝 Summary:
Plan-X improves instruction-aligned video generation by integrating a Semantic Planner with diffusion models. The planner generates semantic tokens that guide video synthesis, reducing visual hallucinations. This framework combines language models for reasoning with diffusion models for photoreal...
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17986
• PDF: https://arxiv.org/pdf/2511.17986
• Project Page: https://byteaigc.github.io/Plan-X/
==================================
#VideoGeneration #DiffusionModels #AI #ComputerVision #DeepLearning
✨Continuous Thought Machines
📝 Summary:
The Continuous Thought Machine (CTM) reintroduces neural timing and synchronization to deep learning for complex sequential reasoning and biologically plausible AI. It uses neuron-level temporal processing and synchronization as a latent representation, performing well on diverse tasks with adaptiv...
🔹 Publication Date: Published on May 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.05522
• PDF: https://arxiv.org/pdf/2505.05522
• Github: https://github.com/SakanaAI/continuous-thought-machines
🔹 Models citing this paper:
• https://huggingface.co/SakanaAI/ctm-imagenet
• https://huggingface.co/SakanaAI/ctm-maze-large
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Uday/ctm-energy-based-halting
==================================
#AI #DeepLearning #NeuralNetworks #BiologicallyInspiredAI #TemporalAI
✨LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
📝 Summary:
LucidFlux is a caption-free universal image restoration framework using a large diffusion transformer. It employs a dual-branch conditioner and adaptive modulation for robust restoration, avoiding text prompts by using SigLIP features. This approach outperforms existing methods by intelligently c...
🔹 Publication Date: Published on Sep 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22414
• PDF: https://arxiv.org/pdf/2509.22414
• Project Page: https://w2genai-lab.github.io/LucidFlux/
• Github: https://github.com/W2GenAI-Lab/LucidFlux
🔹 Models citing this paper:
• https://huggingface.co/W2GenAI/LucidFlux
==================================
#ImageRestoration #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
✨DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
📝 Summary:
RLER is introduced to train deep research models for long-form tasks by using rubrics that co-evolve with the policy model, enabling DR Tulu-8B to outperform open models and match proprietary systems while being more cost-effective.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19399
• PDF: https://arxiv.org/pdf/2511.19399
• Github: https://github.com/rlresearch/dr-tulu
🔹 Models citing this paper:
• https://huggingface.co/rl-research/DR-Tulu-8B
• https://huggingface.co/rl-research/DR-Tulu-SFT-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/rl-research/dr-tulu-sft-data
• https://huggingface.co/datasets/rl-research/dr-tulu-rl-data
==================================
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #MachineLearning
✨HunyuanVideo 1.5 Technical Report
📝 Summary:
HunyuanVideo 1.5 is a lightweight, open-source video generation model achieving state-of-the-art visual quality and motion coherence. It employs an advanced DiT architecture with SSTA and an efficient video super-resolution network, enabling high-quality video creation on consumer GPUs.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18870
• PDF: https://arxiv.org/pdf/2511.18870
• Github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
==================================
#VideoGeneration #AI #DeepLearning #OpenSource #DiffusionModels
✨Flow Map Distillation Without Data
📝 Summary:
This paper introduces a data-free framework for flow map distillation, eliminating the need for external datasets. By sampling only from the prior distribution, it avoids data mismatch risks and achieves state-of-the-art fidelity with minimal sampling steps, surpassing all data-based alternatives.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19428
• PDF: https://arxiv.org/pdf/2511.19428
==================================
#FlowMapDistillation #DataFreeLearning #MachineLearning #DeepLearning #AIResearch
✨Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
📝 Summary:
Training multi-agent systems with distinct LLMs faces optimization challenges. M-GRPO, a hierarchical GRPO extension, addresses this by aligning heterogeneous trajectories and decoupling agent training. This improves stability and sample efficiency for tool-augmented reasoning tasks.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13288
• PDF: https://arxiv.org/pdf/2511.13288
==================================
#MultiAgentSystems #ReinforcementLearning #DeepLearning #LLM #AI
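The group-relative advantage at the heart of GRPO, which M-GRPO extends hierarchically across agents, can be sketched in a few lines (toy version; the function name and epsilon are our own):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each rollout's reward against
    its own group's mean and standard deviation, so no learned value
    function is needed. Illustrative sketch, not the paper's code."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

adv = group_relative_advantages([1.0, 2.0, 3.0])
assert abs(sum(adv)) < 1e-9  # advantages are centered within the group
```

M-GRPO's contribution is making this kind of update stable when the trajectories in a group come from heterogeneous agents rather than one policy.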
✨Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems
📝 Summary:
A Sparse Autoencoder extracts interaction-aware monosemantic concepts from recommender embeddings. Its prediction-aware training aligns these with model predictions, enabling controllable personalization and interpretability.
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18024
• PDF: https://arxiv.org/pdf/2511.18024
==================================
#RecommenderSystems #DeepLearning #AI #Interpretability #Personalization
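A minimal sparse-autoencoder forward pass shows where "monosemantic concepts" live, as individual code dimensions (shapes and names here are illustrative; the paper adds prediction-aware training on top of this basic setup):

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec):
    """One sparse-autoencoder step: ReLU codes, then linear decode.
    Each code dimension is a candidate 'concept'. Toy sketch only."""
    z = np.maximum(0.0, x @ W_enc + b_enc)  # sparse concept activations
    x_hat = z @ W_dec                       # reconstruction of the embedding
    l1 = np.abs(z).sum()                    # sparsity penalty term
    return z, x_hat, l1
```

Zeroing or scaling a single entry of `z` before decoding is what makes the representation controllable for personalization.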
✨MIST: Mutual Information Via Supervised Training
📝 Summary:
MIST is a data-driven neural network that estimates mutual information. Trained on synthetic data, it uses attention and quantile regression for uncertainty. It outperforms classical methods, offering faster and more reliable MI estimation.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18945
• PDF: https://arxiv.org/pdf/2511.18945
• Github: https://github.com/grgera/mist
==================================
#MutualInformation #NeuralNetworks #MachineLearning #DeepLearning #DataScience
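The kind of synthetic supervision MIST relies on can be sketched with correlated Gaussians, whose mutual information has a closed form; pairs of (samples, exact MI label) are the flavor of training data involved (the estimator network itself is not reproduced here):

```python
import math
import random

def gaussian_mi(rho):
    """Closed-form MI (nats) between two unit Gaussians with
    correlation rho: I = -0.5 * log(1 - rho^2)."""
    return -0.5 * math.log(1.0 - rho * rho)

def sample_pair(rho, n, seed=0):
    """Draw n correlated Gaussian pairs as a toy training input
    (names and API are illustrative, not the paper's)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

Training on many such (samples, label) pairs is what lets a supervised network amortize MI estimation instead of optimizing a variational bound per dataset.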
✨PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
📝 Summary:
PC-Agent is a hierarchical multi-agent framework improving MLLM-based GUI agents for complex PC tasks. It uses an Active Perception Module and a hierarchical decision-making architecture with Manager, Progress, and Decision agents. A Reflection agent provides feedback. It achieved a 32% task succ...
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.14282
• PDF: https://arxiv.org/pdf/2502.14282
• Github: https://github.com/X-PLUG/MobileAgent/tree/main/PC-Agent
✨ Spaces citing this paper:
• https://huggingface.co/spaces/junyangwang0410/PC-Agent
==================================
#MultiAgentSystems #AIAgents #MLLMs #PCAutomation #DeepLearning
✨MSRNet: A Multi-Scale Recursive Network for Camouflaged Object Detection
📝 Summary:
MSRNet proposes a Multi-Scale Recursive Network for camouflaged object detection. It uses a Pyramid Vision Transformer and recursive feature refinement to overcome challenges with small and multiple objects, achieving state-of-the-art results.
🔹 Publication Date: Published on Nov 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12810
• PDF: https://arxiv.org/pdf/2511.12810
🔹 Models citing this paper:
• https://huggingface.co/linaa98/MSRNet
==================================
#CamouflagedObjectDetection #ObjectDetection #ComputerVision #DeepLearning #AIResearch
✨Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
📝 Summary:
Upsample Anything is a novel test-time optimization framework that enhances low-resolution features to high-resolution outputs without training. It learns an anisotropic Gaussian kernel per image, acting as a universal edge-aware operator. This method achieves state-of-the-art results in tasks li...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16301
• PDF: https://arxiv.org/pdf/2511.16301
==================================
#Upsampling #ComputerVision #ImageProcessing #DeepLearning #AI
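A fixed isotropic Gaussian splatting baseline gives the flavor of kernel-based feature upsampling; the paper instead optimizes an anisotropic, edge-aware kernel per image at test time, so treat this as a deliberately simplified sketch with assumed names:

```python
import numpy as np

def gaussian_upsample(feat_lr, scale, sigma=1.0):
    """Upsample a low-res feature map by splatting each LR value
    through a fixed isotropic Gaussian at high resolution, then
    normalizing. Baseline sketch only; not the paper's method."""
    h, w = feat_lr.shape
    H, W = h * scale, w * scale
    ys, xs = np.mgrid[0:H, 0:W]
    out = np.zeros((H, W))
    norm = np.zeros((H, W))
    for i in range(h):
        for j in range(w):
            cy = (i + 0.5) * scale - 0.5  # LR cell center in HR coords
            cx = (j + 0.5) * scale - 0.5
            wgt = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
            out += wgt * feat_lr[i, j]
            norm += wgt
    return out / norm
```

Replacing the fixed `sigma` with a per-pixel anisotropic covariance fit to the image is, loosely, what makes the real method edge-aware.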
✨EvoVLA: Self-Evolving Vision-Language-Action Model
📝 Summary:
EvoVLA is a self-supervised VLA framework tackling stage hallucination in long-horizon robotic manipulation. It uses triplet contrastive learning, pose-based exploration, and memory to prevent shortcuts. EvoVLA significantly improves success, sample efficiency, and reduces hallucination in sim an...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16166
• PDF: https://arxiv.org/pdf/2511.16166
• Project Page: https://aigeeksgroup.github.io/EvoVLA/
==================================
#Robotics #VisionLanguageAction #SelfSupervisedLearning #AI #DeepLearning
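The triplet contrastive idea EvoVLA uses to keep task stages apart can be sketched with a plain margin-based triplet loss on embedding vectors (toy version; the actual features and objective are defined in the paper):

```python
def triplet_loss(anchor, pos, neg, margin=1.0):
    """Margin-based triplet loss: pull the anchor toward the positive
    (same stage) and push it from the negative (different stage).
    Plain-Python sketch of the contrastive component."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)

# Well-separated stages incur zero loss:
assert triplet_loss((0, 0), (0, 0), (3, 4)) == 0.0
```

Driving this loss to zero for stage embeddings is what discourages the "stage hallucination" shortcut the summary mentions.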
✨One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control
📝 Summary:
One4D is a unified framework for 4D generation and reconstruction, producing synchronized RGB frames and pointmaps. It uses Unified Masked Conditioning for varying input sparsities and Decoupled LoRA Control to achieve high-quality results across diverse tasks.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18922
• PDF: https://arxiv.org/pdf/2511.18922
• Project Page: https://mizhenxing.github.io/One4D
==================================
#4DGeneration #4DReconstruction #ComputerVision #DeepLearning #GenerativeAI
✨MedSAM3: Delving into Segment Anything with Medical Concepts
📝 Summary:
MedSAM-3 is a text-promptable medical segmentation model fine-tuned on SAM 3 using semantic conceptual labels. It enables precise, open-vocabulary text-based segmentation of anatomical structures and integrates MLLMs for advanced reasoning. This approach significantly outperforms existing models ...
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19046
• PDF: https://arxiv.org/pdf/2511.19046
• Github: https://github.com/Joey-S-Liu/MedSAM3
==================================
#MedicalAI #ImageSegmentation #DeepLearning #MLLMs #FoundationModels
✨iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
📝 Summary:
iMontage repurposes pre-trained video models to generate high-quality, diverse image sets. It uses a unified framework and minimal adaptation, combining temporal coherence with image diversity for natural transitions and expanded dynamics across many tasks.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20635
• PDF: https://arxiv.org/pdf/2511.20635
• Project Page: https://kr1sjfu.github.io/iMontage-web/
==================================
#ImageGeneration #DeepLearning #ComputerVision #AIMethods #VideoModels
✨PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding
📝 Summary:
PhysChoreo generates physically realistic and controllable videos from a single image. It reconstructs part-aware physical properties and simulates dynamic behavior, outperforming existing methods.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20562
• PDF: https://arxiv.org/pdf/2511.20562
==================================
#VideoGeneration #PhysicalSimulation #ComputerVision #DeepLearning #AIResearch