✨Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
📝 Summary:
Mantis is a VLA framework with Disentangled Visual Foresight (DVF) and a diffusion Transformer. DVF decouples visual foresight from the backbone, improving action prediction, comprehension, and reasoning while reducing training complexity. Mantis achieves high success rates and strong instruction-f...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16175
• PDF: https://arxiv.org/pdf/2511.16175
• Github: https://github.com/zhijie-group/Mantis
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #ComputerVision #Robotics #VLAModels #DeepLearning
✨VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
📝 Summary:
VisMem equips Vision-Language Models with dynamic latent vision memories, inspired by human cognition. This framework helps VLMs maintain perceptual fidelity and semantic consistency, significantly boosting performance on complex visual tasks.
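To make the mechanism concrete, here is a minimal sketch of a latent vision memory with an attention read-out; the module name, shapes, and residual fusion are my own assumptions, not the VisMem implementation:
```python
# Illustrative sketch of a latent vision-memory readout (assumed shapes/names,
# not the VisMem code): visual latents are cached once, then later decoding
# steps attend back to them to recover perceptual detail.
import torch
import torch.nn as nn

class LatentVisionMemory(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.memory = None  # (B, N_visual, dim), filled by write()

    def write(self, visual_latents: torch.Tensor) -> None:
        """Cache the vision encoder's latents for the whole generation."""
        self.memory = visual_latents.detach()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        """Re-read the cached latents from the current decoder states."""
        readout, _ = self.read(query=hidden, key=self.memory, value=self.memory)
        return hidden + readout  # residual fusion of memory into the decoder

mem = LatentVisionMemory()
mem.write(torch.randn(1, 196, 768))           # e.g. 14x14 ViT patch latents
fused = mem.forward(torch.randn(1, 32, 768))  # 32 decoder token states
print(fused.shape)                            # torch.Size([1, 32, 768])
```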
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11007
• PDF: https://arxiv.org/pdf/2511.11007
• Github: https://github.com/YU-deep/VisMem.git
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VisMem #VisionLanguageModels #AI #DeepLearning #ComputerVision
✨General Agentic Memory Via Deep Research
📝 Summary:
GAM is a novel framework for AI memory that addresses information loss in static systems. It uses just-in-time (JIT) principles with a memorizer and a researcher to create optimized contexts at runtime. This improves memory efficiency and task completion, leveraging LLMs and reinforcement learning.
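As a rough illustration of the memorizer/researcher split, here is a toy sketch with hypothetical interfaces; simple keyword retrieval stands in for the LLM- and RL-driven components:
```python
# Toy sketch of the memorizer/researcher split described above (interfaces are
# assumptions, not GAM's actual API): the memorizer keeps lightweight notes,
# and the researcher assembles a task-specific context just in time.
from dataclasses import dataclass, field

@dataclass
class Memorizer:
    notes: list[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # In GAM this compression would be done by an LLM; here we just store.
        self.notes.append(event)

@dataclass
class Researcher:
    def build_context(self, memorizer: Memorizer, query: str, k: int = 3) -> str:
        # Just-in-time retrieval: rank stored notes by naive keyword overlap
        # and pack only the top-k into the working context.
        scored = sorted(
            memorizer.notes,
            key=lambda n: len(set(n.lower().split()) & set(query.lower().split())),
            reverse=True,
        )
        return "\n".join(scored[:k])

mem = Memorizer()
for e in ["user prefers concise answers", "project uses PyTorch", "deadline is Friday"]:
    mem.observe(e)
print(Researcher().build_context(mem, "write PyTorch training code"))
```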
🔹 Publication Date: Published on Nov 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18423
• PDF: https://arxiv.org/pdf/2511.18423
• Github: https://github.com/VectorSpaceLab/general-agentic-memory
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #LLMs #ReinforcementLearning #AIMemory #DeepLearning
✨In-Video Instructions: Visual Signals as Generative Control
📝 Summary:
This paper introduces In-Video Instruction for controllable image-to-video generation. It embeds visual signals such as text or arrows directly into frames as instructions, offering precise, spatially aware control over object actions. Experiments show video models reliably execute these visual cues.
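A quick sketch of what "instructions inside the frame" can look like in practice, using ordinary PIL drawing calls rather than the paper's tooling:
```python
# A small sketch of the in-video instruction idea with my own drawing code:
# render an arrow and a text cue directly onto the conditioning frame before
# passing it to an image-to-video model.
from PIL import Image, ImageDraw

frame = Image.new("RGB", (512, 320), "white")   # stand-in for a real first frame
draw = ImageDraw.Draw(frame)

# Arrow from the object location (x0, y0) toward the target location (x1, y1).
x0, y0, x1, y1 = 120, 200, 320, 140
draw.line([(x0, y0), (x1, y1)], fill=(255, 0, 0), width=5)
draw.polygon([(x1, y1), (x1 - 18, y1 - 6), (x1 - 12, y1 + 14)], fill=(255, 0, 0))

# The instruction lives in the pixels themselves, not in a text prompt.
draw.text((x0, y0 + 12), "move the cup here", fill=(255, 0, 0))

frame.save("first_frame_with_instruction.png")
# The edited frame would then condition an off-the-shelf image-to-video model.
```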
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19401
• PDF: https://arxiv.org/pdf/2511.19401
• Project Page: https://fangggf.github.io/In-Video/
• Github: https://fangggf.github.io/In-Video/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #ComputerVision #AIResearch #DeepLearning
✨DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
📝 Summary:
DeCo is a frequency-decoupled pixel diffusion framework that improves image generation by separating high-frequency details and low-frequency semantics. It uses a lightweight pixel decoder for details and a diffusion Transformer (DiT) for semantics, achieving superior efficiency and quality over existing pixel diffusion...
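For intuition, a minimal frequency-decoupling sketch in pixel space (a blur/residual split under my own choice of operators, not DeCo's training pipeline):
```python
# Low frequencies come from a downsample/upsample path; high frequencies are
# the exact residual, which a lightweight decoder could then handle.
import torch
import torch.nn.functional as F

def split_frequencies(img: torch.Tensor, factor: int = 8):
    """img: (B, C, H, W) in [-1, 1]. Returns (low_freq, high_freq)."""
    low = F.interpolate(img, scale_factor=1 / factor, mode="bilinear",
                        align_corners=False)
    low = F.interpolate(low, size=img.shape[-2:], mode="bilinear",
                        align_corners=False)
    high = img - low                        # fine detail: edges, texture
    return low, high

x = torch.rand(2, 3, 256, 256) * 2 - 1
low, high = split_frequencies(x)
print(low.shape, high.shape)                # both (2, 3, 256, 256)
print(torch.allclose(low + high, x))        # True: the split is exactly invertible
```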
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19365
• PDF: https://arxiv.org/pdf/2511.19365
• Project Page: https://zehong-ma.github.io/DeCo/
• Github: https://github.com/Zehong-Ma/DeCo
🔹 Models citing this paper:
• https://huggingface.co/zehongma/DeCo
✨ Spaces citing this paper:
• https://huggingface.co/spaces/zehongma/DeCo
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ImageGeneration #DiffusionModels #ComputerVision #DeepLearning #DeCo
✨Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
📝 Summary:
Controllable Layer Decomposition (CLD) enables fine-grained, controllable separation of raster images into editable RGBA layers, overcoming traditional compositing limitations. Using LD-DiT and MLCA, CLD surpasses existing methods in quality and control. It produces layers directly usable in design...
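For context, the sketch below shows the standard "over" compositing that layer decomposition must invert; reversibility means recomposing the predicted RGBA layers reproduces the input raster (generic compositing code, not the CLD model):
```python
# Standard back-to-front alpha compositing of RGBA layers into a flat image.
import numpy as np

def composite_over(layers: list[np.ndarray]) -> np.ndarray:
    """layers: list of (H, W, 4) float arrays in [0, 1], back-to-front."""
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3), dtype=np.float64)
    for layer in layers:                    # paint each layer over the result
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out

background = np.concatenate([np.full((64, 64, 3), 0.9), np.ones((64, 64, 1))], axis=-1)
red_square = np.zeros((64, 64, 4)); red_square[16:48, 16:48] = [1.0, 0.0, 0.0, 0.8]
flat = composite_over([background, red_square])
print(flat.shape, flat.min(), flat.max())   # (64, 64, 3), values stay in [0, 1]
```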
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16249
• PDF: https://arxiv.org/pdf/2511.16249
• Github: https://github.com/monkek123King/CLD
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #ImageEditing #LayerDecomposition
✨Plan-X: Instruct Video Generation via Semantic Planning
📝 Summary:
Plan-X improves instruction-aligned video generation by integrating a Semantic Planner with diffusion models. The planner generates semantic tokens that guide video synthesis, reducing visual hallucinations. This framework combines language models for reasoning with diffusion models for photoreal...
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17986
• PDF: https://arxiv.org/pdf/2511.17986
• Project Page: https://byteaigc.github.io/Plan-X/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #DiffusionModels #AI #ComputerVision #DeepLearning
✨Continuous Thought Machines
📝 Summary:
The Continuous Thought Machine (CTM) reintroduces neural timing and synchronization to deep learning for complex sequential reasoning and biologically plausible AI. It uses neuron-level temporal processing and synchronization as a latent representation, performing well on diverse tasks with adaptiv...
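A toy sketch of "synchronization as a latent representation" under my own formulation (pairwise correlation of neuron activation histories), not the released CTM code:
```python
# Track each neuron's activation over internal ticks and use the pairwise
# correlation matrix as the representation read out by task heads.
import torch

def synchronization_latent(history: torch.Tensor) -> torch.Tensor:
    """history: (B, T, N) activations over T internal ticks for N neurons.
    Returns (B, N, N) correlation matrices."""
    centered = history - history.mean(dim=1, keepdim=True)
    cov = centered.transpose(1, 2) @ centered / (history.shape[1] - 1)
    std = cov.diagonal(dim1=-2, dim2=-1).clamp_min(1e-8).sqrt()
    return cov / (std.unsqueeze(-1) * std.unsqueeze(-2))

hist = torch.randn(4, 50, 32)               # 4 samples, 50 ticks, 32 neurons
sync = synchronization_latent(hist)
print(sync.shape)                           # torch.Size([4, 32, 32])
# A linear head on the flattened upper triangle of `sync` could then produce
# task outputs, with more ticks giving the model adaptive compute.
```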
🔹 Publication Date: Published on May 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.05522
• PDF: https://arxiv.org/pdf/2505.05522
• Github: https://github.com/SakanaAI/continuous-thought-machines
🔹 Models citing this paper:
• https://huggingface.co/SakanaAI/ctm-imagenet
• https://huggingface.co/SakanaAI/ctm-maze-large
✨ Spaces citing this paper:
• https://huggingface.co/spaces/Uday/ctm-energy-based-halting
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DeepLearning #NeuralNetworks #BiologicallyInspiredAI #TemporalAI
✨LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
📝 Summary:
LucidFlux is a caption-free universal image restoration framework using a large diffusion transformer. It employs a dual-branch conditioner and adaptive modulation for robust restoration, avoiding text prompts by using SigLIP features. This approach outperforms existing methods by intelligently c...
🔹 Publication Date: Published on Sep 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22414
• PDF: https://arxiv.org/pdf/2509.22414
• Project Page: https://w2genai-lab.github.io/LucidFlux/
• Github: https://github.com/W2GenAI-Lab/LucidFlux
🔹 Models citing this paper:
• https://huggingface.co/W2GenAI/LucidFlux
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ImageRestoration #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
✨DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
📝 Summary:
RLER (Reinforcement Learning with Evolving Rubrics) is introduced to train deep research models for long-form tasks using rubrics that co-evolve with the policy model, enabling DR Tulu-8B to outperform open models and match proprietary systems while being more cost-effective.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19399
• PDF: https://arxiv.org/pdf/2511.19399
• Project Page: https://github.com/rlresearch/dr-tulu
• Github: https://github.com/rlresearch/dr-tulu
🔹 Models citing this paper:
• https://huggingface.co/rl-research/DR-Tulu-8B
• https://huggingface.co/rl-research/DR-Tulu-SFT-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/rl-research/dr-tulu-sft-data
• https://huggingface.co/datasets/rl-research/dr-tulu-rl-data
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #MachineLearning
✨HunyuanVideo 1.5 Technical Report
📝 Summary:
HunyuanVideo 1.5 is a lightweight, open-source video generation model achieving state-of-the-art visual quality and motion coherence. It employs an advanced DiT architecture with SSTA and an efficient video super-resolution network, enabling high-quality video creation on consumer GPUs.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18870
• PDF: https://arxiv.org/pdf/2511.18870
• Github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #AI #DeepLearning #OpenSource #DiffusionModels
✨Flow Map Distillation Without Data
📝 Summary:
This paper introduces a data-free framework for flow map distillation, eliminating the need for external datasets. By sampling only from the prior distribution, it avoids data mismatch risks and achieves state-of-the-art fidelity with minimal sampling steps, surpassing all data-based alternatives.
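A hedged sketch of the generic data-free distillation recipe (my own simplified objective, not the paper's exact formulation): sample from the prior, roll out a frozen teacher, and regress a one-step student onto the endpoint:
```python
# Draw noise from the prior, integrate a teacher velocity field for a few
# steps to get a target, and train a one-step student map to match it.
# No external dataset is touched at any point.
import torch
import torch.nn as nn

dim = 16
teacher = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)).eval()
student = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def teacher_rollout(x: torch.Tensor, steps: int = 8) -> torch.Tensor:
    """Euler-integrate the teacher's velocity field from t=0 to t=1."""
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * teacher(torch.cat([x, t], dim=-1))
    return x

for step in range(200):
    z = torch.randn(64, dim)                # samples from the prior only
    with torch.no_grad():
        target = teacher_rollout(z)         # multi-step teacher endpoint
    loss = (student(z) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```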
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19428
• PDF: https://arxiv.org/pdf/2511.19428
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#FlowMapDistillation #DataFreeLearning #MachineLearning #DeepLearning #AIResearch
✨Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
📝 Summary:
Training multi-agent systems with distinct LLMs faces optimization challenges. M-GRPO, a hierarchical GRPO extension, addresses this by aligning heterogeneous trajectories and decoupling agent training. This improves stability and sample efficiency for tool-augmented reasoning tasks.
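To illustrate the decoupled per-agent training idea, here is the group-relative advantage computed separately for each agent role; the rewards and roles are made up, and this is a simplification of M-GRPO rather than its full loss:
```python
# Each agent's rollouts are normalized only against rollouts from the same
# agent, which is what decouples the two policies' updates.
import torch

def group_relative_advantage(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (G,) rewards for G rollouts of one prompt by one agent."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

main_agent_rewards = torch.tensor([0.2, 0.8, 0.5, 0.9])   # e.g. planner rollouts
sub_agent_rewards = torch.tensor([1.0, 0.0, 0.0])         # e.g. tool-use rollouts

adv_main = group_relative_advantage(main_agent_rewards)
adv_sub = group_relative_advantage(sub_agent_rewards)
print(adv_main, adv_sub)
# Each advantage tensor would weight the log-probabilities of its own agent's
# trajectory in a clipped policy-gradient loss; the two agents never share a
# baseline, even when their rollout counts differ.
```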
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13288
• PDF: https://arxiv.org/pdf/2511.13288
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MultiAgentSystems #ReinforcementLearning #DeepLearning #LLM #AI
✨Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems
📝 Summary:
A Sparse Autoencoder extracts interaction-aware monosemantic concepts from recommender embeddings. Its prediction-aware training aligns these with model predictions, enabling controllable personalization and interpretability.
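A minimal sparse-autoencoder sketch over recommender embeddings; the L1 penalty is generic, and the paper's prediction-aware term is only indicated as a comment since its exact form isn't given here:
```python
# Generic SAE: non-negative concept activations, reconstruction loss, and an
# L1 sparsity penalty, applied to user/item embeddings.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, emb_dim: int = 64, n_concepts: int = 512):
        super().__init__()
        self.encoder = nn.Linear(emb_dim, n_concepts)
        self.decoder = nn.Linear(n_concepts, emb_dim)

    def forward(self, emb: torch.Tensor):
        concepts = torch.relu(self.encoder(emb))   # non-negative concept activations
        recon = self.decoder(concepts)
        return recon, concepts

sae = SparseAutoencoder()
emb = torch.randn(256, 64)                          # user/item embeddings
recon, concepts = sae(emb)
loss = (recon - emb).pow(2).mean() + 1e-3 * concepts.abs().mean()
# + a prediction-aware term that keeps the recommender's score for the
#   reconstructed embedding close to its score for the original one.
loss.backward()
```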
🔹 Publication Date: Published on Nov 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18024
• PDF: https://arxiv.org/pdf/2511.18024
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#RecommenderSystems #DeepLearning #AI #Interpretability #Personalization
✨MIST: Mutual Information Via Supervised Training
📝 Summary:
MIST is a data-driven neural network that estimates mutual information. Trained on synthetic data, it uses attention and quantile regression for uncertainty. It outperforms classical methods, offering faster and more reliable MI estimation.
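The supervised-training idea can be illustrated with a toy stand-in (not MIST's attention architecture): generate Gaussian pairs whose mutual information is known in closed form and regress MI from simple features:
```python
# Bivariate Gaussians with correlation rho have exact MI = -0.5*log(1 - rho^2),
# so synthetic data gives free ground-truth labels for supervised training.
import torch
import torch.nn as nn

def sample_batch(batch: int = 128, n: int = 256):
    rho = torch.rand(batch) * 1.8 - 0.9                 # correlation in (-0.9, 0.9)
    x = torch.randn(batch, n)
    y = rho[:, None] * x + (1 - rho[:, None] ** 2).sqrt() * torch.randn(batch, n)
    mi = -0.5 * torch.log(1 - rho ** 2)                 # exact MI of the pair
    feats = torch.stack([(x * y).mean(1), x.std(1), y.std(1)], dim=1)
    return feats, mi

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    feats, mi = sample_batch()
    loss = (net(feats).squeeze(-1) - mi).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# MIST itself works on raw sample sets with attention and predicts quantiles
# for uncertainty; this only illustrates "train on synthetic data with known
# MI, then apply to new samples".
```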
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18945
• PDF: https://arxiv.org/pdf/2511.18945
• Github: https://github.com/grgera/mist
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MutualInformation #NeuralNetworks #MachineLearning #DeepLearning #DataScience
✨PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
📝 Summary:
PC-Agent is a hierarchical multi-agent framework improving MLLM-based GUI agents for complex PC tasks. It uses an Active Perception Module and a hierarchical decision-making architecture with Manager, Progress, and Decision agents. A Reflection agent provides feedback. It achieved a 32% task succ...
🔹 Publication Date: Published on Feb 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.14282
• PDF: https://arxiv.org/pdf/2502.14282
• Github: https://github.com/X-PLUG/MobileAgent/tree/main/PC-Agent
✨ Spaces citing this paper:
• https://huggingface.co/spaces/junyangwang0410/PC-Agent
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MultiAgentSystems #AIAgents #MLLMs #PCAutomation #DeepLearning
✨MSRNet: A Multi-Scale Recursive Network for Camouflaged Object Detection
📝 Summary:
MSRNet proposes a Multi-Scale Recursive Network for camouflaged object detection. It uses a Pyramid Vision Transformer and recursive feature refinement to overcome challenges with small and multiple objects, achieving state-of-the-art results.
🔹 Publication Date: Published on Nov 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12810
• PDF: https://arxiv.org/pdf/2511.12810
🔹 Models citing this paper:
• https://huggingface.co/linaa98/MSRNet
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#CamouflagedObjectDetection #ObjectDetection #ComputerVision #DeepLearning #AIResearch
✨Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
📝 Summary:
Upsample Anything is a novel test-time optimization framework that enhances low-resolution features to high-resolution outputs without training. It learns an anisotropic Gaussian kernel per image, acting as a universal edge-aware operator. This method achieves state-of-the-art results in tasks li...
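For intuition, here is a plain joint-bilateral upsampler with fixed Gaussian parameters; the paper goes further by optimizing an anisotropic kernel per image at test time, so treat this only as a baseline sketch:
```python
# Edge-aware upsampling: low-res features are upsampled, then re-weighted by a
# Gaussian on spatial offset and on guide-image color difference.
import torch
import torch.nn.functional as F

def joint_bilateral_upsample(feat_lr, guide_hr, sigma_s=2.0, sigma_r=0.1, k=5):
    """feat_lr: (1, C, h, w) low-res features; guide_hr: (1, 3, H, W) image."""
    H, W = guide_hr.shape[-2:]
    feat_up = F.interpolate(feat_lr, size=(H, W), mode="bilinear", align_corners=False)
    pad = k // 2
    feat_pad = F.pad(feat_up, [pad] * 4, mode="reflect")
    guide_pad = F.pad(guide_hr, [pad] * 4, mode="reflect")
    out = torch.zeros_like(feat_up)
    weight = torch.zeros(1, 1, H, W)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            g = guide_pad[..., pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            f = feat_pad[..., pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            spatial = torch.exp(torch.tensor(-(dx * dx + dy * dy) / (2 * sigma_s ** 2)))
            range_w = torch.exp(-((g - guide_hr) ** 2).sum(1, keepdim=True) / (2 * sigma_r ** 2))
            w = spatial * range_w               # edge-aware weight
            out = out + w * f
            weight = weight + w
    return out / weight.clamp_min(1e-8)

up = joint_bilateral_upsample(torch.randn(1, 8, 32, 32), torch.rand(1, 3, 256, 256))
print(up.shape)                                 # torch.Size([1, 8, 256, 256])
```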
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16301
• PDF: https://arxiv.org/pdf/2511.16301
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Upsampling #ComputerVision #ImageProcessing #DeepLearning #AI
✨EvoVLA: Self-Evolving Vision-Language-Action Model
📝 Summary:
EvoVLA is a self-supervised VLA framework tackling stage hallucination in long-horizon robotic manipulation. It uses triplet contrastive learning, pose-based exploration, and memory to prevent shortcuts. EvoVLA significantly improves success rates and sample efficiency, and reduces hallucination in sim an...
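A minimal triplet-loss sketch for stage grounding with toy embeddings (my own setup, not EvoVLA's training code):
```python
# An observation embedding is pulled toward the description of its true stage
# and pushed away from a plausible-but-wrong one, by at least a margin.
import torch
import torch.nn.functional as F

def stage_triplet_loss(obs_emb, pos_stage_emb, neg_stage_emb, margin: float = 0.2):
    """All inputs are (B, D) embeddings; cosine distance is used."""
    obs, pos, neg = (F.normalize(t, dim=-1) for t in (obs_emb, pos_stage_emb, neg_stage_emb))
    d_pos = 1 - (obs * pos).sum(-1)         # distance to the correct stage
    d_neg = 1 - (obs * neg).sum(-1)         # distance to the hard negative stage
    return F.relu(d_pos - d_neg + margin).mean()

obs = torch.randn(8, 256, requires_grad=True)
loss = stage_triplet_loss(obs, torch.randn(8, 256), torch.randn(8, 256))
loss.backward()
print(loss.item())
```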
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16166
• PDF: https://arxiv.org/pdf/2511.16166
• Project Page: https://aigeeksgroup.github.io/EvoVLA/
• Github: https://aigeeksgroup.github.io/EvoVLA/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #VisionLanguageAction #SelfSupervisedLearning #AI #DeepLearning
✨One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control
📝 Summary:
One4D is a unified framework for 4D generation and reconstruction, producing synchronized RGB frames and pointmaps. It uses Unified Masked Conditioning for varying input sparsities and Decoupled LoRA Control to achieve high-quality results across diverse tasks.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18922
• PDF: https://arxiv.org/pdf/2511.18922
• Project Page: https://mizhenxing.github.io/One4D
• Github: https://mizhenxing.github.io/One4D
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#4DGeneration #4DReconstruction #ComputerVision #DeepLearning #GenerativeAI