🔹 Title: ChartCap: Mitigating Hallucination of Dense Chart Captioning
🔹 Publication Date: Published on Aug 5
🔹 AI-generated summary: ChartCap, a large-scale dataset with dense, type-specific captions for real-world charts, improves caption accuracy and reduces hallucinations in vision language models.
🔹 Abstract: Generating accurate, informative, and hallucination-free captions for charts remains challenging for vision language models, primarily due to the lack of large-scale, high-quality datasets of real-world charts. Moreover, existing real-world chart datasets suffer from the inclusion of extraneous information that cannot be inferred from the chart and from a failure to sufficiently capture structural elements and key insights. Therefore, we introduce ChartCap, a large-scale dataset of 565K real-world chart images paired with type-specific, dense captions that exclude extraneous information and highlight both structural elements and key insights in detail. To build ChartCap, we design a four-stage pipeline that generates captions using only the data discernible from the chart, and we employ cycle consistency-based human verification, which accelerates quality control without sacrificing accuracy. Additionally, we propose a novel metric, the Visual Consistency Score, which evaluates caption quality by measuring the similarity between a chart regenerated from a caption and the original chart, independent of reference captions. Extensive experiments confirm that models fine-tuned on ChartCap consistently generate more accurate and informative captions with reduced hallucinations, surpassing both open-source and proprietary models and even human-annotated captions.
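To make the Visual Consistency Score idea concrete, here is a minimal sketch of a reference-free check in that spirit: a chart is regenerated from the candidate caption by some external caption-to-chart model, and the regenerated image is scored against the original. This is only an illustration, not the paper's actual metric; the toy descriptor and the cosine-similarity scoring are assumptions, and a real implementation would likely use a pretrained vision encoder.
```python
# Minimal sketch in the spirit of the Visual Consistency Score (not the paper's exact metric):
# compare the original chart against a chart regenerated from the caption by an external
# caption-to-chart model (that generation step is assumed to happen elsewhere).
import numpy as np

def embed_image(image: np.ndarray, size: int = 32) -> np.ndarray:
    """Toy image descriptor: grayscale, coarse grid averages, flattened and L2-normalized.
    Assumes the image is at least `size` x `size` pixels; a real implementation would
    likely use a pretrained vision encoder instead."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    h, w = gray.shape
    ys = np.linspace(0, h, size + 1, dtype=int)
    xs = np.linspace(0, w, size + 1, dtype=int)
    grid = np.array([[gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                      for j in range(size)] for i in range(size)])
    vec = grid.ravel()
    return vec / (np.linalg.norm(vec) + 1e-8)

def visual_consistency_score(original: np.ndarray, regenerated: np.ndarray) -> float:
    """Cosine similarity between descriptors of the original and regenerated charts."""
    return float(np.dot(embed_image(original), embed_image(regenerated)))

# Usage: score = visual_consistency_score(original_chart_rgb, chart_rendered_from_caption)
```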
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03164
• PDF: https://arxiv.org/pdf/2508.03164
• Project Page: https://junyoung-00.github.io/ChartCap/
• Github: https://junyoung-00.github.io/ChartCap/
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/junyoung-00/ChartCap
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: AlignGuard-LoRA: Alignment-Preserving Fine-Tuning via Fisher-Guided Decomposition and Riemannian-Geodesic Collision Regularization
🔹 Publication Date: Published on Aug 4
🔹 AI-generated summary: AlignGuard-LoRA (AGL) is a framework that preserves alignment during fine-tuning of large language models by introducing regularization techniques and a diagnostic benchmark to mitigate alignment drift.
🔹 Abstract: Low-rank adaptation (LoRA) has become a standard tool for efficiently fine-tuning large language models (LLMs). Yet even minor LoRA updates can induce alignment drift, weakening safety and behavioral constraints through entangled parameter changes. To address this, we propose AlignGuard-LoRA (AGL), a principled framework for preserving alignment during fine-tuning. AGL introduces several key components: a primary task loss for supervision, Fisher Information Matrix-based regularization to restrict updates in alignment-sensitive subspaces, and task-specific regularization to stabilize the integration of new knowledge. We further introduce collision-aware regularization, blending Riemannian overlap (which penalizes coordinate-wise interference) and geodesic separation (which encourages disjoint update geometry). We curate DriftCaps, a targeted diagnostic benchmark of safe and unsafe prompts designed to quantify alignment drift and safety degradation. Empirical evaluations show that AGL mitigates alignment drift by up to 50% on safety-critical benchmarks without degrading downstream task performance. A comprehensive ablation confirms that each component contributes distinctly to preserving latent safety behaviors. Finally, we derive and validate a scaling law for catastrophic forgetting, revealing that AGL flattens post-fine-tuning loss escalation while preserving adaptation dynamics. AGL is a structurally grounded refinement of LoRA, ensuring alignment preservation with minimal trade-offs. To encourage further exploration and development, we open-source our implementation.
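As a rough illustration of the Fisher-guided idea, the sketch below applies a generic EWC-style diagonal-Fisher penalty to the weight change induced by a LoRA pair, so updates along directions with high Fisher values (estimated on alignment/safety data) are discouraged. This is a common approximation, not AGL's actual decomposition or collision regularizer; `fisher_diag`, `lora_A`, `lora_B`, and the scaling are assumed inputs.
```python
# Sketch of a diagonal-Fisher (EWC-style) penalty on LoRA weight updates.
# This is a generic approximation of "restrict updates in alignment-sensitive subspaces",
# not AlignGuard-LoRA's exact formulation. `fisher_diag` is assumed to be precomputed
# from gradients on alignment/safety data; `lora_delta` is the effective weight change.
import torch

def lora_delta(lora_A: torch.Tensor, lora_B: torch.Tensor, scaling: float) -> torch.Tensor:
    """Effective weight update contributed by a LoRA pair: delta_W = scaling * B @ A."""
    return scaling * (lora_B @ lora_A)

def fisher_alignment_penalty(delta_w: torch.Tensor, fisher_diag: torch.Tensor) -> torch.Tensor:
    """Penalize squared weight changes, weighted by the diagonal Fisher estimate."""
    return (fisher_diag * delta_w.pow(2)).sum()

# Inside a training step (hypothetical variables):
#   delta_w = lora_delta(lora_A, lora_B, scaling=alpha / rank)
#   loss = task_loss + lam * fisher_alignment_penalty(delta_w, fisher_diag)
#   loss.backward()
```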
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02079
• PDF: https://arxiv.org/pdf/2508.02079
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
🔹 Publication Date: Published on Aug 3
🔹 AI-generated summary: An uncertainty-driven framework for automated process reward data construction, together with uncertainty-aware output aggregation methods, improves the effectiveness and efficiency of process-level reward models in mathematical reasoning tasks.
🔹 Abstract: Large language models have demonstrated remarkable capabilities in complex mathematical reasoning tasks, but they inevitably generate errors throughout multi-step solutions. Process-level Reward Models (PRMs) have shown great promise by providing supervision and evaluation at each intermediate step, thereby effectively improving the models' reasoning abilities. However, training effective PRMs requires high-quality process reward data, yet existing methods for constructing such data are often labour-intensive or inefficient. In this paper, we propose an uncertainty-driven framework for automated process reward data construction, encompassing both data generation and annotation processes for PRMs. Additionally, we identify the limitations of both majority vote and PRMs, and introduce two generic uncertainty-aware output aggregation methods: Hybrid Majority Reward Vote and Weighted Reward Frequency Vote, which combine the strengths of majority vote with PRMs. Extensive experiments on ProcessBench, MATH, and GSMPlus show the effectiveness and efficiency of the proposed PRM data construction framework, and demonstrate that the two output aggregation methods further improve mathematical reasoning abilities across diverse PRMs. The code and data will be publicly available at https://github.com/Jiuzhouh/UnPRM.
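For intuition, one plausible reading of a reward-and-frequency style aggregation is sketched below: each candidate final answer is scored by how often it appears across sampled solutions, weighted by the PRM reward of the solutions that produced it. This is an illustrative interpretation, not necessarily the exact Hybrid Majority Reward Vote or Weighted Reward Frequency Vote defined in the paper.
```python
# Illustrative reward-weighted frequency aggregation over sampled solutions.
# Each sample is (final_answer, prm_score); this is one plausible way to combine
# majority voting with PRM scores, not necessarily the paper's exact definition.
from collections import defaultdict
from typing import Iterable, Tuple

def aggregate_answers(samples: Iterable[Tuple[str, float]]) -> str:
    """Pick the answer maximizing (frequency share) * (mean PRM reward)."""
    counts = defaultdict(int)
    reward_sums = defaultdict(float)
    total = 0
    for answer, reward in samples:
        counts[answer] += 1
        reward_sums[answer] += reward
        total += 1

    def score(ans: str) -> float:
        freq = counts[ans] / total
        mean_reward = reward_sums[ans] / counts[ans]
        return freq * mean_reward

    return max(counts, key=score)

# Example: three sampled solutions, two of which agree on "42".
print(aggregate_answers([("42", 0.8), ("42", 0.7), ("41", 0.9)]))  # -> "42"
```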
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01773
• PDF: https://arxiv.org/pdf/2508.01773
• Github: https://github.com/Jiuzhouh/UnPRM
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Forwarded from Python | Machine Learning | Coding | R
This channel is for programmers, coders, and software engineers.
0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ programming Languages
✅ https://t.iss.one/addlist/8_rRW2scgfRhOTc0
✅ https://t.iss.one/Codeprogrammer
🔹 Title: Thyme: Think Beyond Images
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11630
• PDF: https://arxiv.org/pdf/2508.11630
• Project Page: https://thyme-vl.github.io/
• Github: https://github.com/yfzhang114/Thyme
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/Kwai-Keye/Thyme-RL
• https://huggingface.co/datasets/Kwai-Keye/Thyme-SFT
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11116
• PDF: https://arxiv.org/pdf/2508.11116
• Github: https://github.com/Li-Z-Q/PaperRegister
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11203
• PDF: https://arxiv.org/pdf/2508.11203
• Github: https://kwanyun.github.io/stylemm_page/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation
🔹 Publication Date: Published on Aug 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06429
• PDF: https://arxiv.org/pdf/2508.06429
• Github: https://github.com/GuidoManni/SPARSE
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Controlling Multimodal LLMs via Reward-guided Decoding
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11616
• PDF: https://arxiv.org/pdf/2508.11616
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11255
• PDF: https://arxiv.org/pdf/2508.11255
• Project Page: https://fantasy-amap.github.io/fantasy-talking2/
• Github: https://fantasy-amap.github.io/fantasy-talking2/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: TexVerse: A Universe of 3D Objects with High-Resolution Textures
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10868
• PDF: https://arxiv.org/pdf/2508.10868
• Github: https://github.com/yiboz2001/TexVerse
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/YiboZhang2001/TexVerse
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: X-Node: Self-Explanation is All We Need
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10461
• PDF: https://arxiv.org/pdf/2508.10461
• Github: https://github.com/basiralab/X-Node
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10894
• PDF: https://arxiv.org/pdf/2508.10894
• Github: https://github.com/IGNF/MAESTRO
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: SSRL: Self-Search Reinforcement Learning
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10874
• PDF: https://arxiv.org/pdf/2508.10874
• Project Page: https://huggingface.co/collections/TsinghuaC3I/ssrl-6899957a64d4a31f7f43bc88
• Github: https://github.com/TsinghuaC3I/SSRL
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: DINOv3
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10104
• PDF: https://arxiv.org/pdf/2508.10104
• Project Page: https://huggingface.co/collections/srimadhav/project-ideas-6896680486c631bc7d6cedd6
• Github: https://github.com/facebookresearch/dinov3
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/atalaydenknalbant/DINOv3
• https://huggingface.co/spaces/merve/dinov3-viz
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10395
• PDF: https://arxiv.org/pdf/2508.10395
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework
🔹 Publication Date: Published on Aug 4
🔹 AI-generated summary: DreamVVT, a two-stage framework using Diffusion Transformers and LoRA adapters, enhances video virtual try-on by leveraging unpaired human-centric data and pretrained models to preserve garment details and temporal consistency.
🔹 Abstract: Video virtual try-on (VVT) technology has garnered considerable academic interest owing to its promising applications in e-commerce advertising and entertainment. However, most existing end-to-end methods rely heavily on scarce paired garment-centric datasets and fail to effectively leverage priors from advanced visual models and test-time inputs, making it challenging to accurately preserve fine-grained garment details and maintain temporal consistency in unconstrained scenarios. To address these challenges, we propose DreamVVT, a carefully designed two-stage framework built upon Diffusion Transformers (DiTs), which is inherently capable of leveraging diverse unpaired human-centric data to enhance adaptability in real-world scenarios. To further exploit prior knowledge from pretrained models and test-time inputs, in the first stage we sample representative frames from the input video and use a multi-frame try-on model integrated with a vision-language model (VLM) to synthesize high-fidelity and semantically consistent keyframe try-on images. These images serve as complementary appearance guidance for subsequent video generation. In the second stage, skeleton maps together with fine-grained motion and appearance descriptions are extracted from the input content, and these, along with the keyframe try-on images, are fed into a pretrained video generation model enhanced with LoRA adapters. This ensures long-term temporal coherence for unseen regions and enables highly plausible dynamic motions. Extensive quantitative and qualitative experiments demonstrate that DreamVVT surpasses existing methods in preserving detailed garment content and temporal stability in real-world scenarios. Our project page is available at https://virtu-lab.github.io/.
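To make the two-stage flow easier to follow, here is a high-level sketch of the pipeline as described in the abstract. The callables are injected placeholders standing in for the components the abstract names (keyframe sampling, the multi-frame try-on model with a VLM, condition extraction, and the LoRA-enhanced video generator); none of this is DreamVVT's actual API.
```python
# High-level sketch of the two-stage flow described in the abstract. All callables are
# hypothetical placeholders, not DreamVVT's real components.
from typing import Any, Callable, List

def two_stage_tryon(
    video_frames: List[Any],
    garment_image: Any,
    sample_keyframes: Callable,      # pick representative frames from the video
    keyframe_tryon_model: Callable,  # multi-frame try-on model + VLM (stage 1)
    extract_conditions: Callable,    # skeleton maps + motion/appearance descriptions
    video_generator: Callable,       # pretrained video DiT with LoRA adapters (stage 2)
):
    # Stage 1: synthesize high-fidelity, semantically consistent keyframe try-on images.
    keyframes = sample_keyframes(video_frames)
    tryon_keyframes = keyframe_tryon_model(keyframes, garment_image)

    # Stage 2: generate the full try-on video, conditioned on the extracted controls
    # plus the keyframe try-on images as complementary appearance guidance.
    conditions = extract_conditions(video_frames)
    return video_generator(conditions, tryon_keyframes)
```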
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02807
• PDF: https://arxiv.org/pdf/2508.02807
• Github: https://virtu-lab.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
🔹 Publication Date: Published on Aug 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10975
• PDF: https://arxiv.org/pdf/2508.10975
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
🔹 Publication Date: Published on Aug 7
🔹 AI-generated summary: MOSEv2, a more challenging dataset, highlights the limitations of current VOS methods in real-world scenarios with increased complexity and diverse challenges.
🔹 Abstract: Video object segmentation (VOS) aims to segment specified target objects throughout a video. Although state-of-the-art methods have achieved impressive performance (e.g., 90+% J&F) on existing benchmarks such as DAVIS and YouTube-VOS, these datasets primarily contain salient, dominant, and isolated objects, limiting their generalization to real-world scenarios. To advance VOS toward more realistic environments, coMplex video Object SEgmentation (MOSEv1) was introduced to facilitate VOS research in complex scenes. Building on the strengths and limitations of MOSEv1, we present MOSEv2, a significantly more challenging dataset designed to further advance VOS methods under real-world conditions. MOSEv2 consists of 5,024 videos and over 701,976 high-quality masks for 10,074 objects across 200 categories. Compared to its predecessor, MOSEv2 introduces significantly greater scene complexity, including more frequent object disappearance and reappearance, severe occlusions and crowding, smaller objects, as well as a range of new challenges such as adverse weather (e.g., rain, snow, fog), low-light scenes (e.g., nighttime, underwater), multi-shot sequences, camouflaged objects, non-physical targets (e.g., shadows, reflections), and scenarios requiring external knowledge. We benchmark 20 representative VOS methods under 5 different settings and observe consistent performance drops. For example, SAM2 drops from 76.4% on MOSEv1 to only 50.9% on MOSEv2. We further evaluate 9 video object tracking methods and find similar declines, demonstrating that MOSEv2 presents challenges across tasks. These results highlight that, despite high accuracy on existing datasets, current VOS methods still struggle under real-world complexities. MOSEv2 is publicly available at https://MOSE.video.
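The J&F numbers quoted above average the region similarity J (the Jaccard index, i.e., IoU between predicted and ground-truth masks) with a boundary F-measure. As a small worked example, here is the region term J for binary masks; the boundary F term, which compares mask contours, is omitted for brevity.
```python
# Region similarity J (Jaccard index / IoU) for binary segmentation masks, i.e. the "J"
# half of the J&F score quoted above; the boundary F-measure half is omitted here.
import numpy as np

def region_jaccard(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks of the same shape."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treat as a perfect match
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

# Example: two 4x4 masks, each with 3 foreground pixels, overlapping on 2 -> J = 0.5.
a = np.zeros((4, 4), bool); a[0, :3] = True
b = np.zeros((4, 4), bool); b[0, 1:4] = True
print(round(region_jaccard(a, b), 2))  # 0.5
```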
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05630
• PDF: https://arxiv.org/pdf/2508.05630
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Ovis2.5 Technical Report
🔹 Publication Date: Published on Aug 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.11737
• PDF: https://arxiv.org/pdf/2508.11737
• Github: https://github.com/AIDC-AI/Ovis
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/AIDC-AI/Ovis2.5-9B
• https://huggingface.co/spaces/AIDC-AI/Ovis2.5-2B
• https://huggingface.co/spaces/Agung1453/Ovis2.5-9B
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT