✨Grounding World Simulation Models in a Real-World Metropolis
📝 Summary:
The Seoul World Model (SWM) renders video simulations of actual cities rather than imagined environments. It grounds autoregressive video generation in real street-view images, overcoming the attendant data challenges, and generates spatially faithful, long-horizon urban videos across diverse camera paths and scenarios.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15583
• PDF: https://seoul-world-model.github.io/SWM_paper.pdf
• Project Page: https://seoul-world-model.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Panoramic Affordance Prediction
📝 Summary:
Affordance prediction serves as a critical bridge between perception and action in embodied AI. However, existing research is confined to pinhole camera models, which suffer from narrow Fields of View...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15558
• PDF: https://arxiv.org/pdf/2603.15558
• Project Page: https://zixinzhang02.github.io/Panoramic-Affordance-Prediction/
• Github: https://zixinzhang02.github.io/Panoramic-Affordance-Prediction/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos
📝 Summary:
Multimodal Large Language Models (MLLMs) have shown strong performance in visual and audio understanding when evaluated in isolation. However, their ability to jointly reason over omni-modal (visual, ...
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14145
• PDF: https://arxiv.org/pdf/2603.14145
• Project Page: https://huggingface.co/datasets/nvidia/MMOU
✨ Datasets citing this paper:
• https://huggingface.co/datasets/nvidia/MMOU
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models
📝 Summary:
Federal Open Market Committee (FOMC) statements are a major source of monetary-policy information, and even subtle changes in their wording can move global financial markets. A central task is therefo...
🔹 Publication Date: Published on Mar 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14313
• PDF: https://arxiv.org/pdf/2603.14313
• Project Page: https://yixuantt.github.io/DeltaConsistent/
• Github: https://github.com/yixuantt/DeltaConsistentScoring
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
📝 Summary:
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial g...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15594
• PDF: https://github.com/rui-ye/OpenSeeker/blob/main/assets/OpenSeeker.pdf
• Github: https://github.com/rui-ye/OpenSeeker
🔹 Models citing this paper:
• https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
📝 Summary:
Existing web-generation benchmarks rely on text prompts or static screenshots as input. However, videos naturally convey richer signals such as interaction flow, transition timing, and motion continui...
🔹 Publication Date: Published on Mar 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13391
• PDF: https://arxiv.org/pdf/2603.13391
• Project Page: https://webvr-benchmark.github.io/
• Github: https://github.com/broalantaps/WebVR
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Attention Residuals
📝 Summary:
Residual connections with PreNorm are standard in modern LLMs, yet they accumulate all layer outputs with fixed unit weights. This uniform aggregation causes uncontrolled hidden-state growth with dept...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15031
• PDF: https://arxiv.org/pdf/2603.15031
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
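The failure mode described above, every layer's output added to the residual stream with a fixed unit weight, can be sketched in a few lines. The per-layer damping factor `alpha` below is an illustrative remedy for the norm growth, not necessarily the paper's mechanism:

```python
import numpy as np

def prenorm_block(h, f, alpha=1.0):
    """PreNorm residual update: h <- h + alpha * f(LayerNorm(h)).
    Standard Transformers fix alpha = 1 for every layer; a smaller
    (or learnable) alpha damps accumulation in the residual stream."""
    ln = (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-6)
    return h + alpha * f(ln)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))          # toy hidden states (tokens x dim)
W = rng.normal(size=(8, 8))
f = lambda x: x @ W                  # stand-in for an attention/MLP sublayer

deep_unit, deep_scaled = h.copy(), h.copy()
for _ in range(24):
    deep_unit = prenorm_block(deep_unit, f, alpha=1.0)    # fixed unit weights
    deep_scaled = prenorm_block(deep_scaled, f, alpha=0.1)  # damped weights

# unit-weight aggregation inflates the hidden-state norm with depth
assert np.linalg.norm(deep_scaled) < np.linalg.norm(deep_unit)
```

The toy run shows the phenomenon the summary names: with `alpha = 1` the residual stream's norm grows unchecked as layers stack, while damping the contribution keeps it bounded.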
✨HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions
📝 Summary:
HSImul3R presents a unified framework for 3D reconstruction of human-scene interactions that bridges the perception-simulation gap through physics-grounded bidirectional optimization and reinforcement...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15612
• PDF: https://arxiv.org/pdf/2603.15612
• Project Page: https://yukangcao.github.io/HSImul3R/
• Github: https://yukangcao.github.io/HSImul3R/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
📝 Summary:
Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15611
• PDF: https://arxiv.org/pdf/2603.15611
• Project Page: https://zju-real.github.io/Code-A1/
• Github: https://github.com/ZJU-REAL/Code-A1
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨EvoClaw: Evaluating AI Agents on Continuous Software Evolution
📝 Summary:
With AI agents increasingly deployed as long-running systems, it becomes essential to autonomously construct and continuously evolve customized software to enable interaction within dynamic environmen...
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13428
• PDF: https://arxiv.org/pdf/2603.13428
• Project Page: https://evo-claw.com/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
📝 Summary:
Knowledge-aware concept customization binds textual knowledge to visual concepts through a two-stage framework that learns visual anchors and updates textual knowledge for high-fidelity generation, su...
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12743
• PDF: https://arxiv.org/pdf/2603.12743
• Project Page: https://chenyangzhu1.github.io/MoKus/
• Github: https://github.com/HKUST-LongGroup/MoKus
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
📝 Summary:
TERMINATOR is an early-exit method for large reasoning models to prevent overthinking during Chain-of-Thought reasoning. It learns optimal exit points by predicting the first arrival of the final answer. This reduces reasoning length by 14%-55% without performance loss.
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12529
• PDF: https://arxiv.org/pdf/2603.12529
• Project Page: https://terminator-llm.github.io/
• Github: https://terminator-llm.github.io
🔹 Models citing this paper:
• https://huggingface.co/acnagle/Terminator-Qwen3-8B
• https://huggingface.co/acnagle/Terminator-Qwen3-14B
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
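The early-exit idea can be sketched as a decode loop that stops once the running answer stabilizes. The `probe` below is a hypothetical stand-in for TERMINATOR's learned exit predictor, which detects the "first arrival" of the final answer:

```python
def early_exit_decode(steps, answer_of, patience=2):
    """Consume reasoning steps one at a time and stop once the
    extracted answer has been stable for `patience` consecutive
    steps (a simple proxy for a learned exit-point predictor)."""
    history, last, stable = [], None, 0
    for step in steps:                    # stand-in for token generation
        history.append(step)
        ans = answer_of(history)          # probe the current answer
        stable = stable + 1 if ans == last else 1
        last = ans
        if ans is not None and stable >= patience:
            break                         # answer has arrived; exit early
    return last, len(history)

# toy trace: the running answer settles on 42 early,
# so decoding stops before all 6 steps are consumed
trace = ["think", "answer=42", "answer=42", "recheck", "answer=42", "done"]
probe = lambda h: next((s.split("=")[1] for s in reversed(h) if "answer=" in s), None)
ans, used = early_exit_decode(trace, probe)
assert ans == "42" and used < len(trace)
```

In the real method the exit decision is learned rather than rule-based, but the loop structure, and the saved reasoning length, is the same.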
✨Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty
📝 Summary:
LLM reasoning involves procedural information and epistemic verbalization, which is externalized uncertainty. This verbalization drives continued information acquisition and is crucial for strong reasoning performance.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15500
• PDF: https://arxiv.org/pdf/2603.15500
• Github: https://github.com/beanie00/strategic-information-allocation-llm-reasoning
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #Reasoning #AI #MachineLearning #Uncertainty
✨Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
📝 Summary:
This study unifies Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for post-training Large Language Models. It reviews both techniques, their interplay, and emerging hybrid approaches, identifies trends from recent studies, and clarifies when each method is most effective.
🔹 Publication Date: Published on Mar 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13985
• PDF: https://arxiv.org/pdf/2603.13985
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #SupervisedFineTuning #ReinforcementLearning #AI #MachineLearning
✨SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation
📝 Summary:
SNCE is a novel training objective for large-codebook discrete image generators. It supervises models with a soft categorical distribution over neighboring tokens, based on embedding proximity, instead of hard one-hot targets. This approach significantly improves convergence speed and overall gen...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15150
• PDF: https://arxiv.org/pdf/2603.15150
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ImageGeneration #DeepLearning #ComputerVision #GeometryAware #AIResearch
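The core idea, replacing a hard one-hot target with a soft distribution weighted by embedding proximity in the codebook, can be sketched directly. This is an illustrative reconstruction from the summary, not the authors' exact objective:

```python
import numpy as np

def soft_neighbor_targets(codebook, target_idx, temperature=0.1):
    """Soft categorical target over codebook entries: tokens whose
    embeddings sit near the ground-truth token receive partial credit,
    instead of a hard one-hot label."""
    d = np.linalg.norm(codebook - codebook[target_idx], axis=1)  # (V,)
    logits = -d / temperature            # closer embedding -> higher weight
    logits -= logits.max()               # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def soft_cross_entropy(log_probs, soft_targets):
    """Cross-entropy against the soft distribution."""
    return -np.sum(soft_targets * log_probs)

# toy codebook of 8 tokens with 2-D embeddings
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 2))
t = soft_neighbor_targets(codebook, target_idx=3)
# the ground-truth token keeps the largest weight; neighbors share the rest
assert t.argmax() == 3 and abs(t.sum() - 1.0) < 1e-9
```

Because near-miss tokens are no longer penalized as hard as distant ones, the gradient signal respects the codebook's geometry, which is the intuition behind the reported faster convergence.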
✨Mixture-of-Depths Attention
📝 Summary:
Mixture-of-Depths Attention (MoDA) addresses signal degradation in deep LLMs by allowing attention heads to access KV pairs from the current and preceding layers. MoDA improves perplexity by 0.2 and downstream task performance by 2.11% with low overhead, making it a promising primitive for depth scaling.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15619
• PDF: https://arxiv.org/pdf/2603.15619
• Project Page: https://github.com/hustvl/MoDA
• Github: https://github.com/hustvl/MoDA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AttentionMechanisms #DeepLearning #AIResearch #NLP
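The mechanism, attending over keys and values pooled from the current and all preceding layers, can be sketched as a single-head attention over concatenated per-layer KV caches. A minimal sketch of that idea (not the paper's full architecture, which adds routing and heads):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_layer_attention(q, kv_per_layer):
    """Attend over key/value pairs gathered from every layer up to and
    including the current one, instead of the current layer alone."""
    K = np.concatenate([k for k, _ in kv_per_layer], axis=0)  # (L*T, d)
    V = np.concatenate([v for _, v in kv_per_layer], axis=0)  # (L*T, d)
    scores = q @ K.T / np.sqrt(q.shape[-1])                   # (T, L*T)
    return softmax(scores) @ V                                # (T, d)

rng = np.random.default_rng(0)
T, d = 4, 8
# KV caches from the current layer and two preceding layers
layers = [(rng.normal(size=(T, d)), rng.normal(size=(T, d))) for _ in range(3)]
q = rng.normal(size=(T, d))
out = cross_layer_attention(q, layers)
assert out.shape == (T, d)
```

Giving each query a direct path to earlier layers' representations is what counters the signal degradation with depth that the summary describes.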
✨Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods
📝 Summary:
STALL is a training-free, model-agnostic detector for generated videos. It jointly models spatial and temporal evidence from real-data statistics within a probabilistic framework. STALL consistently outperforms prior image and video-based baselines, improving reliable detection.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15026
• PDF: https://arxiv.org/pdf/2603.15026
• Project Page: https://omerbenhayun.github.io/stall-video/
• Github: https://github.com/OmerBenHayun/stall-video
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Deepfakes #VideoDetection #ComputerVision #AI #DigitalForensics
✨GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering
📝 Summary:
GlyphPrinter improves visual text rendering by targeting glyph accuracy. It introduces Region-Grouped DPO (R-GDPO), which uses region-level preferences from the GlyphCorrector dataset to significantly enhance precision, outperforming existing methods.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15616
• PDF: https://arxiv.org/pdf/2603.15616
• Project Page: https://henghuiding.com/GlyphPrinter/
• Github: https://github.com/FudanCVL/GlyphPrinter
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GlyphRendering #DeepLearning #ComputerVision #AIResearch #TextRendering
✨Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
📝 Summary:
A benchmark and metric suite for poster generation evaluates visual quality, coherence, and content accuracy, leading to a multi-agent pipeline that outperforms existing models with reduced computatio...
🔹 Publication Date: Published on May 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.21497
• PDF: https://arxiv.org/pdf/2505.21497
• Project Page: https://paper2poster.github.io/
• Github: https://paper2poster.github.io/
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Paper2Poster/Paper2Poster
✨ Spaces citing this paper:
• https://huggingface.co/spaces/KevinQHLin/Paper2Poster
• https://huggingface.co/spaces/camel-ai/Paper2Poster
• https://huggingface.co/spaces/wangrongsheng/Paper2Poster
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Learning Latent Proxies for Controllable Single-Image Relighting
📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outper...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
✨Efficient Document Parsing via Parallel Token Prediction
📝 Summary:
PTP is a novel method that accelerates document parsing by overcoming slow autoregressive decoding in VLMs. It enables parallel token generation using learnable tokens, boosting speed by 1.6x-2.2x while reducing hallucinations and generalizing strongly.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15206
• PDF: https://arxiv.org/pdf/2603.15206
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DocumentParsing #VLMs #ParallelProcessing #AIEfficiency #NLP
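The gist, appending k learnable query tokens and reading one output token off each in a single forward pass instead of k sequential decode steps, can be sketched as follows. All shapes and the linear readout are illustrative assumptions, not the paper's model:

```python
import numpy as np

def parallel_decode(hidden, query_tokens, W_out):
    """Predict k tokens in one pass: each learnable query embedding,
    conditioned on the encoded context, yields one output token.
    (A real model would run attention over [context; queries]; a
    mean-pooled context plus a linear readout stands in for that.)"""
    feats = query_tokens + hidden.mean(axis=0)      # (k, d) context-conditioned
    logits = feats @ W_out                          # (k, V) one head per query
    return logits.argmax(axis=1)                    # k tokens at once

rng = np.random.default_rng(0)
d, V, k = 8, 16, 4
hidden = rng.normal(size=(5, d))     # encoded document prefix
queries = rng.normal(size=(k, d))    # learnable parallel-decode tokens
W_out = rng.normal(size=(d, V))
tokens = parallel_decode(hidden, queries, W_out)
assert tokens.shape == (k,)
```

One forward pass emitting k tokens is where the 1.6x-2.2x speedup over token-by-token autoregressive decoding comes from.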