AI & ML Papers

The AI community building the future. Hugging Face has 458 repositories available. Follow their code on GitHub.

🔥 ReDesign: Recovering Editable Design Structures from Images via Agentic Decomposition

💡 ReDesign recovers editable design structures from images, a common and costly bottleneck in modern design workflows. The task is hard because several attributes must be recovered together: typography, vector geometry, colors, grouping, and layer ordering. ReDesign uses an agentic framework that grows an editable layer hierarchy by selecting and composing specialized tools across modalities. Because tool outputs are imperfect, the framework verifies each expansion step and returns local accept, prune, or retry feedback, which prevents error accumulation and avoids large-scale reruns.
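The per-step accept, prune, or retry feedback can be pictured with a small loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `Layer`, `expand`, `toy_tool`, and `toy_verify` are hypothetical names, and the "tools" are stand-in functions that return strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Layer:
    """One node of a (hypothetical) editable layer hierarchy."""
    name: str
    children: List["Layer"] = field(default_factory=list)


def expand(parent: Layer,
           proposals: List[str],
           run_tool: Callable[[str, int], str],
           verify: Callable[[str], str],
           max_retries: int = 2) -> Layer:
    """Grow `parent` by one level, checking every tool output locally."""
    for proposal in proposals:
        for attempt in range(max_retries + 1):
            candidate = run_tool(proposal, attempt)
            verdict = verify(candidate)
            if verdict == "accept":
                parent.children.append(Layer(candidate))
                break
            if verdict == "prune":
                break  # drop this branch only; siblings are unaffected
            # verdict == "retry": rerun just this tool call, not the whole pipeline
    return parent


def toy_tool(proposal: str, attempt: int) -> str:
    # Stand-in for a specialized tool (text recognition, vectorization, ...).
    return f"{proposal}@try{attempt}"


def toy_verify(candidate: str) -> str:
    # Stand-in verifier: rejects noise, asks for one retry on the first shape output.
    if candidate.startswith("noise"):
        return "prune"
    if candidate == "shape@try0":
        return "retry"
    return "accept"


root = expand(Layer("root"), ["text", "noise", "shape"], toy_tool, toy_verify)
print([child.name for child in root.children])
```

Because the verdict is local to one expansion step, a bad output costs at most a few retries of a single tool call.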

The authors evaluate editability at scale with the Figma Edit Replay Benchmark, which consists of 909 raw Figma files and 14,796 controlled edit instructions that replay edits on reconstructed outputs. ReDesign achieves strong visual fidelity while delivering the highest editability across layout, color, and text edits, outperforming layered decomposition baselines and serial tool-use pipelines. The paper's contributions are the ReDesign framework and the Figma Edit Replay Benchmark.

📅 Published on Jul 28

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.25565
• PDF: https://arxiv.org/pdf/2607.25565
• Project Page: https://jintae-00.github.io/ReDesign/

📊 Datasets citing this paper:
• https://huggingface.co/datasets/Jintae-Park/ReDesign-Figma909

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#ComputerVision #GraphicDesignAutomation #ImageProcessing #VectorGraphicsRecovery #DesignStructureExtraction

🔥 CodeNib: A Multi-View Data System for Serving Repository Context to Coding Agents

💡 CodeNib is a multi-view data system that serves repository context to coding agents. Coding agents repeatedly search, navigate, and retain context from evolving repositories, but disconnected indexes, language servers, and task-local histories force repeated discovery and obscure life-cycle costs. CodeNib builds reusable lexical, dense, and structural views per repository commit, maps outputs to repository-relative source ranges, maintains selected views across edits, and serves ranked search, symbol navigation, and bounded context through one runtime.
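To make "a view built per commit whose outputs map to repository-relative source ranges" concrete, here is a toy lexical view. It is only an illustration under assumptions: `SourceRange`, `LexicalView`, `view_for`, and the commit id `abc123` are hypothetical and are not CodeNib's API; the dense and structural views are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SourceRange:
    path: str        # repository-relative path
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


class LexicalView:
    """Toy lexical index for one commit: counts term occurrences per line."""

    def __init__(self, commit: str, files: Dict[str, str]):
        self.commit = commit
        self.lines = {path: text.splitlines() for path, text in files.items()}

    def search(self, term: str) -> List[SourceRange]:
        hits = []
        for path in sorted(self.lines):
            for number, line in enumerate(self.lines[path], start=1):
                count = line.count(term)
                if count:
                    hits.append((count, SourceRange(path, number, number)))
        hits.sort(key=lambda hit: -hit[0])  # rank by match count; stable for ties
        return [source_range for _, source_range in hits]


_views: Dict[str, LexicalView] = {}


def view_for(commit: str, files: Dict[str, str]) -> LexicalView:
    """Build the view once per commit and reuse it on later requests."""
    if commit not in _views:
        _views[commit] = LexicalView(commit, files)
    return _views[commit]


files = {
    "src/a.py": "def load():\n    return load_cfg(load_path)\n",
    "src/b.py": "x = 1\n",
}
view = view_for("abc123", files)
results = view.search("load")
print(results)
```

Keying the cache by commit is what lets repeated agent queries skip rediscovery; a real system would also need to update or invalidate views as edits land.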

The system is evaluated across 100 snapshots, mapping quality-cost frontiers across the repository-context life cycle. When outputs match an independent rebuild, graph and vector updates are 8.7 times and 25.4 times faster at the median.

On a static-navigation subset matching normalized live-server locations, the median per-request live/static latency ratio is 4.7. Selected context policies preserve localization with 50-87 percent fewer trajectory tokens than paired grep/read. Overall, the results support multi-view repository-context serving with explicit, operation-specific validity boundaries.

📅 Published on Jul 28

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.25431
• PDF: https://arxiv.org/pdf/2607.25431
• Project Page: https://codenib.ai

📊 Datasets citing this paper:
• https://huggingface.co/datasets/sysevol-ai/codenib-synthesis

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#CodeAnalysisTools #MultiViewDataSystems #CodingAgents #RepositoryContext #SoftwareDevelopmentTools

🔥 TurboVLA: Real-Time Vision-Language-Action Model at 32 Hz on an RTX 4090 with <1 GB VRAM

💡 TurboVLA is a vision-language-action model that runs in real time at 32 Hz on an RTX 4090 with less than 1 GB of VRAM. Conventional vision-language-action models use a large language model as the central interface between perception and action, which incurs substantial computational and memory overhead. TurboVLA instead reformulates this pathway as a direct vision-language-action mapping: visual observations and language instructions are encoded independently and then exchanged through lightweight bidirectional vision-language interaction. This simplified design builds task-conditioned representations directly from visual and linguistic features, significantly reducing computational and memory costs.

TurboVLA achieves 97.7 percent average success with only 0.2 billion parameters, 31.2 ms inference latency, and 0.9 GB of inference VRAM on a consumer-grade RTX 4090, matching or outperforming substantially larger vision-language-action policies. The results establish TurboVLA as a simple and effective alternative to the prevailing language-centric vision-language-action paradigm for efficient robotic manipulation. The code for TurboVLA is available online.
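The headline numbers are consistent with each other, which a few lines of arithmetic can confirm. The latency and parameter count come from the summary above; the 2 bytes per parameter (16-bit weights) is an assumption made here for illustration and is not stated in the summary.

```python
LATENCY_MS = 31.2      # reported inference latency
PARAMS = 0.2e9         # reported parameter count
BYTES_PER_PARAM = 2    # assumption: 16-bit weights (not stated in the summary)

# One inference every 31.2 ms corresponds to roughly 32 control steps per second.
control_rate_hz = 1000.0 / LATENCY_MS

# Weight storage alone under the 16-bit assumption, in decimal gigabytes.
weight_gb = PARAMS * BYTES_PER_PARAM / 1e9

print(f"control rate: {control_rate_hz:.2f} Hz")
print(f"weights only: {weight_gb:.1f} GB")
```

Under that assumption the weights would take about 0.4 GB, leaving the rest of the reported 0.9 GB for activations and runtime overhead.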

📅 Published on Jul 29

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.27205
• PDF: https://arxiv.org/pdf/2607.27205
• Project Page: https://h-embodvis.github.io/TurboVLA/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#VisionLanguageAction #RealTimeAI #EfficientDeepLearning #VisionLanguageInteraction #LowMemoryComputing

🔥 DistillAlign: Coordinating Mode Covering and Mode Seeking in Autoregressive Video Distillation

💡 Existing autoregressive video distillation methods typically decouple the initialization and distribution matching stages, which leads to suboptimal results. The authors argue that a good initialization should match the mode coverage of the target distribution rather than merely pursue high quality. To analyze this, they introduce a distributional evaluation protocol that measures precision and coverage between student and teacher distributions in a shared latent space.

The authors find that some initializations reach high precision but low coverage, leading to suboptimal refinement, while mode-covering ones preserve broader support. Furthermore, even when the target distributions are aligned, the reverse-KL objective of distribution matching can still drive the student towards high-probability teacher regions in late training, reducing coverage and diversity.
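The mode-seeking pull of reverse KL can be seen on a three-bin toy distribution. This is a generic illustration of forward versus reverse KL, not the paper's evaluation protocol; the teacher and student probabilities below are made up for the example.

```python
import math
from typing import List


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical teacher with two modes separated by a low-probability valley.
teacher = [0.495, 0.01, 0.495]
# Student A collapses onto one mode (high precision, low coverage).
student_seeking = [0.98, 0.01, 0.01]
# Student B spreads mass over everything (covers both modes, wastes mass in the valley).
student_covering = [0.34, 0.32, 0.34]

# Reverse KL, KL(student || teacher), is the direction the summary attributes
# to distribution matching.
reverse_seeking = kl(student_seeking, teacher)
reverse_covering = kl(student_covering, teacher)

# Forward KL, KL(teacher || student), penalizes missing teacher modes.
forward_seeking = kl(teacher, student_seeking)
forward_covering = kl(teacher, student_covering)

print(f"reverse KL: seeking={reverse_seeking:.3f}, covering={reverse_covering:.3f}")
print(f"forward KL: seeking={forward_seeking:.3f}, covering={forward_covering:.3f}")
```

Reverse KL scores the collapsed student better, while forward KL scores the covering student better, which is the tension a combined objective has to balance.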

To address this, the authors propose joint distillation, which combines the mode-seeking objective of distribution matching with a consistency distillation-based mode-covering constraint. The experiments show that their method improves generation quality, coverage, and diversity. Notably, even with a smaller teacher model, their method outperforms baselines refined with a larger teacher model, underscoring the importance of distributional alignment in autoregressive video distillation.

The main contributions of this paper are the introduction of a distributional evaluation protocol and the proposal of joint distillation, which coordinates mode covering and mode seeking in autoregressive video distillation. The results demonstrate the effectiveness of the proposed method in improving the quality and diversity of generated videos.

📅 Published on Jul 29

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.26811
• PDF: https://arxiv.org/pdf/2607.26811
• Project Page: https://lijiaxing0213.github.io/DistillAlign/

📊 Datasets citing this paper:
• https://huggingface.co/datasets/LiJiaxing/DistillAlign_1p3b_25K

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#AutoregressiveVideoDistillation #ModeCovering #DistributionMatching #VideoDistillationMethods #LatentSpaceAnalysis

🔥 Frontis-MA1: Training an AI4AI Model towards Recursive Self-Improvement in Machine Learning Engineering

💡 The paper introduces Frontis-MA1, a model trained towards recursive self-improvement in machine learning engineering. Recursive self-improvement requires AI systems to improve the process of building AI, and machine learning engineering offers a concrete test bed for studying this capability. The authors propose OpenMLE, an open full-stack system for recursive self-improvement research in machine learning engineering, which includes a verifiable task environment with execution feedback, operator learning, and long-horizon search.

Frontis-MA1 is trained as a meta-evolution agent for machine learning engineering, aligning post-training and inference around four atomic program-evolution operators: Draft, Improve, Debug, and Crossover. These operators are trained via execution-grounded self-supervised training and reinforcement learning on data deduplicated against all evaluation benchmarks, then composed into long-horizon search, coupling learning and evolution in a single loop.
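How four such operators might compose into a search loop can be shown on a deliberately trivial problem. Everything here is a hypothetical stand-in: a "program" is a list of integers, "execution feedback" is a score for how close its sum is to a target, and the operators are hand-written rules rather than a trained model.

```python
import random
from typing import List, Optional, Tuple

TARGET = 42  # hypothetical task: make the numbers in a "program" sum to 42


def execute(program: List[int]) -> Optional[int]:
    """Execution feedback: None means the program crashed, else a score (higher is better)."""
    if any(x < 0 for x in program):
        return None
    return -abs(TARGET - sum(program))


def draft(rng: random.Random) -> List[int]:
    return [rng.randint(-2, 10) for _ in range(4)]


def debug(program: List[int]) -> List[int]:
    return [abs(x) for x in program]  # repair the only failure mode of this toy task


def improve(program: List[int], rng: random.Random) -> List[int]:
    child = list(program)
    step = 1 if sum(child) < TARGET else -1
    child[rng.randrange(len(child))] += step
    return child


def crossover(a: List[int], b: List[int]) -> List[int]:
    half = len(a) // 2
    return a[:half] + b[half:]


def ranked(population: List[List[int]]) -> List[List[int]]:
    """Debug crashing programs, then sort by score, best first."""
    repaired = [p if execute(p) is not None else debug(p) for p in population]
    return sorted(repaired, key=execute, reverse=True)


def search(steps: int, seed: int = 0) -> Tuple[int, List[int], int]:
    rng = random.Random(seed)
    population = ranked([draft(rng) for _ in range(4)])
    initial_score = execute(population[0])
    for _ in range(steps):
        best, second = population[0], population[1]
        children = [improve(best, rng), crossover(best, second), draft(rng)]
        population = ranked(population[:2] + children)  # keep the two elites
    return initial_score, population[0], execute(population[0])


initial_score, best_program, final_score = search(steps=60)
print(initial_score, best_program, final_score)
```

Keeping the two best programs each round means the best score never gets worse, which is the minimal property a long-horizon search of this kind needs.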

The results show that Frontis-MA1 improves the Medal Average from 39.39% to 60.61% over its base model with OpenMLE-Evo, and reaches 71.21% with OpenMLE-Evo-Max, exceeding GPT-5.5+Codex and approaching GPT-5.6 Soland and the 2.8T Kimi K3. On the held-out Nature Bench Lite, both components transfer: with the framework fixed, swapping in the trained model raises Match-SOTA from 50% to 70%, and with the model fixed, swapping in OpenMLE-Evo raises it from 20% to 50%.

The authors release the model weights and the full OpenMLE stack to enable reproducible research on executable AI4AI towards recursive self-improvement. The paper demonstrates the effectiveness of Frontis-MA1 and OpenMLE in achieving recursive self-improvement in machine learning engineering, and provides a foundation for further research in this area.

📅 Published on Jul 30

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.28568
• PDF: https://arxiv.org/pdf/2607.28568
• Project Page: https://frontisai.github.io/OpenRSI/

🤖 Models citing this paper:
• https://huggingface.co/FrontisAI/Frontis-MA1-35B-GGUF
• https://huggingface.co/FrontisAI/Frontis-MA1-30B
• https://huggingface.co/FrontisAI/Frontis-MA1-30B-GGUF

📊 Datasets citing this paper:
• https://huggingface.co/datasets/FrontisAI/OpenMLE-Tasks
• https://huggingface.co/datasets/FrontisAI/OpenMLE-SFT-Traces

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#MachineLearningEngineering #ArtificialIntelligenceForAI #RecursiveSelfImprovement #MetaLearningAlgorithms #AIModelTraining

🔥 VideoCoCo: Code-as-CoT for Physically-Consistent Video Generation via an Agentic Dual-Engine System

💡 The paper introduces VideoCoCo, a novel framework for physically consistent video generation. The problem addressed is that existing text-to-video models struggle to generate videos with physically consistent dynamics, as they must infer the temporal evolution of a scene implicitly from a highly compressed text prompt. Current chain-of-thought approaches introduce intermediate plans or visual states, but these representations are typically non-executable or temporally sparse, limiting their ability to instantiate and control the complete spatiotemporal process.

To address this limitation, VideoCoCo uses an agentic dual-engine system, where executable Blender code serves as a process-level chain of thought. Given a text prompt, a coding agent synthesizes a Blender program that explicitly specifies the scene and its temporal evolution. The executable simulation engine runs the program to produce a deterministic spatiotemporal draft, which is then transformed into a photorealistic video by a generative video engine through draft-conditioned editing.

The authors also construct VideoCoCo-3K, a curated dataset of draft-instruction-target triplets to adapt the video editor to simulated drafts. The results demonstrate that VideoCoCo improves the OmniWeaving baseline from 0.475 to 0.558 on PhyGenBench and from 52.18 to 77.88 on VBench-2.0, achieving the best average score on both benchmarks. The findings show that executable code provides an effective, controllable, and inspectable intermediate representation for physically consistent video generation.

📅 Published on Jul 29

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.27380
• PDF: https://arxiv.org/pdf/2607.27380

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#VideoGeneration #PhysicallyConsistentModeling #CodeAsCoT #AgenticDualEngine #TextToVideoSynthesis

🔥 INTACT: Isomorphic Intent-to-Action Learning for Search-Free World Models

💡 Existing forward latent world models predict how actions change a scene, but recovering the actions for a desired change is only possible through expensive test-time search. INTACT removes that search by turning action-labeled, reward-free trajectories into a deployable intent-to-action interface. Each transition supplies physical intent, while a future goal supplies deployment intent. The architecture is isomorphic between local and goal motion-intent backbone-input graphs through an identical four-slot grammar and shared parameters.

The method provides intact transfer from RGB evidence to action-effective latent intent coordinates and from intent families to their corresponding action-law families. Asymmetric endpoint gradients ground physical successors and fix future goals as anchors, joining representation learning and control without pointwise latent matching or globally linear dynamics. The resulting coordinates support a robust distributional action law, where its conditional means serve directly as a search-free policy, while sampling remains available for diversity or optional verification.

The results show that one-epoch, zero-search models reach high success rates on four official tasks, achieving 85.78, 100.00, 97.67, and 97.89 percent success. Optional local cross-entropy method centered on the Direct plan reaches 96.86 percent macro success using 384 instead of 9000 candidate sequences, reducing sampling by 23.44 times while improving pure cross-entropy method by 16.00 points. One shared four-task encoder reaches 89.39 percent E5 Direct macro and improves every task over jointly trained models, while predicted expert action-family kNN tracks Direct success at 0.954. Direct inference takes 2.9-5.5 milliseconds. Overall, INTACT provides a robust and efficient approach to learning world models that can predict and recover actions for desired changes without expensive test-time search.
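Two of the reported figures follow from simple arithmetic on the others, which is worth checking when reading the results. The inputs below are taken from the summary above; the implied pure cross-entropy-method score is derived here and is not a number the summary states directly.

```python
PURE_CEM_CANDIDATES = 9000   # candidate sequences for the pure cross-entropy method
LOCAL_CEM_CANDIDATES = 384   # candidate sequences for the local variant
LOCAL_CEM_SUCCESS = 96.86    # macro success (percent) of the local variant
GAIN_OVER_PURE_CEM = 16.00   # reported improvement in percentage points

# 9000 / 384 = 23.4375, which the summary reports as 23.44.
sampling_reduction = PURE_CEM_CANDIDATES / LOCAL_CEM_CANDIDATES

# Derived, not quoted: the pure-CEM macro success implied by the reported gain.
implied_pure_cem_success = LOCAL_CEM_SUCCESS - GAIN_OVER_PURE_CEM

print(f"sampling reduction: {sampling_reduction:.4f}x")
print(f"implied pure-CEM macro success: {implied_pure_cem_success:.2f}%")
```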

📅 Published on Jul 28

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.26056
• PDF: https://arxiv.org/pdf/2607.26056
• Project Page: https://zju3dv.github.io/INTACT-JEPA/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#IntentToActionLearning #WorldModels #IsomorphicLearning #SearchFreePlanning #LatentWorldModels

🔥 PhiZero: A World Model Built Around Physical Language

💡 The paper introduces PhiZero, a physical world model that uses a compact discrete representation of world-state transitions, referred to as physical language. Existing physical world models typically predict future videos directly in pixel space, which leaves the underlying world dynamics implicit within high-dimensional visual predictors. In contrast, PhiZero is motivated by humans' ability to abstract predictive structure from visual experience and organize it in natural language for explicit reasoning.

The method used in PhiZero involves learning physical language from in-the-wild videos through self-supervision and using it to explicitly reason about how the physical world evolves. PhiZero adopts a reason-then-render paradigm, where it first infers future world evolution as a physical-language sequence and then renders the inferred transitions into videos.

Extensive experiments across generation and understanding benchmarks validate PhiZero's ability to model physically coherent world evolution. The paper also shows its potential for realistic and interactive world modeling, fine-grained action-conditioned simulation, and zero-shot motion transfer. Overall, PhiZero bases physical world modeling on a compact, discrete representation of world-state transitions rather than on direct pixel-space prediction.

📅 Published on Jul 30

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2607.28624
• PDF: https://arxiv.org/pdf/2607.28624
• Project Page: https://phi-zero.github.io/

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#PhysicalLanguageModeling #WorldModelArchitecture #DiscreteRepresentationLearning #SelfSupervisedLearning #PhysicalWorldReasoning

🔥 On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

💡 The paper addresses the limited generalization of Supervised Fine-Tuning (SFT) for Large Language Models (LLMs) compared to reinforcement learning. The authors analyze the standard SFT gradients and find that they implicitly encode a problematic reward structure that restricts the model's generalization. To overcome this, they propose Dynamic Fine-Tuning (DFT), a simple method that stabilizes the gradient update for each token by dynamically rescaling the objective function with the probability of that token.

DFT significantly outperforms standard SFT across multiple challenging benchmarks and base models, demonstrating greatly improved generalization. It also shows competitive results in offline reinforcement learning settings, providing a simpler alternative to existing methods. The authors give a theoretical motivation for the approach and support it with experiments. The method is a single-line code change, and the authors state that the code will be made available.
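The effect of rescaling by the token probability is easiest to see on a single token. The sketch below is a plain-Python illustration, not the authors' code: it assumes the probability weight is treated as a constant (no gradient flows through it), and the function names and logits are made up for the example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_grad(logits: List[float], target: int) -> List[float]:
    """Gradient of -log p(target) with respect to the logits: softmax minus one-hot."""
    probs = softmax(logits)
    return [q - (1.0 if i == target else 0.0) for i, q in enumerate(probs)]


def dft_grad(logits: List[float], target: int) -> List[float]:
    """Same gradient scaled by p(target), with the weight held constant (assumption)."""
    weight = softmax(logits)[target]
    return [weight * g for g in sft_grad(logits, target)]


logits = [4.0, 0.0, 0.0]  # the model strongly prefers token 0
unlikely_target = 1       # but the training label is token 1

p_target = softmax(logits)[unlikely_target]
g_sft = sft_grad(logits, unlikely_target)
g_dft = dft_grad(logits, unlikely_target)

print(f"p(target) = {p_target:.4f}")
print(f"SFT gradient on target logit: {g_sft[unlikely_target]:.4f}")
print(f"DFT gradient on target logit: {g_dft[unlikely_target]:.4f}")
```

For a low-probability target, plain SFT produces a gradient of magnitude close to 1 on the target logit, while the rescaled version shrinks it by the factor `p_target`, which is one way to read the "stabilizes gradient updates" claim.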

📅 Published on Aug 7, 2025

🔗 Links:
• GitHub: https://github.com/huggingface
• arXiv: https://arxiv.org/abs/2508.05629
• PDF: https://arxiv.org/pdf/2508.05629

🤖 Models citing this paper:
• https://huggingface.co/Naphula/Cthulhu-70B-v1
• https://huggingface.co/Liang0223/Qwen-2.5-Math-1.5B-DPO

📊 Datasets citing this paper:
• https://huggingface.co/datasets/egotools-dev/a100_20260502

━━━━━━━━━━━━━━━━━━━━━━━━
📢 By: https://t.iss.one/PaperNexus

#ReinforcementLearningForLLMs #SupervisedFineTuningLimitations #RewardRectificationTechniques #DynamicFineTuningMethods #LargeLanguageModelGeneralization
