🔹 Title:
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
🔹 Publication Date: Published on Jun 26
🔹 Abstract:
AI-generated summary: The Mind2Web 2 benchmark evaluates agentic search systems with a suite of realistic, long-horizon tasks, introducing an Agent-as-a-Judge framework to assess accuracy and source attribution.
Agentic search, such as Deep Research systems, where large language models autonomously browse the web, synthesize information, and return comprehensive, citation-backed answers, represents a major shift in how users interact with web-scale information. While promising greater efficiency and cognitive offloading, the growing complexity and open-endedness of agentic search have outpaced existing evaluation benchmarks and methodologies, which largely assume short search horizons and static answers. In this paper, we introduce Mind2Web 2, a benchmark of 130 realistic, high-quality, and long-horizon tasks that require real-time web browsing and extensive information synthesis, constructed with over 1,000 hours of human labor. To address the challenge of evaluating time-varying and complex answers, we propose a novel Agent-as-a-Judge framework. Our method constructs task-specific judge agents based on a tree-structured rubric design to automatically assess both answer correctness and source attribution. We conduct a comprehensive evaluation of nine frontier agentic search systems and human performance, along with a detailed error analysis to draw insights for future development. The best-performing system, OpenAI Deep Research, already achieves 50-70% of human performance while spending half the time, showing great potential. Altogether, Mind2Web 2 provides a rigorous foundation for developing and benchmarking the next generation of agentic search systems.
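🔹 Code sketch (illustrative): The tree-structured rubric can be pictured as an evaluation tree whose leaves score individual criteria and whose internal nodes aggregate child scores. The Python sketch below is a minimal illustration under assumed details: the node names, weights, keyword-based leaf judges, and weighted-average aggregation are placeholders for the paper's task-specific LLM judge agents, not its actual rubric design.
```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Minimal sketch of a tree-structured rubric: leaf nodes score one criterion,
# internal nodes aggregate child scores by weighted average. The criteria and
# the keyword-based "judges" are placeholders for task-specific LLM judges.

@dataclass
class RubricNode:
    name: str
    weight: float = 1.0
    judge: Optional[Callable[[str], float]] = None  # leaf: answer -> score in [0, 1]
    children: List["RubricNode"] = field(default_factory=list)

    def score(self, answer: str) -> float:
        if self.judge is not None:                    # leaf criterion
            return self.judge(answer)
        total = sum(c.weight for c in self.children)  # internal: weighted average
        return sum(c.weight * c.score(answer) for c in self.children) / total

# Hypothetical rubric for a task like "find three papers with code released in June".
rubric = RubricNode("task", children=[
    RubricNode("correctness", weight=0.7, children=[
        RubricNode("mentions_three_items", judge=lambda a: float(a.count("http") >= 3)),
        RubricNode("dates_in_june", judge=lambda a: float("June" in a)),
    ]),
    RubricNode("attribution", weight=0.3,
               judge=lambda a: float("arxiv.org" in a)),  # sources cited at all?
])

answer = "Paper A (June, https://arxiv.org/abs/1), Paper B (June, http://x), Paper C (June, http://y)"
print(f"rubric score: {rubric.score(answer):.2f}")
```
In the paper's setting, each leaf would be judged by an agent that inspects the answer and its cited sources rather than by string checks.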
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.21506
• PDF: https://arxiv.org/pdf/2506.21506
• Project Page: https://osu-nlp-group.github.io/Mind2Web-2
• Github: https://github.com/OSU-NLP-Group/Mind2Web-2/
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/osunlp/Mind2Web-2
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Intelligent Operation and Maintenance and Prediction Model Optimization for Improving Wind Power Generation Efficiency
🔹 Publication Date: Published on Jun 19
🔹 Abstract:
This study explores the effectiveness of predictive maintenance models and the optimization of intelligent Operation and Maintenance (O&M) systems in improving wind power generation efficiency. Through qualitative research, structured interviews were conducted with five wind farm engineers and maintenance managers, each with extensive experience in turbine operations. Using thematic analysis, the study revealed that while predictive maintenance models effectively reduce downtime by identifying major faults, they often struggle with detecting smaller, gradual failures. Key challenges identified include false positives, sensor malfunctions, and difficulties in integrating new models with older turbine systems. Advanced technologies such as digital twins, SCADA systems, and condition monitoring have significantly enhanced turbine maintenance practices. However, these technologies still require improvements, particularly in AI refinement and real-time data integration. The findings emphasize the need for continuous development to fully optimize wind turbine performance and support the broader adoption of renewable energy.
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.16095
• PDF: https://arxiv.org/pdf/2506.16095
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
🔹 Publication Date: Published on Jun 25
🔹 Abstract:
AI-generated summary: HiWave enhances ultra-high-resolution image synthesis using pretrained diffusion models through a two-stage pipeline involving DDIM inversion and wavelet-based detail enhancement, improving visual fidelity and reducing artifacts.
Diffusion models have emerged as the leading approach for image synthesis, demonstrating exceptional photorealism and diversity. However, training diffusion models at high resolutions remains computationally prohibitive, and existing zero-shot generation techniques for synthesizing images beyond training resolutions often produce artifacts, including object duplication and spatial incoherence. In this paper, we introduce HiWave, a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis using pretrained diffusion models. Our method employs a two-stage pipeline: generating a base image from the pretrained model followed by a patch-wise DDIM inversion step and a novel wavelet-based detail enhancer module. Specifically, we first utilize inversion methods to derive initial noise vectors that preserve global coherence from the base image. Subsequently, during sampling, our wavelet-domain detail enhancer retains low-frequency components from the base image to ensure structural consistency, while selectively guiding high-frequency components to enrich fine details and textures. Extensive evaluations using Stable Diffusion XL demonstrate that HiWave effectively mitigates common visual artifacts seen in prior methods, achieving superior perceptual quality. A user study confirmed HiWave's performance, where it was preferred over the state-of-the-art alternative in more than 80% of comparisons, highlighting its effectiveness for high-quality, ultra-high-resolution image synthesis without requiring retraining or architectural modifications.
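🔹 Code sketch (illustrative): The wavelet-domain detail enhancement can be illustrated with a single-level 2D discrete wavelet transform: keep the low-frequency band of the base image for structural consistency and take the high-frequency bands from the freshly sampled patch for fine detail. This is a simplified NumPy/PyWavelets sketch, not the paper's implementation; the Haar wavelet, the single decomposition level, and the detail_gain knob are assumptions.
```python
import numpy as np
import pywt

def wavelet_detail_merge(base_patch: np.ndarray,
                         sampled_patch: np.ndarray,
                         detail_gain: float = 1.0) -> np.ndarray:
    """Keep low-frequency structure from the base image and inject
    high-frequency detail from the newly sampled patch (single-level Haar DWT).
    `detail_gain` is an assumed knob, not a parameter from the paper."""
    cA_base, _ = pywt.dwt2(base_patch, "haar")
    _, (cH, cV, cD) = pywt.dwt2(sampled_patch, "haar")
    merged = pywt.idwt2((cA_base, (detail_gain * cH,
                                   detail_gain * cV,
                                   detail_gain * cD)), "haar")
    return merged

# Toy 2D "images": a smooth gradient as the base, a noisy version as the sampled patch.
rng = np.random.default_rng(0)
base = np.linspace(0, 1, 64 * 64).reshape(64, 64)
sampled = base + 0.05 * rng.standard_normal((64, 64))
out = wavelet_detail_merge(base, sampled)
print(out.shape)  # (64, 64)
```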
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.20452
• PDF: https://arxiv.org/pdf/2506.20452
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Resa: Transparent Reasoning Models via SAEs
🔹 Publication Date: Published on Jun 11
🔹 Abstract:
AI-generated summary: SAE-Tuning efficiently elicits strong reasoning in language models by leveraging sparse autoencoders, enabling cost-effective performance gains without extensive retraining.
How cost-effectively can we elicit strong reasoning in language models by leveraging their underlying representations? We answer this question with Resa, a family of 1.5B reasoning models trained via a novel and efficient sparse autoencoder tuning (SAE-Tuning) procedure. This method first trains an SAE to capture reasoning abilities from a source model, and then uses the trained SAE to guide a standard supervised fine-tuning process to elicit such abilities in a target model, all using verified question-answer data without any reasoning traces. Notably, when applied to certain base models before further RL post-training, SAE-Tuning retains >97% of its RL-trained counterpart's reasoning performance while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes. Furthermore, when applied to lightly RL-trained models (e.g., within 1 hour on 2 GPUs), it enables reasoning performance such as 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23 for only around $1 in additional cost. Surprisingly, the reasoning abilities extracted via SAEs are potentially both generalizable and modular. Generality means abilities extracted from one dataset still elevate performance on a larger and overlapping corpus. Modularity means abilities extracted from Qwen or Qwen-Math can be attached to the R1-Distill model at test time, without any retraining, and yield comparable gains. Extensive ablations validate these findings, and all artifacts are fully open-sourced.
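🔹 Code sketch (illustrative): At the core of SAE-Tuning is a sparse autoencoder trained on model hidden states. Below is a minimal PyTorch SAE with a ReLU bottleneck and an L1 sparsity penalty; the hidden size, dictionary size, and penalty weight are illustrative assumptions, and the step where the trained SAE guides supervised fine-tuning is not shown.
```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE over model hidden states: an overcomplete dictionary with a
    ReLU bottleneck; sparsity is encouraged by an L1 penalty on the codes."""
    def __init__(self, d_model: int = 1536, d_dict: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.encoder(h))   # sparse codes
        recon = self.decoder(z)
        return recon, z

sae = SparseAutoencoder()
h = torch.randn(4, 128, 1536)             # (batch, seq, hidden) activations
recon, z = sae(h)
l1_weight = 1e-3                           # assumed sparsity coefficient
loss = nn.functional.mse_loss(recon, h) + l1_weight * z.abs().mean()
loss.backward()
print(float(loss))
```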
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09967
• PDF: https://arxiv.org/pdf/2506.09967
• Project Page: https://shangshangwang.notion.site/resa
• Github: https://github.com/shangshang-wang/Resa
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
WorldVLA: Towards Autoregressive Action World Model
🔹 Publication Date: Published on Jun 26
🔹 Abstract:
AI-generated summary: WorldVLA, an autoregressive action world model integrating vision-language-action (VLA) and world models, enhances performance through mutual understanding and generation, improving action prediction and sequence generation with an attention mask strategy.
We present WorldVLA, an autoregressive action world model that unifies action and image understanding and generation. Our WorldVLA integrates a Vision-Language-Action (VLA) model and a world model in a single framework. The world model predicts future images by leveraging both action and image understanding, with the purpose of learning the underlying physics of the environment to improve action generation. Meanwhile, the action model generates the subsequent actions based on image observations, aiding in visual understanding and in turn helping the visual generation of the world model. We demonstrate that WorldVLA outperforms standalone action and world models, highlighting the mutual enhancement between the world model and the action model. In addition, we find that the performance of the action model deteriorates when generating sequences of actions in an autoregressive manner. This phenomenon can be attributed to the model's limited generalization capability for action prediction, leading to the propagation of errors from earlier actions to subsequent ones. To address this issue, we propose an attention mask strategy that selectively masks prior actions during the generation of the current action, which shows significant performance improvement in the action chunk generation task.
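🔹 Code sketch (illustrative): The proposed attention mask strategy can be pictured as a standard causal mask with one extra rule: action tokens may not attend to earlier action tokens, only to visual and text context. The token layout and boolean-mask convention below are assumptions for illustration, not the paper's exact implementation.
```python
from typing import List
import torch

def action_chunk_mask(token_types: List[str]) -> torch.Tensor:
    """Causal attention mask that additionally hides earlier *action* tokens
    from later action tokens, so each action attends to image/text context
    but not to previously generated actions. Returns True where attention
    is allowed. The token layout is hypothetical."""
    n = len(token_types)
    causal = torch.tril(torch.ones(n, n)).bool()
    is_action = torch.tensor([t == "action" for t in token_types])
    # Block attention between distinct action tokens (off-diagonal action pairs).
    block = is_action[:, None] & is_action[None, :] & ~torch.eye(n, dtype=torch.bool)
    return causal & ~block

tokens = ["image", "image", "text", "action", "action", "action"]
print(action_chunk_mask(tokens).int())
```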
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.21539
• PDF: https://arxiv.org/pdf/2506.21539
• Project Page: https://github.com/alibaba-damo-academy/WorldVLA
• Github: https://github.com/alibaba-damo-academy/WorldVLA
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
🔹 Publication Date: Published on Jun 18
🔹 Abstract:
AI-generated summary: HeurAgenix, a two-stage hyper-heuristic framework using large language models, evolves and selects heuristics dynamically for combinatorial optimization problems, achieving performance on par with specialized solvers.
Heuristic algorithms play a vital role in solving combinatorial optimization (CO) problems, yet traditional designs depend heavily on manual expertise and struggle to generalize across diverse instances. We introduce HeurAgenix, a two-stage hyper-heuristic framework powered by large language models (LLMs) that first evolves heuristics and then selects among them automatically. In the heuristic evolution phase, HeurAgenix leverages an LLM to compare seed heuristic solutions with higher-quality solutions and extract reusable evolution strategies. During problem solving, it dynamically picks the most promising heuristic for each problem state, guided by the LLM's perception ability. For flexibility, this selector can be either a state-of-the-art LLM or a fine-tuned lightweight model with lower inference cost. To mitigate the scarcity of reliable supervision caused by CO complexity, we fine-tune the lightweight heuristic selector with a dual-reward mechanism that jointly exploits signals from selection preferences and state perception, enabling robust selection under noisy annotations. Extensive experiments on canonical benchmarks show that HeurAgenix not only outperforms existing LLM-based hyper-heuristics but also matches or exceeds specialized solvers. Code is available at https://github.com/microsoft/HeurAgenix.
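🔹 Code sketch (illustrative): The selection stage of a hyper-heuristic loop can be sketched as repeatedly asking a selector which heuristic to apply to the current problem state. Below, a toy knapsack instance, two hand-written heuristics, and a trivial rule-based selector stand in for the LLM (or fine-tuned lightweight) selector described in the abstract; all of these specifics are assumptions.
```python
import random

# Toy hyper-heuristic loop: at each step a selector picks which heuristic to
# apply to the current state. The knapsack setup and the rule-based selector
# below are illustrative placeholders for HeurAgenix's components.

random.seed(0)
items = [(random.randint(1, 10), random.randint(1, 10)) for _ in range(20)]  # (value, weight)
state = {"remaining": list(items), "capacity": 30, "value": 0}

def greedy_density(state):
    """Pick the item with the best value/weight ratio that still fits."""
    fits = [i for i in state["remaining"] if i[1] <= state["capacity"]]
    return max(fits, key=lambda i: i[0] / i[1]) if fits else None

def greedy_value(state):
    """Pick the highest-value item that still fits."""
    fits = [i for i in state["remaining"] if i[1] <= state["capacity"]]
    return max(fits, key=lambda i: i[0]) if fits else None

HEURISTICS = {"density": greedy_density, "value": greedy_value}

def select_heuristic(state) -> str:
    # Placeholder for the LLM selector: prefer value-greedy while capacity is
    # plentiful, density-greedy once capacity gets tight.
    return "value" if state["capacity"] > 15 else "density"

while True:
    pick = HEURISTICS[select_heuristic(state)](state)
    if pick is None:
        break
    state["remaining"].remove(pick)
    state["capacity"] -= pick[1]
    state["value"] += pick[0]

print("total value:", state["value"])
```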
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.15196
• PDF: https://arxiv.org/pdf/2506.15196
• Project Page: https://github.com/microsoft/HeurAgenix
• Github: https://github.com/microsoft/HeurAgenix
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
🔹 Publication Date: Published on Jun 24
🔹 Abstract:
AI-generated summary: An automated data-curation pipeline for software engineering improves large language model performance on SWE tasks, achieving state-of-the-art results with and without test-time scaling techniques.
Software engineering (SWE) has recently emerged as a crucial testbed for next-generation LLM agents, demanding inherent capabilities in two critical dimensions: sustained iterative problem-solving (e.g., >50 interaction rounds) and long-context dependency resolution (e.g., >32k tokens). However, the data curation process in SWE remains notoriously time-consuming, as it heavily relies on manual annotation for code file filtering and the setup of dedicated runtime environments to execute and validate unit tests. Consequently, most existing datasets are limited to only a few thousand GitHub-sourced instances. To this end, we propose an incremental, automated data-curation pipeline that systematically scales both the volume and diversity of SWE datasets. Our dataset comprises 10,169 real-world Python task instances from 2,531 distinct GitHub repositories, each accompanied by a task specified in natural language and a dedicated runtime-environment image for automated unit-test validation. We have carefully curated over 8,000 successfully runtime-validated training trajectories from our proposed SWE dataset. When fine-tuning the Skywork-SWE model on these trajectories, we uncover a striking data scaling phenomenon: the trained model's performance for software engineering capabilities in LLMs continues to improve as the data size increases, showing no signs of saturation. Notably, our Skywork-SWE model achieves 38.0% pass@1 accuracy on the SWE-bench Verified benchmark without using verifiers or multiple rollouts, establishing a new state-of-the-art (SOTA) among the Qwen2.5-Coder-32B-based LLMs built on the OpenHands agent framework. Furthermore, with the incorporation of test-time scaling techniques, the performance further improves to 47.0% accuracy, surpassing the previous SOTA results for sub-32B parameter models. We release the Skywork-SWE-32B model checkpoint to accelerate future research.
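🔹 Code sketch (illustrative): The runtime-validation step of such a curation pipeline can be pictured as running each candidate instance's unit tests inside its dedicated environment image and keeping only instances whose tests fail before the gold patch and pass after it. The Docker invocation, instance schema, and shell commands below are assumptions, not the paper's tooling.
```python
import subprocess
from dataclasses import dataclass

@dataclass
class TaskInstance:
    repo: str
    image: str          # dedicated runtime-environment image (assumed field)
    test_cmd: str       # unit-test command expected to flip from FAIL to PASS
    patch_path: str     # gold patch for the task

def run_in_image(image: str, command: str) -> bool:
    """Run a shell command inside the instance's container image; True on exit 0.
    Docker usage here is an assumed setup, not the paper's infrastructure."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, "bash", "-lc", command],
        capture_output=True,
    )
    return result.returncode == 0

def validate(instance: TaskInstance) -> bool:
    """Keep an instance only if tests fail before the patch and pass after it."""
    fails_before = not run_in_image(instance.image, instance.test_cmd)
    passes_after = run_in_image(
        instance.image, f"git apply {instance.patch_path} && {instance.test_cmd}"
    )
    return fails_before and passes_after

# Usage sketch: candidates = [...]; kept = [t for t in candidates if validate(t)]
```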
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.19290
• PDF: https://arxiv.org/pdf/2506.19290
• Project Page: https://quixotic-sting-239.notion.site/eb17f379610040ceb54da5d5d24065bd
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
MADrive: Memory-Augmented Driving Scene Modeling
🔹 Publication Date: Published on Jun 26
🔹 Abstract:
AI-generated summary: MADrive enhances scene reconstruction for autonomous driving by integrating visually similar 3D car assets from an external memory bank to achieve photorealistic synthesis of altered scenarios.
Recent advances in scene reconstruction have pushed toward highly realistic modeling of autonomous driving (AD) environments using 3D Gaussian splatting. However, the resulting reconstructions remain closely tied to the original observations and struggle to support photorealistic synthesis of significantly altered or novel driving scenarios. This work introduces MADrive, a memory-augmented reconstruction framework designed to extend the capabilities of existing scene reconstruction methods by replacing observed vehicles with visually similar 3D assets retrieved from a large-scale external memory bank. Specifically, we release MAD-Cars, a curated dataset of ~70K 360° car videos captured in the wild, and present a retrieval module that finds the most similar car instances in the memory bank, reconstructs the corresponding 3D assets from video, and integrates them into the target scene through orientation alignment and relighting. The resulting replacements provide complete multi-view representations of vehicles in the scene, enabling photorealistic synthesis of substantially altered configurations, as demonstrated in our experiments. Project page: https://yandex-research.github.io/madrive/
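🔹 Code sketch (illustrative): The retrieval module can be approximated as nearest-neighbor search over appearance embeddings of the memory bank. The cosine-similarity retrieval below uses random placeholder embeddings; the embedding dimension and the encoder producing them are assumptions.
```python
import numpy as np

def retrieve_similar(query_emb: np.ndarray,
                     bank_embs: np.ndarray,
                     k: int = 3) -> np.ndarray:
    """Return indices of the k most similar memory-bank entries by cosine
    similarity. In practice the embeddings would come from an appearance
    encoder; here they are random placeholders."""
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
memory_bank = rng.standard_normal((70_000, 256)).astype(np.float32)  # ~70K car-video embeddings
observed_car = rng.standard_normal(256)                              # embedding of the observed vehicle
print(retrieve_similar(observed_car, memory_bank))
```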
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.21520
• PDF: https://arxiv.org/pdf/2506.21520
• Project Page: https://yandex-research.github.io/madrive/
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/yandex/mad-cars
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Generative Blocks World: Moving Things Around in Pictures
🔹 Publication Date: Published on Jun 25
🔹 Abstract:
AI-generated summary: A generative method that edits 3D scenes using convex primitives and regenerates images with enhanced texture consistency and visual fidelity.
We describe Generative Blocks World to interact with the scene of a generated image by manipulating simple geometric abstractions. Our method represents scenes as assemblies of convex 3D primitives, and the same scene can be represented by different numbers of primitives, allowing an editor to move either whole structures or small details. Once the scene geometry has been edited, the image is generated by a flow-based method which is conditioned on depth and a texture hint. Our texture hint takes into account the modified 3D primitives, exceeding the texture consistency provided by existing key-value caching techniques. These texture hints (a) allow accurate object and camera moves and (b) largely preserve the identity of objects depicted. Quantitative and qualitative experiments demonstrate that our approach outperforms prior works in visual fidelity, editability, and compositional generalization.
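🔹 Code sketch (illustrative): A convex primitive can be represented as an intersection of half-spaces {x : Ax <= b}, and moving a block amounts to transforming that representation before depth is re-rendered for the conditioned generator. The axis-aligned unit cube and translation rule below are a generic convex-geometry illustration, not the paper's primitive fitting or rendering code.
```python
import numpy as np

# A convex primitive as an intersection of half-spaces {x : A x <= b}.
# Editing then amounts to transforming the primitives (e.g., translating a
# block) before re-rendering depth for the conditional generator.

def unit_box(center: np.ndarray):
    """Axis-aligned unit cube around `center` as (A, b) with A x <= b."""
    A = np.vstack([np.eye(3), -np.eye(3)])
    b = np.concatenate([center + 0.5, -(center - 0.5)])
    return A, b

def contains(A: np.ndarray, b: np.ndarray, x: np.ndarray) -> bool:
    return bool(np.all(A @ x <= b + 1e-9))

def translate(A: np.ndarray, b: np.ndarray, t: np.ndarray):
    """Translating the primitive by t shifts b by A @ t, since A(x - t) <= b."""
    return A, b + A @ t

A, b = unit_box(np.zeros(3))
print(contains(A, b, np.array([0.2, 0.0, 0.4])))    # True: inside the original box
A2, b2 = translate(A, b, np.array([2.0, 0.0, 0.0]))
print(contains(A2, b2, np.array([0.2, 0.0, 0.4])))  # False: the box has moved away
print(contains(A2, b2, np.array([2.2, 0.0, 0.4])))  # True: inside the moved box
```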
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.20703
• PDF: https://arxiv.org/pdf/2506.20703
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
SAM4D: Segment Anything in Camera and LiDAR Streams
🔹 Publication Date: Published on Jun 26
🔹 Abstract:
AI-generated summary: SAM4D is a multi-modal and temporal foundation model for segmentation in autonomous driving using Unified Multi-modal Positional Encoding and Motion-aware Cross-modal Memory Attention, with a multi-modal automated data engine generating pseudo-labels.
We present SAM4D, a multi-modal and temporal foundation model designed for promptable segmentation across camera and LiDAR streams. Unified Multi-modal Positional Encoding (UMPE) is introduced to align camera and LiDAR features in a shared 3D space, enabling seamless cross-modal prompting and interaction. Additionally, we propose Motion-aware Cross-modal Memory Attention (MCMA), which leverages ego-motion compensation to enhance temporal consistency and long-horizon feature retrieval, ensuring robust segmentation across dynamically changing autonomous driving scenes. To avoid annotation bottlenecks, we develop a multi-modal automated data engine that synergizes VFM-driven video masklets, spatiotemporal 4D reconstruction, and cross-modal masklet fusion. This framework generates camera-LiDAR-aligned pseudo-labels at a speed orders of magnitude faster than human annotation while preserving VFM-derived semantic fidelity in point cloud representations. We conduct extensive experiments on the constructed Waymo-4DSeg, which demonstrate the powerful cross-modal segmentation ability and great potential in data annotation of the proposed SAM4D.
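🔹 Code sketch (illustrative): The idea behind a unified positional encoding is that camera pixels (unprojected with depth) and LiDAR points are encoded with the same function of their 3D coordinates, so both modalities share one positional space. The generic sinusoidal encoding below is a simplified stand-in for UMPE; the frequency schedule and dimensionality are assumptions.
```python
import torch

def sinusoidal_pe_3d(xyz: torch.Tensor, d_model: int = 96) -> torch.Tensor:
    """Generic sinusoidal encoding of 3D coordinates, applied identically to
    unprojected camera pixels and LiDAR points so both live in one positional
    space. Simplified stand-in for UMPE, not the paper's formulation.
    xyz: (N, 3) -> (N, d_model), with d_model divisible by 6."""
    assert d_model % 6 == 0
    n_freq = d_model // 6
    freqs = 2.0 ** torch.arange(n_freq, dtype=xyz.dtype)   # (n_freq,)
    angles = xyz[..., None] * freqs                         # (N, 3, n_freq)
    pe = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return pe.flatten(start_dim=-2)                         # (N, 6 * n_freq)

lidar_points = torch.randn(2048, 3) * 30.0   # points in the ego frame (metres)
camera_points = torch.randn(1024, 3) * 30.0  # pixels unprojected with depth
pe_lidar = sinusoidal_pe_3d(lidar_points)
pe_cam = sinusoidal_pe_3d(camera_points)
print(pe_lidar.shape, pe_cam.shape)          # torch.Size([2048, 96]) torch.Size([1024, 96])
```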
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.21547
• PDF: https://arxiv.org/pdf/2506.21547
• Project Page: https://SAM4D-Project.github.io
• Github: https://github.com/CN-ADLab/SAM4D
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
🔹 Publication Date: Published on Jun 13
🔹 Abstract:
AI-generated summary: A diffusion-based framework generates aligned novel views of images and geometry using warping-and-inpainting with cross-modal attention distillation and proximity-based mesh conditioning, achieving high-fidelity synthesis and 3D completion.
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. Unlike prior methods that require dense posed images or pose-embedded generative models limited to in-domain views, our method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images, and formulates novel-view synthesis as an inpainting task for both image and geometry. To ensure accurate alignment between generated images and geometry, we propose cross-modal attention distillation, where attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference. This multi-task approach achieves synergistic effects, facilitating geometrically robust image synthesis as well as well-defined geometry prediction. We further introduce proximity-based mesh conditioning to integrate depth and normal cues, interpolating between point cloud and filtering erroneously predicted geometry from influencing the generation process. Empirically, our method achieves high-fidelity extrapolative view synthesis on both image and geometry across a range of unseen scenes, delivers competitive reconstruction quality under interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion. Project page: https://cvlab-kaist.github.io/MoAI
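🔹 Code sketch (illustrative): Cross-modal attention distillation can be pictured as computing attention probabilities in the image branch and reusing them in the parallel geometry branch, so both branches attend to the same spatial structure. The single-head attention below is a simplified sketch; in the paper this happens inside the diffusion branches' attention layers, and all shapes here are assumptions.
```python
import torch
import torch.nn.functional as F

def attention_with_probs(q, k, v, probs=None):
    """Single-head attention that either computes its own attention map or
    reuses one injected from another branch. Simplified sketch of the idea,
    not the paper's U-Net/transformer implementation."""
    if probs is None:
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        probs = F.softmax(scores, dim=-1)
    return probs @ v, probs

tokens, dim = 64, 32
q_img, k_img, v_img = (torch.randn(tokens, dim) for _ in range(3))
v_geo = torch.randn(tokens, dim)

# Image branch computes attention normally; the geometry branch reuses its map,
# so both branches share the same spatial attention pattern.
_, probs_img = attention_with_probs(q_img, k_img, v_img)
geo_out, _ = attention_with_probs(None, None, v_geo, probs=probs_img)
print(geo_out.shape)  # torch.Size([64, 32])
```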
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.11924
• PDF: https://arxiv.org/pdf/2506.11924
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
🔹 Publication Date: Published on Jun 26
🔹 Abstract:
AI-generated summary: FairyGen generates story-driven cartoon videos from a single drawing by disentangling character modeling and background styling, employing an MLLM for storyboards, style propagation for consistency, and MMDiT-based diffusion models for motion.
We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing, while faithfully preserving its unique artistic style. Unlike previous storytelling methods that primarily focus on character consistency and basic motion, FairyGen explicitly disentangles character modeling from stylized background generation and incorporates cinematic shot design to support expressive and coherent storytelling. Given a single character sketch, we first employ an MLLM to generate a structured storyboard with shot-level descriptions that specify environment settings, character actions, and camera perspectives. To ensure visual consistency, we introduce a style propagation adapter that captures the character's visual style and applies it to the background, faithfully retaining the character's full visual identity while synthesizing style-consistent scenes. A shot design module further enhances visual diversity and cinematic quality through frame cropping and multi-view synthesis based on the storyboard. To animate the story, we reconstruct a 3D proxy of the character to derive physically plausible motion sequences, which are then used to fine-tune an MMDiT-based image-to-video diffusion model. We further propose a two-stage motion customization adapter: the first stage learns appearance features from temporally unordered frames, disentangling identity from motion; the second stage models temporal dynamics using a timestep-shift strategy with frozen identity weights. Once trained, FairyGen directly renders diverse and coherent video scenes aligned with the storyboard. Extensive experiments demonstrate that our system produces animations that are stylistically faithful and narratively structured, with natural motion, highlighting its potential for personalized and engaging story animation. The code will be available at https://github.com/GVCLab/FairyGen
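🔹 Code sketch (illustrative): The structured storyboard produced by the MLLM can be thought of as a list of shot-level records covering environment, character action, and camera perspective. The dataclass schema and example shots below are hypothetical, illustrating the kind of output the storyboard stage would need rather than FairyGen's actual format.
```python
from dataclasses import dataclass
from typing import List

# Hypothetical shot-level storyboard schema that an MLLM could be prompted to
# fill in; field names and example values are illustrative assumptions.

@dataclass
class Shot:
    environment: str       # where the shot takes place
    action: str            # what the character does
    camera: str            # e.g. "wide", "close-up", "low-angle tracking"

@dataclass
class Storyboard:
    character: str
    shots: List[Shot]

storyboard = Storyboard(
    character="child-drawn purple dragon",
    shots=[
        Shot("sunny meadow", "wakes up and stretches", "wide establishing shot"),
        Shot("forest path", "chases a butterfly", "low-angle tracking shot"),
        Shot("hilltop at dusk", "waves goodbye", "slow zoom-out"),
    ],
)
print(len(storyboard.shots), "shots for", storyboard.character)
```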
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.21272
• PDF: https://arxiv.org/pdf/2506.21272
• Project Page: https://jayleejia.github.io/FairyGen/
• Github: https://github.com/GVCLab/FairyGen
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Seedance 1.0: Exploring the Boundaries of Video Generation Models
🔹 Publication Date: Published on Jun 10
🔹 Abstract:
AI-generated summary: Seedance 1.0 offers high-performance video generation by integrating advanced data curation, efficient architecture, post-training optimization, and model acceleration, resulting in superior quality and speed.
Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundation models still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design with a proposed training paradigm, which natively supports multi-shot generation and jointly learns both text-to-video and image-to-video tasks; (iii) carefully optimized post-training approaches leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms for comprehensive performance improvements; (iv) excellent model acceleration achieving ~10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds (NVIDIA L20). Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation, offering superior spatiotemporal fluidity with structural stability, precise instruction adherence in complex multi-subject contexts, and native multi-shot narrative coherence with consistent subject representation.
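🔹 Code sketch (illustrative): A multi-dimensional reward mechanism for video RLHF can be pictured as a weighted combination of per-axis scores such as prompt following, motion plausibility, and visual quality. The axes, weights, and stub scorers below are assumptions, not Seedance's reward models.
```python
from typing import Callable, Dict

# Multi-dimensional reward as a weighted sum of per-axis scores in [0, 1].
# Axis names and weights are illustrative assumptions.
REWARD_AXES: Dict[str, float] = {
    "prompt_following": 0.4,
    "motion_plausibility": 0.3,
    "visual_quality": 0.3,
}

def combined_reward(video: str, scorers: Dict[str, Callable[[str], float]]) -> float:
    """Weighted aggregate of per-dimension reward-model scores."""
    return sum(w * scorers[axis](video) for axis, w in REWARD_AXES.items())

# Stub scorers standing in for learned reward models.
scorers = {
    "prompt_following": lambda v: 0.9,
    "motion_plausibility": lambda v: 0.7,
    "visual_quality": lambda v: 0.8,
}
print(combined_reward("generated_video.mp4", scorers))  # 0.4*0.9 + 0.3*0.7 + 0.3*0.8 = 0.81
```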
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09113
• PDF: https://arxiv.org/pdf/2506.09113
• Project Page: https://seed.bytedance.com/seedance
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Large Language Models for Cyber Security: A Systematic Literature Review
Article Date: 8 May 2024
Article Description:
The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.
PDF Download Link:
https://arxiv.org/pdf/2405.04760v4.pdf
GitHub:
• https://github.com/hiyouga/llama-efficient-tuning
Datasets:
• 10,000 People - Human Pose Recognition Data
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
🔹 Publication Date: Published on Jun 12
🔹 Abstract:
PosterCraft improves aesthetic poster generation through a unified framework with enhanced text rendering, region-aware fine-tuning, aesthetic reinforcement learning, and joint vision-language refinement. AI-generated summary Generating aesthetic posters is more challenging than simple design images: it requires not only precise text rendering but also the seamless integration of abstract artistic content, striking layouts, and overall stylistic harmony. To address this, we propose PosterCraft, a unified framework that abandons prior modular pipelines and rigid, predefined layouts, allowing the model to freely explore coherent, visually compelling compositions. PosterCraft employs a carefully designed, cascaded workflow to optimize the generation of high-aesthetic posters: (i) large-scale text-rendering optimization on our newly introduced Text-Render-2M dataset; (ii) region-aware supervised fine-tuning on HQ-Poster100K; (iii) aesthetic-text reinforcement learning via best-of-n preference optimization; and (iv) joint vision-language feedback refinement. Each stage is supported by a fully automated data-construction pipeline tailored to its specific needs, enabling robust training without complex architectural modifications. Across multiple experiments, PosterCraft significantly outperforms open-source baselines in rendering accuracy, layout coherence, and overall visual appeal, approaching the quality of SOTA commercial systems. Our code, models, and datasets can be found on the project page: https://ephemeral182.github.io/PosterCraft
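Stage (iii), best-of-n preference optimization, can be sketched as sampling n posters per prompt, scoring them with a reward model, and keeping (best, worst) pairs for preference training; the generator and reward function below are toy stand-ins, not PosterCraft's actual models.

```python
# Hypothetical sketch of best-of-n preference data construction: sample n posters per
# prompt, score them with an aesthetic/text-accuracy reward, and keep (best, worst)
# pairs for preference optimization. Generator and reward model are toy stand-ins.
import random

def generate_poster(prompt: str, seed: int) -> str:
    """Stand-in for the poster generator; returns an identifier for one sample."""
    return f"{prompt}--sample{seed}"

def reward(sample: str) -> float:
    """Stand-in for an aesthetic + text-rendering reward model."""
    return random.random()

def best_of_n_pairs(prompts, n=8):
    pairs = []
    for prompt in prompts:
        samples = [generate_poster(prompt, s) for s in range(n)]
        scored = sorted(samples, key=reward, reverse=True)
        # chosen = highest-reward sample, rejected = lowest-reward sample
        pairs.append({"prompt": prompt, "chosen": scored[0], "rejected": scored[-1]})
    return pairs

if __name__ == "__main__":
    for p in best_of_n_pairs(["minimalist jazz festival poster"], n=4):
        print(p)
```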
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.10741
• PDF: https://arxiv.org/pdf/2506.10741
• Project Page: https://ephemeral182.github.io/PosterCraft/
• Github: https://ephemeral182.github.io/PosterCraft
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Scaling Reasoning without Attention
🔹 Publication Date: Published on May 28
🔹 Abstract:
A new attention-free language model using state space dual layers and a two-phase curriculum fine-tuning strategy outperforms Transformer models on complex reasoning tasks. AI-generated summary Large language models (LLMs) have made significant advances in complex reasoning tasks, yet they remain bottlenecked by two core challenges: architectural inefficiency due to reliance on Transformers, and a lack of structured fine-tuning for high-difficulty domains. We introduce an attention-free language model that addresses both issues through architectural and data-centric innovations. Built on the state space dual (SSD) layers of Mamba-2, our model eliminates the need for self-attention and key-value caching, enabling fixed-memory, constant-time inference. To train it for complex reasoning, we propose a two-phase curriculum fine-tuning strategy based on the PromptCoT synthesis paradigm, which generates pedagogically structured problems via abstract concept selection and rationale-guided generation. On benchmark evaluations, our 7B model outperforms strong Transformer and hybrid models of comparable scale, and even surpasses the much larger Gemma3-27B by 2.6% on AIME 24, 0.6% on AIME 25, and 3.0% on LiveCodeBench. These results highlight the potential of state space models as efficient and scalable alternatives to attention-based architectures for high-capacity reasoning.
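The fixed-memory, constant-time-inference claim follows from the recurrent form of state space layers: decoding carries only a fixed-size hidden state instead of a growing key-value cache. The toy diagonal linear SSM below illustrates this; it is a didactic simplification, not the Mamba-2 SSD layer used in the paper.

```python
# Toy diagonal linear state space recurrence illustrating fixed-memory decoding:
# each step updates a constant-size state h, so per-token cost and memory do not
# grow with sequence length (unlike a Transformer's KV cache). Didactic only,
# not the actual Mamba-2 SSD layer.
import numpy as np

d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = rng.uniform(0.8, 0.99, size=d_state)        # per-channel state decay
B = rng.normal(size=(d_state, d_model))         # input -> state
C = rng.normal(size=(d_model, d_state))         # state -> output

def step(h, x):
    """One decoding step: O(d_state * d_model) work, independent of context length."""
    h = A * h + B @ x
    y = C @ h
    return h, y

h = np.zeros(d_state)
for t in range(1000):                            # memory stays d_state floats throughout
    x = rng.normal(size=d_model)
    h, y = step(h, x)
print("state size:", h.shape, "last output:", y[:3])
```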
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2505.22425
• PDF: https://arxiv.org/pdf/2505.22425
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
Article Date: 17 May 2025
Article Description:
Large Language Model (LLM)-based search agents have shown remarkable capabilities in solving complex tasks by dynamically decomposing problems and addressing them through interleaved reasoning and retrieval. However, this interleaved paradigm introduces substantial efficiency bottlenecks. First, we observe that both highly accurate and overly approximate retrieval methods degrade system efficiency: exact search incurs significant retrieval overhead, while coarse retrieval requires additional reasoning steps during generation. Second, we identify inefficiencies in system design, including improper scheduling and frequent retrieval stalls, which lead to cascading latency -- where even minor delays in retrieval amplify end-to-end inference time. To address these challenges, we introduce SearchAgent-X, a high-efficiency inference framework for LLM-based search agents. SearchAgent-X leverages high-recall approximate retrieval and incorporates two key techniques: priority-aware scheduling and non-stall retrieval. Extensive experiments demonstrate that SearchAgent-X consistently outperforms state-of-the-art systems such as vLLM and HNSW-based retrieval across diverse tasks, achieving up to 3.4× higher throughput and 5× lower latency, without compromising generation quality. SearchAgent-X is available at https://github.com/tiannuo-yang/SearchAgent-X.
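A hypothetical sketch of the two techniques named above: requests are ordered by a priority heuristic, and retrieval is launched asynchronously so it overlaps with decoding rather than stalling it. The heuristic, latencies, and stub functions are assumptions, not SearchAgent-X's implementation.

```python
# Hypothetical sketch of priority-aware scheduling with non-stall retrieval:
# requests are ordered by a priority heuristic, and retrieval runs as an async
# task that overlaps with ongoing decoding instead of blocking it.
import asyncio
import heapq

async def approximate_retrieve(query: str) -> str:
    await asyncio.sleep(0.05)                     # stand-in for high-recall ANN search
    return f"docs for: {query}"

async def decode_tokens(request_id: int, n: int = 3) -> None:
    for _ in range(n):
        await asyncio.sleep(0.01)                 # stand-in for one decoding step

async def serve(requests):
    # Lower value = served first; here, fewer remaining decode steps gets priority.
    queue = [(remaining, rid, query) for rid, (query, remaining) in enumerate(requests)]
    heapq.heapify(queue)
    while queue:
        remaining, rid, query = heapq.heappop(queue)
        retrieval = asyncio.create_task(approximate_retrieve(query))  # non-stall: start early
        await decode_tokens(rid, remaining)        # keep decoding while retrieval runs
        docs = await retrieval                     # ready (or nearly ready) when needed
        print(f"request {rid}: decoded {remaining} steps, got '{docs}'")

requests = [("quantum error correction", 5), ("latest LLM agents", 2)]
asyncio.run(serve(requests))
```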
PDF Download Link:
https://arxiv.org/pdf/2505.12065v1.pdf
GitHub:
• https://github.com/tiannuo-yang/searchagent-x
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Sekai: A Video Dataset towards World Exploration
🔹 Publication Date: Published on Jun 18
🔹 Abstract:
Sekai, a worldwide video dataset with comprehensive annotations, is introduced to support world exploration applications, enhancing video generation models. AI-generated summary Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training as they suffer from several limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning "world" in Japanese), a high-quality first-person-view worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking or drone-view (FPV and UAV) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process, and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Experiments demonstrate the quality of the dataset. We use a subset to train an interactive video world exploration model named YUME (meaning "dream" in Japanese). We believe Sekai will benefit the areas of video generation and world exploration, and motivate valuable applications.
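The annotation types listed above might be represented per clip roughly as follows; the field names and value formats are hypothetical illustrations and do not reflect the dataset's published schema.

```python
# Hypothetical record layout for one Sekai clip, covering the annotation types the
# abstract lists (location, scene, weather, crowd density, captions, camera trajectory).
# Field names and value formats are illustrative guesses, not the published schema.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CameraPose:
    timestamp_s: float
    position_xyz: Tuple[float, float, float]
    rotation_quat_wxyz: Tuple[float, float, float, float]

@dataclass
class SekaiClip:
    video_path: str
    country: str
    city: str
    scene: str                       # e.g. "street", "park"
    weather: str                     # e.g. "sunny", "rain"
    crowd_density: str               # e.g. "sparse", "dense"
    caption: str
    trajectory: List[CameraPose] = field(default_factory=list)

clip = SekaiClip(
    video_path="walks/tokyo_0001.mp4",
    country="Japan", city="Tokyo",
    scene="street", weather="sunny", crowd_density="dense",
    caption="A first-person walk through a crowded shopping street at noon.",
    trajectory=[CameraPose(0.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))],
)
print(clip.city, len(clip.trajectory))
```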
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.15675
• PDF: https://arxiv.org/pdf/2506.15675
• Project Page: https://huggingface.co/datasets/Lixsp11/Sekai-Project
• Github: https://github.com/Lixsp11/sekai-codebase
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/Lixsp11/Sekai-Project
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
MUSt3R: Multi-view Network for Stereo 3D Reconstruction
Article Date: March 2025
Article Description:
DUSt3R introduced a novel paradigm in geometric computer vision by proposing a model that can provide dense and unconstrained Stereo 3D Reconstruction of arbitrary image collections with no prior information about camera calibration nor viewpoint poses. Under the hood, however, DUSt3R processes image pairs, regressing local 3D reconstructions that need to be aligned in a global coordinate system. The number of pairs, growing quadratically, is an inherent limitation that becomes especially concerning for robust and fast optimization in the case of large image collections. In this paper, we propose an extension of DUSt3R from pairs to multiple views that addresses all of the aforementioned concerns. We propose a Multi-view Network for Stereo 3D Reconstruction, or MUSt3R, that modifies the DUSt3R architecture by making it symmetric and extending it to directly predict 3D structure for all views in a common coordinate frame. In addition, we equip the model with a multi-layer memory mechanism that reduces the computational complexity and scales the reconstruction to large collections, inferring thousands of 3D pointmaps at high frame rates with limited added complexity. The framework is designed to perform 3D reconstruction both offline and online, and hence can be seamlessly applied to SfM and visual SLAM scenarios, showing state-of-the-art performance on various 3D downstream tasks, including uncalibrated Visual Odometry, relative camera pose, scale and focal estimation, 3D reconstruction, and multi-view depth estimation. (CVPR 2025)
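The online usage pattern described above can be sketched as: encode each incoming frame, fuse it with a bounded memory of past frame tokens, and predict a per-pixel pointmap in a shared coordinate frame. The modules below are toy stand-ins meant only to show the data flow, not the MUSt3R layers.

```python
# Toy sketch of online multi-view reconstruction with a bounded memory: each new frame
# is encoded, fused with tokens kept from previous frames, and a per-pixel 3D pointmap
# in a shared coordinate frame is predicted. The modules are stand-ins for the data
# flow only; they are not the MUSt3R architecture.
from collections import deque
import torch
import torch.nn as nn

class ToyOnlineReconstructor(nn.Module):
    def __init__(self, dim=32, mem_size=4):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=8, stride=8)   # frame -> tokens
        self.fuse = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 3)                                # token -> (x, y, z)
        self.memory = deque(maxlen=mem_size)                         # bounded, so cost stays flat

    @torch.no_grad()
    def add_frame(self, frame):                                      # frame: (1, 3, H, W)
        feat = self.encoder(frame)                                   # (1, dim, h, w)
        b, d, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)                     # (1, h*w, dim)
        context = torch.cat(list(self.memory) + [tokens], dim=1)     # past + current tokens
        fused, _ = self.fuse(tokens, context, context)
        self.memory.append(tokens)
        return self.head(fused).reshape(1, h, w, 3)                  # pointmap in common frame

model = ToyOnlineReconstructor()
for t in range(6):                                                   # streaming frames (e.g. SLAM)
    pointmap = model.add_frame(torch.randn(1, 3, 64, 64))
print("pointmap shape per frame:", tuple(pointmap.shape))
```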
PDF Download Link:
https://arxiv.org/pdf/2503.01661v1.pdf
GitHub:
• https://github.com/naver/must3r
Datasets:
• KITTI
• ScanNet
• TUM RGB-D
• MegaDepth
• ETH3D
• BlendedMVS
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Spiking Graph Convolutional Networks
Article Date: 5 May 2022
Article Description:
Graph Convolutional Networks (GCNs) achieve impressive performance due to their remarkable ability to represent graph information. However, GCNs, when implemented as deep networks, require expensive computational power, making them difficult to deploy on battery-powered devices. In contrast, Spiking Neural Networks (SNNs), which perform a bio-fidelity inference process, offer an energy-efficient neural architecture. In this work, we propose SpikingGCN, an end-to-end framework that aims to integrate the embedding of GCNs with the bio-fidelity characteristics of SNNs. The original graph data are encoded into spike trains based on the incorporation of graph convolution. We further model biological information processing by utilizing a fully connected layer combined with neuron nodes. In a wide range of scenarios (e.g., citation networks, image graph classification, and recommender systems), our experimental results show that the proposed method achieves competitive performance against state-of-the-art approaches. Furthermore, we show that SpikingGCN on a neuromorphic chip brings a clear energy-efficiency advantage to graph data analysis, demonstrating its great potential for building environment-friendly machine learning models.
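The encoding idea described above can be sketched as: propagate node features with a normalized adjacency, turn the results into Bernoulli spike trains, and read them out with an integrate-and-fire layer. The normalization, thresholds, and readout below are illustrative choices, not the paper's exact formulation.

```python
# Toy sketch of the SpikingGCN idea as described in the abstract: propagate node
# features with a normalized adjacency (graph convolution), squash them to [0, 1],
# sample Bernoulli spike trains over T timesteps, and read them out with a simple
# integrate-and-fire neuron. Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feats, T, threshold = 5, 8, 20, 2.0

# Random undirected graph with self-loops, symmetrically normalized: D^-1/2 (A+I) D^-1/2
A = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)
A = np.maximum(A, A.T) + np.eye(n_nodes)
d_inv_sqrt = 1.0 / np.sqrt(A.sum(1))
A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

X = rng.random((n_nodes, n_feats))
H = A_hat @ A_hat @ X                              # two propagation steps, no weights (SGC-style)
rates = 1.0 / (1.0 + np.exp(-H))                   # firing probabilities in (0, 1)

W = rng.normal(size=(n_feats, 2))                  # fully connected readout to 2 classes
membrane = np.zeros((n_nodes, 2))
spike_counts = np.zeros((n_nodes, 2))
for _ in range(T):
    spikes = (rng.random(rates.shape) < rates).astype(float)   # Bernoulli spike train
    membrane += spikes @ W
    fired = membrane >= threshold                  # integrate-and-fire with hard reset
    spike_counts += fired
    membrane[fired] = 0.0

print("predicted class per node:", spike_counts.argmax(1))
```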
PDF Download Link:
https://arxiv.org/pdf/2205.02767v2.pdf
GitHub:
• https://github.com/zulunzhu/spikinggcn
Datasets:
• MNIST
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT