Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title:
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks

🔹 Publication Date: Published on Mar 13

🔹 Abstract:
A study analyzes various Gradient Inversion Attacks (GIA) in Federated Learning (FL), categorizing them and providing insights into their effectiveness and practicality, while suggesting a defense pipeline for privacy protection. AI-generated summary Federated Learning (FL) has emerged as a promising privacy-preserving collaborative model training paradigm without sharing raw data. However, recent studies have revealed that private information can still be leaked through shared gradient information and attacked by Gradient Inversion Attacks (GIA). While many GIA methods have been proposed, a detailed analysis, evaluation, and summary of these methods are still lacking. Although various survey papers summarize existing privacy attacks in FL, few studies have conducted extensive experiments to unveil the effectiveness of GIA and their associated limiting factors in this context. To fill this gap, we first undertake a systematic review of GIA and categorize existing methods into three types, i.e., optimization-based GIA (OP-GIA), generation-based GIA (GEN-GIA), and analytics-based GIA (ANA-GIA). Then, we comprehensively analyze and evaluate the three types of GIA in FL, providing insights into the factors that influence their performance, practicality, and potential threats. Our findings indicate that OP-GIA is the most practical attack setting despite its unsatisfactory performance, while GEN-GIA has many dependencies and ANA-GIA is easily detectable, making them both impractical. Finally, we offer a three-stage defense pipeline to users when designing FL frameworks and protocols for better privacy protection and share some future research directions from the perspectives of attackers and defenders that we believe should be pursued. We hope that our study can help researchers design more robust FL frameworks to defend against these attacks.
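
To make the OP-GIA idea concrete, here is a minimal, hedged PyTorch sketch of optimization-based gradient inversion on a toy model: the attacker optimizes dummy inputs and soft labels until their gradient matches the gradient a client shared. The model, shapes, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Toy victim model and one client batch (sizes are illustrative assumptions).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
x_true = torch.randn(4, 32)
y_true = torch.randint(0, 10, (4,))
criterion = nn.CrossEntropyLoss()

# The gradient a client would share in one FL round.
true_grads = torch.autograd.grad(criterion(model(x_true), y_true), model.parameters())
true_grads = [g.detach() for g in true_grads]

# OP-GIA idea: optimize dummy inputs/labels until their gradient matches the shared one.
x_dummy = torch.randn(4, 32, requires_grad=True)
y_dummy = torch.randn(4, 10, requires_grad=True)   # soft labels, optimized jointly
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)

for _ in range(300):
    opt.zero_grad()
    dummy_loss = criterion(model(x_dummy), y_dummy.softmax(dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

print("gradient-matching loss:", match.item())     # x_dummy is the attacker's reconstruction
```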

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2503.11514
• PDF: https://arxiv.org/pdf/2503.11514
• Project Page: https://pengxin-guo.github.io/FLPrivacy/
• Github: https://pengxin-guo.github.io/FLPrivacy/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
π^3: Scalable Permutation-Equivariant Visual Geometry Learning

🔹 Publication Date: Published on Jul 17

🔹 Abstract:
A permutation-equivariant neural network, π^3, reconstructs visual geometry without a fixed reference view, achieving state-of-the-art performance in camera pose estimation, depth estimation, and point map reconstruction. AI-generated summary We introduce π^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, π^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
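
The key property the paper relies on, permutation equivariance, is easy to verify numerically: self-attention without positional encodings satisfies f(Px) = P f(x). Below is a minimal PyTorch check; the 6-token "view" setup is an illustrative assumption, not the π^3 architecture.

```python
import torch
import torch.nn as nn

# Self-attention with no positional encoding is permutation-equivariant:
# permuting the input views permutes the outputs in exactly the same way.
torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn.eval()

views = torch.randn(1, 6, 64)          # 6 "view" tokens (toy shapes)
perm = torch.randperm(6)

with torch.no_grad():
    out, _ = attn(views, views, views)
    out_perm, _ = attn(views[:, perm], views[:, perm], views[:, perm])

# Equivariance check: f(Px) == P f(x)
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))   # True
```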

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.13347
• PDF: https://arxiv.org/pdf/2507.13347
• Project Page: https://yyfz.github.io/pi3/
• Github: https://yyfz.github.io/pi3/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/yyfz233/Pi3
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: Persona Vectors: Monitoring and Controlling Character Traits in Language Models

🔹 Publication Date: Published on Jul 29

🔹 Abstract: Persona vectors in large language models can monitor and control personality changes during training and deployment, enabling the identification and mitigation of undesirable traits. AI-generated summary Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space (persona vectors) underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant's personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.
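
A common way to obtain such activation-space directions is a difference of means between trait-eliciting and neutral prompts, which can then be used for monitoring (projection) and steering (adding or subtracting the vector). The sketch below only illustrates that generic recipe on stand-in activations; the paper's automated extraction pipeline is more involved.

```python
import torch

# Hedged sketch of the difference-of-means idea behind persona vectors.
# The activations are stand-in tensors you would collect from a chosen hidden layer.
hidden = 512
acts_with_trait = torch.randn(100, hidden) + 0.5   # activations on trait-eliciting prompts
acts_without    = torch.randn(100, hidden)         # activations on neutral prompts

persona_vec = acts_with_trait.mean(0) - acts_without.mean(0)
persona_vec = persona_vec / persona_vec.norm()

# Monitoring: project new activations onto the persona direction.
new_act = torch.randn(hidden)
score = new_act @ persona_vec            # larger => more of the trait

# Steering / post-hoc intervention: shift activations along (or against) the direction.
alpha = -2.0                             # negative alpha suppresses the trait
steered = new_act + alpha * persona_vec
print(float(score), float(steered @ persona_vec))
```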

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.21509

• PDF: https://arxiv.org/pdf/2507.21509

• Github: https://github.com/safety-research/persona_vectors/tree/main

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

🔹 Publication Date: Published on Aug 1

🔹 Abstract: DAEDAL, a novel training-free denoising strategy, enables dynamic length adaptation in Diffusion Large Language Models, improving performance and computational efficiency. AI-generated summary Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive Large Language Models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length allocation leads to a problematic trade-off: insufficient lengths cripple performance on complex tasks, while excessive lengths incur significant computational overhead and sometimes result in performance degradation. While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion Large Language Models. DAEDAL operates in two phases: 1) Before the denoising process, DAEDAL starts from a short initial length and iteratively expands it to a coarse task-appropriate length, guided by a sequence completion metric. 2) During the denoising process, DAEDAL dynamically intervenes by pinpointing and expanding insufficient generation regions through mask token insertion, ensuring the final output is fully developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves performance comparable, and in some cases superior, to meticulously tuned fixed-length baselines, while simultaneously enhancing computational efficiency by achieving a higher effective token ratio. By resolving the static length constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap with their Autoregressive counterparts and paving the way for more efficient and capable generation.
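
A toy, hedged sketch of the two control signals described above, with made-up stand-ins for the model's completion metric and per-token confidences; the names, thresholds, and expansion sizes are assumptions, not DAEDAL's actual values.

```python
import torch

MASK = 0

def completion_metric(canvas: torch.Tensor) -> float:
    # Stand-in for the model's confidence that the sequence can terminate
    # (e.g., EOS probability near the end); here it simply favours longer canvases.
    return min(1.0, canvas.numel() / 96)

# Phase 1: start short and coarsely expand until the completion metric is satisfied.
canvas = torch.full((16,), MASK)
while completion_metric(canvas) < 0.9:
    canvas = torch.cat([canvas, torch.full((16,), MASK)])

def expand_low_confidence(canvas, conf, threshold=0.3, k=4):
    # Phase 2: splice k extra mask tokens after positions the model is unsure about.
    out = []
    for tok, c in zip(canvas.tolist(), conf.tolist()):
        out.append(tok)
        if c < threshold:
            out.extend([MASK] * k)
    return torch.tensor(out)

conf = torch.rand(canvas.numel())       # stand-in per-token confidences
canvas = expand_low_confidence(canvas, conf)
print(canvas.numel())
```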

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00819

• PDF: https://arxiv.org/pdf/2508.00819

• Github: https://github.com/Li-Jinsong/DAEDAL

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

🔹 Publication Date: Published on Jul 14

🔹 Abstract:
Artic addresses latency issues in AI Video Chat by optimizing video streaming and frame rate adaptation to enhance MLLM accuracy and reduce bitrate. AI-generated summary AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human, but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, this poses significant challenges to latency, because the MLLM inference takes up most of the response time, leaving very little time for video streaming. Due to network uncertainty and instability, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we propose Artic, an AI-oriented Real-time Communication framework, exploring the network requirement shift from "humans watching video" to "AI understanding video". To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming that recognizes the importance of each video region for chat and allocates bitrate almost exclusively to chat-important regions. To avoid packet retransmission, we propose Loss-Resilient Adaptive Frame Rate that leverages previous frames to substitute for lost/delayed frames while avoiding bitrate waste. To evaluate the impact of video streaming quality on MLLM accuracy, we build the first benchmark, named Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss some open questions and ongoing solutions for AI Video Chat.
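
The bitrate-allocation idea is simple to sketch: given per-region importance scores for the current question, give every region a small floor and spend the rest of the budget on chat-important regions. The scoring model and the numbers below are assumptions, not Artic's implementation.

```python
import numpy as np

def allocate_bitrate(importance: np.ndarray, total_kbps: float, floor_kbps: float = 5.0):
    """Give every region a small floor, then split the remaining budget by importance."""
    n = importance.size
    remaining = total_kbps - floor_kbps * n
    weights = importance / importance.sum()
    return floor_kbps + remaining * weights

# e.g., region 1 contains the object the user is asking about
importance = np.array([0.02, 0.90, 0.05, 0.03])
print(allocate_bitrate(importance, total_kbps=500))
```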

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.10510
• PDF: https://arxiv.org/pdf/2507.10510

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

🔹 Publication Date: Published on Jul 31

🔹 Abstract: 3D-R1 enhances 3D scene understanding through a high-quality synthetic dataset, reinforcement learning with GRPO, and dynamic view selection, achieving significant improvements in reasoning and generalization. AI-generated summary Large vision-language models (VLMs) have made significant strides in 2D visual understanding tasks, sparking interest in extending these capabilities to 3D scene understanding. However, current 3D VLMs often struggle with robust reasoning and generalization due to limitations in high-quality spatial data and the static nature of viewpoint assumptions. To address these challenges, we propose 3D-R1, a foundation model that enhances the reasoning capabilities of 3D VLMs. Specifically, we first construct a high-quality synthetic dataset with CoT, named Scene-30K, leveraging existing 3D-VL datasets and a data engine based on Gemini 2.5 Pro. It serves as cold-start initialization data for 3D-R1. Moreover, we leverage RLHF policy such as GRPO in the reinforcement learning training process to enhance reasoning capabilities and introduce three reward functions: a perception reward, a semantic similarity reward, and a format reward to maintain detection accuracy and answer semantic precision. Furthermore, we introduce a dynamic view selection strategy that adaptively chooses the most informative perspectives for 3D scene understanding. Extensive experiments demonstrate that 3D-R1 delivers an average improvement of 10% across various 3D scene benchmarks, highlighting its effectiveness in enhancing reasoning and generalization in 3D scene understanding. Code: https://github.com/AIGeeksGroup/3D-R1. Website: https://aigeeksgroup.github.io/3D-R1.
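
A hedged sketch of how three reward terms like these could be combined for GRPO-style training. The stand-in rewards below (a toy 1D IoU, string similarity, and a think/answer format check) only illustrate the structure; the paper's actual reward definitions are richer.

```python
import re
from difflib import SequenceMatcher

def perception_reward(pred_box, gt_box):
    # Toy 1D IoU as a placeholder for a 3D detection score.
    inter = max(0.0, min(pred_box[1], gt_box[1]) - max(pred_box[0], gt_box[0]))
    union = (pred_box[1] - pred_box[0]) + (gt_box[1] - gt_box[0]) - inter
    return inter / union if union > 0 else 0.0

def semantic_reward(pred_answer, gt_answer):
    # Placeholder for an embedding-based semantic similarity score.
    return SequenceMatcher(None, pred_answer.lower(), gt_answer.lower()).ratio()

def format_reward(response):
    # Reward responses that follow a <think>...</think><answer>...</answer> template.
    ok = re.fullmatch(r"(?s)<think>.*</think>\s*<answer>.*</answer>", response.strip())
    return 1.0 if ok else 0.0

def total_reward(response, pred_box, gt_box, pred_answer, gt_answer, w=(1.0, 1.0, 0.5)):
    return (w[0] * perception_reward(pred_box, gt_box)
            + w[1] * semantic_reward(pred_answer, gt_answer)
            + w[2] * format_reward(response))

print(total_reward("<think>scan scene</think><answer>a red chair</answer>",
                   (0.0, 2.0), (0.5, 2.5), "a red chair", "the red chair"))
```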

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23478

• PDF: https://arxiv.org/pdf/2507.23478

• Project Page: https://aigeeksgroup.github.io/3D-R1

• Github: https://github.com/AIGeeksGroup/3D-R1

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Multi-Label Knowledge Distillation

🔹 Publication Date: Published on Aug 12, 2023

🔹 Abstract:
The proposed method improves multi-label knowledge distillation by decomposing it into binary classification problems and leveraging label-wise embeddings to enhance feature representation distinctiveness. AI-generated summary Existing knowledge distillation methods typically work by imparting the knowledge of output logits or intermediate feature maps from the teacher network to the student network, which is very successful in multi-class single-label learning. However, these methods can hardly be extended to the multi-label learning scenario, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and feature maps of the whole example may ignore minor classes in such a scenario. In this paper, we propose a novel multi-label knowledge distillation method. On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method can avoid knowledge counteraction among labels, thus achieving superior performance against diverse comparing methods. Our code is available at: https://github.com/penghui-yang/L2D
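
The binary-decomposition idea can be sketched as a per-label sigmoid/BCE distillation loss instead of softmax matching (which wrongly assumes the label probabilities sum to one). This minimal PyTorch sketch covers only the logit branch; the label-wise embedding part of L2D is not shown, and the temperature scheme is an assumption.

```python
import torch
import torch.nn.functional as F

def multilabel_logit_distillation(student_logits, teacher_logits, T: float = 2.0):
    """One binary distillation problem per label: match the teacher's per-label sigmoids."""
    teacher_prob = torch.sigmoid(teacher_logits / T)       # per-label soft targets
    return F.binary_cross_entropy_with_logits(student_logits / T, teacher_prob) * (T * T)

student = torch.randn(8, 20, requires_grad=True)   # batch of 8, 20 labels
teacher = torch.randn(8, 20)
loss = multilabel_logit_distillation(student, teacher)
loss.backward()
print(loss.item())
```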

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2308.06453
• PDF: https://arxiv.org/pdf/2308.06453

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: PixNerd: Pixel Neural Field Diffusion

🔹 Publication Date: Published on Jul 31

🔹 Abstract: Pixel Neural Field Diffusion (PixNerd) achieves high-quality image generation in a single-scale, single-stage process without VAEs or complex pipelines, and extends to text-to-image applications with competitive performance. AI-generated summary The current success of diffusion transformers heavily depends on the compressed latent space shaped by the pre-trained variational autoencoder (VAE). However, this two-stage training paradigm inevitably introduces accumulated errors and decoding artifacts. To address the aforementioned problems, researchers return to pixel space at the cost of complicated cascade pipelines and increased token complexity. In contrast to their efforts, we propose to model the patch-wise decoding with a neural field and present a single-scale, single-stage, efficient, end-to-end solution, coined pixel neural field diffusion (PixNerd). Thanks to the efficient neural field representation in PixNerd, we directly achieve 2.15 FID on ImageNet 256×256 and 2.84 FID on ImageNet 512×512 without any complex cascade pipeline or VAE. We also extend our PixNerd framework to text-to-image applications. Our PixNerd-XXL/16 achieves a competitive 0.73 overall score on the GenEval benchmark and 80.9 overall score on the DPG benchmark.
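
A hedged sketch of patch-wise neural-field decoding: each patch token conditions a small MLP that maps normalized in-patch coordinates to RGB, so no VAE decoder is needed. Dimensions and the conditioning scheme are illustrative assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class PatchNeuralField(nn.Module):
    """Decode each patch token into a patch of RGB values via a coordinate MLP."""
    def __init__(self, token_dim=256, coord_dim=2, hidden=128, patch=16):
        super().__init__()
        self.patch = patch
        self.mlp = nn.Sequential(
            nn.Linear(token_dim + coord_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, tokens):                      # tokens: (B, N, token_dim)
        B, N, _ = tokens.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, self.patch), torch.linspace(-1, 1, self.patch), indexing="ij")
        coords = torch.stack([xs, ys], dim=-1).reshape(1, 1, -1, 2)     # (1, 1, P*P, 2)
        coords = coords.expand(B, N, -1, -1)
        cond = tokens.unsqueeze(2).expand(-1, -1, coords.shape[2], -1)  # token broadcast per pixel
        return self.mlp(torch.cat([cond, coords], dim=-1))              # (B, N, P*P, 3)

tokens = torch.randn(2, 64, 256)                    # 64 patch tokens, no latent space involved
print(PatchNeuralField()(tokens).shape)             # torch.Size([2, 64, 256, 3])
```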

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23268

• PDF: https://arxiv.org/pdf/2507.23268

• Project Page: https://huggingface.co/spaces/MCG-NJU/PixNerd

• Github: https://github.com/MCG-NJU/PixNerd

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/MCG-NJU/PixNerd
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: CellForge: Agentic Design of Virtual Cell Models

🔹 Publication Date: Published on Aug 4

🔹 Abstract: CellForge, an agentic system using a multi-agent framework, transforms raw single-cell multi-omics data into optimized computational models for virtual cells, outperforming state-of-the-art methods in single-cell perturbation prediction. AI-generated summary Virtual cell modeling represents an emerging frontier at the intersection of artificial intelligence and biology, aiming to predict quantities such as responses to diverse perturbations quantitatively. However, autonomously building computational models for virtual cells is challenging due to the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that leverages a multi-agent framework that transforms presented biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and inference. The framework integrates three core modules: Task Analysis for characterization of the presented dataset and relevant literature retrieval, Method Design, where specialized agents collaboratively develop optimized modeling strategies, and Experiment Execution for automated generation of code. The agents in the Design module are separated into experts with differing perspectives and a central moderator, and have to collaboratively exchange solutions until they achieve a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02276

• PDF: https://arxiv.org/pdf/2508.02276

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Qwen-Image Technical Report

🔹 Publication Date: Published on Aug 4

🔹 Abstract: Qwen-Image, an image generation model, advances text rendering and image editing through a comprehensive data pipeline, progressive training, and dual-encoding mechanism. AI-generated summary We present Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. To address the challenges of complex text rendering, we design a comprehensive data pipeline that includes large-scale data collection, filtering, annotation, synthesis, and balancing. Moreover, we adopt a progressive training strategy that starts with non-text-to-text rendering, evolves from simple to complex textual inputs, and gradually scales up to paragraph-level descriptions. This curriculum learning approach substantially enhances the model's native text rendering capabilities. As a result, Qwen-Image not only performs exceptionally well in alphabetic languages such as English, but also achieves remarkable progress on more challenging logographic languages like Chinese. To enhance image editing consistency, we introduce an improved multi-task training paradigm that incorporates not only traditional text-to-image (T2I) and text-image-to-image (TI2I) tasks but also image-to-image (I2I) reconstruction, effectively aligning the latent representations between Qwen2.5-VL and MMDiT. Furthermore, we separately feed the original image into Qwen2.5-VL and the VAE encoder to obtain semantic and reconstructive representations, respectively. This dual-encoding mechanism enables the editing module to strike a balance between preserving semantic consistency and maintaining visual fidelity. Qwen-Image achieves state-of-the-art performance, demonstrating its strong capabilities in both image generation and editing across multiple benchmarks.
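
The dual-encoding mechanism can be sketched as encoding the input image twice, once for semantics and once for reconstruction, and conditioning the editor on both streams. The encoders below are toy stand-ins for Qwen2.5-VL and the VAE encoder, with made-up shapes.

```python
import torch
import torch.nn as nn

class ToySemanticEncoder(nn.Module):
    """Stand-in for the semantic encoder: image -> 'what is in the image' tokens."""
    def forward(self, img):                      # (B, 3, H, W) -> (B, N, 768)
        return torch.randn(img.shape[0], 77, 768)

class ToyVAEEncoder(nn.Module):
    """Stand-in for the reconstructive (VAE) path: image -> pixel-faithful latent tokens."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=8, stride=8)
    def forward(self, img):                      # (B, 3, H, W) -> (B, H*W/64, 16)
        return self.conv(img).flatten(2).transpose(1, 2)

img = torch.randn(1, 3, 256, 256)
semantic_tokens = ToySemanticEncoder()(img)      # preserves semantic consistency
latent_tokens = ToyVAEEncoder()(img)             # preserves visual fidelity
print(semantic_tokens.shape, latent_tokens.shape)
# An MMDiT-style editor would attend to both streams; balancing them trades
# semantic consistency against pixel-level fidelity.
```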

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02324

• PDF: https://arxiv.org/pdf/2508.02324

• Github: https://github.com/QwenLM/Qwen-Image

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: ReMoMask: Retrieval-Augmented Masked Motion Generation

🔹 Publication Date: Published on Aug 4

🔹 Abstract: ReMoMask, a unified framework, addresses limitations in text-to-motion generation by integrating a Bidirectional Momentum Text-Motion Model, Semantic Spatio-temporal Attention, and RAG-Classifier-Free Guidance, achieving state-of-the-art performance on HumanML3D and KIT-ML benchmarks. AI-generated summary Text-to-Motion (T2M) generation aims to synthesize realistic and semantically aligned human motion sequences from natural language descriptions. However, current approaches face dual challenges: Generative models (e.g., diffusion models) suffer from limited diversity, error accumulation, and physical implausibility, while Retrieval-Augmented Generation (RAG) methods exhibit diffusion inertia, partial-mode collapse, and asynchronous artifacts. To address these limitations, we propose ReMoMask, a unified framework integrating three key innovations: 1) A Bidirectional Momentum Text-Motion Model decouples negative sample scale from batch size via momentum queues, substantially improving cross-modal retrieval precision; 2) A Semantic Spatio-temporal Attention mechanism enforces biomechanical constraints during part-level fusion to eliminate asynchronous artifacts; 3) RAG-Classifier-Free Guidance incorporates minor unconditional generation to enhance generalization. Built upon MoMask's RVQ-VAE, ReMoMask efficiently generates temporally coherent motions in minimal steps. Extensive experiments on standard benchmarks demonstrate the state-of-the-art performance of ReMoMask, achieving a 3.88% and 10.97% improvement in FID scores on HumanML3D and KIT-ML, respectively, compared to the previous SOTA method RAG-T2M. Code: https://github.com/AIGeeksGroup/ReMoMask. Website: https://aigeeksgroup.github.io/ReMoMask.
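
The momentum-queue idea (familiar from MoCo-style contrastive learning) decouples the number of negatives from the batch size. Below is a hedged sketch for text-motion retrieval with stand-in features and toy dimensions; it is not ReMoMask's actual training code.

```python
import torch
import torch.nn.functional as F

dim, queue_size, tau = 128, 1024, 0.07
queue = F.normalize(torch.randn(queue_size, dim), dim=1)     # motion-feature negatives

def text_motion_contrastive(text_q, motion_k, queue):
    text_q = F.normalize(text_q, dim=1)
    motion_k = F.normalize(motion_k, dim=1)
    pos = (text_q * motion_k).sum(dim=1, keepdim=True)        # (B, 1) matched pairs
    neg = text_q @ queue.t()                                   # (B, K) queue negatives
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(logits.shape[0], dtype=torch.long)    # the positive sits at index 0
    return F.cross_entropy(logits, labels)

B = 32
text_q = torch.randn(B, dim)        # from the query (text) encoder
motion_k = torch.randn(B, dim)      # from the momentum (key) motion encoder, no grad
loss = text_motion_contrastive(text_q, motion_k, queue)

# After each step: momentum-update the key encoder (theta_k <- m*theta_k + (1-m)*theta_q),
# then enqueue the new keys and drop the oldest ones.
queue = torch.cat([F.normalize(motion_k, dim=1), queue])[:queue_size]
print(loss.item(), queue.shape)
```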

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02605

• PDF: https://arxiv.org/pdf/2508.02605

• Project Page: https://aigeeksgroup.github.io/ReMoMask/

• Github: https://github.com/AIGeeksGroup/ReMoMask

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Embedding-Aware Quantum-Classical SVMs for Scalable Quantum Machine Learning

🔹 Publication Date: Published on Jul 28

🔹 Abstract: Combining Vision Transformer embeddings with quantum-classical pipelines achieves quantum advantage in classification tasks, demonstrating the importance of embedding choice in quantum machine learning. AI-generated summary Quantum Support Vector Machines face scalability challenges due to high-dimensional quantum states and hardware limitations. We propose an embedding-aware quantum-classical pipeline combining class-balanced k-means distillation with pretrained Vision Transformer embeddings. Our key finding: ViT embeddings uniquely enable quantum advantage, achieving up to 8.02% accuracy improvements over classical SVMs on Fashion-MNIST and 4.42% on MNIST, while CNN features show performance degradation. Using 16-qubit tensor network simulation via cuTensorNet, we provide the first systematic evidence that quantum kernel advantage depends critically on embedding choice, revealing fundamental synergy between transformer attention and quantum feature spaces. This provides a practical pathway for scalable quantum machine learning that leverages modern neural architectures.
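
A hedged sketch of the classical side of such a pipeline: class-balanced k-means distillation of (stand-in) ViT embeddings, followed by an SVM on a precomputed kernel. The RBF kernel below is a classical placeholder where the paper uses a simulated quantum kernel, and the embeddings are random stand-ins rather than real ViT features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 192))          # stand-in for pretrained ViT embeddings
y = rng.integers(0, 2, size=2000)

def class_balanced_distill(X, y, per_class=32):
    """Keep per_class k-means centroids for each class so the kernel stays small."""
    Xd, yd = [], []
    for c in np.unique(y):
        km = KMeans(n_clusters=per_class, n_init=10, random_state=0).fit(X[y == c])
        Xd.append(km.cluster_centers_)
        yd.append(np.full(per_class, c))
    return np.vstack(Xd), np.concatenate(yd)

Xd, yd = class_balanced_distill(X, y)
K_train = rbf_kernel(Xd, Xd)              # placeholder for a (simulated) quantum kernel
svm = SVC(kernel="precomputed").fit(K_train, yd)

X_test = rng.normal(size=(100, 192))
K_test = rbf_kernel(X_test, Xd)           # kernel between test points and the distilled set
print(svm.predict(K_test)[:10])
```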

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00024

• PDF: https://arxiv.org/pdf/2508.00024

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Personalized Safety Alignment for Text-to-Image Diffusion Models

🔹 Publication Date: Published on Aug 2

🔹 Abstract: A personalized safety alignment framework integrates user-specific profiles into text-to-image diffusion models to better align generated content with individual safety preferences. AI-generated summary Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce a new dataset, Sage, which captures user-specific safety preferences and incorporates these profiles through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and aligns generated content better with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
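
Conditioning on a user profile via cross-attention can be sketched as an extra attention block whose keys and values come from embedded profile attributes. Dimensions and the block placement are illustrative assumptions, not PSA's released architecture.

```python
import torch
import torch.nn as nn

class ProfileCrossAttention(nn.Module):
    """Image tokens cross-attend to embedded user-profile tokens (residual conditioning)."""
    def __init__(self, dim=320, profile_dim=64, heads=8):
        super().__init__()
        self.proj = nn.Linear(profile_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens, profile_tokens):
        ctx = self.proj(profile_tokens)                          # (B, P, dim)
        out, _ = self.attn(self.norm(image_tokens), ctx, ctx)    # queries = image tokens
        return image_tokens + out

image_tokens = torch.randn(2, 64 * 64, 320)     # latent tokens inside a diffusion block
profile_tokens = torch.randn(2, 8, 64)          # embedded profile attributes (age band, etc.)
print(ProfileCrossAttention()(image_tokens, profile_tokens).shape)
```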

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01151

• PDF: https://arxiv.org/pdf/2508.01151

• Github: https://m-e-agi-lab.github.io/PSAlign/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Cyber-Zero: Training Cybersecurity Agents without Runtime

🔹 Publication Date: Published on Jul 29

🔹 Abstract: Cyber-Zero synthesizes agent trajectories from CTF writeups to train runtime-free cybersecurity LLMs, achieving state-of-the-art performance on benchmarks. AI-generated summary Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, and demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00910

• PDF: https://arxiv.org/pdf/2508.00910

• Project Page: https://github.com/amazon-science/Cyber-Zero

• Github: https://github.com/amazon-science/Cyber-Zero

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

🔹 Publication Date: Published on Jul 4, 2024

🔹 Abstract:
LLM-jp, a collaborative project, develops open-source and powerful Japanese large language models with over 1,500 participants. AI-generated summary This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2407.03963
• PDF: https://arxiv.org/pdf/2407.03963

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents

🔹 Publication Date: Published on Jul 31

🔹 Abstract: Reinforcement Learning enhances generalizable spatial reasoning and interaction in 3D environments through cross-view goal specification and automated task synthesis, achieving zero-shot generalization and improved interaction success rates. AI-generated summary While Reinforcement Learning (RL) has achieved remarkable success in language modeling, its triumph hasn't yet fully translated to visuomotor agents. A primary challenge in RL models is their tendency to overfit specific tasks or environments, thereby hindering the acquisition of generalizable behaviors across diverse settings. This paper provides a preliminary answer to this challenge by demonstrating that RL-finetuned visuomotor agents in Minecraft can achieve zero-shot generalization to unseen worlds. Specifically, we explore RL's potential to enhance generalizable spatial reasoning and interaction capabilities in 3D worlds. To address challenges in multi-task RL representation, we analyze and establish cross-view goal specification as a unified multi-task goal space for visuomotor policies. Furthermore, to overcome the significant bottleneck of manual task design, we propose automated task synthesis within the highly customizable Minecraft environment for large-scale multi-task RL training, and we construct an efficient distributed RL framework to support this. Experimental results show RL significantly boosts interaction success rates by 4× and enables zero-shot generalization of spatial reasoning across diverse environments, including real-world settings. Our findings underscore the immense potential of RL training in 3D simulated environments, especially those amenable to large-scale task generation, for significantly advancing visuomotor agents' spatial reasoning.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23698

• PDF: https://arxiv.org/pdf/2507.23698

• Github: https://github.com/CraftJarvis/ROCKET-3

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

🔹 Publication Date: Published on Aug 4

🔹 Abstract: Seed Diffusion Preview, a discrete-state diffusion language model, achieves fast inference speeds through parallel generation, outperforming Mercury and Gemini Diffusion in speed and quality. AI-generated summary We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup to mitigate the inherent latency of token-by-token decoding, as demonstrated recently (e.g., Mercury Coder, Gemini Diffusion). Seed Diffusion Preview achieves an inference speed of 2,146 tokens/s on H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks, significantly faster than contemporary Mercury and Gemini Diffusion, establishing a new state of the art on the speed-quality Pareto frontier for code models.
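
Why parallel decoding is fast can be sketched in a few lines: every masked position is scored in one forward pass, and the most confident predictions are committed together. The model below is a random-logit stand-in and the unmasking schedule is an assumption, not Seed Diffusion's sampler.

```python
import torch

vocab, length, MASK_ID = 1000, 64, 1000       # MASK_ID sits outside the predicted vocab
seq = torch.full((length,), MASK_ID)

def model_logits(seq):                         # random-logit stand-in for the diffusion LM
    return torch.randn(seq.shape[0], vocab)

steps = 8                                      # vs. `length` steps for token-by-token decoding
for _ in range(steps):
    masked = (seq == MASK_ID).nonzero(as_tuple=True)[0]
    if masked.numel() == 0:
        break
    logits = model_logits(seq)                 # every position scored in parallel
    probs, preds = logits[masked].softmax(-1).max(-1)
    k = max(1, masked.numel() // 2)            # commit the most confident half each pass
    top = probs.topk(k).indices
    seq[masked[top]] = preds[top]

print(int((seq != MASK_ID).sum()), "of", length, "tokens filled in at most", steps, "passes")
```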

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02193

• PDF: https://arxiv.org/pdf/2508.02193

• Project Page: https://seed.bytedance.com/en/seed_diffusion

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer

🔹 Publication Date: Published on Aug 1

🔹 Abstract: LAMIC, a Layout-Aware Multi-Image Composition framework, extends single-reference diffusion models to multi-reference scenarios using attention mechanisms, achieving state-of-the-art performance in controllable image synthesis without training. AI-generated summary In controllable image synthesis, generating coherent and consistent images from multiple references with spatial layout awareness remains an open challenge. We present LAMIC, a Layout-Aware Multi-Image Composition framework that, for the first time, extends single-reference diffusion models to multi-reference scenarios in a training-free manner. Built upon the MMDiT model, LAMIC introduces two plug-and-play attention mechanisms: 1) Group Isolation Attention (GIA) to enhance entity disentanglement; and 2) Region-Modulated Attention (RMA) to enable layout-aware generation. To comprehensively evaluate model capabilities, we further introduce three metrics: 1) Inclusion Ratio (IN-R) and Fill Ratio (FI-R) for assessing layout control; and 2) Background Similarity (BG-S) for measuring background consistency. Extensive experiments show that LAMIC achieves state-of-the-art performance across most major metrics: it consistently outperforms existing multi-reference baselines in ID-S, BG-S, IN-R and AVG scores across all settings, and achieves the best DPG in complex composition tasks. These results demonstrate LAMIC's superior abilities in identity keeping, background preservation, layout control, and prompt-following, all achieved without any training or fine-tuning, showcasing strong zero-shot generalization ability. By inheriting the strengths of advanced single-reference models and enabling seamless extension to multi-image scenarios, LAMIC establishes a new training-free paradigm for controllable multi-image composition. As foundation models continue to evolve, LAMIC's performance is expected to scale accordingly. Our implementation is available at: https://github.com/Suchenl/LAMIC.
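
Group Isolation Attention can be sketched as a block-diagonal attention mask over reference-entity tokens, with the target-image tokens left free to attend everywhere. Token counts and the exact masking policy are assumptions for illustration, not LAMIC's implementation.

```python
import torch

def group_isolation_mask(group_sizes, target_len):
    """Boolean attention mask (True = may attend) for reference groups plus target tokens."""
    n_ref = sum(group_sizes)
    n = n_ref + target_len
    allow = torch.zeros(n, n, dtype=torch.bool)
    start = 0
    for size in group_sizes:                 # each reference entity attends only within itself
        allow[start:start + size, start:start + size] = True
        start += size
    allow[n_ref:, :] = True                  # target-image tokens attend to everything
    return allow

mask = group_isolation_mask(group_sizes=[4, 4], target_len=6)
print(mask.int())
```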

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00477

• PDF: https://arxiv.org/pdf/2508.00477

• Github: https://github.com/Suchenl/LAMIC

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

🔹 Publication Date: Published on Jul 23

🔹 Abstract:
PRIX, an end-to-end driving architecture using only camera data, achieves state-of-the-art performance with a Context-aware Recalibration Transformer, outperforming larger multimodal planners in efficiency and scalability. AI-generated summary While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories from raw pixel inputs directly. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be at https://maxiuw.github.io/prix.

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.17596
• PDF: https://arxiv.org/pdf/2507.17596

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT