Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title:
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

🔹 Publication Date: Published on May 17, 2024

🔹 Abstract:
Mechanistic interpretability of neural networks is enhanced by a new technique that identifies and exploits degenerate parameters through the Interaction Basis, leading to sparser and more interpretable network representations.

AI-generated summary: Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and that parameterizations with more degeneracy are likely to generalize further. We identify three ways that network parameters can be degenerate: linear dependence between activations in a layer; linear dependence between gradients passed back to a layer; and ReLUs which fire on the same subset of datapoints. We also present a heuristic argument that modular networks are likely to be more degenerate, and we develop a metric for identifying modules in a network based on this argument. We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions. We introduce the Interaction Basis, a tractable technique to obtain a representation that is invariant to degeneracies arising from linear dependence of activations or Jacobians.
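
The first degeneracy above (linear dependence between a layer's activations) is directly detectable from data. Below is a minimal sketch, assuming nothing about the paper's actual code: an SVD of the centered activation matrix separates informative directions from degenerate ones (near-zero singular values), which is the intuition behind building a basis that is invariant to such degeneracies.

```python
# Minimal sketch (not the authors' code): detect degenerate directions from
# linear dependence between activations in a layer via SVD.
import numpy as np

def degenerate_directions(acts: np.ndarray, tol: float = 1e-6):
    """acts: (n_datapoints, n_neurons) activations of one layer."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Right singular vectors give an orthogonal basis for activation space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    keep = s > tol * s.max()           # directions with real variation
    return vt[keep], vt[~keep]         # (informative basis, degenerate basis)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
# Neuron 2 is an exact linear combination of neurons 0 and 1.
acts = np.stack([x[:, 0], x[:, 1], x[:, 0] + x[:, 1], x[:, 2]], axis=1)
kept, degen = degenerate_directions(acts)
print(kept.shape[0], "informative directions,", degen.shape[0], "degenerate")
# -> 3 informative directions, 1 degenerate
```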

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2405.10927
• PDF: https://arxiv.org/pdf/2405.10927

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

🔹 Publication Date: Published on Jul 17

🔹 Abstract:
A sliding iterative denoising process is proposed to enhance spatio-temporal consistency in 4D diffusion models for high-fidelity view synthesis from sparse-view videos.

AI-generated summary: This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. Previous methods solve the issue of insufficient observation by leveraging 4D diffusion models to generate videos at novel viewpoints. However, the videos generated by these models often lack spatio-temporal consistency, which degrades view synthesis quality. In this paper, we propose a novel sliding iterative denoising process to enhance the spatio-temporal consistency of the 4D diffusion model. Specifically, we define a latent grid in which each latent encodes the image, camera pose, and human pose for a certain viewpoint and timestamp. We then alternately denoise the latent grid along the spatial and temporal dimensions with a sliding window, and finally decode the videos at target viewpoints from the corresponding denoised latents. Through the iterative sliding, information flows sufficiently across the latent grid, allowing the diffusion model to obtain a large receptive field and thus enhance the 4D consistency of the output, while keeping GPU memory consumption affordable. Experiments on the DNA-Rendering and ActorsHQ datasets demonstrate that our method is able to synthesize high-quality and consistent novel-view videos and significantly outperforms existing approaches. See our project page for interactive demos and video results: https://diffuman4d.github.io/
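
A minimal sketch of the sliding iterative denoising idea (the denoiser below is a placeholder, not the released model): latents live on a (viewpoint × timestamp) grid and are denoised alternately along the spatial and temporal axes with a sliding window, so information propagates across the whole grid while only a window is processed at a time.

```python
# Minimal sketch with a stand-in denoiser: alternate spatial/temporal
# sliding-window passes over a latent grid.
import numpy as np

def denoise_window(latents: np.ndarray, step: int) -> np.ndarray:
    """Stand-in for one diffusion denoising step on a window of latents."""
    return latents * 0.9  # placeholder: a real model predicts/removes noise

def sliding_denoise(grid: np.ndarray, window: int = 3, steps: int = 4):
    n_views, n_frames = grid.shape[:2]
    for step in range(steps):
        axis_len = n_views if step % 2 == 0 else n_frames  # alternate axes
        for start in range(0, axis_len - window + 1):
            sl = slice(start, start + window)
            if step % 2 == 0:   # spatial pass: window over viewpoints
                grid[sl] = denoise_window(grid[sl], step)
            else:               # temporal pass: window over timestamps
                grid[:, sl] = denoise_window(grid[:, sl], step)
    return grid

grid = np.random.default_rng(0).normal(size=(8, 16, 32))  # views x frames x dim
print(sliding_denoise(grid).shape)  # (8, 16, 32)
```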

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.13344
• PDF: https://arxiv.org/pdf/2507.13344
• Project Page: https://diffuman4d.github.io/
• Github: https://github.com/zju3dv/Diffuman4D

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
ForCenNet: Foreground-Centric Network for Document Image Rectification

🔹 Publication Date: Published on Jul 26

🔹 Abstract:
A foreground-centric network for document image rectification improves on the state of the art by effectively handling foreground elements and layout distortions.

AI-generated summary: Document image rectification aims to eliminate geometric deformation in photographed documents to facilitate text recognition. However, existing methods often neglect the significance of foreground elements, which provide essential geometric references and layout information for document image correction. In this paper, we introduce the Foreground-Centric Network (ForCenNet) to eliminate geometric distortions in document images. Specifically, we first propose a foreground-centric label generation method, which extracts detailed foreground elements from an undistorted image. We then introduce a foreground-centric mask mechanism to enhance the distinction between readable and background regions. Furthermore, we design a curvature consistency loss that leverages the detailed foreground labels to help the model understand the distorted geometric distribution. Extensive experiments demonstrate that ForCenNet achieves new state-of-the-art results on four real-world benchmarks: DocUNet, DIR300, WarpDoc, and DocReal. Quantitative analysis shows that the proposed method effectively undistorts layout elements such as text lines and table borders. The resources for further comparison are provided at https://github.com/caipeng328/ForCenNet
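
The curvature consistency loss is not spelled out above, so here is a hedged sketch of one plausible form: penalize the difference between discrete curvatures (second differences) of predicted text-line control points and those of the undistorted foreground labels. The loss shape and names are assumptions for illustration, not the paper's exact definition.

```python
# Hedged sketch of a curvature consistency term (assumed form).
import torch

def curvature(points: torch.Tensor) -> torch.Tensor:
    """points: (batch, n_points, 2) polyline control points.
    Discrete curvature via second differences along each polyline."""
    return points[:, 2:] - 2 * points[:, 1:-1] + points[:, :-2]

def curvature_consistency_loss(pred: torch.Tensor, target: torch.Tensor):
    return torch.mean(torch.abs(curvature(pred) - curvature(target)))

pred = torch.rand(4, 16, 2, requires_grad=True)   # predicted text lines
target = torch.zeros(4, 16, 2)                    # straight reference lines
loss = curvature_consistency_loss(pred, target)
loss.backward()
print(float(loss))
```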

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.19804
• PDF: https://arxiv.org/pdf/2507.19804
• Github: https://github.com/caipeng328/ForCenNet

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

🔹 Publication Date: Published on Jul 31

🔹 Abstract: Seed-Prover, a lemma-style reasoning model using Lean, achieves high performance in formal theorem proving and automated mathematical reasoning through iterative refinement and specialized geometry support.

AI-generated summary: LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state of the art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine, Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
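
A minimal sketch of the lemma-style refinement loop, with every interface (generate_proof, lean_check) a hypothetical stand-in: propose a proof, check it with Lean, and feed the verifier's feedback and any proved lemmas into the next attempt.

```python
# Hedged sketch: iterative proof refinement driven by verifier feedback.
from dataclasses import dataclass, field

@dataclass
class ProofState:
    theorem: str
    proved_lemmas: list[str] = field(default_factory=list)
    feedback: str = ""

def generate_proof(state: ProofState) -> str:
    """Stand-in for the prover model conditioned on feedback and lemmas."""
    return f"-- attempt using {len(state.proved_lemmas)} lemmas\nsorry"

def lean_check(proof: str):
    """Stand-in for running Lean; returns (ok, error message)."""
    return ("sorry" not in proof), "proof contains sorry"

def refine(theorem: str, max_rounds: int = 8):
    state = ProofState(theorem)
    for _ in range(max_rounds):
        proof = generate_proof(state)
        ok, err = lean_check(proof)
        if ok:
            return proof
        state.feedback = err   # Lean feedback drives the next attempt
    return None

print(refine("theorem demo : 1 + 1 = 2"))
```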

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23726

• PDF: https://arxiv.org/pdf/2507.23726

• Github: https://github.com/ByteDance-Seed/Seed-Prover

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Phi-Ground Tech Report: Advancing Perception in GUI Grounding

🔹 Publication Date: Published on Jul 31

🔹 Abstract: The Phi-Ground model family achieves state-of-the-art performance in GUI grounding for multimodal reasoning models, improving accuracy across various benchmarks.

AI-generated summary: With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from "Iron Man", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly determines the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from being ready for deployment, as a single misclick can result in unacceptable consequences. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance across all five grounding benchmarks for models under 10B parameters in agent settings. In the end-to-end model setting, our model still achieves SOTA results with scores of 43.2 on ScreenSpot-pro and 27.2 on UI-Vision. We believe that the various details discussed in this paper, along with our successes and failures, not only clarify the construction of grounding models but also benefit other perception tasks. Project homepage: https://zhangmiaosen2000.github.io/Phi-Ground/

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23779

• PDF: https://arxiv.org/pdf/2507.23779

• Project Page: https://zhangmiaosen2000.github.io/Phi-Ground/

• Github: https://github.com/zhangmiaosen2000/Phi-Ground

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks

🔹 Publication Date: Published on Mar 13

🔹 Abstract:
A study analyzes various Gradient Inversion Attacks (GIA) in Federated Learning (FL), categorizing them and providing insights into their effectiveness and practicality, while suggesting a defense pipeline for privacy protection.

AI-generated summary: Federated Learning (FL) has emerged as a promising privacy-preserving collaborative model training paradigm that avoids sharing raw data. However, recent studies have revealed that private information can still be leaked through shared gradient information and recovered by Gradient Inversion Attacks (GIA). While many GIA methods have been proposed, a detailed analysis, evaluation, and summary of these methods are still lacking. Although various survey papers summarize existing privacy attacks in FL, few studies have conducted extensive experiments to unveil the effectiveness of GIA and their associated limiting factors in this context. To fill this gap, we first undertake a systematic review of GIA and categorize existing methods into three types: optimization-based GIA (OP-GIA), generation-based GIA (GEN-GIA), and analytics-based GIA (ANA-GIA). Then, we comprehensively analyze and evaluate the three types of GIA in FL, providing insights into the factors that influence their performance, practicality, and potential threats. Our findings indicate that OP-GIA is the most practical attack setting despite its unsatisfactory performance, while GEN-GIA has many dependencies and ANA-GIA is easily detectable, making them both impractical. Finally, we offer a three-stage defense pipeline to users when designing FL frameworks and protocols for better privacy protection, and we share some future research directions from the perspectives of attackers and defenders that we believe should be pursued. We hope that our study can help researchers design more robust FL frameworks to defend against these attacks.
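
Optimization-based GIA (OP-GIA), which the study finds most practical, works by optimizing a dummy input until its gradients match the ones a client shared. A minimal sketch on a toy linear model (illustrative, not any specific attack from the survey):

```python
# Minimal sketch of optimization-based gradient inversion on a toy model.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# --- client side: the gradient that would be shared in FL ---
x_true = torch.randn(1, 8)
y_true = torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true),
                                 model.parameters())

# --- attacker side: optimize a dummy input to match those gradients ---
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.tensor([1])          # assume the label is known/inferred
opt = torch.optim.Adam([x_dummy], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    grads = torch.autograd.grad(loss_fn(model(x_dummy), y_dummy),
                                model.parameters(), create_graph=True)
    grad_diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
    grad_diff.backward()
    opt.step()

print("reconstruction error:", float(((x_dummy - x_true) ** 2).mean()))
```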

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2503.11514
• PDF: https://arxiv.org/pdf/2503.11514
• Project Page: https://pengxin-guo.github.io/FLPrivacy/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
π^3: Scalable Permutation-Equivariant Visual Geometry Learning

🔹 Publication Date: Published on Jul 17

🔹 Abstract:
A permutation-equivariant neural network, π^3, reconstructs visual geometry without a fixed reference view, achieving state-of-the-art performance in camera pose estimation, depth estimation, and point map reconstruction.

AI-generated summary: We introduce π^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, π^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
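
Permutation equivariance means permuting the input views permutes the outputs identically, so no view is privileged as a reference. A minimal DeepSets-style sketch of such a layer (an assumption for illustration, not the π^3 architecture):

```python
# Minimal sketch of a permutation-equivariant layer over a set of views.
import torch

class PermEquivariantLayer(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.local = torch.nn.Linear(dim, dim)    # applied per view
        self.pooled = torch.nn.Linear(dim, dim)   # applied to the mean

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, n_views, dim); mean pooling is order-invariant
        return self.local(views) + self.pooled(views.mean(dim=1, keepdim=True))

layer = PermEquivariantLayer(16)
x = torch.randn(2, 5, 16)
perm = torch.randperm(5)
out, out_perm = layer(x), layer(x[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-6))  # True
```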

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.13347
• PDF: https://arxiv.org/pdf/2507.13347
• Project Page: https://yyfz.github.io/pi3/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/yyfz233/Pi3
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: Persona Vectors: Monitoring and Controlling Character Traits in Language Models

🔹 Publication Date: Published on Jul 29

🔹 Abstract: Persona vectors in large language models can monitor and control personality changes during training and deployment, enabling the identification and mitigation of undesirable traits.

AI-generated summary: Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space (persona vectors) underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant's personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.
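
A persona vector can be sketched as the difference between mean activations on trait-exhibiting and trait-free responses; projecting onto it monitors the trait, and subtracting it steers the model away. A toy sketch with synthetic activations (the paper's automated extraction pipeline is more involved):

```python
# Toy sketch: extract a persona direction, then monitor and steer with it.
import numpy as np

rng = np.random.default_rng(0)
d = 64
trait_dir = rng.normal(size=d); trait_dir /= np.linalg.norm(trait_dir)

# Stand-ins for hidden activations collected with / without the trait.
acts_trait = rng.normal(size=(200, d)) + 2.0 * trait_dir
acts_neutral = rng.normal(size=(200, d))

persona_vec = acts_trait.mean(axis=0) - acts_neutral.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

def trait_score(h: np.ndarray) -> float:
    return float(h @ persona_vec)          # monitoring: projection

def steer(h: np.ndarray, alpha: float) -> np.ndarray:
    return h - alpha * persona_vec         # mitigation: remove the direction

h = acts_trait[0]
print(trait_score(h), trait_score(steer(h, trait_score(h))))  # ~2, then ~0
```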

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.21509

• PDF: https://arxiv.org/pdf/2507.21509

• Github: https://github.com/safety-research/persona_vectors/tree/main

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

🔹 Publication Date: Published on Aug 1

🔹 Abstract: DAEDAL, a novel training-free denoising strategy, enables dynamic length adaptation in Diffusion Large Language Models, improving performance and computational efficiency.

AI-generated summary: Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive Large Language Models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length allocation leads to a problematic trade-off: insufficient lengths cripple performance on complex tasks, while excessive lengths incur significant computational overhead and sometimes result in performance degradation. While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion Large Language Models. DAEDAL operates in two phases: 1) Before the denoising process, DAEDAL starts from a short initial length and iteratively expands it to a coarse task-appropriate length, guided by a sequence completion metric. 2) During the denoising process, DAEDAL dynamically intervenes by pinpointing and expanding insufficient generation regions through mask token insertion, ensuring the final output is fully developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves performance comparable, and in some cases superior, to meticulously tuned fixed-length baselines, while simultaneously enhancing computational efficiency by achieving a higher effective token ratio. By resolving the static length constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap with their autoregressive counterparts and paving the way for more efficient and capable generation.
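
A minimal sketch of the two phases with all model signals stubbed out: phase 1 grows the initial mask-token canvas until a completion metric clears a threshold, and phase 2 inserts extra mask tokens at under-developed positions during denoising. Every function here is a hypothetical stand-in, not the paper's implementation.

```python
# Hedged sketch of DAEDAL-style dynamic length expansion.
MASK = "<mask>"

def completion_metric(seq: list) -> float:
    """Stand-in: confidence that the sequence can complete at this length."""
    return min(1.0, len(seq) / 48)

def expand_until_adequate(seq: list, chunk: int = 16, thresh: float = 0.9):
    while completion_metric(seq) < thresh:        # phase 1: coarse length
        seq = seq + [MASK] * chunk
    return seq

def denoise_step(seq: list) -> list:
    """Stand-in for one diffusion step: unmask some tokens."""
    return [tok if tok != MASK else "tok" for tok in seq]

def insufficient_regions(seq: list) -> list:
    """Stand-in: positions whose predictions look truncated/low-confidence."""
    return []

seq = expand_until_adequate([MASK] * 8)
for _ in range(4):                                # phase 2: adaptive insertion
    for pos in insufficient_regions(seq):
        seq[pos:pos] = [MASK] * 4                 # mask token insertion
    seq = denoise_step(seq)
print(len(seq), "tokens generated")
```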

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00819

• PDF: https://arxiv.org/pdf/2508.00819

• Github: https://github.com/Li-Jinsong/DAEDAL

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

🔹 Publication Date: Published on Jul 14

🔹 Abstract:
Artic addresses latency issues in AI video chat by optimizing video streaming and frame rate adaptation to enhance MLLM accuracy and reduce bitrate.

AI-generated summary: AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, it poses significant challenges to latency, because the MLLM inference takes up most of the response time, leaving very little time for video streaming. Due to network uncertainty and instability, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we propose Artic, an AI-oriented Real-time Communication framework, exploring the network requirement shift from "humans watching video" to "AI understanding video". To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming, which recognizes the importance of each video region for the chat and allocates bitrate almost exclusively to chat-important regions. To avoid packet retransmission, we propose Loss-Resilient Adaptive Frame Rate, which leverages previous frames to substitute for lost/delayed frames while avoiding bitrate waste. To evaluate the impact of video streaming quality on MLLM accuracy, we build the first benchmark, named the Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss some open questions and ongoing solutions for AI Video Chat.
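
A minimal sketch of the context-aware bitrate allocation idea (the importance model is a stand-in): score each region's relevance to the current chat question and allocate the bitrate budget almost entirely to the chat-important regions.

```python
# Hedged sketch: importance-weighted bitrate allocation across regions.
import numpy as np

def region_importance(n_regions: int) -> np.ndarray:
    """Stand-in for an importance model conditioned on the chat context."""
    scores = np.zeros(n_regions)
    scores[2] = 1.0          # e.g., the region the question asks about
    scores += 0.01           # small floor so background is not dropped
    return scores / scores.sum()

def allocate_bitrate(total_kbps: float, n_regions: int = 8) -> np.ndarray:
    return total_kbps * region_importance(n_regions)

print(allocate_bitrate(500.0).round(1))  # nearly all bitrate to region 2
```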

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2507.10510
• PDF: https://arxiv.org/pdf/2507.10510

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

🔹 Publication Date: Published on Jul 31

🔹 Abstract: 3D-R1 enhances 3D scene understanding through a high-quality synthetic dataset, reinforcement learning with GRPO, and dynamic view selection, achieving significant improvements in reasoning and generalization.

AI-generated summary: Large vision-language models (VLMs) have made significant strides in 2D visual understanding tasks, sparking interest in extending these capabilities to 3D scene understanding. However, current 3D VLMs often struggle with robust reasoning and generalization due to limitations in high-quality spatial data and the static nature of viewpoint assumptions. To address these challenges, we propose 3D-R1, a foundation model that enhances the reasoning capabilities of 3D VLMs. Specifically, we first construct a high-quality synthetic dataset with CoT, named Scene-30K, leveraging existing 3D-VL datasets and a data engine based on Gemini 2.5 Pro. It serves as cold-start initialization data for 3D-R1. Moreover, we leverage an RLHF policy-optimization method, GRPO, in the reinforcement learning training process to enhance reasoning capabilities, and we introduce three reward functions: a perception reward, a semantic similarity reward, and a format reward, to maintain detection accuracy and answer semantic precision. Furthermore, we introduce a dynamic view selection strategy that adaptively chooses the most informative perspectives for 3D scene understanding. Extensive experiments demonstrate that 3D-R1 delivers an average improvement of 10% across various 3D scene benchmarks, highlighting its effectiveness in enhancing reasoning and generalization in 3D scene understanding. Code: https://github.com/AIGeeksGroup/3D-R1. Website: https://aigeeksgroup.github.io/3D-R1.
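
A hedged sketch of how the three reward terms might be combined for GRPO-style training; all three scoring functions and the weights are illustrative stand-ins, not the paper's definitions.

```python
# Hedged sketch: composite reward = perception + semantic + format terms.
def perception_reward(pred_boxes, gt_boxes) -> float:
    """Stand-in: e.g., mean IoU between predicted and ground-truth boxes."""
    return 0.7

def semantic_reward(answer: str, reference: str) -> float:
    """Stand-in: e.g., embedding cosine similarity of answer vs. reference."""
    return 0.8

def format_reward(answer: str) -> float:
    return 1.0 if answer.startswith("<answer>") else 0.0

def total_reward(pred_boxes, gt_boxes, answer, reference,
                 w=(0.4, 0.4, 0.2)) -> float:
    return (w[0] * perception_reward(pred_boxes, gt_boxes)
            + w[1] * semantic_reward(answer, reference)
            + w[2] * format_reward(answer))

print(total_reward([], [], "<answer>a chair</answer>", "a chair"))  # 0.8
```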

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23478

• PDF: https://arxiv.org/pdf/2507.23478

• Project Page: https://aigeeksgroup.github.io/3D-R1

• Github: https://github.com/AIGeeksGroup/3D-R1

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Multi-Label Knowledge Distillation

🔹 Publication Date: Published on Aug 12, 2023

🔹 Abstract:
The proposed method improves multi-label knowledge distillation by decomposing it into binary classification problems and leveraging label-wise embeddings to enhance feature representation distinctiveness.

AI-generated summary: Existing knowledge distillation methods typically work by imparting the knowledge of output logits or intermediate feature maps from the teacher network to the student network, which is very successful in multi-class single-label learning. However, these methods can hardly be extended to the multi-label learning scenario, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and the feature maps of the whole example may ignore minor classes in such a scenario. In this paper, we propose a novel multi-label knowledge distillation method. On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method can avoid knowledge counteraction among labels, achieving superior performance against diverse competing methods. Our code is available at: https://github.com/penghui-yang/L2D
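
The binary decomposition can be sketched as distilling each label's teacher probability with an independent sigmoid/BCE term, avoiding the softmax assumption that probabilities sum to one. A minimal sketch (assumed form, not the official L2D code):

```python
# Hedged sketch: per-label binary distillation for multi-label learning.
import torch
import torch.nn.functional as F

def multilabel_distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """logits: (batch, n_labels); one independent binary problem per label."""
    p_teacher = torch.sigmoid(teacher_logits / T)
    p_student = torch.sigmoid(student_logits / T)
    return F.binary_cross_entropy(p_student, p_teacher)

student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)   # frozen teacher outputs
loss = multilabel_distill_loss(student, teacher)
loss.backward()
print(float(loss))
```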

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2308.06453
• PDF: https://arxiv.org/pdf/2308.06453

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title: PixNerd: Pixel Neural Field Diffusion

🔹 Publication Date: Published on Jul 31

🔹 Abstract: Pixel Neural Field Diffusion (PixNerd) achieves high-quality image generation in a single-scale, single-stage process without VAEs or complex pipelines, and extends to text-to-image applications with competitive performance.

AI-generated summary: The current success of diffusion transformers heavily depends on the compressed latent space shaped by the pre-trained variational autoencoder (VAE). However, this two-stage training paradigm inevitably introduces accumulated errors and decoding artifacts. To address these problems, researchers have returned to pixel space at the cost of complicated cascade pipelines and increased token complexity. In contrast to their efforts, we propose to model the patch-wise decoding with a neural field and present a single-scale, single-stage, efficient, end-to-end solution, coined pixel neural field diffusion (PixNerd). Thanks to the efficient neural field representation in PixNerd, we directly achieve 2.15 FID on ImageNet 256×256 and 2.84 FID on ImageNet 512×512 without any complex cascade pipeline or VAE. We also extend our PixNerd framework to text-to-image applications. Our PixNerd-XXL/16 achieves a competitive 0.73 overall score on the GenEval benchmark and 80.9 overall score on the DPG benchmark.
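
A minimal sketch of patch-wise neural field decoding (assumed form, not the released model): a small MLP maps per-pixel coordinates, conditioned on the patch's diffusion token, directly to RGB, replacing a VAE decoder.

```python
# Hedged sketch: decode one patch with a coordinate-conditioned MLP.
import torch

class PatchNeuralField(torch.nn.Module):
    def __init__(self, token_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(token_dim + 2, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, 3))

    def forward(self, token: torch.Tensor, patch: int = 16) -> torch.Tensor:
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, patch),
                                torch.linspace(-1, 1, patch), indexing="ij")
        coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (p*p, 2)
        cond = token.expand(coords.shape[0], -1)                # broadcast token
        rgb = self.mlp(torch.cat([coords, cond], dim=-1))
        return rgb.reshape(patch, patch, 3)

field = PatchNeuralField()
print(field(torch.randn(1, 64)).shape)  # (16, 16, 3): one decoded patch
```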

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23268

• PDF: https://arxiv.org/pdf/2507.23268

• Project Page: https://huggingface.co/spaces/MCG-NJU/PixNerd

• Github: https://github.com/MCG-NJU/PixNerd

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/MCG-NJU/PixNerd
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: CellForge: Agentic Design of Virtual Cell Models

🔹 Publication Date: Published on Aug 4

🔹 Abstract: CellForge, an agentic system using a multi-agent framework, transforms raw single-cell multi-omics data into optimized computational models for virtual cells, outperforming state-of-the-art methods in single-cell perturbation prediction.

AI-generated summary: Virtual cell modeling represents an emerging frontier at the intersection of artificial intelligence and biology, aiming to predict quantities such as responses to diverse perturbations quantitatively. However, autonomously building computational models for virtual cells is challenging due to the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that leverages a multi-agent framework to transform presented biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, for characterizing the presented dataset and retrieving relevant literature; Method Design, where specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, for automated generation of code. The agents in the Design module are separated into experts with differing perspectives and a central moderator, and must collaboratively exchange solutions until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02276

• PDF: https://arxiv.org/pdf/2508.02276

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Qwen-Image Technical Report

🔹 Publication Date: Published on Aug 4

🔹 Abstract: Qwen-Image, an image generation model, advances text rendering and image editing through a comprehensive data pipeline, progressive training, and a dual-encoding mechanism.

AI-generated summary: We present Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. To address the challenges of complex text rendering, we design a comprehensive data pipeline that includes large-scale data collection, filtering, annotation, synthesis, and balancing. Moreover, we adopt a progressive training strategy that starts with non-text-to-text rendering, evolves from simple to complex textual inputs, and gradually scales up to paragraph-level descriptions. This curriculum learning approach substantially enhances the model's native text rendering capabilities. As a result, Qwen-Image not only performs exceptionally well in alphabetic languages such as English, but also achieves remarkable progress on more challenging logographic languages like Chinese. To enhance image editing consistency, we introduce an improved multi-task training paradigm that incorporates not only traditional text-to-image (T2I) and text-image-to-image (TI2I) tasks but also image-to-image (I2I) reconstruction, effectively aligning the latent representations between Qwen2.5-VL and MMDiT. Furthermore, we separately feed the original image into Qwen2.5-VL and the VAE encoder to obtain semantic and reconstructive representations, respectively. This dual-encoding mechanism enables the editing module to strike a balance between preserving semantic consistency and maintaining visual fidelity. Qwen-Image achieves state-of-the-art performance, demonstrating its strong capabilities in both image generation and editing across multiple benchmarks.
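
A minimal sketch of the dual-encoding wiring with stand-in encoders: the input image is encoded once for semantics (the VLM path) and once for reconstruction detail (the VAE path), and the editing module conditions on the concatenation. Dimensions and layers here are assumptions for illustration.

```python
# Hedged sketch: dual-encoding conditioning for an image editor.
import torch

class DualEncodingEditor(torch.nn.Module):
    def __init__(self, sem_dim=32, rec_dim=16):
        super().__init__()
        self.semantic_enc = torch.nn.Linear(3 * 8 * 8, sem_dim)  # stand-in VLM
        self.vae_enc = torch.nn.Linear(3 * 8 * 8, rec_dim)       # stand-in VAE
        self.editor = torch.nn.Linear(sem_dim + rec_dim, 3 * 8 * 8)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        flat = image.flatten(1)
        semantic = self.semantic_enc(flat)   # what the image depicts
        recon = self.vae_enc(flat)           # how it looks, pixel-accurately
        return self.editor(torch.cat([semantic, recon], dim=-1)).view_as(image)

model = DualEncodingEditor()
print(model(torch.randn(2, 3, 8, 8)).shape)  # (2, 3, 8, 8)
```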

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02324

• PDF: https://arxiv.org/pdf/2508.02324

• Github: https://github.com/QwenLM/Qwen-Image

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: ReMoMask: Retrieval-Augmented Masked Motion Generation

🔹 Publication Date: Published on Aug 4

🔹 Abstract: ReMoMask, a unified framework, addresses limitations in text-to-motion generation by integrating a Bidirectional Momentum Text-Motion Model, Semantic Spatio-temporal Attention, and RAG-Classifier-Free Guidance, achieving state-of-the-art performance on the HumanML3D and KIT-ML benchmarks.

AI-generated summary: Text-to-Motion (T2M) generation aims to synthesize realistic and semantically aligned human motion sequences from natural language descriptions. However, current approaches face dual challenges: generative models (e.g., diffusion models) suffer from limited diversity, error accumulation, and physical implausibility, while Retrieval-Augmented Generation (RAG) methods exhibit diffusion inertia, partial-mode collapse, and asynchronous artifacts. To address these limitations, we propose ReMoMask, a unified framework integrating three key innovations: 1) a Bidirectional Momentum Text-Motion Model that decouples negative sample scale from batch size via momentum queues, substantially improving cross-modal retrieval precision; 2) a Semantic Spatio-temporal Attention mechanism that enforces biomechanical constraints during part-level fusion to eliminate asynchronous artifacts; 3) RAG-Classifier-Free Guidance, which incorporates minor unconditional generation to enhance generalization. Built upon MoMask's RVQ-VAE, ReMoMask efficiently generates temporally coherent motions in minimal steps. Extensive experiments on standard benchmarks demonstrate the state-of-the-art performance of ReMoMask, achieving 3.88% and 10.97% improvements in FID scores on HumanML3D and KIT-ML, respectively, compared to the previous SOTA method RAG-T2M. Code: https://github.com/AIGeeksGroup/ReMoMask. Website: https://aigeeksgroup.github.io/ReMoMask.
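
The momentum queue can be sketched MoCo-style: negatives for the contrastive text-motion objective come from a FIFO queue filled by a slowly updated momentum encoder, decoupling the negative pool size from the batch size. A minimal sketch with toy linear encoders (assumed mechanics, not the released code):

```python
# Hedged sketch: momentum queue for contrastive text-motion retrieval.
import torch
import torch.nn.functional as F

dim, queue_size, m = 32, 1024, 0.999
encoder_q = torch.nn.Linear(16, dim)                 # query (text) encoder
encoder_k = torch.nn.Linear(16, dim)                 # momentum (motion) encoder
encoder_k.load_state_dict(encoder_q.state_dict())
queue = F.normalize(torch.randn(queue_size, dim), dim=1)

@torch.no_grad()
def momentum_update():
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.mul_(m).add_(pq, alpha=1 - m)             # slow EMA of weights

def step(text_feats, motion_feats):
    global queue
    q = F.normalize(encoder_q(text_feats), dim=1)
    with torch.no_grad():
        momentum_update()
        k = F.normalize(encoder_k(motion_feats), dim=1)
    pos = (q * k).sum(dim=1, keepdim=True)           # positive similarities
    neg = q @ queue.T                                # vs. queued negatives
    logits = torch.cat([pos, neg], dim=1) / 0.07
    labels = torch.zeros(q.shape[0], dtype=torch.long)
    queue = torch.cat([k, queue])[:queue_size]       # FIFO enqueue/dequeue
    return F.cross_entropy(logits, labels)

print(float(step(torch.randn(8, 16), torch.randn(8, 16))))
```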

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02605

• PDF: https://arxiv.org/pdf/2508.02605

• Project Page: https://aigeeksgroup.github.io/ReMoMask/

• Github: https://github.com/AIGeeksGroup/ReMoMask

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Embedding-Aware Quantum-Classical SVMs for Scalable Quantum Machine Learning

🔹 Publication Date: Published on Jul 28

🔹 Abstract: Combining Vision Transformer embeddings with quantum-classical pipelines achieves quantum advantage in classification tasks, demonstrating the importance of embedding choice in quantum machine learning.

AI-generated summary: Quantum Support Vector Machines face scalability challenges due to high-dimensional quantum states and hardware limitations. We propose an embedding-aware quantum-classical pipeline combining class-balanced k-means distillation with pretrained Vision Transformer embeddings. Our key finding: ViT embeddings uniquely enable quantum advantage, achieving up to 8.02% accuracy improvements over classical SVMs on Fashion-MNIST and 4.42% on MNIST, while CNN features show performance degradation. Using 16-qubit tensor network simulation via cuTensorNet, we provide the first systematic evidence that quantum kernel advantage depends critically on embedding choice, revealing a fundamental synergy between transformer attention and quantum feature spaces. This provides a practical pathway for scalable quantum machine learning that leverages modern neural architectures.
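
A minimal sketch of the classical portion of the pipeline: class-balanced k-means distillation over pretrained embeddings, then a kernel SVM. The quantum kernel is swapped for an RBF kernel here, and random vectors stand in for ViT embeddings; both substitutions are assumptions for illustration.

```python
# Hedged sketch: class-balanced k-means distillation + kernel SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def distill_per_class(X, y, k=10):
    """Class-balanced k-means distillation: k centroids per class."""
    Xs, ys = [], []
    for c in np.unique(y):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[y == c])
        Xs.append(km.cluster_centers_); ys += [c] * k
    return np.vstack(Xs), np.array(ys)

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 64))       # stand-in for precomputed ViT embeddings
y = (X[:, 0] > 0).astype(int)
Xd, yd = distill_per_class(X, y)
clf = SVC(kernel="rbf").fit(Xd, yd)  # a quantum kernel would plug in here
print("accuracy:", clf.score(X, y))
```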

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00024

• PDF: https://arxiv.org/pdf/2508.00024

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Personalized Safety Alignment for Text-to-Image Diffusion Models

🔹 Publication Date: Published on Aug 2

🔹 Abstract: A personalized safety alignment framework integrates user-specific profiles into text-to-image diffusion models to better align generated content with individual safety preferences.

AI-generated summary: Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce a new dataset, Sage, which captures user-specific safety preferences and incorporates these profiles through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and aligns generated content better with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
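
A hedged sketch of profile conditioning via cross-attention: diffusion features attend to embedded user-profile tokens, so the same prompt can be denoised differently for users with different safety profiles. Dimensions and wiring are assumptions for illustration.

```python
# Hedged sketch: conditioning diffusion features on a user profile.
import torch

class ProfileCrossAttention(torch.nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, image_tokens, profile_tokens):
        # queries: image features; keys/values: user-profile embedding tokens
        out, _ = self.attn(image_tokens, profile_tokens, profile_tokens)
        return image_tokens + out   # residual conditioning

layer = ProfileCrossAttention()
image_tokens = torch.randn(2, 256, 64)     # diffusion backbone features
profile_tokens = torch.randn(2, 8, 64)     # embedded age/preference profile
print(layer(image_tokens, profile_tokens).shape)  # (2, 256, 64)
```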

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01151

• PDF: https://arxiv.org/pdf/2508.01151

• Github: https://m-e-agi-lab.github.io/PSAlign/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Cyber-Zero: Training Cybersecurity Agents without Runtime

🔹 Publication Date: Published on Jul 29

🔹 Abstract: Cyber-Zero synthesizes agent trajectories from CTF writeups to train runtime-free cybersecurity LLMs, achieving state-of-the-art performance on benchmarks.

AI-generated summary: Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00910

• PDF: https://arxiv.org/pdf/2508.00910

• Github: https://github.com/amazon-science/Cyber-Zero

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

🔹 Publication Date: Published on Jul 4, 2024

🔹 Abstract:
LLM-jp, a collaborative project, develops open-source and powerful Japanese large language models with over 1,500 participants.

AI-generated summary: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2407.03963
• PDF: https://arxiv.org/pdf/2407.03963

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT