Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

πŸ”Ή Title: CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

πŸ”Ή Publication Date: Published on Jul 18

πŸ”Ή Abstract: CUDA-L1, an automated reinforcement learning framework, significantly improves CUDA optimization across various GPU architectures, achieving substantial speedups without human expertise. AI-generated summary: The exponential growth in demand for GPU computing resources, driven by the rapid advancement of Large Language Models, has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models (e.g. R1, o1) achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization. CUDA-L1 achieves performance improvements on the CUDA optimization task: trained on NVIDIA A100, it delivers an average speedup of x17.7 across all 250 CUDA kernels of KernelBench, with peak speedups reaching x449. Furthermore, the model also demonstrates excellent portability across GPU architectures, achieving average speedups of x17.8 on H100, x19.0 on RTX 3090, x16.5 on L40, x14.7 on H800, and x13.9 on H20, despite being optimized specifically for A100. Beyond these benchmark results, CUDA-L1 demonstrates several remarkable properties: 1) it discovers a variety of CUDA optimization techniques and learns to combine them strategically to achieve optimal performance; 2) it uncovers fundamental principles of CUDA optimization; 3) it identifies non-obvious performance bottlenecks and rejects seemingly beneficial optimizations that harm performance. The capabilities of CUDA-L1 demonstrate that reinforcement learning can transform an initially poor-performing LLM into an effective CUDA optimizer through speedup-based reward signals alone, without human expertise or domain knowledge. More importantly, the trained RL model extends the acquired reasoning abilities to new kernels. This paradigm opens possibilities for automated optimization of CUDA operations, and holds promise to substantially promote GPU efficiency and alleviate the rising pressure on GPU computing resources.
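
πŸ”Ή Code sketch: The paper's key idea is that measured speedup alone can serve as the RL reward. A minimal illustration of such a reward signal, with a toy pure-Python "kernel" standing in for real CUDA benchmarking (the paper's actual measurement harness is not shown in the abstract):

```python
import time
from statistics import median

def speedup_reward(baseline_fn, candidate_fn, n_trials=20):
    """Return the speedup of candidate_fn over baseline_fn.

    A reward of 1.0 means parity; >1.0 means the candidate is faster.
    Median timing over several trials reduces scheduler noise.
    """
    def time_fn(fn):
        times = []
        for _ in range(n_trials):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return median(times)

    return time_fn(baseline_fn) / time_fn(candidate_fn)

# Toy example: a "kernel" computing a sum of squares two ways.
xs = list(range(10_000))
baseline = lambda: sum(x * x for x in xs)          # reference version
candidate = lambda: sum(map(lambda x: x * x, xs))  # proposed variant
print(f"reward = {speedup_reward(baseline, candidate):.2f}x")
```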

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.14111

β€’ PDF: https://arxiv.org/pdf/2507.14111

πŸ”Ή Datasets citing this paper:
β€’ https://huggingface.co/datasets/deepreinforce-ai/CUDA-L1

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸš€ Become an Agentic AI Builder: Free 12-Week Certification by Ready Tensor

Ready Tensor's Agentic AI Developer Certification is a free, project-first, 12-week program designed to help you build and deploy real-world agentic AI systems. You'll complete three portfolio-ready projects using tools like LangChain, LangGraph, and vector databases, while deploying production-ready agents with FastAPI or Streamlit.

The course focuses on developing autonomous AI agents that can plan, reason, use memory, and act safely in complex environments. Certification is earned not by watching lectures, but by building: each project is reviewed against rigorous standards.

You can start anytime, and new cohorts begin monthly. Ideal for developers and engineers ready to go beyond chat prompts and start building true agentic systems.

πŸ‘‰ Apply now: https://www.readytensor.ai/agentic-ai-cert/
πŸ”Ή Title:
Music Arena: Live Evaluation for Text-to-Music

πŸ”Ή Publication Date: Published on Jul 28

πŸ”Ή Abstract:
Music Arena provides a scalable, interactive platform for evaluating text-to-music models through user-generated preferences and detailed feedback. AI-generated summary: We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare, as study protocols may differ across systems. Moreover, human preferences might help researchers align their TTM systems or improve automatic evaluation metrics, but an open and renewable source of preferences does not currently exist. We aim to fill these gaps by offering *live* evaluation for TTM. In Music Arena, real-world users input text prompts of their choosing and compare outputs from two TTM systems, and their preferences are used to compile a leaderboard. While Music Arena follows recent evaluation trends in other AI domains, we also design it with key features tailored to music: an LLM-based routing system to navigate the heterogeneous type signatures of TTM systems, and the collection of *detailed* preferences including listening data and natural language feedback. We also propose a rolling data release policy with user privacy guarantees, providing a renewable source of preference data and increasing platform transparency. Through its standardized evaluation protocol, transparent data access policies, and music-specific features, Music Arena not only addresses key challenges in the TTM ecosystem but also demonstrates how live evaluation can be thoughtfully adapted to the unique characteristics of specific AI domains. Music Arena is available at: https://music-arena.org
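
πŸ”Ή Code sketch: The abstract says pairwise user preferences are compiled into a leaderboard. One common way to do that is an Elo-style rating update (an assumption for illustration; the platform's actual aggregation method is not specified in the abstract):

```python
def elo_update(r_a, r_b, winner, k=32):
    """One Elo update from a single pairwise preference.

    winner: 'a' or 'b'. Returns the updated ratings (r_a, r_b).
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1 - score_a) - (1 - expected_a))
    return r_a, r_b

ratings = {"model_x": 1500.0, "model_y": 1500.0}
for vote in ["a", "a", "b"]:  # hypothetical user votes
    ratings["model_x"], ratings["model_y"] = elo_update(
        ratings["model_x"], ratings["model_y"], vote)
print(ratings)
```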

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.20900
β€’ PDF: https://arxiv.org/pdf/2507.20900
β€’ Github: https://github.com/gclef-cmu/music-arena

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
Privacy-Aware Energy Consumption Modeling of Connected Battery Electric Vehicles using Federated Learning

πŸ”Ή Publication Date: Published on Dec 12, 2023

πŸ”Ή Abstract:
Federated Learning methods like FedAvg and FedPer improve BEV energy consumption prediction while protecting user privacy. AI-generated summary: Battery Electric Vehicles (BEVs) are increasingly significant in modern cities due to their potential to reduce air pollution. Precise and real-time estimation of their energy consumption is imperative for effective itinerary planning and optimizing vehicle systems, which can reduce driving range anxiety and decrease energy costs. As public awareness of data privacy increases, adopting approaches that safeguard data privacy in the context of BEV energy consumption modeling is crucial. Federated Learning (FL) is a promising solution that mitigates the risk of exposing sensitive information to third parties by allowing local data to remain on devices and only sharing model updates with a central server. Our work investigates the potential of using FL methods, such as FedAvg and FedPer, to improve BEV energy consumption prediction while maintaining user privacy. We conducted experiments using data from 10 BEVs under simulated real-world driving conditions. Our results demonstrate that the FedAvg-LSTM model achieved a reduction of up to 67.84% in the MAE value of the prediction results. Furthermore, we explored various real-world scenarios and discussed how FL methods can be employed in those cases. Our findings show that FL methods can effectively improve the performance of BEV energy consumption prediction while maintaining user privacy.
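
πŸ”Ή Code sketch: FedAvg, named in the abstract, aggregates client models by a sample-size-weighted average of their parameters, so raw driving data never leaves the vehicle. A minimal NumPy sketch:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg: weighted average of client model parameters.

    client_weights: one list of np.ndarrays per client (same shapes).
    client_sizes: number of local samples per client (weights the average).
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_layers)
    ]

# Three simulated vehicles, each holding a tiny two-layer model.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(4, 2)), rng.normal(size=2)] for _ in range(3)]
global_model = fed_avg(clients, client_sizes=[120, 80, 200])
print([p.shape for p in global_model])
```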

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2312.07371
β€’ PDF: https://arxiv.org/pdf/2312.07371

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title: Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

πŸ”Ή Publication Date: Published on Jul 31

πŸ”Ή Abstract: Seed-Prover, a lemma-style reasoning model using Lean, achieves high performance in formal theorem proving and automated mathematical reasoning through iterative refinement and specialized geometry support. AI-generated summary: LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine, Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
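
πŸ”Ή Code sketch: The abstract describes an iterative refine-with-Lean-feedback loop that accumulates proved lemmas. A schematic of that loop, with `generate` and `check_with_lean` as hypothetical placeholder callables for the model and the Lean toolchain:

```python
def prove(statement, generate, check_with_lean, max_rounds=8):
    """Lemma-style iterative refinement loop (schematic).

    generate(statement, feedback, lemmas) -> candidate Lean proof (str)
    check_with_lean(proof) -> (ok: bool, feedback: str, new_lemmas: list)
    Both callables stand in for the model and the Lean verifier.
    """
    feedback, lemmas = "", []
    for _ in range(max_rounds):
        proof = generate(statement, feedback, lemmas)
        ok, feedback, new_lemmas = check_with_lean(proof)
        lemmas.extend(new_lemmas)   # keep proved lemmas for later rounds
        if ok:
            return proof            # formally verified proof
    return None                     # unproved within the round budget
```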

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.23726

β€’ PDF: https://arxiv.org/pdf/2507.23726

β€’ Github: https://github.com/ByteDance-Seed/Seed-Prover

πŸ”Ή Models citing this paper:
No models found

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title: RecGPT Technical Report

πŸ”Ή Publication Date: Published on Jul 30

πŸ”Ή Abstract: RecGPT integrates large language models into recommender systems to focus on user intent, improving content diversity and satisfaction while enhancing merchant and platform performance. AI-generated summary: Recommender systems are among the most impactful applications of artificial intelligence, serving as critical infrastructure connecting users, merchants, and platforms. However, most current industrial systems remain heavily reliant on historical co-occurrence patterns and log-fitting objectives, i.e., optimizing for past user interactions without explicitly modeling user intent. This log-fitting approach often leads to overfitting to narrow historical preferences, failing to capture users' evolving and latent interests. As a result, it reinforces filter bubbles and long-tail phenomena, ultimately harming user experience and threatening the sustainability of the whole recommendation ecosystem. To address these challenges, we rethink the overall design paradigm of recommender systems and propose RecGPT, a next-generation framework that places user intent at the center of the recommendation pipeline. By integrating large language models (LLMs) into key stages of user interest mining, item retrieval, and explanation generation, RecGPT transforms log-fitting recommendation into an intent-centric process. To effectively align general-purpose LLMs to the above domain-specific recommendation tasks at scale, RecGPT incorporates a multi-stage training paradigm, which integrates reasoning-enhanced pre-alignment and self-training evolution, guided by a Human-LLM cooperative judge system. Currently, RecGPT has been fully deployed on the Taobao App. Online experiments demonstrate that RecGPT achieves consistent performance gains across stakeholders: users benefit from increased content diversity and satisfaction, while merchants and the platform gain greater exposure and conversions. These comprehensive improvements across all stakeholders validate that LLM-driven, intent-centric design can foster a more sustainable and mutually beneficial recommendation ecosystem.

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.22879

β€’ PDF: https://arxiv.org/pdf/2507.22879

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title: Beyond Linear Bottlenecks: Spline-Based Knowledge Distillation for Culturally Diverse Art Style Classification

πŸ”Ή Publication Date: Published on Jul 31

πŸ”Ή Abstract: Enhancing dual-teacher self-supervised frameworks with Kolmogorov-Arnold Networks improves art style classification by better modeling nonlinear feature correlations and disentangling complex style manifolds. AI-generated summary: Art style classification remains a formidable challenge in computational aesthetics due to the scarcity of expertly labeled datasets and the intricate, often nonlinear interplay of stylistic elements. While recent dual-teacher self-supervised frameworks reduce reliance on labeled data, their linear projection layers and localized focus struggle to model global compositional context and complex style-feature interactions. We enhance the dual-teacher knowledge distillation framework to address these limitations by replacing conventional MLP projection and prediction heads with Kolmogorov-Arnold Networks (KANs). Our approach retains complementary guidance from two teacher networks, one emphasizing localized texture and brushstroke patterns, the other capturing broader stylistic hierarchies, while leveraging KANs' spline-based activations to model nonlinear feature correlations with mathematical precision. Experiments on WikiArt and Pandora18k demonstrate that our approach outperforms the base dual-teacher architecture in Top-1 accuracy. Our findings highlight the importance of KANs in disentangling complex style manifolds, leading to better linear probe accuracy than MLP projections.
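
πŸ”Ή Code sketch: The core swap is replacing MLP projection heads with KAN layers, where every input-output edge carries a learnable 1-D function. A minimal PyTorch stand-in using a Gaussian RBF basis in place of the B-spline parameterization from the KAN literature (a simplification for illustration, not the paper's exact head):

```python
import torch
import torch.nn as nn

class RBFKANLayer(nn.Module):
    """Minimal KAN-style layer: each input-output edge applies a learnable
    1-D function, parameterized here by fixed Gaussian radial basis
    functions with learnable mixing coefficients."""

    def __init__(self, in_dim, out_dim, n_basis=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, n_basis))
        self.gamma = ((x_max - x_min) / (n_basis - 1)) ** -2
        # One coefficient per (input, output, basis function) triple.
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, n_basis) * 0.1)

    def forward(self, x):                      # x: (batch, in_dim)
        # Evaluate each input against every basis center: (b, in, k)
        phi = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        # Sum learnable edge functions over inputs and basis: -> (b, out)
        return torch.einsum("bik,iok->bo", phi, self.coef)

head = RBFKANLayer(in_dim=256, out_dim=128)    # drop-in for an MLP projection
print(head(torch.randn(4, 256)).shape)         # torch.Size([4, 128])
```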

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.23436

β€’ PDF: https://arxiv.org/pdf/2507.23436

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture

πŸ”Ή Publication Date: Published on Jul 7

πŸ”Ή Abstract:
A high-speed 4D capturing system using low-FPS cameras with asynchronous capture and video-diffusion-based artifact correction enhances reconstruction quality. AI-generated summary: Reconstructing fast-dynamic scenes from multi-view videos is crucial for high-speed motion analysis and realistic 4D reconstruction. However, the majority of 4D capture systems are limited to frame rates below 30 FPS (frames per second), and a direct 4D reconstruction of high-speed motion from low-FPS input may lead to undesirable results. In this work, we propose a high-speed 4D capturing system using only low-FPS cameras, through novel capturing and processing modules. On the capturing side, we propose an asynchronous capture scheme that increases the effective frame rate by staggering the start times of cameras. By grouping cameras and leveraging a base frame rate of 25 FPS, our method achieves an equivalent frame rate of 100-200 FPS without requiring specialized high-speed cameras. On the processing side, we also propose a novel generative model to fix artifacts caused by 4D sparse-view reconstruction, as asynchrony reduces the number of viewpoints at each timestamp. Specifically, we propose to train a video-diffusion-based artifact-fix model for sparse 4D reconstruction, which refines missing details, maintains temporal consistency, and improves overall reconstruction quality. Experimental results demonstrate that our method significantly enhances high-speed 4D reconstruction compared to synchronous capture.
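
πŸ”Ή Code sketch: The capture-side idea is scheduling arithmetic: stagger camera-group start times so the groups interleave within one frame interval, multiplying the effective frame rate (4 groups x 25 FPS = 100 FPS; 8 groups = 200 FPS). A minimal sketch of the offset assignment:

```python
def stagger_offsets(n_cameras, n_groups, base_fps=25):
    """Assign start-time offsets so camera groups interleave in time.

    With n_groups groups each recording at base_fps, the combined capture
    samples time at n_groups * base_fps (e.g. 4 x 25 FPS = 100 FPS).
    Returns the start offset in seconds for each camera.
    """
    frame_interval = 1.0 / base_fps            # 40 ms at 25 FPS
    return [
        (cam % n_groups) * frame_interval / n_groups
        for cam in range(n_cameras)
    ]

print(stagger_offsets(n_cameras=8, n_groups=4))  # offsets 0, 10, 20, 30 ms
```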

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.05163
β€’ PDF: https://arxiv.org/pdf/2507.05163
β€’ Project Page: https://openimaginglab.github.io/4DSloMo/

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

πŸ”Ή Publication Date: Published on May 17, 2024

πŸ”Ή Abstract:
Mechanistic interpretability of neural networks is enhanced by a new technique that identifies and exploits degenerate parameters through the Interaction Basis, leading to sparser and more interpretable network representations. AI-generated summary: Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and parameterizations with more degeneracy are likely to generalize further. We identify three ways that network parameters can be degenerate: linear dependence between activations in a layer; linear dependence between gradients passed back to a layer; and ReLUs which fire on the same subset of datapoints. We also present a heuristic argument that modular networks are likely to be more degenerate, and we develop a metric for identifying modules in a network that is based on this argument. We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions. We introduce the Interaction Basis, a tractable technique to obtain a representation that is invariant to degeneracies from linear dependence of activations or Jacobians.
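
πŸ”Ή Code sketch: The first degeneracy type listed above, linear dependence between a layer's activations, can be detected directly from the singular values of the activation matrix. A small NumPy diagnostic in that spirit (this is not the paper's Interaction Basis itself):

```python
import numpy as np

def degenerate_directions(acts, tol=1e-6):
    """Count near-degenerate directions in a layer's activations.

    acts: (n_datapoints, n_neurons) activation matrix. Directions whose
    singular values are ~0 are linear combinations of the others and
    carry no independent information.
    """
    s = np.linalg.svd(acts - acts.mean(axis=0), compute_uv=False)
    return int(np.sum(s < tol * s.max()))

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 8))
# Append 4 columns that are linear combinations of the first 2.
acts = np.hstack([base, base[:, :2] @ rng.normal(size=(2, 4))])
print(degenerate_directions(acts))  # -> 4 degenerate directions
```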

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2405.10927
β€’ PDF: https://arxiv.org/pdf/2405.10927

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

πŸ”Ή Publication Date: Published on Jul 17

πŸ”Ή Abstract:
A sliding iterative denoising process is proposed to enhance spatio-temporal consistency in 4D diffusion models for high-fidelity view synthesis from sparse-view videos. AI-generated summary: This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. Previous methods solve the issue of insufficient observation by leveraging 4D diffusion models to generate videos at novel viewpoints. However, the generated videos from these models often lack spatio-temporal consistency, thus degrading view synthesis quality. In this paper, we propose a novel sliding iterative denoising process to enhance the spatio-temporal consistency of the 4D diffusion model. Specifically, we define a latent grid in which each latent encodes the image, camera pose, and human pose for a certain viewpoint and timestamp, then alternately denoise the latent grid along spatial and temporal dimensions with a sliding window, and finally decode the videos at target viewpoints from the corresponding denoised latents. Through the iterative sliding, information flows sufficiently across the latent grid, allowing the diffusion model to obtain a large receptive field and thus enhance the 4D consistency of the output, while keeping GPU memory consumption affordable. Experiments on the DNA-Rendering and ActorsHQ datasets demonstrate that our method is able to synthesize high-quality and consistent novel-view videos and significantly outperforms existing approaches. See our project page for interactive demos and video results: https://diffuman4d.github.io/
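
πŸ”Ή Code sketch: The sliding iterative denoising alternates passes over a (views x frames) latent grid, so information propagates across the whole grid even though each model call only sees one window. A schematic loop with a toy stand-in for the diffusion denoising step:

```python
import numpy as np

def sliding_denoise(latents, denoise_fn, window=4, stride=2, n_iters=3):
    """Alternately denoise a (views x frames) latent grid along temporal
    and spatial axes with a sliding window (schematic; denoise_fn stands
    in for one step of the 4D diffusion model on a slice of the grid)."""
    n_views, n_frames = latents.shape[:2]
    for it in range(n_iters):
        axis_len = n_frames if it % 2 == 0 else n_views
        for start in range(0, max(axis_len - window, 0) + 1, stride):
            sl = slice(start, start + window)
            if it % 2 == 0:   # temporal pass: all views, a window of frames
                latents[:, sl] = denoise_fn(latents[:, sl])
            else:             # spatial pass: a window of views, all frames
                latents[sl, :] = denoise_fn(latents[sl, :])
    return latents

grid = np.random.randn(6, 12, 16)                # (views, frames, latent dim)
out = sliding_denoise(grid, denoise_fn=lambda z: 0.9 * z)  # toy "denoiser"
print(out.shape)
```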

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.13344
β€’ PDF: https://arxiv.org/pdf/2507.13344
β€’ Project Page: https://diffuman4d.github.io/
β€’ Github: https://github.com/zju3dv/Diffuman4D

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
ForCenNet: Foreground-Centric Network for Document Image Rectification

πŸ”Ή Publication Date: Published on Jul 26

πŸ”Ή Abstract:
A Foreground-Centric Network for document image rectification improves on the state-of-the-art by effectively handling foreground elements and layout distortions. AI-generated summary: Document image rectification aims to eliminate geometric deformation in photographed documents to facilitate text recognition. However, existing methods often neglect the significance of foreground elements, which provide essential geometric references and layout information for document image correction. In this paper, we introduce the Foreground-Centric Network (ForCenNet) to eliminate geometric distortions in document images. Specifically, we first propose a foreground-centric label generation method, which extracts detailed foreground elements from an undistorted image. Then we introduce a foreground-centric mask mechanism to enhance the distinction between readable and background regions. Furthermore, we design a curvature consistency loss that leverages the detailed foreground labels to help the model understand the distorted geometric distribution. Extensive experiments demonstrate that ForCenNet achieves new state-of-the-art results on four real-world benchmarks: DocUNet, DIR300, WarpDoc, and DocReal. Quantitative analysis shows that the proposed method effectively undistorts layout elements, such as text lines and table borders. The resources for further comparison are provided at https://github.com/caipeng328/ForCenNet.
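
πŸ”Ή Code sketch: The abstract's curvature consistency loss plausibly penalizes residual curvature of rectified foreground lines, since text lines and table borders in a flat document should come out straight. A hedged guess at such a term (the paper's exact formulation may well differ):

```python
import torch

def curvature_loss(points):
    """Schematic curvature penalty over sampled text-line points.

    points: (n_lines, n_points, 2) coordinates of foreground lines in the
    rectified output. The discrete second difference along each line is
    driven toward zero, i.e. toward straight lines.
    """
    second_diff = points[:, 2:] - 2 * points[:, 1:-1] + points[:, :-2]
    return second_diff.pow(2).sum(dim=-1).mean()

lines = torch.randn(8, 32, 2, requires_grad=True)  # toy sampled text lines
loss = curvature_loss(lines)
loss.backward()   # usable as an auxiliary training term
print(float(loss))
```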

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.19804
β€’ PDF: https://arxiv.org/pdf/2507.19804
β€’ Github: https://github.com/caipeng328/ForCenNet

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title: Phi-Ground Tech Report: Advancing Perception in GUI Grounding

πŸ”Ή Publication Date: Published on Jul 31

πŸ”Ή Abstract: The Phi-Ground model family achieves state-of-the-art performance in GUI grounding for multimodal reasoning models, improving accuracy across various benchmarks. AI-generated summary: With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from "Iron Man", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly leads to the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from being ready for deployment, as a single misclick can result in unacceptable consequences. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance across all five grounding benchmarks for models under 10B parameters in agent settings. In the end-to-end model setting, our model still achieves SOTA results with scores of 43.2 on ScreenSpot-pro and 27.2 on UI-Vision. We believe that the various details discussed in this paper, along with our successes and failures, not only clarify the construction of grounding models but also benefit other perception tasks. Project homepage: https://zhangmiaosen2000.github.io/Phi-Ground/
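
πŸ”Ή Code sketch: GUI grounding is typically scored by whether the predicted click lands inside the target element's bounding box, which is how benchmarks in the ScreenSpot family count accuracy (stated here as an assumption about the scoring rule). A minimal evaluator:

```python
def grounding_accuracy(predictions, targets):
    """Fraction of predicted clicks landing inside the target element.

    predictions: list of (x, y) click coordinates from the model.
    targets: list of (x0, y0, x1, y1) ground-truth element boxes.
    """
    hits = sum(
        x0 <= x <= x1 and y0 <= y <= y1
        for (x, y), (x0, y0, x1, y1) in zip(predictions, targets)
    )
    return hits / len(targets)

preds = [(100, 50), (300, 400)]
boxes = [(90, 40, 120, 60), (0, 0, 50, 50)]
print(grounding_accuracy(preds, boxes))  # 0.5: one hit, one misclick
```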

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.23779

β€’ PDF: https://arxiv.org/pdf/2507.23779

β€’ Project Page: https://zhangmiaosen2000.github.io/Phi-Ground/

β€’ Github: https://github.com/zhangmiaosen2000/Phi-Ground

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks

πŸ”Ή Publication Date: Published on Mar 13

πŸ”Ή Abstract:
A study analyzes various Gradient Inversion Attacks (GIA) in Federated Learning (FL), categorizing them and providing insights into their effectiveness and practicality, while suggesting a defense pipeline for privacy protection. AI-generated summary: Federated Learning (FL) has emerged as a promising privacy-preserving collaborative model training paradigm without sharing raw data. However, recent studies have revealed that private information can still be leaked through shared gradient information and attacked by Gradient Inversion Attacks (GIA). While many GIA methods have been proposed, a detailed analysis, evaluation, and summary of these methods is still lacking. Although various survey papers summarize existing privacy attacks in FL, few studies have conducted extensive experiments to unveil the effectiveness of GIA and their associated limiting factors in this context. To fill this gap, we first undertake a systematic review of GIA and categorize existing methods into three types: optimization-based GIA (OP-GIA), generation-based GIA (GEN-GIA), and analytics-based GIA (ANA-GIA). Then, we comprehensively analyze and evaluate the three types of GIA in FL, providing insights into the factors that influence their performance, practicality, and potential threats. Our findings indicate that OP-GIA is the most practical attack setting despite its unsatisfactory performance, while GEN-GIA has many dependencies and ANA-GIA is easily detectable, making them both impractical. Finally, we offer a three-stage defense pipeline to users when designing FL frameworks and protocols for better privacy protection, and we share some future research directions from the perspectives of attackers and defenders that we believe should be pursued. We hope that our study can help researchers design more robust FL frameworks to defend against these attacks.
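
πŸ”Ή Code sketch: OP-GIA, the family the study finds most practical, optimizes a dummy input so that its gradient matches the gradient the client shared. A minimal PyTorch demonstration on a linear model, assuming the label is known (a common simplification in this attack literature):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 4)
loss_fn = nn.CrossEntropyLoss()

# The "client" computes a gradient on its private datum and shares it.
x_true = torch.randn(1, 16)
y_true = torch.tensor([2])
true_grads = torch.autograd.grad(
    loss_fn(model(x_true), y_true), model.parameters())

# The attacker optimizes a dummy input to reproduce that gradient.
x_dummy = torch.randn(1, 16, requires_grad=True)
opt = torch.optim.Adam([x_dummy], lr=0.1)
for step in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_true), model.parameters(),
        create_graph=True)                 # keep graph to optimize x_dummy
    grad_match = sum(((dg - tg) ** 2).sum()
                     for dg, tg in zip(dummy_grads, true_grads))
    grad_match.backward()
    opt.step()

print(f"reconstruction error: {(x_dummy - x_true).norm():.4f}")
```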

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2503.11514
β€’ PDF: https://arxiv.org/pdf/2503.11514
β€’ Project Page: https://pengxin-guo.github.io/FLPrivacy/

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
π^3: Scalable Permutation-Equivariant Visual Geometry Learning

πŸ”Ή Publication Date: Published on Jul 17

πŸ”Ή Abstract:
A permutation-equivariant neural network, π^3, reconstructs visual geometry without a fixed reference view, achieving state-of-the-art performance in camera pose estimation, depth estimation, and point map reconstruction. AI-generated summary: We introduce π^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, π^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
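
πŸ”Ή Code sketch: Permutation equivariance means reordering the input views merely reorders the per-view outputs. A quick property test one could run on any per-view model (illustrative; π^3's real inputs are images, not feature vectors):

```python
import numpy as np

def is_permutation_equivariant(f, x, seed=0):
    """Check f(permute(x)) == permute(f(x)) for a random permutation.

    x: (n_views, d) array of per-view inputs; f maps it to per-view
    outputs. A reference-free model should pass for every permutation.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    return np.allclose(f(x[perm]), f(x)[perm])

# A per-view map built from permutation-invariant context is equivariant:
f = lambda x: x - x.mean(axis=0)      # subtract the order-free mean "view"
print(is_permutation_equivariant(f, np.random.randn(5, 3)))  # True
```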

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.13347
β€’ PDF: https://arxiv.org/pdf/2507.13347
β€’ Project Page: https://yyfz.github.io/pi3/

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
β€’ https://huggingface.co/spaces/yyfz233/Pi3
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title: Persona Vectors: Monitoring and Controlling Character Traits in Language Models

πŸ”Ή Publication Date: Published on Jul 29

πŸ”Ή Abstract: Persona vectors in large language models can monitor and control personality changes during training and deployment, enabling the identification and mitigation of undesirable traits. AI-generated summary: Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space (persona vectors) underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant's personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.
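
πŸ”Ή Code sketch: Monitoring reduces to projecting hidden activations onto the persona direction, and post-hoc intervention to shifting activations along it. A minimal NumPy sketch of both operations (how the vector itself is extracted is out of scope here):

```python
import numpy as np

def persona_score(activations, persona_vector):
    """Project hidden activations onto a persona direction.

    activations: (n_tokens, d_model); persona_vector: (d_model,).
    A rising mean projection flags a shift toward the trait.
    """
    v = persona_vector / np.linalg.norm(persona_vector)
    return activations @ v

def steer(activations, persona_vector, alpha=-1.0):
    """Post-hoc intervention: shift activations along the persona
    direction. Negative alpha pushes them away from the trait."""
    v = persona_vector / np.linalg.norm(persona_vector)
    return activations + alpha * v

rng = np.random.default_rng(0)
acts, v = rng.normal(size=(10, 64)), rng.normal(size=64)
print(persona_score(acts, v).mean())           # before intervention
print(persona_score(steer(acts, v), v).mean())  # lower after steering
```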

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.21509

β€’ PDF: https://arxiv.org/pdf/2507.21509

β€’ Github: https://github.com/safety-research/persona_vectors/tree/main

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title: Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

πŸ”Ή Publication Date: Published on Aug 1

πŸ”Ή Abstract: DAEDAL, a novel training-free denoising strategy, enables dynamic length adaptation in Diffusion Large Language Models, improving performance and computational efficiency. AI-generated summary: Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive Large Language Models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length allocation leads to a problematic trade-off: insufficient lengths cripple performance on complex tasks, while excessive lengths incur significant computational overhead and sometimes result in performance degradation. While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion Large Language Models. DAEDAL operates in two phases: 1) Before the denoising process, DAEDAL starts from a short initial length and iteratively expands it to a coarse task-appropriate length, guided by a sequence completion metric. 2) During the denoising process, DAEDAL dynamically intervenes by pinpointing and expanding insufficient generation regions through mask token insertion, ensuring the final output is fully developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves performance comparable, and in some cases superior, to meticulously tuned fixed-length baselines, while simultaneously enhancing computational efficiency by achieving a higher effective token ratio. By resolving the static length constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap with their Autoregressive counterparts and paving the way for more efficient and capable generation.
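
πŸ”Ή Code sketch: Phase 1 grows a short mask canvas until an internal sequence-completion signal says the length suffices. A schematic loop with a toy completion metric standing in for the model's internal signal (names and thresholds here are illustrative, not the paper's):

```python
MASK = "<mask>"

def expand_length(tokens, completion_score, max_len=256, grow=32):
    """Phase-1 of a DAEDAL-style loop (schematic): start short and append
    mask tokens until a sequence-completion metric is satisfied.

    completion_score(tokens) -> float in [0, 1]; stands in for the model's
    internal signal that the canvas is long enough for the task.
    """
    while len(tokens) < max_len and completion_score(tokens) < 0.9:
        tokens = tokens + [MASK] * grow   # give the denoiser more room
    return tokens

# Toy signal: pretend the task needs roughly 100 token slots.
score = lambda toks: min(len(toks) / 100, 1.0)
print(len(expand_length([MASK] * 16, score)))  # 112: grown past the need
```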

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2508.00819

β€’ PDF: https://arxiv.org/pdf/2508.00819

β€’ Github: https://github.com/Li-Jinsong/DAEDAL

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:
βœ“ https://t.iss.one/DataScienceT
πŸ”Ή Title:
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

πŸ”Ή Publication Date: Published on Jul 14

πŸ”Ή Abstract:
Artic addresses latency issues in AI Video Chat by optimizing video streaming and frame rate adaptation to enhance MLLM accuracy and reduce bitrate. AI-generated summary: AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human, but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, this poses significant challenges to latency, because the MLLM inference takes up most of the response time, leaving very little time for video streaming. Due to network uncertainty and instability, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we propose Artic, an AI-oriented Real-time Communication framework, exploring the network requirement shift from "humans watching video" to "AI understanding video". To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming that recognizes the importance of each video region for chat and allocates bitrate almost exclusively to chat-important regions. To avoid packet retransmission, we propose Loss-Resilient Adaptive Frame Rate that leverages previous frames to substitute for lost/delayed frames while avoiding bitrate waste. To evaluate the impact of video streaming quality on MLLM accuracy, we build the first benchmark, named Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss some open questions and ongoing solutions for AI Video Chat.
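
πŸ”Ή Code sketch: Context-Aware Video Streaming allocates bitrate almost exclusively to chat-important regions. One simple way to realize that policy is proportional allocation over region-importance scores with a small per-region floor (an illustrative scheme, not the paper's exact allocator):

```python
def allocate_bitrate(region_importance, total_kbps, floor_kbps=5):
    """Context-aware bitrate split (schematic): give each region a small
    floor so the frame stays decodable, then divide the remaining budget
    in proportion to chat-importance scores (e.g. derived from which
    regions the MLLM needs to answer the user's question)."""
    n = len(region_importance)
    budget = total_kbps - floor_kbps * n
    total_imp = sum(region_importance) or 1.0
    return [floor_kbps + budget * imp / total_imp
            for imp in region_importance]

# Four regions; the second contains what the user is asking about.
print(allocate_bitrate([0.05, 0.8, 0.1, 0.05], total_kbps=500))
```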

πŸ”Ή Links:
β€’ arXiv Page: https://arxiv.org/abs/2507.10510
β€’ PDF: https://arxiv.org/pdf/2507.10510

πŸ”Ή Datasets citing this paper:
No datasets found

πŸ”Ή Spaces citing this paper:
No spaces found
==================================

For more data science resources:

βœ“ https://t.iss.one/DataScienceT