Data Science | Machine Learning with Python for Researchers
31.3K subscribers
1.46K photos
102 videos
22 files
1.74K links
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title: A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

🔹 Publication Date: Published on Aug 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.18106
• PDF: https://arxiv.org/pdf/2508.18106
• Github: https://github.com/Tencent/AICGSecEval

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21112
• PDF: https://arxiv.org/pdf/2508.21112
• Project Page: https://eo-robotics.ai/eo-1
• Github: https://github.com/EO-Robotics/EO-1

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21113
• PDF: https://arxiv.org/pdf/2508.21113
• Github: https://github.com/yannqi/R-4B

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.13618
• PDF: https://arxiv.org/pdf/2508.13618
• Project Page: https://freedomintelligence.github.io/talk-vid/
• Github: https://github.com/FreedomIntelligence/TalkVid

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Efficient Code Embeddings from Code Generation Models

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21290
• PDF: https://arxiv.org/pdf/2508.21290

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

🔹 Publication Date: Published on Aug 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.17677
• PDF: https://arxiv.org/pdf/2508.17677

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: UItron: Foundational GUI Agent with Advanced Perception and Planning

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21767
• PDF: https://arxiv.org/pdf/2508.21767

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Morae: Proactively Pausing UI Agents for User Choices

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21456
• PDF: https://arxiv.org/pdf/2508.21456

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: AHELM: A Holistic Evaluation of Audio-Language Models

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21376
• PDF: https://arxiv.org/pdf/2508.21376
• Project Page: https://crfm.stanford.edu/helm/audio/v1.0.0/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21365
• PDF: https://arxiv.org/pdf/2508.21365

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

🔹 Publication Date: Published on Aug 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.20085
• PDF: https://arxiv.org/pdf/2508.20085
• Project Page: https://gemcollector.github.io/HERMES/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: HPSv3: Towards Wide-Spectrum Human Preference Score

🔹 Publication Date: Published on Aug 5

🔹 AI-generated summary: HPSv3, a human preference score using a wide-spectrum dataset and uncertainty-aware ranking loss, enhances text-to-image generation quality through iterative refinement.

🔹 Abstract: Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons from state-of-the-art generative models and low to high-quality real-world images. (2) We introduce a VLM-based preference model trained using an uncertainty-aware ranking loss for fine-grained ranking. Besides, we propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that enhances quality without extra data, using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 serves as a robust metric for wide-spectrum image evaluation, and CoHP offers an efficient and human-aligned approach to improve image generation quality. The code and dataset are available at the HPSv3 Homepage.
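
🔹 Illustrative sketch (not from the paper): the abstract describes CoHP as iterative refinement that uses a preference score to select the best image at each step. The Python below is a minimal sketch of that selection loop only, under stated assumptions. The names refine_by_preference, generate, and score are hypothetical, the "images" in the toy demo are plain floats, and carrying the best candidate forward across rounds is an assumption of this sketch rather than a confirmed detail of CoHP or of the HPSv3 code.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")  # stands in for an image type; hypothetical


def refine_by_preference(
    prompt: str,
    generate: Callable[[str, Optional[T]], List[T]],
    score: Callable[[str, T], float],
    rounds: int = 3,
) -> Tuple[T, List[float]]:
    """Keep the highest-scoring candidate seen so far after each round.

    generate(prompt, seed) returns candidates, optionally conditioned on the
    current best; score(prompt, candidate) plays the role of a preference
    model. Both are assumptions of this sketch, not the HPSv3 API.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    best: Optional[T] = None
    best_score = float("-inf")
    history: List[float] = []
    for _ in range(rounds):
        for candidate in generate(prompt, best):
            s = score(prompt, candidate)
            if s > best_score:  # strictly better candidates replace the best
                best, best_score = candidate, s
        history.append(best_score)  # best score after this round
    if best is None:
        raise ValueError("generate() returned no candidates")
    return best, history


# Toy demo: candidates are floats and the "preference" peaks at 2.0.
def toy_generate(prompt: str, seed: Optional[float]) -> List[float]:
    center = 0.0 if seed is None else seed
    return [center - 1.0, center + 0.5, center + 1.0]


def toy_score(prompt: str, x: float) -> float:
    return -abs(x - 2.0)


best, history = refine_by_preference("a red cube", toy_generate, toy_score)
```

Because the best candidate is carried forward, the recorded score never decreases from one round to the next. A real setup would replace toy_generate with an image generator and toy_score with a learned preference model.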

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03789
• PDF: https://arxiv.org/pdf/2508.03789
• Project Page: https://mizzenai.github.io/HPSv3.project/
• Github: https://github.com/MizzenAI/HPSv3

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://www.arxiv.org/abs/2508.20470
• PDF: https://arxiv.org/pdf/2508.20470
• Project Page: https://dropletx.github.io/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/DropletX/Droplet3D-4M

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CLIPSym: Delving into Symmetry Detection with CLIP

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14197
• PDF: https://arxiv.org/pdf/2508.14197
• Github: https://github.com/timyoung2333/CLIPSym

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21148
• PDF: https://arxiv.org/pdf/2508.21148
• Github: https://github.com/open-sciencelab/Awesome-Scientific-Datasets-and-LLMs

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Model-Task Alignment Drives Distinct RL Outcomes

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21188
• PDF: https://arxiv.org/pdf/2508.21188

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

🔹 Publication Date: Published on Aug 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.17380
• PDF: https://arxiv.org/pdf/2508.17380
• Project Page: https://jiaaqiliu.github.io/VIPER-R1/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21172
• PDF: https://arxiv.org/pdf/2508.21172
• Github: https://github.com/NennoMP/deepresesn

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Quantization Robustness to Input Degradations for Object Detection

🔹 Publication Date: Published on Aug 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.19600
• PDF: https://arxiv.org/pdf/2508.19600

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks

🔹 Publication Date: Published on Aug 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.17008
• Collection: https://huggingface.co/collections/yhua219/edurabsa-dataset-68b59bad56a9e1384de7faf2
• PDF: https://arxiv.org/pdf/2508.17008
• Github: https://github.com/yhua219/edurabsa_dataset_and_annotation_tool

🔹 Datasets citing this paper:
https://huggingface.co/datasets/yhua219/EduRABSA_ASTE
https://huggingface.co/datasets/yhua219/EduRABSA_AOPE
https://huggingface.co/datasets/yhua219/EduRABSA_ASQE
https://huggingface.co/datasets/yhua219/EduRABSA_ACD

🔹 Spaces citing this paper:
No spaces found
==================================
