Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title: Morae: Proactively Pausing UI Agents for User Choices

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21456
• PDF: https://arxiv.org/pdf/2508.21456

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: AHELM: A Holistic Evaluation of Audio-Language Models

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21376
• PDF: https://arxiv.org/pdf/2508.21376
• Project Page: https://crfm.stanford.edu/helm/audio/v1.0.0/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21365
• PDF: https://arxiv.org/pdf/2508.21365

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

🔹 Publication Date: Published on Aug 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.20085
• PDF: https://arxiv.org/pdf/2508.20085
• Project Page: https://gemcollector.github.io/HERMES/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: HPSv3: Towards Wide-Spectrum Human Preference Score

🔹 Publication Date: Published on Aug 5

🔹 AI-generated summary: HPSv3, a human preference score using a wide-spectrum dataset and uncertainty-aware ranking loss, enhances text-to-image generation quality through iterative refinement.

🔹 Abstract: Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons from state-of-the-art generative models and low to high-quality real-world images. (2) We introduce a VLM-based preference model trained using an uncertainty-aware ranking loss for fine-grained ranking. Besides, we propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that enhances quality without extra data, using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 serves as a robust metric for wide-spectrum image evaluation, and CoHP offers an efficient and human-aligned approach to improve image generation quality. The code and dataset are available at the HPSv3 Homepage.
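
The abstract describes CoHP only at a high level: generate images, score them with HPSv3, keep the best one at each step. The sketch below illustrates that select-the-best-per-round loop in plain Python. It is not the authors' implementation. The names `iterative_best_of_n`, `propose`, and `score` are hypothetical, the toy "image" is a single float, and keeping the incumbent when no candidate improves is an assumption that the abstract does not confirm.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def iterative_best_of_n(
    seed_item: T,
    propose: Callable[[T, random.Random], T],  # hypothetical generator: refine an item
    score: Callable[[T], float],               # hypothetical stand-in for a preference score
    rounds: int = 4,
    candidates_per_round: int = 8,
    rng_seed: int = 0,
) -> Tuple[T, List[float]]:
    """Keep the highest-scoring candidate after each round of proposals."""
    rng = random.Random(rng_seed)
    best, best_score = seed_item, score(seed_item)
    history = [best_score]
    for _ in range(rounds):
        # Propose several refinements of the current best item.
        pool = [propose(best, rng) for _ in range(candidates_per_round)]
        top = max(pool, key=score)
        top_score = score(top)
        # Assumption: the incumbent is kept if no candidate scores higher.
        if top_score > best_score:
            best, best_score = top, top_score
        history.append(best_score)
    return best, history


# Toy stand-ins: an "image" is a float, and the scorer prefers values near 3.0.
def toy_propose(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-1.0, 1.0)


def toy_score(x: float) -> float:
    return -abs(x - 3.0)


best, history = iterative_best_of_n(0.0, toy_propose, toy_score)
print(best, history)
```

In a real setting `propose` would call an image generator and `score` would call a preference model; both are left abstract here because the post does not describe their interfaces.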

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03789
• PDF: https://arxiv.org/pdf/2508.03789
• Project Page: https://mizzenai.github.io/HPSv3.project/
• Github: https://github.com/MizzenAI/HPSv3

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://www.arxiv.org/abs/2508.20470
• PDF: https://arxiv.org/pdf/2508.20470
• Project Page: https://dropletx.github.io/

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/DropletX/Droplet3D-4M

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CLIPSym: Delving into Symmetry Detection with CLIP

🔹 Publication Date: Published on Aug 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.14197
• PDF: https://arxiv.org/pdf/2508.14197
• Github: https://github.com/timyoung2333/CLIPSym

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21148
• PDF: https://arxiv.org/pdf/2508.21148
• Github: https://github.com/open-sciencelab/Awesome-Scientific-Datasets-and-LLMs

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Model-Task Alignment Drives Distinct RL Outcomes

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21188
• PDF: https://arxiv.org/pdf/2508.21188

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

🔹 Publication Date: Published on Aug 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.17380
• PDF: https://arxiv.org/pdf/2508.17380
• Project Page: https://jiaaqiliu.github.io/VIPER-R1/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21172
• PDF: https://arxiv.org/pdf/2508.21172
• Github: https://github.com/NennoMP/deepresesn

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Quantization Robustness to Input Degradations for Object Detection

🔹 Publication Date: Published on Aug 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.19600
• PDF: https://arxiv.org/pdf/2508.19600

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks

🔹 Publication Date: Published on Aug 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.17008
• PDF: https://arxiv.org/pdf/2508.17008
• Hugging Face Collection: https://huggingface.co/collections/yhua219/edurabsa-dataset-68b59bad56a9e1384de7faf2
• Github: https://github.com/yhua219/edurabsa_dataset_and_annotation_tool

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/yhua219/EduRABSA_ASTE
• https://huggingface.co/datasets/yhua219/EduRABSA_AOPE
• https://huggingface.co/datasets/yhua219/EduRABSA_ASQE
• https://huggingface.co/datasets/yhua219/EduRABSA_ACD

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21104
• PDF: https://arxiv.org/pdf/2508.21104

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes

🔹 Publication Date: Published on Aug 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.19060
• PDF: https://arxiv.org/pdf/2508.19060
• Github: https://github.com/blaz-r/SuperSimplenet

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SWE-Exp: Experience-Driven Software Issue Resolution

🔹 Publication Date: Published on Jul 31

🔹 AI-generated summary: SWE-Exp enhances software issue resolution by systematically accumulating and leveraging repair expertise from past agent experiences, improving resolution rates.

🔹 Abstract: Recent advances in large language model (LLM) agents have shown remarkable progress in software issue resolution, leveraging advanced techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers - treating each problem separately without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed chances to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels - from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified under open-source agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, fundamentally shifting from trial-and-error exploration to strategic, experience-driven issue resolution.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23361
• PDF: https://arxiv.org/pdf/2507.23361
• Github: https://github.com/YerbaPage/SWE-Exp

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.20931
• PDF: https://arxiv.org/pdf/2508.20931

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management

🔹 Publication Date: Published on Aug 6

🔹 AI-generated summary: Sculptor, a framework for Active Context Management, enhances LLM performance on long contexts by enabling proactive attention and memory control, reducing proactive interference and improving reasoning reliability.

🔹 Abstract: Large Language Models (LLMs) suffer from significant performance degradation when processing long contexts due to proactive interference, where irrelevant information in earlier parts of the context disrupts reasoning and memory recall. While most research focuses on external memory systems to augment LLMs' capabilities, we propose a complementary approach: empowering LLMs with Active Context Management (ACM) tools to actively sculpt their internal working memory. We introduce Sculptor, a framework that equips LLMs with three categories of tools: (1) context fragmentation, (2) summary, hide, and restore, and (3) intelligent search. Our approach enables LLMs to proactively manage their attention and working memory, analogous to how humans selectively focus on relevant information while filtering out distractions. Experimental evaluation on information-sparse benchmarks, PI-LLM (proactive interference) and NeedleBench Multi-Needle Reasoning, demonstrates that Sculptor significantly improves performance even without specific training, leveraging LLMs' inherent tool calling generalization capabilities. By enabling Active Context Management, Sculptor not only mitigates proactive interference but also provides a cognitive foundation for more reliable reasoning across diverse long-context tasks, highlighting that explicit context-control strategies, rather than merely larger token windows, are key to robustness at scale.
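
To make the three tool categories named in the abstract more concrete (fragmentation; summary, hide, and restore; search), here is a minimal sketch of a context store that exposes them. It is an illustration under assumptions, not Sculptor's API: the class `ContextStore`, its method names, the fixed-size word chunking, and the keyword-match search are all hypothetical, and the summary is supplied by the caller rather than generated by a model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fragment:
    text: str
    hidden: bool = False
    summary: str = ""


@dataclass
class ContextStore:
    """Hypothetical working-memory store; not the paper's actual interface."""
    fragments: Dict[int, Fragment] = field(default_factory=dict)

    def fragment(self, text: str, size: int) -> List[int]:
        # (1) Context fragmentation: split text into chunks of `size` words.
        words = text.split()
        ids = []
        for start in range(0, len(words), size):
            fid = len(self.fragments)
            self.fragments[fid] = Fragment(" ".join(words[start:start + size]))
            ids.append(fid)
        return ids

    def hide(self, fid: int, summary: str = "") -> None:
        # (2) Hide a fragment, optionally leaving a caller-written summary.
        frag = self.fragments[fid]
        frag.hidden, frag.summary = True, summary

    def restore(self, fid: int) -> None:
        # (2) Restore a previously hidden fragment.
        self.fragments[fid].hidden = False

    def search(self, keyword: str) -> List[int]:
        # (3) Search: plain keyword match over all fragments, hidden or not.
        return [fid for fid, f in self.fragments.items()
                if keyword.lower() in f.text.lower()]

    def render(self) -> str:
        # Build the text a model would see, with hidden fragments collapsed.
        parts = []
        for fid, f in sorted(self.fragments.items()):
            if f.hidden:
                note = f": {f.summary}" if f.summary else ""
                parts.append(f"[fragment {fid} hidden{note}]")
            else:
                parts.append(f.text)
        return "\n".join(parts)


store = ContextStore()
ids = store.fragment("alpha beta gamma delta epsilon zeta", size=2)
store.hide(1, summary="two Greek letters")
compact_view = store.render()   # fragment 1 is collapsed to its summary
hits = store.search("delta")    # finds the hidden fragment
store.restore(hits[0])
full_view = store.render()      # fragment 1 is visible again
print(compact_view)
print(full_view)
```

In the setting the abstract describes, the model itself would invoke such operations through tool calls; here they are called directly to keep the sketch self-contained.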

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04664
• PDF: https://arxiv.org/pdf/2508.04664

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: UI-Level Evaluation of ALLaM 34B: Measuring an Arabic-Centric LLM via HUMAIN Chat

🔹 Publication Date: Published on Aug 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.17378
• PDF: https://arxiv.org/pdf/2508.17378

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

🔹 Publication Date: Published on Aug 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.19813
• PDF: https://arxiv.org/pdf/2508.19813

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/Tele-AI/TeleTableBench

🔹 Spaces citing this paper:
No spaces found
==================================
