Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title: WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

📝 Summary:
WebSailor is a post-training method that enables open-source AI models to match the performance of proprietary agents in complex information-seeking tasks. It does this by instilling the ability to systematically reduce uncertainty, closing a key capability gap.

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13305
• PDF: https://arxiv.org/pdf/2509.13305
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: WebSailor: Navigating Super-human Reasoning for Web Agent

📝 Summary:
WebSailor is a post-training method that teaches open-source LLMs to reduce extreme uncertainty in complex information-seeking tasks. It matches the superhuman reasoning of proprietary agents, closing the capability gap.

🔹 Publication Date: Published on Jul 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.02592
• PDF: https://arxiv.org/pdf/2507.02592
• Project Page: https://github.com/Alibaba-NLP/WebAgent
• Github: https://github.com/Alibaba-NLP/WebAgent

==================================

🔹 Title: ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

📝 Summary:
ThinkMorph is a unified model that enhances multimodal reasoning by generating complementary text-image steps that manipulate visual content with coherent verbal logic. It achieves significant performance gains, generalizes effectively, and demonstrates emergent multimodal intelligence, including...

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27492
• PDF: https://arxiv.org/pdf/2510.27492
• Project Page: https://thinkmorph.github.io/
• Github: https://github.com/ThinkMorph/ThinkMorph

🔹 Models citing this paper:
https://huggingface.co/ThinkMorph/ThinkMorph-7B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/ThinkMorph/Jigsaw_Assembly
https://huggingface.co/datasets/ThinkMorph/Visual_Search
https://huggingface.co/datasets/ThinkMorph/Chart_Refocus

==================================

🔹 Title: OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

📝 Summary:
OS-Sentinel is a new hybrid framework that improves safety detection for mobile AI agents. It combines a Formal Verifier with a VLM-based Contextual Judge to identify both explicit system violations and contextual risks, showing significant performance gains.

🔹 Publication Date: Published on Oct 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24411
• PDF: https://arxiv.org/pdf/2510.24411
• Github: https://github.com/OS-Copilot/OS-Sentinel

==================================

🔹 Title: INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

📝 Summary:
This paper compares FP and INT quantization, challenging the trend towards FP. It finds fine-grained MXINT8 outperforms FP in 8-bit formats for accuracy and efficiency. For 4-bit, FP often leads, but INT can surpass it, suggesting fine-grained INT offers a better balance for future AI accelerators.
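
A minimal sketch of why fine-grained scaling helps INT formats, assuming a block size of 32 and one shared power-of-two scale per block (loosely modeled on MX-style formats). The function names and the toy data are hypothetical and this is not the paper's code. It compares one scale for the whole tensor against one scale per block when a single outlier is present.

```python
import math

def quantize_block_int8(block):
    """Quantize one block to INT8 with a shared power-of-two scale,
    then return the dequantized values."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    # Smallest power-of-two scale that keeps max_abs / scale within 127.
    exp = math.ceil(math.log2(max_abs / 127.0))
    scale = 2.0 ** exp
    return [max(-127, min(127, round(x / scale))) * scale for x in block]

def quantize_blockwise(values, block_size=32):
    """Apply quantize_block_int8 independently to each block."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_block_int8(values[start:start + block_size]))
    return out

def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Toy tensor: 64 small values, with one large outlier in the second block.
values = [0.01 * ((i % 7) - 3) for i in range(64)]
values[40] = 50.0

per_tensor = quantize_block_int8(values)    # one scale for all 64 values
per_block = quantize_blockwise(values, 32)  # one scale per block of 32

# Error on the first block, which does not contain the outlier.
err_tensor_first = max_abs_error(values[:32], per_tensor[:32])
err_block_first = max_abs_error(values[:32], per_block[:32])
print(f"per-tensor scale: {err_tensor_first:.6f}")
print(f"per-block scale:  {err_block_first:.6f}")
```

With a single tensor-wide scale the outlier forces a coarse step size, so the small values in the other block lose most of their precision. A per-block scale confines that damage to the block that holds the outlier.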

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25602
• PDF: https://arxiv.org/pdf/2510.25602
• Github: https://github.com/ChenMnZ/INT_vs_FP

==================================

🔹 Title: π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

📝 Summary:
π_RL enables online RL fine-tuning for flow-based VLA models, overcoming the RL challenges specific to them. It uses novel algorithms to significantly boost VLA model performance and generalization on robotic tasks.

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25889
• PDF: https://arxiv.org/pdf/2510.25889
• Project Page: https://rlinf.readthedocs.io/en/latest/rst_source/examples/pi0.html
• Github: https://github.com/RLinf/RLinf

==================================

🔹 Title: Continuous Autoregressive Language Models

📝 Summary:
LLM efficiency is hampered by sequential token generation. Continuous Autoregressive Language Models (CALM) address this by predicting continuous vectors, each representing multiple tokens. This significantly reduces generative steps, boosting efficiency and establishing a scalable path for ultra-e...

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27688
• PDF: https://arxiv.org/pdf/2510.27688
• Project Page: https://shaochenze.github.io/blog/2025/CALM/
• Github: https://shaochenze.github.io/blog/2025/CALM

🔹 Models citing this paper:
https://huggingface.co/cccczshao/CALM-M
https://huggingface.co/cccczshao/CALM-L
https://huggingface.co/cccczshao/CALM-XL

==================================

🔹 Title: Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

📝 Summary:
Spatial-SSRL is a self-supervised reinforcement learning method that enhances LVLM spatial understanding. It uses five pretext tasks derived from RGB or RGB-D images to generate verifiable signals, avoiding costly human supervision. This approach significantly improves spatial reasoning while mai...

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27606
• PDF: https://arxiv.org/pdf/2510.27606
• Github: https://github.com/InternLM/Spatial-SSRL

🔹 Models citing this paper:
https://huggingface.co/internlm/Spatial-SSRL-7B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/internlm/Spatial-SSRL-81k

==================================

🔹 Title: HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

📝 Summary:
GUI agents are overconfident and unreliable in grounding. HyperClick improves reliability through a dual reward mechanism that calibrates spatial confidence and reduces overconfidence. It achieves state-of-the-art performance for dependable GUI automation.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27266
• PDF: https://arxiv.org/pdf/2510.27266
• Github: https://github.com/xiaomi-research/hyperclick

==================================

🔹 Title: Defeating the Training-Inference Mismatch via FP16

📝 Summary:
RL fine-tuning of LLMs is unstable due to a numerical mismatch caused by BF16's rounding errors. We found that simply using FP16 effectively resolves this issue, leading to more stable optimization, faster convergence, and stronger performance. This simple change requires no model or algorithm mod...
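
A minimal sketch of the rounding gap between the two formats, assuming round-to-nearest-even for both. It only illustrates per-value rounding error and does not reproduce the paper's RL setup. The helper names are hypothetical. FP16 keeps 10 mantissa bits and BF16 keeps 7, so BF16 rounds more coarsely, while FP16 pays for that with a much narrower dynamic range.

```python
import struct

def round_to_fp16(x):
    """Round a Python float to IEEE half precision (10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_bf16(x):
    """Round a Python float to bfloat16 (7 mantissa bits) by keeping the
    top 16 bits of its float32 encoding, with round-to-nearest-even.
    Intended for ordinary finite values only."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even neighbor
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def max_relative_error(values, rounder):
    return max(abs(rounder(v) - v) / abs(v) for v in values)

# Toy stand-in for token probabilities between 0.1 and 0.2.
probs = [0.1 + 0.001 * i for i in range(100)]
fp16_err = max_relative_error(probs, round_to_fp16)
bf16_err = max_relative_error(probs, round_to_bf16)
print(f"max relative rounding error  FP16: {fp16_err:.2e}  BF16: {bf16_err:.2e}")
```

The value 1 + 2**-8 shows the difference directly: FP16 represents it exactly, while BF16 rounds it back to 1.0.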

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26788
• PDF: https://arxiv.org/pdf/2510.26788
• Github: https://github.com/sail-sg/Precision-RL

==================================

🔹 Title: Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals

📝 Summary:
To overcome limitations of one-step DMD on complex generative tasks, Phased DMD proposes a multi-step distillation framework. It employs progressive distribution matching across SNR subintervals with score matching to enhance diversity and generative capabilities.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27684
• PDF: https://arxiv.org/pdf/2510.27684

==================================

🔹 Title: Revisiting Multimodal Positional Encoding in Vision-Language Models

📝 Summary:
This paper systematically analyzes multimodal Rotary Positional Embedding (RoPE) for vision-language models. It identifies key guidelines for its design and proposes MHRoPE and MRoPE-Interleave, simple variants that significantly improve multimodal understanding.
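
For readers new to RoPE, here is a minimal sketch of the standard 1-D (text-only) version, assuming the usual base of 10000 and rotation of consecutive dimension pairs. It is not MHRoPE or MRoPE-Interleave. The vectors and function names are made up for illustration. The point is that the query-key score depends only on the position offset.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair of dimensions by pos * base**(-i / d),
    where i is the even index of the pair and d is the vector length."""
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same offset (4) at very different absolute positions.
score_near = dot(rope(q, 7), rope(k, 3))
score_far = dot(rope(q, 104), rope(k, 100))
print(score_near, score_far)
```

Both scores agree up to floating-point error, which is the relative-position property that multimodal variants have to preserve when positions span time, height, and width instead of a single axis.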

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.23095
• PDF: https://arxiv.org/pdf/2510.23095

==================================

🔹 Title: Higher-order Linear Attention

📝 Summary:
Higher-order Linear Attention (HLA) addresses the quadratic cost of standard attention. It offers a scalable causal streaming mechanism for higher-order interactions with constant state size and linear per-token computation. HLA combines attention-like mixing with efficient recurrent architectures.
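
As background, a minimal sketch of ordinary first-order causal linear attention, not HLA itself, assuming the elu(x) + 1 feature map. All names and the toy inputs are hypothetical. It shows what "constant state size" means: the streaming version keeps only a d_k x d_v matrix and a d_k vector, yet matches the version that revisits every past token.

```python
import math

def phi(x):
    """Positive feature map elu(x) + 1, applied elementwise."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention_stream(qs, ks, vs):
    """Causal linear attention with a fixed-size running state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = phi(q), phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        denom = sum(fq[i] * z[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs

def linear_attention_quadratic(qs, ks, vs):
    """Same result, computed by scoring every past token at each step."""
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(ks[s]))) for s in range(t + 1)]
        total = sum(weights)
        outputs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) / total for j in range(d_v)]
        )
    return outputs

qs = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
ks = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.1]]
vs = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

streamed = linear_attention_stream(qs, ks, vs)
reference = linear_attention_quadratic(qs, ks, vs)
print(streamed)
```

The per-token cost of the streaming loop does not grow with sequence length, which is the property the summary says HLA extends to higher-order interactions.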

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27258
• PDF: https://arxiv.org/pdf/2510.27258
• Project Page: https://yifanzhang-pro.github.io/HLA
• Github: https://github.com/yifanzhang-pro/HLA

==================================

🔹 Title: Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model

📝 Summary:
DUST is a novel dual-stream diffusion framework for world-model augmented VLAs. It resolves modality conflicts by using separate streams for vision and action, enabling joint prediction without a unified latent space. DUST achieves significant performance gains in both simulation and real-world r...

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27607
• PDF: https://arxiv.org/pdf/2510.27607

==================================

🔹 Title: The Denario project: Deep knowledge AI agents for scientific discovery

📝 Summary:
Denario is an AI multi-agent system for scientific research. It handles tasks like idea generation, code execution, and paper drafting. It generated multiple scientific papers across diverse disciplines, which were then evaluated by experts.

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26887
• PDF: https://arxiv.org/pdf/2510.26887
• Github: https://github.com/AstroPilot-AI/Denario

==================================

🔹 Title: Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

📝 Summary:
This paper introduces BEAT, the first framework for visual backdoor attacks on MLLM embodied agents using object triggers. It uses diverse training data and Contrastive Trigger Learning to ensure precise backdoor activation. BEAT achieves high attack success and exposes a critical security risk.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27623
• PDF: https://arxiv.org/pdf/2510.27623
• Project Page: https://zqs1943.github.io/BEAT/

==================================

🔹 Title: Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery

📝 Summary:
This paper applies YOLOv11, a new deep learning model, for joint building instance segmentation and discrete height classification from satellite imagery. It achieves strong performance on the DFC2023 dataset, outperforming earlier models in accuracy and speed for urban mapping.

🔹 Publication Date: Published on Oct 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27224
• PDF: https://arxiv.org/pdf/2510.27224

==================================

🔹 Title: Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

📝 Summary:
This paper investigates RLVR for mathematical reasoning in LLMs using two combinatorial problems. It finds that while RLVR improves performance, it often reinforces superficial heuristics rather than genuine new reasoning strategies. This highlights RLVR's generalization limits and the need for be...

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.27044
• PDF: https://arxiv.org/pdf/2510.27044
• Github: https://github.com/xashru/rlvr-seq-generalization

==================================

🔹 Title: A Survey on Efficient Vision-Language-Action Models

📝 Summary:
This survey reviews Efficient Vision-Language-Action models (Efficient VLAs), which address the high computational and data requirements of existing VLAs. It categorizes efficiency techniques into model design, training, and data collection, providing a comprehensive overview and future roadmap.

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24795
• PDF: https://arxiv.org/pdf/2510.24795
• Project Page: https://evla-survey.github.io/
• Github: https://github.com/YuZhaoshu/Efficient-VLAs-Survey

==================================
