Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title: Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

🔹 Publication Date: Published on Sep 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01363
• PDF: https://arxiv.org/pdf/2509.01363

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00581
• PDF: https://arxiv.org/pdf/2509.00581

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Metis: Training Large Language Models with Advanced Low-Bit Quantization

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00404
• PDF: https://arxiv.org/pdf/2509.00404

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Fantastic Pretraining Optimizers and Where to Find Them

🔹 Publication Date: Published on Sep 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.02046
• PDF: https://arxiv.org/pdf/2509.02046

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Benchmarking Optimizers for Large Language Model Pretraining

🔹 Publication Date: Published on Sep 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01440
• PDF: https://arxiv.org/pdf/2509.01440
• GitHub: https://github.com/epfml/llm-optimizer-benchmark

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00425
• PDF: https://arxiv.org/pdf/2509.00425

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MobiAgent: A Systematic Framework for Customizable Mobile Agents

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00531
• PDF: https://arxiv.org/pdf/2509.00531
• GitHub (APK release): https://github.com/IPADS-SAI/MobiAgent/releases/download/v1.0/Mobiagent.apk

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

🔹 Publication Date: Published on Sep 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01584
• PDF: https://arxiv.org/pdf/2509.01584
• Project Page: https://ganlinzhang.xyz/vista-slam/
• GitHub: https://github.com/zhangganlin/vista-slam

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DCPO: Dynamic Clipping Policy Optimization

🔹 Publication Date: Published on Sep 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.02333
• PDF: https://arxiv.org/pdf/2509.02333
• GitHub: https://github.com/lime-RL/DCPO

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.20586
• PDF: https://arxiv.org/pdf/2508.20586
• Project Page: https://github.com/Zheng-Chong/FastFit
• GitHub: https://github.com/Zheng-Chong/FastFit

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zhengchong/DressCode-MR

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21496
• PDF: https://arxiv.org/pdf/2508.21496
• GitHub: https://github.com/hlsv02/ELV-Halluc

🔹 Datasets citing this paper:
https://huggingface.co/datasets/HLSv/ELV-Halluc

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DynaGuard: A Dynamic Guardrail Model With User-Defined Policies

🔹 Publication Date: Published on Sep 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.02563
• Hugging Face Collection: https://huggingface.co/collections/tomg-group-umd/dynaguard-68af4d916ae81d06ef774523
• PDF: https://arxiv.org/pdf/2509.02563
• Project Page: https://taruschirag.github.io/DynaGuard/
• GitHub: https://github.com/montehoover/DynaGuard

🔹 Datasets citing this paper:
https://huggingface.co/datasets/tomg-group-umd/DynaBench

🔹 Spaces citing this paper:
https://huggingface.co/spaces/tomg-group-umd/DynaGuard
==================================

🔹 Title: Stairway to Fairness: Connecting Group and Individual Fairness

🔹 Publication Date: Published on Aug 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21334
• PDF: https://arxiv.org/pdf/2508.21334
• GitHub: https://github.com/theresiavr/stairway-to-fairness

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00605
• PDF: https://arxiv.org/pdf/2509.00605
• GitHub: https://github.com/rishiraj/gam

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs

🔹 Publication Date: Published on Sep 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01790
• PDF: https://arxiv.org/pdf/2509.01790

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices

🔹 Publication Date: Published on Sep 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.02523
• PDF: https://arxiv.org/pdf/2509.02523
• GitHub: https://github.com/moonshine-ai/moonshine

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: On the Theoretical Limitations of Embedding-Based Retrieval

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.21038
• PDF: https://arxiv.org/pdf/2508.21038
• GitHub: https://github.com/google-deepmind/limit

🔹 Datasets citing this paper:
https://huggingface.co/datasets/orionweller/LIMIT
https://huggingface.co/datasets/orionweller/LIMIT-small

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Robix: A Unified Model for Robot Interaction, Reasoning and Planning

🔹 Publication Date: Published on Sep 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01106
• PDF: https://arxiv.org/pdf/2509.01106
• Project Page: https://robix-seed.github.io/robix/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Open Data Synthesis For Deep Research

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00375
• PDF: https://arxiv.org/pdf/2509.00375
• GitHub: https://github.com/VectorSpaceLab/InfoSeek

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

🔹 Publication Date: Published on Jul 23

🔹 AI-generated summary: InstructVLA is an end-to-end vision-language-action model that enhances manipulation performance while preserving vision-language reasoning through multimodal training and mixture-of-experts adaptation.

🔹 Abstract: To operate effectively in the real world, robots must integrate multimodal reasoning with precise action generation. However, existing vision-language-action (VLA) models often sacrifice one for the other, narrow their abilities to task-specific manipulation data, and suffer catastrophic forgetting of pre-trained vision-language capabilities. To bridge this gap, we introduce InstructVLA, an end-to-end VLA model that preserves the flexible reasoning of large vision-language models (VLMs) while delivering leading manipulation performance. InstructVLA introduces a novel training paradigm, Vision-Language-Action Instruction Tuning (VLA-IT), which employs multimodal training with mixture-of-experts adaptation to jointly optimize textual reasoning and action generation on both standard VLM corpora and a curated 650K-sample VLA-IT dataset. On in-domain SimplerEnv tasks, InstructVLA achieves a 30.5% improvement over SpatialVLA. To evaluate generalization, we introduce SimplerEnv-Instruct, an 80-task benchmark requiring closed-loop control and high-level instruction understanding, where it outperforms a fine-tuned OpenVLA by 92% and an action expert aided by GPT-4o by 29%. Additionally, InstructVLA surpasses baseline VLMs on multimodal tasks and exhibits inference-time scaling by leveraging textual reasoning to boost manipulation performance in both simulated and real-world settings. These results demonstrate InstructVLA's potential for bridging intuitive and steerable human-robot interaction with efficient policy learning.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.17520
• PDF: https://arxiv.org/pdf/2507.17520

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
