Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

🔹 Title: MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

🔹 Publication Date: Published on Sep 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01977
• PDF: https://arxiv.org/pdf/2509.01977
• Project Page: https://bytedance-fanqie-ai.github.io/MOSAIC/
• Github: https://github.com/bytedance-fanqie-ai/MOSAIC

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

🔹 Publication Date: Published on Aug 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00428
• PDF: https://arxiv.org/pdf/2509.00428
• Project Page: https://xavierjiezou.github.io/Face-MoGLE/
• Github: https://github.com/XavierJiezou/Face-MoGLE

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference

🔹 Publication Date: Published on Aug 6

🔹 AI-generated summary: The paper diagnoses structural issues in AI conferences, including publication rates, carbon footprint, negative community sentiment, and logistical challenges, and proposes a Community-Federated Conference model to address these issues.

🔹 Abstract: Artificial Intelligence (AI) conferences are essential for advancing research, sharing knowledge, and fostering academic community. However, their rapid expansion has rendered the centralized conference model increasingly unsustainable. This paper offers a data-driven diagnosis of a structural crisis that threatens the foundational goals of scientific dissemination, equity, and community well-being. We identify four key areas of strain: (1) scientifically, with per-author publication rates more than doubling over the past decade to over 4.5 papers annually; (2) environmentally, with the carbon footprint of a single conference exceeding the daily emissions of its host city; (3) psychologically, with 71% of online community discourse reflecting negative sentiment and 35% referencing mental health concerns; and (4) logistically, with attendance at top conferences such as NeurIPS 2024 beginning to outpace venue capacity. These pressures point to a system that is misaligned with its core mission. In response, we propose the Community-Federated Conference (CFC) model, which separates peer review, presentation, and networking into globally coordinated but locally organized components, offering a more sustainable, inclusive, and resilient path forward for AI research.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04586
• PDF: https://arxiv.org/pdf/2508.04586

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

🔹 Publication Date: Published on Sep 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.03405
• PDF: https://arxiv.org/pdf/2509.03405

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

🔹 Publication Date: Published on Aug 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00930
• PDF: https://arxiv.org/pdf/2509.00930

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots

🔹 Publication Date: Published on Sep 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.02530
• PDF: https://arxiv.org/pdf/2509.02530
• Project Page: https://manipulation-as-in-simulation.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

🔹 Publication Date: Published on Sep 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.03403
• PDF: https://arxiv.org/pdf/2509.03403
• Github: https://github.com/Chenluye99/PROF

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Towards a Unified View of Large Language Model Post-Training

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04419
• PDF: https://arxiv.org/pdf/2509.04419
• Github: https://github.com/TsinghuaC3I/Unify-Post-Training

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks

🔹 Publication Date: Published on Sep 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.01396
• PDF: https://arxiv.org/pdf/2509.01396

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings

🔹 Publication Date: Published on Aug 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.18733
• PDF: https://arxiv.org/pdf/2508.18733
• Github: https://github.com/lllssc/Drawing2CAD

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.03888
• PDF: https://arxiv.org/pdf/2509.03888

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Transition Models: Rethinking the Generative Learning Objective

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04394
• PDF: https://arxiv.org/pdf/2509.04394
• Github: https://github.com/WZDTHU/TiM

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: From Editor to Dense Geometry Estimator

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04338
• PDF: https://arxiv.org/pdf/2509.04338
• Project Page: https://amap-ml.github.io/FE2E/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Few-step Flow for 3D Generation via Marginal-Data Transport Distillation

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04406
• PDF: https://arxiv.org/pdf/2509.04406
• Github: https://github.com/Zanue/MDT-dist

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04292
• PDF: https://arxiv.org/pdf/2509.04292
• Project Page: https://huggingface.co/datasets/m-a-p/Inverse_IFEval

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04011
• PDF: https://arxiv.org/pdf/2509.04011
• Github: https://github.com/ShacharOr100/ner_retriever

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Delta Activations: A Representation for Finetuned Large Language Models

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.04442
• PDF: https://arxiv.org/pdf/2509.04442
• Project Page: https://oscarxzq.github.io/delta_activation/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

🔹 Publication Date: Published on Sep 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.03867
• PDF: https://arxiv.org/pdf/2509.03867

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding

🔹 Publication Date: Published on Aug 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.20478
• PDF: https://arxiv.org/pdf/2508.20478

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

🔹 Publication Date: Published on Sep 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.03059
• PDF: https://arxiv.org/pdf/2509.03059

🔹 Datasets citing this paper:
https://huggingface.co/datasets/camel-ai/loong

🔹 Spaces citing this paper:
No spaces found
==================================
