Forwarded from Python Courses
Forwarded from Python | Machine Learning | Coding | R
Dive deep into the world of Transformers with this comprehensive PyTorch implementation guide. Whether you're a seasoned ML engineer or just starting out, this resource breaks down the complexities of the Transformer model, inspired by the groundbreaking paper "Attention Is All You Need".
https://www.k-a.in/pyt-transformer.html
The guide walks through the implementation step by step. By following along, you'll gain a solid understanding of how Transformers work and how to implement them from scratch.
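To give a flavour of what you'll build, here is a minimal, self-contained PyTorch sketch of the core block from "Attention Is All You Need", multi-head scaled dot-product attention. It is an illustrative simplification written for this post, not the guide's exact code.

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention, as in "Attention Is All You Need"."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # project to Q, K, V in one go
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, tokens, d_head)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        # scaled dot-product attention: softmax(QK^T / sqrt(d_head)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(ctx)

x = torch.randn(2, 10, 512)                   # (batch, tokens, d_model)
print(MultiHeadSelfAttention()(x).shape)      # torch.Size([2, 10, 512])
```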
#MachineLearning #DeepLearning #PyTorch #Transformer #AI #NLP #AttentionIsAllYouNeed #Coding #DataScience #NeuralNetworks
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Paper: https://arxiv.org/pdf/2502.05512v1.pdf
Code: https://github.com/index-tts/index-tts
https://t.iss.one/DataScienceT
8 Feb 2025 · Wei Deng, Siyi Zhou, Jingchen Shu, Jinchao Wang, Lu Wang
Recently, large language model (#LLM) based text-to-speech (#TTS) systems have gradually become mainstream in industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models, with several novel improvements. Specifically, for Chinese scenarios we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciation of polyphonic and long-tail characters controllable. We also perform a comparative analysis of Vector Quantization (VQ) and Finite-Scalar Quantization (FSQ) for codebook utilization of acoustic speech tokens. To further enhance the effect and stability of voice cloning, we introduce a conformer-based speech conditional encoder and replace the speech-code decoder with BigVGAN2. Compared with #XTTS, IndexTTS achieves significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS, and F5-TTS, IndexTTS has a simpler training process, more controllable usage, and faster inference, while also surpassing them in performance. Our demos are available at https://index-tts.github.io.
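The hybrid character + pinyin idea is easy to picture: ordinary characters stay as characters, while polyphonic characters can be replaced by explicit pinyin tokens so their pronunciation is controllable by the caller. The toy sketch below (tiny made-up lexicon, invented token format) only illustrates that frontend idea and is not IndexTTS code.

```python
# Toy sketch of hybrid character + pinyin input modeling (illustrative only, not IndexTTS code).
# Idea: keep ordinary characters as-is, but replace polyphonic characters with explicit
# pinyin tokens so their pronunciation becomes controllable.

POLYPHONIC = {"行": ["xing2", "hang2"], "重": ["zhong4", "chong2"]}  # tiny made-up lexicon

def to_hybrid_tokens(text, forced_pinyin=None):
    """Return a token sequence mixing raw characters and <py:...> pinyin tokens."""
    forced_pinyin = forced_pinyin or {}          # maps character index -> pinyin string
    tokens = []
    for i, ch in enumerate(text):
        if i in forced_pinyin:                   # caller pins down the reading
            tokens.append(f"<py:{forced_pinyin[i]}>")
        elif ch in POLYPHONIC:                   # default to the first listed reading
            tokens.append(f"<py:{POLYPHONIC[ch][0]}>")
        else:
            tokens.append(ch)                    # ordinary character token
    return tokens

print(to_hybrid_tokens("银行重要"))                               # default readings
print(to_hybrid_tokens("银行重要", forced_pinyin={1: "hang2"}))   # force 行 to read hang2
```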
LettuceDetect: A Hallucination Detection Framework for RAG Applications
24 Feb 2025 · Ádám Kovács, Gábor Recski
Paper: https://arxiv.org/pdf/2502.17125v1.pdf
Code: https://github.com/KRLabsOrg/LettuceDetect
Colab: https://colab.research.google.com/drive/1Ubca5aMaBGdHtJ1rpqj3Ke9SLEr-PaDn?usp=sharing
https://t.iss.one/DataScienceT
Retrieval Augmented Generation (#RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect, a framework that addresses two critical limitations in existing hallucination detection methods: (1) the context window constraints of traditional encoder-based methods, and (2) the computational inefficiency of #LLM based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models, while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, allowing for the identification of unsupported claims at the token level. Evaluations on the RAGTruth corpus demonstrate an F1 score of 79.22% for example-level detection, which is a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it more practical for real-world RAG applications.
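The underlying recipe, a token-classification head over the concatenated context/question/answer text that flags unsupported answer tokens, can be sketched with the Hugging Face transformers API. The checkpoint name, two-label scheme, and prompt format below are assumptions for illustration, not the authors' released model (which ships as its own package), and the head here is untrained, so the output is only structural.

```python
# Sketch of hallucination detection as token classification over a (context, question, answer)
# triple, in the spirit of LettuceDetect. Checkpoint name and 2-label scheme are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "answerdotai/ModernBERT-base"   # assumed long-context encoder; any encoder works for the sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)  # 0=supported, 1=hallucinated

context = "The Eiffel Tower is 330 metres tall and located in Paris."
question = "How tall is the Eiffel Tower?"
answer = "It is 330 metres tall and was built in 1850."   # the date is unsupported by the context

# Concatenate the triple; a trained model would label the answer tokens only.
text = f"Context: {context}\nQuestion: {question}\nAnswer: {answer}"
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits                 # (1, seq_len, 2)
pred = logits.argmax(-1)[0]

# Tokens predicted as label 1 are flagged as hallucinated (random here, meaningful after fine-tuning).
flagged = [tokenizer.decode(int(t)) for t, p in zip(enc["input_ids"][0], pred) if int(p) == 1]
print(flagged)
```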
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
7 Apr 2025 · Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu
Paper: https://arxiv.org/pdf/2504.08791v1.pdf
Code: https://github.com/lizonghang/prima.cpp
https://t.iss.one/DataScienceT
The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (#LLMs) on home devices. While consumer hardware is getting stronger and model quantization is improving, existing end-side solutions still demand #GPU clusters, large RAM/VRAM, and high bandwidth, far beyond what a common home cluster can handle. This paper introduces prima.cpp, a distributed inference system that runs 70B-scale models on everyday home devices using a mix of CPU/GPU, low RAM/VRAM, Wi-Fi, and cross-platform support. It uses mmap to manage model weights and introduces piped-ring parallelism with prefetching to hide disk loading. By modeling heterogeneity in computation, communication, disk, memory (and its management behavior), and OS, it optimally assigns model layers to each device's #CPU and GPU, further reducing token latency. An elegant algorithm named Halda is proposed to solve this NP-hard assignment problem. We evaluate prima.cpp on a common four-node home cluster. It outperforms llama.cpp, exo, and dllama on 30B+ models while keeping memory pressure below 6%. This brings frontier 30B-70B models, such as #Llama 3, #DeepSeek R1, #Qwen 2.5, and #QwQ to home assistants, making advanced AI truly accessible to individuals. The code is open source and available at https://github.com/Lizonghang/prima.cpp.
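The paper's Halda algorithm solves the layer-to-device assignment optimally under a full cost model (compute, communication, disk, memory). As a rough intuition pump only, here is a naive greedy sketch that spreads transformer layers across heterogeneous home devices; the device list and per-layer costs are made-up assumptions, and this is not prima.cpp's algorithm.

```python
# Naive greedy sketch of assigning model layers to heterogeneous home devices.
# A rough stand-in for the intuition behind layer assignment, not the Halda solver.

DEVICES = {  # made-up per-layer compute cost (ms) and memory budget (number of layers)
    "laptop-gpu":  {"ms_per_layer": 6.0,  "max_layers": 20},
    "desktop-cpu": {"ms_per_layer": 15.0, "max_layers": 40},
    "phone":       {"ms_per_layer": 40.0, "max_layers": 10},
    "mini-pc":     {"ms_per_layer": 20.0, "max_layers": 30},
}

def assign_layers(n_layers: int) -> dict:
    """Greedily give each next layer to the device whose running time stays lowest."""
    load = {d: 0.0 for d in DEVICES}
    used = {d: 0 for d in DEVICES}
    plan = {d: [] for d in DEVICES}
    for layer in range(n_layers):
        # pick the device that would finish earliest and still has memory budget left
        candidates = [d for d in DEVICES if used[d] < DEVICES[d]["max_layers"]]
        best = min(candidates, key=lambda d: load[d] + DEVICES[d]["ms_per_layer"])
        plan[best].append(layer)
        load[best] += DEVICES[best]["ms_per_layer"]
        used[best] += 1
    return plan

plan = assign_layers(80)  # e.g. an 80-layer 70B-scale model
for dev, layers in plan.items():
    print(dev, len(layers), "layers")
```

Unlike this toy, the real system also models link bandwidth, disk, and OS behavior, and hides disk loading with piped-ring parallelism and prefetching.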
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
22 Nov 2024 · Xuewu Lin, Tianwei Lin, Lichao Huang, Hongyu Xie, Zhizhong Su
Paper: https://arxiv.org/pdf/2411.14869v2.pdf
Code: https://github.com/HorizonRobotics/BIP3D
HF: https://huggingface.co/spaces/AGC2024/visual-grounding-2024
Dataset: 10,000 People - Human Pose Recognition Data
https://t.iss.one/DataScienceT
In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point clouds, which, despite offering precise geometric information, still constrain perception performance due to inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, #BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding, and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current state-of-the-art results on the EmbodiedScan benchmark, achieving improvements of 5.69% in the 3D detection task and 15.25% in the 3D visual grounding task.
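The image-centric idea, fusing 2D foundation-model features with an explicit 3D position encoding, can be sketched roughly as follows. The camera intrinsics, depth sampling, and MLP below are illustrative assumptions, not BIP3D's actual spatial enhancer.

```python
# Rough sketch: add an explicit 3D position encoding to 2D image features by
# back-projecting each feature-map cell along its camera ray (illustrative, not BIP3D code).
import torch
import torch.nn as nn

def pixel_rays(h: int, w: int, fx: float, fy: float, cx: float, cy: float) -> torch.Tensor:
    """Unit-depth ray directions in camera coordinates for an h x w feature map."""
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    return torch.stack([(u - cx) / fx, (v - cy) / fy, torch.ones(h, w)], dim=-1)  # (h, w, 3)

class Image3DPositionEncoder(nn.Module):
    def __init__(self, d_model: int = 256, n_depths: int = 8, max_depth: float = 10.0):
        super().__init__()
        self.depths = torch.linspace(0.5, max_depth, n_depths)   # candidate depths per ray
        self.mlp = nn.Sequential(nn.Linear(n_depths * 3, d_model), nn.ReLU(), nn.Linear(d_model, d_model))

    def forward(self, feats: torch.Tensor, rays: torch.Tensor) -> torch.Tensor:
        # feats: (h, w, d_model) 2D backbone features; rays: (h, w, 3) camera rays
        pts = rays.unsqueeze(-2) * self.depths.view(1, 1, -1, 1)  # (h, w, n_depths, 3) 3D samples
        pos = self.mlp(pts.flatten(-2))                           # (h, w, d_model) position encoding
        return feats + pos                                        # position-aware image features

h, w, d = 16, 16, 256
enc = Image3DPositionEncoder(d_model=d)
out = enc(torch.randn(h, w, d), pixel_rays(h, w, fx=200.0, fy=200.0, cx=8.0, cy=8.0))
print(out.shape)  # torch.Size([16, 16, 256])
```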
Forwarded from Python | Machine Learning | Coding | R
LLM Engineer's Handbook (2024)
Unlock the Future of AI with the LLM Engineer's Handbook
Step into the world of Large Language Models (LLMs) with this comprehensive guide that takes you from foundational concepts to deploying advanced applications using LLMOps best practices. Whether you're an AI engineer, NLP professional, or LLM enthusiast, this book offers practical insights into designing, training, and deploying LLMs in real-world scenarios.
Why Choose the LLM Engineer's Handbook?
Comprehensive Coverage: Learn about data engineering, supervised fine-tuning, and deployment strategies.
Hands-On Approach: Implement MLOps components through practical examples, including building an LLM-powered twin that's cost-effective, scalable, and modular.
Cutting-Edge Techniques: Explore inference optimization, preference alignment, and real-time data processing to apply LLMs effectively in your projects.
Real-World Applications: Move beyond isolated Jupyter notebooks and focus on building production-grade end-to-end LLM systems.
Limited-Time Offer
Originally priced at $55, the LLM Engineer's Handbook is now available for just $25, a 55% discount! This special offer is available for a limited quantity, so act fast to secure your copy.
Who Should Read This Book?
This handbook is ideal for AI engineers, NLP professionals, and LLM engineers looking to deepen their understanding of LLMs. A basic knowledge of LLMs, Python, and AWS is recommended. Whether you're new to AI or seeking to enhance your skills, this book provides comprehensive guidance on implementing LLMs in real-world scenarios.
Don't miss this opportunity to advance your expertise in LLM engineering. Secure your discounted copy today and take the next step in your AI journey!
Buy book: https://www.patreon.com/DataScienceBooks/shop/llm-engineers-handbook-2024-1582908
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
Github: https://github.com/cdjkim/respec
Paper: https://arxiv.org/abs/2504.14875v1
Dataset: https://paperswithcode.com/task/informativeness
Forwarded from Data Science Premium (Books & Courses)
Join our WhatsApp channel
Tell your friends
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
6 May 2025 · Andrew Zhao, Yiran Wu, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang
Paper: https://arxiv.org/pdf/2505.03335v2.pdf
Code: https://arxiv.org/pdf/2505.03335v2.pdf
https://t.iss.one/DataScienceT
Reinforcement learning with verifiable rewards (#RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where #AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (#AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall #SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
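The key mechanism is using a code executor as the single source of verifiable reward: a proposed program plus an input defines a task, executing it yields the ground truth, and the solver is rewarded only when its prediction matches. A stripped-down sketch of that reward loop (stub data in place of real model calls, and no sandboxing):

```python
# Stripped-down sketch of code-executor-based verifiable reward, in the spirit of
# Absolute Zero. The proposer/solver are stubs; in AZR a single LLM plays both roles.

def execute(program: str, inp) -> object:
    """Run a proposed program and return f(inp); the executor is the source of ground truth."""
    namespace: dict = {}
    exec(program, namespace)          # NOTE: sandbox this in any real system
    return namespace["f"](inp)

def reward(program: str, inp, predicted_output) -> float:
    """Verifiable reward: 1.0 iff the solver's prediction matches the executed result."""
    try:
        return 1.0 if execute(program, inp) == predicted_output else 0.0
    except Exception:
        return 0.0                    # invalid programs give no learning signal

# A self-proposed task: the proposer emits (program, input); the solver predicts the output.
task_program = "def f(x):\n    return sorted(x)[::-1]"
task_input = [3, 1, 2]
solver_prediction = [3, 2, 1]         # stub for the solver model's answer

print(reward(task_program, task_input, solver_prediction))  # 1.0
```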
FastVLM: Efficient Vision Encoding for Vision Language Models
17 Dec 2024 · Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari
Paper: https://arxiv.org/pdf/2412.13303v1.pdf
Code: https://github.com/apple/ml-fastvlm
Datasets: GQA - TextVQA - ScienceQA
https://t.iss.one/DataScienceT
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (#VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as #ViTs become inefficient at high resolutions due to the large number of tokens and high encoding latency caused by stacked self-attention layers. At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency. Based on a comprehensive efficiency analysis of the interplay between image resolution, vision latency, token count, and #LLM size, we introduce #FastVLM, a model that achieves an optimized trade-off between latency, model size and accuracy. FastVLM incorporates #FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Unlike previous methods, FastVLM achieves the optimal balance between visual token count and image resolution solely by scaling the input image, eliminating the need for additional token pruning and simplifying the model design. In the LLaVA-1.5 setup, FastVLM achieves a 3.2× improvement in time-to-first-token (TTFT) while maintaining similar performance on VLM benchmarks compared to prior works. Compared to LLaVa-OneVision at the highest resolution (1152×1152), FastVLM achieves comparable performance on key benchmarks like SeedBench and MMMU, using the same 0.5B LLM, but with 85× faster TTFT and a vision encoder that is 3.4× smaller.
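The latency argument is easy to check numerically: for a plain ViT-style encoder, the number of visual tokens grows roughly quadratically with input resolution, and self-attention cost grows quadratically in the token count, which is exactly the pressure FastViTHD is designed to relieve. The patch size and resolutions below are illustrative, not FastVLM's actual configuration.

```python
# Back-of-the-envelope: visual token count vs. input resolution for a plain ViT-style
# encoder with 14x14 patches. Illustrative numbers only, not FastViTHD's configuration.
PATCH = 14
BASE_TOKENS = (336 // PATCH) ** 2                # token count at the 336px baseline

for res in (336, 672, 1152):
    tokens = (res // PATCH) ** 2                 # tokens grow ~quadratically with resolution
    attn_cost = tokens ** 2                      # self-attention is quadratic in token count
    print(f"{res:>4}px -> {tokens:>5} tokens, relative attention cost {attn_cost / BASE_TOKENS ** 2:.1f}x")
```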
Forwarded from Python | Machine Learning | Coding | R
Your balance has been credited with $4,000; the owner of the channel wants to contact you!
Dear subscriber, we would like to thank you very much for supporting our channel, and as a token of our gratitude we would like to provide you with free access to Lisa's investor channel, with the help of which you can earn today
t.iss.one/Lisainvestor
Be sure to take advantage of our gift, admission is free, don't miss the opportunity, change your life for the better.
You can follow the link :
https://t.iss.one/+0DQSCADFTUA3N2Qx
Forwarded from Python | Machine Learning | Coding | R
Join our WhatsApp channel:
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
FastVLM: Efficient Vision Encoding for Vision Language Models
17 Dec 2024 · Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari
Paper: https://arxiv.org/pdf/2412.13303v1.pdf
Code: https://github.com/apple/ml-fastvlm
Datasets: GQA - TextVQA - ScienceQA
https://t.iss.one/DataScienceT
WhatsApp Channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as #ViTs become inefficient at high resolutions due to the large number of tokens and high encoding latency caused by stacked self-attention layers. At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency. Based on a comprehensive efficiency analysis of the interplay between image resolution, vision latency, token count, and LLM size, we introduce FastVLM, a model that achieves an optimized trade-off between latency, model size and accuracy. FastVLM incorporates FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Unlike previous methods, FastVLM achieves the optimal balance between visual token count and image resolution solely by scaling the input image, eliminating the need for additional token pruning and simplifying the model design. In the LLaVA-1.5 setup, FastVLM achieves a 3.2× improvement in time-to-first-token (TTFT) while maintaining similar performance on VLM benchmarks compared to prior works. Compared to LLaVa-OneVision at the highest resolution (1152×1152), #FastVLM achieves comparable performance on key benchmarks like SeedBench and MMMU, using the same 0.5B #LLM, but with 85× faster TTFT and a vision encoder that is 3.4× smaller.
Generating Physically Stable and Buildable LEGO Designs from Text
8 May 2025 · Ava Pun, Kangle Deng, Ruixuan Liu, Deva Ramanan, Changliu Liu, Jun-Yan Zhu
Paper: https://arxiv.org/pdf/2505.05469v1.pdf
Code: https://github.com/AvaLovelace1/LegoGPT
Quick start: https://huggingface.co/spaces/cmu-gil/LegoGPT-Demo
Dataset: StableText2Lego
WhatsApp Channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
We introduce #LegoGPT, the first approach for generating physically stable LEGO brick models from text prompts. To achieve this, we construct a large-scale, physically stable dataset of LEGO designs, along with their associated captions, and train an autoregressive large language model to predict the next brick to add via next-token prediction. To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints. Our experiments show that LegoGPT produces stable, diverse, and aesthetically pleasing LEGO designs that align closely with the input text prompts. We also develop a text-based LEGO texturing method to generate colored and textured designs. We show that our designs can be assembled manually by humans and automatically by robotic arms. We also release our new dataset, StableText2Lego, containing over 47,000 LEGO structures of over 28,000 unique 3D objects accompanied by detailed captions, along with our code and models at the project website: https://avalovelace1.github.io/LegoGPT/
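The inference-time trick, pruning next-brick candidates that violate constraints and rolling back when generation gets stuck, can be sketched generically. The proposal stub and the toy support/collision checks below are hypothetical placeholders, not LegoGPT's physics checker.

```python
# Generic sketch of autoregressive generation with a validity check and rollback,
# in the spirit of LegoGPT's physics-aware inference. The proposal stub and the
# toy constraints are hypothetical placeholders, not the paper's constraint solver.
import random

def propose_bricks(state: list) -> list:
    """Stub for the model's ranked next-brick candidates (x, y, z) on a 6x6 grid."""
    return [(random.randint(0, 5), random.randint(0, 5), len(state) and random.randint(0, 3))
            for _ in range(8)]

def is_valid(state: list, brick) -> bool:
    """Toy constraints: no duplicate position; every brick above z=0 needs support below it."""
    x, y, z = brick
    if brick in state:
        return False
    return z == 0 or (x, y, z - 1) in state

def generate(n_bricks: int = 10, max_rollbacks: int = 50) -> list:
    state, rollbacks = [], 0
    while len(state) < n_bricks:
        candidates = [b for b in propose_bricks(state) if is_valid(state, b)]  # prune infeasible
        if candidates:
            state.append(candidates[0])          # take the top remaining candidate
        elif state and rollbacks < max_rollbacks:
            state.pop()                          # roll back the last brick and retry
            rollbacks += 1
        else:
            break
    return state

print(generate())
```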
Saint-Étienne University has introduced a new 3D human body pose estimation pipeline designed specifically for dance analysis.
Check out the project page featuring results and an interactive demo!
#DanceAnalysis #3DPoseEstimation #DeepLearning #HumanPose #AI #MachineLearning #ComputerVisionResearch
Forwarded from Python | Machine Learning | Coding | R
This channel is for Programmers, Coders, and Software Engineers.
- Python
- Data Science
- Machine Learning
- Data Visualization
- Artificial Intelligence
- Data Analysis
- Statistics
- Deep Learning
- Programming Languages
https://t.iss.one/addlist/8_rRW2scgfRhOTc0
https://t.iss.one/Codeprogrammer
Flow-GRPO: Training Flow Matching Models via Online RL
8 May 2025 · Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang
Paper: https://arxiv.org/pdf/2505.05470v2.pdf
Code: https://github.com/yifan123/flow_grpo
HF: https://huggingface.co/spaces/jieliu/SD3.5-M-Flow-GRPO
Datasets: DrawBench - GenEval - T2I-CompBench
Notes: Ranked #1 on Text-to-Image Generation on GenEval
Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original inference timestep number, significantly improving sampling efficiency without performance degradation. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For complex compositions, RL-tuned SD3.5 generates nearly perfect object counts, spatial relations, and fine-grained attributes, boosting GenEval accuracy from 63% to 95%. In visual text rendering, its accuracy improves from 59% to 92%, significantly enhancing text generation. Flow-GRPO also achieves substantial gains in human preference alignment. Notably, very little reward hacking occurred, meaning rewards did not increase at the cost of appreciable image quality or diversity degradation.
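The first ingredient, turning the deterministic sampling ODE into an SDE with the same per-timestep marginals so that RL can explore, boils down to adding noise while compensating with a score-dependent drift. Here is a deliberately tiny 1-D illustration where the marginal is N(0, 1) at every step (so the ODE velocity is zero and the score is -x); it is a schematic of the idea, not the paper's SD3.5 formulation.

```python
# Toy 1-D illustration of the ODE-to-SDE idea: inject noise for RL exploration while
# preserving the per-step marginal distribution. Here the marginal is N(0, 1) at every t,
# so the ODE velocity is zero and the score is -x. Schematic only, not Flow-GRPO's exact math.
import numpy as np

rng = np.random.default_rng(0)
sigma, dt, steps = 0.8, 0.01, 500
x = rng.standard_normal(10_000)       # samples from the t=0 marginal N(0, 1)

for _ in range(steps):
    score = -x                        # grad log p(x) for a standard normal
    drift = 0.5 * sigma**2 * score    # compensating drift that preserves the marginal
    x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)  # Euler-Maruyama step

# The ODE (zero velocity) would leave samples frozen; the SDE moves them stochastically,
# yet the marginal stays approximately standard normal:
print(round(x.mean(), 3), round(x.std(), 3))   # ~0.0, ~1.0
```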
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
9 May 2025 · Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, Hongyang Li
Paper: https://arxiv.org/pdf/2505.06111v2.pdf
Code: https://github.com/opendrivelab/univla
Datasets: R2R - VLN-CE - Open-X-Embodiment
Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to a single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and establish a latent action model within the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as real-robot deployments. UniVLA achieves superior performance over OpenVLA with less than 1/20 of pretraining compute and 1/10 of downstream data. Continuous performance improvements are observed as heterogeneous data, even including human videos, are incorporated into the training pipeline. The results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.
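The latent action model idea, compressing the change between consecutive frame features into a small discrete code that stands in for the action so that raw video becomes training data, can be sketched as a tiny vector-quantized transition model. Feature size, codebook size, and the plain nearest-neighbour quantization below are illustrative assumptions, not UniVLA's implementation.

```python
# Tiny sketch of a latent action model: quantize the transition between consecutive
# frame features into a discrete "latent action" code (illustrative, not UniVLA's model).
import torch
import torch.nn as nn

class LatentActionModel(nn.Module):
    def __init__(self, feat_dim: int = 384, n_codes: int = 16, code_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, code_dim))
        self.codebook = nn.Embedding(n_codes, code_dim)     # discrete latent actions
        self.decoder = nn.Sequential(nn.Linear(feat_dim + code_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, f_t: torch.Tensor, f_next: torch.Tensor):
        z = self.encoder(torch.cat([f_t, f_next], dim=-1))      # encode the frame-to-frame transition
        dists = torch.cdist(z, self.codebook.weight)            # nearest-neighbour quantization
        action_id = dists.argmin(dim=-1)
        z_q = self.codebook(action_id)
        f_pred = self.decoder(torch.cat([f_t, z_q], dim=-1))    # predict the next frame's features
        return action_id, f_pred

model = LatentActionModel()
f_t, f_next = torch.randn(4, 384), torch.randn(4, 384)          # stand-ins for DINO-style features
action_id, f_pred = model(f_t, f_next)
recon_loss = nn.functional.mse_loss(f_pred, f_next)             # training signal from video alone
# (a real VQ model would also need a straight-through estimator and commitment loss)
print(action_id.tolist(), recon_loss.item())
```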
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
24 Apr 2025 · Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang
Paper: https://arxiv.org/pdf/2504.17192v2.pdf
Code: https://github.com/going-doer/paper2code
WhatsApp Channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. In the meantime, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it constructs a high-level roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which focuses on interpreting implementation-specific details; and generation, where modular, dependency-aware code is produced. Moreover, each phase is instantiated through a set of specialized agents designed to collaborate effectively across the pipeline. We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations, specifically from the original paper authors, with author-released repositories as ground truth if available. Our results demonstrate the effectiveness of PaperCoder in creating high-quality, faithful implementations. Furthermore, it consistently shows strengths in the recently released PaperBench benchmark, surpassing strong baselines by substantial margins. Code is available at: https://github.com/going-doer/Paper2Code.
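The three-stage planning / analysis / generation pipeline is essentially a chain of specialized LLM calls. The `call_llm` helper and file list below are hypothetical stubs for illustration, not PaperCoder's actual agent interface.

```python
# Schematic of a planning -> analysis -> generation pipeline in the spirit of PaperCoder.
# `call_llm` is a hypothetical stub standing in for whatever LLM client you use.

def call_llm(role: str, prompt: str) -> str:
    """Placeholder for an LLM call; each 'role' is a specialized agent prompt."""
    return f"[{role} output for: {prompt[:40]}...]"

def paper_to_repo(paper_text: str) -> dict:
    # Stage 1: planning - roadmap, architecture, file dependencies, config files.
    plan = call_llm("planner", f"Draft a repo plan (files, dependencies, configs) for:\n{paper_text}")
    # Stage 2: analysis - interpret implementation-specific details given the plan.
    analysis = call_llm("analyst", f"Given this plan, extract implementation details:\n{plan}")
    # Stage 3: generation - produce modular, dependency-aware code, file by file.
    files = {}
    for filename in ("config.yaml", "model.py", "train.py"):     # illustrative file list
        files[filename] = call_llm("coder", f"Write {filename} following:\n{plan}\n{analysis}")
    return files

repo = paper_to_repo("Abstract: We propose a multi-agent framework ...")
for name, content in repo.items():
    print(name, "->", content)
```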