Data Science | Machine Learning with Python for Researchers
31.5K subscribers
1.58K photos
102 videos
22 files
1.85K links
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

Buy ads: https://telega.io/c/dataScienceT
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

14 Apr 2025 · Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng

In this paper we tackle a fundamental question: "Can we train latent diffusion models together with the variational auto-encoder (VAE) tokenizer in an end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end training is often preferable when possible. However, for latent diffusion transformers, it is observed that training both the VAE and the diffusion model end-to-end using the standard diffusion loss is ineffective, even causing a degradation in final performance. We show that while the diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss, allowing both the VAE and the diffusion model to be jointly tuned during training. Despite its simplicity, the proposed training recipe (REPA-E) shows remarkable performance, speeding up diffusion model training by over 17x and 45x over the REPA and vanilla training recipes, respectively. Interestingly, we observe that end-to-end tuning with REPA-E also improves the VAE itself, leading to improved latent space structure and downstream generation performance. In terms of final performance, our approach sets a new state of the art, achieving FID of 1.26 and 1.83 with and without classifier-free guidance on ImageNet 256 x 256. Code is available at https://end2end-diffusion.github.io.


Paper: https://arxiv.org/pdf/2504.10483v1.pdf

Code: https://github.com/End2End-Diffusion/REPA-E

Dataset: ImageNet

https://t.iss.one/DataScienceT ✅
Liquid: Language Models are Scalable Multi-modal Generators

5 Dec 2024 · Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai

We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably brought by the unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the typical interference seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100x in training costs while outperforming Chameleon in multimodal capabilities and maintaining language performance comparable to mainstream LLMs like LLAMA2. Liquid also outperforms models like SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. This work demonstrates that LLMs such as LLAMA3.2 and GEMMA2 are powerful multimodal generators, offering a scalable solution for enhancing both vision-language understanding and generation. The code and models will be released at https://github.com/FoundationVision/Liquid.
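One way to picture a shared token space is to place image codes directly after the text vocabulary, so a single model predicts both kinds of token from one ID range. The sketch below only illustrates that idea; the vocabulary sizes, the offset scheme, and all function names are assumptions and are not taken from the Liquid code.

```python
# Hypothetical sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 32000
IMAGE_CODEBOOK_SIZE = 8192


def image_code_to_token_id(code):
    """Map a discrete image code into the shared ID range (assumed offset scheme)."""
    if not 0 <= code < IMAGE_CODEBOOK_SIZE:
        raise ValueError(f"image code out of range: {code}")
    return TEXT_VOCAB_SIZE + code


def build_sequence(text_ids, image_codes):
    """Concatenate text token IDs and image codes into one token sequence."""
    return list(text_ids) + [image_code_to_token_id(c) for c in image_codes]


def is_image_token(token_id):
    """IDs at or above the text vocabulary size are image tokens in this sketch."""
    return token_id >= TEXT_VOCAB_SIZE


sequence = build_sequence([5, 17], [0, 8191])
print(sequence)
print([is_image_token(t) for t in sequence])
```

With this layout a single embedding table of size TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE covers both modalities, which is one plausible reading of "a shared feature space".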


Paper: https://arxiv.org/pdf/2412.04332v2.pdf

Code: https://github.com/foundationvision/liquid

https://t.iss.one/DataScienceT
NVIDIA introduces Describe Anything Model (DAM)

DAM is a new state-of-the-art model designed to generate rich, detailed descriptions for specific regions in images and videos. Users can mark these regions using points, boxes, scribbles, or masks.
DAM sets a new benchmark in multimodal understanding, with open-source code under the Apache license, a dedicated dataset, and a live demo available on Hugging Face.

Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD

#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch

https://t.iss.one/DataScienceT ✅
This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.iss.one/addlist/8_rRW2scgfRhOTc0

✅ https://t.iss.one/Codeprogrammer
🌼 SOTA Textured 3D-Guided VTON 🌼

👉 #ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released 💙

👉 Review: https://t.ly/0tjdC
👉 Paper: https://lnkd.in/dFseYSXz
👉 Project: https://lnkd.in/djtqzrzs
👉 Repo: TBA

#AI #3DReconstruction #DiffusionModels #VirtualTryOn #ComputerVision #DeepLearning #VideoSynthesis

https://t.iss.one/DataScienceT 🔗
Forwarded from ENG. Hussein Sheikho
Remote job opportunity 🧑‍💻
No qualifications or experience required; the company provides full training ✨
Flexible working hours ⏰
After you register, you will be contacted to attend an introductory session about the job and the company.

https://forms.gle/hqUZXu7u4uLjEDPv8
Forwarded from Python Courses
🚀 LunaProxy - The Most Cost-effective Residential Proxy
Exclusive Benefits for Members of This Group:
💥 Residential Proxy: As low as $0.77 / GB. Use the discount code [lunapro30] when placing an order and save 30% immediately.
✔️ Over 200 million pure IPs | No charge for invalid ones | Success rate > 99.9%
💥 Unlimited Traffic Proxy: Enjoy a discount of up to 72%, only $79 / day.
✔️ Unlimited traffic | Unlimited concurrency | Bandwidth of over 100Gbps | Customized services | Save 90% of the cost when collecting AI/LLM data
Join the Luna Affiliate Program and earn a 10% commission. There is no upper limit for the commission, and you can withdraw it at any time.
👉 Take action now: https://www.lunaproxy.com/?ls=data&lk=?01
🚀 Master the Transformer Architecture with PyTorch! 🧠

Dive deep into the world of Transformers with this comprehensive PyTorch implementation guide. Whether you're a seasoned ML engineer or just starting out, this resource breaks down the complexities of the Transformer model, inspired by the groundbreaking paper "Attention Is All You Need".

🔗 Check it out here:
https://www.k-a.in/pyt-transformer.html

This guide offers:

🌟 Detailed explanations of each component of the Transformer architecture.

🌟 Step-by-step code implementations in PyTorch.

🌟 Insights into the self-attention mechanism and positional encoding.

By following along, you'll gain a solid understanding of how Transformers work and how to implement them from scratch.
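
For a feel of the two pieces named above, here is a dependency-free sketch of sinusoidal positional encoding and scaled dot-product attention as defined in "Attention Is All You Need". It uses plain Python lists instead of PyTorch tensors so it runs anywhere; it is not code from the linked guide, and the function names are made up for this example.

```python
import math


def positional_encoding(seq_len, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even dimension index (2i in the paper)
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(k[0])
    outputs, weights = [], []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v_row[c] for w_j, v_row in zip(w, v)) for c in range(len(v[0]))])
    return outputs, weights


# Self-attention over three 2-dimensional token vectors (Q = K = V = x).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
print(attn[0])                       # attention weights of the first token
print(positional_encoding(4, 8)[1])  # encoding of position 1
```

In a real model the queries, keys, and values come from learned linear projections and the whole computation is batched over heads; this sketch leaves both out to keep the core formula visible.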

#MachineLearning #DeepLearning #PyTorch #Transformer #AI #NLP #AttentionIsAllYouNeed #Coding #DataScience #NeuralNetworks

💯 BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟

🧠💻📊
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

8 Feb 2025 · Wei Deng, Siyi Zhou, Jingchen Shu, Jinchao Wang, Lu Wang

Recently, large language model (#LLM) based text-to-speech (#TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models, and add some novel improvements. Specifically, in Chinese scenarios, we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciations of polyphonic characters and long-tail characters controllable. We also perform a comparative analysis of Vector Quantization (VQ) and Finite-Scalar Quantization (FSQ) for codebook utilization of acoustic speech tokens. To further enhance the effect and stability of voice cloning, we introduce a conformer-based speech conditional encoder and replace the speechcode decoder with BigVGAN2. Compared with #XTTS, it achieves significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS and F5-TTS, IndexTTS has a relatively simple training process, more controllable usage, and faster inference speed. Moreover, its performance surpasses that of these systems. Our demos are available at https://index-tts.github.io.


Paper: https://arxiv.org/pdf/2502.05512v1.pdf

Code: https://github.com/index-tts/index-tts

https://t.iss.one/DataScienceT ✅
LettuceDetect: A Hallucination Detection Framework for RAG Applications

24 Feb 2025 · Ádám Kovács, Gábor Recski

Retrieval Augmented Generation (#RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect, a framework that addresses two critical limitations in existing hallucination detection methods: (1) the context window constraints of traditional encoder-based methods, and (2) the computational inefficiency of #LLM-based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models, while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, allowing for the identification of unsupported claims at the token level. Evaluations on the RAGTruth corpus demonstrate an F1 score of 79.22% for example-level detection, which is a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it more practical for real-world RAG applications.
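
To make "token-level" and "example-level" detection concrete, the sketch below shows how per-token labels from any token classifier could be merged into character spans and reduced to a single yes/no flag per answer. This is not the LettuceDetect API: the function names, the label convention (1 = unsupported), and the hand-written offsets are all assumptions for illustration.

```python
def hallucinated_spans(answer, token_offsets, token_labels):
    """Merge consecutive tokens labeled 1 (unsupported) into character spans."""
    spans = []
    current = None  # [start, end] of the span being built, if any
    for (start, end), label in zip(token_offsets, token_labels):
        if label == 1:
            if current is None:
                current = [start, end]
            else:
                current[1] = end  # extend the running span
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return [{"start": s, "end": e, "text": answer[s:e]} for s, e in spans]


def is_hallucinated(token_labels):
    """Example-level decision: flag the answer if any token is unsupported."""
    return any(label == 1 for label in token_labels)


answer = "Paris has 5 million people."
# Character offsets of: Paris | has | 5 | million | people | .
offsets = [(0, 5), (6, 9), (10, 11), (12, 19), (20, 26), (26, 27)]
labels = [0, 0, 1, 1, 0, 0]  # made-up classifier output

print(hallucinated_spans(answer, offsets, labels))
print(is_hallucinated(labels))
```

Merging adjacent flagged tokens is a design choice: it gives a reader one highlighted phrase ("5 million") instead of two separate token hits.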


Paper: https://arxiv.org/pdf/2502.17125v1.pdf

Code: https://github.com/KRLabsOrg/LettuceDetect

Colab: https://colab.research.google.com/drive/1Ubca5aMaBGdHtJ1rpqj3Ke9SLEr-PaDn?usp=sharing

https://t.iss.one/DataScienceT ✅
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

7 Apr 2025 · Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu

The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (#LLMs) on home devices. While consumer hardware is getting stronger and model quantization is improving, existing end-side solutions still demand #GPU clusters, large RAM/VRAM, and high bandwidth, far beyond what a common home cluster can handle. This paper introduces prima.cpp, a distributed inference system that runs 70B-scale models on everyday home devices using a mix of CPU/GPU, low RAM/VRAM, Wi-Fi, and cross-platform support. It uses mmap to manage model weights and introduces piped-ring parallelism with prefetching to hide disk loading. By modeling heterogeneity in computation, communication, disk, memory (and its management behavior), and OS, it optimally assigns model layers to each device's #CPU and GPU, further reducing token latency. An elegant algorithm named Halda is proposed to solve this NP-hard assignment problem. We evaluate prima.cpp on a common four-node home cluster. It outperforms llama.cpp, #exo, and #dllama on 30B+ models while keeping memory pressure below 6%. This brings frontier 30B-70B models, such as #Llama 3, #DeepSeek R1, #Qwen 2.5, and #QwQ to home assistants, making advanced AI truly accessible to individuals. The code is open source and available at https://github.com/Lizonghang/prima.cpp.
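
The role of mmap here can be shown with a small standard-library sketch: a weights file is mapped into virtual memory and a single "layer" is read by offset, so the operating system only pages in the bytes that are touched. The file layout, sizes, and function names below are invented for illustration; this is not prima.cpp code and it does not cover piped-ring parallelism or the Halda scheduler.

```python
import mmap
import os
import struct
import tempfile

LAYER_FLOATS = 4  # hypothetical tiny "layer": four float32 values
NUM_LAYERS = 3
LAYER_BYTES = LAYER_FLOATS * 4


def write_fake_weights(path):
    """Write NUM_LAYERS layers; layer n is filled with the value float(n)."""
    with open(path, "wb") as f:
        for layer in range(NUM_LAYERS):
            f.write(struct.pack(f"<{LAYER_FLOATS}f", *[float(layer)] * LAYER_FLOATS))


def read_layer(mapped, layer):
    """Decode one layer directly from the mapping without reading the whole file."""
    return struct.unpack_from(f"<{LAYER_FLOATS}f", mapped, layer * LAYER_BYTES)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_fake_weights(path)
        with open(path, "rb") as f:
            # Length 0 maps the whole file; pages are loaded lazily by the OS.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return read_layer(mapped, 2)


print(demo())
```

Because pages are faulted in on demand and can be evicted under memory pressure, a mapping like this lets a process address a file larger than its free RAM, at the cost of disk reads that a prefetching scheme then has to hide.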


Paper: https://arxiv.org/pdf/2504.08791v1.pdf

Code: https://github.com/lizonghang/prima.cpp

https://t.iss.one/DataScienceT ✅
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

22 Nov 2024 · Xuewu Lin, Tianwei Lin, Lichao Huang, Hongyu Xie, Zhizhong Su

In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point clouds, which, despite offering precise geometric information, still constrain perception performance due to inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, #BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding, and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current state-of-the-art results on the EmbodiedScan benchmark, achieving improvements of 5.69% in the 3D detection task and 15.25% in the 3D visual grounding task.


Paper: https://arxiv.org/pdf/2411.14869v2.pdf

Code: https://github.com/HorizonRobotics/BIP3D

HF: https://huggingface.co/spaces/AGC2024/visual-grounding-2024

Dataset: 10,000 People - Human Pose Recognition Data

https://t.iss.one/DataScienceT
LLM Engineer’s Handbook (2024)

🚀 Unlock the Future of AI with the LLM Engineer’s Handbook 🚀

Step into the world of Large Language Models (LLMs) with this comprehensive guide that takes you from foundational concepts to deploying advanced applications using LLMOps best practices. Whether you're an AI engineer, NLP professional, or LLM enthusiast, this book offers practical insights into designing, training, and deploying LLMs in real-world scenarios.

Why Choose the LLM Engineer’s Handbook?

Comprehensive Coverage: Learn about data engineering, supervised fine-tuning, and deployment strategies.

Hands-On Approach: Implement MLOps components through practical examples, including building an LLM-powered twin that's cost-effective, scalable, and modular.

Cutting-Edge Techniques: Explore inference optimization, preference alignment, and real-time data processing to apply LLMs effectively in your projects.

Real-World Applications: Move beyond isolated Jupyter notebooks and focus on building production-grade end-to-end LLM systems.


Limited-Time Offer

Originally priced at $55, the LLM Engineer’s Handbook is now available for just $25, a 55% discount! This special offer is available for a limited quantity, so act fast to secure your copy.

Who Should Read This Book?

This handbook is ideal for AI engineers, NLP professionals, and LLM engineers looking to deepen their understanding of LLMs. A basic knowledge of LLMs, Python, and AWS is recommended. Whether you're new to AI or seeking to enhance your skills, this book provides comprehensive guidance on implementing LLMs in real-world scenarios.

Don't miss this opportunity to advance your expertise in LLM engineering. Secure your discounted copy today and take the next step in your AI journey!

Buy book: https://www.patreon.com/DataScienceBooks/shop/llm-engineers-handbook-2024-1582908
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

🖥 Github: https://github.com/cdjkim/respec

📕 Paper: https://arxiv.org/abs/2504.14875v1

🔗 Dataset: https://paperswithcode.com/task/informativeness
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

6 May 2025 · Andrew Zhao, Yiran Wu, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang

Reinforcement learning with verifiable rewards (#RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where #AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (#AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall #SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
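
The double role of the code executor (validating a proposed task, then verifying an answer) can be sketched in a few lines of Python. Everything below is a toy under stated assumptions: the task format (a program defining f plus one input), the determinism check, the binary reward, and all function names are hypothetical and are not taken from the AZR implementation. A real system would also run untrusted code in a sandbox rather than with a bare exec.

```python
def run_program(source, arg):
    """Execute `source`, which must define f(x), and return f(arg)."""
    namespace = {}
    exec(source, namespace)  # illustration only: no sandboxing here
    return namespace["f"](arg)


def validate_task(source, arg):
    """Accept a proposed task only if it runs without error and is deterministic."""
    try:
        first = run_program(source, arg)
        second = run_program(source, arg)
    except Exception:
        return False, None
    if first != second:
        return False, None
    return True, first


def reward(source, arg, predicted_output):
    """Return 1.0 if the solver's prediction matches the executor's output, else 0.0."""
    valid, gold = validate_task(source, arg)
    if not valid:
        return 0.0
    return 1.0 if predicted_output == gold else 0.0


proposed_task = "def f(x):\n    return sorted(x)[::-1]\n"
print(validate_task(proposed_task, [3, 1, 2]))
print(reward(proposed_task, [3, 1, 2], [3, 2, 1]))  # correct prediction
print(reward(proposed_task, [3, 1, 2], [1, 2, 3]))  # wrong prediction
```

The point of the sketch is that no human-written answer key appears anywhere: the executor's own output is the ground truth for both the proposer and the solver.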


Paper: https://arxiv.org/pdf/2505.03335v2.pdf

Code: https://arxiv.org/pdf/2505.03335v2.pdf

https://t.iss.one/DataScienceT ✅
FastVLM: Efficient Vision Encoding for Vision Language Models

17 Dec 2024 · Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari

Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (#VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as #ViTs become inefficient at high resolutions due to the large number of tokens and high encoding latency caused by stacked self-attention layers. At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency. Based on a comprehensive efficiency analysis of the interplay between image resolution, vision latency, token count, and #LLM size, we introduce #FastVLM, a model that achieves an optimized trade-off between latency, model size and accuracy. FastVLM incorporates #FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Unlike previous methods, FastVLM achieves the optimal balance between visual token count and image resolution solely by scaling the input image, eliminating the need for additional token pruning and simplifying the model design. In the LLaVA-1.5 setup, FastVLM achieves a 3.2× improvement in time-to-first-token (TTFT) while maintaining similar performance on VLM benchmarks compared to prior works. Compared to LLaVA-OneVision at the highest resolution (1152×1152), FastVLM achieves comparable performance on key benchmarks like SeedBench and MMMU, using the same 0.5B LLM, but with 85× faster TTFT and a vision encoder that is 3.4× smaller.
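
A quick back-of-the-envelope calculation shows why plain ViT encoders struggle as resolution grows. The sketch assumes a standard ViT that cuts the image into non-overlapping square patches with no pooling; the patch size of 16 and the function names are assumptions for illustration and do not describe FastViTHD, which is a hybrid design built to emit fewer tokens.

```python
def vit_token_count(height, width, patch_size=16):
    """Number of patch tokens for a plain ViT (no class token, no pooling)."""
    if height % patch_size or width % patch_size:
        raise ValueError("resolution must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


def attention_pairs(num_tokens):
    """Token pairs scored by one self-attention layer (quadratic in token count)."""
    return num_tokens * num_tokens


for side in (336, 672, 1152):
    tokens = vit_token_count(side, side)
    print(f"{side}x{side}: {tokens} tokens, {attention_pairs(tokens)} attention pairs")
```

Under these assumptions, going from 336 to 1152 pixels per side raises the token count from 441 to 5184, and every one of those tokens is also passed on to the LLM, which is the second latency axis the abstract mentions.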


Paper: https://arxiv.org/pdf/2412.13303v1.pdf

code: https://github.com/apple/ml-fastvlm

Datasets: GQA - TextVQA - ScienceQA

https://t.iss.one/DataScienceT 🦾
🎁 Your balance has been credited with $4,000; the owner of the channel wants to contact you!

Dear subscriber, we would like to thank you very much for supporting our channel. As a token of our gratitude, we would like to give you free access to Lisa's investor channel, which can help you start earning today.

t.iss.one/Lisainvestor

Be sure to take advantage of our gift. Admission is free, so don't miss the opportunity to change your life for the better.

You can follow the link:
https://t.iss.one/+0DQSCADFTUA3N2Qx
📀 55+ AI and Data Science Projects

💻 You can read articles and watch online courses, but until you build a practical project, start coding, and apply the concepts in practice, you don't really learn anything.

🔸 Here is a list of 55 projects in different categories: 👇

1️⃣ Large language models 🔸 Link

2️⃣ Fine-tuning LLMs 🔸 Link

3️⃣ Time series data analysis 🔸 Link

4️⃣ Computer Vision 🔸 Link

5️⃣ Data Science 🔸 Link

➖➖➖➖➖
⏪ You can also access all of the above projects through the following GitHub repo: 👇

┌ 📂 AI Data Guided Projects
└ 🐱 GitHub-Repos

Join our WhatsApp 💬 channel:
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A