Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

Buy ads: https://telega.io/c/dataScienceT
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

3 Apr 2025 · Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li Yuan ·

The recent breakthroughs in OpenAI's #GPT4o model have demonstrated surprisingly good capabilities in image generation and editing, generating significant excitement in the community. This technical report presents a first-look evaluation benchmark (named GPT-ImgEval) that quantitatively and qualitatively diagnoses GPT-4o's performance across three critical dimensions: (1) generation quality, (2) editing proficiency, and (3) world knowledge-informed semantic synthesis. Across all three tasks, GPT-4o demonstrates strong performance, significantly surpassing existing methods in both image generation control and output quality, while also showcasing exceptional knowledge-reasoning capabilities. Furthermore, based on GPT-4o's generated data, we propose a classification-model-based approach to investigate its underlying architecture; our empirical results suggest the model combines an auto-regressive (AR) backbone with a diffusion-based head for image decoding, rather than a VAR-like architecture. We also provide a complete, speculative account of GPT-4o's overall architecture. In addition, we conduct a series of analyses to identify and visualize GPT-4o's specific limitations and the synthetic artifacts commonly observed in its image generation. We also present a comparative study of multi-round image editing between GPT-4o and Gemini 2.0 Flash, and discuss the safety implications of GPT-4o's outputs, particularly their detectability by existing image forensic models. We hope that our work offers valuable insight and provides a reliable benchmark to guide future research, foster reproducibility, and accelerate innovation in the field of image generation and beyond. The code and datasets used for evaluating GPT-4o can be found at https://github.com/PicoTrex/GPT-ImgEval.


Paper: https://arxiv.org/pdf/2504.02782v1.pdf

Code: https://github.com/picotrex/gpt-imgeval

Datasets: MagicBrush, GenEval
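
A rough sketch of the kind of classifier probe described above, assuming one has folders of images from known diffusion-based and auto-regressive decoders (paths, class layout, and hyper-parameters here are illustrative, not the paper's actual setup):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Placeholder folders: each subfolder holds samples from decoders of a known family,
# e.g. decoder_probe_train/diffusion/ and decoder_probe_train/autoregressive/
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("decoder_probe_train/", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Small binary classifier: diffusion-style vs. AR-style decoding artifacts
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()

# Applying the trained probe to GPT-4o outputs then hints at which decoder family
# its images most resemble, which is the spirit of the paper's architecture analysis.
```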

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
🍄 4D Mocap Human-Object 🍄

Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds @ 30FPS) with interactions involving 63 precisely modeled objects and 72 articulated parts—a rich resource for researchers and developers in the field.


⚡️ Review: https://t.ly/lCof3
⚡️ Paper: https://lnkd.in/dVVBDd_c
⚡️ Project: https://lnkd.in/dwBcseDf

#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation
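
For readers who want to prototype against a dataset like this before official tooling is available, here is a minimal PyTorch wrapper with a purely assumed file layout and key names (not HUMOTO's actual format):

```python
from pathlib import Path
import numpy as np
from torch.utils.data import Dataset

class MocapSequenceDataset(Dataset):
    """Assumes one .npz per sequence holding per-frame arrays sampled at 30 FPS."""

    def __init__(self, root: str):
        self.files = sorted(Path(root).glob("*.npz"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        data = np.load(self.files[idx])
        # "human_pose" and "object_pose" are hypothetical keys for illustration only
        return data["human_pose"], data["object_pose"]
```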

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
Forget Coding; start Vibing! Tell AI what you want, and watch it build your dream website while you enjoy a cup of coffee.

Date: Thursday, April 17th at 9 PM IST

Register for FREE: https://lu.ma/4nczknky?tk=eAT3Bi

Limited FREE seats available!
💥 Geo4D: VideoGen 4D Scene 💥

The Oxford VGG unveils Geo4D, a breakthrough in #videodiffusion for monocular 4D reconstruction. Trained only on synthetic data, Geo4D still achieves strong generalization to real-world scenarios. It outputs point maps, depth, and ray maps, setting a new #SOTA in dynamic scene reconstruction. Code is now released!


⚡️ Review: https://t.ly/X55Uj
⚡️ Paper: https://arxiv.org/pdf/2504.07961
⚡️ Project: https://geo4d.github.io/
⚡️ Code: https://github.com/jzr99/Geo4D

#Geo4D #4DReconstruction #DynamicScenes #OxfordVGG #ComputerVision #MachineLearning #DiffusionModels
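
A hedged sketch of how per-frame outputs like these could be consumed; `model` here is a stand-in callable, not the repo's actual API:

```python
import cv2
import torch

def run_monocular_4d(video_path: str, model):
    """Feed video frames to a model assumed to return depth, point, and ray maps."""
    cap = cv2.VideoCapture(video_path)
    outputs = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            pred = model(x)  # assumed dict with "depth", "points", "rays" tensors
        outputs.append({k: pred[k].cpu() for k in ("depth", "points", "rays")})
    cap.release()
    return outputs
```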

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
🔥 ENTER THE VIP CHANNEL FOR FREE! ENTRY IS FREE FOR 24 HOURS!

LISA TRADER - the most successful trader of 2024. A week ago she finished a marathon in her VIP channel where, starting from $100, she made $2,000 in just two weeks!

Entry to her channel normally costs $1,500; for the next 24 hours, entry is FREE!

JOIN THE VIP CHANNEL NOW!
🔥 General Attention-Based Object Detection 🔥

👉 GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.

👉 Review: https://t.ly/O7wqH
👉 Paper: https://lnkd.in/dc5VTUj9
👉 Project: https://lnkd.in/dzrt-qQV

#3DObjectDetection #Monocular3D #DeepLearning #WeakSupervision #ComputerVision #AI #MachineLearning #GATE3D
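
To make the idea concrete, here is a minimal 2D-3D consistency term in PyTorch; it is a generic formulation in the spirit of the description above, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def consistency_loss(centers_3d, centers_2d, K):
    """centers_3d: (N, 3) camera-frame box centers; centers_2d: (N, 2) pixel centers; K: (3, 3) intrinsics."""
    proj = (K @ centers_3d.T).T                       # project 3D centers to the image plane
    proj = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)  # perspective divide
    return F.l1_loss(proj, centers_2d)                # penalize 2D/3D disagreement
```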

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

14 Apr 2025 · Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng ·

In this paper we tackle a fundamental question: "Can we train latent diffusion models together with the variational auto-encoder (VAE) tokenizer in an end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end training is often preferable when possible. However, for latent diffusion transformers, end-to-end training of both the VAE and the diffusion model using the standard diffusion loss is observed to be ineffective, even causing a degradation in final performance. We show that while the diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss, allowing both the VAE and the diffusion model to be jointly tuned during training. Despite its simplicity, the proposed training recipe (REPA-E) shows remarkable performance, speeding up diffusion-model training by over 17x and 45x relative to the REPA and vanilla training recipes, respectively. Interestingly, we observe that end-to-end tuning with REPA-E also improves the VAE itself, leading to improved latent-space structure and downstream generation performance. In terms of final performance, our approach sets a new state of the art, achieving FIDs of 1.26 and 1.83 with and without classifier-free guidance on ImageNet 256x256. Code is available at https://end2end-diffusion.github.io.


Paper: https://arxiv.org/pdf/2504.10483v1.pdf

Code: https://github.com/End2End-Diffusion/REPA-E

Dataset: ImageNet
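
As a rough illustration of the representation-alignment idea (our reading of the abstract, with assumed shapes and a hypothetical projection head, not the official REPA-E code):

```python
import torch
import torch.nn.functional as F

def repa_loss(diffusion_feats, frozen_feats, proj):
    """diffusion_feats: (B, N, D1) intermediate diffusion-transformer features;
    frozen_feats: (B, N, D2) features from a frozen pretrained vision encoder;
    proj: a learnable nn.Linear(D1, D2) projection head."""
    pred = proj(diffusion_feats)
    return 1.0 - F.cosine_similarity(pred, frozen_feats, dim=-1).mean()

# Sketch of the total objective: diffusion_loss + lambda * repa_loss, with gradients
# also flowing into the VAE, which is what makes the training "end-to-end" here.
```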

https://t.iss.one/DataScienceT
Liquid: Language Models are Scalable Multi-modal Generators

5 Dec 2024 · Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai ·

We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably caused by the unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the typical interference seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100x in training costs while outperforming Chameleon in multimodal capabilities and maintaining language performance comparable to mainstream LLMs like LLAMA2. Liquid also outperforms models like SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. This work demonstrates that LLMs such as LLAMA3.2 and GEMMA2 are powerful multimodal generators, offering a scalable solution for enhancing both vision-language understanding and generation. The code and models will be released at https://github.com/FoundationVision/Liquid.


Paper: https://arxiv.org/pdf/2412.04332v2.pdf

Code: https://github.com/foundationvision/liquid
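
A conceptual sketch of the shared-vocabulary idea (not the released Liquid code): extend a text tokenizer with image codebook tokens so one auto-regressive LLM predicts both modalities with the usual next-token loss. The base model and codebook size below are stand-ins.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # small stand-in base LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

codebook_size = 8192                                  # assumed size of the image VQ codebook
image_tokens = [f"<img_{i}>" for i in range(codebook_size)]
tokenizer.add_tokens(image_tokens)
model.resize_token_embeddings(len(tokenizer))

# A training example becomes a single sequence mixing modalities, e.g.
# "a red car <img_17> <img_802> ...", optimized with the standard language-modeling loss.
```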

https://t.iss.one/DataScienceT
NVIDIA introduces the Describe Anything Model (DAM)

DAM is a new state-of-the-art model designed to generate rich, detailed descriptions for specific regions in images and videos; users can mark these regions using points, boxes, scribbles, or masks.
It sets a new benchmark in multimodal understanding, with open-source code under the Apache license, a dedicated dataset, and a live demo available on Hugging Face.

Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD

#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch
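
A small generic helper (not part of the DAM codebase) showing how user prompts such as boxes or point clicks are typically turned into the binary region masks that region-captioning models consume:

```python
import numpy as np

def box_to_mask(h, w, box):
    """box = (x1, y1, x2, y2) in pixels -> uint8 mask of shape (h, w)."""
    x1, y1, x2, y2 = box
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y1:y2, x1:x2] = 1
    return mask

def point_to_mask(h, w, point, radius=8):
    """Approximate a point click as a small disc mask."""
    yy, xx = np.ogrid[:h, :w]
    cx, cy = point
    return (((yy - cy) ** 2 + (xx - cx) ** 2) <= radius ** 2).astype(np.uint8)

mask = box_to_mask(480, 640, (100, 120, 300, 360))  # example region prompt
```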

https://t.iss.one/DataScienceT
This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

https://t.iss.one/addlist/8_rRW2scgfRhOTc0

https://t.iss.one/Codeprogrammer
🌼 SOTA Textured 3D-Guided VTON 🌼

👉 #ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video virtual try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released 💙

👉 Review: https://t.ly/0tjdC
👉 Paper: https://lnkd.in/dFseYSXz
👉 Project: https://lnkd.in/djtqzrzs
👉 Repo: TBA

#AI #3DReconstruction #DiffusionModels #VirtualTryOn #ComputerVision #DeepLearning #VideoSynthesis

https://t.iss.one/DataScienceT 🔗
Forwarded from Python Courses
🚀 LunaProxy - The Most Cost-Effective Residential Proxy
Exclusive benefits for members of this group:
💥 Residential Proxy: as low as $0.77/GB. Use the discount code [lunapro30] when placing an order and save 30% immediately.
✔️ Over 200 million pure IPs | No charge for invalid ones | Success rate > 99.9%
💥 Unlimited Traffic Proxy: enjoy a discount of up to 72%, only $79/day.
✔️ Unlimited traffic | Unlimited concurrency | Bandwidth of over 100 Gbps | Customized services | Save 90% of the cost when collecting AI/LLM data
Join the Luna Affiliate Program and earn a 10% commission. There is no upper limit on the commission, and you can withdraw it at any time.
👉 Take action now: https://www.lunaproxy.com/?ls=data&lk=?01
🚀 Master the Transformer Architecture with PyTorch! 🧠

Dive deep into the world of Transformers with this comprehensive PyTorch implementation guide. Whether you're a seasoned ML engineer or just starting out, this resource breaks down the complexities of the Transformer model, inspired by the groundbreaking paper "Attention Is All You Need".

🔗 Check it out here:
https://www.k-a.in/pyt-transformer.html

This guide offers:

🌟 Detailed explanations of each component of the Transformer architecture.

🌟 Step-by-step code implementations in PyTorch.

🌟 Insights into the self-attention mechanism and positional encoding.

By following along, you'll gain a solid understanding of how Transformers work and how to implement them from scratch.
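
As a taste of what the guide covers, here is a minimal, self-contained sketch of the two pieces it highlights, sinusoidal positional encoding and (single-head) scaled dot-product self-attention:

```python
import math
import torch
import torch.nn.functional as F

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in the original Transformer paper."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a batch of sequences x: (B, T, D)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(1, 10, 64) + positional_encoding(10, 64)   # add position information
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                      # (1, 10, 64)
```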

#MachineLearning #DeepLearning #PyTorch #Transformer #AI #NLP #AttentionIsAllYouNeed #Coding #DataScience #NeuralNetworks


💯 BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟

🧠💻📊
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

8 Feb 2025 · Wei Deng, Siyi Zhou, Jingchen Shu, Jinchao Wang, Lu Wang ·

Recently, large language model (#LLM) based text-to-speech (#TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models, with several novel improvements. Specifically, for Chinese scenarios we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciations of polyphonic and long-tail characters controllable. We also perform a comparative analysis of Vector Quantization (VQ) and Finite-Scalar Quantization (FSQ) for codebook utilization of acoustic speech tokens. To further enhance the effect and stability of voice cloning, we introduce a conformer-based speech conditional encoder and replace the speech-code decoder with BigVGAN2. Compared with #XTTS, IndexTTS achieves significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS, and F5-TTS, IndexTTS has a simpler training process, more controllable usage, and faster inference speed, while its performance surpasses that of these systems. Our demos are available at https://index-tts.github.io.


Paper: https://arxiv.org/pdf/2502.05512v1.pdf

Code: https://github.com/index-tts/index-tts
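
A toy illustration of the two token schemes compared above, a VQ codebook lookup versus finite-scalar quantization with a straight-through estimator (levels and shapes are arbitrary examples, not the paper's configuration):

```python
import torch

def fsq(z, levels=8):
    """Finite-scalar quantization: bound each latent dim, snap it to `levels` values, straight-through."""
    z = torch.tanh(z)
    zq = torch.round((z + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
    return z + (zq - z).detach()

def vq(z, codebook):
    """Vector quantization: nearest-neighbour lookup in an explicit (K, D) codebook."""
    idx = torch.cdist(z, codebook).argmin(dim=-1)
    return codebook[idx], idx

z = torch.randn(4, 16)
print(fsq(z).shape, vq(z, torch.randn(512, 16))[1].shape)
```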

https://t.iss.one/DataScienceT
LettuceDetect: A Hallucination Detection Framework for RAG Applications

24 Feb 2025 · Ádám Kovács, Gábor Recski ·

Retrieval Augmented Generation (#RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect, a framework that addresses two critical limitations of existing hallucination detection methods: (1) the context window constraints of traditional encoder-based methods, and (2) the computational inefficiency of #LLM-based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models, while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, allowing for the identification of unsupported claims at the token level. Evaluations on the RAGTruth corpus demonstrate an F1 score of 79.22% for example-level detection, a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it practical for real-world RAG applications.


Paper: https://arxiv.org/pdf/2502.17125v1.pdf

Code: https://github.com/KRLabsOrg/LettuceDetect

Colab: https://colab.research.google.com/drive/1Ubca5aMaBGdHtJ1rpqj3Ke9SLEr-PaDn?usp=sharing
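
A generic sketch of token-level hallucination flagging with a ModernBERT-style token classifier, as the abstract describes; the checkpoint name and two-label head below are placeholders rather than the released LettuceDetect API:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "answerdotai/ModernBERT-base"   # backbone family cited above; classification head is untrained here
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

# The detector consumes a context-question-answer triple as one long sequence
text = "Context: ... Question: ... Answer: ..."
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits             # (1, seq_len, 2)
unsupported = logits.argmax(-1)[0].bool()    # per-token "hallucinated" flags
```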

https://t.iss.one/DataScienceT
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

7 Apr 2025 · Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu ·

The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (#LLMs) on home devices. While consumer hardware is getting stronger and model quantization is improving, existing end-side solutions still demand #GPU clusters, large RAM/VRAM, and high bandwidth, far beyond what a common home cluster can handle. This paper introduces prima.cpp, a distributed inference system that runs 70B-scale models on everyday home devices using a mix of CPU/GPU, low RAM/VRAM, Wi-Fi, and cross-platform support. It uses mmap to manage model weights and introduces piped-ring parallelism with prefetching to hide disk loading. By modeling heterogeneity in computation, communication, disk, memory (and its management behavior), and OS, it optimally assigns model layers to each device's #CPU and GPU, further reducing token latency. An elegant algorithm named Halda is proposed to solve this NP-hard assignment problem. We evaluate prima.cpp on a common four-node home cluster. It outperforms llama.cpp, #exo, and #dllama on 30B+ models while keeping memory pressure below 6%. This brings frontier 30B-70B models, such as #Llama 3, #DeepSeek R1, #Qwen 2.5, and #QwQ, to home assistants, making advanced AI truly accessible to individuals. The code is open source and available at https://github.com/Lizonghang/prima.cpp.


Paper: https://arxiv.org/pdf/2504.08791v1.pdf

Code: https://github.com/lizonghang/prima.cpp
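
To give a feel for the assignment problem, here is a deliberately naive greedy layer-splitting sketch (not the paper's Halda algorithm); device names and numbers are made up:

```python
def assign_layers(n_layers, devices):
    """devices: list of dicts with 'name', relative 'speed', and 'mem_layers' capacity."""
    total_speed = sum(d["speed"] for d in devices)
    plan, remaining = {}, n_layers
    for i, d in enumerate(devices):
        want = round(n_layers * d["speed"] / total_speed)
        take = min(want, d["mem_layers"], remaining)
        if i == len(devices) - 1:            # last device absorbs whatever is left
            take = min(remaining, d["mem_layers"])
        plan[d["name"]] = take
        remaining -= take
    return plan, remaining                   # remaining > 0 means the cluster is too small

devices = [
    {"name": "laptop-gpu", "speed": 4.0, "mem_layers": 40},
    {"name": "desktop-cpu", "speed": 2.0, "mem_layers": 30},
    {"name": "mac-mini", "speed": 1.0, "mem_layers": 20},
]
print(assign_layers(80, devices))            # e.g. ({'laptop-gpu': 40, 'desktop-cpu': 23, 'mac-mini': 17}, 0)
```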

https://t.iss.one/DataScienceT
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

22 Nov 2024 · Xuewu Lin, Tianwei Lin, Lichao Huang, Hongyu Xie, Zhizhong Su ·

In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point clouds, which, despite offering precise geometric information, still constrain perception performance due to inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, #BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding, and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current state-of-the-art results on the EmbodiedScan benchmark, achieving improvements of 5.69% in the 3D detection task and 15.25% in the 3D visual grounding task.


Paper: https://arxiv.org/pdf/2411.14869v2.pdf

Code: https://github.com/HorizonRobotics/BIP3D

HF: https://huggingface.co/spaces/AGC2024/visual-grounding-2024

Dataset: 10,000 People - Human Pose Recognition Data
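
A loose sketch of the image-centric idea as we read it from the abstract (not the released BIP3D code): attach explicit 3D position information, here per-pixel camera-ray directions, to 2D backbone features before fusing views.

```python
import torch
import torch.nn.functional as F

def add_ray_encoding(feats, rays):
    """feats: (V, C, H, W) per-view 2D features; rays: (V, 3, H, W) unit ray directions."""
    return torch.cat([feats, rays], dim=1)   # (V, C+3, H, W)

def fuse_views(encoded):
    """Simplest possible multi-view fusion: average across the view axis."""
    return encoded.mean(dim=0)               # (C+3, H, W)

feats = torch.randn(4, 256, 32, 32)
rays = F.normalize(torch.randn(4, 3, 32, 32), dim=1)
print(fuse_views(add_ray_encoding(feats, rays)).shape)   # torch.Size([259, 32, 32])
```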

https://t.iss.one/DataScienceT
LLM Engineer’s Handbook (2024)

🚀 Unlock the Future of AI with the LLM Engineer’s Handbook 🚀

Step into the world of Large Language Models (LLMs) with this comprehensive guide that takes you from foundational concepts to deploying advanced applications using LLMOps best practices. Whether you're an AI engineer, NLP professional, or LLM enthusiast, this book offers practical insights into designing, training, and deploying LLMs in real-world scenarios.

Why Choose the LLM Engineer’s Handbook?

Comprehensive Coverage: Learn about data engineering, supervised fine-tuning, and deployment strategies.

Hands-On Approach: Implement MLOps components through practical examples, including building an LLM-powered twin that's cost-effective, scalable, and modular.

Cutting-Edge Techniques: Explore inference optimization, preference alignment, and real-time data processing to apply LLMs effectively in your projects.

Real-World Applications: Move beyond isolated Jupyter notebooks and focus on building production-grade end-to-end LLM systems.


Limited-Time Offer

Originally priced at $55, the LLM Engineer’s Handbook is now available for just $25—a 55% discount! This special offer is available for a limited quantity, so act fast to secure your copy.

Who Should Read This Book?

This handbook is ideal for AI engineers, NLP professionals, and LLM engineers looking to deepen their understanding of LLMs. A basic knowledge of LLMs, Python, and AWS is recommended. Whether you're new to AI or seeking to enhance your skills, this book provides comprehensive guidance on implementing LLMs in real-world scenarios.

Don't miss this opportunity to advance your expertise in LLM engineering. Secure your discounted copy today and take the next step in your AI journey!

Buy book: https://www.patreon.com/DataScienceBooks/shop/llm-engineers-handbook-2024-1582908