Forwarded from Python | Machine Learning | Coding | R
Access whitepapers, podcasts, code labs, & recorded livestreams. Additionally, there is a bonus assignment for you!
https://www.kaggle.com/learn-guide/5-day-genai
#GenerativeAI #GoogleAI #AICourse #SelfPacedLearning #MachineLearning #DeepLearning #Kaggle #AICommunity #TechEducation #AIforEveryone
Kaggle
5-Day Gen AI Intensive Course with Google
Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Paper: https://arxiv.org/pdf/2504.10483v1.pdf
Code: https://github.com/End2End-Diffusion/REPA-E
Dataset: ImageNet
https://t.iss.one/DataScienceT
14 Apr 2025 · Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng
In this paper we tackle a fundamental question: "Can we train latent diffusion models together with the variational auto-encoder (VAE) tokenizer in an end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end training is often preferable when possible. However, for latent diffusion transformers, we observe that training both the VAE and the diffusion model end-to-end with the standard diffusion loss is ineffective and even degrades final performance. We show that while the diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss, allowing both the VAE and the diffusion model to be jointly tuned during training. Despite its simplicity, the proposed training recipe (REPA-E) shows remarkable performance, speeding up diffusion model training by over 17x and 45x compared with the REPA and vanilla training recipes, respectively. Interestingly, we observe that end-to-end tuning with REPA-E also improves the VAE itself, leading to a better-structured latent space and improved downstream generation performance. In terms of final performance, our approach sets a new state of the art, achieving an FID of 1.26 with and 1.83 without classifier-free guidance on ImageNet 256 x 256. Code is available at https://end2end-diffusion.github.io.
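The abstract credits the REPA loss, rather than the diffusion loss, with making joint VAE and diffusion tuning work. Below is a minimal sketch of a REPA-style alignment term, assuming it is the negative mean patch-wise cosine similarity between projected diffusion-transformer hidden states and features from a frozen pretrained encoder. The function names, the `repa_weight=0.5` default, and the plain-list representation are illustrative assumptions; the projection head and the exact way REPA-E routes gradients into the VAE are not described in this post and are omitted.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def repa_alignment_loss(projected_hidden, encoder_features):
    """Negative mean patch-wise cosine similarity.

    projected_hidden: per-patch diffusion-transformer states after a (hypothetical)
    projection head; encoder_features: per-patch features from a frozen encoder.
    Perfect alignment gives a loss of about -1.
    """
    sims = [cosine_similarity(h, f) for h, f in zip(projected_hidden, encoder_features)]
    return -sum(sims) / len(sims)


def combined_loss(diffusion_loss, projected_hidden, encoder_features, repa_weight=0.5):
    """Hypothetical total objective: diffusion loss plus a weighted REPA term."""
    return diffusion_loss + repa_weight * repa_alignment_loss(projected_hidden, encoder_features)


# Usage: two patches whose features point in the same directions.
hidden = [[1.0, 0.0], [0.0, 2.0]]
features = [[3.0, 0.0], [0.0, 1.0]]
print(round(repa_alignment_loss(hidden, features), 4))  # → -1.0
```

Cosine similarity makes the term insensitive to feature scale, so only the direction of each patch representation is pulled toward the encoder's.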
Liquid: Language Models are Scalable Multi-modal Generators
Paper: https://arxiv.org/pdf/2412.04332v2.pdf
Code: https://github.com/foundationvision/liquid
https://t.iss.one/DataScienceT
5 Dec 2024 · Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai
We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably caused by unified training of visual and language tasks diminishes as model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the interference typically seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100x in training costs while outperforming Chameleon in multimodal capabilities and maintaining language performance comparable to mainstream LLMs like LLAMA2. Liquid also outperforms models like SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. This work demonstrates that LLMs such as LLAMA3.2 and GEMMA2 are powerful multimodal generators, offering a scalable solution for enhancing both vision-language understanding and generation. The code and models will be released at https://github.com/FoundationVision/Liquid.
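The "shared feature space" idea can be pictured as one vocabulary in which discrete image codes simply follow the text tokens, so a single LLM embedding table and output head cover both modalities. This is a hedged sketch under that assumption: the toy sizes, the offsetting scheme, and the begin/end-of-image marker ids (`boi_id`, `eoi_id`) are hypothetical and not taken from Liquid's code.

```python
def unified_vocab_size(text_vocab_size, image_codebook_size):
    """Size of one shared vocabulary in which image codes follow the text tokens."""
    return text_vocab_size + image_codebook_size


def image_codes_to_token_ids(codes, text_vocab_size):
    """Shift discrete image codes (0..K-1) past the text vocabulary so that both
    modalities index into the same embedding table."""
    return [text_vocab_size + c for c in codes]


def is_image_token(token_id, text_vocab_size):
    """True if a token id falls in the image-code range of the shared vocabulary."""
    return token_id >= text_vocab_size


def build_sequence(text_ids, image_codes, text_vocab_size, boi_id, eoi_id):
    """One autoregressive sequence: a text prompt followed by an image span wrapped
    in hypothetical begin/end-of-image marker tokens."""
    return text_ids + [boi_id] + image_codes_to_token_ids(image_codes, text_vocab_size) + [eoi_id]


# Usage with toy sizes: 100 text tokens, ids 98/99 reserved as image markers.
seq = build_sequence([5, 17], [0, 3, 7], text_vocab_size=100, boi_id=98, eoi_id=99)
print(seq)  # → [5, 17, 98, 100, 103, 107, 99]
```

Because every id lives in one range, ordinary next-token prediction can emit either text or image codes, which is what lets generation and comprehension share one model.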
NVIDIA introduces Describe Anything Model (DAM), a new state-of-the-art model designed to generate rich, detailed descriptions of specific regions in images and videos. Users can mark these regions using points, boxes, scribbles, or masks.
DAM sets a new benchmark in multimodal understanding, with open-source code under the Apache license, a dedicated dataset, and a live demo available on Hugging Face.
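Since regions can be given as points, boxes, scribbles, or masks, one common way to handle all of them is to convert each prompt into a binary region mask first. The sketch below is a hypothetical preprocessing step, not DAM's actual API: the function names, the `(x0, y0, x1, y1)` box convention with exclusive upper bounds, and treating a scribble as a list of stroke points are all assumptions.

```python
def box_to_mask(box, height, width):
    """Rasterize an (x0, y0, x1, y1) box (x1, y1 exclusive) into a binary mask."""
    x0, y0, x1, y1 = box
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(width)]
            for y in range(height)]


def points_to_mask(points, height, width, radius=0):
    """Mark each (x, y) point, dilated to a (2*radius+1)-wide square. A scribble
    can be passed as the list of points along its stroke."""
    mask = [[0] * width for _ in range(height)]
    for px, py in points:
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                mask[y][x] = 1
    return mask


def mask_area(mask):
    """Number of pixels inside the region."""
    return sum(sum(row) for row in mask)


# Usage: a 2x3 box and a single click dilated by one pixel on a 5x5 image.
print(mask_area(box_to_mask((1, 1, 3, 4), 5, 5)))           # → 6
print(mask_area(points_to_mask([(2, 2)], 5, 5, radius=1)))  # → 9
```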
Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD
#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch
https://t.iss.one/DataScienceT
Forwarded from Python | Machine Learning | Coding | R
These channels are for programmers, coders, and software engineers.
0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages
https://t.iss.one/addlist/8_rRW2scgfRhOTc0
https://t.iss.one/Codeprogrammer
SOTA Textured 3D-Guided VTON
#ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code and benchmark to be released.
Review: https://t.ly/0tjdC
Paper: https://lnkd.in/dFseYSXz
Project: https://lnkd.in/djtqzrzs
Repo: TBA
#AI #3DReconstruction #DiffusionModels #VirtualTryOn #ComputerVision #DeepLearning #VideoSynthesis
https://t.iss.one/DataScienceT
Forwarded from ENG. Hussein Sheikho
Remote job opportunity 🧑‍💻
No qualifications or experience required; the company provides full training ✨
Flexible working hours ⏰
Register, and you will then be contacted to attend an introductory meeting about the job and the company.
https://forms.gle/hqUZXu7u4uLjEDPv8
Google Docs
Job opportunity
Working from home is simply the solution to unemployment for Arab youth and for people around the world. It is your path to financial freedom, away from dull government jobs and low salaries.
Earning money online has become real, not an illusion.
We offer you an opportunity now, without any certificates…
Forwarded from Python Courses
Forwarded from Python | Machine Learning | Coding | R
Dive deep into the world of Transformers with this comprehensive PyTorch implementation guide. Whether you're a seasoned ML engineer or just starting out, this resource breaks down the complexities of the Transformer model introduced in the groundbreaking paper "Attention Is All You Need".
https://www.k-a.in/pyt-transformer.html
This guide offers:
By following along, you'll gain a solid understanding of how Transformers work and how to implement them from scratch.
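The core operation such a guide builds on is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, from "Attention Is All You Need". The sketch below writes it in plain Python lists rather than PyTorch so every step is visible; it is an illustrative single-head version without masking or batching, not the guide's own code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as
    lists of row vectors (one row per token)."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


# Usage: two identical keys receive equal weight, so the output averages their values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [2.0, 2.0]]))
# → [[1.0, 1.0]]
```

The √d_k scaling keeps dot products from growing with dimension, which would otherwise push the softmax toward one-hot weights.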
#MachineLearning #DeepLearning #PyTorch #Transformer #AI #NLP #AttentionIsAllYouNeed #Coding #DataScience #NeuralNetworks
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Paper: https://arxiv.org/pdf/2502.05512v1.pdf
Code: https://github.com/index-tts/index-tts
https://t.iss.one/DataScienceT
8 Feb 2025 · Wei Deng, Siyi Zhou, Jingchen Shu, Jinchao Wang, Lu Wang
Recently, large language model (#LLM) based text-to-speech (#TTS) systems have gradually become mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models, with several novel improvements. Specifically, in Chinese scenarios, we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciation of polyphonic and long-tail characters controllable. We also compare Vector Quantization (VQ) with Finite-Scalar Quantization (FSQ) in terms of codebook utilization for acoustic speech tokens. To further enhance the effect and stability of voice cloning, we introduce a conformer-based speech conditional encoder and replace the speechcode decoder with BigVGAN2. Compared with #XTTS, IndexTTS achieves significant improvements in naturalness, content consistency, and zero-shot voice cloning. Compared with popular open-source TTS systems such as Fish-Speech, CosyVoice2, FireRedTTS, and F5-TTS, IndexTTS has a relatively simple training process, more controllable usage, and faster inference, and its performance surpasses that of these systems. Our demos are available at https://index-tts.github.io.
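For readers unfamiliar with the FSQ side of that VQ-versus-FSQ comparison: FSQ has no learned codebook. Each latent dimension is squashed with tanh and rounded to a small fixed number of levels, and the implicit codebook is every combination of levels. This is a minimal sketch of generic FSQ (odd level counts only; even counts need an extra half-step shift, omitted here); the level configuration is an illustrative assumption, not IndexTTS's setting, and the straight-through gradient used in training is not shown.

```python
import math


def fsq_quantize(z, levels):
    """Finite-Scalar Quantization: bound each dimension with tanh, scale to
    (L-1)/2 and round to the nearest integer level (odd L assumed)."""
    out = []
    for value, L in zip(z, levels):
        half = (L - 1) / 2
        out.append(round(math.tanh(value) * half))
    return out


def fsq_code_index(q, levels):
    """Map per-dimension integer levels in [-(L-1)/2, (L-1)/2] to one codebook index
    (mixed-radix encoding)."""
    index, base = 0, 1
    for value, L in zip(q, levels):
        index += (value + (L - 1) // 2) * base
        base *= L
    return index


def codebook_size(levels):
    """The implicit codebook contains every combination of per-dimension levels."""
    return math.prod(levels)


def codebook_utilization(indices, size):
    """Fraction of codebook entries used at least once, the metric compared in the abstract."""
    return len(set(indices)) / size


# Usage with a hypothetical 3-dimensional, 5-level configuration (125 codes).
q = fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5])
print(q, fsq_code_index(q, [5, 5, 5]))  # → [0, 2, -2] 22
```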
LettuceDetect: A Hallucination Detection Framework for RAG Applications
24 Feb 2025 · Ádám Kovács, Gábor Recski
Paper: https://arxiv.org/pdf/2502.17125v1.pdf
Code: https://github.com/KRLabsOrg/LettuceDetect
Colab: https://colab.research.google.com/drive/1Ubca5aMaBGdHtJ1rpqj3Ke9SLEr-PaDn?usp=sharing
https://t.iss.one/DataScienceT
Retrieval Augmented Generation (#RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect, a framework that addresses two critical limitations of existing hallucination detection methods: (1) the context window constraints of traditional encoder-based methods, and (2) the computational inefficiency of #LLM-based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, allowing unsupported claims to be identified at the token level. Evaluations on the RAGTruth corpus show an F1 score of 79.22% for example-level detection, a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it more practical for real-world RAG applications.
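Token-level classification produces one score per answer token, and those scores still have to be turned into highlighted spans and an example-level verdict. The sketch below shows one plausible post-processing step under stated assumptions: a 0.5 threshold, spans formed from consecutive flagged tokens, and "flag the example if any token is flagged." These choices and the function names are hypothetical, not LettuceDetect's API; the F1 helper is the standard formula.

```python
def hallucinated_spans(tokens, probs, threshold=0.5):
    """Group consecutive answer tokens whose hallucination probability exceeds
    the threshold into (start, end, text) spans; end is exclusive."""
    spans, start = [], None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    if start is not None:
        spans.append((start, len(probs), " ".join(tokens[start:])))
    return spans


def example_is_hallucinated(probs, threshold=0.5):
    """Example-level decision (assumed rule): flag the answer if any token is flagged."""
    return any(p > threshold for p in probs)


def f1(tp, fp, fn):
    """Standard F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Usage: the context says Paris is in France; the answer claims otherwise.
tokens = ["Paris", "is", "in", "Germany"]
probs = [0.1, 0.05, 0.2, 0.95]
print(hallucinated_spans(tokens, probs))  # → [(3, 4, 'Germany')]
```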
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
7 Apr 2025 · Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu
Paper: https://arxiv.org/pdf/2504.08791v1.pdf
Code: https://github.com/lizonghang/prima.cpp
https://t.iss.one/DataScienceT
The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (#LLMs) on home devices. While consumer hardware is getting stronger and model quantization is improving, existing end-side solutions still demand #GPU clusters, large RAM/VRAM, and high bandwidth, far beyond what a common home cluster can handle. This paper introduces prima.cpp, a distributed inference system that runs 70B-scale models on everyday home devices using a mix of CPU/GPU, low RAM/VRAM, Wi-Fi, and cross-platform support. It uses mmap to manage model weights and introduces piped-ring parallelism with prefetching to hide disk loading. By modeling heterogeneity in computation, communication, disk, memory (and its management behavior), and OS, it optimally assigns model layers to each device's #CPU and GPU, further reducing token latency. An elegant algorithm named Halda is proposed to solve this NP-hard assignment problem. We evaluate prima.cpp on a common four-node home cluster. It outperforms llama.cpp, #exo, and #dllama on 30B+ models while keeping memory pressure below 6%. This brings frontier 30B-70B models, such as #Llama 3, #DeepSeek R1, #Qwen 2.5, and #QwQ, to home assistants, making advanced AI truly accessible to individuals. The code is open source and available at https://github.com/Lizonghang/prima.cpp.
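To make the layer-assignment problem concrete, here is a deliberately simple greedy heuristic that is NOT Halda: each layer goes to the device whose projected finish time stays lowest, subject to a per-device memory cap. It models only compute speed and memory, ignoring the communication, disk, and OS effects the abstract says prima.cpp accounts for. All names and the layers-per-second speed model are hypothetical.

```python
def assign_layers(num_layers, speeds, mem_limits):
    """Hypothetical greedy heuristic (not Halda).

    speeds: layers processed per second on each device.
    mem_limits: maximum number of layers each device can hold.
    Returns the number of layers assigned to each device.
    """
    counts = [0] * len(speeds)
    for _ in range(num_layers):
        # Devices that still have memory for one more layer.
        candidates = [d for d in range(len(speeds)) if counts[d] < mem_limits[d]]
        if not candidates:
            raise ValueError("not enough total memory for all layers")
        # Pick the device whose compute time would be smallest after adding the layer.
        best = min(candidates, key=lambda d: (counts[d] + 1) / speeds[d])
        counts[best] += 1
    return counts


# Usage: a device twice as fast receives roughly twice as many layers.
print(assign_layers(6, [2.0, 1.0], [10, 10]))  # → [4, 2]
```

A greedy rule like this can be far from optimal once memory caps and communication costs interact, which is why the paper treats the full problem as NP-hard and proposes a dedicated solver.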
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
22 Nov 2024 · Xuewu Lin, Tianwei Lin, Lichao Huang, Hongyu Xie, Zhizhong Su
Paper: https://arxiv.org/pdf/2411.14869v2.pdf
Code: https://github.com/HorizonRobotics/BIP3D
HF: https://huggingface.co/spaces/AGC2024/visual-grounding-2024
Dataset: 10,000 People - Human Pose Recognition Data
https://t.iss.one/DataScienceT
In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point clouds, which, despite offering precise geometric information, still constrain perception performance due to inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, #BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current state-of-the-art results on the EmbodiedScan benchmark, achieving improvements of 5.69% in the 3D detection task and 15.25% in the 3D visual grounding task.
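"Explicit 3D position encoding" of image features generally means attaching a 3D location to each pixel feature. One generic recipe back-projects the pixel through a pinhole camera model at a given depth and then encodes the resulting point with sinusoids. The sketch below illustrates that generic recipe under stated assumptions (a known depth, a pinhole intrinsics model, a NeRF-style frequency schedule). It is not BIP3D's actual encoding, which this post does not describe.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole camera: pixel (u, v) at the given depth -> 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def sinusoidal_encoding(point, num_freqs=4):
    """Encode each coordinate with sin/cos at frequencies 1, 2, 4, ... (2**k)."""
    enc = []
    for coord in point:
        for k in range(num_freqs):
            freq = 2.0 ** k
            enc.append(math.sin(freq * coord))
            enc.append(math.cos(freq * coord))
    return enc


# Usage: a pixel 50 px right of the principal point, 2 m away, focal length 500 px.
p = backproject(370, 240, 2.0, 500.0, 500.0, 320.0, 240.0)
print(p)  # → (0.2, 0.0, 2.0)
print(len(sinusoidal_encoding(p)))  # → 24
```

The encoding vector would then typically be projected and added to the image feature at that pixel, so downstream layers can reason about where in 3D each feature lies.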
Forwarded from Python | Machine Learning | Coding | R
LLM Engineer's Handbook (2024)
Unlock the Future of AI with the LLM Engineer's Handbook
Step into the world of Large Language Models (LLMs) with this comprehensive guide that takes you from foundational concepts to deploying advanced applications using LLMOps best practices. Whether you're an AI engineer, NLP professional, or LLM enthusiast, this book offers practical insights into designing, training, and deploying LLMs in real-world scenarios.
Why Choose the LLM Engineer's Handbook?
Comprehensive Coverage: Learn about data engineering, supervised fine-tuning, and deployment strategies.
Hands-On Approach: Implement MLOps components through practical examples, including building an LLM-powered twin that's cost-effective, scalable, and modular.
Cutting-Edge Techniques: Explore inference optimization, preference alignment, and real-time data processing to apply LLMs effectively in your projects.
Real-World Applications: Move beyond isolated Jupyter notebooks and focus on building production-grade end-to-end LLM systems.
Limited-Time Offer
Originally priced at $55, the LLM Engineer's Handbook is now available for just $25, a 55% discount! This special offer is available for a limited quantity, so act fast to secure your copy.
Who Should Read This Book?
This handbook is ideal for AI engineers, NLP professionals, and LLM engineers looking to deepen their understanding of LLMs. A basic knowledge of LLMs, Python, and AWS is recommended. Whether you're new to AI or seeking to enhance your skills, this book provides comprehensive guidance on implementing LLMs in real-world scenarios.
Don't miss this opportunity to advance your expertise in LLM engineering. Secure your discounted copy today and take the next step in your AI journey!
Buy book: https://www.patreon.com/DataScienceBooks/shop/llm-engineers-handbook-2024-1582908
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
GitHub: https://github.com/cdjkim/respec
Paper: https://arxiv.org/abs/2504.14875v1
Dataset: https://paperswithcode.com/task/informativeness
Forwarded from Data Science Premium (Books & Courses)
Join our WhatsApp channel
Tell your friends
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
WhatsApp.com
Python | Machine Learning | Data Science | WhatsApp Channel
Python | Machine Learning | Data Science WhatsApp Channel. Welcome to our official WhatsApp Channel: your daily dose of AI, Python, and cutting-edge technology!
Here, we share:
Python tutorials and ready-to-use code snippets
AI & machine learning tips…
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
6 May 2025 · Andrew Zhao, Yiran Wu, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang
Paper: https://arxiv.org/pdf/2505.03335v2.pdf
Code: https://arxiv.org/pdf/2505.03335v2.pdf
https://t.iss.one/DataScienceT
Reinforcement learning with verifiable rewards (#RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where #AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (#AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor both to validate proposed code reasoning tasks and to verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall #SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
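The key ingredient is that a code executor, rather than a human label, decides whether an answer is correct. Below is a minimal sketch of such a verifiable reward for one task shape (given a program and an input, predict the output), assuming a binary reward and a zero reward for proposals that fail to run. The function names and reward values are hypothetical, not AZR's implementation, and `exec` here is not a sandbox: untrusted model-generated code must run in an isolated environment.

```python
def run_program(source, func_name, arg):
    """Execute a proposed program in a fresh namespace and call func_name(arg).
    WARNING: exec provides no isolation; real systems need a proper sandbox."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](arg)


def verifiable_reward(source, func_name, arg, predicted_output):
    """Binary reward: 1.0 if the solver's predicted output matches what the
    executor computes, 0.0 otherwise (including programs that fail to run)."""
    try:
        expected = run_program(source, func_name, arg)
    except Exception:
        return 0.0  # an invalid proposal cannot be verified, so it earns nothing here
    return 1.0 if predicted_output == expected else 0.0


# Usage: a proposed task "what does f(3) return?" checked against two answers.
PROGRAM = "def f(x):\n    return x * 2\n"
print(verifiable_reward(PROGRAM, "f", 3, 6), verifiable_reward(PROGRAM, "f", 3, 5))  # → 1.0 0.0
```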
FastVLM: Efficient Vision Encoding for Vision Language Models
17 Dec 2024 · Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari
Paper: https://arxiv.org/pdf/2412.13303v1.pdf
Code: https://github.com/apple/ml-fastvlm
Datasets: GQA, TextVQA, ScienceQA
https://t.iss.one/DataScienceT
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (#VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as #ViTs become inefficient at high resolutions due to the large number of tokens and the high encoding latency caused by stacked self-attention layers. At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency. Based on a comprehensive efficiency analysis of the interplay between image resolution, vision latency, token count, and #LLM size, we introduce #FastVLM, a model that achieves an optimized trade-off between latency, model size, and accuracy. FastVLM incorporates #FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images. Unlike previous methods, FastVLM achieves the optimal balance between visual token count and image resolution solely by scaling the input image, eliminating the need for additional token pruning and simplifying the model design. In the LLaVA-1.5 setup, FastVLM achieves a 3.2× improvement in time-to-first-token (TTFT) while maintaining similar performance on VLM benchmarks compared to prior works. Compared to LLaVA-OneVision at the highest resolution (1152×1152), FastVLM achieves comparable performance on key benchmarks like SeedBench and MMMU, using the same 0.5B LLM, but with 85× faster TTFT and a vision encoder that is 3.4× smaller.
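The trade-off the abstract describes comes down to simple arithmetic: a square image downsampled by a stride s in each dimension yields (resolution/s)² visual tokens, and each full self-attention layer scores every pair of tokens. The sketch below makes that concrete. The 336 px / 14 px case matches a standard ViT-L/14 setup; the 1024 px comparison with strides 16 and 64 is purely illustrative and does not claim FastViTHD's actual architecture or stride.

```python
def visual_token_count(resolution, output_stride):
    """Visual tokens for a square image when the encoder downsamples each spatial
    dimension by output_stride (e.g. the patch size of a plain ViT)."""
    if resolution % output_stride:
        raise ValueError("resolution must be divisible by the output stride")
    side = resolution // output_stride
    return side * side


def attention_pairs(num_tokens):
    """Query-key pairs scored by one full self-attention layer: grows quadratically."""
    return num_tokens * num_tokens


# A ViT with 14-pixel patches at 336 px yields a 24 x 24 grid.
print(visual_token_count(336, 14))  # → 576
# Hypothetical comparison at 1024 px: stride 16 vs. a more aggressive stride 64.
vit_tokens = visual_token_count(1024, 16)     # 4096 tokens
hybrid_tokens = visual_token_count(1024, 64)  # 256 tokens
print(attention_pairs(vit_tokens) // attention_pairs(hybrid_tokens))  # → 256
```

Fewer output tokens help twice: the encoder's own attention is cheaper, and the LLM has a shorter prefix to process before emitting its first token, which is what TTFT measures.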
Forwarded from Python | Machine Learning | Coding | R
Your balance has been credited with $4,000; the owner of the channel wants to contact you!
Dear subscriber, thank you very much for supporting our channel. As a token of our gratitude, we would like to give you free access to Lisa's investor channel, which can help you start earning today.
t.iss.one/Lisainvestor
Be sure to take advantage of our gift: admission is free. Don't miss this opportunity to change your life for the better.
You can follow the link:
https://t.iss.one/+0DQSCADFTUA3N2Qx
Forwarded from Python | Machine Learning | Coding | R
Join our WhatsApp channel
https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A