ML Research Hub

Forwarded from Machine Learning with Python

𝐕𝐢𝐬𝐮𝐚𝐥 𝐛𝐥𝐨𝐠 on Vision Transformers is live.
https://vizuaranewsletter.com/p/vision-transformers?r=5b5pyd&utm_campaign=post&utm_medium=web

Learn how ViT works from the ground up, and fine-tune one on a real classification dataset.

CNNs process images through small sliding filters. Each filter only sees a tiny local region, and the model has to stack many layers before distant parts of an image can even talk to each other.
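To put a number on "stack many layers", here is a minimal sketch of the receptive-field arithmetic. It assumes plain 3x3 convolutions with stride 1 and no pooling or dilation, which is a simplification: real CNNs downsample and reach a wide view in far fewer layers. The function names are illustrative only.

```python
import math

def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_span(image_size: int, kernel_size: int = 3) -> int:
    """Fewest stacked stride-1 layers whose receptive field covers image_size pixels."""
    return math.ceil((image_size - 1) / (kernel_size - 1))

print(receptive_field(1))    # 3
print(receptive_field(10))   # 21
print(layers_to_span(224))   # 112
```

Under these assumptions, one output unit only sees the whole width of a 224-pixel image after 112 layers.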

Vision Transformers threw that whole approach out.

ViT chops an image into patches, treats each patch like a token, and runs self-attention across the full sequence.
Every patch can attend to every other patch from the very first layer. No stacking required.
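The patch step can be sketched in plain Python. This is a toy illustration, not the blog's code: it assumes a single-channel image stored as a list of rows, and the helper name split_into_patches is made up for this example.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, one flat list per patch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# Toy 4x4 image with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = split_into_patches(image, patch_size=2)
print(len(tokens))   # 4
print(tokens[0])     # [0, 1, 4, 5]

# Token count for a 224x224 input cut into 16x16 patches.
print((224 // 16) ** 2)  # 196
```

Each flat list plays the role of one token, so a 224x224 image with 16x16 patches becomes a sequence of 196 tokens.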

That global view from layer one, combined with pre-training on very large datasets, is what let ViT match or surpass CNNs on large-scale benchmarks.

𝐖𝐡𝐚𝐭 𝐭𝐡𝐞 𝐛𝐥𝐨𝐠 𝐜𝐨𝐯𝐞𝐫𝐬:

- Introduction to Vision Transformers and comparison with CNNs
- Adapting transformers to images: patch embeddings and flattening
- Positional encodings in Vision Transformers
- Encoder-only structure for classification
- Benefits and drawbacks of ViT
- Real-world applications of Vision Transformers
- Hands-on: fine-tuning ViT for image classification

The images below show:

Self-attention connects every patch to every other patch at once. Convolution only sees a small local window. That's why ViT can capture long-range structure that CNNs tend to miss, like the optical illusion painting where distant patches form a hidden face.

The architecture is simple. Split the image into patches, flatten and linearly project them into embeddings (like words in a sentence), add positional encodings, run the sequence through a Transformer encoder, and let the class token collect information from all patches for the final prediction. Patch in, class out.
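A minimal sketch of how the encoder input is assembled, assuming the patches have already been projected to embeddings. The tiny 2-dimensional vectors and the helper name build_encoder_input are illustrative; in a real model the class token and position embeddings are learned parameters.

```python
def build_encoder_input(patch_embeddings, cls_token, position_embeddings):
    """Prepend the class token, then add one position embedding per token."""
    sequence = [cls_token] + patch_embeddings
    if len(sequence) != len(position_embeddings):
        raise ValueError("need one position embedding per token (patches + 1)")
    return [[x + p for x, p in zip(token, pos)]
            for token, pos in zip(sequence, position_embeddings)]

patch_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 patches, dim 2
cls_token = [0.0, 0.0]                                    # placeholder values
position_embeddings = [[0.5, 0.5], [1.0, 1.0], [1.5, 1.5], [2.0, 2.0]]

sequence = build_encoder_input(patch_embeddings, cls_token, position_embeddings)
print(len(sequence))   # 4 tokens: 1 class token + 3 patches
print(sequence[0])     # [0.5, 0.5], the class-token slot
```

After the encoder runs, the classifier reads only position 0 of the output sequence, which is why the class token has to gather information from every patch.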

Inside attention: each patch (query) compares itself to all patches (keys), softmax turns the scores into attention weights, and the weighted sum of values produces a new representation aware of the full image. The blog also visualizes what the CLS token actually attends to through attention heatmaps.
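The query, key, softmax, and weighted-sum steps can be written out as a single-head sketch in plain Python. As a simplification, the token vectors are used directly as queries, keys, and values; a real ViT applies learned projection matrices first and runs several heads in parallel.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) weighted sum of values."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "patch" tokens of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
print(len(outputs), len(outputs[0]))   # 3 2
print(round(sum(weights[0]), 6))       # 1.0
```

Each row of weights is one token's attention distribution over all tokens; the row belonging to the class token is what an attention heatmap would plot.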

The second half of the blog is hands-on code. I fine-tuned Google's ViT-Base (86M params) on the Oxford-IIIT Pet dataset: 37 breeds, ~7,400 images.
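For a sense of scale, here is a back-of-the-envelope sketch of the new classification head. It assumes a single linear layer on top of ViT-Base's 768-dimensional class-token output; the blog's actual head may differ, and the function name is illustrative.

```python
def linear_head_params(hidden_size: int, num_classes: int) -> int:
    """Weights plus biases of one linear classification layer."""
    return hidden_size * num_classes + num_classes

head = linear_head_params(hidden_size=768, num_classes=37)
print(head)                          # 28453
print(f"{head / 86_000_000:.4%}")    # share of the ~86M total parameters
```

Under that assumption, the part of the model that is new for the 37 pet breeds is a tiny fraction of the network; almost everything else starts from pre-trained weights.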

𝐁𝐥𝐨𝐠 𝐋𝐢𝐧𝐤
https://vizuaranewsletter.com/p/vision-transformers?r=5b5pyd&utm_campaign=post&utm_medium=web

𝐒𝐨𝐦𝐞 𝐑𝐞𝐬𝐨𝐮𝐫𝐜𝐞𝐬
ViT paper dissection
https://youtube.com/watch?v=U_sdodhcBC4

Build ViT from Scratch
https://youtube.com/watch?v=ZRo74xnN2SI

Original Paper
https://arxiv.org/abs/2010.11929

https://t.iss.one/CodeProgrammer

68 views, 20:37

ML Research Hub

✨Automatic detection of Gen-AI texts: A comparative framework of neural models

📝 Summary:
This paper compares neural models for detecting AI-generated text. It found that supervised machine learning detectors achieved more stable and robust performance than commercial tools across different languages and domains.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18750
• PDF: https://arxiv.org/pdf/2603.18750
• Project Page: https://huggingface.co/datasets/cristian03/ARTandMH
• Github: https://github.com/cristian03git/DETECTION_GENAI

✨ Datasets citing this paper:
• https://huggingface.co/datasets/cristian03/ARTandMH

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#GenAI #AIDetection #MachineLearning #NeuralNetworks #NLP

128 views, 20:45

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

📝 Summary:
This paper shifts VLM image tampering detection from coarse object masks to pixel-level analysis with semantic understanding. It introduces a new taxonomy, benchmark, and metrics to evaluate both localization accuracy and the meaning of image modifications. This offers a more rigorous standard fo...

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20193
• PDF: https://arxiv.org/pdf/2603.20193
• Github: https://github.com/VILA-Lab/PIXAR

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VLM #ImageTampering #DeepfakeDetection #ComputerVision #AIResearch

148 views, 21:45


ML Research Hub

Forwarded from Machine Learning with Python

Follow the Machine Learning with Python channel on WhatsApp: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A

45 views, 23:28

ML Research Hub

✨LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

📝 Summary:
LongCat-Flash-Prover is a 560B MoE model advancing Lean4 formal reasoning using agentic tool integration. It employs a hybrid framework and hierarchical policy optimization for stable training. It achieves state-of-the-art results, including 97.1% on MiniF2F-Test and improved performance on Prove...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21065
• PDF: https://arxiv.org/pdf/2603.21065
• Project Page: https://github.com/meituan-longcat/LongCat-Flash-Prover
• Github: https://github.com/meituan-longcat/LongCat-Flash-Prover

🔹 Models citing this paper:
• https://huggingface.co/meituan-longcat/LongCat-Flash-Prover

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

104 views, 03:00


ML Research Hub

✨mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT

📝 Summary:
Multi-task supervised fine-tuning with heterogeneous learning dynamics benefits from an iterative overfitting-aware search algorithm that improves performance across diverse datasets and compute budge...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21606
• PDF: https://arxiv.org/pdf/2603.21606
• Github: https://github.com/reiss-koh/msft

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

81 views, 03:00


ML Research Hub

✨ToolRosetta: Bridging Open-Source Repositories and Large Language Model Agents through Automated Tool Standardization

📝 Summary:
Reusing and invoking existing code remains costly and unreliable, as most practical tools are embedded in heterogene...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09290
• PDF: https://arxiv.org/pdf/2603.09290
• Project Page: https://sdiaa.tech/projects

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLMAgents #OpenSource #ToolStandardization #AIResearch #DataScience

80 views, 03:00


ML Research Hub

79 views, 03:00

ML Research Hub

✨PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

📝 Summary:
PowerInfer, a high-speed LLM inference engine for personal computers, enhances efficiency using hotspot neuron analysis, GPU-CPU hybrid computation, adaptive predictors, and neuron-aware sparse operat...

🔹 Publication Date: Published on Dec 16, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2312.12456
• PDF: https://arxiv.org/pdf/2312.12456
• Github: https://github.com/sjtu-ipads/powerinfer

🔹 Models citing this paper:
• https://huggingface.co/SparseLLM/prosparse-llama-2-7b
• https://huggingface.co/openbmb/MiniCPM-S-1B-sft
• https://huggingface.co/openbmb/MiniCPM-S-1B-sft-gguf

✨ Spaces citing this paper:
• https://huggingface.co/spaces/FallnAI/Quantize-HF-Models
• https://huggingface.co/spaces/openfree/LLM_Quantization
• https://huggingface.co/spaces/seawolf2357/LLM_Quantization

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

arXiv.org

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key principle underlying the...

114 views, 03:01


ML Research Hub

✨VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

📝 Summary:
VideoDetective framework improves long video understanding by integrating query-to-segment relevance and inter-segment affinity through visual-temporal graphs and hypothesis verification loops. AI-gen...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22285
• PDF: https://arxiv.org/pdf/2603.22285
• Project Page: https://videodetective.github.io/
• Github: https://github.com/yangruoliu/VideoDetective

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

90 views, 04:01


ML Research Hub

✨Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

📝 Summary:
daVinci-MagiHuman is an open-source audio-video generative model using a single-stream Transformer for synchronized content from text. It achieves high-quality, human-centric generation with efficient inference and strong evaluation results against leading models.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21986
• PDF: https://arxiv.org/pdf/2603.21986
• Project Page: https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman
• Github: https://github.com/GAIR-NLP/daVinci-MagiHuman

🔹 Models citing this paper:
• https://huggingface.co/GAIR/daVinci-MagiHuman

✨ Spaces citing this paper:
• https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#GenerativeAI #AudioVideoAI #FoundationModels #DeepLearning #AIResearch

66 views, 04:01


ML Research Hub

✨On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

📝 Summary:
Reinforcement learning with verifiable rewards improves language model reasoning by focusing on the direction of parameter updates rather than their magnitude, enabling better test-time extrapolation ...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22117
• PDF: https://arxiv.org/pdf/2603.22117
• Project Page: https://qwen-pilot.notion.site/rlvr-direction
• Github: https://github.com/Hesse73/RLVR-Directions

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

81 views, 04:01


ML Research Hub

✨PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost

📝 Summary:
PivotRL is a novel framework that combines supervised fine-tuning efficiency with reinforcement learning generalization by using local rollouts and functional-equivalent action rewards to achieve bett...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21383
• PDF: https://arxiv.org/pdf/2603.21383

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

71 views, 04:01


ML Research Hub

✨WorldCache: Content-Aware Caching for Accelerated Video World Models

📝 Summary:
WorldCache improves diffusion transformer inference by adaptively reusing features through motion-adaptive thresholds and saliency-weighted drift estimation, achieving faster processing with minimal q...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22286
• PDF: https://arxiv.org/pdf/2603.22286
• Project Page: https://umair1221.github.io/World-Cache/
• Github: https://github.com/umair1221/WorldCache

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

67 views, 04:01


ML Research Hub

✨MemDLM: Memory-Enhanced DLM Training

📝 Summary:
MemDLM addresses the train-inference mismatch in diffusion language models by incorporating a bi-level optimization framework with parametric memory that enhances both training efficiency and inferenc...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22241
• PDF: https://arxiv.org/pdf/2603.22241
• Project Page: https://github.com/JarvisPei/MemDLM
• Github: https://github.com/JarvisPei/MemDLM

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

92 views, 04:02


ML Research Hub

✨Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation

📝 Summary:
Perceptio enhances vision-language models with explicit spatial reasoning through integrated semantic segmentation and depth tokens generated via VQ-VAE distillation and multi-task learning. AI-genera...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18795
• PDF: https://arxiv.org/pdf/2603.18795

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

61 views, 04:02


ML Research Hub

✨AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference

📝 Summary:
AnimalCLAP is a taxonomy-aware language-audio framework that uses hierarchical biological information to improve species classification from vocalizations, achieving better performance than CLAP by le...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22053
• PDF: https://arxiv.org/pdf/2603.22053
• Project Page: https://dahlian00.github.io/AnimalCLAP_Page/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

97 views, 04:02


ML Research Hub

✨Effective Strategies for Asynchronous Software Engineering Agents

📝 Summary:
Multi-agent collaboration for software engineering tasks faces challenges in coordination and synchronization, which are addressed through a structured paradigm using centralized delegation, asynchron...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21489
• PDF: https://arxiv.org/pdf/2603.21489
• Github: https://github.com/JiayiGeng/CAID

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

94 views, 04:02


ML Research Hub

✨Agentic AI and the next intelligence explosion

📝 Summary:
The "AI singularity" is often miscast as a monolithic, godlike mind. Evolution suggests a different path: intelligen...

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20639
• PDF: https://arxiv.org/pdf/2603.20639

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

93 views, 04:02


ML Research Hub

✨Understanding Behavior Cloning with Action Quantization

📝 Summary:
Behavior cloning with quantized actions in autoregressive models achieves optimal sample complexity under stability and smoothness conditions, with quantization error affecting horizon-dependent perfo...

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20538
• PDF: https://arxiv.org/pdf/2603.20538

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

130 views, 04:02


ML Research Hub

✨Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

📝 Summary:
High-rank DoRA is improved by addressing its memory and speed limitations. The paper introduces a factored norm decomposition and fused Triton kernels. This makes DoRA faster for inference and training, reduces memory usage, and maintains high accuracy across vision-language models.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22276
• PDF: https://arxiv.org/pdf/2603.22276
• Github: https://github.com/sockeye44/dorafactors

✨ Datasets citing this paper:
• https://huggingface.co/datasets/eyes-ml/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking-QR-max4096

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

115 views, 05:02
