ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
PixelSmile: Toward Fine-Grained Facial Expression Editing

📝 Summary:
PixelSmile is a diffusion framework for fine-grained facial expression editing. It achieves better disentanglement and identity preservation through symmetric joint training and contrastive learning. This enables precise, stable, and continuous control for expression editing.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25728
• PDF: https://arxiv.org/pdf/2603.25728
• Project Page: https://ammmob.github.io/PixelSmile/
• Github: https://github.com/Ammmob/PixelSmile

🔹 Models citing this paper:
https://huggingface.co/PixelSmile/PixelSmile

Datasets citing this paper:
https://huggingface.co/datasets/PixelSmile/FFE-Bench

Spaces citing this paper:
https://huggingface.co/spaces/Pr0f3ssi0n4ln00b/Qwen-Image-Edit-Rapid-AIO-Loras-Experimental
https://huggingface.co/spaces/PixelSmile/PixelSmile-Demo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#FacialExpressionEditing #DiffusionModels #AI #ComputerVision #DeepLearning
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

📝 Summary:
A large-scale dataset and open-source model are developed to improve image restoration performance and close the gap with closed-source alternatives, with a dedicated benchmark for real-world degradat...

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25502
• PDF: https://arxiv.org/pdf/2603.25502
• Project Page: https://yfyang007.github.io/RealRestorer/
• Github: https://github.com/yfyang007/RealRestorer

🔹 Models citing this paper:
https://huggingface.co/RealRestorer/RealRestorer
https://huggingface.co/RealRestorer/RealRestorer_degradation_models

Datasets citing this paper:
https://huggingface.co/datasets/RealRestorer/RealIR-Bench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Representation Alignment for Just Image Transformers is not Easier than You Think

📝 Summary:
Representation alignment fails for pixel-space diffusion transformers due to information asymmetry, but PixelREPA addresses this by transforming alignment targets and using masked transformer adapters...

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14366
• PDF: https://arxiv.org/pdf/2603.14366
• Github: https://github.com/kaist-cvml/PixelREPA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

📝 Summary:
S2D2 is a training-free self-speculative decoding framework for block-diffusion LLMs. It improves the accuracy-speed tradeoff by using the same model as both a parallel drafter and an autoregressive verifier via a speculative verification step. This achieves speedups of up to 4.7x and higher accuracy...
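
A minimal sketch of the draft-then-verify idea behind speculative decoding, not the S2D2 implementation. It assumes a greedy acceptance rule (keep drafted tokens until the verifier disagrees, then substitute the verifier's token); `verify_draft` and `verifier_next` are hypothetical names, and a real system would score all drafted positions in one forward pass rather than in a Python loop.

```python
def verify_draft(prefix, draft, verifier_next):
    """Accept the longest run of drafted tokens the verifier agrees with.

    prefix: tokens already committed.
    draft: tokens proposed in parallel by the drafter.
    verifier_next: hypothetical callable mapping a token list to the
        verifier's greedy next token.
    """
    accepted = []
    for token in draft:
        expected = verifier_next(prefix + accepted)
        if token != expected:
            # First disagreement: keep the verifier's token and stop.
            accepted.append(expected)
            break
        accepted.append(token)
    return accepted


# Toy verifier (assumption): the next token is the sequence length mod 3.
toy_verifier = lambda seq: len(seq) % 3
print(verify_draft([0, 1], [2, 0, 5, 1], toy_verifier))
```

In this toy run the first two drafted tokens are accepted and the third is replaced by the verifier's choice, so one verification step commits three tokens instead of one.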

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25702
• PDF: https://arxiv.org/pdf/2603.25702
• Github: https://github.com/phymhan/S2D2

==================================

#LLM #DiffusionModels #Decoding #AI #MachineLearning
Electrostatic Photoluminescence Tuning in All-Solid-State Perovskite Transistors

📝 Summary:
This paper demonstrates an all-solid-state perovskite transistor that electrostatically controls photoluminescence intensity. By modulating charge recombination, it achieves high quantum efficiencies and tunable light emission. This expands perovskite applications in photonics.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25718
• PDF: https://arxiv.org/pdf/2603.25718
• Project Page: https://kj-chen666.github.io/Hybrid-Memory-in-Video-World-Models/
• Github: https://github.com/H-EmbodVis/HyDRA

==================================

#Perovskites #Photoluminescence #Optoelectronics #Transistors #Photonics
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

📝 Summary:
On-policy distillation for LLMs suffers from fragile token-level signals and unreliable teacher guidance. This paper introduces teacher top-K local support matching with truncated reverse-KL, top-p sampling, and special-token masking to achieve stable optimization and improved performance.
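
To make the top-K idea concrete, here is a rough sketch of a reverse KL divergence restricted to the teacher's top-K tokens at a single position. The renormalization over the support and the function name `topk_reverse_kl` are assumptions for illustration; the paper's exact loss may differ.

```python
import math


def topk_reverse_kl(student_logits, teacher_logits, k):
    """Reverse KL(student || teacher) over the teacher's top-k tokens only."""
    # Indices of the k largest teacher logits form the local support.
    support = sorted(
        range(len(teacher_logits)),
        key=lambda i: teacher_logits[i],
        reverse=True,
    )[:k]

    def renormalized(logits):
        # Softmax over the support only (max-subtracted for stability).
        m = max(logits[i] for i in support)
        exps = [math.exp(logits[i] - m) for i in support]
        z = sum(exps)
        return [e / z for e in exps]

    q = renormalized(student_logits)  # student distribution on the support
    p = renormalized(teacher_logits)  # teacher distribution on the support
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Illustrative logits (made up): the student disagrees with the teacher.
print(topk_reverse_kl([3.0, 0.0, 0.0], [0.0, 3.0, 1.0], k=2))
```

Tokens outside the teacher's top-K contribute nothing to this quantity, which is the sense in which the signal is truncated.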

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25562
• PDF: https://arxiv.org/pdf/2603.25562
• Project Page: https://www.notion.so/yuqianfu/Revisiting-On-Policy-Distillation-Empirical-Failure-Modes-and-Simple-Fixes-31dd5cc40dd181f89eead3de7181df1d
• Github: https://github.com/hhh675597/revisiting_opd

==================================

#OnPolicyDistillation #LLMs #MachineLearning #DeepLearning #NLP
MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

📝 Summary:
MemMA is a multi-agent framework that coordinates the memory cycle in LLM agents. It uses a Meta-Thinker for strategic guidance and in-situ self-evolving repair for memory construction and retrieval. MemMA consistently outperforms existing baselines.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18718
• PDF: https://arxiv.org/pdf/2603.18718
• Github: https://github.com/ventr1c/memma

==================================

#LLM #MultiAgentSystems #AIMemory #AIResearch #ArtificialIntelligence
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

📝 Summary:
Calibri enhances Diffusion Transformers by adding a single learned scaling parameter to improve generative quality. This parameter-efficient method, optimizing only ~100 parameters, reduces inference steps across various text-to-image models while maintaining high-quality outputs.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24800
• PDF: https://arxiv.org/pdf/2603.24800
• Project Page: https://v-gen-ai.github.io/Calibri-page/
• Github: https://github.com/v-gen-ai/Calibri

🔹 Models citing this paper:
https://huggingface.co/v-gen-ai/flux-calibri-gates
https://huggingface.co/v-gen-ai/qwen-calibri

==================================

#DiffusionModels #GenerativeAI #AIResearch #MachineLearning #DeepLearning
AVControl: Efficient Framework for Training Audio-Visual Controls

📝 Summary:
AVControl efficiently enables modular audio-visual generation by training diverse controls as separate LoRA adapters on a parallel canvas in LTX-2. It achieves superior performance on various tasks including depth and pose guidance, requiring minimal computational resources.
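
For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update on a single linear layer: the frozen weight W is left untouched and a trainable product B·A, scaled by alpha/rank, is added to its output. This is the generic LoRA formulation, not AVControl's code; all names and values are illustrative.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(w, a, b, x, alpha):
    """Compute W x + (alpha / rank) * B (A x) for one input vector x.

    w: frozen weight, shape (out, in)
    a: trainable down-projection, shape (rank, in)
    b: trainable up-projection, shape (out, rank)
    """
    rank = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


# Rank-1 adapter on a 2x2 identity layer (illustrative values).
print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [2]], [1, 2], alpha=2))
```

Because each adapter is just its own (A, B) pair, separate controls can be trained and swapped independently while the base weights stay shared.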

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24793
• PDF: https://arxiv.org/pdf/2603.24793
• Project Page: https://matanby.github.io/AVControl/

==================================

#AudioVisualAI #GenerativeAI #LoRA #EfficientAI #DeepLearning
PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders

📝 Summary:
PMT introduces a Plain Mask Decoder for fast image and video segmentation using frozen Vision Foundation Model encoders. This preserves VFM multi-task sharing, achieving competitive accuracy and significant speed improvements over prior methods.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25398
• PDF: https://arxiv.org/pdf/2603.25398
• Github: https://github.com/tue-mps/pmt

==================================

#ImageSegmentation #VideoSegmentation #Transformers #ComputerVision #DeepLearning
IQuest-Coder-V1 Technical Report

📝 Summary:
The IQuest-Coder-V1 series presents new code LLMs using a multi-stage training paradigm to capture dynamic software logic. This approach achieves state-of-the-art performance in agentic software engineering and competitive programming tasks. The Loop variant also optimizes deployment efficiency.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16733
• PDF: https://arxiv.org/pdf/2603.16733
• Project Page: https://iquestlab.github.io/release-1.0-2603/index.html
• Github: https://github.com/IQuestLab/IQuest-Coder-V1

==================================

#CodeLLM #SoftwareEngineering #LargeLanguageModels #AIResearch #MachineLearning
Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

📝 Summary:
This paper introduces training-free inference-time model steering to enhance Chain-of-Thought reasoning in Large Audio-Language Models. It achieves accuracy gains of up to 4.4% and shows cross-modal transfer, where text-derived steering vectors efficiently guide speech reasoning. This positions mode...
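
One common way to build a steering vector is the difference of mean hidden states between two sets of prompts, added back to a hidden state at inference time. The sketch below assumes that recipe; the summary does not say how the paper derives its vectors, and `steering_vector`, `steer`, and `alpha` are hypothetical names.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive, negative):
    """Difference of means between two sets of hidden states (assumed recipe)."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def steer(hidden, vector, alpha=1.0):
    """Nudge one hidden state along the steering direction; no weights change."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy 2-dimensional hidden states (made-up numbers).
with_cot = [[1.0, 2.0], [3.0, 4.0]]
without_cot = [[0.0, 1.0], [2.0, 1.0]]
v = steering_vector(with_cot, without_cot)
print(steer([0.5, 0.5], v, alpha=2.0))
```

Under this recipe, cross-modal transfer would amount to computing `v` from text prompts and applying `steer` to hidden states produced from speech input.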

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14636
• PDF: https://arxiv.org/pdf/2603.14636

==================================

#AI #MachineLearning #LALMs #ChainOfThought #ModelSteering
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

📝 Summary:
CroBo is a visual state representation framework that learns what-is-where composition for robotics. It uses global-to-local reconstruction to encode scene element identities and spatial locations in a compact token. This enables tracking scene dynamics for sequential decision making.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13904
• PDF: https://arxiv.org/pdf/2603.13904
• Project Page: https://seokminlee-chris.github.io/CroBo-ProjectPage/
• Github: https://github.com/SeokminLee-Chris/CroBo

==================================

#Robotics #ComputerVision #SceneUnderstanding #AI #StateRepresentation
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

📝 Summary:
VFIG is a vision-language model that converts raster images into Scalable Vector Graphics (SVG). It employs a 66K dataset and hierarchical training for high-fidelity conversion, outperforming open-source models and matching proprietary ones.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24575
• PDF: https://arxiv.org/pdf/2603.24575
• Project Page: https://vfig-proj.github.io/
• Github: https://github.com/RAIVNLab/VFig

🔹 Models citing this paper:
https://huggingface.co/XunmeiLiu/VFIG-4B

Spaces citing this paper:
https://huggingface.co/spaces/allenai/VFig-Image2SVG-Demo

==================================

#VisionLanguageModels #SVG #VectorGraphics #AI #ComputerVision
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

📝 Summary:
ScratchMath introduces a benchmark for analyzing errors in student handwritten math. It reveals that MLLMs significantly lag behind human experts in visual and logical reasoning, but proprietary models show potential for error explanation.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24961
• PDF: https://arxiv.org/pdf/2603.24961
• Project Page: https://bbsngg.github.io/ScratchMath/
• Github: https://github.com/ai-for-edu/ScratchMath

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

📝 Summary:
Language models typically give one answer, but many tasks have multiple solutions. This paper presents multi-answer RL, allowing LMs to generate multiple plausible answers with confidence in a single pass, improving diversity, accuracy, and computational efficiency.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24844
• PDF: https://arxiv.org/pdf/2603.24844
• Project Page: https://multi-answer-rl.github.io/
• Github: https://github.com/ishapuri/multi_answer_rl

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
AVO: Agentic Variation Operators for Autonomous Evolutionary Search

📝 Summary:
Agentic variation operators enable autonomous discovery of performance-critical micro-architectural optimizations for attention kernels, outperforming state-of-the-art implementations on advanced GPU ...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24517
• PDF: https://arxiv.org/pdf/2603.24517

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching

📝 Summary:
WAFT-Stereo achieves state-of-the-art stereo matching performance by replacing cost volumes with warping techniques, demonstrating superior efficiency and accuracy on major benchmarks.
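
As a rough illustration of what warping means in rectified stereo, the sketch below resamples one row of the right image at x - d for each left-image pixel x with disparity d, using linear interpolation. It is a generic 1-D warp with border clamping (an assumption), not the WAFT-Stereo architecture; `warp_row` is a hypothetical name.

```python
def warp_row(right_row, disparity_row):
    """Warp one right-image row into the left view using per-pixel disparity."""
    width = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        # Left pixel x corresponds to right pixel x - d; clamp to the row.
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(src)
        x1 = min(x0 + 1, width - 1)
        w = src - x0
        # Linear interpolation between the two nearest right-image samples.
        warped.append((1 - w) * right_row[x0] + w * right_row[x1])
    return warped


# Illustrative intensities and disparities.
print(warp_row([10, 20, 30, 40], [0, 1, 1, 1.5]))
```

Comparing the warped row against the left row gives a matching signal for the current disparity estimate without enumerating every candidate disparity, which is what a cost volume would do.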

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24836
• PDF: https://arxiv.org/pdf/2603.24836
• Github: https://github.com/princeton-vl/WAFT-Stereo

🔹 Models citing this paper:
https://huggingface.co/MemorySlices/WAFT-Stereo

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

📝 Summary:
QuantAgent is a multi-agent LLM framework for high-frequency trading. It uses specialized agents for indicators, patterns, trends, and risk to make rapid decisions. It outperforms existing neural and rule-based systems in accuracy and returns.

🔹 Publication Date: Published on Sep 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.09995
• PDF: https://arxiv.org/pdf/2509.09995
• Project Page: https://Y-Research-SBU.github.io/QuantAgent/
• Github: https://github.com/Y-Research-SBU/QuantAgent

==================================

#LLM #MultiAgent #HighFrequencyTrading #FinTech #AlgorithmicTrading