ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SAMTok: Representing Any Mask with Two Words

📝 Summary:
SAMTok enables pixel-wise capabilities in multi-modal LLMs through discrete mask tokenization and standard training methods, achieving state-of-the-art performance on various vision-language tasks.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16093
• PDF: https://arxiv.org/pdf/2601.16093
• Project Page: https://github.com/bytedance/Sa2VA/tree/main/projects/samtok

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

📝 Summary:
Representation Autoencoders (RAEs) demonstrate superior performance over VAEs in large-scale text-to-image generation, showing improved stability, faster convergence, and better quality while enabling...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16208
• PDF: https://arxiv.org/pdf/2601.16208
• Project Page: https://rae-dit.github.io/scale-rae/
• Github: https://github.com/ZitengWangNYU/Scale-RAE

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Wigner's Friend as a Circuit: Inter-Branch Communication Witness Benchmarks on Superconducting Quantum Hardware

📝 Summary:
Implementation and benchmarking of quantum circuits for estimating operational inter-branch communication witnesses on IBM Quantum hardware demonstrates visibility and coherence witness measurements u...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16004
• PDF: https://arxiv.org/pdf/2601.16004
• Github: https://github.com/christopher-altman/ibm-qml-kernel

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

📝 Summary:
A pretrained video model is adapted into a robot policy through single-stage post-training, enabling direct action generation and planning capabilities without architectural modifications.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16163
• PDF: https://arxiv.org/pdf/2601.16163

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

📝 Summary:
HERMES enables real-time streaming video understanding by reusing a compact KV cache as hierarchical memory. It provides 10x faster response times and superior accuracy, even with greatly reduced video token input, improving efficiency in resource-constrained settings.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14724
• PDF: https://arxiv.org/pdf/2601.14724
• Project Page: https://hermes-streaming.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VIOLA: Towards Video In-Context Learning with Minimal Annotations

📝 Summary:
VIOLA enables effective multimodal large language model adaptation in low-resource video domains using minimal expert annotations and abundant unlabeled data. It uses density-uncertainty sampling and confidence-aware retrieval to maximize efficiency and leverage unlabeled data, significantly outp...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15549
• PDF: https://arxiv.org/pdf/2601.15549

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
360Anything: Geometry-Free Lifting of Images and Videos to 360°

📝 Summary:
360Anything is a geometry-free framework using diffusion transformers to lift perspective images and videos to 360 panoramas without camera metadata. It achieves state-of-the-art results and uses circular latent encoding to eliminate seam artifacts.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16192
• PDF: https://arxiv.org/pdf/2601.16192
• Project Page: https://360anything.github.io/

==================================

#ComputerVision #DiffusionModels #360Photography #ImageProcessing #DeepLearning
Numba-Accelerated 2D Diffusion-Limited Aggregation: Implementation and Fractal Characterization

📝 Summary:
This paper details a Numba-accelerated Python framework for 2D DLA simulations. It confirms a fractal dimension of 1.71 for dilute regimes but reveals a crossover to 1.87 compact growth in high-density environments. This provides an open-source testbed.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15440
• PDF: https://arxiv.org/pdf/2601.15440
• Project Page: https://pypi.org/project/dla-ideal-solver/
• Github: https://github.com/sandyherho/dla-ideal-solver

==================================

#DLA #Fractals #ScientificComputing #Python #Simulations
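
The fractal dimensions quoted in the DLA summary above (1.71 and 1.87) describe how the number of occupied sites scales with length. As a rough illustration only, here is a minimal box-counting sketch in plain Python. It is not the paper's code and does not use Numba or the dla-ideal-solver package; the function names `box_count` and `box_counting_dimension` are hypothetical, and the paper may estimate the dimension differently.

```python
import math


def box_count(points, box_size):
    """Count the distinct box_size x box_size boxes that contain a point."""
    return len({(x // box_size, y // box_size) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(s) against log(1/s).

    points    -- iterable of integer (x, y) lattice sites (e.g. cluster cells)
    box_sizes -- list of integer box edge lengths to probe
    """
    points = list(points)
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes whose dimension is known in advance:
# a straight line should come out close to 1, a filled square close to 2.
sizes = [1, 2, 4, 8, 16]
line = [(x, 0) for x in range(64)]
square = [(x, y) for x in range(64) for y in range(64)]
print(box_counting_dimension(line, sizes))
print(box_counting_dimension(square, sizes))
```

Feeding the occupied lattice sites of a simulated DLA cluster into `box_counting_dimension` would give an estimate that falls between these two extremes.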
VideoMaMa: Mask-Guided Video Matting via Generative Prior

📝 Summary:
VideoMaMa uses pretrained video diffusion models to convert coarse masks into accurate alpha mattes, achieving zero-shot generalization. This enabled a scalable pseudo-labeling pipeline to create the large MA-V dataset, significantly improving real-world video matting performance.

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.14255
• PDF: https://arxiv.org/pdf/2601.14255
• Project Page: https://cvlab-kaist.github.io/VideoMaMa/

==================================

#VideoMatting #ComputerVision #DeepLearning #DiffusionModels #AIResearch
Towards Automated Kernel Generation in the Era of LLMs

📝 Summary:
This survey explores how large language models and agent systems are automating kernel generation and optimization, a critical yet non-scalable process for AI systems. It provides a structured overview of existing approaches, datasets, and benchmarks, aiming to unify this fragmented field and out...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15727
• PDF: https://arxiv.org/pdf/2601.15727
• Github: https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation

==================================

#LLMs #KernelGeneration #AI #Automation #CodeGeneration
LLM Prompt Evaluation for Educational Applications

📝 Summary:
This study presents a systematic framework using tournament-style testing and Glicko2 ratings to evaluate LLM prompts for education. A prompt emphasizing metacognitive learning strategies outperformed others, demonstrating evidence-based prompt development.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16134
• PDF: https://arxiv.org/pdf/2601.16134

==================================

#LLM #Education #PromptEngineering #AIinEducation #Metacognition
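
For readers unfamiliar with the Glicko2 ratings mentioned in the prompt-evaluation post above, here is a minimal sketch of a single Glicko-2 rating-period update following Glickman's published algorithm. How the paper maps prompts to "players" and which parameters it uses are not stated in the summary, so the function name `glicko2_update`, the default `tau=0.5`, and the idea of treating each prompt as a player are assumptions for illustration.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def _g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """Update one player (here: one prompt) after a rating period.

    results -- list of (opponent_rating, opponent_rd, score),
               with score 1 for a win, 0.5 for a draw, 0 for a loss.
    Returns (new_rating, new_rd, new_volatility).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:  # no games: only the rating deviation grows
        return rating, SCALE * math.sqrt(phi * phi + vol * vol), vol

    v_inv = 0.0      # inverse of the estimated variance v
    delta_sum = 0.0  # sum of g_j * (score_j - E_j)
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        g_j = _g(phi_j)
        e_j = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))
        v_inv += g_j * g_j * e_j * (1.0 - e_j)
        delta_sum += g_j * (score - e_j)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve for the new volatility with the Illinois variant of regula falsi.
    a = math.log(vol * vol)

    def f(x):
        ex = math.exp(x)
        top = ex * (delta * delta - phi * phi - v - ex)
        bottom = 2.0 * (phi * phi + v + ex) ** 2
        return top / bottom - (x - a) / (tau * tau)

    lo = a
    if delta * delta > phi * phi + v:
        hi = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        hi = a - k * tau
    f_lo, f_hi = f(lo), f(hi)
    while abs(hi - lo) > eps:
        mid = lo + (lo - hi) * f_lo / (f_hi - f_lo)
        f_mid = f(mid)
        if f_mid * f_hi <= 0:
            lo, f_lo = hi, f_hi
        else:
            f_lo /= 2.0
        hi, f_hi = mid, f_mid
    new_vol = math.exp(lo / 2.0)

    phi_star = math.sqrt(phi * phi + new_vol * new_vol)
    new_phi = 1.0 / math.sqrt(1.0 / (phi_star * phi_star) + 1.0 / v)
    new_mu = mu + new_phi * new_phi * delta_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol


# Glickman's worked example: one win and two losses against three opponents.
# His paper reports roughly 1464.06, 151.52, 0.05999 for this case.
print(glicko2_update(1500, 200, 0.06,
                     [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

In a prompt tournament, each pairwise comparison between two prompts' outputs would contribute one `(opponent_rating, opponent_rd, score)` entry for each side.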
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

📝 Summary:
Open-Sora 2.0 is a commercial-level video generation model trained for only $200k. It achieves performance comparable to top models. This open-source project aims to democratize access and foster innovation in video generation.

🔹 Publication Date: Published on Mar 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.09642
• PDF: https://arxiv.org/pdf/2503.09642
• Github: https://github.com/hpcaitech/open-sora

🔹 Models citing this paper:
https://huggingface.co/hpcai-tech/Open-Sora-v2
https://huggingface.co/Compumacy/OPensora

Spaces citing this paper:
https://huggingface.co/spaces/zumwaltboi/Sora2_test
https://huggingface.co/spaces/AverageAiLiker/vidsora-magic-wand
https://huggingface.co/spaces/AverageAiLiker/bot-tks1p3jy

==================================

#VideoGeneration #OpenSora #GenerativeAI #DeepLearning #OpenSource
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

📝 Summary:
EvoCUA introduces an evolutionary computer-use agent that combines autonomous task generation with policy optimization. This scalable approach achieves a new state-of-the-art 56.7% success rate on the OSWorld benchmark, demonstrating a robust path for advancing native agent capabilities.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15876
• PDF: https://arxiv.org/pdf/2601.15876
• Github: https://github.com/meituan/EvoCUA

🔹 Models citing this paper:
https://huggingface.co/meituan/EvoCUA-32B-20260105
https://huggingface.co/meituan/EvoCUA-8B-20260105

==================================

#AI #Agents #MachineLearning #ReinforcementLearning #EvolutionaryAlgorithms
ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

📝 Summary:
ActionMesh extends 3D diffusion models with a temporal axis to generate high-quality, rig-free animated 3D meshes. This 'temporal 3D diffusion' framework quickly creates topology-consistent animations from various inputs like video or text, achieving state-of-the-art results.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.16148
• PDF: https://remysabathier.github.io/actionmesh/actionmesh_2026.pdf
• Project Page: https://remysabathier.github.io/actionmesh/
• Github: https://github.com/facebookresearch/actionmesh

🔹 Models citing this paper:
https://huggingface.co/facebook/ActionMesh

Spaces citing this paper:
https://huggingface.co/spaces/facebook/ActionMesh

==================================

#3DAnimation #DiffusionModels #ComputerGraphics #DeepLearning #3DModeling
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

📝 Summary:
VLMs struggle to estimate task progress from partial views. ProgressLM-3B, a new training-based model, shows consistent improvements in progress reasoning across disjoint tasks, addressing this limitation.

🔹 Publication Date: Published on Jan 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15224
• PDF: https://arxiv.org/pdf/2601.15224
• Project Page: https://progresslm.github.io/ProgressLM/
• Github: https://github.com/ProgressLM/ProgressLM

🔹 Models citing this paper:
https://huggingface.co/Raymond-Qiancx/ProgressLM-3B-SFT
https://huggingface.co/Raymond-Qiancx/ProgressLM-3B-RL

Datasets citing this paper:
https://huggingface.co/datasets/Raymond-Qiancx/ProgressLM-Dataset

==================================

#VLM #ProgressReasoning #AI #MachineLearning #DeepLearning
MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness

📝 Summary:
MirrorBench is an open-source framework for evaluating large language models as human user simulators. It assesses their ability to generate human-like conversational responses across diverse tasks using various metrics, revealing systematic gaps between AI and real users.

🔹 Publication Date: Published on Jan 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08118
• PDF: https://arxiv.org/pdf/2601.08118
• Github: https://github.com/SAP/mirrorbench

==================================

#LLM #HumanLikeness #AISimulation #ConversationalAI #OpenSource
Agentic Confidence Calibration

📝 Summary:
AI agents are often overconfident when they fail, which hinders their deployment. This paper introduces Agentic Confidence Calibration and Holistic Trajectory Calibration (HTC), a new framework analyzing an agent's entire process trajectory. HTC improves reliability, interpretability, and generalizes across diverse A...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15778
• PDF: https://arxiv.org/pdf/2601.15778

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
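
To make the notion of overconfidence in the Agentic Confidence Calibration post above concrete, here is a minimal sketch of expected calibration error (ECE), a standard calibration metric. This is not the paper's HTC method, which the summary describes only at a high level; the function name `expected_calibration_error` and the equal-width binning are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences -- predicted probabilities of success, each in [0, 1]
    correct     -- matching 0/1 (or bool) outcomes, 1 meaning the task succeeded
    n_bins      -- number of equal-width confidence bins
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += float(ok)

    ece = 0.0
    for count, conf_sum, hit_sum in zip(counts, conf_sums, hit_sums):
        if count:
            gap = abs(hit_sum / count - conf_sum / count)
            ece += (count / n) * gap
    return ece


# An agent that reports 90% confidence on four tasks but succeeds on only one
# is badly overconfident, so the gap between confidence and accuracy is large.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 0]))
```

A score near zero means stated confidence tracks observed success rates; ECE looks only at final outcomes, whereas the post describes HTC as analyzing the whole trajectory.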
Agentic Uncertainty Quantification

📝 Summary:
A unified dual-process framework transforms verbalized uncertainty into active control signals for improved reasoning reliability in AI agents.

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15703
• PDF: https://arxiv.org/pdf/2601.15703

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

📝 Summary:
Large language models face reliability challenges that are being addressed through uncertainty as an active control signal across advanced reasoning, autonomous agents, and reinforcement learning, sup...

🔹 Publication Date: Published on Jan 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.15690
• PDF: https://arxiv.org/pdf/2601.15690

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
These Google Colab notebooks help you implement machine learning algorithms from scratch 🤯

Repo: https://udlbook.github.io/udlbook/


👉 @codeprogrammer