ML Research Hub
32.3K subscribers
6.53K photos
450 videos
24 files
7.1K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Woosh: A Sound Effects Foundation Model

📝 Summary:
Woosh is a sound effect foundation model featuring audio encoding/decoding, text-audio alignment, and text-to-audio/video-to-audio generation capabilities with distilled versions for efficient deploym...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01929
• PDF: https://arxiv.org/pdf/2604.01929
• Project Page: https://sonyresearch.github.io/Woosh/
• Github: https://github.com/SonyResearch/Woosh

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

📝 Summary:
Entity-centric factual question answering involves localized MLP neurons; causal interventions on these neurons recover entity-consistent predictions, showing robustness to various linguistic variations b...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01404
• PDF: https://arxiv.org/pdf/2604.01404
• Github: https://github.com/1tux/in-silico

#AI #DataScience #MachineLearning #HuggingFace #Research
Automatic Image-Level Morphological Trait Annotation for Organismal Images

📝 Summary:
This paper presents a scalable method for automatically annotating morphological traits from biological images. It uses sparse autoencoders on foundation model features to identify meaningful parts, then applies vision-language prompting to generate trait descriptions. This approach creates large...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01619
• PDF: https://arxiv.org/pdf/2604.01619
• Github: https://github.com/OSU-NLP-Group/sae-trait-annotation

#AI #DataScience #MachineLearning #HuggingFace #Research
Executing as You Generate: Hiding Execution Latency in LLM Code Generation

📝 Summary:
A parallel execution paradigm for LLM-based coding agents reduces latency by executing code during generation rather than in sequential stages. Current LLM-based coding agents follo...
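To make the idea of executing during generation concrete, here is a minimal Python sketch. It is not the paper's implementation: the function name `execute_while_streaming`, the line-by-line stream, and the rule "run a top-level statement as soon as the next one begins" are all assumptions made for illustration.

```python
import ast
import re

# Keywords that continue a compound statement even though they start at column 0.
CONTINUATION = re.compile(r"(else|elif|except|finally)\b")


def is_complete(source):
    # True when the buffered text parses as one or more whole statements.
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def execute_while_streaming(lines, namespace):
    """Hypothetical helper: run each top-level statement once the next one starts."""
    buffer = []
    executed = []  # statements in the order they were run

    def flush():
        source = "".join(buffer)
        if source.strip():
            exec(compile(source, "<stream>", "exec"), namespace)
            executed.append(source)
        buffer.clear()

    for line in lines:  # `lines` stands in for the model's output stream
        starts_new_statement = (
            line[:1] not in (" ", "\t", "\n", "")
            and CONTINUATION.match(line) is None
        )
        if buffer and starts_new_statement and is_complete("".join(buffer)):
            flush()  # earlier statement is finished, so run it now
        buffer.append(line)

    flush()  # run whatever remains at the end of the stream
    return executed


# Simulated generation stream (each item is one generated line).
stream = [
    "total = 0\n",
    "for i in range(4):\n",
    "    total += i\n",
    "if total > 5:\n",
    "    label = 'big'\n",
    "else:\n",
    "    label = 'small'\n",
    "result = (total,\n",
    "          label)\n",
]
ns = {}
ran = execute_while_streaming(stream, ns)
```

In this sketch the loop body has already run by the time the `if` statement arrives, so execution overlaps with generation instead of waiting for the full program. A real agent would also need sandboxing and error handling, which are omitted here.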

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00491
• PDF: https://arxiv.org/pdf/2604.00491

#AI #DataScience #MachineLearning #HuggingFace #Research
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

📝 Summary:
UniRecGen combines feed-forward reconstruction and diffusion-based generation in a shared canonical space to produce complete and consistent 3D models from sparse inputs through disentangled cooperati...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01479
• PDF: https://arxiv.org/pdf/2604.01479
• Github: https://github.com/zsh523/UniRecGen

#AI #DataScience #MachineLearning #HuggingFace #Research
T5Gemma-TTS Technical Report

📝 Summary:
T5Gemma-TTS is an encoder-decoder codec language model that improves voice cloning and duration control for multilingual speech synthesis. It uses cross-attention for persistent text conditioning and Progress-Monitoring Rotary Position Embedding (PM-RoPE) for better target speech length tracking. I...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01760
• PDF: https://arxiv.org/pdf/2604.01760
• Github: https://github.com/Aratako/T5Gemma-TTS

🔹 Models citing this paper:
https://huggingface.co/Aratako/T5Gemma-TTS-2b-2b

Spaces citing this paper:
https://huggingface.co/spaces/Aratako/T5Gemma-TTS-Demo
https://huggingface.co/spaces/litagin/T5Gemma-TTS-Demo

#SpeechSynthesis #TTS #VoiceCloning #Multilingual #LanguageModels
DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

📝 Summary:
DynaVid improves dynamic video synthesis by training with synthetic optical flow, which provides diverse motion patterns without artificial appearances. A two-stage framework learns dynamic motion while preserving visual realism, enhancing motion control.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01666
• PDF: https://arxiv.org/pdf/2604.01666
• Project Page: https://jinwonjoon.github.io/DynaVid/

#VideoGeneration #AIVideo #DeepLearning #ComputerVision #SyntheticData
VOID: Video Object and Interaction Deletion

📝 Summary:
VOID is a video object removal framework designed for complex scenarios involving significant object interactions. It uses vision-language and video diffusion models, leveraging causal reasoning to generate physically plausible counterfactual scenes. VOID better preserves consistent scene dynamic...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02296
• PDF: https://arxiv.org/pdf/2604.02296
• Project Page: https://void-model.github.io/
• Github: https://github.com/Netflix/void-model

🔹 Models citing this paper:
https://huggingface.co/netflix/void-model

Spaces citing this paper:
https://huggingface.co/spaces/sam-motamed/VOID

#VideoEditing #DiffusionModels #ComputerVision #GenerativeAI #DeepLearning
Forwarded from Code With Python
This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

https://t.iss.one/addlist/8_rRW2scgfRhOTc0

https://t.iss.one/Codeprogrammer
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

📝 Summary:
Omni123 is a 3D-native foundation model unifying text-to-2D and text-to-3D generation. It addresses limited 3D data by leveraging cross-modal consistency from abundant 2D images as a geometric prior. This model significantly improves text-guided 3D generation and editing.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02289
• PDF: https://arxiv.org/pdf/2604.02289

#3DGeneration #FoundationModels #AIResearch #ComputerVision #DeepLearning
Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time

📝 Summary:
Researchers analyzed AI coding agent contributions to open source projects. They found increasing agent activity but higher code churn over time compared to human-authored code.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00917
• PDF: https://arxiv.org/pdf/2604.00917
• Project Page: https://arxiv.org/html/2604.00917v1

#AIAgents #SoftwareEngineering #OpenSource #CodeQuality #AIResearch
AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

📝 Summary:
AutoMIA is an agentic framework that automates membership inference attacks. It dynamically generates and refines attack strategies via self-exploration and closed-loop evaluation. This approach consistently outperforms static methods by eliminating manual feature engineering and improving adapta...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01014
• PDF: https://arxiv.org/pdf/2604.01014
• Github: https://github.com/amiya-special/AutoMIA

#MembershipInference #MLSecurity #Cybersecurity #AI #DataPrivacy
Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning

📝 Summary:
A memory-augmented VLM agent resolves inconsistent object descriptions across viewpoints. It unifies data association, captioning, and exploration within a single framework, leveraging object-level memory for persistent semantic consistency and improved scores.

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24257
• PDF: https://arxiv.org/pdf/2603.24257
• Project Page: https://hsp-iit.github.io/epos-vlm/

#VLM #ObjectCaptioning #AI #ComputerVision #DeepLearning
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

📝 Summary:
Tex3D is the first framework optimizing 3D adversarial textures to attack vision-language-action models. It significantly degrades robotic manipulation performance in real-world settings, revealing critical vulnerabilities.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01618
• PDF: https://arxiv.org/pdf/2604.01618
• Project Page: https://vla-attack.github.io/tex3d/
• Github: https://github.com/vla-attack/tex3d

#AdversarialAI #Robotics #VLAmodels #Cybersecurity #ComputerVision
Efficient Universal Perception Encoder

📝 Summary:
EUPE enhances edge device performance through a novel two-stage knowledge distillation approach. It scales up to a large proxy teacher then down to an efficient encoder. This method provides superior, versatile representations for diverse tasks, outperforming prior techniques.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22387
• PDF: https://arxiv.org/pdf/2603.22387
• Github: https://github.com/facebookresearch/eupe

🔹 Models citing this paper:
https://huggingface.co/facebook/EUPE-ConvNeXt-S
https://huggingface.co/facebook/EUPE-ViT-S
https://huggingface.co/facebook/EUPE-ViT-B

#KnowledgeDistillation #EdgeAI #ComputerVision #DeepLearning #RepresentationLearning
Steerable Visual Representations

📝 Summary:
Steerable Visual Representations allow language-guided focus on specific image elements while maintaining high representation quality. This is achieved through early fusion of text directly into the visual encoder. Our method outperforms dedicated approaches and generalizes well to new tasks.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02327
• PDF: https://arxiv.org/pdf/2604.02327
• Project Page: https://jonaruthardt.github.io/project/SteerViT/
• Github: https://github.com/JonaRuthardt/SteerViT

#ComputerVision #DeepLearning #MultimodalAI #ImageRecognition #AI
ASI-Evolve: AI Accelerates AI

📝 Summary:
ASI-Evolve is an AI framework demonstrating AI-driven discovery across key AI development components. It achieved superior performance in neural architecture design, data curation, and reinforcement learning algorithm design, showing AI can accelerate AI itself.

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29640
• PDF: https://arxiv.org/pdf/2603.29640
• Github: https://github.com/GAIR-NLP/ASI-Evolve

#AI #AIAcceleration #MachineLearning #DeepLearning #AIResearch
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation

📝 Summary:
AIBench evaluates academic illustration quality through logic correctness and aesthetics using VQA and VLM assessments, revealing significant performance gaps and the challenge of optimizing both aspe...

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28068
• PDF: https://arxiv.org/pdf/2603.28068
• Project Page: https://deep-kaixun.github.io/aibench-page/
• Github: https://deep-kaixun.github.io/aibench-page/

#AI #DataScience #MachineLearning #HuggingFace #Research
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial

📝 Summary:
Bayesian optimisation provides a principled probabilistic framework for automating scientific discovery by iteratively refining hypotheses and selecting experiments to balance exploration and exploita...
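As a rough illustration of the loop this summary describes, the following Python sketch fits a Gaussian-process surrogate and picks each new experiment with an upper-confidence-bound rule. It is not taken from the tutorial: the RBF kernel, the length scale, the `kappa` weight, the 1-D grid of candidates, and the toy objective are all assumptions chosen to keep the example small.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Posterior mean and standard deviation of a zero-mean GP at x_query.
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)        # K^-1 k(x_obs, x_query)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)   # prior variance of the RBF kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def bayes_opt(objective, n_iter=15, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)           # candidate experiments (assumed 1-D)
    x_obs = rng.uniform(0.0, 1.0, size=3)       # a few random initial experiments
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        ucb = mean + kappa * std                # mean exploits, std explores
        x_next = grid[np.argmax(ucb)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]


def toy_objective(x):
    # Hypothetical stand-in for an expensive experiment; maximum at x = 0.7.
    return -(x - 0.7) ** 2


best_x, best_y = bayes_opt(toy_objective)
```

Raising `kappa` makes the search favour uncertain regions, while lowering it makes the search concentrate near the current best estimate.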

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01328
• PDF: https://arxiv.org/pdf/2604.01328

#AI #DataScience #MachineLearning #HuggingFace #Research
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

📝 Summary:
A new multilingual document parsing benchmark reveals significant performance gaps between closed-source and open-source models, especially on non-Latin scripts and photographed documents.

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28130
• PDF: https://arxiv.org/pdf/2603.28130
• Github: https://github.com/Yuliang-Liu/MultimodalOCR

Datasets citing this paper:
https://huggingface.co/datasets/Delores-Lin/MDPBench

#MultilingualNLP #DocumentAI #OCR #AIbenchmark #MachineLearning
Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

📝 Summary:
A multi-agent system using uncertainty-aware design improves LLM agent performance on underspecified software development tasks by detecting ambiguity and proactively seeking clarification.

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26233
• PDF: https://arxiv.org/pdf/2603.26233
• Github: https://github.com/nedwards99/ask-or-assume

#AI #DataScience #MachineLearning #HuggingFace #Research