RIML Lab
Robust and Interpretable Machine Learning Lab,
Prof. Mohammad Hossein Rohban,
Sharif University of Technology

https://youtube.com/@rimllab

twitter.com/MhRohban

https://www.linkedin.com/company/robust-and-interpretable-machine-learning-lab/
Thirty-First Session of the Large Language Models Club
📚 Topic: Uncertainty Estimation in Deep Networks
Speaker:
Dr. Yasin Abbasi, former AI researcher at DeepMind
Time: Wednesday 1404/06/26 (Shamsi calendar), 15:00
Session link:
https://vc.sharif.edu/rohban
YouTube (session videos)
Twitter
Add the event to Google Calendar
Journal club website
Everyone is welcome to attend this session.
#LLM_Club
@LLM_CLUB
🪢 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

🌟 This Week's Presentation:

📄 Paper:
Minority-Focused Text-to-Image Generation via Prompt Optimization

🧠 Abstract:
This paper introduces a new framework for improving the generation of minority samples with pretrained text-to-image diffusion models. Minority instances—defined as samples in low-density regions of text-conditioned data distributions—are valuable for applications like data augmentation and creative AI but are underrepresented in current models, which tend to focus on high-density regions. To address this imbalance, the authors propose an online prompt optimization method that preserves semantic content while guiding the emergence of desired properties. They further adapt this approach with a specialized likelihood-based objective to better capture minority features. Experimental results across multiple diffusion models show that the method substantially improves the quality and diversity of generated minority samples compared to existing techniques.
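For a concrete picture of the general idea, the sketch below shows what an online soft-prompt optimization loop of this kind could look like in PyTorch: prompt embeddings are nudged toward low-density regions while a penalty keeps them close to the original prompt. The `likelihood_proxy` callable is a placeholder for the paper's likelihood-based objective, not its actual implementation.

```python
# Illustrative sketch only: `likelihood_proxy` stands in for the paper's
# likelihood-based objective; the squared-distance term is a simple stand-in
# for semantic preservation.
import torch

def optimize_prompt(prompt_embeds, likelihood_proxy, steps=50, lr=1e-2, sem_weight=1.0):
    """Nudge prompt embeddings toward low-density (minority) regions while
    keeping them close to the original prompt embeddings."""
    original = prompt_embeds.detach().clone()
    embeds = prompt_embeds.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([embeds], lr=lr)
    for _ in range(steps):
        density = likelihood_proxy(embeds)            # lower = deeper into low-density regions
        semantic = (embeds - original).pow(2).mean()  # stay close to the original prompt
        loss = density + sem_weight * semantic
        opt.zero_grad()
        loss.backward()
        opt.step()
    return embeds.detach()
```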

🎙️ Presenter: Amir Kasaei

Session Details:
- 📅 Date: Tuesday
- 🕒 Time: 4:00 PM - 5:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🔐 ML Security Journal Club

This Week's Presentation:

🔹 Title: Safe Generative AI Workshop @ NeurIPS 2024

🔸 Presenter: Arian Komaei

🌀 Abstract:
In the past two years, generative AI has been the major driving force behind the development of advanced AI products such as ChatGPT, AlphaFold, and Stable Diffusion. These technologies, while significantly improving productivity for many, have raised serious safety concerns. However, there has been no workshop focusing on this topic in the past two years. This workshop, emphasizing AI safety concerns related to the use of generative AI, is much needed by the community. Generative AI, including large language models, vision-language models, diffusion models, and many more, has significantly aided various aspects of both academia and industry. In scientific discovery, these aspects encompass experimental design, hypothesis formulation, theoretical reasoning, and observation organization. In commercial applications, generative models such as large language models and diffusion algorithms have changed the lifestyles and workflows of billions around the world. This workshop aims to convene experts from various fields to address these challenges and explore potential solutions.


Session Details:
- 📅 Date: Sunday
- 🕒 Time: 4:00 - 5:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.

🌟 This Week's Presentation:

📌 Title:
Compositional Visual Reasoning: Why It Matters and What Holds Us Back

🧠 Abstract:
Compositional visual reasoning is a key challenge in multimodal AI, focusing on enabling machines to break down visual scenes into meaningful parts, connect them with concepts, and perform multi-step logical inference. In this session, we will introduce the foundations of visual reasoning and discuss why compositionality is crucial for achieving robustness, interpretability, and cognitive alignment in AI systems. We will also highlight major challenges, including hallucinations, difficulty in maintaining semantic fidelity, and the limitations of current reasoning strategies. The aim is to provide a clear picture of the problem space and motivate deeper exploration in future sessions.

📄 Paper:
Explain Before You Answer: A Survey on Compositional Visual Reasoning

🎙 Presenter:
Amir Kasaei

Session Details:
- 📅 Date: Tuesday, September 23
- 🕒 Time: 4:00 PM - 5:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🔐 ML Security Journal Club

This Week's Presentation:

🔹 Title: Unlearning diffusion models

🔸 Presenter: Arian Komaei

🌀 Abstract:
This paper introduces Single Layer Unlearning Gradient (SLUG), a new method for removing unwanted information from trained models efficiently. Unlike traditional unlearning approaches that require costly updates across many layers, SLUG updates only one carefully chosen layer using a single gradient step. The method relies on layer importance and gradient alignment to identify the optimal layer, preserving model performance while unlearning targeted content. Experiments show that SLUG works effectively across models like CLIP, Stable Diffusion, and vision-language models, handling both concrete concepts (e.g., objects, identities) and abstract ones (e.g., artistic styles). Compared to existing approaches, SLUG achieves similar unlearning results but with much lower computational cost, making it a practical solution for efficient and precise targeted unlearning.
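As a rough illustration of the single-layer idea (not the paper's exact criterion), the sketch below scores each parameter tensor by the size of its forget-set gradient and its misalignment with the retain-set gradient, then applies a single ascent step on the forget loss to the one best-scoring tensor.

```python
# Hedged sketch of single-layer unlearning: the layer-selection rule here is an
# illustrative proxy, and each parameter tensor is treated as a "layer".
import torch
import torch.nn.functional as F

def single_layer_unlearn(model, forget_loss, retain_loss, lr=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    g_forget = torch.autograd.grad(forget_loss, params, retain_graph=True)
    g_retain = torch.autograd.grad(retain_loss, params)

    # Prefer layers with a large forget gradient that is poorly aligned with
    # the retain gradient, so the update disturbs retained knowledge least.
    scores = []
    for gf, gr in zip(g_forget, g_retain):
        align = F.cosine_similarity(gf.flatten(), gr.flatten(), dim=0)
        scores.append(gf.norm() * (1.0 - align))
    best = int(torch.argmax(torch.stack(scores)))

    # Single gradient-ascent step on the forget loss, applied to one layer only.
    with torch.no_grad():
        params[best].add_(lr * g_forget[best])
    return best
```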

📄 Paper: Targeted Unlearning with Single Layer Unlearning Gradient


Session Details:
- 📅 Date: Sunday
- 🕒 Time: 4:00 - 5:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
Research Team Formation: ML Trustworthiness & Speech Language Models

We are currently forming a research team for a project in the field of ML Trustworthiness and Speech Language Models.
Our goal is to publish the outcomes of this research in top-tier machine learning conferences. Additionally, active team members who contribute meaningfully to the project will receive recommendation letters from faculty members.

If you are interested in these topics and have sufficient time to dedicate to research, please fill out the form below:
Form Link

To learn more about related works previously conducted in our lab, you can visit the following links:
• Dr. Mohammad Hossein Rohban – Google Scholar
• PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers (CVPR 2025)

_We look forward to collaborating with you!_
🚀 Open RA Positions – Reinforcement Learning (Generalization & Sample Efficiency)

We have a few Research Assistant (RA) openings on Generalization and Sample Efficiency in Reinforcement Learning (RL). Selected candidates will work directly with Dr. Rohban and the project supervisor.

The project focuses on improving RL agents’ generalization beyond training environments using contrastive learning. While the choice of positive/negative samples greatly impacts training (see https://arxiv.org/abs/2102.10960), anchor selection remains an unexplored area (related works: https://arxiv.org/abs/2004.04136 and https://arxiv.org/abs/1511.05952).
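For context, the line of work the project builds on uses contrastive objectives in the spirit of CURL (https://arxiv.org/abs/2004.04136); a minimal InfoNCE sketch is shown below, where two augmented views of the same observation form a positive pair and the rest of the batch serves as negatives. How anchors, positives, and negatives are chosen is exactly what the project will revisit.

```python
# Minimal InfoNCE sketch (illustrative, not the project's codebase).
import torch
import torch.nn.functional as F

def info_nce(anchor_emb, positive_emb, temperature=0.1):
    """anchor_emb, positive_emb: (batch, dim) embeddings of two augmented views
    of the same observations; off-diagonal pairs act as negatives."""
    a = F.normalize(anchor_emb, dim=1)
    p = F.normalize(positive_emb, dim=1)
    logits = a @ p.t() / temperature                   # (batch, batch) similarities
    labels = torch.arange(a.size(0), device=a.device)  # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```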

We’re looking for highly motivated researchers (B.Sc. or higher) with:
1️⃣ Strong background in Python and Git
2️⃣ Proficiency in Deep Learning & Reinforcement Learning (taken/audited both courses)
3️⃣ At least 3 months of prior research experience
4️⃣ Self-motivated, independent, and a quick learner
5️⃣ On-site presence in the lab, with weekly meetings with Dr. Rohban and regular reports to the project supervisor

🕘 Deadline: Wednesday, October 20th, 2025 – 9:00 AM (Tehran time)
📄 Apply here: https://forms.gle/88SfwtwZvQ2JCZ7X7
📢 Research Assistant Positions Available

The Robust and Interpretable Machine Learning (RIML) Lab and the Trustworthy and Secure Artificial Intelligence Lab (TSAIL) at the Computer Engineering Department of Sharif University of Technology are seeking highly motivated and talented research assistants to join our team. This collaborative project is jointly supervised by Dr. Rohban and Dr. Sadeghzadeh.


🔍 Position Overview
We are working on cutting-edge research in the field of generative models, with a focus on robustness, interpretability, and trustworthiness. As a research assistant, you will contribute to impactful projects at the intersection of theory and real-world applications.

🧠 Required Qualifications

- Solid background in machine learning, artificial intelligence, and generative models
- Hands-on experience with generative models and their practical applications
- Proficiency in Python and frameworks such as PyTorch
- Strong communication skills and the ability to work well in a collaborative research environment

📝 How to Apply
If you are interested in joining our team, please complete the application form and upload your CV using the following link:
👉 Application Form

📚 Suggested Background Reading
To better understand the context of our research, we recommend reviewing the following papers:

1. https://arxiv.org/abs/2410.15618
2. https://arxiv.org/abs/2305.10120

⚠️ Note 1: We cannot accept applicants who currently hold a full-time job, or students who hold a part-time job.

⚠️ Note 2: The target of these projects is submission to ICML and ECCV at the end of this Shamsi year. Therefore, time is limited, and participants must have at least 20–30 hours of free time per week to dedicate to the projects.

We look forward to your application!
Call for Research Assistants in Large Language Model Projects

If you are familiar with LLMs, you are invited to join our research projects as a research assistant. This project focuses on abductive reasoning in LLMs and aims to prepare a submission for ACL 2026.
For an introduction to the topic, you can read:
GEAR: A General Evaluation Framework for Abductive Reasoning
If you are interested, please complete the following form:
Registration Form
🔘 Open Research Position: Hallucination Detection in Vision-Language Models (VLMs)

We are looking for motivated students to join our research project.

🔍 Project Description
VLMs suffer from hallucination issues, where responses are incorrect, misleading, or not grounded in the image content. This research focuses on detecting these hallucinations and distinguishing hallucinated responses from non-hallucinated ones.

🔹 Requirements
- Strong Python programming skills
- Knowledge of deep learning
- Familiarity with VLMs
- Hands-on experience with PyTorch

📌 Note: Filling out this form does not guarantee acceptance. Only shortlisted candidates will receive an email by Nov 23.

📅 Application Deadline: Nov 22, 2025 (1 Azar)
🔗 Apply here: Google Form
🔐 ML Security Journal Club

This Week's Presentation:

🔹 Title: Unlearning diffusion models

🔸 Presenter: Arian Komaei

🌀 Abstract:
This paper digs into the messiness of “concept erasure” in diffusion models and shows just how fragile most erasure claims really are. The authors break down the erasure process into two fundamental mechanisms: (1) disrupting the model’s internal guidance so it tries not to produce a target concept, and (2) outright suppressing the unconditional probability of generating that concept at all. Then they put current erasure techniques under a microscope using a battery of independent probes—visual context manipulation, altered diffusion trajectories, classifier guidance tests, and inspection of substitute generations that emerge when the “erased” concept is supposedly gone. The verdict? Most methods barely scratch the surface. Models often smuggle the concept back in through alternative prompts, context cues, or trajectory tweaks. The paper’s evaluation suite exposes these failure modes and sets a much higher bar for claiming true erasure in diffusion models.
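To make one of these probes concrete, the sketch below shows how an indirect-prompt probe might look with the Hugging Face diffusers and CLIP APIs: generate from the "erased" model with paraphrased prompts and use CLIP as a zero-shot detector for the target concept. The model path, prompts, and concept labels are placeholders, not the paper's evaluation suite.

```python
# Illustrative probe only: the model path, prompts, and labels are hypothetical.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained("path/to/erased-model",
                                               torch_dtype=torch.float16).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Paraphrases that never name the erased concept directly.
indirect_prompts = [
    "a swirling night sky over a village, painted with thick brushstrokes",
    "a post-impressionist canvas of sunflowers in a vase",
]
labels = ["a painting in the style of Van Gogh", "a generic landscape painting"]

hits = 0
for prompt in indirect_prompts:
    image = pipe(prompt, num_inference_steps=30).images[0]
    inputs = proc(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)
    hits += int(probs[0, 0] > 0.5)   # concept judged present despite "erasure"
print(f"concept re-emerged in {hits}/{len(indirect_prompts)} probes")
```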

📄 Paper: When Are Concepts Erased From Diffusion Models?


Session Details:
- 📅 Date: Sunday
- 🕒 Time: 3:30 - 4:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club

Join us this week for an in-depth discussion on Compositional Learning for Visual Reasoning in modern vision–language models. We will explore recent breakthroughs and challenges, focusing on how these models perform compositional visual reasoning over complex scenes and where there is still room for improvement in robustness, faithfulness, and instruction following.

🌟 This Week's Presentation

📌 Title:
Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

🧠 Abstract:
Multimodal Large Language Models (MLLMs) have recently shown strong potential in visual reasoning, especially when combined with test-time scaling techniques. However, most current approaches keep the visual input fixed and only explore different textual reasoning paths, which limits their ability to exploit rich visual details—particularly in high-resolution images with many fine-grained elements. In such settings, vision-level reasoning becomes crucial: models need to dynamically zoom into informative regions of the image to gather the evidence required for accurate decisions.
In this session, we will discuss ZoomEye, a training-free, model-agnostic tree search algorithm for vision-level reasoning. ZoomEye treats an image as a hierarchical tree, where each node is a region and child nodes correspond to zoomed-in sub-regions. By navigating this tree, MLLMs can simulate human-like zooming behavior, selectively focusing on task-relevant areas. Experiments on high-resolution benchmarks show that ZoomEye substantially boosts the performance of multiple MLLMs (e.g., InternVL2.5-8B gains 15%–17% on HR-Bench) and even enables small 3–8B models to outperform larger systems such as GPT-4o.
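A greatly simplified sketch of the zooming idea is shown below: the image is treated as a quadtree, and a scorer decides which quadrant to zoom into at each step. The real ZoomEye search is more elaborate; `score_region` and `answer_from_region` are placeholders for actual MLLM calls.

```python
# Greedy quadtree zoom, a simplified illustration of tree-based image exploration.
from PIL import Image

def zoom_search(image, question, score_region, answer_from_region,
                min_side=448, depth=0, max_depth=4):
    w, h = image.size
    if min(w, h) <= min_side or depth >= max_depth:
        return answer_from_region(image, question)   # region small enough: answer

    # Child nodes = the four zoomed-in quadrants of the current region.
    boxes = [(0, 0, w // 2, h // 2), (w // 2, 0, w, h // 2),
             (0, h // 2, w // 2, h), (w // 2, h // 2, w, h)]
    children = [image.crop(b) for b in boxes]

    # Descend into the quadrant the model judges most relevant to the question.
    scores = [score_region(c, question) for c in children]
    best = max(range(len(children)), key=lambda i: scores[i])
    return zoom_search(children[best], question, score_region, answer_from_region,
                       min_side, depth + 1, max_depth)
```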

📄 Paper:
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

🎙 Presenter: Amir Kasaei

Session Details:
- 📅 Date: Tuesday, November 25th
- 🕒 Time: 3:00 PM - 4:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🔐 ML Security Journal Club

This Week's Presentation:

🔹 Title: Unlearning diffusion models

🔸 Presenter: Arian Komaei

🌀 Abstract:
This paper takes a hard look at the real-world reliability of concept erasure in text-to-image models. While many erasure methods look clean in controlled demos, their behavior collapses when concepts get interconnected or ambiguous. The authors identify two major gaps in current practice: the lack of evaluation across diverse concept types and the absence of systematic analysis of failure modes after erasure. They examine how removing one concept unintentionally damages others—visually similar, binomial, or semantically linked concepts—revealing widespread spillover effects. To tackle this, they introduce EraseBench, a large benchmark containing 100+ curated concepts, targeted prompts, and metrics that capture both erasure effectiveness and unintended degradation. Their findings show consistent concept entanglement, where erasing a target concept suppresses non-target ones and reduces generation quality, exposing significant limitations in current erasure techniques.

📄 Paper: Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts

Session Details:
- 📅 Date: Sunday
- 🕒 Time: 3:30 - 4:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club

Join us this week for a deep dive into how CLIP actually represents multiple objects in an image—and where it silently goes wrong. We’ll look at subtle biases in both text and image encoders, how they interact with caption structure and object size, and what this means for downstream multimodal models and text-to-image generation.

🌟 This Week's Presentation

📌 Title:
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation

🧠 Abstract:
Contrastive Language–Image Pre-training (CLIP) has become a workhorse for zero-shot classification and many vision–language tasks, but its behavior in complex scenes with multiple objects is far from fully understood. This session focuses on a systematic study of CLIP in controlled multi-object setups using ComCO, a dedicated dataset designed to probe how CLIP’s encoders handle object combinations and compositional structure.

We will discuss evidence that:

- The text encoder tends to over-focus on the first-mentioned object in a caption.

- The image encoder tends to favor larger objects in the scene.

- Small changes such as swapping token order or resizing objects can cause sharp drops in image–text matching and retrieval performance across multiple CLIP variants.
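A simple way to try the first of these effects yourself is sketched below, using the Hugging Face CLIP API: score one multi-object image against two captions that differ only in the order the objects are mentioned. The image path and captions are illustrative.

```python
# Order-sensitivity probe (illustrative image path and captions).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene_with_dog_and_ball.jpg")   # hypothetical multi-object image
captions = ["a photo of a dog and a ball",           # dog mentioned first
            "a photo of a ball and a dog"]           # ball mentioned first

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image        # (1, 2) image-text similarity scores
print(logits.softmax(dim=-1))  # a large gap between the two suggests order bias
```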


📄 Paper:
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation

🎙 Presenter: Dr. M. H. Rohban

Session Details:
- 📅 Date: Tuesday, December 2nd
- 🕒 Time: 3:00 PM - 4:00 PM
- 🌐 Location: Online at vc.sharif.edu/rohban

We look forward to your participation! ✌️
🔐 ML Security Journal Club

This Week's Presentation:

🔹 Title: Unlearning diffusion models

🔸 Presenter: Arian Komaei

🌀 Abstract:
The paper argues that modern multi-stage training pipelines create a fundamental obstacle for machine unlearning. The ideal goal—Retrain Equivalence—is for an unlearned model to behave exactly like one retrained from scratch without the forgotten data. But the authors show, both theoretically and empirically, that this is often impossible: once training happens in multiple stages with different data and objectives, the model’s behavior becomes path-dependent. That means the order of training steps permanently affects how unlearning works, and “local” unlearning methods that only use gradients from the forget set can’t universally reach Retrain Equivalence. Experiments on Llama and Qwen models (1B–14B) confirm strong divergence: the same data but different training orders lead to very different unlearning outcomes, with accuracy dropping by 20% across paths. Some training paths also produce models that are inherently harder to unlearn. Because multi-stage training is now standard and training histories are often unavailable, the paper concludes that Retrain Equivalence is the wrong target and the field needs to rethink what machine unlearning should actually aim for.

📄 Paper: On the Impossibility of Retrain Equivalence in Machine Unlearning

Session Details:
- 📅 Date: Sunday
- 🕒 Time: 3:30 - 4:30 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club

Join us this week for a critical exploration of robustness in Visual Question Answering systems and the broader implications for visual–language model reliability. We’ll analyze how even subtle, meaning-preserving changes to inputs can destabilize model outputs and discuss what this means for future evaluation and model design.

🌟 This Week's Presentation

📄 Paper:
Questioning the Stability of Visual Question Answering

🧠 Abstract:
Modern Visual Language Models (VLMs) have achieved impressive performance on a wide range of visual reasoning tasks, yet fundamental questions remain about their robustness to benign input perturbations. This paper presents the first large-scale, systematic study of how VLMs respond to small, meaning-preserving changes—such as pixel shifts, light geometric transformations, padded rescaling, paraphrasing, and multilingual rewrites—that do not change the true semantics of an image–question pair.

Across multiple datasets and models, the authors find that minor visual or textual perturbations frequently lead to different predicted answers, even for state-of-the-art systems like GPT-4o and Gemini 2.0 Flash. They also show that stability under perturbations correlates strongly with correctness, and that the stability patterns of small open-source models can be used to predict when larger models will fail.

In this session, we’ll discuss:
• What kinds of input changes most disrupt VQA predictions.
• How stability can serve as a proxy for reliability and model confidence.
• Implications for evaluation benchmarks and future model development.
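As a concrete (and simplified) example of such a stability check, the sketch below applies a small wrap-around pixel shift and a paraphrased question, then compares the answers. `ask_vqa` is a placeholder for whatever VLM you evaluate.

```python
# Minimal stability probe; `ask_vqa(image, question) -> str` is a placeholder.
from PIL import Image, ImageChops

def stability_probe(image, question, paraphrase, ask_vqa, shift=4):
    shifted = ImageChops.offset(image, shift, shift)   # tiny wrap-around translation
    answers = {
        "original":    ask_vqa(image, question),
        "pixel_shift": ask_vqa(shifted, question),
        "paraphrase":  ask_vqa(image, paraphrase),
    }
    stable = len(set(answers.values())) == 1           # same answer under all variants?
    return stable, answers
```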


🎙 Presenter: Amir Kasaei

Session Details:
- 📅 Date: Tuesday, December 23rd
- 🕒 Time: 3:00 PM - 4:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
🪢 Compositional Learning Journal Club

Join us this week for a fascinating dive into how multimodal language models can think visually by drawing — mimicking a human’s use of sketches to guide reasoning and solve complex tasks.

🌟 This Week's Presentation

📄 Paper:
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

🧠 Abstract:
Multimodal LLMs are strong at visual reasoning, but they typically rely on text-only intermediate steps. This paper introduces Visual Sketchpad, which gives MLLMs a lightweight drawing interface (e.g., lines, boxes, marks) so they can create visual intermediate steps while reasoning—similar to how humans sketch when solving problems. By integrating these sketch actions (and optionally leveraging vision modules during sketching), the approach improves performance across a wide range of tasks, including math/geometry, graphs, and spatial reasoning.
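To give a feel for what such a drawing interface might look like (an illustrative interface, not the paper's actual toolkit), the sketch below wraps a few PIL drawing primitives that a model could call before re-inspecting the annotated image.

```python
# Illustrative sketchpad interface built on PIL; not the paper's actual API.
from PIL import Image, ImageDraw

class Sketchpad:
    def __init__(self, image):
        self.canvas = image.copy()
        self.draw = ImageDraw.Draw(self.canvas)

    def line(self, p0, p1, color="red", width=3):
        self.draw.line([p0, p1], fill=color, width=width)

    def box(self, xyxy, color="blue", width=3):
        self.draw.rectangle(xyxy, outline=color, width=width)

    def mark(self, center, label, color="green"):
        x, y = center
        self.draw.ellipse([x - 5, y - 5, x + 5, y + 5], outline=color, width=3)
        self.draw.text((x + 8, y - 8), label, fill=color)

    def render(self):
        # The annotated image becomes the next visual step in the reasoning chain.
        return self.canvas
```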

🎙 Presenter: Amir Kasaei

Session Details:
- 📅 Date: Tuesday, December 30
- 🕒 Time: 3:00 - 4:00 PM
- 🌐 Location: Online at vc.sharif.edu/ch/rohban

We look forward to your participation! ✌️
Hey folks! 👋
Our team is working on foundation models, with a strong focus on interpretability and reliability. We’re opening applications for new team members who are excited to learn, take ownership of tasks, and contribute consistently to solid, hands-on research and experimentation.

Highly motivated bachelor’s students and junior undergraduates are especially encouraged to apply. Selection prioritizes motivation, dedication, perseverance, strong fundamentals, and consistent follow-through over grades. Experience with PyTorch is a major plus. Ideal candidates are comfortable implementing clean, reproducible experiments and communicating progress clearly. 👉 Apply Here.

Looking forward to doing deep, high-impact, and fun science together! 🚀🥰🔥