ML Research Hub

✨Agent S: An Open Agentic Framework that Uses Computers Like a Human

📝 Summary:
Agent S is an open agentic framework enabling autonomous GUI interaction to automate complex tasks. It employs experience-augmented hierarchical planning and an Agent-Computer Interface with MLLMs for enhanced reasoning. Agent S achieves state-of-the-art performance on OSWorld and demonstrates br...

🔹 Publication Date: Published on Oct 10, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.08164
• PDF: https://arxiv.org/pdf/2410.08164
• Hugging Face Collection: https://huggingface.co/collections/ranpox/awesome-computer-use-agents

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AgenticAI #MultimodalAI #HumanComputerInteraction #Automation #AIResearch

✨Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics

📝 Summary:
TIMAR is a new causal framework for 3D conversational head generation. It models dialogue using interleaved audio-visual contexts to predict continuous head dynamics, improving coherence and expressive variability. Experiments show TIMAR significantly reduces errors and improves performance.

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15340
• PDF: https://arxiv.org/pdf/2512.15340
• Project Page: https://github.com/CoderChen01/towards-seamleass-interaction/blob/main/README.md
• Github: https://github.com/CoderChen01/towards-seamleass-interaction

#ConversationalAI #3DAnimation #HumanComputerInteraction #CausalModeling #AI

✨Continual GUI Agents

📝 Summary:
The Continual GUI Agents framework addresses performance degradation in dynamic UI environments. It introduces GUI-Anchoring in Flux (GUI-AiF), a reinforcement fine-tuning method with novel anchoring rewards that stabilize learning across shifting UI domains and resolutions, outperforming existing ...

🔹 Publication Date: Published on Jan 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.20732
• PDF: https://arxiv.org/pdf/2601.20732

#ContinualLearning #ReinforcementLearning #AIAgents #HumanComputerInteraction #MachineLearning

✨Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

📝 Summary:
This paper introduces a human-centric video world model for extended reality, using tracked head and hand poses for dexterous interaction. This system generates egocentric virtual environments, significantly improving user task performance and perceived control.

🔹 Publication Date: Published on Feb 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18422
• PDF: https://arxiv.org/pdf/2602.18422
• Project Page: https://codeysun.github.io/generated-reality/

#ExtendedReality #VideoGeneration #HumanComputerInteraction #VirtualEnvironments #AIResearch

✨How to Take a Memorable Picture? Empowering Users with Actionable Feedback

📝 Summary:
This paper introduces Memorability Feedback (MemFeed), a new task that provides actionable natural-language guidance for improving photo memorability. The proposed method, MemCoach, uses MLLMs and a teacher-student strategy, demonstrating that memorability can be taught and instructed.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21877
• PDF: https://arxiv.org/pdf/2602.21877
• Project Page: https://laitifranz.github.io/MemCoach/

✨ Datasets citing this paper:
• https://huggingface.co/datasets/laitifranz/MemBench-InternVL3.5-Eval
• https://huggingface.co/datasets/laitifranz/MemBench

#PhotoMemorability #MLLMs #ComputerVision #AIResearch #HumanComputerInteraction

✨InfoPO: Information-Driven Policy Optimization for User-Centric Agents

📝 Summary:
InfoPO optimizes agent-user collaboration for underspecified requests. It uses an information-gain reward to credit valuable turns that reduce uncertainty, improving decision-making and outperforming multi-turn RL baselines.
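
To make the idea of an information-gain reward concrete, here is a minimal sketch that scores a dialogue turn by how much it reduces uncertainty over the user's intent. This is not InfoPO's actual reward: the belief vectors, the four-intent setup, and the function names `entropy` and `information_gain` are all hypothetical, and plain Shannon entropy reduction stands in for whatever estimator the paper uses.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(prior, posterior):
    """Entropy reduction (in bits) when the belief moves from prior to posterior."""
    return entropy(prior) - entropy(posterior)

# Hypothetical belief over four candidate user intents before a clarifying question.
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical belief after the user's answer rules out two of the intents.
posterior = [0.5, 0.5, 0.0, 0.0]

gain = information_gain(prior, posterior)
print(gain)  # 1.0 bit: the turn halved the set of plausible intents
```

Under these assumptions, a turn that leaves the belief unchanged earns zero gain, so only turns that actually narrow down what the user wants would be credited.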

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00656
• PDF: https://arxiv.org/pdf/2603.00656
• Github: https://github.com/kfq20/InfoPO

#ReinforcementLearning #AI #HumanComputerInteraction #InformationTheory #AIagents

✨MIBURI: Towards Expressive Interactive Gesture Synthesis

📝 Summary:
MIBURI is an online, real-time framework that generates expressive full-body gestures and facial expressions for spoken dialogue. It uses body-part-aware codecs and LLM embeddings to causally generate natural, diverse, and contextually aligned motions, overcoming limitations of prior methods.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03282
• PDF: https://arxiv.org/pdf/2603.03282
• Project Page: https://vcai.mpi-inf.mpg.de/projects/MIBURI/
• Github: https://github.com/m-hamza-mughal/miburi

#GestureSynthesis #AI #HumanComputerInteraction #NLP #RealtimeTech
