✨GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
📝 Summary:
GUI-360 is a large-scale dataset and benchmark for computer-using agents, addressing gaps in real-world tasks and unified evaluation. It contains over 1.2M action steps across Windows applications for GUI grounding, screen parsing, and action prediction. Benchmarking reveals significant shortcomings in current models.
🔹 Publication Date: Published on Nov 6, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04307
• PDF: https://arxiv.org/pdf/2511.04307
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #ComputerAgents #GUIAgents #Dataset #Benchmark
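The GUI-360 dataset itself is not linked in this post, so the sketch below is only a rough, hypothetical illustration of what its GUI-grounding track evaluates: a predicted click counts as correct when it falls inside the ground-truth element's bounding box. The record fields (pred_click, gt_bbox) are made up for the example and are not the actual GUI-360 schema.

```python
# Hypothetical GUI-grounding evaluation sketch. Field names ("pred_click",
# "gt_bbox") are illustrative only, not the actual GUI-360 schema.
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Point = Tuple[float, float]               # (x, y) click coordinates

def click_in_box(click: Point, box: Box) -> bool:
    """A grounding prediction counts as correct if the click lands inside the box."""
    x, y = click
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def grounding_accuracy(samples: Iterable[dict]) -> float:
    """Fraction of samples whose predicted click hits the target element."""
    samples = list(samples)
    hits = sum(click_in_box(s["pred_click"], s["gt_bbox"]) for s in samples)
    return hits / max(len(samples), 1)

# Toy usage with made-up records:
records = [
    {"pred_click": (105.0, 42.0), "gt_bbox": (100.0, 30.0, 180.0, 60.0)},  # hit
    {"pred_click": (500.0, 10.0), "gt_bbox": (100.0, 30.0, 180.0, 60.0)},  # miss
]
print(f"grounding accuracy: {grounding_accuracy(records):.2f}")  # -> 0.50
```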
✨OmniParser for Pure Vision Based GUI Agent
📝 Summary:
OmniParser enhances GPT-4V's ability to act as a GUI agent by improving screen parsing. It identifies interactable icons and understands element semantics using specialized models. This significantly boosts GPT-4V's performance on benchmarks like ScreenSpot, Mind2Web, and AITW.
🔹 Publication Date: Published on Aug 1, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.00203
• PDF: https://arxiv.org/pdf/2408.00203
• Github: https://github.com/microsoft/omniparser
🔹 Models citing this paper:
• https://huggingface.co/microsoft/OmniParser
• https://huggingface.co/microsoft/OmniParser-v2.0
• https://huggingface.co/banao-tech/OmniParser
✨ Datasets citing this paper:
• https://huggingface.co/datasets/mlfoundations/Click-100k
✨ Spaces citing this paper:
• https://huggingface.co/spaces/callmeumer/OmniParser-v2
• https://huggingface.co/spaces/nofl/OmniParser-v2
• https://huggingface.co/spaces/SheldonLe/OmniParser-v2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GUIagents #ComputerVision #GPT4V #AIagents #DeepLearning
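The model links above point to OmniParser weights on the Hugging Face Hub. As a rough sketch (not the official OmniParser pipeline), the code below downloads the icon-detection checkpoint and runs it with ultralytics YOLO to list candidate interactable regions in a screenshot. The repo id comes from the links above, but the checkpoint path icon_detect/model.pt and the YOLO format are assumptions to verify against the repo's file listing.

```python
# Minimal sketch, not the official OmniParser pipeline. Assumes the
# microsoft/OmniParser-v2.0 repo stores a YOLO checkpoint at
# "icon_detect/model.pt" (path is an assumption; check the repo listing).
from huggingface_hub import hf_hub_download
from ultralytics import YOLO
from PIL import Image

# Download the (assumed) icon-detection checkpoint from the Hub.
weights = hf_hub_download(
    repo_id="microsoft/OmniParser-v2.0",
    filename="icon_detect/model.pt",
)

detector = YOLO(weights)
screenshot = Image.open("screenshot.png")  # any desktop or mobile screenshot

# Each detected box is a candidate interactable element on the screen.
results = detector(screenshot, conf=0.05)
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"element at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), "
          f"score={float(box.conf[0]):.2f}")
```

The full OmniParser system additionally captions each detected element with a specialized model to recover its semantics before handing the structured output to GPT-4V.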