LISA: Reasoning Segmentation via Large Language Model
New segmentation task: reasoning segmentation. Given a complex, implicit query text, the model must output a segmentation mask.
GitHub: https://github.com/dvlab-research/lisa
Paper: https://arxiv.org/abs/2308.00692v2
Dataset: https://github.com/dvlab-research/lisa#dataset
https://t.iss.one/DataScienceT
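To make the task's input/output contract concrete, here is a minimal, hypothetical sketch; the `ReasoningSegmenter` class and its `segment` method are illustrative placeholders, not LISA's actual API (see the GitHub repo above for the real inference code).

```python
import numpy as np
from PIL import Image

# Hypothetical wrapper illustrating the reasoning-segmentation interface:
# the model receives an image plus an implicit, reasoning-heavy query
# and returns a binary mask over the image pixels.
class ReasoningSegmenter:
    def __init__(self, model_name: str):
        self.model_name = model_name  # e.g. a LISA checkpoint path (placeholder)

    def segment(self, image: Image.Image, query: str) -> np.ndarray:
        # A real implementation would run the multimodal LLM plus a mask decoder.
        # Here we just return an empty mask of the right shape.
        return np.zeros((image.height, image.width), dtype=bool)

if __name__ == "__main__":
    img = Image.new("RGB", (640, 480))
    model = ReasoningSegmenter("lisa-checkpoint")  # placeholder name
    # The query is implicit: it does not name the target object class directly.
    mask = model.segment(img, "the food with the most vitamin C in this fridge")
    print(mask.shape, mask.dtype)  # (480, 640) bool
```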
When training generative models, the training dataset plays a major role in the inference quality of the resulting models.
One good source is MiraData from Tencent: a ready-made dataset with a total video duration of 16,000 hours, designed for training text-to-video generation models. It includes long videos (72.1 seconds on average) with high motion intensity and detailed structured annotations (318 words per video on average).
To assess the dataset's quality, a dedicated benchmark suite, MiraBench, was created as well: 17 metrics covering temporal consistency, in-frame motion, video quality, and other parameters. By these metrics, MiraData outperforms other well-known open datasets, which mostly consist of short videos of variable quality with short descriptions.
#Text2Video #Dataset #ML
https://t.iss.one/DataScienceT
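A minimal sketch of how such metadata could be filtered for long, densely captioned clips; the file name and columns (`miradata_meta.csv`, `duration`, `dense_caption`) are assumptions for illustration, not MiraData's documented schema.

```python
import pandas as pd

# Assumed metadata file and column names (check the MiraData repo for the real schema).
meta = pd.read_csv("miradata_meta.csv")

# Keep clips with at least a minute of footage and a long, detailed caption,
# mirroring the dataset's emphasis on long videos and dense annotations.
selected = meta[
    (meta["duration"] >= 60.0)
    & (meta["dense_caption"].str.split().str.len() >= 200)
]
print(f"{len(selected)} / {len(meta)} clips match the filter")
```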
Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds @ 30FPS) with interactions involving 63 precisely modeled objects and 72 articulated parts: a rich resource for researchers and developers in the field.
#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation
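A quick back-of-the-envelope check of the scale implied by the numbers above (total frames and average sequence length follow directly from the quoted figures):

```python
# Scale implied by the announced figures: 7,875 s of capture at 30 FPS, 700+ sequences.
total_seconds = 7_875
fps = 30
num_sequences = 700  # "over 700"; used as a lower bound here

total_frames = total_seconds * fps
avg_seq_seconds = total_seconds / num_sequences

print(f"total frames: {total_frames:,}")                 # 236,250
print(f"avg sequence length: {avg_seq_seconds:.2f} s")   # ~11.25 s (upper bound)
```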
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
Summary:
Pico-Banana-400K is a new 400K-image dataset for text-guided image editing, built from real photos. It offers diverse edit types, high quality, and specialized subsets for multi-turn, preference-based, and long-short instruction editing, enabling comprehensive model development.
Publication Date: Oct 22
Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19808
• PDF: https://arxiv.org/pdf/2510.19808
• GitHub: https://github.com/apple/pico-banana-400k
Models citing this paper:
• https://huggingface.co/eigen-ai-labs/eigen-banana-qwen-image-edit
==================================
For more data science resources:
https://t.iss.one/DataScienceT
#ImageEditing #TextGuidedEditing #Dataset #ComputerVision #AI
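A minimal sketch of splitting such an editing dataset into single-turn and multi-turn subsets; the file path and record fields (`edits.jsonl`, `instruction`, `turns`) are hypothetical, since the actual release layout is documented in the GitHub repo above.

```python
import json
from pathlib import Path

# Hypothetical local export of the edit annotations; the real release format
# (files, field names) is described in the apple/pico-banana-400k repository.
annotations_path = Path("pico_banana_400k/edits.jsonl")

single_turn, multi_turn = [], []
with annotations_path.open() as f:
    for line in f:
        record = json.loads(line)  # assumed fields: "instruction", "turns", "image_id"
        (multi_turn if len(record.get("turns", [])) > 1 else single_turn).append(record)

print(f"single-turn edits: {len(single_turn)}, multi-turn edits: {len(multi_turn)}")
```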
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
Summary:
GUI-360 is a large dataset and benchmark for computer-using agents, addressing gaps in real-world tasks and unified evaluation. It contains over 1.2M action steps in Windows apps for GUI grounding, screen parsing, and action prediction. Benchmarking reveals significant shortcomings in current models.
Publication Date: Nov 6
Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04307
• PDF: https://arxiv.org/pdf/2511.04307
==================================
For more data science resources:
https://t.iss.one/DataScienceT
#AI #ComputerAgents #GUIAgents #Dataset #Benchmark
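To illustrate what a single "action step" for a computer-using agent typically contains, here is a hedged sketch of a record type; the field names and action vocabulary are illustrative, not GUI-360's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative record for one agent action step (not GUI-360's actual schema):
# a screenshot, the task instruction, and the grounded action taken on it.
@dataclass
class ActionStep:
    screenshot_path: str                 # path to the captured Windows UI frame
    instruction: str                     # natural-language task being executed
    action_type: str                     # e.g. "click", "type", "scroll" (assumed vocabulary)
    target_bbox: Optional[Tuple[int, int, int, int]] = None  # grounded UI element, if any
    typed_text: Optional[str] = None     # payload for "type" actions

step = ActionStep(
    screenshot_path="frames/0001.png",
    instruction="Save the document as PDF",
    action_type="click",
    target_bbox=(412, 88, 470, 110),
)
print(step)
```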
CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios
Summary:
CATS-V2V is a new real-world dataset for V2V cooperative perception, focusing on complex adverse traffic scenarios. It provides extensive synchronized sensor data, including LiDAR and cameras, from two vehicles across diverse conditions. This dataset supports autonomous driving research.
Publication Date: Nov 14
Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11168
• PDF: https://arxiv.org/pdf/2511.11168
==================================
For more data science resources:
https://t.iss.one/DataScienceT
#V2V #AutonomousDriving #CooperativePerception #Dataset #ADAS
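Cooperative perception needs the two vehicles' sensor streams aligned in time. Below is a generic nearest-timestamp pairing sketch; the 50 ms tolerance and toy 10 Hz timestamps are assumptions for illustration, not values from the paper.

```python
import bisect

def pair_by_nearest_timestamp(ego_ts, coop_ts, max_gap_s=0.05):
    """Match each ego-vehicle frame to the closest cooperating-vehicle frame.

    ego_ts, coop_ts: sorted lists of frame timestamps in seconds.
    max_gap_s: drop pairs whose clocks differ by more than this (assumed tolerance).
    """
    pairs = []
    for t in ego_ts:
        i = bisect.bisect_left(coop_ts, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(coop_ts)]
        j = min(candidates, key=lambda j: abs(coop_ts[j] - t))
        if abs(coop_ts[j] - t) <= max_gap_s:
            pairs.append((t, coop_ts[j]))
    return pairs

# Toy example: 10 Hz LiDAR sweeps from two vehicles with a slight clock offset.
ego = [0.0, 0.1, 0.2, 0.3]
coop = [0.02, 0.12, 0.21, 0.33]
print(pair_by_nearest_timestamp(ego, coop))
```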
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward
Summary:
An analysis of miniF2F found that AI systems reached only 36% accuracy, largely due to errors in the problem statements. Correcting these errors produced miniF2F-v2, raising accuracy to 70%. High-quality benchmarks like miniF2F-v2 are crucial for evaluating progress in formal reasoning.
Publication Date: Nov 5
Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03108
• PDF: https://arxiv.org/pdf/2511.03108
• GitHub: https://github.com/roozbeh-yz/miniF2F_v2
Datasets citing this paper:
• https://huggingface.co/datasets/roozbeh-yz/miniF2F_v2
==================================
For more data science resources:
https://t.iss.one/DataScienceT
#AI #FormalReasoning #Benchmarks #MachineLearning #Dataset
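miniF2F problems are competition-style statements formalized in Lean that a prover must close with a machine-checked proof. The toy statement below only illustrates that format (it is not an actual benchmark problem) and uses plain Lean 4 with no external libraries.

```lean
-- A toy, miniF2F-style goal: state a fact and discharge it with a checked proof.
-- (Illustrative only; real benchmark problems are competition mathematics.)
theorem toy_add_zero (n : Nat) : n + 0 = n := by
  rfl
```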
MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model
Summary:
MicroVQA++ is a new high-quality microscopy VQA dataset built via a three-stage process, including HiCQA-Graph, a novel filtering method that combines NLI, CLIP, and MLLM signals. The dataset enables strong microscopy reasoning performance for MLLMs.
Publication Date: Nov 14
Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11407
• PDF: https://arxiv.org/pdf/2511.11407
• GitHub: https://github.com/ieellee/MicroVQA-PlusPlus
==================================
For more data science resources:
https://t.iss.one/DataScienceT
#MLLM #Microscopy #VQA #AIResearch #Dataset
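HiCQA-Graph filters image-caption-QA samples using NLI, CLIP, and MLLM signals. The sketch below shows only the general idea of gating samples on several model-derived scores; the score names, thresholds, and simple conjunctive rule are assumptions for illustration, not the paper's actual graph-based method.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sample with pre-computed signals (not the paper's actual pipeline):
# an NLI entailment score between caption and answer, a CLIP image-text similarity,
# and an MLLM agreement score for the QA pair.
@dataclass
class Sample:
    question: str
    answer: str
    nli_entailment: float   # 0..1
    clip_similarity: float  # cosine similarity, roughly 0..1
    mllm_agreement: float   # 0..1

def keep(sample: Sample,
         nli_thresh: float = 0.7,
         clip_thresh: float = 0.25,
         mllm_thresh: float = 0.5) -> bool:
    # Simple conjunctive gating on all three signals (thresholds are assumed).
    return (sample.nli_entailment >= nli_thresh
            and sample.clip_similarity >= clip_thresh
            and sample.mllm_agreement >= mllm_thresh)

def filter_samples(samples: List[Sample]) -> List[Sample]:
    return [s for s in samples if keep(s)]

if __name__ == "__main__":
    demo = [Sample("What organelle is stained?", "Mitochondria", 0.9, 0.31, 0.8),
            Sample("What organelle is stained?", "Nucleus", 0.2, 0.30, 0.4)]
    print(len(filter_samples(demo)))  # 1
```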