🦄Unified Region-Level MLLM🦄
👉PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, Repo & Dataset available💙
👉Review https://t.ly/WH4dQ
👉Paper arxiv.org/pdf/2510.23603
👉Project circleradon.github.io/PixelRefer
👉Repo https://github.com/alibaba-damo-academy/PixelRefer
🌱PlanarTrack: Large Planar Tracking🌱
👉PlanarTrack is a large-scale, high-quality, and challenging benchmark for planar tracking: 1,150 sequences with 733K+ frames, including 1,000 short-term & 150 long-term videos. Repo & Dataset available💙
👉Review https://t.ly/mYNi7
👉Paper arxiv.org/pdf/2510.23368
👉Repo https://lnkd.in/edb3GMyT
👉Project https://lnkd.in/eC-hVB-U
👉Data https://lnkd.in/eew2j4tM
👢Generative View Stitching 👢
👉GVS is a novel approach that enables collision-free, camera-guided video generation along predefined trajectories; it serves as a non-autoregressive alternative to video-length extrapolation. Full repo under MIT💙
👉Review https://t.ly/TiN_5
👉Paper https://arxiv.org/pdf/2510.24718
👉Project https://andrewsonga.github.io/gvs/
👉Repo github.com/andrewsonga/generative_view_stitching