LLaNA: NeRF-LLM Assistant
UniBO unveils LLaNA, a novel multimodal LLM that understands and reasons about an input NeRF. It processes the NeRF weights directly and performs tasks such as captioning, Q&A, and zero-shot classification of NeRFs.
Review https://t.ly/JAfhV
Paper arxiv.org/pdf/2406.11840
Project andreamaduzzi.github.io/llana/
Code & Data coming
Depth Anything V2 is out!
Depth Anything V2 outperforms V1 in robustness and fine-grained detail. Trained with 595K synthetic labels and 62M+ real unlabeled images, it is the new SOTA in monocular depth estimation (MDE). Code & models available; a minimal inference sketch follows the links below.
Review https://t.ly/QX9Nu
Paper arxiv.org/pdf/2406.09414
Project depth-anything-v2.github.io/
Repo github.com/DepthAnything/Depth-Anything-V2
Data huggingface.co/datasets/depth-anything/DA-2K
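Quick start: a minimal inference sketch in the style of the repo's README. The class name, constructor arguments, and checkpoint filename are assumptions to verify against the repo.

```python
# Minimal sketch, assuming a README-style API for Depth-Anything-V2;
# class name, constructor args and checkpoint filename may differ.
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('depth_anything_v2_vitl.pth', map_location='cpu'))
model.eval()

img = cv2.imread('example.jpg')      # BGR image, any resolution
with torch.no_grad():
    depth = model.infer_image(img)   # HxW relative depth map (numpy array)
```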
Anomaly Object Detection
The University of Edinburgh introduces a novel anomaly detection problem focused on identifying "odd-looking" objects relative to the other instances in a multi-view scene. Code announced.
Review https://t.ly/3dGHp
Paper arxiv.org/pdf/2406.20099
Repo https://lnkd.in/d9x6FpUq
MimicMotion: HQ Motion Generation
#Tencent open-sources MimicMotion, a novel controllable video generation framework that can generate HQ videos of arbitrary length mimicking specific motion guidance. Source code available.
Review https://t.ly/XFoin
Paper arxiv.org/pdf/2406.19680
Project https://lnkd.in/eW-CMg_C
Code https://lnkd.in/eZ6SC2bc
CAVIS: SOTA Context-Aware Segmentation
DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It is the new SOTA on several benchmarks. Source code announced.
Review https://t.ly/G5obN
Paper arxiv.org/pdf/2407.03010
Repo github.com/Seung-Hun-Lee/CAVIS
Project seung-hun-lee.github.io/projects/CAVIS
Segment Any 4D Gaussians
SA4D is a novel framework to segment anything in the #4D Gaussian world: HQ segmentation within seconds, plus removing, recoloring, composing, and rendering HQ "anything" masks in 4D Gaussians. Source code available within August 2024.
Review https://t.ly/uw3FS
Paper https://arxiv.org/pdf/2407.04504
Project https://jsxzs.github.io/sa4d/
Repo https://github.com/hustvl/SA4D
CODERS: Stereo Detection, 6D Pose & Shape
CODERS: a one-stage approach for category-level object detection, pose estimation, and reconstruction from stereo images. Source code announced.
Review https://t.ly/Xpizz
Paper https://lnkd.in/dr5ZxC46
Project xingyoujun.github.io/coders/
Repo (TBA)
Tracking Everything via Decomposition
Hefei unveils a novel decoupled representation that divides static scenes and dynamic objects in terms of motion and appearance, enabling more robust tracking through occlusions and deformations. Source code announced under MIT License.
Review https://t.ly/OsFTO
Paper https://arxiv.org/pdf/2407.06531
Repo github.com/qianduoduolr/DecoMotion
TAPVid-3D: a benchmark for TAP-3D
#DeepMind (+ University College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos from three data sources, spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & code available under Apache 2.0; an illustrative metric sketch follows the links below.
Review https://t.ly/SsptD
Paper arxiv.org/pdf/2407.05921
Project tapvid3d.github.io/
Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
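To make the task concrete, here is an illustrative sketch of a 3D point-tracking accuracy score: the fraction of visible points falling within a set of 3D distance thresholds, averaged over thresholds. The threshold values and function name are assumptions; this is not the benchmark's official metric code.

```python
# Illustrative 3D point-tracking accuracy (NOT TAPVid-3D's official metric code).
import numpy as np

def avg_3d_accuracy(pred_xyz, gt_xyz, visible,
                    thresholds=(0.01, 0.02, 0.04, 0.08, 0.16)):
    """pred_xyz, gt_xyz: (T, N, 3) trajectories in metres; visible: (T, N) bool."""
    err = np.linalg.norm(pred_xyz - gt_xyz, axis=-1)   # (T, N) point-wise distances
    err = err[visible]                                  # score visible points only
    return float(np.mean([(err < t).mean() for t in thresholds]))

# Toy usage with random trajectories
T, N = 50, 8
gt = np.random.randn(T, N, 3)
pred = gt + 0.02 * np.random.randn(T, N, 3)
vis = np.ones((T, N), dtype=bool)
print(avg_3d_accuracy(pred, gt, vis))
```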
940+ FPS Multi-Person Pose Estimation
RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D body pose estimation, reaching over 940 FPS on #GPU. Code & models available; a minimal inference sketch follows the links below.
Review https://t.ly/XkBmg
Paper arxiv.org/pdf/2407.08634
Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
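For a quick test, a minimal sketch using MMPose's high-level inferencer is below. The 'wholebody' alias is an assumption; check the RTMPose/RTMW model zoo for the exact config and checkpoint names.

```python
# Minimal sketch via MMPose's high-level API (requires mmpose, mmengine, mmcv, mmdet).
# The 'wholebody' alias is an assumption; pass an explicit RTMW config/checkpoint
# from the model zoo to pin the exact model.
from mmpose.apis import MMPoseInferencer

inferencer = MMPoseInferencer('wholebody')            # whole-body 2D pose pipeline
result = next(inferencer('example.jpg', show=False))  # the call returns a generator
print(result['predictions'])                          # per-person keypoints & scores
```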
OmniNOCS: the largest 3D NOCS dataset
OmniNOCS, by #Google (+ Georgia Tech), is a unified NOCS (Normalized Object Coordinate Space) dataset containing data across different domains with 90+ object classes, the largest NOCS dataset to date. Data & code available under Apache 2.0; a short sketch of the NOCS mapping follows the links below.
Review https://t.ly/xPgBn
Paper arxiv.org/pdf/2407.08711
Project https://omninocs.github.io/
Data github.com/google-deepmind/omninocs
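For readers new to NOCS: each object's points are mapped into a canonical, normalized coordinate frame. Below is a minimal sketch of that mapping under the common convention (center the object and scale by its bounding-box diagonal so all points fit in a unit cube); OmniNOCS' exact convention may differ, so check the repo.

```python
# Minimal sketch of a NOCS mapping (common convention; not necessarily OmniNOCS').
import numpy as np

def to_nocs(points_obj: np.ndarray) -> np.ndarray:
    """points_obj: (N, 3) points in the object's canonical frame -> (N, 3) NOCS coords."""
    mins, maxs = points_obj.min(axis=0), points_obj.max(axis=0)
    center = (mins + maxs) / 2.0
    diag = np.linalg.norm(maxs - mins)       # tight bounding-box diagonal
    return (points_obj - center) / diag      # each coordinate lands in [-0.5, 0.5]

pts = np.random.rand(1000, 3) * [2.0, 0.5, 1.0]   # toy object point cloud
nocs = to_nocs(pts)
print(nocs.min(axis=0), nocs.max(axis=0))
```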
KineTy: Kinetic Typography Diffusion
GIST introduces a novel framework for realistic kinetic typography generation driven by text, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under Attribution-NC 4.0.
Review https://t.ly/2FWo9
Paper arxiv.org/pdf/2407.10476
Project seonmip.github.io/kinety/
Repo github.com/SeonmiP/KineTy/tree/main
Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that extends the advantages of gradient boosting trees (GBT) to the RL domain, adapting them to the unique challenges of RL environments such as non-stationarity and the absence of predefined targets. Code released; a conceptual sketch follows the links below.
Review https://t.ly/zv9pl
Paper https://arxiv.org/pdf/2407.08250
Code https://github.com/NVlabs/gbrl
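To illustrate the core idea of boosted trees as the RL function approximator, here is a conceptual fitted Q-iteration sketch with scikit-learn on synthetic transitions. It is not the GBRL library's API; all names, data, and hyperparameters are illustrative.

```python
# Conceptual illustration only: NOT the GBRL library API.
# Fitted Q-iteration with gradient-boosted trees on toy transitions, showing how
# boosted trees can serve as the value-function approximator in RL.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, n_actions, gamma = 2000, 3, 0.95

# Synthetic transitions (s, a, r, s'): 4-dim states, discrete actions.
S = rng.normal(size=(n, 4))
A = rng.integers(0, n_actions, size=n)
R = rng.normal(size=n)
S2 = S + 0.1 * rng.normal(size=(n, 4))

X = np.hstack([S, A[:, None]])           # Q input: (state, action)
q = GradientBoostingRegressor(n_estimators=100, max_depth=3)
q.fit(X, R)                              # initialise Q with the immediate reward

for _ in range(5):                       # fitted Q-iteration sweeps
    # Bootstrap target: r + gamma * max_a' Q(s', a')
    q_next = np.stack([q.predict(np.hstack([S2, np.full((n, 1), a)]))
                       for a in range(n_actions)], axis=1)
    y = R + gamma * q_next.max(axis=1)
    q = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

print("greedy action for a sample state:",
      int(np.argmax([q.predict(np.hstack([S[:1], [[a]]]))[0]
                     for a in range(n_actions)])))
```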
Hi folks,
I need your help!
Could you tell me what you think about the duration of the hiring process for #AI roles? Any comment will be appreciated :)
Vote here: https://t.ly/UMRXH
Thanks <3
Shape of Motion for 4D
Google (+ Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, with explicit, full-sequence 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source code released.
Review https://t.ly/d9RsA
Project https://shape-of-motion.github.io/
Paper arxiv.org/pdf/2407.13764
Code github.com/vye16/shape-of-motion/
TRG: new SOTA in 6DoF Head Pose
ECE (Korea) unveils TRG, a novel landmark-based method for estimating 6DoF head pose that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it is the new SOTA. Source code & models to be released.
Review https://t.ly/lOIRA
Paper https://lnkd.in/dCWEwNyF
Code https://lnkd.in/dzRrwKBD
Who's the REAL SOTA tracker in the world?
The BofN (Best-of-N) meta-tracker outperforms existing SOTA trackers by a large margin on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source code available; a conceptual selection sketch follows the links below.
Review https://t.ly/WB9AR
Paper https://arxiv.org/pdf/2407.15707
Code github.com/BasitAlawode/Best_of_N_Trackers
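The idea behind a best-of-N meta-tracker is to predict, per sequence, which tracker from a pool is most likely to succeed and then delegate tracking to it. A conceptual sketch under that assumption is below; the paper's actual selector features, architecture, and tracker pool are not reproduced here.

```python
# Conceptual best-of-N meta-tracking sketch (illustrative only; not the paper's code).
# A lightweight selector, trained offline on "which tracker scored highest per
# sequence" labels, picks one base tracker from the pool and delegates to it.
from dataclasses import dataclass
from typing import Callable, List, Sequence
import numpy as np

@dataclass
class BaseTracker:
    name: str
    run: Callable[[Sequence[np.ndarray], np.ndarray], List[np.ndarray]]  # (frames, init box) -> boxes

def first_frame_features(frame: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Toy selector features: mean colour of the target crop and its aspect ratio."""
    x, y, w, h = box.astype(int)
    crop = frame[y:y + h, x:x + w]
    return np.array([*crop.reshape(-1, 3).mean(axis=0), w / max(h, 1)])

class BestOfNMetaTracker:
    def __init__(self, trackers: List[BaseTracker], selector):
        self.trackers = trackers   # pool of N pre-trained trackers
        self.selector = selector   # classifier: features -> tracker index

    def track(self, frames: Sequence[np.ndarray], init_box: np.ndarray):
        feats = first_frame_features(frames[0], init_box)[None]
        idx = int(self.selector.predict(feats)[0])   # expected-best tracker
        chosen = self.trackers[idx]
        return chosen.name, chosen.run(frames, init_box)
```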
TAPTRv2: new SOTA for TAP
TAPTRv2 is a Transformer-based approach built upon TAPTR for the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations from DETR-like algorithms. The source code of V1 is available; V2 is coming.
Review https://t.ly/H84ae
Paper v1 https://lnkd.in/d4vD_6xx
Paper v2 https://lnkd.in/dE_TUzar
Project https://taptr.github.io/
Code https://lnkd.in/dgfs9Qdy
EAFormer: Scene Text Segmentation
A novel Edge-Aware Transformer to segment text more accurately, especially at the edges, with a FULL re-annotation of COCO_TS and MLT_S. Code coming; data available on Hugging Face.
Review https://t.ly/0G2uX
Paper arxiv.org/pdf/2407.17020
Project hyangyu.github.io/EAFormer/
Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
Keypoint Promptable Re-ID
KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Code, dataset, and annotations coming soon.
Review https://t.ly/vCXV_
Paper https://arxiv.org/pdf/2407.18112
Repo github.com/VlSomers/keypoint_promptable_reidentification