AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
14 files
1.31K links
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs, and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI

🌾LLaNA: NeRF-LLM assistant🌾

πŸ‘‰UniBO unveils LLaNA; novel Multimodal-LLM that understands and reasons on an input NeRF. It processes directly the NeRF weights and performs tasks such as captioning, Q&A, & zero-shot classification of NeRFs.

πŸ‘‰Review https://t.ly/JAfhV
πŸ‘‰Paper arxiv.org/pdf/2406.11840
πŸ‘‰Project andreamaduzzi.github.io/llana/
πŸ‘‰Code & Data coming
❀16πŸ”₯2πŸ‘2🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯ Depth Anything v2 is out! πŸ”₯

πŸ‘‰ Depth Anything V2: outperforming V1 in robustness and fine-grained details. Trained w/ 595K synthetic labels and 62M+ real unlabeled images, the new SOTA in MDE. Code & Models availableπŸ’™

πŸ‘‰Review https://t.ly/QX9Nu
πŸ‘‰Paper arxiv.org/pdf/2406.09414
πŸ‘‰Project depth-anything-v2.github.io/
πŸ‘‰Repo github.com/DepthAnything/Depth-Anything-V2
πŸ‘‰Data huggingface.co/datasets/depth-anything/DA-2K
πŸ”₯10🀯9⚑1❀1πŸ‘1πŸ₯°1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺ…Anomaly Object-DetectionπŸͺ…

πŸ‘‰The University of Edinburgh introduces a novel anomaly detection problem that focuses on identifying β€˜odd-looking’ objects relative to the other instances within a multiple-views scene. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/3dGHp
πŸ‘‰Paper arxiv.org/pdf/2406.20099
πŸ‘‰Repo https://lnkd.in/d9x6FpUq
❀10πŸ”₯6πŸ‘3πŸ‘3⚑1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺ© MimicMotion: HQ Motion Generation πŸͺ©

πŸ‘‰#Tencent opens a novel controllable video generation framework, dubbed MimicMotion, which can generate HQ videos of arbitrary length mimicking specific motion guidance. Source Code availableπŸ’™

πŸ‘‰Review https://t.ly/XFoin
πŸ‘‰Paper arxiv.org/pdf/2406.19680
πŸ‘‰Project https://lnkd.in/eW-CMg_C
πŸ‘‰Code https://lnkd.in/eZ6SC2bc
πŸ”₯12πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺ΄ CAVIS: SOTA Context-Aware SegmentationπŸͺ΄

πŸ‘‰DGIST unveils the Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It's the new SOTA in several benchmarks. Source Code announcedπŸ’™

πŸ‘‰Review https://t.ly/G5obN
πŸ‘‰Paper arxiv.org/pdf/2407.03010
πŸ‘‰Repo github.com/Seung-Hun-Lee/CAVIS
πŸ‘‰Project seung-hun-lee.github.io/projects/CAVIS
❀6πŸ‘5πŸ”₯4πŸ‘2
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯ Segment Any 4D Gaussians πŸ”₯

πŸ‘‰SA4G is a novel framework to segment anything in #4D Gaussians world. HQ segmentation within seconds in 4D Gaussians and remove, recolor, compose, and render HQ anything masks. Source Code available within August 2024πŸ’™

πŸ‘‰Review https://t.ly/uw3FS
πŸ‘‰Paper https://arxiv.org/pdf/2407.04504
πŸ‘‰Project https://jsxzs.github.io/sa4d/
πŸ‘‰Repo https://github.com/hustvl/SA4D
🀯5πŸ‘3❀2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ€– CODERS: Stereo Detection, 6D & Shape πŸ€–

πŸ‘‰CODERS: one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images. Source Code announcedπŸ’™

πŸ‘‰Review https://t.ly/Xpizz
πŸ‘‰Paper https://lnkd.in/dr5ZxC46
πŸ‘‰Project xingyoujun.github.io/coders/
πŸ‘‰Repo (TBA)
πŸ”₯12❀1πŸ‘1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🐸 Tracking Everything via Decomposition 🐸

πŸ‘‰Hefei unveils a novel decoupled representation that divides static scenes and dynamic objects in terms of motion and appearance. A more robust tracking through occlusions and deformations. Source Code announced under MIT LicenseπŸ’™

πŸ‘‰Review https://t.ly/OsFTO
πŸ‘‰Paper https://arxiv.org/pdf/2407.06531
πŸ‘‰Repo github.com/qianduoduolr/DecoMotion
πŸ”₯9πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🍾TAPVid-3D: benchmark for TAP-3D🍾

πŸ‘‰#Deepmind (+College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & Code available, Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/SsptD
πŸ‘‰Paper arxiv.org/pdf/2407.05921
πŸ‘‰Project tapvid3d.github.io/
πŸ‘‰Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
πŸ”₯3πŸ‘1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯ 940+ FPS Multi-Person Pose Estimation πŸ”₯

πŸ‘‰RTMW (Real-Time Multi-person Whole-body pose estimation models) is a series of high-perf. models for 2D/3D body pose estimation. Over 940 FPS on #GPU! Code & models πŸ’™

πŸ‘‰Review https://t.ly/XkBmg
πŸ‘‰Paper arxiv.org/pdf/2407.08634
πŸ‘‰Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
❀8πŸ”₯4πŸ‘1🍾1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ₯₯ OmniNOCS: largest 3D NOCS πŸ₯₯

πŸ‘‰OmniNOCS by #Google (+Georgia) is a unified NOCS (Normalized Object Coordinate Space) dataset that contains data across different domains with 90+ object classes. The largest NOCS dataset to date. Data & Code available under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/xPgBn
πŸ‘‰Paper arxiv.org/pdf/2407.08711
πŸ‘‰Project https://omninocs.github.io/
πŸ‘‰Data github.com/google-deepmind/omninocs
πŸ”₯4❀3πŸ‘2πŸ‘1πŸ₯°1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ’Œ KineTy: Typography Diffusion πŸ’Œ

πŸ‘‰GIST introduces a novel realistic kinetic typography generation driven by text. Guided video diffusion models to achieve visually-pleasing text appearances. Repo to be released under Attribution-NC 4.0πŸ’™

πŸ‘‰Review https://t.ly/2FWo9
πŸ‘‰Paper arxiv.org/pdf/2407.10476
πŸ‘‰Project seonmip.github.io/kinety/
πŸ‘‰Repo github.com/SeonmiP/KineTy/tree/main
❀4πŸ‘1πŸ”₯1πŸ₯°1
πŸ“ˆGradient Boosting Reinforcement LearningπŸ“ˆ

πŸ‘‰#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain. GBRL adapts the power of Gradient Boosting Trees to the unique challenges of RL environments, including non-stationarity and absence of predefined targets. Code releasedπŸ’™

πŸ‘‰Review https://t.ly/zv9pl
πŸ‘‰Paper https://arxiv.org/pdf/2407.08250
πŸ‘‰Code https://github.com/NVlabs/gbrl
❀7🀯4πŸ‘3πŸ”₯1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
🧿 Shape of Motion for 4D 🧿

πŸ‘‰ Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source Code released πŸ’™

πŸ‘‰Review https://t.ly/d9RsA
πŸ‘‰Project https://shape-of-motion.github.io/
πŸ‘‰Paper arxiv.org/pdf/2407.13764
πŸ‘‰Code github.com/vye16/shape-of-motion/
❀5🀯4πŸ”₯2πŸ‘1😱1
This media is not supported in your browser
VIEW IN TELEGRAM
🎭 TRG: new SOTA 6DoF Head 🎭

πŸ‘‰ECE (Korea) unveils TRG, a novel landmark-based method for estimating a 6DoF head pose which stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it's the new SOTA. Source Code & Models to be releasedπŸ’™

πŸ‘‰Review https://t.ly/lOIRA
πŸ‘‰Paper https://lnkd.in/dCWEwNyF
πŸ‘‰Code https://lnkd.in/dzRrwKBD
πŸ”₯5🀯3πŸ‘1πŸ₯°1
πŸ†Who's the REAL SOTA tracker in the world?πŸ†

πŸ‘‰BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code availableπŸ’™

πŸ‘‰Review https://t.ly/WB9AR
πŸ‘‰Paper https://arxiv.org/pdf/2407.15707
πŸ‘‰Code github.com/BasitAlawode/Best_of_N_Trackers
πŸ”₯5🀯5πŸ‘2❀1😱1
This media is not supported in your browser
VIEW IN TELEGRAM
🐒 TAPTRv2: new SOTA for TAP 🐒

πŸ‘‰TAPTRv2: Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 comingπŸ’™

πŸ‘‰Review https://t.ly/H84ae
πŸ‘‰Paper v1 https://lnkd.in/d4vD_6xx
πŸ‘‰Paper v2 https://lnkd.in/dE_TUzar
πŸ‘‰Project https://taptr.github.io/
πŸ‘‰Code https://lnkd.in/dgfs9Qdy
πŸ‘6πŸ”₯3🀯3❀2😱1
🧱EAFormer: Scene Text-Segm.🧱

πŸ‘‰A novel Edge-Aware Transformers to segment texts more accurately, especially at the edges. FULL re-annotation of COCO_TS and MLT_S! Code coming, data available on πŸ€—

πŸ‘‰Review https://t.ly/0G2uX
πŸ‘‰Paper arxiv.org/pdf/2407.17020
πŸ‘‰Project hyangyu.github.io/EAFormer/
πŸ‘‰Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
❀14πŸ”₯6πŸ‘1πŸ₯°1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘½ Keypoint Promptable Re-ID πŸ‘½

πŸ‘‰KPR is a novel formulation of the ReID problem that explicitly complements the input BBox with a set of semantic keypoints indicating the intended target. Code, dataset and annotations coming soonπŸ’™

πŸ‘‰Review https://t.ly/vCXV_
πŸ‘‰Paper https://arxiv.org/pdf/2407.18112
πŸ‘‰Repo github.com/VlSomers/keypoint_promptable_reidentification
πŸ”₯6πŸ‘3πŸ₯°1