AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
14 files
1.31K links
All the AI, with papers: fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🔥 Depth Anything V2 is out! 🔥

👉 Depth Anything V2 outperforms V1 in robustness and fine-grained detail. Trained on 595K synthetic labeled images and 62M+ real unlabeled images, it is the new SOTA in monocular depth estimation (MDE). Code & models available 💙

👉Review https://t.ly/QX9Nu
👉Paper arxiv.org/pdf/2406.09414
👉Project depth-anything-v2.github.io/
👉Repo github.com/DepthAnything/Depth-Anything-V2
👉Data huggingface.co/datasets/depth-anything/DA-2K
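The teacher-student recipe described above, in miniature: a "teacher" fitted on synthetic labeled data pseudo-labels a larger pool of unlabeled "real" data, and a "student" is trained on those pseudo-labels. The linear models and 1-D data below are hypothetical stand-ins for the actual depth networks and images:

```python
# Minimal sketch of the synthetic-labels + pseudo-labeled-real-images
# training scheme. The linear "models" and scalar "images" are toy
# stand-ins, not the paper's architecture.

def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for training a depth net)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def predict(model, xs):
    a, b = model
    return [a * x + b for x in xs]

# 1) Train the teacher on synthetic data with exact labels (depth = 2x + 1).
synth_x = [0.0, 1.0, 2.0, 3.0]
synth_y = [1.0, 3.0, 5.0, 7.0]
teacher = fit_linear(synth_x, synth_y)

# 2) The teacher pseudo-labels a larger pool of unlabeled "real" images.
real_x = [float(i) for i in range(10)]
pseudo_y = predict(teacher, real_x)

# 3) The student is trained on the pseudo-labeled real data.
student = fit_linear(real_x, pseudo_y)
print(student)  # student recovers the teacher's mapping (a≈2, b≈1)
```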
🪅 Anomaly Object Detection 🪅

👉The University of Edinburgh introduces a novel anomaly-detection problem: identifying 'odd-looking' objects relative to the other instances in a multi-view scene. Code announced 💙

👉Review https://t.ly/3dGHp
👉Paper arxiv.org/pdf/2406.20099
👉Repo https://lnkd.in/d9x6FpUq
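A toy sketch of the core idea: an object is "odd-looking" when its features lie far from the scene's other instances. The 2-D feature vectors below are hypothetical; the paper uses learned multi-view features:

```python
# Score each instance by its distance from the mean of all instances in
# the scene; the instance with the largest score is the "odd" one.
# Feature vectors are illustrative placeholders.

def anomaly_scores(features):
    """Euclidean distance of each instance from the scene mean."""
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    return [sum((f[d] - mean[d]) ** 2 for d in range(dim)) ** 0.5
            for f in features]

# Four visually similar instances and one outlier.
feats = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.1], [5.0, 5.0]]
scores = anomaly_scores(feats)
odd = scores.index(max(scores))
print(odd)  # -> 4
```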
โค10๐Ÿ”ฅ6๐Ÿ‘3๐Ÿ‘3โšก1
This media is not supported in your browser
VIEW IN TELEGRAM
🪩 MimicMotion: HQ Motion Generation 🪩

👉#Tencent open-sources MimicMotion, a novel controllable video-generation framework that can generate HQ videos of arbitrary length following specific motion guidance. Source code available 💙

👉Review https://t.ly/XFoin
👉Paper arxiv.org/pdf/2406.19680
👉Project https://lnkd.in/eW-CMg_C
👉Code https://lnkd.in/eZ6SC2bc
🪴 CAVIS: SOTA Context-Aware Segmentation 🪴

👉DGIST unveils Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. It is the new SOTA on several benchmarks. Source code announced 💙

👉Review https://t.ly/G5obN
👉Paper arxiv.org/pdf/2407.03010
👉Repo github.com/Seung-Hun-Lee/CAVIS
👉Project seung-hun-lee.github.io/projects/CAVIS
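A toy sketch of context-aware association: match instances across two frames with a similarity that mixes instance embeddings with embeddings of the surrounding context. The weights and embeddings below are hypothetical illustrations, not the paper's model:

```python
# Greedy cross-frame matching where two near-identical instances are
# disambiguated by the context around them.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def associate(prev, curr, w_ctx=0.5):
    """Each instance is (instance_embedding, context_embedding);
    similarity blends both with a hypothetical weight w_ctx."""
    matches, used = {}, set()
    for i, (pi, pc) in enumerate(prev):
        best, best_sim = None, -2.0
        for j, (ci, cc) in enumerate(curr):
            if j in used:
                continue
            sim = (1 - w_ctx) * cosine(pi, ci) + w_ctx * cosine(pc, cc)
            if sim > best_sim:
                best, best_sim = j, sim
        matches[i] = best
        used.add(best)
    return matches

# Identical instance embeddings; only the contexts tell them apart.
prev = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
curr = [([1.0, 0.1], [0.0, 1.0]), ([1.0, 0.1], [1.0, 0.0])]
print(associate(prev, curr))  # -> {0: 1, 1: 0}
```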
โค6๐Ÿ‘5๐Ÿ”ฅ4๐Ÿ‘2
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 Segment Any 4D Gaussians 🔥

👉SA4G is a novel framework for segmenting anything in the #4D Gaussian world: HQ segmentation within seconds in 4D Gaussians, plus removing, recoloring, composing, and rendering HQ masks of anything. Source code available within August 2024 💙

👉Review https://t.ly/uw3FS
👉Paper https://arxiv.org/pdf/2407.04504
👉Project https://jsxzs.github.io/sa4d/
👉Repo https://github.com/hustvl/SA4D
🤖 CODERS: Stereo Detection, 6D & Shape 🤖

👉CODERS: a one-stage approach to category-level object detection, pose estimation, and reconstruction from stereo images. Source code announced 💙

👉Review https://t.ly/Xpizz
👉Paper https://lnkd.in/dr5ZxC46
👉Project xingyoujun.github.io/coders/
👉Repo (TBA)
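The stereo geometry any stereo-based detector builds on: a pixel's depth follows from focal length, baseline, and disparity via Z = f * B / d. The numbers below are illustrative:

```python
# Classic pinhole stereo relation: depth Z = focal_px * baseline_m / disparity_px.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# 720 px focal length, 12 cm baseline, 18 px disparity -> 4.8 m away.
print(depth_from_disparity(720.0, 0.12, 18.0))
```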
๐Ÿธ Tracking Everything via Decomposition ๐Ÿธ

๐Ÿ‘‰Hefei unveils a novel decoupled representation that divides static scenes and dynamic objects in terms of motion and appearance. A more robust tracking through occlusions and deformations. Source Code announced under MIT License๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/OsFTO
๐Ÿ‘‰Paper https://arxiv.org/pdf/2407.06531
๐Ÿ‘‰Repo github.com/qianduoduolr/DecoMotion
๐ŸพTAPVid-3D: benchmark for TAP-3D๐Ÿพ

๐Ÿ‘‰#Deepmind (+College London & Oxford) introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D: 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor/outdoor environments. Data & Code available, Apache 2.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/SsptD
๐Ÿ‘‰Paper arxiv.org/pdf/2407.05921
๐Ÿ‘‰Project tapvid3d.github.io/
๐Ÿ‘‰Code github.com/google-deepmind/tapnet/tree/main/tapnet/tapvid3d
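A simplified sketch of how a 3D point-tracking benchmark can score predictions: the fraction of predicted 3D points that fall within a distance threshold of ground truth. The benchmark's actual metrics are more involved; this is illustrative only:

```python
# Fraction of predicted 3-D points within `thresh` of the ground truth.

def within_threshold(pred, gt, thresh):
    hits = 0
    for p, g in zip(pred, gt):
        dist = sum((a - b) ** 2 for a, b in zip(p, g)) ** 0.5
        if dist < thresh:
            hits += 1
    return hits / len(gt)

gt   = [(0.0, 0.0, 1.0), (0.5, 0.0, 2.0), (1.0, 0.0, 3.0)]
pred = [(0.0, 0.0, 1.05), (0.5, 0.3, 2.0), (1.0, 0.0, 4.0)]
print(within_threshold(pred, gt, 0.1))  # 1 of 3 points within 10 cm
```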
🔥 940+ FPS Multi-Person Pose Estimation 🔥

👉RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D body pose estimation, running at over 940 FPS on #GPU! Code & models available 💙

👉Review https://t.ly/XkBmg
👉Paper arxiv.org/pdf/2407.08634
👉Repo github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
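A figure like "940+ FPS" is just frames divided by wall-clock time; a minimal way to measure it for any callable model (the no-op "model" below is a placeholder, not the actual network):

```python
# Time n_frames calls to `model` and report throughput in frames per second.
import time

def measure_fps(model, n_frames=1000):
    start = time.perf_counter()
    for _ in range(n_frames):
        model()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

fps = measure_fps(lambda: None)  # placeholder model
print(f"{fps:.0f} FPS")
```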
โค8๐Ÿ”ฅ4๐Ÿ‘1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🥥 OmniNOCS: largest 3D NOCS 🥥

👉OmniNOCS, by #Google (+Georgia), is a unified NOCS (Normalized Object Coordinate Space) dataset that contains data across different domains with 90+ object classes: the largest NOCS dataset to date. Data & code available under Apache 2.0 💙

👉Review https://t.ly/xPgBn
👉Paper arxiv.org/pdf/2407.08711
👉Project https://omninocs.github.io/
👉Data github.com/google-deepmind/omninocs
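One simple convention for the NOCS mapping such a dataset is built on: scale and translate an object's points into the unit cube by their bounding box, so coordinates are comparable across instances. The dataset's actual canonicalization may differ in details such as centering:

```python
# Map 3-D object points into [0, 1]^3 using a single isotropic scale,
# so the object's aspect ratio is preserved.

def to_nocs(points):
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    scale = max(maxs[d] - mins[d] for d in range(3))  # largest extent
    return [tuple((p[d] - mins[d]) / scale for d in range(3)) for p in points]

pts = [(2.0, 0.0, 1.0), (4.0, 1.0, 1.0), (3.0, 0.5, 2.0)]
nocs = to_nocs(pts)
print(nocs)  # all coordinates fall inside the unit cube
```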
💌 KineTy: Typography Diffusion 💌

👉GIST introduces a novel method for realistic text-driven kinetic typography generation, guiding video diffusion models to achieve visually pleasing text appearances. Repo to be released under Attribution-NC 4.0 💙

👉Review https://t.ly/2FWo9
👉Paper arxiv.org/pdf/2407.10476
👉Project seonmip.github.io/kinety/
👉Repo github.com/SeonmiP/KineTy/tree/main
โค4๐Ÿ‘1๐Ÿ”ฅ1๐Ÿฅฐ1
📈 Gradient Boosting Reinforcement Learning 📈

👉#Nvidia unveils GBRL, a framework that brings the advantages of Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released 💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
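The residual-fitting loop at the heart of gradient boosting, the mechanism GBRL carries into RL function approximation: each new weak learner fits the residual of the current ensemble. The 1-D regression stumps and data below are purely illustrative:

```python
# Gradient boosting for least-squares regression with single-split stumps.

def fit_stump(xs, residuals):
    """Best single-threshold regression stump (two constant leaves)."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x, t=t, lv=lv, rv=rv: lv if x <= t else rv

def boost(xs, ys, rounds=30, lr=0.3):
    model, preds = [], [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # fit what's left
        stump = fit_stump(xs, residuals)
        model.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: sum(lr * s(x) for s in model)

# Fit a step function: value 1 below 0.5, value 5 above.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
f = boost(xs, ys)
print(round(f(0.2), 2), round(f(0.8), 2))  # close to 1.0 and 5.0
```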
โค7๐Ÿคฏ4๐Ÿ‘3๐Ÿ”ฅ1๐Ÿฅฐ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧿 Shape of Motion for 4D 🧿

👉 Google (+Berkeley) unveils a novel method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. Impressive tracking capabilities. Source code released 💙

👉Review https://t.ly/d9RsA
👉Project https://shape-of-motion.github.io/
👉Paper arxiv.org/pdf/2407.13764
👉Code github.com/vye16/shape-of-motion/
โค5๐Ÿคฏ4๐Ÿ”ฅ2๐Ÿ‘1๐Ÿ˜ฑ1
This media is not supported in your browser
VIEW IN TELEGRAM
🎭 TRG: new SOTA 6DoF Head 🎭

👉ECE (Korea) unveils TRG, a novel landmark-based method for 6DoF head-pose estimation that stands out for its explicit bidirectional interaction structure. Experiments on ARKitFace & BIWI confirm it as the new SOTA. Source code & models to be released 💙

👉Review https://t.ly/lOIRA
👉Paper https://lnkd.in/dCWEwNyF
👉Code https://lnkd.in/dzRrwKBD
๐Ÿ†Who's the REAL SOTA tracker in the world?๐Ÿ†

๐Ÿ‘‰BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks (LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M). Source Code available๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/WB9AR
๐Ÿ‘‰Paper https://arxiv.org/pdf/2407.15707
๐Ÿ‘‰Code github.com/BasitAlawode/Best_of_N_Trackers
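The Best-of-N idea in miniature: a selector routes each sequence to the tracker expected to perform best on it, so the combined score can exceed any single tracker's average. The scores and the oracle selector below are hypothetical stand-ins:

```python
# Average score of a meta-tracker that picks one tracker per sequence.

def best_of_n(per_tracker_scores, selector):
    """per_tracker_scores: {tracker: {sequence: score}};
    selector: sequence -> chosen tracker name."""
    seqs = next(iter(per_tracker_scores.values())).keys()
    total = sum(per_tracker_scores[selector(seq)][seq] for seq in seqs)
    return total / len(seqs)

scores = {
    "A": {"seq1": 0.9, "seq2": 0.4},
    "B": {"seq1": 0.5, "seq2": 0.8},
}
# Oracle selector: always pick the tracker that scores highest per sequence.
oracle = lambda seq: max(scores, key=lambda t: scores[t][seq])
print(best_of_n(scores, oracle))  # 0.85, beating A (0.65) and B (0.65)
```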
๐Ÿข TAPTRv2: new SOTA for TAP ๐Ÿข

๐Ÿ‘‰TAPTRv2: Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DETR and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. The Source Code of V1 is available, V2 coming๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/H84ae
๐Ÿ‘‰Paper v1 https://lnkd.in/d4vD_6xx
๐Ÿ‘‰Paper v2 https://lnkd.in/dE_TUzar
๐Ÿ‘‰Project https://taptr.github.io/
๐Ÿ‘‰Code https://lnkd.in/dgfs9Qdy
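The point-as-query idea can be sketched as an iterative refinement loop: each tracked point carries a position that successive decoder layers nudge toward its match. The fixed "move halfway" update below is a hypothetical stand-in for the offsets a real decoder layer predicts from attention:

```python
# A point query refined over several "decoder layers" toward a target.

def refine_point(query_xy, target_xy, n_layers=6):
    x, y = query_xy
    tx, ty = target_xy
    trajectory = [(x, y)]
    for _ in range(n_layers):
        # toy layer: predicted offset moves the query halfway to the target
        x += 0.5 * (tx - x)
        y += 0.5 * (ty - y)
        trajectory.append((x, y))
    return trajectory

traj = refine_point((0.0, 0.0), (8.0, 4.0))
print(traj[-1])  # converges near (8.0, 4.0)
```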
๐Ÿ‘6๐Ÿ”ฅ3๐Ÿคฏ3โค2๐Ÿ˜ฑ1
🧱 EAFormer: Scene Text Segmentation 🧱

👉A novel Edge-Aware Transformer that segments text more accurately, especially at the edges, with a FULL re-annotation of COCO_TS and MLT_S! Code coming; data available on 🤗

👉Review https://t.ly/0G2uX
👉Paper arxiv.org/pdf/2407.17020
👉Project hyangyu.github.io/EAFormer/
👉Data huggingface.co/datasets/HaiyangYu/TextSegmentation/tree/main
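Edge awareness starts from an edge map; below is a minimal gradient-magnitude edge detector over a toy grayscale grid, a stand-in for the model's learned edge branch, not the paper's method:

```python
# Mark pixels whose local intensity gradient magnitude exceeds a threshold.

def edge_map(img, thresh=0.5):
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]  # vertical gradient
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                edges[y][x] = 1
    return edges

# A vertical step edge between dark (0) and bright (1) columns.
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(edge_map(img))  # interior pixels along the step are marked
```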
โค14๐Ÿ”ฅ6๐Ÿ‘1๐Ÿฅฐ1
This media is not supported in your browser
VIEW IN TELEGRAM
👽 Keypoint Promptable Re-ID 👽

👉KPR is a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Code, dataset, and annotations coming soon 💙

👉Review https://t.ly/vCXV_
👉Paper https://arxiv.org/pdf/2407.18112
👉Repo github.com/VlSomers/keypoint_promptable_reidentification
๐ŸŽ A guide for modern CV ๐ŸŽ

๐Ÿ‘‰In the last 18 months I received 1,100+ applications for research roles. The majority part of the applicants doesn't deeply know a few milestones in CV. Here a short collection of mostly-free resources to spend a bit of good time in the summer.

๐๐จ๐จ๐ค๐ฌ:
โœ…DL with Python https://t.ly/VjaVx
โœ…Python OOP https://t.ly/pTQRm

V๐ข๐๐ž๐จ ๐‚๐จ๐ฎ๐ซ๐ฌ๐ž๐ฌ:
โœ…Berkeley | Modern CV (2023) https://t.ly/AU7S3

๐‹๐ข๐›๐ซ๐š๐ซ๐ข๐ž๐ฌ:
โœ…PyTorch https://lnkd.in/dTvJbjAx
โœ…PyTorchLighting https://lnkd.in/dAruPA6T
โœ…Albumentations https://albumentations.ai/

๐๐š๐ฉ๐ž๐ซ๐ฌ:
โœ…EfficientNet https://lnkd.in/dTsT44ae
โœ…ViT https://lnkd.in/dB5yKdaW
โœ…UNet https://lnkd.in/dnpKVa6T
โœ…DeepLabV3+ https://lnkd.in/dVvqkmPk
โœ…YOLOv1: https://lnkd.in/dQ9rs53B
โœ…YOLOv2: arxiv.org/abs/1612.08242
โœ…YOLOX: https://lnkd.in/d9ZtsF7g
โœ…SAM: https://arxiv.org/abs/2304.02643

๐Ÿ‘‰More papers and the full list: https://t.ly/WAwAk
โค34๐Ÿ‘19