AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh updates every day about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
RoboTAP: Dense Tracking for Few-Shot Imitation

👉RoboTAP: a novel dense-tracking representation for robotic arms

😎Review https://t.ly/MCO_V
😎Paper arxiv.org/pdf/2308.15975.pdf
😎Project https://robotap.github.io/
😎Code github.com/deepmind/tapnet

⛺FACET: Fairness in Computer Vision⛺

👉#META AI releases FACET, a large, publicly available dataset for classification, detection & segmentation, built to expose potential performance disparities across sensitive demographic attributes

😎Review https://t.ly/mKn-t
😎Paper arxiv.org/pdf/2309.00035.pdf
😎Dataset https://facet.metademolab.com/

♊️ Doppelgangers in Structures ♊️

👉A novel learning-based approach for visual disambiguation: distinguishing illusory matches to produce correct, disambiguated #3D reconstructions

😎Review https://t.ly/9yLot
😎Paper arxiv.org/pdf/2309.02420.pdf
😎Code github.com/RuojinCai/Doppelgangers
😎Project doppelgangers-3d.github.io/

๐Ÿƒ Tracking Anything with Decoupled VOS ๐Ÿƒ

๐Ÿ‘‰A novel VOS approach that extends SAM for open-world video segmentation with no user input required

๐Ÿ˜ŽReview https://t.ly/xeobR
๐Ÿ˜ŽPaper arxiv.org/pdf/2309.03903.pdf
๐Ÿ˜ŽProject hkchengrex.com/Tracking-Anything-with-DEVA
๐Ÿ˜ŽCode github.com/hkchengrex/Tracking-Anything-with-DEVA
๐Ÿ˜ŽColab https://colab.research.google.com/drive/1OsyNVoV_7ETD1zIE8UWxL3NXxu12m_YZ
🪷 Diffusive Consistent Video Editing 🪷

👉Weizmann Institute of Science unveils TokenFlow, a novel framework that harnesses a text-to-image diffusion model for text-driven video editing

😎Review https://t.ly/ru8km
😎Paper arxiv.org/pdf/2307.10373.pdf
😎Project diffusion-tokenflow.github.io
😎Code github.com/omerbt/TokenFlow

🔥🔥 #META's DINOv2 is now commercial! 🔥🔥

👉Universal features for image classification, instance retrieval, video understanding, depth estimation & semantic segmentation. Now suitable for commercial use.

😎Review https://t.ly/LNrGy
😎Paper arxiv.org/pdf/2304.07193.pdf
😎Code github.com/facebookresearch/dinov2
😎Demo dinov2.metademolab.com/

🧄FreeMan: towards #3D Humans 🧄

👉FreeMan: the first large-scale, real-world, multi-view dataset for #3D human pose estimation. 11M frames!

😎Review https://t.ly/ICxpA
😎Paper arxiv.org/pdf/2309.05073.pdf
😎Project wangjiongw.github.io/freeman

🦊 MagiCapture: HD Multi-Concept Portrait 🦊

👉KAIST unveils MagiCapture: integrating subject and style concepts to generate high-resolution portrait images using just a few subject and style references

😎Review https://t.ly/c9rOo
😎Paper https://arxiv.org/pdf/2309.06895.pdf

⚽ Dynamic NeRFs for Soccer ⚽

👉SoccerNeRF: a first attempt at applying "cheap" NeRFs to football, reconstructing soccer replays in space and time.

😎Review https://t.ly/Ywcvk
😎Paper arxiv.org/pdf/2309.06802.pdf
😎Project https://soccernerfs.isach.be/
😎Code github.com/iSach/SoccerNeRFs

☢️ GlueStick: Graph Neural Matching ☢️

👉GlueStick is a joint deep matcher for points and lines that leverages the connectivity information between nodes to better glue them together

😎Review https://t.ly/Atxqo
😎Paper arxiv.org/pdf/2304.02008.pdf
😎Code https://github.com/cvg/GlueStick

🫀CPR-Coach: Neural Cardiopulmonary Resuscitation🫀

👉CPR-Coach: fine-grained action recognition in cardiopulmonary resuscitation

😎Review https://t.ly/Qbg4K
😎Paper arxiv.org/pdf/2309.11718.pdf
😎Code github.com/Shunli-Wang/CPR-Coach
😎Project shunli-wang.github.io/CPR-Coach

🧪 NeuralLabeling with NeRF 🧪

👉Annotating a scene by generating segmentation masks, affordance maps, 2D bounding boxes, 3D bounding boxes, 6DOF poses, depth maps & meshes.

😎Review https://t.ly/1GPsj
😎Paper arxiv.org/pdf/2309.11966.pdf
😎Code github.com/FlorisE/neural-labeling
😎Project florise.github.io/neural_labeling_web

DE-ViT: detecting everything via DINOv2

👉DE-ViT: an open-set object detector based on a DINOv2 backbone. It's the new SOTA on the COCO & LVIS datasets

😎Review https://t.ly/_DAmt
😎Paper arxiv.org/pdf/2309.12969.pdf
😎Code https://github.com/mlzxy/devit

🛵CoTracker: fast transformer-tracker🛵

👉META's CoTracker is a fast transformer-based model that can track any point in a video

😎Review https://t.ly/M36A_
😎Paper arxiv.org/pdf/2307.07635.pdf
😎Project https://co-tracker.github.io/
😎Code github.com/facebookresearch/co-tracker

🌬️ Neural Blowing in Still Photos 🌬️

👉A novel approach to animating human hair (and clothes) in still portraits

😎Review https://t.ly/HKG0t
😎Paper arxiv.org/pdf/2309.14207.pdf
😎Project nevergiveu.github.io/AutomaticHairBlowing

🌮 OW Indoor Segmentation 🌮

👉3D-OWIS is a novel open-world 3D indoor instance segmentation method (with an auto-labeling scheme) to separate known/unknown category labels

😎Review https://t.ly/-7ALf
😎Paper arxiv.org/pdf/2309.14338.pdf
😎Code github.com/aminebdj/3D-OWIS

🧱 Generating Scenes from Touch 🧱

👉#AI for synthesizing images from tactile signals (and vice versa), applied to a number of visuo-tactile synthesis tasks

😎Review https://t.ly/Gxr0L
😎Paper https://arxiv.org/pdf/2309.15117.pdf
😎Project https://fredfyyang.github.io/vision-from-touch
😎Code https://github.com/fredfyyang/vision-from-touch

☕Decaf: 3D Face-Hand Interactions☕

👉The first learning-based MoCap to track human hands interacting with human faces in #3D from single monocular RGB videos

😎Review https://t.ly/070Tj
😎Paper arxiv.org/pdf/2309.16670.pdf
😎Project vcai.mpi-inf.mpg.de/projects/Decaf

🌱 Making LLaMA See and Draw 🌱

👉Tencent #AI planted a SEED of Vision in a Large Language Model. Making LLaMA see 'n' draw stuff.

😎Review https://t.ly/QiCAv
😎Paper arxiv.org/pdf/2310.01218.pdf
😎Code github.com/AILab-CVC/SEED

🔥Visual-Math Q&A: MathVista is out! 🔥

👉MathVista is a benchmark designed to combine challenges from diverse mathematical and visual tasks

😎Review https://t.ly/yfqHZ
😎Paper https://arxiv.org/pdf/2310.02255.pdf
😎Project https://mathvista.github.io/
😎Code github.com/lupantech/MathVista