AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
πŸ›οΈ PIXART-Ξ£: 4K Generation πŸ›οΈ

πŸ‘‰PixArt-Ξ£ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced πŸ’™

πŸ‘‰Review https://t.ly/Cm2Qh
πŸ‘‰Paper arxiv.org/pdf/2403.04692.pdf
πŸ‘‰Project pixart-alpha.github.io/PixArt-sigma-project/
πŸ‘‰Repo (empty) github.com/PixArt-alpha/PixArt-sigma
πŸ€—-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
🪖 RT Humanoid from Head-Mounted Sensors 🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from info obtained from AR/VR headsets

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
🏷️ Face Foundation Model 🏷️

👉Arc2Face, the first foundation model for human faces. Source Code released 💙

👉Review https://t.ly/MfAFI
👉Paper https://lnkd.in/dViE_tCd
👉Project https://lnkd.in/d4MHdEZK
👉Code https://lnkd.in/dv9ZtDfA
❀12πŸ‘3πŸ‘1🀩1
🪼 FaceXFormer: Unified Face-Transformer 🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head pose estimation, attribute recognition, and estimation of age, gender, race, and landmark visibility.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
πŸ‘11❀4πŸ₯°2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🦕 DINO-based Video Tracking 🦕

👉The Weizmann Institute announced the new SOTA in point-tracking via pre-trained DINO features. Source code announced (not yet released) 💙

👉Review https://t.ly/_GIMT
👉Paper https://lnkd.in/dsGVDcar
👉Project dino-tracker.github.io/
👉Code https://github.com/AssafSinger94/dino-tracker
🦖 T-Rex 2: a new SOTA is out! 🦖

👉A novel (VERY STRONG) open-set object detector model. Strong zero-shot capabilities, suitable for various scenarios with only one set of weights. Demo and Source Code released 💙

👉Review https://t.ly/fYw8D
👉Paper https://lnkd.in/dpmRh2zh
👉Project https://lnkd.in/dnR_jPcR
👉Code https://lnkd.in/dnZnGRUn
👉Demo https://lnkd.in/drDUEDYh
💄 TinyBeauty: 460 FPS Make-up 💄

👉TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!

👉Review https://t.ly/LG5ok
👉Paper https://arxiv.org/pdf/2403.15033.pdf
👉Project https://tinybeauty.github.io/TinyBeauty/
πŸ‘7🀯4😍2⚑1πŸ”₯1πŸ’©1
This media is not supported in your browser
VIEW IN TELEGRAM
☔ AiOS: All-in-One-Stage Humans ☔

👉All-in-one-stage framework for SOTA expressive pose and shape recovery of multiple humans, without an additional human-detection step.

👉Review https://t.ly/ekNd4
👉Paper https://arxiv.org/pdf/2403.17934.pdf
👉Project https://ttxskk.github.io/AiOS/
👉Code/Demo (announced)
πŸ€ MAVOS Object Segmentation πŸ€

πŸ‘‰MAVOS is a transformer-based VOS w/ a novel, optimized and dynamic long-term modulated cross-attention memory. Code & Models announced (BSD 3-Clause)πŸ’™

πŸ‘‰Review https://t.ly/SKaRG
πŸ‘‰Paper https://lnkd.in/dQyifKa3
πŸ‘‰Project github.com/Amshaker/MAVOS
💦 ObjectDrop: automagical object removal 💦

👉#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion. Focus on shadows and reflections, impressive!

👉Review https://t.ly/ZJ6NN
👉Paper https://arxiv.org/pdf/2403.18818.pdf
👉Project https://objectdrop.github.io/
πŸ‘14🀯8❀4πŸ”₯3🍾2
This media is not supported in your browser
VIEW IN TELEGRAM
🪼 Universal Mono Metric Depth 🪼

👉ETH unveils UniDepth: metric 3D scenes from single images alone, across domains. A novel, universal and flexible solution for monocular metric depth estimation (MMDE). Source code released 💙

👉Review https://t.ly/5C8eq
👉Paper arxiv.org/pdf/2403.18913.pdf
👉Code github.com/lpiccinelli-eth/unidepth
🔘 RELI11D: Multimodal Humans 🔘

👉RELI11D is a high-quality multimodal human motion dataset captured with LiDAR, an IMU system, an RGB camera, and an event camera. Dataset & Source Code to be released soon 💙

👉Review https://t.ly/5EG6X
👉Paper https://lnkd.in/ep6Utcik
👉Project https://lnkd.in/eDhNHYBb
🔥 ECoDepth: SOTA Diffusive Mono-Depth 🔥

👉New single-image depth estimation (SIDE) model using a diffusion backbone conditioned on ViT embeddings. It's the new SOTA in SIDE. Source Code released 💙

👉Review https://t.ly/s2pbB
👉Paper https://lnkd.in/eYt5yr_q
👉Code https://lnkd.in/eEcyPQcd
πŸ•·οΈ Gen-NeRF2NeRF Translation πŸ•·οΈ

πŸ‘‰GenN2N: unified NeRF-to-NeRF translation for editing tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc.

πŸ‘‰Review https://t.ly/VMWAH
πŸ‘‰Paper arxiv.org/pdf/2404.02788.pdf
πŸ‘‰Project xiangyueliu.github.io/GenN2N/
πŸ‘‰Code github.com/Lxiangyue/GenN2N
👆 iSeg: Interactive 3D Segmentation 👆

👉iSeg: interactive segmentation technique for 3D shapes operating entirely in 3D. It accepts both positive/negative clicks directly on the shape's surface, indicating inclusion & exclusion of regions.

👉Review https://t.ly/tyFnD
👉Paper https://lnkd.in/dydAz8zp
👉Project https://lnkd.in/de-h6SRi
👉Code (coming)
👗 Neural Bodies with Clothes 👗

👉Neural-ABC is a novel parametric model based on neural implicit functions that can represent clothed human bodies with disentangled latent spaces for ID, clothing, shape, and pose.

👉Review https://t.ly/Un1wc
👉Project https://lnkd.in/dhDG6FF5
👉Paper https://lnkd.in/dhcfK7jZ
👉Code https://lnkd.in/dQvXWysP
🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source Code released, Dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q
🧞 XComposer2: 4K Vision-Language 🧞

👉InternLM-XComposer2-4KHD brings LVLM resolution capabilities up to 4K HD (3840×1600) and beyond. Authors: Shanghai AI Lab, CUHK, SenseTime & Tsinghua. Source Code & Models released 💙

👉Review https://t.ly/GCHsz
👉Paper arxiv.org/pdf/2404.06512.pdf
👉Code github.com/InternLM/InternLM-XComposer