AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
136 photos
248 videos
14 files
1.31K links
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🫅 FlowMDM: Human Composition 🫅

👉FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
โค9๐Ÿ”ฅ6๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🎷 EMO: Talking/Singing Gen-AI 🎷

👉EMO: audio-driven portrait-video generation. From a single input frame, it produces vocal avatar videos with expressive facial expressions and varied head poses; the output video lasts as long as the input audio.

👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
โค18๐Ÿ”ฅ7๐Ÿ‘4๐Ÿคฏ3๐Ÿฅฐ1
This media is not supported in your browser
VIEW IN TELEGRAM
💌 Multi-LoRA Composition 💌

👉Two novel training-free image-composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements into an image through multi-LoRA composition. Source code released 💙 (quick usage sketch below the links)

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
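
For readers who want to try combining LoRAs right away, below is a minimal sketch using the stock multi-adapter weighting built into diffusers (load_lora_weights + set_adapters). This is the library's plain weighted combination, not the paper's LoRA Switch or LoRA Composite decoding schemes, and the base checkpoint, LoRA paths, and adapter weights are placeholder assumptions.

```python
# Minimal multi-LoRA sketch with diffusers' built-in adapter weighting.
# NOTE: checkpoint and LoRA repo names below are placeholders, not the paper's assets.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two independently trained LoRAs (e.g., a character and a style) as named adapters.
pipe.load_lora_weights("path/or/repo/character_lora", adapter_name="character")  # placeholder
pipe.load_lora_weights("path/or/repo/style_lora", adapter_name="style")          # placeholder

# Activate both adapters with per-adapter weights (simple weighted merge,
# unlike the paper's decoding-time LoRA Switch / LoRA Composite schemes).
pipe.set_adapters(["character", "style"], adapter_weights=[0.8, 0.6])

image = pipe("a portrait of the character in the given style",
             num_inference_steps=30).images[0]
image.save("multi_lora.png")
```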
๐Ÿ‘11โค6๐Ÿ”ฅ2๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
💥 MM-AU: Video Accident 💥

👉MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset https://www.lotvsmmau.net/MMAU/demo
๐Ÿ‘11โค2๐Ÿ”ฅ2๐Ÿคฏ2
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human-preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for the image and language modalities, improving text understanding and spelling. Weights & source code to be released 💙 (architecture sketch below the links)

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
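
To make the "separate weights per modality, joint attention" idea concrete, here is a minimal PyTorch sketch of one MMDiT-style block. Dimensions, naming, and the omission of timestep/AdaLN conditioning are simplifying assumptions of this sketch, not Stability AI's implementation.

```python
# Sketch of an MMDiT-style block: separate per-modality projections and MLPs,
# one joint attention over the concatenated image + text tokens. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMDiTBlockSketch(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Separate weights for each modality.
        self.img_norm, self.txt_norm = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.img_qkv, self.txt_qkv = nn.Linear(dim, 3 * dim), nn.Linear(dim, 3 * dim)
        self.img_out, self.txt_out = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.img_mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.txt_mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.heads = heads

    def forward(self, img: torch.Tensor, txt: torch.Tensor):
        B, Ni, D = img.shape
        Nt = txt.shape[1]
        # Project each modality with its own weights, then attend jointly.
        qi, ki, vi = self.img_qkv(self.img_norm(img)).chunk(3, dim=-1)
        qt, kt, vt = self.txt_qkv(self.txt_norm(txt)).chunk(3, dim=-1)
        q = torch.cat([qi, qt], dim=1)
        k = torch.cat([ki, kt], dim=1)
        v = torch.cat([vi, vt], dim=1)

        def split(x):  # (B, N, D) -> (B, heads, N, head_dim)
            return x.view(B, -1, self.heads, D // self.heads).transpose(1, 2)

        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        attn = attn.transpose(1, 2).reshape(B, Ni + Nt, D)
        img = img + self.img_out(attn[:, :Ni])
        txt = txt + self.txt_out(attn[:, Ni:])
        # Per-modality MLPs close the block.
        return img + self.img_mlp(img), txt + self.txt_mlp(txt)

img_out, txt_out = MMDiTBlockSketch()(torch.randn(1, 64, 512), torch.randn(1, 16, 512))
print(img_out.shape, txt_out.shape)  # torch.Size([1, 64, 512]) torch.Size([1, 16, 512])
```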
🧵 E-LoFTR: new Feature-Matching SOTA 🧵

👉A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced. (baseline matching sketch below the links)

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
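
For reference, this is what semi-dense matching looks like with the original LoFTR as shipped in kornia; it is a baseline illustration only (E-LoFTR's own code was just announced at posting time), and the image paths are placeholders.

```python
# Semi-dense matching with the original LoFTR via kornia (baseline, not E-LoFTR).
import torch
from kornia.feature import LoFTR
from kornia.io import ImageLoadType, load_image

# Load two grayscale images as (1, 1, H, W) tensors; paths are placeholders.
img0 = load_image("img0.jpg", ImageLoadType.GRAY32)[None]
img1 = load_image("img1.jpg", ImageLoadType.GRAY32)[None]

matcher = LoFTR(pretrained="outdoor").eval()
with torch.inference_mode():
    out = matcher({"image0": img0, "image1": img1})

kpts0, kpts1, conf = out["keypoints0"], out["keypoints1"], out["confidence"]
print(f"{len(kpts0)} semi-dense matches, mean confidence {conf.mean():.3f}")
```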
๐ŸฆStableDrag: Point-based Editing๐Ÿฆ

๐Ÿ‘‰#Tencent unveils StableDrag, a novel point-based image editing framework via discriminative point tracking method + confidence-based latent enhancement strategy for motion supervision. Source Code announced but still no repo.

๐Ÿ‘‰Review https://t.ly/eUI05
๐Ÿ‘‰Paper https://lnkd.in/dz8-ymck
๐Ÿ‘‰Project stabledrag.github.io/
โค2๐Ÿ‘1๐Ÿ”ฅ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ›๏ธ PIXART-ฮฃ: 4K Generation ๐Ÿ›๏ธ

๐Ÿ‘‰PixArt-ฮฃ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Cm2Qh
๐Ÿ‘‰Paper arxiv.org/pdf/2403.04692.pdf
๐Ÿ‘‰Project pixart-alpha.github.io/PixArt-sigma-project/
๐Ÿ‘‰Repo (empty) github.com/PixArt-alpha/PixArt-sigma
๐Ÿค—-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
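
Since the PixArt-Σ weights were only announced at posting time, here is a minimal sketch using the earlier PixArt-α checkpoint already available in diffusers; the model ID and prompt are illustrative, and the output is 1024 px rather than 4K.

```python
# Minimal text-to-image sketch with the earlier PixArt-α checkpoint in diffusers
# (PixArt-Σ weights were only announced when this was posted).
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "an isometric cathedral made of glass, golden hour",  # illustrative prompt
    num_inference_steps=20,
).images[0]
image.save("pixart_alpha.png")
```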
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes: GPT-4 can play the game to a passable degree; it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released. (agent-loop sketch below the links)

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
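
The general recipe is a perceive-reason-act loop: grab a frame, ask a vision-capable GPT-4 for one action, inject the keypress, repeat. Below is a minimal sketch under assumptions of my own; grab_frame and send_key are hypothetical stand-ins for the game I/O layer, and this is not the authors' released harness.

```python
# Hedged sketch of a perceive-reason-act loop for playing DOOM with a vision LLM.
# `grab_frame` and `send_key` are hypothetical stand-ins for the game I/O layer.
import base64
from openai import OpenAI

client = OpenAI()
ACTIONS = ["FORWARD", "LEFT", "RIGHT", "FIRE", "USE"]  # tiny illustrative action set

def grab_frame() -> bytes:
    """Hypothetical: return the current game frame as PNG bytes."""
    raise NotImplementedError

def send_key(action: str) -> None:
    """Hypothetical: translate an action name into a keypress for the engine."""
    raise NotImplementedError

def step() -> str:
    frame_b64 = base64.b64encode(grab_frame()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable GPT-4 model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"You are playing DOOM. Reply with exactly one of {ACTIONS}."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{frame_b64}"}},
            ],
        }],
        max_tokens=5,
    )
    action = resp.choices[0].message.content.strip()
    send_key(action if action in ACTIONS else "FORWARD")
    return action
```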
🪖 Real-Time Humanoid from Head-Mounted Sensors 🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from information obtained from AR/VR headsets.

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
โค12โšก1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿท๏ธ Face Foundation Model ๐Ÿท๏ธ

๐Ÿ‘‰Arc2Face, the first foundation model for human faces. Source Code released ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/MfAFI
๐Ÿ‘‰Paper https://lnkd.in/dViE_tCd
๐Ÿ‘‰Project https://lnkd.in/d4MHdEZK
๐Ÿ‘‰Code https://lnkd.in/dv9ZtDfA
โค12๐Ÿ‘3๐Ÿ‘1๐Ÿคฉ1
🪼 FaceXFormer: Unified Face Transformer 🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head-pose estimation, attribute recognition, and age/gender/race estimation.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
๐Ÿ‘11โค4๐Ÿฅฐ2๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦕 DINO-based Video Tracking 🦕

👉The Weizmann Institute announced the new SOTA in point tracking via pre-trained DINO features. Source code announced (not yet released) 💙 (feature-matching sketch below the links)

👉Review https://t.ly/_GIMT
👉Paper https://lnkd.in/dsGVDcar
👉Project dino-tracker.github.io/
👉Code https://github.com/AssafSinger94/dino-tracker
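
As a rough picture of what "tracking with pre-trained DINO features" means, the sketch below matches a query point between two frames by cosine similarity of DINOv2 patch features. It is a naive nearest-neighbour baseline, not the paper's DINO-Tracker; the frame paths and query coordinates are placeholders.

```python
# Naive point matching between two frames using DINOv2 patch features
# (nearest-neighbour baseline for illustration, not the paper's tracker).
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
patch = 14
prep = transforms.Compose([
    transforms.Resize((518, 518)),  # 37x37 patches at stride 14
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def patch_feats(path: str) -> torch.Tensor:
    x = prep(Image.open(path).convert("RGB"))[None]
    with torch.inference_mode():
        feats = model.forward_features(x)["x_norm_patchtokens"][0]  # (N, C)
    return F.normalize(feats, dim=-1)

f0, f1 = patch_feats("frame0.jpg"), patch_feats("frame1.jpg")  # placeholder frames
grid = 518 // patch  # 37

# Query point in frame 0 (pixel coords) -> patch index -> best match in frame 1.
qx, qy = 260, 180
q_idx = (qy // patch) * grid + (qx // patch)
sim = f0[q_idx] @ f1.T                      # cosine similarity to every patch in frame 1
best = int(sim.argmax())
print("matched patch centre:", ((best % grid) * patch + patch // 2,
                                (best // grid) * patch + patch // 2))
```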
🦖 T-Rex 2: a new SOTA is out! 🦖

👉A novel (VERY STRONG) open-set object-detection model with strong zero-shot capabilities, suitable for various scenarios with a single set of weights. Demo and source code released 💙

👉Review https://t.ly/fYw8D
👉Paper https://lnkd.in/dpmRh2zh
👉Project https://lnkd.in/dnR_jPcR
👉Code https://lnkd.in/dnZnGRUn
👉Demo https://lnkd.in/drDUEDYh
💄 TinyBeauty: 460 FPS Make-up 💄

👉TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!

👉Review https://t.ly/LG5ok
👉Paper https://arxiv.org/pdf/2403.15033.pdf
👉Project https://tinybeauty.github.io/TinyBeauty/
๐Ÿ‘7๐Ÿคฏ4๐Ÿ˜2โšก1๐Ÿ”ฅ1๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
☔ AiOS: All-in-One-Stage Humans ☔

👉An all-in-one-stage framework for SOTA expressive pose and shape recovery of multiple people, with no additional human-detection step.

👉Review https://t.ly/ekNd4
👉Paper https://arxiv.org/pdf/2403.17934.pdf
👉Project https://ttxskk.github.io/AiOS/
👉Code/Demo (announced)
โค6๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ€ MAVOS Object Segmentation ๐Ÿ€

๐Ÿ‘‰MAVOS is a transformer-based VOS w/ a novel, optimized and dynamic long-term modulated cross-attention memory. Code & Models announced (BSD 3-Clause)๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/SKaRG
๐Ÿ‘‰Paper https://lnkd.in/dQyifKa3
๐Ÿ‘‰Project github.com/Amshaker/MAVOS
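
As a generic illustration of a cross-attention memory readout (not MAVOS's actual modulated long-term memory), here is a tiny PyTorch sketch where current-frame tokens attend over a bank of stored memory tokens; all sizes are arbitrary assumptions.

```python
# Generic cross-attention memory readout: current-frame tokens (queries) attend
# over a bank of stored memory tokens. Illustrative only, not MAVOS's design.
import torch
import torch.nn as nn

class MemoryReadout(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, memory_slots: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_m = nn.LayerNorm(dim)
        # Static memory bank; a real VOS memory would be updated from past frames.
        self.memory = nn.Parameter(torch.randn(memory_slots, dim) * 0.02)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (B, N, dim) features of the current frame.
        B = frame_tokens.shape[0]
        mem = self.norm_m(self.memory).expand(B, -1, -1)
        read, _ = self.attn(self.norm_q(frame_tokens), mem, mem)
        return frame_tokens + read  # residual fusion of memory context

tokens = torch.randn(2, 1024, 256)    # e.g. a 32x32 feature map, flattened
print(MemoryReadout()(tokens).shape)  # torch.Size([2, 1024, 256])
```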
💦 ObjectDrop: automagical object removal 💦

👉#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion, with a focus on shadows and reflections. Impressive!

👉Review https://t.ly/ZJ6NN
👉Paper https://arxiv.org/pdf/2403.18818.pdf
👉Project https://objectdrop.github.io/
๐Ÿ‘14๐Ÿคฏ8โค4๐Ÿ”ฅ3๐Ÿพ2
This media is not supported in your browser
VIEW IN TELEGRAM
🪼 Universal Mono Metric Depth 🪼

👉ETH unveils UniDepth: metric 3D scenes from single images alone, across domains. A novel, universal, and flexible MMDE (monocular metric depth estimation) solution. Source code released 💙

👉Review https://t.ly/5C8eq
👉Paper arxiv.org/pdf/2403.18913.pdf
👉Code github.com/lpiccinelli-eth/unidepth
🔘 RELI11D: Multimodal Humans 🔘

👉RELI11D: a high-quality multimodal human-motion dataset combining LiDAR, an IMU system, an RGB camera, and an event camera. Dataset & source code to be released soon 💙

👉Review https://t.ly/5EG6X
👉Paper https://lnkd.in/ep6Utcik
👉Project https://lnkd.in/eDhNHYBb
โค3๐Ÿ”ฅ2