AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
136 photos
248 videos
14 files
1.31K links
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🫅 FlowMDM: Human Composition 🫅

👉FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
โค9๐Ÿ”ฅ6๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🎷 EMO: Talking/Singing Gen-AI 🎷

👉EMO: audio-driven portrait-video generation. From a single input frame, it produces vocal avatar videos with expressive facial expressions and varied head poses; the output video lasts as long as the input audio.

👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
โค18๐Ÿ”ฅ7๐Ÿ‘4๐Ÿคฏ3๐Ÿฅฐ1
This media is not supported in your browser
VIEW IN TELEGRAM
💌 Multi-LoRA Composition 💌

👉Two novel training-free image-composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements into an image through multi-LoRA composition. Source code released 💙 (quick usage sketch below the links)

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
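
For readers who want to try combining LoRAs right away, below is a minimal sketch using the stock multi-adapter weighting built into diffusers (load_lora_weights + set_adapters). This is the library's plain weighted combination, not the paper's LoRA Switch or LoRA Composite decoding schemes, and the base checkpoint, LoRA paths, and adapter weights are placeholder assumptions.

```python
# Minimal multi-LoRA sketch with diffusers' built-in adapter weighting.
# NOTE: checkpoint and LoRA repo names below are placeholders, not the paper's assets.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two independently trained LoRAs (e.g., a character and a style) as named adapters.
pipe.load_lora_weights("path/or/repo/character_lora", adapter_name="character")  # placeholder
pipe.load_lora_weights("path/or/repo/style_lora", adapter_name="style")          # placeholder

# Activate both adapters with per-adapter weights (simple weighted merge,
# unlike the paper's decoding-time LoRA Switch / LoRA Composite schemes).
pipe.set_adapters(["character", "style"], adapter_weights=[0.8, 0.6])

image = pipe("a portrait of the character in the given style",
             num_inference_steps=30).images[0]
image.save("multi_lora.png")
```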
๐Ÿ‘11โค6๐Ÿ”ฅ2๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
💥 MM-AU: Video Accident 💥

👉MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset https://www.lotvsmmau.net/MMAU/demo
๐Ÿ‘11โค2๐Ÿ”ฅ2๐Ÿคฏ2
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human-preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for the image and language modalities, improving text understanding and spelling. Weights & source code to be released 💙 (architecture sketch below the links)

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
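
To make the "separate weights per modality, joint attention" idea concrete, here is a minimal PyTorch sketch of one MMDiT-style block. Dimensions, naming, and the omission of timestep/AdaLN conditioning are simplifying assumptions of this sketch, not Stability AI's implementation.

```python
# Sketch of an MMDiT-style block: separate per-modality projections and MLPs,
# one joint attention over the concatenated image + text tokens. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMDiTBlockSketch(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Separate weights for each modality.
        self.img_norm, self.txt_norm = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.img_qkv, self.txt_qkv = nn.Linear(dim, 3 * dim), nn.Linear(dim, 3 * dim)
        self.img_out, self.txt_out = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.img_mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.txt_mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.heads = heads

    def forward(self, img: torch.Tensor, txt: torch.Tensor):
        B, Ni, D = img.shape
        Nt = txt.shape[1]
        # Project each modality with its own weights, then attend jointly.
        qi, ki, vi = self.img_qkv(self.img_norm(img)).chunk(3, dim=-1)
        qt, kt, vt = self.txt_qkv(self.txt_norm(txt)).chunk(3, dim=-1)
        q = torch.cat([qi, qt], dim=1)
        k = torch.cat([ki, kt], dim=1)
        v = torch.cat([vi, vt], dim=1)

        def split(x):  # (B, N, D) -> (B, heads, N, head_dim)
            return x.view(B, -1, self.heads, D // self.heads).transpose(1, 2)

        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        attn = attn.transpose(1, 2).reshape(B, Ni + Nt, D)
        img = img + self.img_out(attn[:, :Ni])
        txt = txt + self.txt_out(attn[:, Ni:])
        # Per-modality MLPs close the block.
        return img + self.img_mlp(img), txt + self.txt_mlp(txt)

img_out, txt_out = MMDiTBlockSketch()(torch.randn(1, 64, 512), torch.randn(1, 16, 512))
print(img_out.shape, txt_out.shape)  # torch.Size([1, 64, 512]) torch.Size([1, 16, 512])
```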
🧵 E-LoFTR: new Feature-Matching SOTA 🧵

👉A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced. (baseline matching sketch below the links)

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
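
For reference, this is what semi-dense matching looks like with the original LoFTR as shipped in kornia; it is a baseline illustration only (E-LoFTR's own code was just announced at posting time), and the image paths are placeholders.

```python
# Semi-dense matching with the original LoFTR via kornia (baseline, not E-LoFTR).
import torch
from kornia.feature import LoFTR
from kornia.io import ImageLoadType, load_image

# Load two grayscale images as (1, 1, H, W) tensors; paths are placeholders.
img0 = load_image("img0.jpg", ImageLoadType.GRAY32)[None]
img1 = load_image("img1.jpg", ImageLoadType.GRAY32)[None]

matcher = LoFTR(pretrained="outdoor").eval()
with torch.inference_mode():
    out = matcher({"image0": img0, "image1": img1})

kpts0, kpts1, conf = out["keypoints0"], out["keypoints1"], out["confidence"]
print(f"{len(kpts0)} semi-dense matches, mean confidence {conf.mean():.3f}")
```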
๐ŸฆStableDrag: Point-based Editing๐Ÿฆ

๐Ÿ‘‰#Tencent unveils StableDrag, a novel point-based image editing framework via discriminative point tracking method + confidence-based latent enhancement strategy for motion supervision. Source Code announced but still no repo.

๐Ÿ‘‰Review https://t.ly/eUI05
๐Ÿ‘‰Paper https://lnkd.in/dz8-ymck
๐Ÿ‘‰Project stabledrag.github.io/
โค2๐Ÿ‘1๐Ÿ”ฅ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ›๏ธ PIXART-ฮฃ: 4K Generation ๐Ÿ›๏ธ

๐Ÿ‘‰PixArt-ฮฃ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Cm2Qh
๐Ÿ‘‰Paper arxiv.org/pdf/2403.04692.pdf
๐Ÿ‘‰Project pixart-alpha.github.io/PixArt-sigma-project/
๐Ÿ‘‰Repo (empty) github.com/PixArt-alpha/PixArt-sigma
๐Ÿค—-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
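
Since the PixArt-Σ weights were only announced at posting time, here is a minimal sketch using the earlier PixArt-α checkpoint already available in diffusers; the model ID and prompt are illustrative, and the output is 1024 px rather than 4K.

```python
# Minimal text-to-image sketch with the earlier PixArt-α checkpoint in diffusers
# (PixArt-Σ weights were only announced when this was posted).
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "an isometric cathedral made of glass, golden hour",  # illustrative prompt
    num_inference_steps=20,
).images[0]
image.save("pixart_alpha.png")
```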
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes: GPT-4 can play the game to a passable degree; it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released. (agent-loop sketch below the links)

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
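
The general recipe is a perceive-reason-act loop: grab a frame, ask a vision-capable GPT-4 for one action, inject the keypress, repeat. Below is a minimal sketch under assumptions of my own; grab_frame and send_key are hypothetical stand-ins for the game I/O layer, and this is not the authors' released harness.

```python
# Hedged sketch of a perceive-reason-act loop for playing DOOM with a vision LLM.
# `grab_frame` and `send_key` are hypothetical stand-ins for the game I/O layer.
import base64
from openai import OpenAI

client = OpenAI()
ACTIONS = ["FORWARD", "LEFT", "RIGHT", "FIRE", "USE"]  # tiny illustrative action set

def grab_frame() -> bytes:
    """Hypothetical: return the current game frame as PNG bytes."""
    raise NotImplementedError

def send_key(action: str) -> None:
    """Hypothetical: translate an action name into a keypress for the engine."""
    raise NotImplementedError

def step() -> str:
    frame_b64 = base64.b64encode(grab_frame()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable GPT-4 model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"You are playing DOOM. Reply with exactly one of {ACTIONS}."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{frame_b64}"}},
            ],
        }],
        max_tokens=5,
    )
    action = resp.choices[0].message.content.strip()
    send_key(action if action in ACTIONS else "FORWARD")
    return action
```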
🪖 Real-Time Humanoid from Head-Mounted Sensors 🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from information obtained from AR/VR headsets.

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
โค12โšก1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿท๏ธ Face Foundation Model ๐Ÿท๏ธ

๐Ÿ‘‰Arc2Face, the first foundation model for human faces. Source Code released ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/MfAFI
๐Ÿ‘‰Paper https://lnkd.in/dViE_tCd
๐Ÿ‘‰Project https://lnkd.in/d4MHdEZK
๐Ÿ‘‰Code https://lnkd.in/dv9ZtDfA
โค12๐Ÿ‘3๐Ÿ‘1๐Ÿคฉ1
🪼 FaceXFormer: Unified Face Transformer 🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head-pose estimation, attribute recognition, and age/gender/race estimation.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
๐Ÿ‘11โค4๐Ÿฅฐ2๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦕 DINO-based Video Tracking 🦕

👉The Weizmann Institute announced the new SOTA in point tracking via pre-trained DINO features. Source code announced (not yet released) 💙 (feature-matching sketch below the links)

👉Review https://t.ly/_GIMT
👉Paper https://lnkd.in/dsGVDcar
👉Project dino-tracker.github.io/
👉Code https://github.com/AssafSinger94/dino-tracker
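
As a rough picture of what "tracking with pre-trained DINO features" means, the sketch below matches a query point between two frames by cosine similarity of DINOv2 patch features. It is a naive nearest-neighbour baseline, not the paper's DINO-Tracker; the frame paths and query coordinates are placeholders.

```python
# Naive point matching between two frames using DINOv2 patch features
# (nearest-neighbour baseline for illustration, not the paper's tracker).
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
patch = 14
prep = transforms.Compose([
    transforms.Resize((518, 518)),  # 37x37 patches at stride 14
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def patch_feats(path: str) -> torch.Tensor:
    x = prep(Image.open(path).convert("RGB"))[None]
    with torch.inference_mode():
        feats = model.forward_features(x)["x_norm_patchtokens"][0]  # (N, C)
    return F.normalize(feats, dim=-1)

f0, f1 = patch_feats("frame0.jpg"), patch_feats("frame1.jpg")  # placeholder frames
grid = 518 // patch  # 37

# Query point in frame 0 (pixel coords) -> patch index -> best match in frame 1.
qx, qy = 260, 180
q_idx = (qy // patch) * grid + (qx // patch)
sim = f0[q_idx] @ f1.T                      # cosine similarity to every patch in frame 1
best = int(sim.argmax())
print("matched patch centre:", ((best % grid) * patch + patch // 2,
                                (best // grid) * patch + patch // 2))
```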
🦖 T-Rex 2: a new SOTA is out! 🦖

👉A novel (VERY STRONG) open-set object-detection model with strong zero-shot capabilities, suitable for various scenarios with a single set of weights. Demo and source code released 💙

👉Review https://t.ly/fYw8D
👉Paper https://lnkd.in/dpmRh2zh
👉Project https://lnkd.in/dnR_jPcR
👉Code https://lnkd.in/dnZnGRUn
👉Demo https://lnkd.in/drDUEDYh
💄 TinyBeauty: 460 FPS Make-up 💄

👉TinyBeauty: only 80K parameters to achieve the SOTA in virtual makeup without intricate face prompts. Up to 460 FPS on mobile!

👉Review https://t.ly/LG5ok
👉Paper https://arxiv.org/pdf/2403.15033.pdf
👉Project https://tinybeauty.github.io/TinyBeauty/
๐Ÿ‘7๐Ÿคฏ4๐Ÿ˜2โšก1๐Ÿ”ฅ1๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
☔ AiOS: All-in-One-Stage Humans ☔

👉An all-in-one-stage framework for SOTA expressive pose and shape recovery of multiple people, with no additional human-detection step.

👉Review https://t.ly/ekNd4
👉Paper https://arxiv.org/pdf/2403.17934.pdf
👉Project https://ttxskk.github.io/AiOS/
👉Code/Demo (announced)
โค6๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ€ MAVOS Object Segmentation ๐Ÿ€

๐Ÿ‘‰MAVOS is a transformer-based VOS w/ a novel, optimized and dynamic long-term modulated cross-attention memory. Code & Models announced (BSD 3-Clause)๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/SKaRG
๐Ÿ‘‰Paper https://lnkd.in/dQyifKa3
๐Ÿ‘‰Project github.com/Amshaker/MAVOS
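
As a generic illustration of a cross-attention memory readout (not MAVOS's actual modulated long-term memory), here is a tiny PyTorch sketch where current-frame tokens attend over a bank of stored memory tokens; all sizes are arbitrary assumptions.

```python
# Generic cross-attention memory readout: current-frame tokens (queries) attend
# over a bank of stored memory tokens. Illustrative only, not MAVOS's design.
import torch
import torch.nn as nn

class MemoryReadout(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, memory_slots: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_m = nn.LayerNorm(dim)
        # Static memory bank; a real VOS memory would be updated from past frames.
        self.memory = nn.Parameter(torch.randn(memory_slots, dim) * 0.02)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (B, N, dim) features of the current frame.
        B = frame_tokens.shape[0]
        mem = self.norm_m(self.memory).expand(B, -1, -1)
        read, _ = self.attn(self.norm_q(frame_tokens), mem, mem)
        return frame_tokens + read  # residual fusion of memory context

tokens = torch.randn(2, 1024, 256)    # e.g. a 32x32 feature map, flattened
print(MemoryReadout()(tokens).shape)  # torch.Size([2, 1024, 256])
```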
💦 ObjectDrop: automagical object removal 💦

👉#Google unveils ObjectDrop, the new SOTA in photorealistic object removal and insertion, with a focus on shadows and reflections. Impressive!

👉Review https://t.ly/ZJ6NN
👉Paper https://arxiv.org/pdf/2403.18818.pdf
👉Project https://objectdrop.github.io/
๐Ÿ‘14๐Ÿคฏ8โค4๐Ÿ”ฅ3๐Ÿพ2
This media is not supported in your browser
VIEW IN TELEGRAM
🪼 Universal Mono Metric Depth 🪼

👉ETH unveils UniDepth: metric 3D scenes from single images alone, across domains. A novel, universal, and flexible MMDE (monocular metric depth estimation) solution. Source code released 💙

👉Review https://t.ly/5C8eq
👉Paper arxiv.org/pdf/2403.18913.pdf
👉Code github.com/lpiccinelli-eth/unidepth
🔘 RELI11D: Multimodal Humans 🔘

👉RELI11D: a high-quality multimodal human-motion dataset combining LiDAR, an IMU system, an RGB camera, and an event camera. Dataset & source code to be released soon 💙

👉Review https://t.ly/5EG6X
👉Paper https://lnkd.in/ep6Utcik
👉Project https://lnkd.in/eDhNHYBb
โค3๐Ÿ”ฅ2