AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
14 files
1.31K links
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🆔 Magic-Me: ID-Specific Video 🆔

👉#ByteDance VCD: with just a few images of a specific identity, it can generate temporally consistent videos aligned with the given prompt

👉Review https://t.ly/qjJ2O
👉Paper arxiv.org/pdf/2402.09368.pdf
👉Project magic-me-webpage.github.io
👉Code github.com/Zhen-Dong/Magic-Me
❤6🥰1🤯1🤣1
🔥 Breaking: GEMINI 1.5 is out 🔥

👉Gemini 1.5 just announced: standard 128,000 token context window, up to 1 MILLION tokens via AI-Studio and #Vertex AI in private preview 🫠

👉Review https://t.ly/Vblvx
👉More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
🤯17👍4😱2
☀️ One2Avatar: Pic -> 3D Avatar ☀️

👉#Google presents a new approach to generate animatable photo-realistic avatars from only one or a few images. Impressive results.

👉Review https://t.ly/AS1oc
👉Paper arxiv.org/pdf/2402.11909.pdf
👉Project zhixuany.github.io/one2avatar_webpage/
👍12❤3🤩3🔥2
🪟 BOG: Fine Geometric Views 🪟

👉 #Google (+Tübingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).

👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME
🔥8🤯4👍3🥰1
🦥Neuromorphic Video Binarization🦥

👉 University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR Code, barcode, & Text. Real-Time, only CPU, up to 10,000 FPS!

👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
❤15👍1
🩻 Pose via Ray Diffusion 🩻

👉Novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited for set-level transformers, it's the new SOTA on camera pose estimation. Source code released 💙

👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
🔥17❤6🤯3👍1👏1🍾1
🗃️ MATH-Vision Dataset 🗃️

👉MATH-V is a curated dataset of 3,040 HQ math problems with visual contexts sourced from real math competitions. Dataset released 💙

👉Review https://t.ly/gmIAu
👉Paper arxiv.org/pdf/2402.14804.pdf
👉Project mathvision-cuhk.github.io/
👉Code github.com/mathvision-cuhk/MathVision
🤯8🔥4👍2👏1
🫅FlowMDM: Human Composition🫅

👉FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
❤9🔥6👍1👏1
🎷EMO: talking/singing Gen-AI 🎷

👉EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions, and various head poses. Input: 1 single frame, video duration = length of input audio

👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
❤18🔥7👍4🤯3🥰1
💌 Multi-LoRA Composition 💌

👉Two novel training-free image composition methods: LoRA Switch and LoRA Composite, for integrating any number of elements in an image through multi-LoRA composition. Source Code released 💙

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
👍11❤6🔥2🥰1👏1
💥 MM-AU: Video Accident 💥

👉MM-AU - Multi-Modal Accident Understanding: 11,727 videos with temporally aligned descriptions. 2.23M+ bounding boxes, 58,650 pairs of video-based accident reasons. Data & Code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset https://www.lotvsmmau.net/MMAU/demo
👍11❤2🔥2🤯2
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). New Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding/spelling capabilities. Weights & Source Code to be released 💙

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
🔥19❤5👍3⚡1👏1😱1
🧵E-LoFTR: new Feats-Matching SOTA🧵

👉A novel LoFTR-inspired algorithm for efficiently producing semidense matches across images: up to 2.5× faster than LoFTR, superior to previous SOTA pipeline (SuperPoint + LightGlue). Code announced.

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
🔥13👍4🤯2❤1
🦁StableDrag: Point-based Editing🦁

👉#Tencent unveils StableDrag, a novel point-based image editing framework built on a discriminative point tracking method + a confidence-based latent enhancement strategy for motion supervision. Source Code announced but still no repo.

👉Review https://t.ly/eUI05
👉Paper https://lnkd.in/dz8-ymck
👉Project stabledrag.github.io/
❤2👍1🔥1👏1
πŸ›οΈ PIXART-Ξ£: 4K Generation πŸ›οΈ

πŸ‘‰PixArt-Ξ£ is a novel Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. Authors: #Huawei, Dalian, HKU & HKUST. Demos available, code announced πŸ’™

πŸ‘‰Review https://t.ly/Cm2Qh
πŸ‘‰Paper arxiv.org/pdf/2403.04692.pdf
πŸ‘‰Project pixart-alpha.github.io/PixArt-sigma-project/
πŸ‘‰Repo (empty) github.com/PixArt-alpha/PixArt-sigma
πŸ€—-Demo https://huggingface.co/spaces/PixArt-alpha/PixArt-alpha
πŸ”₯7⚑1❀1πŸ‘1🀯1
👺 Can GPT-4 play DOOM? 👺

👉Apparently yes, GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. Code (with licensing restrictions) released

👉Review https://t.ly/W8-0F
👉Paper https://lnkd.in/dmsB7bjA
👉Project https://lnkd.in/ddDPwjQB
🤯8💩7🔥2🥰1
🪖RT Humanoid from Head-Mounted Sensors🪖

👉#META (+CMU) announced SimXR, a method for controlling a simulated avatar from info obtained from AR/VR headsets

👉Review https://t.ly/Si2Mp
👉Paper arxiv.org/pdf/2403.06862.pdf
👉Project www.zhengyiluo.com/SimXR/
❤12⚡1👍1
🏷️ Face Foundation Model 🏷️

👉Arc2Face, the first foundation model for human faces. Source Code released 💙

👉Review https://t.ly/MfAFI
👉Paper https://lnkd.in/dViE_tCd
👉Project https://lnkd.in/d4MHdEZK
👉Code https://lnkd.in/dv9ZtDfA
❤12👍3👏1🤩1
🪼FaceXFormer: Unified Face-Transformer🪼

👉FaceXFormer, the first unified transformer for facial analysis: face parsing, landmark detection, head pose estimation, attributes recognition, and estimation of age, gender, race, and landmark visibility.

👉Review https://t.ly/MfAFI
👉Paper https://arxiv.org/pdf/2403.12960.pdf
👉Project kartik-3004.github.io/facexformer_web/
👉Code github.com/Kartik-3004/facexformer
👍11❤4🥰2🔥1