AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
136 photos
250 videos
14 files
1.31K links
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🪵 HASSOD Object Detection 🪵

👉 HASSOD: fully self-supervised object detection and instance segmentation. The new SOTA, able to understand part-to-whole object composition the way humans do.

👉Review https://t.ly/66qHF
👉Paper arxiv.org/pdf/2402.03311.pdf
👉Project hassod-neurips23.github.io/
👉Repo github.com/Shengcao-Cao/HASSOD
🌵 G-Splatting Portraits 🌵

👉From monocular/casual video captures, Rig3DGS rigs 3D Gaussian Splatting to enable re-animatable portrait videos with control over facial expressions, head pose and viewing direction.

👉Review https://t.ly/fq71w
👉Paper https://arxiv.org/pdf/2402.03723.pdf
👉Project shahrukhathar.github.io/2024/02/05/Rig3DGS.html
🌆 Up to 69× Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69× faster; source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit
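
The modular swap described above can be sketched with toy stand-in classes (hypothetical interfaces, not the repo's actual API): the prompt encoder and mask decoder are reused as-is, and only the image encoder changes.

```python
# Conceptual sketch of SAM's promptable design: because the pipeline is
# modular, a faster image encoder can be dropped in while the prompt
# encoder and mask decoder stay untouched. All classes are illustrative.

class HeavyViTEncoder:                      # stand-in for SAM's ViT-H encoder
    def embed(self, image):
        return [px * 0.5 for px in image]   # pretend embedding

class EfficientViTEncoder:                  # stand-in for the EfficientViT swap
    def embed(self, image):
        return [px * 0.5 for px in image]   # same output contract, cheaper to run

class PromptEncoder:
    def embed(self, points):
        return list(points)

class MaskDecoder:
    def decode(self, image_emb, prompt_emb):
        # toy "mask": mark positions whose embedding exceeds every prompt value
        return [e > max(prompt_emb) for e in image_emb]

def segment(image, points, encoder):
    return MaskDecoder().decode(encoder.embed(image), PromptEncoder().embed(points))

image, points = [2.0, 8.0, 1.0], [2.0]
slow = segment(image, points, HeavyViTEncoder())
fast = segment(image, points, EfficientViTEncoder())
assert slow == fast                          # same contract, different encoder cost
```

The point of the sketch: the speedup comes entirely from the encoder swap, since the prompt/decoder interfaces are unchanged.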
🌴 Direct-a-Video Generation 🌴

👉Direct-a-Video is a text-to-video generation framework that lets users individually or jointly control camera movement and/or object motion.

👉Review https://t.ly/dZSLs
👉Paper arxiv.org/pdf/2402.03162.pdf
👉Project https://direct-a-video.github.io/
๐Ÿ‡ Graph Neural Network in TF ๐Ÿ‡

๐Ÿ‘‰#Google TensorFlow-GNN: novel library to build Graph Neural Networks on TensorFlow. Source Code released under Apache 2.0 license ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/TQfg-
๐Ÿ‘‰Code github.com/tensorflow/gnn
๐Ÿ‘‰Blog blog.research.google/2024/02/graph-neural-networks-in-tensorflow.html
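
For readers new to GNNs, the core operation such a library implements is message passing: each node aggregates its neighbors' features and mixes them with its own state. A library-agnostic numpy sketch (not the tfgnn API; weights are random for illustration):

```python
import numpy as np

# One round of mean-aggregation message passing on a toy 4-node ring graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # directed src -> dst
n, d = 4, 3
h = np.arange(n * d, dtype=float).reshape(n, d)   # node features

# Build the mean-aggregation matrix A[dst, src].
A = np.zeros((n, n))
for src, dst in edges:
    A[dst, src] = 1.0
A /= A.sum(axis=1, keepdims=True)                 # every node has >= 1 in-edge here

rng = np.random.default_rng(0)
W_self = rng.normal(size=(d, d))                  # transform of a node's own state
W_nbr = rng.normal(size=(d, d))                   # transform of aggregated neighbors
h_next = np.tanh(h @ W_self + (A @ h) @ W_nbr)    # updated node states, shape (4, 3)
```

A real library adds batching, heterogeneous node/edge sets, and trainable weights; the update rule above is the common core.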
โค17๐Ÿ‘4๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🆔 Magic-Me: ID-Specific Video 🆔

👉#ByteDance VCD: from just a few images of a specific identity, it generates temporally consistent videos aligned with the given prompt.

👉Review https://t.ly/qjJ2O
👉Paper arxiv.org/pdf/2402.09368.pdf
👉Project magic-me-webpage.github.io
👉Code github.com/Zhen-Dong/Magic-Me
โค6๐Ÿฅฐ1๐Ÿคฏ1๐Ÿคฃ1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 Breaking: GEMINI 1.5 is out 🔥

👉Gemini 1.5 just announced: a standard 128,000-token context window, with up to 1 MILLION tokens via AI Studio and #Vertex AI in private preview 🫠

👉Review https://t.ly/Vblvx
👉More: https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#build-experiment
โ˜€๏ธ One2Avatar: Pic -> 3D Avatar โ˜€๏ธ

๐Ÿ‘‰#Google presents a new approach to generate animatable photo-realistic avatars from only a few/one image. Impressive results.

๐Ÿ‘‰Review https://t.ly/AS1oc
๐Ÿ‘‰Paper arxiv.org/pdf/2402.11909.pdf
๐Ÿ‘‰Project zhixuany.github.io/one2avatar_webpage/
๐Ÿ‘12โค3๐Ÿคฉ3๐Ÿ”ฅ2
This media is not supported in your browser
VIEW IN TELEGRAM
🪟 BOG: Fine Geometric Views 🪟

👉 #Google (+Tübingen) unveils Binary Opacity Grids, a novel method to reconstruct triangle meshes from multi-view images, able to capture fine geometric detail such as leaves, branches & grass. New SOTA, real-time on Google Pixel 8 Pro (and similar).

👉Review https://t.ly/E6T0W
👉Paper https://lnkd.in/dQEq3zy6
👉Project https://lnkd.in/dYYCadx9
👉Demo https://lnkd.in/d92R6QME
🦥Neuromorphic Video Binarization🦥

👉 The University of HK unveils the new SOTA in event-based neuromorphic binary reconstruction: stunning results on QR codes, barcodes & text. Real-time, CPU-only, up to 10,000 FPS!

👉Review https://t.ly/V-NFa
👉Paper arxiv.org/pdf/2402.12644.pdf
👉Project github.com/eleboss/EBR
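
For context, event cameras emit sparse (x, y, polarity) events instead of full frames. The simplest possible baseline (a toy illustration of the event-frame idea, not the paper's EBR method) just accumulates polarities per pixel and thresholds:

```python
import numpy as np

# Accumulate signed event polarities into a frame, then binarize.
H = W = 4
events = [(0, 0, +1), (0, 0, +1), (1, 1, +1), (2, 2, -1), (1, 1, +1)]  # (x, y, polarity)

acc = np.zeros((H, W))
for x, y, p in events:
    acc[y, x] += p                    # signed accumulation per pixel

binary = acc > 0                      # naive fixed threshold
```

The accumulation loop is trivially cheap per event, which is why purely CPU-based pipelines over event streams can reach extreme frame rates; the paper's contribution lies in doing the reconstruction far more robustly than this fixed threshold.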
โค15๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩻 Pose via Ray Diffusion 🩻

👉A novel distributed representation of camera pose that treats a camera as a bundle of rays. Naturally suited to set-level transformers, it is the new SOTA in camera pose estimation. Source code released 💙

👉Review https://t.ly/qBsFK
👉Paper arxiv.org/pdf/2402.14817.pdf
👉Project jasonyzhang.com/RayDiffusion
👉Code github.com/jasonyzhang/RayDiffusion
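
The "bundle of rays" view can be illustrated by unprojecting pixel centers through pinhole intrinsics (a simplified sketch: camera at the origin with identity rotation; the paper's actual ray parameterization and the diffusion model itself are not shown):

```python
import numpy as np

# Turn a pinhole camera into a bundle of unit ray directions,
# one per pixel center of a small H x W grid.
H = W = 4
fx = fy = 2.0
cx, cy = W / 2, H / 2
K_inv = np.linalg.inv(np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1.0]]))

u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)   # homogeneous pixels
dirs = pix @ K_inv.T                                              # unproject
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)               # unit-length rays
origins = np.zeros_like(dirs)                                     # shared camera center
```

Representing the camera this way turns pose estimation into predicting a set of rays, which is what makes set-level transformers a natural fit.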
๐Ÿ—ƒ๏ธ MATH-Vision Dataset ๐Ÿ—ƒ๏ธ

๐Ÿ‘‰MATH-V is a curated dataset of 3,040 HQ mat problems with visual contexts sourced from real math competitions. Dataset released ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/gmIAu
๐Ÿ‘‰Paper arxiv.org/pdf/2402.14804.pdf
๐Ÿ‘‰Project mathvision-cuhk.github.io/
๐Ÿ‘‰Code github.com/mathvision-cuhk/MathVision
🫅FlowMDM: Human Composition🫅

👉FlowMDM, a diffusion-based approach capable of generating seamlessly continuous sequences of human motion from textual descriptions.

👉Review https://t.ly/pr2g_
👉Paper https://lnkd.in/daYRftdF
👉Project https://lnkd.in/dcRkv5Pc
👉Repo https://lnkd.in/dw-3JJks
โค9๐Ÿ”ฅ6๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🎷EMO: talking/singing Gen-AI 🎷

👉EMO: audio-driven portrait-video generation. Vocal avatar videos with expressive facial expressions and various head poses. Input: a single frame; video duration = length of the input audio.

👉Review https://t.ly/4IYj5
👉Paper https://lnkd.in/dGPX2-Yc
👉Project https://lnkd.in/dyf6p_N3
👉Repo (empty) github.com/HumanAIGC/EMO
โค18๐Ÿ”ฅ7๐Ÿ‘4๐Ÿคฏ3๐Ÿฅฐ1
This media is not supported in your browser
VIEW IN TELEGRAM
💌 Multi-LoRA Composition 💌

👉Two novel training-free image-composition methods, LoRA Switch and LoRA Composite, for integrating any number of elements into an image through multi-LoRA composition. Source code released 💙

👉Review https://t.ly/GFy3Z
👉Paper arxiv.org/pdf/2402.16843.pdf
👉Code github.com/maszhongming/Multi-LoRA-Composition
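
Background on what is being composed: a LoRA adapts a frozen weight W as W + (alpha/r)·B·A, with low-rank factors B (d×r) and A (r×d). The sketch below shows the naive weight-space merge of several LoRAs, for intuition only; the paper's LoRA Switch and LoRA Composite are training-free alternatives to this kind of merge.

```python
import numpy as np

# Naive multi-LoRA merge: average the low-rank deltas of several LoRAs
# into one adapted weight. Dimensions and weights are illustrative.
rng = np.random.default_rng(0)
d, r, alpha = 6, 2, 4.0
W = rng.normal(size=(d, d))                       # frozen base weight

loras = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(3)]
deltas = [(alpha / r) * B @ A for B, A in loras]  # one delta per element/LoRA
W_merged = W + np.mean(deltas, axis=0)            # averaged weight-space merge
```

Averaging deltas tends to dilute each element's effect as the number of LoRAs grows, which is part of why decoding-time composition strategies are attractive.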
๐Ÿ‘11โค6๐Ÿ”ฅ2๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
💥 MM-AU: Video Accident 💥

👉MM-AU (Multi-Modal Accident Understanding): 11,727 videos with temporally aligned descriptions, 2.23M+ bounding boxes, and 58,650 pairs of video-based accident reasons. Data & code announced 💙

👉Review https://t.ly/a-jKI
👉Paper arxiv.org/pdf/2403.00436.pdf
👉Dataset https://www.lotvsmmau.net/MMAU/demo
๐Ÿ‘11โค2๐Ÿ”ฅ2๐Ÿคฏ2
🔥 SOTA: Stable Diffusion 3 is out! 🔥

👉Stable Diffusion 3 is the new SOTA in text-to-image generation (based on human preference evaluations). The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image & language, improving text understanding and spelling capabilities. Weights & source code to be released 💙

👉Review https://t.ly/a1koo
👉Paper https://lnkd.in/d4i-9Bte
👉Blog https://lnkd.in/d-bEX-ww
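
The separate-weights-per-modality idea can be sketched in a few lines: image and text tokens get their own projection weights but attend jointly over the concatenated sequence. This is an illustrative sketch with random weights and toy dimensions, not the SD3 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_img, n_txt = 8, 4, 3
img = rng.normal(size=(n_img, d))        # image tokens
txt = rng.normal(size=(n_txt, d))        # text tokens

def qkv(x, seed):
    # Per-modality Q/K/V projections: each modality has its own weights.
    r = np.random.default_rng(seed)
    return [x @ r.normal(size=(d, d)) for _ in range(3)]

q_i, k_i, v_i = qkv(img, 1)
q_t, k_t, v_t = qkv(txt, 2)

# Joint attention over the concatenated token sequence.
Q = np.concatenate([q_i, q_t])
K = np.concatenate([k_i, k_t])
V = np.concatenate([v_i, v_t])
scores = Q @ K.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)  # softmax rows
out = attn @ V                           # every token mixes with both modalities
```

The design lets each modality keep a representation tuned to its statistics while the joint attention step still exchanges information between text and image tokens.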
🧵E-LoFTR: New Feats-Matching SOTA🧵

👉A novel LoFTR-inspired algorithm for efficiently producing semi-dense matches across images: up to 2.5× faster than LoFTR and superior to the previous SOTA pipeline (SuperPoint + LightGlue). Code announced.

👉Review https://t.ly/7SPmC
👉Paper https://arxiv.org/pdf/2403.04765.pdf
👉Project https://zju3dv.github.io/efficientloftr/
👉Repo https://github.com/zju3dv/efficientloftr
๐ŸฆStableDrag: Point-based Editing๐Ÿฆ

๐Ÿ‘‰#Tencent unveils StableDrag, a novel point-based image editing framework via discriminative point tracking method + confidence-based latent enhancement strategy for motion supervision. Source Code announced but still no repo.

๐Ÿ‘‰Review https://t.ly/eUI05
๐Ÿ‘‰Paper https://lnkd.in/dz8-ymck
๐Ÿ‘‰Project stabledrag.github.io/
โค2๐Ÿ‘1๐Ÿ”ฅ1๐Ÿ‘1