AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
♟️Neural RGB-D Reconstruction♟️

👉Novel approach to #3D reconstruction, mixing implicit surface representations with NeRFs

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
RGB-D based reconstruction
Leveraging color & depth
Depth into the NeRF
Pose & camera refinement

More: https://bit.ly/3iN6e54
🔥5👍2🤯2🤩1
🦓 Hyper-Fast Refinement 🦓

👉SharpContour: novel contour-based refinement for semantic segmentation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Instance-aware Point Classifier
Deforming by discrete updating
Estimating offsets independently
Source code soon available!

More: https://bit.ly/3qL04GY
👍5🔥4🤯1😱1
🥗 Neural Mesh via Text only 🥗

👉Zero-shot generation of a 3D model using only a target text prompt

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
ZS 3D model with text only
ZS text-guided generation
Meshes with texture/normal
Differentiable LLS implementation

More: https://bit.ly/3u0qnvb
🤯8👍1🔥1
🪆#3D, Materials, and Lighting from 2D🪆

👉Nvidia: topology, materials & environment-map lighting jointly from 2D images. INSANE 😮

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Topology, materials and lighting
Meshes with materials/lighting
Compact volumetric texturing
Differentiable all-frequency lighting
Code under #NVIDIA License

More: https://bit.ly/3IUoF2t
👏5👍1🤯1😱1
🍜Ref-NeRF for extreme realism🍜

👉Ref-NeRF: reflected radiance & structures via collection of spatially-varying scene properties

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Realism and accuracy
Replacing NeRF’s params
Regularization of volume density
Integrated Directional Encoding
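
A minimal sketch of the "reflected radiance" idea, assuming the reflected direction is the usual mirror reflection of the view direction about the surface normal, ω_r = 2(ω_o·n)n − ω_o. The function name `reflect` and the tuple-based vectors are illustrative choices, not code from the paper:

```python
def reflect(view_dir, normal):
    """Mirror-reflect `view_dir` about `normal`.

    view_dir: direction from the surface point toward the camera.
    normal:   unit surface normal at that point (assumed already normalized).
    Returns 2 * (view_dir . normal) * normal - view_dir as a tuple.
    """
    dot = sum(v * n for v, n in zip(view_dir, normal))
    return tuple(2.0 * dot * n - v for v, n in zip(view_dir, normal))


# Looking straight down the normal reflects back onto itself.
head_on = reflect((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))

# A grazing view flips its tangential component and keeps the normal one.
grazing = reflect((1.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```

Querying appearance along the reflected direction instead of the raw view direction is what lets a specular highlight be described as a smoother function of direction.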

More: https://bit.ly/3tTlS5l
👍4🤯2🔥1
🦧OFA for all: Cross, Vision, Language🦧

👉Unified multimodal model for image generation, visual grounding, etc.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Sequence-to-sequence learning
Image Captioning / Generation
Visual Grounding / Classification
Text-to-Image Generation
Visual Question Answering

More: https://bit.ly/3wSTGlc
👍7🤯6👏1
🍿Old Films Back to Life with #AI🍿

👉Recurrent transformer network (RTN) to restore heavily degraded old films

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Transformer blocks for spatial restoration
Knowledge from adjacent frames
Color from keyframes to whole clip
Source code available in days!

More: https://bit.ly/3wZbV8y
12👍2🤯1
🍊Neural Head #Avatars from RGB🍊

👉Novel neural representation for animatable head avatar

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel articulated human head
Full-geometry reconstruction
Differentiable optimization pipeline
Disentanglement of shape/color

More: https://bit.ly/3DxUGMI
🔥3🤯2😱1
🌶️ MyStyle: personal generative #AI 🌶️

👉Personalized deep generation with a few shots of a person

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Small set of portraits (∼100)
Local, low-dim, personal manifold
Personal #AI for ill-posed tasks
SOTA vs. previous few-shot methods

More: https://bit.ly/3wWMwMu
🔥5👍4🤯1
🦆 GAN + Dense Map 🦆

👉CoordGAN: structure-texture disentangled GAN with dense correspondence map

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel coordinate space
Warping to learn coordinate
Encoder for structure representation
HQ structure/texture editable images

More: https://bit.ly/3DOlOaB
🤯42🔥2👏1
Unified shape & non-rigid motion

👉CaDeX: SOTA in both shape & non-rigid motion

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Canonical Deformation Coordinate Space
Shape + non-rigid motion representation
Factorization of def-homeomorphisms
Cycle consistency, topology & volume
SOTA in modelling deformable objects

More: https://bit.ly/3NM5NX1
4🤯1😱1
📸 ~6 BILLION CLIP-filtered pairs 📸

👉A dataset 14x bigger than the previously biggest openly accessible image-text dataset in the world.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
2.3B English image-text pairs
2.2B from 100+ other languages
1.3B language not detected
KNN index for quick search
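
To illustrate what a KNN lookup over image-text embeddings does, here is a small brute-force sketch using cosine similarity. It assumes nothing about the dataset's actual index, which would have to be approximate at this scale; `knn_search` and the toy 2-D vectors are hypothetical names and data used only for illustration:

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def knn_search(query, embeddings, k=2):
    """Return the indices of the k embeddings most similar to `query`."""
    q = normalize(query)
    scored = []
    for idx, emb in enumerate(embeddings):
        e = normalize(emb)
        similarity = sum(a * b for a, b in zip(q, e))
        scored.append((similarity, idx))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "image" embeddings and a "text" query embedding in the same space.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = knn_search([1.0, 0.05], toy_embeddings, k=2)
```

An exact scan like this is linear in the number of pairs, which is why billions of entries call for a prebuilt index rather than a loop.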

More: https://bit.ly/3LFhKvT
3🤯1
🥮 PP-YOLOE: e-version of YOLO 🥮

👉 SOTA object detector at 149+ FPS!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Optimized PP-YOLOv2
S/M/L/XL for different scenarios
149+ FPS, with TensorRT & FP16
Source code & models available

More: https://bit.ly/3x454uy
🔥5👍3👏1🤯1🤩1
🧙 HD synthesis with LDM 🧙

👉Low-cost diffusion models (DMs) via the latent space of powerful pretrained autoencoders

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Hi-res synthesis of megapixel images
Synthesis, inpainting, stochastic SR
Large, consistent images of ∼1024px
General conditioning via cross-attention
Code licensed under MIT License
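
A minimal sketch of conditioning via cross-attention, assuming queries come from the image latents and keys/values come from the encoded condition (for example text tokens). It is single-head, pure Python, and leaves out the learned projection matrices, so it shows the mechanism rather than the released code; all names are illustrative:

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes `values` by key similarity.

    queries: list of latent feature vectors (one per spatial position).
    keys, values: lists of condition vectors (one per condition token).
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


cond_keys = [[1.0, 0.0], [0.0, 1.0]]
cond_values = [[1.0, 2.0], [3.0, 4.0]]
# A query aligned with the first key pulls almost only the first value.
focused = cross_attention([[10.0, 0.0]], cond_keys, cond_values)
# A zero query attends uniformly, so it returns the average of the values.
uniform = cross_attention([[0.0, 0.0]], cond_keys, cond_values)
```

Because the condition only enters through keys and values, the same block can take text, layouts, or other inputs once they are encoded as token vectors.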

More: https://bit.ly/3LIVOzS
🔥6👍3🤩1
🎩 SinNeRF: Single Image NeRF 🎩

👉Neural Radiance Field via single view only

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
UT Austin + UIUC + UOregon + Picsart AI
"Looking only once" approach
Semi-supervised learning process
Geometry/semantic pseudo-labels
SOTA in novel-view synthesis

More: https://bit.ly/3ujMZqF
👍7🔥2👏1🤯1
🔥 Transformer-based Tracking 🔥

👉Tracker via Transformer-based model prediction module

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Tracking by Transformer prediction
Extending model predictor for BBs
SOTA on three public benchmarks
Code/models under GNU GPL 3.0 license

More: https://bit.ly/3ucYvUI
🔥9🤯2🥰1
👗 In-The-Wild Virtual Try-On 👗

👉StyleGAN-based architecture for appearance flow estimation in VTON application

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Global appearance flow estimation
Robust to person/garment misalignments
"In-the-wild": person with natural poses
Code under CC BY-NC-SA 4.0 license

More: https://bit.ly/3LPR9wl
👏63🔥1🤔1🤯1
🎇DALL·E 2 just announced!🎇

👉DALL·E 2 to create realistic images and art from natural language

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
More realistic/accurate, 4x res.
Better caption matching
Not available yet, waiting list!

More: https://bit.ly/3j9v3bR
🔥12🤯5👍2🤩1
👋Forecasting interactions via attention👋

👉Predicting the hand motion trajectory and the future contact points on the next active object

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Object-Centric Transformer (OCT)
Self-attention Transformer mechanism
Framework to handle uncertainty
SOTA on Epic-Kitchens and EGTEA

More: https://bit.ly/3v3PpbI
👍4🔥2👏1🤔1🤯1
🍇SmeLU: Smooth Activation Function🍇

👉Google unveils a new smooth activation function: easy to implement, cheap & less error-prone

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Smooth to mitigate irreproducibility
Cheap function, better than GELU/Swish
0-1 slope through quadratic middle region
SmeLU as convolution of ReLU with box
Best reproducibility-accuracy tradeoff
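
The highlights above can be sketched in a few lines, assuming the usual piecewise form with half-width β: zero for x ≤ −β, (x + β)² / (4β) in the middle, and x for x ≥ β. The helper `box_smoothed_relu` is a hypothetical numerical check of the "ReLU convolved with a box" view, not part of any library:

```python
def smelu(x, beta=1.0):
    """Smooth ReLU: flat on the left, identity on the right, quadratic between."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x <= -beta:
        return 0.0
    if x >= beta:
        return float(x)
    return (x + beta) ** 2 / (4.0 * beta)


def smelu_grad(x, beta=1.0):
    """Slope of SmeLU: rises linearly from 0 to 1 across [-beta, beta]."""
    if x <= -beta:
        return 0.0
    if x >= beta:
        return 1.0
    return (x + beta) / (2.0 * beta)


def box_smoothed_relu(x, beta=1.0, steps=1000):
    """Midpoint-rule average of ReLU(x - t) for t in [-beta, beta]."""
    width = 2.0 * beta / steps
    total = 0.0
    for i in range(steps):
        t = -beta + (i + 0.5) * width
        total += max(x - t, 0.0)
    # Box filter of width 2*beta and height 1/(2*beta).
    return total * width / (2.0 * beta)


print(smelu(0.0))       # 0.25
print(smelu_grad(0.0))  # 0.5
```

The two outer branches meet the quadratic with matching value and slope at ±β, which is where the smoothness comes from; β controls how wide the transition is.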

More: https://bit.ly/3xcskXm
😱8👍41🔥1😁1🤯1
📍Hyper-Dense Landmarks at 150FPS📍

👉#Microsoft unveils the SOTA in dense landmarking + #3D reconstruction. MAGIC.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Accurately predicts 10× as many landmarks as usual
Synthetic data, perfect annotations
NO appearance, light, diff-rendering
#3D @150+FPS with a single CPU thread
SOTA in monocular 3D reconstruction

More: https://bit.ly/37pQS40
👍6🔥4🤯1