AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🎭Novel pre-training strategy for #AI🎭

👉EPFL unveils the Multi-modal Multi-task Masked Autoencoders (MultiMAE)

Highlights:
✅Multimodal: additional modalities beyond RGB
✅Multi-task: multiple outputs beyond RGB
✅General: MultiMAE by pseudo-labeling
✅Classification, segmentation, depth
✅Code under NonCommercial 4.0 Int.

More: https://bit.ly/3jRhNsN
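A minimal sketch of the masking idea behind a multi-modal masked autoencoder: only a small subset of patches, drawn across all input modalities, is shown to the encoder, and the rest must be reconstructed. The modality names, patch counts, and the uniform sampling below are assumptions for illustration; the sampling scheme in the actual MultiMAE code may differ.

```python
import random


def sample_visible_tokens(patches_per_modality, num_visible, seed=0):
    """Split all (modality, patch_index) tokens into visible and masked sets."""
    rng = random.Random(seed)
    # Flatten every patch of every modality into one pool of tokens.
    all_tokens = [
        (name, idx)
        for name, count in patches_per_modality.items()
        for idx in range(count)
    ]
    if not 0 <= num_visible <= len(all_tokens):
        raise ValueError("num_visible must be between 0 and the token count")
    # Assumption: visible tokens are drawn uniformly from the whole pool.
    visible = set(rng.sample(all_tokens, num_visible))
    masked = [tok for tok in all_tokens if tok not in visible]
    return sorted(visible), masked


# Hypothetical setup: three modalities, 196 patches each (14x14 grid).
patches = {"rgb": 196, "depth": 196, "semseg": 196}
visible, masked = sample_visible_tokens(patches, num_visible=98)
print(len(visible), len(masked))
```

The encoder would see only `visible`; the decoder's job is to predict the content of `masked` for every modality.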
🧪 A new SOTA in Dataset Distillation 🧪

👉A new approach by Matching Training Trajectories is out!

Highlights:
✅Distilling data "to match" a bigger dataset
✅Distilled data to guide a network
✅Trajectories of experts from real data
✅SOTA + distilling higher-res visual data

More: https://bit.ly/3JwYOxW
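A rough sketch of what "matching training trajectories" can look like as a loss: a student trained for a few steps on the distilled data should land close to where an expert trained on real data ends up, with the distance normalized by how far the expert itself moved. Parameters are flat lists of floats here and all values are hypothetical; this is an illustration of the idea, not the authors' implementation.

```python
def trajectory_matching_loss(student_params, expert_start, expert_target):
    """Squared distance student->expert target, normalized by expert movement."""
    num = sum((s - t) ** 2 for s, t in zip(student_params, expert_target))
    den = sum((a - t) ** 2 for a, t in zip(expert_start, expert_target))
    if den == 0.0:
        raise ValueError("expert did not move between the two checkpoints")
    return num / den


# Toy flat parameter vectors (hypothetical values for illustration only).
expert_start = [0.0, 0.0]         # expert checkpoint the student starts from
expert_target = [1.0, 2.0]        # later expert checkpoint to be matched
student_after_steps = [0.5, 1.0]  # student after training on distilled data

loss = trajectory_matching_loss(student_after_steps, expert_start, expert_target)
print(loss)
```

A loss of 0 means the student reached the expert checkpoint exactly; a loss of 1 means it made no net progress from the starting checkpoint.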
🧤 Two-Hand tracking via GCN 🧤

👉The first-ever GCN for two interacting hands in a single RGB image

Highlights:
✅Reconstruction by GCN mesh regression
✅PIFA: pyramid attention for local occlusion
✅CHA: cross hand attention for interaction
✅SOTA + generalization to in-the-wild scenarios
✅Source code available under GNU 🤯

More: https://bit.ly/3KH5FWO
🕹️Video K-Net, SOTA in Segmentation🕹️

👉Simple, strong, and unified framework for fully end-to-end video panoptic segmentation

Highlights:
✅Learnable kernels from K-Net
✅K-Net learns to segment & track
✅Appearance / cross-T kernel interaction
✅New SOTA without bells and whistles 🤷‍♂️

More: https://bit.ly/3uEEZQR
DeepLabCut: tracking animals in the wild

👉A toolbox for markerless pose estimation of animals performing various tasks

Highlights:
✅Multi-animal pose estimation
✅Datasets for multi-animal pose
✅Key-points, limbs, animal identity
✅Optimal key-points without input

More: https://bit.ly/37L1mLE
Neural Articulated Human Body

👉Novel neural implicit representation for articulated human bodies

Highlights:
✅COAP: COmpositional Articulated People
✅Large variety of shapes & poses
✅Novel encoder-decoder architecture

More: https://bit.ly/3xvn7dl
🦚 2K Resolution Generative #AI 🦚

👉Novel continuous-scale training with variable output resolutions

Highlights:
✅Mixed-resolution data
✅Arbitrary scales during training
✅Generations beyond 1024×1024
✅Variant of FID metric for scales
✅Source code under MIT license

More: https://bit.ly/3uNfVY6
DS Unsupervised Video Decomposition

👉Novel method to extract persistent elements of a scene

Highlights:
✅Scene element as Deformable Sprite (DS)
✅Deformable Sprites by video auto-encoder
✅Canonical texture image for appearance
✅Non-rigid geometric transformation

More: https://bit.ly/37WV9w1
🥓 L-SVPE for Deep Deblurring 🥓

👉L-SVPE to deblur scenes while recovering high-freq details

Highlights:
✅Learned Spatially Varying Pixel Exposures
✅Next-gen focal-plane sensor + DL
✅Deep conv decoder for motion deblurring
✅Superior results over non-optimized exposures

More: https://bit.ly/3uRYQMT
🧧Hyper-Fast Instance Segmentation🧧

👉Novel Temporally Efficient Vision Transformer (TeViT) for VIS

Highlights:
✅Video instance segmentation transformer
✅Contextual-info at frame/instance level
✅Nearly convolution-free framework 🤷‍♂️
✅The new SOTA for VIS, ~70 FPS!
✅Code & models under MIT license

More: https://bit.ly/3rCMXIn
📗Unified Scene Text/Layout Detection📗

👉World's first hierarchical scene text dataset + novel detection method

Highlights:
✅Unified detection & geometric layout
✅Hierarchical annotations in natural scenes
✅Word, line, & paragraph level annotations
✅Source under CC Attribution Share Alike 4.0

More: https://bit.ly/3jRpezV
🙌 #Oculus' new Hand Tracking 🙌

👉Hands are able to move as naturally and intuitively in the #metaverse as they do in real life

Highlights:
✅Hands2.0 powered by CV & ML
✅Tracking hand-over-hand interactions
✅Crossing hands, clapping, high-fives
✅Accurate thumbs-up gesture

More: https://bit.ly/3JXPvY2
🎗️New SOTA in #3D human avatar🎗️

👉PHORHUM: photorealistic 3D human from mono-RGB

Highlights:
✅Pixel-aligned method for 3D geometry
✅Unshaded surface color + illumination
✅Patch-based rendering losses for visible regions
✅Plausible color estimation for non-visible regions

More: https://bit.ly/3MkvBrA
📟 What's in your hands (#3D) ? 📟

👉Reconstructing hand-held objects (from single RGB) without knowing their 3D templates🤷‍♂️

Highlights:
✅Hand is highly predictive of object shape
✅Conditioned on the hand articulation
✅Visual feats. / articulation-aware coords.
✅Code and models available!

More: https://bit.ly/3vuYn2a
🔋YODO: You Only Demonstrate Once🔋

👉Novel category-level manipulation learned in simulation from a single demonstration video🤯

Highlights:
✅One-shot IL, model-free 6D pose tracking
✅Demonstration by a single 3rd-person-view video
✅Manipulation, including hi-precision tasks
✅Category-level Behavior Cloning
✅Attention for dynamic coords selection
✅Generalizability to novel unseen obj/env

More: https://bit.ly/3v0V4R4
👗 Dress Code for Virtual Try-On 👗

👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.

Highlights:
✅Hi-Res paired front-view / full-body
✅Pixel-level Semantic-Aware Discriminator
✅9 SOTA VTON approaches / 3 baselines
✅New SOTA considering res. & garments

More: https://bit.ly/3xKXSUw
Deep Equilibrium for Optical Flow

👉DEQ: converges faster, uses less memory, often more accurate

Highlights:
✅Novel formulation of optical flow method
✅Compatible with prior modeling/data-related improvements
✅Sparse fixed-point correction for stability
✅Code/models under GNU Affero GPL v3.0

More: https://bit.ly/3v4fZmi
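For readers new to deep equilibrium (DEQ) models, a toy sketch of the core idea: instead of stacking a fixed number of layers, the output is defined as a fixed point z* = f(z*, x) and found by iterating until the update stops changing. The layer `f` below is a hypothetical contractive map chosen so the iteration provably converges; it is not the flow update operator from the paper, and the implicit-gradient machinery is left out.

```python
import math


def f(z, x):
    # Toy "layer": the 0.5 factor keeps the map contractive in z,
    # so a unique fixed point exists and plain iteration converges.
    return [math.tanh(0.5 * zi + xi) for zi, xi in zip(z, x)]


def solve_fixed_point(x, tol=1e-10, max_iter=200):
    """Iterate z <- f(z, x) until the largest per-element change is below tol."""
    z = [0.0] * len(x)
    steps = 0
    for steps in range(1, max_iter + 1):
        z_next = f(z, x)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            break
    return z, steps


x = [0.3, -1.2, 2.0]  # hypothetical input features
z_star, steps = solve_fixed_point(x)
print(steps, z_star)
```

Because only the final fixed point is needed, a DEQ does not have to keep every intermediate iterate for backpropagation, which is where the memory savings come from.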
🌳Ultra High-Resolution Neural Saliency🌳

👉A novel ultra high-resolution saliency detector with dataset!

Highlights:
✅Ultra Hi-Res Saliency Detection
✅5,920 pics at 4K-8K resolution
✅Pyramid Grafting Network
✅Cross-Model Grafting Module
✅AGL: Attention Guided Loss
✅Code/models under MIT

More: https://bit.ly/3MnU1Rf
🪆StyleGAN-Human for fashion 🪆

👉Novel unconditional human generation based on StyleGAN is out!

Highlights:
✅200,000+ labeled samples (pose/texture)
✅1024x512 StyleGAN-Human StyleGAN3
✅512x256 StyleGAN-Human StyleGAN1
✅Face model for downstream: InsetGAN
✅Source code and model available!

More: https://bit.ly/3xMg5B2