AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🏳️‍🌈Deep Clustering on ImageNet & Co.🏳️‍🌈

👉World's first deep nonparametric clustering on large datasets such as ImageNet

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Deep clustering that infers the number of clusters
✅Loss: amortized inference in mixture models
✅Deep nonparametric clustering on ImageNet
✅Code and model available under MIT license

More: https://bit.ly/38p62rn
💥HQ-E²FGVI just released💥💥

👉Flow-Guided Video Inpainting through three trainable modules

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Flow, pixel-prop, content hallucination
✅Three stage-modules, jointly optimized
✅The new SOTA, promising efficiency
✅Code and Models under MIT license

More: https://bit.ly/3Ln0ICj
🪔 AvatarCLIP: Text-Driven Avatar 🪔

👉Zero-shot text-driven generation of #3D avatars for the #metaverse

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅First text-driven synthesis
✅Shape, texture, and motion
✅Animation-ready, HQ texture/geometry
✅Zero-shot text-guided ref-based motion
✅Code and model under MIT license

More: https://bit.ly/3LjTWgB
🔥#AIwithPapers: we are 2,500!🔥

💙💛Only 2 Billion papers remaining on arXiv. The more we are, the faster we read💙💛

😈 Invite your friends -> https://t.iss.one/AI_DeepLearning
💥Podcasting AI & CV💥

👉🏼For people fluent in Italian: a 1-hour podcast in which I talk about AI, CV, startups and more (including this wonderful project).

More: https://bit.ly/38DtBwB
🔥Inpainting: new SOTA! INSANE🔥

👉Novel two-stream approach: inpainting at the next level!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅High-freq locally, low-freq globally
✅Local to global -> error correction
✅44% / 26% improvements FID/scores
✅Source code, more clips available

More: https://bit.ly/3ltIX9R
🔥Super-Human Crossword Solver🔥

👉Solving crosswords better than the best human solvers

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Crossword solving based on NNs
✅Q&A, structured decoding, local search
✅Wide domains with perfect accuracy
✅Large question-answer dataset

More: https://bit.ly/3a3zzqQ
🥸Imagen: far beyond DALL·E 2🥸

👉#Google: unprecedented photorealism and deep level of language understanding

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Dynamic thresholding for diffusion sampling
✅Efficient U-Net, efficient++ variant
✅DrawBench, a new text-to-image benchmark
✅The new SOTA, COCO FID of 7.27

More: https://bit.ly/3lVtkbz
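
The "dynamic thresholding" highlight can be illustrated with a rough NumPy sketch. It assumes the step works as commonly described: take a high percentile s of the absolute values of the predicted clean image, and if s exceeds 1, clip to [-s, s] and divide by s. The function name, the percentile value and the whole-array percentile (rather than per-image) are assumptions for illustration, not the authors' code.

```python
import numpy as np

def dynamic_threshold(x0_pred, percentile=99.5):
    """Clip a predicted clean image to [-s, s], then rescale by s.

    s is the chosen percentile of |x0_pred| (percentile value is an assumed
    hyperparameter). It is floored at 1.0, so predictions that already lie
    inside [-1, 1] pass through unchanged.
    """
    s = max(float(np.percentile(np.abs(x0_pred), percentile)), 1.0)
    return np.clip(x0_pred, -s, s) / s

rng = np.random.default_rng(0)
x0 = rng.normal(scale=2.0, size=(3, 8, 8))  # toy prediction with many |values| > 1
out = dynamic_threshold(x0)                 # every value now lies in [-1, 1]
```

Compared with plain clipping to [-1, 1], rescaling by s keeps the relative contrast between pixels that would otherwise all saturate at the boundary.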
🪤Tracking over SOTA detectors🪤

👉Lightweight Python lib for real-time 2D object tracking 💥

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Layer of tracking over SOTA detectors
✅Suitable for complex video processing
✅Source code under BSD 3-Clause
✅Maintained by Tryolabs team

More: https://bit.ly/3wKtGqg
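
To make "a layer of tracking over detectors" concrete, here is a toy tracking-by-detection sketch: a detector (not shown) yields one centroid per object per frame, and the tracker links them across frames by greedy nearest-neighbour matching. The class `CentroidTracker`, its `max_distance` parameter and the matching rule are hypothetical and are not the library's API; real trackers typically add motion models and more careful assignment.

```python
import math

class CentroidTracker:
    """Toy tracker: greedy nearest-centroid matching between frames."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # hypothetical gating radius, in pixels
        self.tracks = {}                  # track id -> last known (x, y) centroid
        self.next_id = 0

    def update(self, detections):
        """Match this frame's centroids to tracks; return {track_id: centroid}."""
        assigned = {}
        unmatched = dict(self.tracks)
        for point in detections:
            # Pick the closest still-unmatched track inside the gating radius.
            best_id, best_dist = None, self.max_distance
            for track_id, last in unmatched.items():
                dist = math.dist(point, last)
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                # No track close enough: start a new one.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = point
        self.tracks = assigned  # tracks without a detection this frame are dropped
        return assigned

tracker = CentroidTracker(max_distance=30.0)
frame1 = tracker.update([(10, 10), (200, 200)])   # two new tracks: ids 0 and 1
frame2 = tracker.update([(205, 198), (14, 12)])   # same objects, listed in a different order
```

The ids stay attached to the same objects in `frame2` even though the detector returned them in a different order, which is the point of putting a tracking layer on top of per-frame detections.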
🥷🏿 FCA: #3D Neural Camouflage 🥷🏿

👉#3D full-camouflage adversarial patch to fool neural detectors

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Attack by diff-neural render
✅E2E physical adversarial attack
✅Envs, vehicles & detectors
✅Source code available!

More: https://bit.ly/38kKyfa
One-Shot Object Pose

👉A novel one-shot object pose estimator

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Visual localization pipeline for object pose
✅Handling novel objects without CAD model
✅Novel graph attention for 2D-3D matching
✅Large dataset for one-shot object pose

More: https://bit.ly/3MTogjJ
☄️STEVE: Slot-TransformEr for VidEos☄️

👉STEVE: unsupervised model for object-centric learning in videos

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Adoption of a slot decoder (SLATE)
✅SLATE with slot-level recurrence model
✅Complex and naturalistic videos
✅Significantly outperforms previous SOTA

More: https://bit.ly/3PNxxM3
🦔 CogVideo: insane text-to-clip 🦔

👉CogVideo: a 9B-parameter model, the world's first large-scale open-source text-to-video model 😵

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Largest open-source T2C transformer
✅Finetuning of text-to-image model
✅Multi-frame-rate hierarchical training
✅From pretrained model CogView2

More: https://bit.ly/3Gzfl4n
🦄Time-Aware Neural Voxels🦄

👉TiNeuVox: "NeRF" with time-aware voxel features 😵

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Dynamic scene w/ optimizable structure
✅Temporal information in radiance net
✅Small/large motion w/ single-res of feats
✅192× faster than previous Hyper-NeRF

More: https://bit.ly/3wR4O08
Neural Anomaly Detection by AWS

👉Ultra-competitive inference and SOTA for both detection and localization

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Locally aggregated, mid-level patch features
✅Maximizing nominal information at test time
✅Reducing biases towards ImageNet classes
✅Image-level anomaly AUROC of up to 99.6%

More: https://bit.ly/3t7Ndjg
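
For readers unfamiliar with the AUROC figure quoted above: it is the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen nominal one. A minimal pure-Python sketch follows; the function name `auroc` and the toy scores are illustrative only and unrelated to the paper's data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for anomalous, 0 for nominal. scores: higher means more anomalous.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one nominal sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]   # toy anomaly scores
labels = [0, 0, 1, 1]
value = auroc(scores, labels)    # 3 of the 4 anomalous/nominal pairs are ranked correctly
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based implementations compute the same quantity faster.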
🛹 Project Skate from Google #AI 🛹

👉#AI tool to analyze a skateboarder's tricks in real time

More: https://bit.ly/3zbQS3M
🧬Neural Text2Human Generation🧬

👉Text-driven neural human generation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Full-body from a given human pose
✅Hierarchical texture-aware codebook
✅DeepFashion -> 44k Hi-Res images
✅Code and models available!

More: https://bit.ly/3Mdnpt0
🧨EfficientFormers: 1.6ms inference 🧨

👉Transformers as fast as MobileNet? Snap shows it on #iphone!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Low latency on mobile, high performance!
✅Revisiting the design of ViT through latency
✅New dimension-consistent design paradigm
✅EfficientFormers: a new ViT for mobile!

More: https://bit.ly/3MdgW15
Transformer-Based Sensor Fusion

👉Updating TransFuser (CVPR21): image + LiDAR representations with self-attention

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Existing approaches can't handle traffic 😢
✅Novel multi-modal fusion transformer
✅The new SOTA in driving performance
✅Reducing avg collisions per km by 48%
✅Insights on current limitations of E2E

More: https://bit.ly/391dmd6
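
The idea of fusing image and LiDAR representations with self-attention can be sketched in a few lines of NumPy: tokens from both sensors are concatenated into one sequence, so every token can attend to every other token regardless of modality. This is a simplified illustration, not the paper's architecture; the token counts, the feature size and the identity query/key/value projections are assumptions, and a real model would use learned projections, multiple heads and positional embeddings.

```python
import numpy as np

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = tokens.shape[-1]
    logits = tokens @ tokens.T / np.sqrt(d)
    logits = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)    # each row sums to 1
    return weights @ tokens, weights

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(4, 16))   # hypothetical: 4 image patches, 16-dim features
lidar_tokens = rng.normal(size=(6, 16))   # hypothetical: 6 LiDAR cells, 16-dim features

# Fusion step: attend over the concatenated sequence of both modalities.
fused, weights = self_attention(np.concatenate([image_tokens, lidar_tokens], axis=0))
fused_image, fused_lidar = fused[:4], fused[4:]   # split back per sensor
```

Each row of `weights` shows how much one token draws from image tokens (first 4 columns) versus LiDAR tokens (last 6 columns).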