Deep Clustering on ImageNet & Co.
👉 World's first deep nonparametric clustering on large datasets such as ImageNet
Highlights:
✅ Deep clustering that infers the number of clusters
✅ Loss: amortized inference in mixture models
✅ Deep nonparametric clustering on ImageNet
✅ Code and model available under MIT license
More: https://bit.ly/38p62rn
HQ-E²FGVI just released
👉 Flow-Guided Video Inpainting through three trainable modules
Highlights:
✅ Flow, pixel propagation, content hallucination
✅ Three stage-modules, jointly optimized
✅ The new SOTA, promising efficiency
✅ Code and models under MIT license
More: https://bit.ly/3Ln0ICj
AvatarCLIP: Text-Driven Avatar
👉 Zero-shot text-driven generation and animation of #3D avatars in the #metaverse
Highlights:
✅ First text-driven synthesis
✅ Shape, texture, and motion
✅ Animation-ready, HQ texture/geometry
✅ Zero-shot text-guided reference-based motion
✅ Code and model under MIT license
More: https://bit.ly/3LjTWgB
#AIwithPapers: we are 2,500!
👉 Only 2 billion papers remaining on arXiv. The more we are, the faster we read
👉 Invite your friends -> https://t.iss.one/AI_DeepLearning
Podcasting AI & CV
👉 For people fluent in Italian: a 1-hour podcast in which I talk about AI, CV, startups and more (including this wonderful project).
More: https://bit.ly/38DtBwB
Inpainting: new SOTA! INSANE
👉 Novel two-stream approach: inpainting at the next level!
Highlights:
✅ High frequencies locally, low frequencies globally
✅ Local to global -> error correction
✅ 44% / 26% improvements in FID/scores
✅ Source code and more clips available
More: https://bit.ly/3ltIX9R
Super-Human Crossword Solver
👉 Solving crosswords, outperforming the best humans
Highlights:
✅ Crossword solving based on NNs
✅ Q&A, structured decoding, local search
✅ Wide domains with perfect accuracy
✅ Large question-answer dataset
More: https://bit.ly/3a3zzqQ
Imagen: far beyond DALL·E 2
👉 #Google: unprecedented photorealism and a deep level of language understanding
Highlights:
✅ Dynamic thresholding for diffusion sampling
✅ Efficient U-Net, efficient++ variant
✅ DrawBench, a new text-to-image benchmark
✅ The new SOTA, COCO FID of 7.27
More: https://bit.ly/3lVtkbz
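For readers curious about the dynamic thresholding highlight, here is a minimal pure-Python sketch of the idea as described in the Imagen paper: at each sampling step, pick a threshold s from a percentile of the absolute predicted pixel values, and if s > 1, clip to [-s, s] and divide by s. The function name `dynamic_threshold`, the default percentile and the nearest-rank percentile rule are assumptions for illustration, not the official implementation.

```python
def dynamic_threshold(x0, percentile=0.995):
    """Clip a flat list of predicted pixel values to [-s, s], then rescale by s.

    `percentile` is an illustrative default (an assumption, not a confirmed setting).
    """
    magnitudes = sorted(abs(v) for v in x0)
    # Nearest-rank style percentile of the absolute values (assumed convention).
    index = min(len(magnitudes) - 1, int(percentile * len(magnitudes)))
    # Never go below 1.0, so predictions already inside [-1, 1] are left unchanged.
    s = max(magnitudes[index], 1.0)
    return [max(-s, min(s, v)) / s for v in x0]


# Hypothetical usage on a tiny "image" with one saturated pixel.
print(dynamic_threshold([0.5, -2.0, 4.0, 1.0], percentile=0.5))
```

With s fixed at 1.0 this reduces to plain static clipping; the percentile-based s is what makes the threshold adapt to each prediction.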
Tracking over SOTA detectors
👉 Lightweight Python lib for real-time 2D object tracking
Highlights:
✅ Layer of tracking over SOTA detectors
✅ Suitable for complex video processing
✅ Source code under BSD 3-Clause
✅ Maintained by Tryolabs team
More: https://bit.ly/3wKtGqg
FCA: #3D Neural Camouflage
👉 #3D full-camouflage adversarial patch to fool neural detectors
Highlights:
✅ Attack via differentiable neural rendering
✅ E2E physical adversarial attack
✅ Environments, vehicles & detectors
✅ Source code available!
More: https://bit.ly/38kKyfa
One-Shot Object Pose
👉 A novel one-shot object pose estimator
Highlights:
✅ Visual localization pipeline for object pose
✅ Handling novel objects without CAD model
✅ Novel graph attention for 2D-3D matching
✅ Large dataset for one-shot object pose
More: https://bit.ly/3MTogjJ
STEVE: Slot-TransformEr for VidEos
👉 STEVE: unsupervised model for object-centric learning in videos
Highlights:
✅ Adoption of a slot decoder (SLATE)
✅ SLATE with slot-level recurrence model
✅ Complex and naturalistic videos
✅ Significantly outperforms previous SOTA
More: https://bit.ly/3PNxxM3
CogVideo: insane text-to-clip
👉 CogVideo: 9B-parameter, world's first large-scale open-source text-to-video model
Highlights:
✅ Largest open-source text-to-clip (T2C) transformer
✅ Finetuning of a text-to-image model
✅ Multi-frame-rate hierarchical training
✅ From pretrained model CogView2
More: https://bit.ly/3Gzfl4n
Time-Aware Neural Voxels
👉 TiNeuVox: "NeRF" with time-aware voxel features
Highlights:
✅ Dynamic scenes w/ optimizable structure
✅ Temporal information in radiance net
✅ Small/large motion w/ single-resolution features
✅ 192× faster than previous Hyper-NeRF
More: https://bit.ly/3wR4O08
Neural Anomaly Detection by AWS
👉 Ultra-competitive inference and SOTA for both detection and localization
Highlights:
✅ Locally aggregated, mid-level patch features
✅ Maximizing nominal information at test time
✅ Reducing biases towards ImageNet classes
✅ Image-level anomaly AUROC of up to 99.6%
More: https://bit.ly/3t7Ndjg
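As a quick reminder of what the image-level AUROC figure measures, here is a small sketch: AUROC is the probability that a randomly chosen anomalous image gets a higher anomaly score than a randomly chosen nominal one, with ties counted as half. The function name `auroc` and the toy scores are made up for illustration and are not taken from the paper or its code.

```python
def auroc(scores, labels):
    """Pairwise AUROC: labels are 1 for anomalous images, 0 for nominal ones."""
    anomalous = [s for s, y in zip(scores, labels) if y == 1]
    nominal = [s for s, y in zip(scores, labels) if y == 0]
    if not anomalous or not nominal:
        raise ValueError("need at least one anomalous and one nominal sample")
    # Count anomalous/nominal pairs ranked correctly; a tie counts as half a win.
    wins = sum(
        1.0 if a > n else 0.5 if a == n else 0.0
        for a in anomalous
        for n in nominal
    )
    return wins / (len(anomalous) * len(nominal))


# Toy example with invented scores: one anomalous image is ranked below a nominal one.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

This quadratic pairwise form is fine for a handful of images; for real evaluation sets a rank-based implementation from a metrics library is the usual choice.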
Project Skate from Google #AI
👉 #AI tool to analyze skateboarders' tricks in real time
More: https://bit.ly/3zbQS3M
Neural Text2Human Generation
👉 Text-driven neural human generation
Highlights:
✅ Full-body from a given human pose
✅ Hierarchical texture-aware codebook
✅ DeepFashion -> 44k Hi-Res images
✅ Code and models available!
More: https://bit.ly/3Mdnpt0
EfficientFormers: 1.6ms inference
👉 Transformers as fast as MobileNet? Snap shows that on #iphone!
Highlights:
✅ Low latency on mobile, high performance!
✅ Revisiting the design of ViT through latency
✅ New dimension-consistent design paradigm
✅ EfficientFormers: a new ViT for mobile!
More: https://bit.ly/3MdgW15
Transformer-Based Sensor Fusion
👉 Updating TransFuser (CVPR21): image + LiDAR representations with self-attention
Highlights:
✅ Existing approach can't handle traffic
✅ Novel multi-modal fusion transformer
✅ The new SOTA in driving performance
✅ Reducing avg collisions per km by 48%
✅ Insights on current limitations of E2E
More: https://bit.ly/391dmd6