AI with Papers - Artificial Intelligence & Deep Learning
15.4K subscribers
139 photos
253 videos
14 files
1.33K links
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
☄️STEVE: Slot-TransformEr for VidEos☄️

👉STEVE: unsupervised model for object-centric learning in videos

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Adoption of a slot decoder (SLATE)
SLATE with slot-level recurrence model
Complex and naturalistic videos
Significantly outperforms previous SOTA

More: https://bit.ly/3PNxxM3
🔥7👍1🤯1
🦔 CogVideo: insane text-to-clip 🦔

👉CogVideo: 9B parameters, the world's first large-scale open-source text-to-video model 😵

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Largest open-source T2C transformer
Finetuning of text-to-image model
Multi-frame-rate hierarchical training
From pretrained model CogView2

More: https://bit.ly/3Gzfl4n
🔥9👍6
🦄Time-Aware Neural Voxels🦄

👉TiNeuVox: "NeRF" with time-aware voxel features 😵

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Dynamic scene w/ optimizable structure
Temporal information in radiance net
Small/large motion w/ single-res features
192× faster than previous Hyper-NeRF

More: https://bit.ly/3wR4O08
👍11🔥2🤯1
🫐Neural Anomaly Detection by AWS🫐

👉Ultra-competitive inference and SOTA for both detection and localization

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Locally aggregated, mid-level patch features
Maximizing nominal information at test time
Reducing biases towards ImageNet classes
Image-level anomaly AUROC of up to 99.6%

More: https://bit.ly/3t7Ndjg
🔥7🤯3👍2
🛹 Project Skate from Google #AI 🛹

👉#AI tool to analyze a skateboarder's tricks in real time

More: https://bit.ly/3zbQS3M
🔥15🤩3👏1
🧬Neural Text2Human Generation🧬

👉Text-driven neural human generation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Full-body from a given human pose
Hierarchical texture-aware codebook
DeepFashion -> 44k Hi-Res images
Code and models available!

More: https://bit.ly/3Mdnpt0
🔥15👍1
🧨EfficientFormers: 1.6ms inference 🧨

👉Transformers as fast as MobileNet? Snap shows it on #iPhone!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Low latency on mobile, high performance!
Revisiting the design of ViT through latency
New dimension-consistent design paradigm
EfficientFormers: a new ViT for mobile!

More: https://bit.ly/3MdgW15
🔥16👍1🤯1
🐢 Transformer-Based Sens-Fusion 🐢

👉Updating TransFuser (CVPR21): image + LiDAR representations with self-attention

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Existing approach can't handle traffic 😢
Novel multi-modal fusion transformer
The new SOTA in driving performance
Reducing avg collisions per km by 48%
Insights on current limitations of E2E

More: https://bit.ly/391dmd6
👍11🔥2
🧘🏻‍♂️YogNet: neural yoga assistant🧘🏻‍♂️

👉Multi-person yoga neural expert for 20 asanas

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
CNNs & reg.LSTMs + 3D-CNNs
Multi-person asanas in real-time
YAR: dataset for yoga & posture
1206 videos, 2D RGB camera

More: https://bit.ly/3NncVbE
13👍1
🔴 Geogram: geometric algos in C++ 🔴

👉Novel open-source programming library with (research) geometric algorithms in C++

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Geometry Processing from #INRIA
30+ papers from SIGGRAPH, etc.
Grants: GOODSHAPE & VORPALINE
Code (mostly C++) under BSD 3

More: https://bit.ly/3mhS4L7
🔥6👍31
🍏 Open Source Vision from #Apple 🍏

👉CVNets: open-source (not a joke) lib for neural vision.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
PyTorch-based neural lib. for vision
Train 2−4× longer w/ augmentations
Plug-and-play components for CV
Source code under a custom license

More: https://bit.ly/39d1dSj
👍9
🏇🏻Neural Clips by #Nvidia: INSANE 🏇🏻

👉Neural generation with changes in camera viewpoint & content that arises over time 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel hierarchical generator architecture
Temp. receptive field + temporal embed.
Multi-res. with super-resolution network
SOTA in long clip with motion & changes
Code, data & models in August 2022 🏖️

More: https://bit.ly/3zroWsC
🤯9👎21
Zero to #Messi with #deeplearning

👉EA unveils a neural system to learn multiple soccer juggling skills 😍

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Learning difficult soccer juggling skills
Layer-wise mixture-of-experts architecture
Specialization arises naturally
Adaptive random walk training strategy

More: https://bit.ly/3mwRaL2
🔥7👍3
🏖️ HumanNeRF: source code is out! 🏖️

👉Pausing the video at any frame and rendering the subject from arbitrary views!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Synthesizing photorealistic humans
Synthesizing details, i.e. cloth & face
Volumetric canonical T-pose
Skeletal rigid/non-rigid decomposition

More: https://bit.ly/3NEkTNY
🤯17🔥5👍2
🎒 EG3D: source code is out! 🎒

👉#Nvidia just open-sourced EG3D: real-time multi-view faces w/ HQ #3D geometry!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Tri-plane-based 3D GAN framework
Pose-correlated attribute (expression)
SOTA in uncond. 3D-aware synthesis
Source code & models NOW available!

More: https://bit.ly/3aOfHs0
🔥7🤯6👍42
🔥One Millisecond Backbone. Fire!🔥

👉MobileOne by #Apple: efficient mobile backbone with inference <1 ms on #iPhone12!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
75.9% top-1 accuracy on ImageNet
38× faster than MobileFormer net
Classification, detection & segmentation
Source code & model soon available!

More: https://bit.ly/3tsT7f2
24👍2
🧨 Scaling Transformers to GigaPixels!🧨

👉Novel ViT called Hierarchical Image Pyramid Transformer (HIPT) -> Scaling to GigaPixels!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Gigapixel whole-slide imaging (WSI)
Leveraging natural hier. structure of WSI
Self-supervised Hi-Res representations
Source code and models available!

More: https://bit.ly/3xLuzkg
🤯16👍1
👗BodyMap: Hyper-Detailed Humans👗

👉#META unveils 1st-ever dense continuous correspondence for clothed humans

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
1st-ever dense continuous corresp.
HQ fingers, hair, and clothes
Novel ViT-based architecture
SOTA on DensePose COCO

More: https://bit.ly/39nEPps
👍132
🐹 NOAH just open-sourced! 🐹

👉A novel approach to find the optimal design of prompt modules through NAS algos.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
NOAH from Neural prOmpt seArcH
Parameter-efficient “prompt modules”
Efficient NAS-based implementation
Strong in transfer, few-shot & domain generalization

More: https://bit.ly/3MKfVhi
👍5👏2🥰1
🏄🏻‍♀️Neural Super-Resolution in Movies🏄🏻‍♀️

👉Implicit neural representation to get arbitrary spatial resolution & FPS -> Super Resolution!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Video as a continuous representation
Clips in arbitrary space/time resolution
OOD generalization in space-time
Source code and models available

More: https://bit.ly/3xsqccf
🔥6👍2