AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI

🔄 Orient Anything in 3D 🔄

👉Orient Anything is a novel, robust image-based object orientation estimation model. Trained on 2M rendered, labeled images, it achieves strong zero-shot generalization in the wild. Code released💙

👉Review https://t.ly/ro5ep
👉Paper arxiv.org/pdf/2412.18605
👉Project orient-anything.github.io/
👉Code https://lnkd.in/d_3k6Nxz

⭐TOP 10 Papers you loved - 2024⭐

👉Here is the list of my posts you liked the most in 2024. Thank you all 💙

Papers:
⭐"Look Ma, no markers"
⭐T-Rex 2 Detector
⭐Models at Any Resolution

👉The full list with links: https://t.ly/GvQVy

🌳 HD Video Object Insertion 🌳

👉VideoAnydoor is a novel zero-shot video object insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.

👉Review https://t.ly/hyvRq
👉Paper arxiv.org/pdf/2501.01427
👉Project videoanydoor.github.io/
👉Repo TBA

⭐ Poll Alert!! ⭐

[EDIT] see below

What is your favorite source for AI updates?
Final Results
LinkedIn: 32%
Instagram: 4%
Reddit: 3%
Telegram: 52%

🥮 SOTA probabilistic tracking 🥮

👉ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial💙

👉Review https://t.ly/YY_PH
👉Paper https://arxiv.org/pdf/2501.03220
👉Project michaelszj.github.io/protracker/
👉Code github.com/Michaelszj/pro-tracker

🧤World-Space Ego 3D Hands🧤

👉Imperial College unveils HaWoR, a novel world-space 3D hand motion estimation method for egocentric videos. The new SOTA on both cam pose estimation & hand motion reconstruction. Code under Attribution-NC-ND 4.0 Int.💙

👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR

🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
🤲Portable Training Workstation: 24%
⚛️Nuclear energy for AI training: 34%
🖲️Cheaper Only-inference devices: 33%
💰Cloud-intensive Only-inference: 9%

⚽ FIFA 3D Human Pose ⚽

👉#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙

👉Review https://t.ly/kvGVQ
👉Paper arxiv.org/pdf/2501.02771
👉Project https://lnkd.in/d5hFWpY2
👉Dataset https://lnkd.in/dAphJ9WA

🔥 Depth Any Camera (SOTA) 🔥

👉DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cams with varying FoVs (including large fisheye & 360°). Code announced (not available yet)💙

👉Review https://t.ly/1qz4F
👉Paper arxiv.org/pdf/2501.02464
👉Project yuliangguo.github.io/depth-any-camera/
👉Repo github.com/yuliangguo/depth_any_camera

❤️‍🔥 Uncommon object in #3D ❤️‍🔥

👉#META releases uCO3D, a new object-centric dataset for 3D AI. The largest publicly-available collection of HD videos of objects with 3D annotations that ensures full-360° coverage. Code & data under CCA 4.0💙

👉Review https://t.ly/Z_tvA
👉Paper https://arxiv.org/pdf/2501.07574
👉Project https://uco3d.github.io/
👉Repo github.com/facebookresearch/uco3d

🏆Universal Detector-Free Match🏆

👉MatchAnything: a novel detector-free universal matcher across unseen real-world single/cross-modality domains. Same weights for everything. Code announced, to be released 💙

👉Review https://t.ly/sx92L
👉Paper https://lnkd.in/dWwRwGyY
👉Project https://lnkd.in/dCwb2Yte
👉Repo https://lnkd.in/dnUXYzQ5

🆘 Help: Looking for Outstanding Speakers 🆘

👉Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only “hardcore” technical talks, no commercial content at all. Please comment here with name, topic and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).

⭐Guaranteed tickets & more for the suggestions that become invited speakers ;)

🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon

🔥 GAGA: Group Any Gaussians 🔥

👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙

👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga

🎁Free Book: LLM Foundations🎁

👉A fully free book just released on arXiv that outlines the basic concepts of #LLMs and related techniques, with a focus on the foundational aspects.

✅Chapter 1: basics of pre-training
✅Chapter 2: gen-models & LLMs
✅Chapter 3: prompting methods
✅Chapter 4: alignment methods

👉If you have some background in ML, along with a certain understanding of stuff like Transformers, this book will be "smooth". Even without this prior knowledge it is still perfectly fine, because the contents of each chapter are self-contained.

👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf

🏄‍♀️ GSTAR: Gaussian Surface Tracking 🏄‍♀️

👉ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced💙

👉Review https://t.ly/udpMq
👉Paper arxiv.org/pdf/2501.10283
👉Project chengwei-zheng.github.io/GSTAR/
👉Repo TBA

🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on stable diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser

🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It comes with a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master