AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
14 files
1.31K links
All the AI with papers. Fresh updates every day on #DeepLearning, #MachineLearning, LLMs, and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🧠 Single Neuron Reconstruction 🧠

👉SIAT unveils NeuroFly, a framework for large-scale single-neuron reconstruction. It formulates the reconstruction task as a streamlined three-stage workflow: automatic segmentation → connection → manual proofreading. Bridging computer vision and neuroscience 💙

👉Review https://t.ly/Y5Xu0
👉Paper https://arxiv.org/pdf/2411.04715
👉Repo github.com/beanli161514/neurofly
🫠 X-Portrait 2: SOTA(?) Portrait Animation 🫠

👉ByteDance unveils a preview of X-Portrait 2, the new SOTA expression-encoder model that implicitly captures even the most minuscule expressions from the input, trained on large-scale datasets. Impressive results, but no paper or code announced.

👉Review https://t.ly/8Owh9 [UPDATE]
👉Paper ?
👉Project byteaigc.github.io/X-Portrait2/
👉Repo ?
❄️Don't Look Twice: ViT by RLT❄️

👉CMU unveils RLT, which speeds up video transformers, inspired by run-length encoding for data compression. It accelerates training and reduces the token count by up to 80%! Source code announced 💙

👉Review https://t.ly/ccSwN
👉Paper https://lnkd.in/d6VXur_q
👉Project https://lnkd.in/d4tXwM5T
👉Repo TBA
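Run-length tokenization borrows an old compression idea: repeated content needs only one token. A minimal sketch of that idea (the threshold, names, and patch-embedding setup here are mine, not the paper's):

```python
import numpy as np

def run_length_tokenize(frames, threshold=0.05):
    """Toy run-length tokenization: keep a patch token only when it
    differs enough from the same patch in the previous frame.
    frames: (T, N, D) array of T frames, N patch embeddings of dim D.
    Returns the kept (frame_idx, patch_idx) pairs."""
    T, N, _ = frames.shape
    kept = [(0, n) for n in range(N)]          # first frame: keep all patches
    for t in range(1, T):
        diff = np.abs(frames[t] - frames[t - 1]).mean(axis=1)  # (N,)
        for n in np.nonzero(diff > threshold)[0]:
            kept.append((t, int(n)))           # keep only "changed" patches
    return kept

# Static video: only the first frame's tokens survive.
static = np.zeros((4, 6, 8))
print(len(run_length_tokenize(static)))  # 6 tokens instead of 24
```

On a static clip only the first frame's tokens survive, illustrating how temporal redundancy shrinks the token count.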
πŸ”SeedEdit: foundational T2IπŸ”

👉ByteDance unveils a novel foundational T2I model capable of delivering stable, high-aesthetic image edits that maintain image quality through unlimited rounds of editing instructions. No code announced, but a demo is online 💙

👉Review https://t.ly/hPlnN
👉Paper https://arxiv.org/pdf/2411.06686
👉Project team.doubao.com/en/special/seededit
🤗Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP
🔥 4-Nanosecond Inference 🔥

👉LogicTreeNet: convolutional differentiable logic gate networks with logic gate tree kernels, bringing computer vision into differentiable LGNs. Up to 61× smaller than SOTA, with inference in 4 NANOseconds!

👉Review https://t.ly/GflOW
👉Paper https://lnkd.in/dAZQr3dW
👉Full clip https://lnkd.in/dvDJ3j-u
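The core trick behind differentiable logic gate networks is relaxing each Boolean gate into a probability-weighted mixture, so the gate choice itself can be learned by gradient descent. A toy sketch (the gate set and names are illustrative, not the paper's):

```python
import numpy as np

def soft_gate(a, b, w):
    """Differentiable relaxation of a 2-input logic gate.
    a, b in [0, 1]; w: logits over candidate gates {AND, OR, XOR, NOT a}.
    The output is a softmax-weighted mixture of all candidate gates,
    so gradients can flow into the gate-choice logits w."""
    p = np.exp(w - w.max())
    p /= p.sum()                       # softmax over gate choices
    gates = np.array([
        a * b,                         # AND
        a + b - a * b,                 # OR
        a + b - 2 * a * b,             # XOR
        1.0 - a,                       # NOT a
    ])
    return float(p @ gates)

# With all weight on AND, the relaxation matches Boolean AND on {0, 1}.
hard_and = np.array([100.0, 0.0, 0.0, 0.0])
print(soft_gate(1.0, 1.0, hard_and))  # ~1.0
```

After training, each softmax is collapsed to its argmax gate, leaving a pure logic circuit, which is what enables hardware-level inference speeds.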
🛥️ Global Tracklet Association MOT 🛥️

👉A novel universal, model-agnostic method designed to refine and enhance tracklet association in single-camera MOT. Suitable for datasets such as SportsMOT, SoccerNet, and similar. Source code released 💙

👉Review https://t.ly/gk-yh
👉Paper https://lnkd.in/dvXQVKFw
👉Repo https://lnkd.in/dEJqiyWs
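A model-agnostic tracklet-association refinement of this kind can be sketched as merging tracklets whose appearance embeddings agree; everything below (the greedy rule, the threshold, the names) is a toy illustration, not the paper's algorithm:

```python
import numpy as np

def merge_tracklets(embs, sim_thresh=0.8):
    """Greedily merge tracklets whose mean appearance embeddings have
    cosine similarity above a threshold, fusing fragments of one identity.
    embs: dict tracklet_id -> (K, D) array of per-frame embeddings.
    Returns dict tracklet_id -> merged group id."""
    ids = sorted(embs)
    means = {i: embs[i].mean(axis=0) for i in ids}
    means = {i: v / (np.linalg.norm(v) + 1e-9) for i, v in means.items()}
    group = {i: i for i in ids}
    for a_pos, a in enumerate(ids):
        for b in ids[a_pos + 1:]:
            if group[b] != b:
                continue                      # already merged elsewhere
            if float(means[a] @ means[b]) > sim_thresh:
                group[b] = group[a]           # same identity: fuse tracklets
    return group

rng = np.random.default_rng(0)
base = rng.normal(size=4)
t = {0: base + rng.normal(scale=0.01, size=(5, 4)),
     1: base + rng.normal(scale=0.01, size=(5, 4)),
     2: -base + rng.normal(scale=0.01, size=(5, 4))}
print(merge_tracklets(t))  # tracklets 0 and 1 fuse; 2 stays separate
```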
πŸ‘10πŸ”₯4❀2
This media is not supported in your browser
VIEW IN TELEGRAM
🧶 MagicQuill: super-easy Diffusion Editing 🧶

👉MagicQuill is a novel system designed to support users in smart image editing. A robust UI/UX (e.g., inserting/erasing objects, changing colors, etc.) backed by a multimodal LLM that anticipates user intentions in real time. Code & demos released 💙

👉Review https://t.ly/hJyLa
👉Paper https://arxiv.org/pdf/2411.09703
👉Project https://magicquill.art/demo/
👉Repo https://github.com/magic-quill/magicquill
👉Demo https://huggingface.co/spaces/AI4Editing/MagicQuill
🧰 EchoMimicV2: Semi-body Human 🧰

👉Alipay (Ant Group) unveils EchoMimicV2, the new SOTA in half-body human animation via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced 💙

👉Review https://t.ly/enLxJ
👉Paper arxiv.org/pdf/2411.10061
👉Project antgroup.github.io/ai/echomimic_v2/
👉Repo-v2 github.com/antgroup/echomimic_v2
👉Repo-v1 https://github.com/antgroup/echomimic
βš”οΈSAMurai: SAM for Trackingβš”οΈ

👉The University of Washington unveils SAMURAI, an enhanced adaptation of SAM 2 designed specifically for visual object tracking. New SOTA! Code under Apache 2.0 💙

👉Review https://t.ly/yGU0P
👉Paper https://arxiv.org/pdf/2411.11922
👉Repo https://github.com/yangchris11/samurai
👉Project https://yangchris11.github.io/samurai/
🦖DINO-X: Unified Obj-Centric LVM🦖

👉A unified object-centric vision model for open-world detection, segmentation, phrase grounding, visual counting, pose estimation, prompt-free detection/recognition, dense captioning, and more. Demo & API announced 💙

👉Review https://t.ly/CSQon
👉Paper https://lnkd.in/dc44ZM8v
👉Project https://lnkd.in/dehKJVvC
👉Repo https://lnkd.in/df8Kb6iz
🌎All Languages Matter: LMMs vs. 100 Lang.🌎

👉ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & dataset 💙

👉Review https://t.ly/VsoJB
👉Paper https://lnkd.in/ddVVZfi2
👉Project https://lnkd.in/dpssaeRq
👉Code https://lnkd.in/dnbaJJE4
👉Dataset https://lnkd.in/drw-_95v
🦙 EdgeCape: SOTA Agnostic Pose 🦙

👉EdgeCape: the new SOTA in Category-Agnostic Pose Estimation (CAPE): localizing keypoints across diverse object categories using only one or a few annotated support images. Source code released 💙

👉Review https://t.ly/4TpAs
👉Paper https://arxiv.org/pdf/2411.16665
👉Project https://orhir.github.io/edge_cape/
👉Code https://github.com/orhir/EdgeCape
🛟 StableAnimator: ID-aware Humans 🛟

👉StableAnimator: the first end-to-end ID-preserving diffusion model for HQ human videos without any post-processing. Input: a single image + a sequence of poses. Insane results!

👉Review https://t.ly/JDtL3
👉Paper https://arxiv.org/pdf/2411.17697
👉Project francis-rings.github.io/StableAnimator/
👉Code github.com/Francis-Rings/StableAnimator
πŸ‘12❀3🀯2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🧶SOTA track-by-propagation🧶

👉SambaMOTR is a novel end-to-end model (based on Samba) that captures long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code in Jan. '25 💙

👉Review https://t.ly/QSQ8L
👉Paper arxiv.org/pdf/2410.01806
👉Project sambamotr.github.io/
👉Repo https://lnkd.in/dRDX6nk2
👺HiFiVFS: Extreme Face Swapping👺

👉HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, but no code announced 😢

👉Review https://t.ly/ea8dU
👉Paper https://arxiv.org/pdf/2411.18293
👉Project https://cxcx1996.github.io/HiFiVFS
🔥Video Depth without Video Models🔥

👉RollingDepth: turning a single-image latent diffusion model (LDM) into the new SOTA video depth estimator. It works better than dedicated video-depth models 🤯 Code under Apache 💙

👉Review https://t.ly/R4LqS
👉Paper https://arxiv.org/pdf/2411.19189
👉Project https://rollingdepth.github.io/
👉Repo https://github.com/prs-eth/rollingdepth
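Single-image depth snippets are only defined up to scale and shift, so stitching them into one consistent video requires an alignment step. A least-squares toy version of that step (the paper's robust global optimization is more involved):

```python
import numpy as np

def align_snippets(d_ref, d_new):
    """Fit scale s and shift t minimizing ||s * d_new + t - d_ref||^2
    on an overlapping frame, then map d_new into d_ref's depth frame.
    This resolves the affine ambiguity of per-snippet depth."""
    A = np.stack([d_new.ravel(), np.ones(d_new.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_ref.ravel(), rcond=None)
    return s * d_new + t

truth = np.linspace(1.0, 5.0, 16).reshape(4, 4)
snippet = 0.5 * truth - 0.2          # same depth, different scale/shift
aligned = align_snippets(truth, snippet)
print(np.allclose(aligned, truth))   # True
```

Chaining this pairwise alignment across overlapping snippets is one simple way to get a globally consistent depth video out of a single-image model.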
⚽Universal Soccer Foundation Model⚽

👉Universal Soccer Video Understanding: SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released 💙

👉Review https://t.ly/-X90B
👉Paper https://arxiv.org/pdf/2412.01820
👉Project https://jyrao.github.io/UniSoccer/
👉Repo https://github.com/jyrao/UniSoccer
🌈Motion Prompting Video Generation🌈

👉DeepMind unveils Motion Prompting, a novel ControlNet-style video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced 😢

👉Review https://t.ly/VyKbv
👉Paper arxiv.org/pdf/2412.02700
👉Project motion-prompting.github.io
🦘AniGS: Single Pic Animatable Avatar🦘

👉#Alibaba unveils AniGS: given a single human image as input, it reconstructs a Hi-Fi 3D avatar in a canonical pose, which can be used for both photorealistic rendering & real-time animation. Source code announced, to be released 💙

👉Review https://t.ly/4yfzn
👉Paper arxiv.org/pdf/2412.02684
👉Project lingtengqiu.github.io/2024/AniGS/
👉Repo github.com/aigc3d/AniGS
🧤GigaHands: Massive #3D Hands🧤

👉A novel, massive #3D bimanual-activity dataset: 34 hours of activities, 14k hand-motion clips paired with 84k text annotations, and 183M+ unique hand images.

👉Review https://t.ly/SA0HG
👉Paper www.arxiv.org/pdf/2412.04244
👉Repo github.com/brown-ivl/gigahands
👉Project ivl.cs.brown.edu/research/gigahands.html
🦒 Track4Gen: Diffusion + Tracking 🦒

👉Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced 😢

👉Review https://t.ly/9ujhc
👉Paper arxiv.org/pdf/2412.06016
👉Project hyeonho99.github.io/track4gen/
👉Gallery hyeonho99.github.io/track4gen/full.html
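The training signal described above can be caricatured as a weighted sum of a denoising loss and a feature-matching loss over tracked points; the names, shapes, and weighting below are illustrative only, not the paper's:

```python
import numpy as np

def track4gen_style_loss(noise_pred, noise, feats, corr, lam=0.5):
    """Toy combination of the two training signals: a standard diffusion
    denoising MSE plus a tracking loss pulling diffusion features of
    corresponding points across frames together.
    feats: (T, N, D) per-frame point features; corr: list of
    (t0, i, t1, j) corresponding-point index pairs."""
    l_diff = float(np.mean((noise_pred - noise) ** 2))   # denoising MSE
    l_track = float(np.mean([
        np.sum((feats[t0, i] - feats[t1, j]) ** 2)       # feature match
        for t0, i, t1, j in corr
    ]))
    return l_diff + lam * l_track

rng = np.random.default_rng(1)
feats = rng.normal(size=(2, 3, 4))
feats[1, 0] = feats[0, 0]                  # a perfectly tracked point
loss = track4gen_style_loss(np.zeros(5), np.zeros(5), feats, [(0, 0, 1, 0)])
print(loss)  # 0.0: both terms vanish
```

The tracking term is what injects the "spatially aware" supervision: features of the same physical point must agree across frames.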