One-Image Object Detection

Delft University of Technology (+Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that approximates the target-domain style from a single, unlabeled target image. Code released.

Review https://t.ly/-li2G
Paper arxiv.org/pdf/2410.00900
Code github.com/RobinGerster7/OSSA
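OSSA's core move, approximating the target domain's style from a single unlabeled image, belongs to the family of feature-statistics alignment. Below is a minimal AdaIN-style sketch in NumPy (illustrative only, not OSSA's actual implementation; all names are mine):

```python
import numpy as np

def adain_style_align(src_feat, tgt_feat, eps=1e-5):
    """Align per-channel mean/std of source features to target statistics
    (AdaIN-style). src_feat, tgt_feat: (C, H, W) feature maps."""
    src_mu = src_feat.mean(axis=(1, 2), keepdims=True)
    src_sd = src_feat.std(axis=(1, 2), keepdims=True)
    tgt_mu = tgt_feat.mean(axis=(1, 2), keepdims=True)
    tgt_sd = tgt_feat.std(axis=(1, 2), keepdims=True)
    # Normalize source features, then re-scale/shift to the target's style stats.
    return (src_feat - src_mu) / (src_sd + eps) * tgt_sd + tgt_mu

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(8, 16, 16))
tgt = rng.normal(2.0, 3.0, size=(8, 16, 16))  # stands in for features of the single target image
out = adain_style_align(src, tgt)
print(np.allclose(out.mean(axis=(1, 2)), tgt.mean(axis=(1, 2)), atol=1e-3))  # True
```

After alignment, each channel of the source features carries the target image's first- and second-order statistics, which is one simple way to "borrow" a domain's style from one example.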
EVER: Ellipsoid Rendering

UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, and it achieves ~30 FPS at 720p on an #NVIDIA RTX 4090.

Review https://t.ly/zAfGU
Paper arxiv.org/pdf/2410.01804
Project half-potato.gitlab.io/posts/ever/
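For context, the emission-absorption rendering model that EVER makes exact can be sketched with the standard quadrature along a ray (EVER itself integrates ellipsoid primitives analytically instead of alpha-compositing sorted splats; this generic sketch is not EVER's renderer):

```python
import numpy as np

def emission_only_render(sigmas, colors, deltas):
    """Standard emission-absorption quadrature along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-interval opacity
    trans = np.exp(-np.cumsum(np.concatenate([[0.0], sigmas[:-1] * deltas[:-1]])))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights

sigmas = np.full(4, 0.5)
deltas = np.full(4, 1.0)
colors = np.tile(np.array([[1.0, 0.0, 0.0]]), (4, 1))  # a uniformly red medium
c, w = emission_only_render(sigmas, colors, deltas)
print(np.isclose(w.sum(), 1.0 - np.exp(-2.0)))  # weights sum to 1 - exp(-total optical depth)
```

The popping artifacts of 3DGS come from approximating this integral with per-splat sorted compositing; EVER's exact evaluation removes that sort-order dependence.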
"Deep Gen-AI" Full Course

A fresh course from Stanford on the probabilistic foundations and algorithms of deep generative models: an overview of the evolution of generative AI in #computervision, language, and more.

Review https://t.ly/ylBxq
Course https://lnkd.in/dMKH9gNe
Lectures https://lnkd.in/d_uwDvT6
EFM3D: 3D Ego-Foundation

#META presents EFM3D, the first benchmark for 3D object detection and surface regression on high-quality annotated egocentric data from Project Aria. Datasets & code released.

Review https://t.ly/cDJv6
Paper arxiv.org/pdf/2406.10224
Project www.projectaria.com/datasets/aeo/
Repo github.com/facebookresearch/efm3d
Gaussian Splatting VTON

GS-VTON is a novel image-prompted 3D virtual try-on (VTON) method: by using 3DGS as the 3D representation, it transfers pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Code announced.

Review https://t.ly/sTPbW
Paper arxiv.org/pdf/2410.05259
Project yukangcao.github.io/GS-VTON/
Repo github.com/yukangcao/GS-VTON
Diffusion Models Relighting

#Netflix unveils DifFRelight, a novel diffusion-based method for free-viewpoint facial relighting: precise lighting control and high-fidelity relit facial images from flat-lit inputs.

Review https://t.ly/fliXU
Paper arxiv.org/pdf/2410.08188
Project www.eyelinestudios.com/research/diffrelight.html
POKEFLEX: Soft Object Dataset

PokeFlex, from ETH Zurich, is a dataset of deformable objects that includes 3D textured meshes, point clouds, and RGB & depth maps. Pretrained models & dataset announced.

Review https://t.ly/GXggP
Paper arxiv.org/pdf/2410.07688
Project https://lnkd.in/duv-jS7a
Repo
DEPTH ANY VIDEO is out!

DAV is a novel foundation model for image/video depth estimation: the new SOTA for accuracy & consistency, running at up to 150 FPS!

Review https://t.ly/CjSz2
Paper arxiv.org/pdf/2410.10815
Project depthanyvideo.github.io/
Code github.com/Nightmare-n/DepthAnyVideo
Robo-Emulation via Video Imitation

OKAMI (UT & #Nvidia) is a novel method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

Review https://t.ly/_N29-
Paper arxiv.org/pdf/2410.11792
Project https://lnkd.in/d6bHF_-s
CoTracker3 by #META is out!

#Meta (+VGG Oxford) unveils CoTracker3, a new point tracker that outperforms the previous SOTA by a large margin using only 0.1% of the training data.

Review https://t.ly/TcRIv
Paper arxiv.org/pdf/2410.11831
Project cotracker3.github.io/
Code github.com/facebookresearch/co-tracker
Neural Metamorphosis

NUS unveils NeuMeta, which transforms neural nets by letting a single model adapt on the fly to different sizes, generating the right weights when needed.

Review https://t.ly/DJab3
Paper arxiv.org/pdf/2410.11878
Project adamdad.github.io/neumeta
Code github.com/Adamdad/neumeta
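A toy illustration of the "one model, any size" idea: represent weights as a continuous function of normalized coordinates, then sample it at whatever resolution is needed. This is only loosely inspired by NeuMeta, which learns such a representation over a smoothed weight manifold; every name and number below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny hypothetical "weight INR": an MLP mapping normalized (row, col)
# coordinates to a scalar weight, so one parameter set can be sampled
# into weight matrices of any size.
W1 = rng.normal(size=(2, 32)); b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1)); b2 = np.zeros(1)

def weight_inr(coords):
    h = np.tanh(coords @ W1 + b1)
    return (h @ W2 + b2)[:, 0]

def sample_weight_matrix(rows, cols):
    # Query the INR on a (rows x cols) grid of normalized coordinates.
    r = np.linspace(0.0, 1.0, rows)
    c = np.linspace(0.0, 1.0, cols)
    coords = np.stack(np.meshgrid(r, c, indexing="ij"), axis=-1).reshape(-1, 2)
    return weight_inr(coords).reshape(rows, cols)

small = sample_weight_matrix(4, 8)    # weights for a narrow layer
large = sample_weight_matrix(16, 32)  # weights for a wider layer, same INR
print(small.shape, large.shape)
```

Because both matrices come from the same continuous function, corresponding corners agree exactly, which is the continuity property that lets one model serve many sizes.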
GS + Depth = SOTA

DepthSplat is the new SOTA in depth estimation & novel view synthesis; its key feature is the cross-task interaction between Gaussian Splatting and depth estimation. Source code to be released soon.

Review https://t.ly/87HuH
Paper arxiv.org/abs/2410.13862
Project haofeixu.github.io/depthsplat/
Code github.com/cvg/depthsplat
BitNet: code of 1-bit LLM released

BitNet by #Microsoft, announced in late 2023, is a 1-bit Transformer architecture designed for LLMs: BitLinear acts as a drop-in replacement for the nn.Linear layer, so 1-bit weights can be trained from scratch. Source code just released.

Review https://t.ly/3G2LA
Paper arxiv.org/pdf/2310.11453
Code https://lnkd.in/duPADJVb
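Per the paper, BitLinear binarizes weights with a mean-centered sign plus an absmean rescale (training relies on a straight-through estimator, omitted here). A forward-pass-only NumPy sketch, not Microsoft's released implementation:

```python
import numpy as np

def binarize_weights(W):
    """BitNet-style weight binarization (forward pass only): center the
    full-precision weights, take the sign, and rescale by the mean
    absolute value so the 1-bit matrix matches W's magnitude."""
    alpha = W.mean()
    Wb = np.sign(W - alpha)
    Wb[Wb == 0] = 1.0  # map sign(0) to +1 so weights stay in {-1, +1}
    beta = np.abs(W).mean()
    return Wb, beta

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
x = rng.normal(size=128)
Wb, beta = binarize_weights(W)
y = beta * (Wb @ x)  # 1-bit matmul followed by a per-tensor rescale
print(set(np.unique(Wb)) == {-1.0, 1.0})  # True
```

The point of the drop-in design is that `y` has the same shape and role as a full-precision `nn.Linear` output while the weight matrix itself needs only one bit per entry.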
Look Ma, no markers

#Microsoft unveils the first technique for marker-free, high-quality reconstruction of the COMPLETE human body, including eyes and tongue, without requiring any calibration, manual intervention, or custom hardware. Impressive results! Training repo & dataset released.

Review https://t.ly/5fN0g
Paper arxiv.org/pdf/2410.11520
Project microsoft.github.io/SynthMoCap/
Repo github.com/microsoft/SynthMoCap
PL2Map: efficient neural 2D-3D

PL2Map is a novel neural network tailored for efficient representation of complex point & line maps: a natural representation of 2D-3D correspondences.

Review https://t.ly/D-bVD
Paper arxiv.org/pdf/2402.18011
Project https://thpjp.github.io/pl2map
Code https://github.com/ais-lab/pl2map
Plant Camouflage Detection

PlantCamo is the first dataset for plant camouflage detection: 1,250 images with camouflage characteristics. Source code released.

Review https://t.ly/pYFX4
Paper arxiv.org/pdf/2410.17598
Code github.com/yjybuaa/PlantCamo
SMITE: Segment in Time

SFU unveils SMITE, a novel model that, given only one or a few fine-grained segmentation references, can segment unseen videos consistently with those references. Dataset & code (under Apache 2.0) announced.

Review https://t.ly/w6aWJ
Paper arxiv.org/pdf/2410.18538
Project segment-me-in-time.github.io/
Repo github.com/alimohammadiamirhossein/smite
Blendify: #Python + Blender

A lightweight Python framework that provides a high-level API for creating & rendering scenes with #Blender, simplifying data augmentation & synthesis. Source code released.

Review https://t.ly/l0crA
Paper https://arxiv.org/pdf/2410.17858
Code https://virtualhumans.mpi-inf.mpg.de/blendify/
D-FINE: new SOTA Detector

D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding-box regression task in DETR models as fine-grained distribution refinement. New SOTA on MS COCO with additional data. Code & models available.

Review https://t.ly/aw9fN
Paper https://arxiv.org/pdf/2410.13842
Code https://github.com/Peterande/D-FINE
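Distribution-style box regression, the formulation D-FINE refines, predicts each edge offset as a categorical distribution over discrete bins and reads out its expectation. A generic NumPy sketch of that readout (not D-FINE's fine-grained distribution refinement itself; `reg_max` and the logits are illustrative):

```python
import numpy as np

def expected_offset(logits, reg_max=8):
    """Each box edge is predicted as a categorical distribution over
    reg_max+1 discrete bins; the continuous offset is its expectation,
    so the network can express sub-bin values."""
    bins = np.arange(reg_max + 1, dtype=float)  # candidate offsets 0..reg_max
    p = np.exp(logits - logits.max())           # numerically stable softmax
    p /= p.sum()
    return (p * bins).sum()

# Logits peaked equally between bins 3 and 4 yield a sub-bin offset near 3.5.
logits = np.full(9, -10.0)
logits[3] = logits[4] = 5.0
print(round(float(expected_offset(logits)), 2))  # prints 3.5
```

Because the readout is an expectation over a learned distribution, the model can also express its localization uncertainty, which is what refinement schemes exploit.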
  
AI with Papers - Artificial Intelligence & Deep Learning

Free-Moving Reconstruction

EPFL (+#MagicLeap) unveils a novel approach for reconstructing a free-moving object from a monocular RGB clip: free interaction with objects in front of a moving camera, without relying on any prior, optimizing the sequence globally…

Repo github.com/HaixinShi/fmov_pose: official implementation of Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera (AAAI 2025)