Machine Learning Projects
Reconstructing 3D Human Pose by Watching Humans in the Mirror

In this paper, we introduce the new task of reconstructing 3D human pose from a single image that shows both the person and the person's reflection in a mirror.
https://github.com/zju3dv/EasyMocap
Pretraining is All You Need for Image-to-Image Translation

We propose to use pretraining to boost general image-to-image translation.


https://github.com/PITI-Synthesis/PITI
Elucidating the Design Space of Diffusion-Based Generative Models

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.


https://github.com/lucidrains/imagen-pytorch
Ivy: Templated Deep Learning for Inter-Framework Portability

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.


https://github.com/ivy-dl/ivy
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.

https://github.com/compvis/latent-diffusion
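
The retrieval step described above can be sketched roughly as follows. This is only an illustration of nearest-neighbor lookup over embedding vectors, assuming cosine similarity as the metric; the function names `cosine` and `retrieve_neighbors` and the toy vectors are hypothetical and are not taken from the latent-diffusion repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_neighbors(query, database, k):
    """Return the indices of the k database vectors most similar to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine(query, database[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy database of 2-D embeddings (assumption, for illustration only).
database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
neighbors = retrieve_neighbors([1.0, 0.0], database, k=2)
```

In a setup like the one the abstract describes, the retrieved neighbors would then be passed to the diffusion model as conditioning; that part is not shown here.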
Flow-Guided Transformer for Video Inpainting

In the spatial transformer in particular, we design a dual-perspective spatial MHSA (multi-head self-attention), which integrates global tokens into the window-based attention.
https://github.com/hitachinsk/fgt
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

📝We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance.
https://github.com/timdettmers/bitsandbytes
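
To make the idea of Int8 matrix multiplication concrete, here is a minimal sketch, assuming plain absmax quantization with one scale per row of the left matrix and one per column of the right matrix. It is a generic illustration only: it is not the paper's full procedure and not the bitsandbytes API, and the names `quantize_rows` and `int8_matmul` are hypothetical.

```python
def quantize_rows(matrix):
    """Absmax-quantize each row into the int8 range [-127, 127].

    Returns the integer rows and the per-row scale needed to dequantize.
    """
    q_rows, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0  # all-zero row: avoid dividing by zero
        scale = absmax / 127.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_matmul(a, b):
    """Approximate a @ b by multiplying quantized integers, then rescaling."""
    b_cols = [list(col) for col in zip(*b)]  # quantize b column by column
    qa, sa = quantize_rows(a)
    qb, sb = quantize_rows(b_cols)
    out = []
    for i, row in enumerate(qa):
        out_row = []
        for j, col in enumerate(qb):
            acc = sum(x * y for x, y in zip(row, col))  # integer accumulation
            out_row.append(acc * sa[i] * sb[j])  # dequantize with both scales
        out.append(out_row)
    return out


a = [[1.0, -2.0], [0.5, 0.25]]
b = [[3.0, 0.0], [1.0, -1.0]]
approx = int8_matmul(a, b)  # close to the exact product [[1.0, 2.0], [1.75, -0.25]]
```

The result differs from the exact product only by rounding error, which is the trade-off that 8-bit storage makes in exchange for lower memory use.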
KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints

📝In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views.
https://github.com/facebookresearch/KeypointNeRF
StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

📝Notably, StyleFaceV is capable of generating realistic $1024\times1024$ face videos even without high-resolution training videos.
https://github.com/arthur-qiu/stylefacev
Multi-instrument Music Synthesis with Spectrogram Diffusion

📝An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes.

https://github.com/magenta/music-spectrogram-diffusion
YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

📝Over the last decade, multi-task learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance.

https://github.com/CAIC-AD/YOLOPv2
Online Decision Transformer

📝Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling.

https://github.com/facebookresearch/online-dt
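
As a rough illustration of what "RL as sequence modeling" can look like, the sketch below turns one trajectory into a flat token sequence of (return-to-go, state, action) triples, the representation used in the Decision Transformer line of work cited above. It assumes undiscounted returns; the helper names `returns_to_go` and `to_sequence` and the toy trajectory are hypothetical and not taken from the online-dt repository.

```python
def returns_to_go(rewards):
    """For each timestep, sum the rewards from that step to the end of the episode."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def to_sequence(states, actions, rewards):
    """Interleave return-to-go, state, and action into one sequence of tagged tokens."""
    seq = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq


# Hypothetical three-step trajectory (assumption, for illustration only).
rtg = returns_to_go([1.0, 0.0, 2.0])
tokens = to_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [1.0, 0.0, 2.0])
```

A sequence model trained on such tokens can be prompted with a desired return-to-go and asked to predict the next action.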
PeRFception: Perception using Radiance Fields

📝The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner.

https://github.com/POSTECH-CVLab/PeRFception