Machine Learning Projects
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.
https://github.com/celebv-hq/celebv-hq
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.
https://github.com/lbh1024/can
In Defense of Online Models for Video Instance Segmentation

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance.
https://github.com/wjf5203/vnext
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

Omni3D re-purposes and combines existing datasets, resulting in 234k images annotated with more than 3 million instances and 97 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types.
https://github.com/facebookresearch/omni3d
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

📝YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.
https://github.com/wongkinyiu/yolov7
Reconstructing 3D Human Pose by Watching Humans in the Mirror

In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror.
https://github.com/zju3dv/EasyMocap
Pretraining is All You Need for Image-to-Image Translation

We propose to use pretraining to boost general image-to-image translation.


https://github.com/PITI-Synthesis/PITI
Elucidating the Design Space of Diffusion-Based Generative Models

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.


https://github.com/lucidrains/imagen-pytorch
Ivy: Templated Deep Learning for Inter-Framework Portability

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.


https://github.com/ivy-dl/ivy
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.

https://github.com/compvis/latent-diffusion
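The retrieval step described in this post can be illustrated with a minimal sketch. It assumes that training instances and database entries are already embedded as vectors and that cosine similarity is the neighbor metric; the function names `retrieve_nearest` and `build_conditioning` are hypothetical and are not taken from the linked repository.

```python
import numpy as np

def retrieve_nearest(query, database, k):
    """Return indices of the k database rows most similar to `query`.

    Assumption: cosine similarity over precomputed embeddings; the post
    itself does not say which metric or encoder is used.
    """
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = db @ q                      # cosine similarity to every entry
    return np.argsort(-sims)[:k]       # indices sorted by decreasing similarity

def build_conditioning(query, database, k):
    """Stack the k retrieved neighbors into a (k, d) conditioning array.

    Hypothetical helper: a model would receive this array as extra input
    alongside the training instance.
    """
    idx = retrieve_nearest(query, database, k)
    return database[idx]

# Toy external database of four 2-D embeddings.
database = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.1])
neighbors = build_conditioning(query, database, k=2)
```

In this toy setup the two rows pointing in roughly the same direction as the query are the ones returned; a real system would likely use an approximate nearest-neighbor index instead of a full sort.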
Flow-Guided Transformer for Video Inpainting

In particular, in the spatial transformer we design a dual-perspective spatial MHSA, which integrates global tokens into the window-based attention.
https://github.com/hitachinsk/fgt
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

📝We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full precision performance.
https://github.com/timdettmers/bitsandbytes
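To make the idea of Int8 matrix multiplication concrete, here is a minimal NumPy sketch of a generic absmax scheme: each row of the left matrix and each column of the right matrix is scaled to the int8 range, the product is accumulated in int32, and the result is rescaled to floating point. This is an illustrative assumption about how such a multiplication can work, not the bitsandbytes API, and it leaves out any additional steps of the procedure in the post; the names `absmax_quantize_rows` and `int8_matmul` are hypothetical.

```python
import numpy as np

def absmax_quantize_rows(x):
    """Quantize each row of `x` to int8 using that row's absolute maximum."""
    scale = np.max(np.abs(x), axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # avoid dividing by zero for all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    """Approximate a @ b with int8 operands (generic sketch, not the paper's exact method)."""
    qa, sa = absmax_quantize_rows(a)       # one scale per row of a, shape (m, 1)
    qb_t, sb = absmax_quantize_rows(b.T)   # one scale per column of b, shape (n, 1)
    # Accumulate in int32 so sums of int8 products cannot overflow.
    acc = qa.astype(np.int32) @ qb_t.T.astype(np.int32)
    # Undo both scalings: entry (i, j) is multiplied by sa[i] * sb[j].
    return acc.astype(np.float32) * sa * sb.T

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)
b = rng.standard_normal((64, 8)).astype(np.float32)
approx = int8_matmul(a, b)
exact = a @ b
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

The int8 operands take one byte per value, compared with two for 16-bit floats, which is where a halving of memory would come from under that assumption about the baseline precision.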
KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints

📝In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views.
https://github.com/facebookresearch/KeypointNeRF
StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3

📝Notably, StyleFaceV is capable of generating realistic 1024×1024 face videos even without high-resolution training videos.
https://github.com/arthur-qiu/stylefacev