Robust Speech Recognition via Large-Scale Weak Supervision
📝We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
https://github.com/openai/whisper
Diffusion Models: A Comprehensive Survey of Methods and Applications
📝Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation.
https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy
Plenoxels: Radiance Fields without Neural Networks
📝We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.
https://github.com/kakaobrain/NeRF-Factory
LAVIS: A Library for Language-Vision Intelligence
📝We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.
https://github.com/salesforce/lavis
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
📝Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.
https://github.com/IDEA-Research/detrex
Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions
📝Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg).
https://github.com/hust-linyi/seg_trick
High-Resolution Image Synthesis with Latent Diffusion Models
📝By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
https://github.com/compvis/stable-diffusion
USB: A Unified Semi-supervised Learning Benchmark
📝Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples.
https://github.com/microsoft/semi-supervised-learning
Advancing Model Pruning via Bi-level Optimization
📝To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP.
https://github.com/optml-group/bip
An Efficient Person Clustering Algorithm for Open Checkout-free Groceries
📝Then, to ensure that the method adapts to the dynamic and unseen person flow, we propose Graph Convolutional Network (GCN) with a simple Nearest Neighbor (NN) strategy to accurately cluster the instances of CSG.
https://github.com/WuJunde/checkoutfree
NerfAcc: A General NeRF Acceleration Toolbox
📝We propose NerfAcc, a toolbox for efficient volumetric rendering of radiance fields.
https://github.com/kair-bair/nerfacc
Human Motion Diffusion Model
📝In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for the human motion domain.
https://github.com/guytevet/motion-diffusion-model
VToonify: Controllable High-Resolution Portrait Video Style Transfer
📝Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.
https://github.com/williamyang1991/vtoonify
Content-Based Search for Deep Generative Models
📝The growing proliferation of pretrained generative models has made it infeasible for a user to be fully cognizant of every model in existence.
https://github.com/generative-intelligence-lab/modelverse
DigiFace-1M: 1 Million Digital Face Images for Face Recognition
📝Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet.
https://github.com/microsoft/digiface1m
Ask Me Anything: A simple strategy for prompting language models
📝Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task.
https://github.com/hazyresearch/ama_prompting
DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models
📝We analyze prompts in the dataset and discuss key properties of these prompts.
https://github.com/poloclub/diffusiondb
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
📝We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform.
https://github.com/masayakawamura/mb-istft-vits
High Fidelity Neural Audio Compression
📝We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks.
https://github.com/facebookresearch/encodec
Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation
📝In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods.
https://github.com/zju3dv/vox-fusion
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
📝The success of Transformer models has pushed the deep learning model scale to billions of parameters.
https://github.com/hpcaitech/colossalai