High-Resolution Image Synthesis with Latent Diffusion Models 
📝By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
https://github.com/compvis/stable-diffusion
  
  GitHub - CompVis/stable-diffusion: A latent text-to-image diffusion model
  USB: A Unified Semi-supervised Learning Benchmark 
📝Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples.
https://github.com/microsoft/semi-supervised-learning
  
  GitHub - microsoft/Semi-supervised-learning: A Unified Semi-Supervised Learning Codebase (NeurIPS'22)
  Advancing Model Pruning via Bi-level Optimization 
📝To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as those found by iterative magnitude pruning (IMP).
https://github.com/optml-group/bip
  
  GitHub - OPTML-Group/BiP: [NeurIPS22] "Advancing Model Pruning via Bi-level Optimization" by Yihua Zhang*, Yuguang Yao*, Parikshit Ram, Pu Zhao, Tianlong Chen, Mingyi Hong, Yanzhi Wang, and Sijia Liu
  An Efficient Person Clustering Algorithm for Open Checkout-free Groceries 
📝To ensure that the method adapts to dynamic and unseen person flow, we propose a Graph Convolutional Network (GCN) with a simple Nearest Neighbor (NN) strategy to accurately cluster the instances of CSG.
https://github.com/WuJunde/checkoutfree
  
  GitHub - WuJunde/checkoutfree: It is a python implementation of the person clustering algorithm in the check-out free grocery visual system.
  NerfAcc: A General NeRF Acceleration Toolbox 
📝We propose NerfAcc, a toolbox for efficient volumetric rendering of radiance fields.
https://github.com/kair-bair/nerfacc
  
  GitHub - nerfstudio-project/nerfacc: A General NeRF Acceleration Toolbox in PyTorch.
  Human Motion Diffusion Model 
📝In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for the human motion domain.
https://github.com/guytevet/motion-diffusion-model
  
  GitHub - GuyTevet/motion-diffusion-model: The official PyTorch implementation of the paper "Human Motion Diffusion Model"
  VToonify: Controllable High-Resolution Portrait Video Style Transfer 
📝Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.
https://github.com/williamyang1991/vtoonify
  
  GitHub - williamyang1991/VToonify: [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
  Content-Based Search for Deep Generative Models 
📝The growing proliferation of pretrained generative models has made it infeasible for a user to be fully cognizant of every model in existence.
https://github.com/generative-intelligence-lab/modelverse
  
  GitHub - generative-intelligence-lab/modelverse: Modelverse: Content-Based Search for Deep Generative Models
  DigiFace-1M: 1 Million Digital Face Images for Face Recognition 
📝Face recognition models are trained on large-scale datasets that contain millions of real human face images collected from the internet.
https://github.com/microsoft/digiface1m
  
  GitHub - microsoft/DigiFace1M
  Ask Me Anything: A simple strategy for prompting language models 
📝Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task.
https://github.com/hazyresearch/ama_prompting
  
  GitHub - HazyResearch/ama_prompting: Ask Me Anything language model prompting
  DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models 
📝We analyze prompts in the dataset and discuss key properties of these prompts.
https://github.com/poloclub/diffusiondb
  
  GitHub - poloclub/diffusiondb: A large-scale text-to-image prompt gallery dataset based on Stable Diffusion
  Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform 
📝We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform.
https://github.com/masayakawamura/mb-istft-vits
  
  GitHub - MasayaKawamura/MB-iSTFT-VITS: Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
  High Fidelity Neural Audio Compression 
📝We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks.
https://github.com/facebookresearch/encodec
  
  GitHub - facebookresearch/encodec: State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
  Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation 
📝In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods.
https://github.com/zju3dv/vox-fusion
  
  GitHub - zju3dv/Vox-Fusion: Code for "Dense Tracking and Mapping with Voxel-based Neural Implicit Representation", ISMAR 2022
  Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training 
📝The success of Transformer models has pushed the deep learning model scale to billions of parameters.
https://github.com/hpcaitech/colossalai
  
  GitHub - hpcaitech/ColossalAI: Making large AI models cheaper, faster and more accessible
  Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations 
📝We optimize the training strategy by leveraging more audio to better learn multi-stage multi-codebook representations (MSMCRs) for low-resource languages.
https://github.com/hhguo/msmc-tts
  
  GitHub - hhguo/MSMC-TTS: Official Implement of Multi-Stage Multi-Codebook (MSMC) TTS
  Referring Image Matting 
📝Image matting refers to extracting the accurate foregrounds in the image.
https://github.com/jizhizili/rim
  
  GitHub - JizhiziLi/RIM: The official repo for the paper "Referring Image Matting".
  What Makes Convolutional Models Great on Long Sequence Modeling? 
📝We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length.
https://github.com/ctlllll/sgconv
  
  GitHub - ctlllll/SGConv
  MetaFormer Baselines for Vision 
📝By simply applying depthwise separable convolutions as the token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.
https://github.com/sail-sg/metaformer
  
  GitHub - sail-sg/metaformer: MetaFormer Baselines for Vision (TPAMI 2024)
  Real-Time Target Sound Extraction 
📝We present the first neural network model to achieve real-time and streaming target sound extraction.
https://github.com/vb000/waveformer
  
  GitHub - vb000/Waveformer: A deep neural network architecture for low-latency audio processing
  Poisson Flow Generative Models 
📝We interpret the data points as electrical charges on the $z=0$ hyperplane in a space augmented with an additional dimension $z$, generating a high-dimensional electric field (the gradient of the solution to the Poisson equation).
https://github.com/newbeeer/poisson_flow
  
  GitHub - Newbeeer/Poisson_flow: Code for NeurIPS 2022 Paper, "Poisson Flow Generative Models" (PFGM)
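  To make the "data points as charges" picture concrete, here is a minimal NumPy sketch of the field that such charges would generate. It is not taken from the linked repository: the function name `poisson_field`, the toy data, and the sign convention (field pointing away from the charges) are all assumptions for illustration. It uses the standard closed form for a unit point charge in N dimensions, averaged over the data points.

```python
import math
import numpy as np


def poisson_field(x, charges):
    """Empirical field at point x generated by unit charges (hypothetical helper).

    x:       array of shape (N,), a query point in the augmented space.
    charges: array of shape (n, N), data points whose last coordinate is z = 0.
    """
    n_dim = x.shape[0]
    # Surface area of the unit sphere embedded in R^N (4*pi when N = 3).
    sphere_area = 2.0 * math.pi ** (n_dim / 2) / math.gamma(n_dim / 2)
    diff = x[None, :] - charges                          # shape (n, N)
    dist = np.linalg.norm(diff, axis=1, keepdims=True)   # shape (n, 1)
    # Each charge contributes (x - y) / (S * |x - y|^N); average over all charges.
    return (diff / (sphere_area * dist ** n_dim)).mean(axis=0)


# Toy 2-D "dataset", lifted onto the z = 0 hyperplane of a 3-D augmented space.
rng = np.random.default_rng(0)
data = rng.normal(size=(256, 2))
charges = np.hstack([data, np.zeros((256, 1))])

# Query the field at a point above the hyperplane.
query = np.array([0.0, 0.0, 5.0])
field = poisson_field(query, charges)
```

  Under this sign convention the z-component of `field` is positive above the hyperplane, so following the field moves a point away from the data, and following it in reverse moves a point back toward the data.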
  