AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
📖Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).
https://github.com/mit-han-lab/llm-awq
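The one-line idea behind AWQ is that a small fraction of weight channels matter far more than the rest, and that these salient channels are best found from activation magnitudes, not from the weights themselves; scaling them up before quantization protects them. A toy fake-quantization sketch of that idea in plain PyTorch (the real repo grid-searches the scaling exponent per layer on calibration data and runs custom INT4 kernels; the fixed `alpha` and raw magnitude statistics here are simplifications):
```python
import torch

def awq_scale_and_quantize(w, act_scale, w_bit=4, group_size=128, alpha=0.5):
    """Toy fake-quantization sketch of the AWQ idea.

    w: (out_features, in_features) weight; act_scale: (in_features,) mean |activation|
    from calibration data. Assumes in_features is divisible by group_size.
    `alpha` stands in for AWQ's per-layer grid-searched scaling exponent.
    """
    s = act_scale.clamp(min=1e-5) ** alpha        # per-channel scale from activation stats
    s = s / (s.max() * s.min()).sqrt()            # center the scales around 1
    w_scaled = w * s                              # amplify salient input channels

    qmax = 2 ** (w_bit - 1) - 1                   # 7 for INT4
    wg = w_scaled.reshape(w.shape[0], -1, group_size)
    step = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    wq = (wg / step).round().clamp(-qmax - 1, qmax) * step   # group-wise round-to-nearest
    return wq.reshape_as(w) / s                   # fold the inverse scale back in
```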
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
📖Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.
https://github.com/facebookresearch/hiera
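The repo exposes pretrained models through torch.hub; a usage sketch assuming the `hiera_base_224` entry point and `mae_in1k_ft_in1k` checkpoint names its README lists (these may change between releases):
```python
import torch

# Load a pretrained Hiera image classifier via torch.hub; model and checkpoint
# names follow the repo README and are assumptions, not a stable API.
model = torch.hub.load(
    "facebookresearch/hiera",
    model="hiera_base_224",
    pretrained=True,
    checkpoint="mae_in1k_ft_in1k",
)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # ImageNet-1k class logits
print(logits.shape)  # torch.Size([1, 1000])
```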
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation
📖The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.
https://github.com/icoz69/styleavatar3d
Humans in 4D: Reconstructing and Tracking Humans with Transformers
📖To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.
https://github.com/shubham-goel/4D-Humans
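A hypothetical sketch of that pipeline (function names are illustrative, not the repo's API): per-frame HMR 2.0 mesh recovery feeds a PHALP-style tracker that associates people across frames in 3D:
```python
# Illustrative pseudocode only: detector, hmr2, and tracker are hypothetical
# stand-ins for the detection, HMR 2.0, and PHALP-style components.
def track_video(frames, detector, hmr2, tracker):
    tracks = []
    for t, frame in enumerate(frames):
        boxes = detector(frame)                       # per-frame person detections
        bodies = [hmr2(frame, box) for box in boxes]  # SMPL params + 3D pose per person
        tracks = tracker.update(tracks, bodies, t)    # associate identities in 3D over time
    return tracks
```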
📝DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement
https://github.com/rikorose/deepfilternet
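A usage sketch following the README of the Python package (`pip install deepfilternet`); entry points may differ across versions:
```python
from df.enhance import enhance, init_df, load_audio, save_audio

model, df_state, _ = init_df()                        # load the default DeepFilterNet model
audio, _ = load_audio("noisy.wav", sr=df_state.sr())  # resample to the model's rate (48 kHz)
enhanced = enhance(model, df_state, audio)            # denoise
save_audio("enhanced.wav", enhanced, df_state.sr())
```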
📝SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
https://github.com/vahe1994/spqr
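SpQR's core move is to isolate a small fraction of hard-to-quantize outlier weights in a sparse high-precision structure and round everything else to a very low bit-width in small groups. A toy PyTorch sketch of that decomposition (the paper picks outliers by quantization sensitivity, not raw magnitude as here):
```python
import torch

def spqr_like_split(w, w_bit=3, group_size=16, outlier_frac=0.01):
    """Toy sketch of the SpQR decomposition. Assumes w.numel() is divisible
    by group_size; magnitude-based outlier selection is a simplification."""
    flat = w.flatten()
    k = max(1, int(outlier_frac * flat.numel()))
    thresh = flat.abs().topk(k).values.min()
    outlier_mask = w.abs() >= thresh

    dense = torch.where(outlier_mask, torch.zeros_like(w), w)
    qmax = 2 ** (w_bit - 1) - 1
    g = dense.reshape(-1, group_size)
    step = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    dense_q = ((g / step).round().clamp(-qmax - 1, qmax) * step).reshape_as(w)

    sparse_outliers = (w * outlier_mask).to_sparse()  # kept in high precision
    return dense_q, sparse_outliers  # reconstruct as dense_q + sparse_outliers.to_dense()
```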
📝Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
https://github.com/damo-nlp-sg/video-llama
Simple and Controllable Music Generation
📝We tackle the task of conditional music generation.
https://github.com/facebookresearch/audiocraft
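The paper's model, MusicGen, ships in this repo; a text-to-music sketch following the AudioCraft README (model id and API may shift between releases):
```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio to generate
wavs = model.generate(["lo-fi hip hop beat with warm piano chords"])  # one prompt -> one waveform
audio_write("sample", wavs[0].cpu(), model.sample_rate, strategy="loudness")
```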
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
📝In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.
https://github.com/flagai-open/flagai
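Besides the FlagAI toolkit, AltCLIP checkpoints are also usable through Hugging Face `transformers`; a sketch assuming the `BAAI/AltCLIP` checkpoint id:
```python
import torch
from PIL import Image
from transformers import AltCLIPModel, AltCLIPProcessor

# AltCLIP swaps CLIP's text tower for a multilingual encoder, so one image can
# be scored against captions in different languages.
model = AltCLIPModel.from_pretrained("BAAI/AltCLIP")
processor = AltCLIPProcessor.from_pretrained("BAAI/AltCLIP")

image = Image.open("cat.jpg")
texts = ["a photo of a cat", "一张猫的照片"]  # English and Chinese captions
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(probs)  # both captions should score highly for a cat image
```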
📝Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation.
https://github.com/mbzuai-oryx/video-chatgpt
🦦 📝Otter is a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT; it shows improved instruction-following and in-context learning ability.
https://github.com/luodian/otter
📝Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
https://github.com/franxyao/chain-of-thought-hub
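The hub scores models on complex reasoning with chain-of-thought prompts; a minimal sketch of that setup, with `ask_llm` as a hypothetical stand-in for whichever model API is under evaluation:
```python
# Few-shot chain-of-thought prompt: exemplars spell out intermediate steps
# before the final answer (exemplar from the GSM8K CoT literature).
PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: {question}
A:"""

def solve(question: str, ask_llm) -> str:
    completion = ask_llm(PROMPT.format(question=question))
    # Final answers are extracted after "The answer is", GSM8K-style.
    return completion.split("The answer is")[-1].strip(" .\n")
```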
📝MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
https://github.com/osu-nlp-group/magicbrush
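If the dataset is mirrored on the Hugging Face Hub, loading it is one call; the `osunlp/MagicBrush` id and the `instruction` field name here are assumptions based on the group's releases:
```python
from datasets import load_dataset

ds = load_dataset("osunlp/MagicBrush", split="train")
example = ds[0]
# Each example pairs a source image with an edit instruction and a target image.
print(example["instruction"])
```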
📝ChemCrow: Augmenting large-language models with chemistry tools
https://github.com/ur-whitelab/chemcrow-public
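A usage sketch after the chemcrow-public README (the agent wraps an LLM with chemistry tools such as literature search and RDKit-based utilities); the constructor arguments are assumptions from that README, and an OpenAI API key is expected in the environment:
```python
from chemcrow.agents import ChemCrow

chem_model = ChemCrow(model="gpt-4-0613", temp=0.1)  # LLM backing the tool-using agent
chem_model.run("What is the molecular weight of caffeine?")
```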
📝WizardCoder: Empowering Code Large Language Models with Evol-Instruct
https://github.com/nlpxucan/wizardlm
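Released checkpoints load with vanilla `transformers`; a sketch assuming the `WizardLM/WizardCoder-15B-V1.0` model id and the Alpaca-style prompt template from the README:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "WizardLM/WizardCoder-15B-V1.0"  # assumed model id from the README
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that reverses a linked list.\n\n### Response:"
)
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=256)
print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
```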
📝Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
https://github.com/facebookresearch/ijepa
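I-JEPA's objective is to predict, in representation space rather than pixel space, the features of masked target blocks from a visible context block, with the target encoder kept as an EMA of the context encoder. A toy sketch of the loss under those assumptions (module interfaces and masking details here are illustrative):
```python
import torch
import torch.nn.functional as F

def ijepa_loss(context_encoder, target_encoder, predictor, patches, ctx_idx, tgt_idx):
    """patches: (B, N, D) patch sequence; ctx_idx/tgt_idx: 1-D index tensors.
    target_encoder is a no-gradient EMA copy of context_encoder; predictor is
    a hypothetical module that fills in features at the target positions."""
    with torch.no_grad():
        target = target_encoder(patches)[:, tgt_idx]  # representations to predict
    ctx = context_encoder(patches[:, ctx_idx])        # encode only the context block
    pred = predictor(ctx, tgt_idx)                    # predict target-position features
    return F.smooth_l1_loss(pred, target)             # feature-space regression loss
```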