TAP-Vid: A Benchmark for Tracking Any Point in a Video
📝Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.
https://github.com/deepmind/tapnet
GitHub - google-deepmind/tapnet: Tracking Any Point (TAP)
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
📝Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
https://github.com/Oneflow-Inc/oneflow
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
📝The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining.
https://github.com/ofa-sys/chinese-clip
GitHub - OFA-Sys/Chinese-CLIP: Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping
📝Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis.
https://github.com/mindslab-ai/phaseaug
GitHub - maum-ai/phaseaug: ICASSP 2023 Accepted
Example-Based Named Entity Recognition
📝We present a novel approach to named entity recognition (NER) in the presence of scarce data that we call example-based NER.
https://github.com/sayef/fsner
GitHub - sayef/fsner: Few-shot Named Entity Recognition
Fine-Tuning Language Models from Human Preferences
📝Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks.
https://github.com/lvwerra/trl
GitHub - huggingface/trl: Train transformer language models with reinforcement learning.
DiffusionInst: Diffusion Model for Instance Segmentation
📝This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process.
https://github.com/chenhaoxing/DiffusionInst
GitHub - chenhaoxing/DiffusionInst: This repo is the code of paper "DiffusionInst: Diffusion Model for Instance Segmentation" (ICASSP'24).
DAMO-YOLO : A Report on Real-Time Object Detection Design
📝In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.
https://github.com/tinyvision/damo-yolo
GitHub - tinyvision/DAMO-YOLO: DAMO-YOLO: a fast and accurate object detection method with some new techs, including NAS backbones, efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation
📝The introductory programming sequence has been the focus of much research in computing education.
https://github.com/deepmind/code_contests
GitHub - google-deepmind/code_contests
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
📝In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.
https://github.com/baaivision/painter
GitHub - baaivision/Painter: Painter & SegGPT Series: Vision Foundation Models from BAAI
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency
📝In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.
https://github.com/opendilab/ace
GitHub - opendilab/ACE: [AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".
Learning Video Representations from Large Language Models
📝We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).
https://github.com/facebookresearch/lavila
GitHub - facebookresearch/LaViLa: Code release for "Learning Video Representations from Large Language Models"
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
📝We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.
https://github.com/baaivision/eva
GitHub - baaivision/EVA: EVA Series: Visual Representation Fantasies from BAAI
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
📝This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.
https://github.com/facebookresearch/convnext-v2
GitHub - facebookresearch/ConvNeXt-V2: Code release for ConvNeXt V2 model
Cramming: Training a Language Model on a Single GPU in One Day
📝Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners.
https://github.com/jonasgeiping/cramming
GitHub - JonasGeiping/cramming: Cramming the training of a (BERT-type) language model into limited compute.
Muse: Text-To-Image Generation via Masked Generative Transformers
📝Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
https://github.com/lucidrains/muse-pytorch
GitHub - lucidrains/muse-maskgit-pytorch: Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch
A Survey for In-context Learning
📝With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few training examples.
https://github.com/dqxiu/icl_paperlist
GitHub - dqxiu/ICL_PaperList: Paper List for In-context Learning 🌷
Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal
📝The early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task.
https://github.com/liangke23/awesome-knowledge-graph-reasoning
GitHub - LIANGKE23/Awesome-Knowledge-Graph-Reasoning: AKGR: Awesome Knowledge Graph Reasoning is a collection of knowledge graph reasoning works, including papers, codes and datasets
BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits
📝In these experiments, we observe that BanditPAM returns the same results as state-of-the-art PAM-like algorithms up to 4x faster while performing up to 200x fewer distance computations.
https://github.com/ThrunGroup/BanditPAM
GitHub - motiwari/BanditPAM: BanditPAM C++ implementation and Python package