Machine Learning Projects
Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation

📝In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods.
https://github.com/zju3dv/vox-fusion
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

📝The success of Transformer models has pushed the deep learning model scale to billions of parameters.
https://github.com/hpcaitech/colossalai
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

📝We optimize the training strategy by leveraging more audio to better learn MSMCRs (multi-stage multi-codebook representations) for low-resource languages.
https://github.com/hhguo/msmc-tts
What Makes Convolutional Models Great on Long Sequence Modeling?

📝We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length.
https://github.com/ctlllll/sgconv
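A minimal sketch of what a sub-linear kernel parameterization can look like, in plain Python. It is not taken from the linked repository: the function name `build_kernel`, the doubling stretch factor and the `decay` weight are all illustrative assumptions. A few small sub-kernels are stretched to cover longer and longer spans, so the kernel length grows exponentially with the number of scales while the parameter count grows only linearly.

```python
def build_kernel(sub_kernels, decay=0.5):
    """Concatenate sub-kernels; scale i is stretched 2**i times and damped by decay**i.

    Hypothetical helper for illustration only, not the SGConv implementation.
    """
    kernel = []
    for i, sub in enumerate(sub_kernels):
        stretch = 2 ** i      # each value at scale i covers 2**i positions
        weight = decay ** i   # assumed damping for more distant positions
        for value in sub:
            kernel.extend([weight * value] * stretch)
    return kernel


# Three scales with two learnable values each: six parameters in total.
sub_kernels = [[1.0, 1.0] for _ in range(3)]
kernel = build_kernel(sub_kernels)
num_params = sum(len(sub) for sub in sub_kernels)

# Kernel length is 2 * (2**3 - 1); adding one more scale roughly doubles it
# while adding only two more parameters.
print(len(kernel), num_params)
```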
MetaFormer Baselines for Vision

📝By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.
https://github.com/sail-sg/metaformer
Poisson Flow Generative Models

📝We interpret the data points as electrical charges on the $z=0$ hyperplane in a space augmented with an additional dimension $z$, generating a high-dimensional electric field (the gradient of the solution to Poisson equation).
https://github.com/newbeeer/poisson_flow
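To make the charge picture concrete, here is a small sketch that evaluates the empirical field produced by a handful of data points. It assumes unit charges, drops the constant normalization factor, and uses the standard form of a point-charge field in N+1 dimensions, (x - y) / ||x - y||^(N+1). The function name `poisson_field` is hypothetical and not part of the linked repository, and the query point must not coincide with a charge.

```python
import math


def poisson_field(query, data_points):
    """Average field at `query` (N+1 augmented coordinates) from unit charges.

    Each data point y (N coordinates) is placed at (y, z=0). Illustrative
    sketch only: the normalization constant is omitted.
    """
    n = len(data_points[0])                # data dimension N
    field = [0.0] * (n + 1)
    for y in data_points:
        source = list(y) + [0.0]           # charge sits on the z = 0 hyperplane
        diff = [q - s for q, s in zip(query, source)]
        dist = math.sqrt(sum(d * d for d in diff))
        scale = 1.0 / dist ** (n + 1)      # field magnitude then decays as 1 / dist**N
        for k, d in enumerate(diff):
            field[k] += scale * d / len(data_points)
    return field


# Two 1-D data points at x = -1 and x = +1, queried midway at height z = 1.
charges = [(-1.0,), (1.0,)]
field = poisson_field([0.0, 1.0], charges)
print(field)
```

By symmetry the x-component cancels at the midpoint and the field points away from the z = 0 hyperplane.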
TAP-Vid: A Benchmark for Tracking Any Point in a Video

📝Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.
https://github.com/deepmind/tapnet
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch

📝Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model.
https://github.com/Oneflow-Inc/oneflow
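The three SBP placements can be illustrated with plain Python lists. This is a toy sketch, not OneFlow's API: the function names below are hypothetical, the "devices" are just list entries, and the tensor length is assumed to be divisible by the device count.

```python
def split(tensor, num_devices):
    """Split: each device holds one contiguous chunk of the logical tensor."""
    size = len(tensor) // num_devices  # assumes an even division
    return [tensor[k * size:(k + 1) * size] for k in range(num_devices)]


def broadcast(tensor, num_devices):
    """Broadcast: each device holds a full copy of the logical tensor."""
    return [list(tensor) for _ in range(num_devices)]


def partial_value(tensor, num_devices):
    """Partial-value: per-device tensors whose element-wise sum is the logical tensor."""
    return [
        [v if i % num_devices == k else 0 for i, v in enumerate(tensor)]
        for k in range(num_devices)
    ]


logical = [1, 2, 3, 4]

# Recover the logical tensor from each placement.
from_split = [v for chunk in split(logical, 2) for v in chunk]
from_broadcast = broadcast(logical, 2)[0]
from_partial = [sum(vals) for vals in zip(*partial_value(logical, 2))]

print(from_split, from_broadcast, from_partial)
```

Each placement needs a different operation to recover the logical tensor: concatenation for split, any single copy for broadcast, and a reduction for partial-value.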
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

📝Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis.
https://github.com/mindslab-ai/phaseaug
Example-Based Named Entity Recognition

📝We present a novel approach to named entity recognition (NER) in the presence of scarce data that we call example-based NER.
https://github.com/sayef/fsner
Fine-Tuning Language Models from Human Preferences

📝Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks.
https://github.com/lvwerra/trl
DiffusionInst: Diffusion Model for Instance Segmentation

📝This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process.
https://github.com/chenhaoxing/DiffusionInst
Programming Is Hard -- Or at Least It Used to Be: Educational Opportunities And Challenges of AI Code Generation

📝The introductory programming sequence has been the focus of much research in computing education.
https://github.com/deepmind/code_contests
Images Speak in Images: A Generalist Painter for In-Context Visual Learning

📝In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and to specify task prompts as images as well.
https://github.com/baaivision/painter