Machine Learning Projects
Behavior Trees in Robotics and AI: An Introduction

📝A Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game.
https://github.com/BehaviorTree/BehaviorTree.CPP
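To make the idea of "switching between tasks" concrete, here is a minimal Python sketch of two common control nodes, Sequence and Fallback. It is an illustration only and does not use the BehaviorTree.CPP API. All names (`Status`, `Action`, `Sequence`, `Fallback`, `build_tree`) and the battery threshold are hypothetical.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Action:
    """Leaf node that wraps a callable returning a Status."""

    def __init__(self, name: str, fn: Callable[[], Status]):
        self.name = name
        self.fn = fn

    def tick(self) -> Status:
        return self.fn()


class Sequence:
    """Ticks children in order; returns early on the first non-SUCCESS child."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; returns early on the first non-FAILURE child."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


def build_tree(battery_level: int, log: List[str]) -> Fallback:
    """Hypothetical robot: work if the battery is above 20%, otherwise recharge."""

    def battery_ok() -> Status:
        return Status.SUCCESS if battery_level > 20 else Status.FAILURE

    def work() -> Status:
        log.append("work")
        return Status.SUCCESS

    def recharge() -> Status:
        log.append("recharge")
        return Status.SUCCESS

    return Fallback([
        Sequence([Action("battery_ok", battery_ok), Action("work", work)]),
        Action("recharge", recharge),
    ])
```

Calling `build_tree(80, log).tick()` runs the work branch, while `build_tree(10, log).tick()` falls through to the recharge action. The switching logic lives in the tree structure rather than in the individual tasks.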
FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

📝The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data, thereby improving data privacy.
https://github.com/adap/flower
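The "local batch normalization" in the title can be sketched as a server-side averaging step that skips batch-norm parameters, so each client keeps its own. The Python below is a toy illustration under stated assumptions, not the Flower API or the authors' code. Parameters are plain lists of floats, and batch-norm layers are assumed to be identifiable by a `bn` substring in the parameter name. The function names are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def is_batch_norm(name: str) -> bool:
    # Assumption: batch-norm parameters carry "bn" in their name.
    return "bn" in name


def aggregate_skip_bn(client_params: List[Params]) -> Params:
    """Average every non-BN parameter element-wise across clients."""
    shared: Params = {}
    for name in client_params[0]:
        if is_batch_norm(name):
            continue  # BN parameters stay local and are never averaged
        columns = zip(*(params[name] for params in client_params))
        shared[name] = [sum(col) / len(client_params) for col in columns]
    return shared


def apply_shared(local: Params, shared: Params) -> Params:
    """Overwrite a client's shared parameters, keeping its own BN parameters."""
    updated = dict(local)
    updated.update(shared)
    return updated
```

With two clients holding `conv.weight` values `[1.0, 2.0]` and `[3.0, 4.0]`, the shared result is `[2.0, 3.0]`, while each client's `bn.weight` is left untouched by `apply_shared`.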
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

📝Through the preliminary study on diffusion model parameterization, we find that previous gradient-based TTS models require hundreds or thousands of iterations to guarantee high sample quality, which poses a challenge for accelerating sampling.
https://github.com/Rongjiehuang/ProDiff
Robust Speech Recognition via Large-Scale Weak Supervision

📝We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

https://github.com/openai/whisper
Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions

📝Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg).

https://github.com/hust-linyi/seg_trick
High-Resolution Image Synthesis with Latent Diffusion Models

📝By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.

https://github.com/compvis/stable-diffusion
An Efficient Person Clustering Algorithm for Open Checkout-free Groceries

📝To ensure that the method adapts to dynamic and unseen person flow, we propose a Graph Convolutional Network (GCN) with a simple Nearest Neighbor (NN) strategy to accurately cluster the instances of CSG.

https://github.com/WuJunde/checkoutfree
VToonify: Controllable High-Resolution Portrait Video Style Transfer

📝Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.
https://github.com/williamyang1991/vtoonify
DigiFace-1M: 1 Million Digital Face Images for Face Recognition

📝Face recognition models are trained on large-scale datasets that contain millions of real human face images collected from the internet.
https://github.com/microsoft/digiface1m
Ask Me Anything: A simple strategy for prompting language models

📝Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task.
https://github.com/hazyresearch/ama_prompting