Forwarded from Machinelearning
๐ง GANformer: Generative Adversarial Transformers
Github: https://github.com/pengzhiliang/MAE-pytorch
Paper: https://arxiv.org/abs/2111.08960
Dataset: https://paperswithcode.com/dataset/coco
@ai_machinelearning_big_data
Github: https://github.com/pengzhiliang/MAE-pytorch
Paper: https://arxiv.org/abs/2111.08960
Dataset: https://paperswithcode.com/dataset/coco
@ai_machinelearning_big_data
๐1
โโSwin Transformer V2: Scaling Up Capacity and Resolution
The authors present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536ร1,536 resolution.
Vision models have the following difficulties when trying to scale them up: instability issues at scale, high GPU memory consumption for high-resolution images, and the fact that downstream tasks usually require high-resolution images/windows, while the models are pretrained on lower resolutions and the transfer isn't always efficient.
The authors introduce the following technics to circumvent those problems:
- a post normalization technique and a scaled cosine attention approach to improve the stability of large vision models;
- a log-spaced continuous position bias technique to effectively transfer models pre-trained at low-resolution images and windows to their higher-resolution counterparts;
In addition, they share how they were able to decrease GPU consumption significantly.
Swin Transformer V2 sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1 / 54.4 box / mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification.
Paper: https://arxiv.org/abs/2111.09883
Code: https://github.com/microsoft/Swin-Transformer
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-swin-v2
#deeplearning #cv #transformer
The authors present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536ร1,536 resolution.
Vision models have the following difficulties when trying to scale them up: instability issues at scale, high GPU memory consumption for high-resolution images, and the fact that downstream tasks usually require high-resolution images/windows, while the models are pretrained on lower resolutions and the transfer isn't always efficient.
The authors introduce the following technics to circumvent those problems:
- a post normalization technique and a scaled cosine attention approach to improve the stability of large vision models;
- a log-spaced continuous position bias technique to effectively transfer models pre-trained at low-resolution images and windows to their higher-resolution counterparts;
In addition, they share how they were able to decrease GPU consumption significantly.
Swin Transformer V2 sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1 / 54.4 box / mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification.
Paper: https://arxiv.org/abs/2111.09883
Code: https://github.com/microsoft/Swin-Transformer
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-swin-v2
#deeplearning #cv #transformer
๐3
Acquisition of Chess Knowledge in AlphaZero
69-pages paper of analysis how #AlphaZero plays chess. TLDR: lots of concepts self-learned by neural network can be mapped to human concepts.
This means that generally speaking we can train neural networks to do some task and then learn something from them. Opposite is also true: we might imagine teaching neural networks some human concepts in order to maek them more efficient.
Post: https://en.chessbase.com/post/acquisition-of-chess-knowledge-in-alphazero
Paper: https://arxiv.org/pdf/2111.09259.pdf
#RL
69-pages paper of analysis how #AlphaZero plays chess. TLDR: lots of concepts self-learned by neural network can be mapped to human concepts.
This means that generally speaking we can train neural networks to do some task and then learn something from them. Opposite is also true: we might imagine teaching neural networks some human concepts in order to maek them more efficient.
Post: https://en.chessbase.com/post/acquisition-of-chess-knowledge-in-alphazero
Paper: https://arxiv.org/pdf/2111.09259.pdf
#RL
Chess News
Acquisition of Chess Knowledge in AlphaZero
Researchers at DeepMind and Google Brain, in collaboration with Grandmaster Vladimir Kramnik, are working to explore what chess can teach us about AI and vice versa. Using Chessbaseโs extensive historical chess data along with the AlphaZero neural networkโฆ
๐3๐ฉ1
Augmented Reality for Haptic Teleoperation of a Robot with an Event-Based Soft Tactile Sensor
This paper presents a new teleoperation approach using an augmented reality-based interface combined with optimized haptic feedback to finely manipulate visually occluded objects.
The dynamic growth of emerging Augmented Reality (AR) interfaces has a high potential interest in robotic telemanipulation of objects under limited visibility conditions. On the userโs horizon, the real-world environment is overlayed by the virtual images of the robot end-effector and the object. To optimize the user experience in teleoperation, the visual augmentation is accompanied by a haptic stimulus. They both transmit the rendered signal of the contact force visually and haptically, respectively. The contact force is measured by an optical event-based tactile sensor (E-BTS) with a soft pad, vibrotactile stimuli are generated by the hand-held device (HHD) and the AR is projected in the head-mounted device (HMD).
Authors demonstrated experimentally their approach on teleoperated robot arm puncturing an occluded non-rigid membrane placed in vertical raw with similar membranes. A comparative study with 10 subjects has been carried out to quantify the impact of AR in a force control task with a human in the control loop. The results of the experiment show a promising potential application in the cable insertion in an industrial assembly task.
Video: YouTube
#AR
This paper presents a new teleoperation approach using an augmented reality-based interface combined with optimized haptic feedback to finely manipulate visually occluded objects.
The dynamic growth of emerging Augmented Reality (AR) interfaces has a high potential interest in robotic telemanipulation of objects under limited visibility conditions. On the userโs horizon, the real-world environment is overlayed by the virtual images of the robot end-effector and the object. To optimize the user experience in teleoperation, the visual augmentation is accompanied by a haptic stimulus. They both transmit the rendered signal of the contact force visually and haptically, respectively. The contact force is measured by an optical event-based tactile sensor (E-BTS) with a soft pad, vibrotactile stimuli are generated by the hand-held device (HHD) and the AR is projected in the head-mounted device (HMD).
Authors demonstrated experimentally their approach on teleoperated robot arm puncturing an occluded non-rigid membrane placed in vertical raw with similar membranes. A comparative study with 10 subjects has been carried out to quantify the impact of AR in a force control task with a human in the control loop. The results of the experiment show a promising potential application in the cable insertion in an industrial assembly task.
Video: YouTube
#AR
YouTube
Augmented Reality and Haptic Teleoperation submitted to RA-L, 2021
๐3
โโNรWA: Visual Synthesis Pre-training for Neural visUal World creAtion
In this paper, Microsoft Research Asia and Peking University researchers share a unified multimodal (texts, images, videos, sketches) pre-trained model called NรWA that can generate new or manipulate existing visual data for various visual synthesis tasks. Furthermore, they have designed a 3D transformer encoder-decoder framework with a 3D Nearby Attention (3DNA) mechanism to consider the nature of the visual data and reduce the computational complexity.
NรWA achieves state-of-the-art results on text-to-image generation, text-to-video generation, video prediction, and several other tasks and demonstrates good results on zero-shot text-guided image and video manipulation tasks.
Paper: https://arxiv.org/abs/2111.12417
Code: https://github.com/microsoft/NUWA
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-nuwa
#deeplearning #cv #transformer #pretraining
In this paper, Microsoft Research Asia and Peking University researchers share a unified multimodal (texts, images, videos, sketches) pre-trained model called NรWA that can generate new or manipulate existing visual data for various visual synthesis tasks. Furthermore, they have designed a 3D transformer encoder-decoder framework with a 3D Nearby Attention (3DNA) mechanism to consider the nature of the visual data and reduce the computational complexity.
NรWA achieves state-of-the-art results on text-to-image generation, text-to-video generation, video prediction, and several other tasks and demonstrates good results on zero-shot text-guided image and video manipulation tasks.
Paper: https://arxiv.org/abs/2111.12417
Code: https://github.com/microsoft/NUWA
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-nuwa
#deeplearning #cv #transformer #pretraining
๐2
Forwarded from Machinelearning
๐น End-to-End Referring Video Object Segmentation with Multimodal Transformers
Github: https://github.com/mttr2021/MTTR
Paper: https://arxiv.org/abs/2111.14821v1
Dataset: https://kgavrilyuk.github.io/publication/actor_action/
@ai_machinelearning_big_data
Github: https://github.com/mttr2021/MTTR
Paper: https://arxiv.org/abs/2111.14821v1
Dataset: https://kgavrilyuk.github.io/publication/actor_action/
@ai_machinelearning_big_data
YaTalks โ Yandex's conference for IT community.
Yandex will host its traditional conference on 3-4 December (starting tomorrow). Registration is open.
One of the tracks is devoted to Machine/Deep Learning with the focus on content generation.
Featured reports:
๐How too train text model on the minimal corpus
๐๏ธHow Yandex.Browser Machine Translation works
๐ค Facial Expressions Animation
Conference website: https://yatalks.yandex.ru/?from=tg_opendatascience
#conference #mt #nlu
Yandex will host its traditional conference on 3-4 December (starting tomorrow). Registration is open.
One of the tracks is devoted to Machine/Deep Learning with the focus on content generation.
Featured reports:
๐How too train text model on the minimal corpus
๐๏ธHow Yandex.Browser Machine Translation works
๐ค Facial Expressions Animation
Conference website: https://yatalks.yandex.ru/?from=tg_opendatascience
#conference #mt #nlu
yatalks.yandex.ru
YaTalks 2023 โ Yandex's premier conference for the IT community
On December 5-6, Moscow and Belgrade will host over 100 IT industry experts and scientists delivering technical presentations on development, ML, and giving popular science lectures.
๐5
Forwarded from Big Data Science
Visual Genome: the most labeled dataset
๐Scientists at Stanford University have collected the most annotated dataset with over 100,000 images. In total, the dataset contains almost 5.5 million object descriptions, attributes and relationships. You don't even have to download the dataset, but get the data you need by accessing the RESTful API endpoint using the GET-method. Despite the fact that the latest updates to the dataset are dated 2017, this is an excellent data set for training models in typical ML problems, from recognizing generated data using graph algorithms.
https://visualgenome.org/api/v0/api_home.html
๐Scientists at Stanford University have collected the most annotated dataset with over 100,000 images. In total, the dataset contains almost 5.5 million object descriptions, attributes and relationships. You don't even have to download the dataset, but get the data you need by accessing the RESTful API endpoint using the GET-method. Despite the fact that the latest updates to the dataset are dated 2017, this is an excellent data set for training models in typical ML problems, from recognizing generated data using graph algorithms.
https://visualgenome.org/api/v0/api_home.html
โโEditGAN: High-Precision Semantic Image Editing
Nvidia researches built an approach for editing segments of a picture with supposedly realtime picture augmentation according to the segment alterations. No demo is available yet though.
All the photoshop power users should relax, because appereance of such a tools means less work for them, not that the demand for the manual retouch will cease.
Website: https://nv-tlabs.github.io/editGAN/
ArXiV: https://arxiv.org/abs/2111.03186
#GAN #Nvidia
Nvidia researches built an approach for editing segments of a picture with supposedly realtime picture augmentation according to the segment alterations. No demo is available yet though.
All the photoshop power users should relax, because appereance of such a tools means less work for them, not that the demand for the manual retouch will cease.
Website: https://nv-tlabs.github.io/editGAN/
ArXiV: https://arxiv.org/abs/2111.03186
#GAN #Nvidia
๐3๐1๐ฅ1
Upgini โ dataset search automation library
Upgini is a new python library for an automated useful dataset search to boost supervised ML tasks
It enriches your dataset with intelligently crafted features from a broad range of curated data sources, including open and commercial datasets. The search is conducted for any combination of public IDs contained in your tabular dataset: IP, date, etc. Only features that could improve the prediction power of your ML model are returned.
Developers said that they wanted radically simplify data search and delivery for ML pipelines to make external data & features a standard approach. Like a hyperparameter tuning for machine learning nowadays.
A Free 30-days trial is available.
GitHub: https://github.com/upgini/upgini
Web: https://upgini.com
Upgini is a new python library for an automated useful dataset search to boost supervised ML tasks
It enriches your dataset with intelligently crafted features from a broad range of curated data sources, including open and commercial datasets. The search is conducted for any combination of public IDs contained in your tabular dataset: IP, date, etc. Only features that could improve the prediction power of your ML model are returned.
Developers said that they wanted radically simplify data search and delivery for ML pipelines to make external data & features a standard approach. Like a hyperparameter tuning for machine learning nowadays.
A Free 30-days trial is available.
GitHub: https://github.com/upgini/upgini
Web: https://upgini.com
GitHub
GitHub - upgini/upgini: Data search & enrichment library for Machine Learning โ Easily find and add relevant features to your MLโฆ
Data search & enrichment library for Machine Learning โ Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, inc...
๐5โค1
Forwarded from Silero News (Alexander)
New V3 Silero VAD is Already Here
Main changes
- One VAD to rule them all!
- New model includes the functionality of all of the previous ones with improved quality and speed!
- As far as we know, our VAD is the best in the world now;
- Flexible sampling rate, 8000 Hz and 16000 Hz are supported;
- Flexible chunk size, minimum chunk size is just 30 milliseconds!
- Only 100k parameters;
- GPU inference and batching are supported (the model is small, so we decided not to publish a quantized model);
- Radically, drastically simplified examples;
We also drastically polished and simplified README, wiki and repo in general.
Links:
- Silero VAD repo - https://github.com/snakers4/silero-vad
- The migration to V3 is quite simple, here are some examples
- Quality metrics
- Performance metrics
- Examples and dependencies
- Colab with examples
If you like Silero VAD, please give us a โญ๏ธ and spread the news!
Main changes
- One VAD to rule them all!
- New model includes the functionality of all of the previous ones with improved quality and speed!
- As far as we know, our VAD is the best in the world now;
- Flexible sampling rate, 8000 Hz and 16000 Hz are supported;
- Flexible chunk size, minimum chunk size is just 30 milliseconds!
- Only 100k parameters;
- GPU inference and batching are supported (the model is small, so we decided not to publish a quantized model);
- Radically, drastically simplified examples;
We also drastically polished and simplified README, wiki and repo in general.
Links:
- Silero VAD repo - https://github.com/snakers4/silero-vad
- The migration to V3 is quite simple, here are some examples
- Quality metrics
- Performance metrics
- Examples and dependencies
- Colab with examples
If you like Silero VAD, please give us a โญ๏ธ and spread the news!
GitHub
GitHub - snakers4/silero-vad: Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Silero VAD: pre-trained enterprise-grade Voice Activity Detector - snakers4/silero-vad
๐1
โโNL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
This paper presents a new participatory Python-based natural language augmentation framework that supports the creation of transformations (modifications to the data) and filters (data splits according to specific features).
The current version of the framework contains 117 transformations and 23 filters for a variety of natural language tasks.
The authors demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models.
Paper: https://arxiv.org/abs/2112.02721
Code: https://github.com/GEM-benchmark/NL-Augmenter
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-nlaugmenter
#deeplearning #nlp #augmentation #robustness
This paper presents a new participatory Python-based natural language augmentation framework that supports the creation of transformations (modifications to the data) and filters (data splits according to specific features).
The current version of the framework contains 117 transformations and 23 filters for a variety of natural language tasks.
The authors demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models.
Paper: https://arxiv.org/abs/2112.02721
Code: https://github.com/GEM-benchmark/NL-Augmenter
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-nlaugmenter
#deeplearning #nlp #augmentation #robustness
๐2
Forwarded from ะัะปััะน AI
We continue to conquer the time series together with ETNA! Using our library, we built a model to predict the number of new ๐ฆ COVID-19 cases in different countries. You can see the results we got in our recent article on ๐ Medium: Forecasting with ETNA - Fast and Furious. The article also shows in detail what a typical forecasting pipeline looks like and how you can quickly get a good baseline for a specific dataset. For all questions and suggestions - welcome to ETNA Community in Telegram. For all news related to AI/ML at Tinkoff โ stay tuned to this channel.
๐8๐ฅ3โค2
โโPerceiver IO: a scalable, fully-attentional model that works on any modality
#HuggingFace added neural network which is capable of working on all kinds of modailities: text, images, audio, video, coordinates, etc to the transformers library.
Blog: https://huggingface.co/blog/perceiver
#HuggingFace added neural network which is capable of working on all kinds of modailities: text, images, audio, video, coordinates, etc to the transformers library.
Blog: https://huggingface.co/blog/perceiver
๐4โค1๐ฅ1๐ค1
All the reactions had been enabled for @opendatascience
๐ฅ110๐29๐29๐ฉ17โค9๐คฉ7๐ข6๐4๐ฑ4๐คฎ4๐3
We got lot's of fine messages and feedback, let's discuss most notable papers and news of 2021 to assemble Community 2021 WrapUp in our chat:
https://t.iss.one/datascience_chat
https://t.iss.one/datascience_chat
Telegram
Data Chat
By ODS.ai
2021 WrapUps and Summaries
Those are two technical posts summarizing the progress which were published during 2021.
Papers with Code 2021 : A Year in Review post by Papers with Code
Medium: https://medium.com/paperswithcode/papers-with-code-2021-a-year-in-review-de75d5a77b8b
Post on KDNuggers
Post: https://www.kdnuggets.com/2021/12/2021-year-review-amazing-ai-papers.html
#summary
Those are two technical posts summarizing the progress which were published during 2021.
Papers with Code 2021 : A Year in Review post by Papers with Code
Medium: https://medium.com/paperswithcode/papers-with-code-2021-a-year-in-review-de75d5a77b8b
Post on KDNuggers
Post: https://www.kdnuggets.com/2021/12/2021-year-review-amazing-ai-papers.html
#summary
Medium
Papers with Code 2021 : A Year in Review
Papers with Code indexes various machine learning artifactsโโโpapers, code, resultsโโโto facilitate discovery and comparison. Using thisโฆ
๐11๐ฅ2๐คฉ2
The Illustrated Retrieval Transformer
by @jayalammar
The latest batch of language models can be much smaller yet achieve GPT-3 like performance by being able to query a database or search the web for information. A key indication is that building larger and larger models is not the only way to improve performance.
https://jalammar.github.io/illustrated-retrieval-transformer/
#nlp #gpt3 #retro #deepmind
by @jayalammar
The latest batch of language models can be much smaller yet achieve GPT-3 like performance by being able to query a database or search the web for information. A key indication is that building larger and larger models is not the only way to improve performance.
https://jalammar.github.io/illustrated-retrieval-transformer/
#nlp #gpt3 #retro #deepmind
๐ฅ16๐14โค2๐คฉ1
๐ฆ Hi!
We are the first Telegram Data Science channel.
Channel was started as a collection of notable papers, news and releases shared for the members of Open Data Science (ODS) community. Through the years of just keeping the thing going we grew to an independent online Media supporting principles of Free and Open access to the information related to Data Science.
Ultimate Posts
* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda
Open Data Science
ODS.ai is an international community of people anyhow related to Data Science.
Website: https://ods.ai
Hashtags
Through the years we accumulated a big collection of materials, most of them accompanied by hashtags.
#deeplearning #DL โ post about deep neural networks (> 1 layer)
#cv โ posts related to Computer Vision. Pictures and videos
#nlp #nlu โ Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition โ related to audio information processing
#ar โ augmeneted reality related content
#rl โ Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart โ about neural artt and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 โ related to special architectures or frameworks
#coding #CS โ content related to software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface โ hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization
Chats
- Data Science Chat https://t.iss.one/datascience_chat
- ODS Slack through invite form at website
ODS resources
* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai
Feedback and Contacts
You are welcome to reach administration through telegram bot: @opendatasciencebot
We are the first Telegram Data Science channel.
Channel was started as a collection of notable papers, news and releases shared for the members of Open Data Science (ODS) community. Through the years of just keeping the thing going we grew to an independent online Media supporting principles of Free and Open access to the information related to Data Science.
Ultimate Posts
* Where to start learning more about Data Science. https://github.com/open-data-science/ultimate_posts/tree/master/where_to_start
* @opendatascience channel audience research. https://github.com/open-data-science/ods_channel_stats_eda
Open Data Science
ODS.ai is an international community of people anyhow related to Data Science.
Website: https://ods.ai
Hashtags
Through the years we accumulated a big collection of materials, most of them accompanied by hashtags.
#deeplearning #DL โ post about deep neural networks (> 1 layer)
#cv โ posts related to Computer Vision. Pictures and videos
#nlp #nlu โ Natural Language Processing and Natural Language Understanding. Texts and sequences
#audiolearning #speechrecognition โ related to audio information processing
#ar โ augmeneted reality related content
#rl โ Reinforcement Learning (agents, bots and neural networks capable of playing games)
#gan #generation #generatinveart #neuralart โ about neural artt and image generation
#transformer #vqgan #vae #bert #clip #StyleGAN2 #Unet #resnet #keras #Pytorch #GPT3 #GPT2 โ related to special architectures or frameworks
#coding #CS โ content related to software engineering sphere
#OpenAI #microsoft #Github #DeepMind #Yandex #Google #Facebook #huggingface โ hashtags related to certain companies
#productionml #sota #recommendation #embeddings #selfdriving #dataset #opensource #analytics #statistics #attention #machine #translation #visualization
Chats
- Data Science Chat https://t.iss.one/datascience_chat
- ODS Slack through invite form at website
ODS resources
* Main website: https://ods.ai
* ODS Community Telegram Channel (in Russian): @ods_ru
* ML trainings Telegram Channel: @mltrainings
* ODS Community Twitter: https://twitter.com/ods_ai
Feedback and Contacts
You are welcome to reach administration through telegram bot: @opendatasciencebot
GitHub
ultimate_posts/where_to_start at master ยท open-data-science/ultimate_posts
Ultimate posts for opendatascience telegram channel - open-data-science/ultimate_posts
๐56๐ฅ15โค8๐ฅฐ2๐2๐2โก1๐1๐1