Data Science by ODS.ai 🦜

🤗 multilingual datasets

- 611 datasets you can download in one line of Python;
- 467 languages covered, 99 with at least 10 datasets;
- efficient pre-processing to free you from memory constraints.

https://github.com/huggingface/datasets

GitHub

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools - huggingface/datasets

19.5K views, 09:55

Data Science by ODS.ai 🦜

Open Software Packaging for Science

#opensource alternative to #conda.

Mamba (drop-in replacement) direct link: https://github.com/TheSnakePit/mamba
Link: https://medium.com/@QuantStack/open-software-packaging-for-science-61cecee7fc23

#python #packagemanagement

GitHub

GitHub - mamba-org/mamba: The Fast Cross-Platform Package Manager

The Fast Cross-Platform Package Manager. Contribute to mamba-org/mamba development by creating an account on GitHub.

👍1

17.9K views, 12:21

Data Science by ODS.ai 🦜

Characterising Bias in Compressed Models

Popular compression techniques turned out to amplify bias in deep neural networks.

arXiv: https://arxiv.org/abs/2010.03058

#NN #DL #bias

17.4K views, 08:28

Data Science by ODS.ai 🦜

Interactive and explorable explanations

Collection of links to different explanations of how things work.

Link: https://explorabl.es
How network effect (ideas, diseases) works: https://meltingasphalt.com/interactive/going-critical/
How trust works: https://ncase.me/trust/

#howstuffworks #explanations

❤1

18.6K views, 10:06

Data Science by ODS.ai 🦜

Forwarded from Towards NLP🇺🇦

Choosing Transfer Languages for Cross-Lingual Learning

Given a particular low-resource language and NLP task, how can we determine which languages we should transfer from?
If we train models on the top K transfer languages suggested by the ranking model and pick the best one, how good is the best model expected to be?

In the era of transfer learning, we no longer have to collect massive data for each language: we can take an already pretrained model and achieve good scores by training on a smaller dataset. But how should we choose the language to transfer knowledge from? Is it okay to transfer from English to Chinese, or from Russian to Turkish?

The paper investigates this question. The features the authors created to detect the best transfer language are as follows:

* Dataset Size: as simple as it sounds. Do we have enough data in the transfer language relative to the task language?
* Type-Token Ratio: lexical diversity of both languages;
* Word Overlap and Subword Overlap: a measure of similarity between the languages; the more words the two languages share, the better;
* Geographic distance: are the languages spoken in territories that are close to each other on the Earth's surface?
* Genetic distance: are they close to each other in the language genealogical tree?
* Inventory distance: do they sound similar?
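
A few of the dataset-dependent features above are easy to compute by hand. The following is a minimal sketch on toy token lists, not the paper's implementation: the function names are hypothetical, and normalising word overlap by the sum of the two vocabulary sizes is an assumption made for illustration.

```python
def type_token_ratio(tokens):
    """Number of unique tokens (types) divided by the total token count."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def word_overlap(tokens_a, tokens_b):
    """Shared vocabulary size divided by the sum of both vocabulary sizes.

    The normalisation is an assumption chosen for this sketch.
    """
    vocab_a, vocab_b = set(tokens_a), set(tokens_b)
    total = len(vocab_a) + len(vocab_b)
    return len(vocab_a & vocab_b) / total if total else 0.0


def size_ratio(transfer_tokens, task_tokens):
    """Transfer-language data size relative to task-language data size."""
    return len(transfer_tokens) / len(task_tokens)


# Toy corpora standing in for a transfer language and a task language.
transfer = "the cat sat on the mat".split()
task = "the dog sat".split()

features = {
    "ttr_transfer": type_token_ratio(transfer),
    "ttr_task": type_token_ratio(task),
    "word_overlap": word_overlap(transfer, task),
    "size_ratio": size_ratio(transfer, task),
}
```

Features like these would then be fed to a ranking model that scores candidate transfer languages; the distance-based features need external typological data and are not covered here.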

The idea is pretty simple and clear but very important for studies of multilingual models.

The post is based on a reading task from the Multilingual NLP course by CMU (from the post).

Towards NLP

Multilingual NLP

Now is the time to start new courses (or to remember the ones you abandoned). Honestly, there is an incredible number of online courses right now on classical ML, deep learning, and NLP. For example, fast.ai has relaunched its deep learning course…

19.9K views, 17:47

Data Science by ODS.ai 🦜

oops :kekeke:

paper: https://arxiv.org/abs/2012.15332

18.4K views, edited 09:42

Data Science by ODS.ai 🦜

Forwarded from Towards NLP🇺🇦

NLP Highlights of 2020

by Sebastian Ruder:

1. Scaling up—and down
2. Retrieval augmentation
3. Few-shot learning
4. Contrastive learning
5. Evaluation beyond accuracy
6. Practical concerns of large LMs
7. Multilinguality
8. Image Transformers
9. ML for science
10. Reinforcement learning

https://ruder.io/research-highlights-2020/

ruder.io

ML and NLP Research Highlights of 2020

This post summarizes progress in 10 exciting and impactful directions in ML and NLP in 2020.

19.4K views, 12:06

Data Science by ODS.ai 🦜

S+SSPR Workshop: An online workshop on Statistical techniques in Pattern Recognition and Structural and Syntactic Pattern Recognition.

The event is free to attend and takes place online today and tomorrow, with a fantastic list of keynotes: Nicholas Carlini, Michael Bronstein, Max Welling, Fabio Roli, professors and researchers in the fields of geometric deep learning, pattern recognition and adversarial learning.

Live YouTube Streaming: https://www.youtube.com/channel/UCjA0Mhynad2FDlNaxzqGLhQ

Official Program here: https://www.dais.unive.it/sspr2020/program/

Don't miss it!

👍3

20.3K views, 11:55

Data Science by ODS.ai 🦜

Dear AC, who just submitted a link through @opendatasciencebot, could you please do it once again and include your Telegram handle?

The link you provided is incorrect and we can’t reach you.

17.4K views, 10:58

Data Science by ODS.ai 🦜

more anime by #stylegan

page: https://thisanimedoesnotexist.ai
explanation: https://www.gwern.net/Faces#extended-stylegan2-danbooru2019-aydao

thisanimedoesnotexist.ai

This Anime Does Not Exist

TADNE: A website showcasing AI-generated images drawn in an anime style. "Notice me, Onee-chan!"

16.7K views, 16:52

Data Science by ODS.ai 🦜

Forwarded from Graph Machine Learning

Course: ODS Knowledge Graphs

Michael Galkin is starting a self-paced course on knowledge graphs. For now it is only in Russian, with a plan to make it available in English after the first iteration. The first introductory lecture is available on YouTube. You can join the discussion group for all your questions and proposals: @kg_course. The first lecture starts this Thursday; more in the channel @kg_course.

Course curriculum:
* Knowledge representations (RDF, RDFS, OWL)
* Storage and queries (SPARQL, Graph DBs)
* Consistency (RDF*, SHACL, ShEx)
* Semantic Data Integration
* Graph theory intro
* KG embeddings
* GNNs for KGs
* Applications: Question Answering, Query Embeddings

YouTube

Mikhail Galkin - Knowledge Graphs course announcement

Mikhail and his colleagues have prepared a course on knowledge graphs (Knowledge Graphs) https://ods.ai/tracks/kgcourse2021
There will be a short announcement of the course and answers to questions from the seminar participants.

Graph Representation Learning (GRL) is one of the fastest-growing topics…

20.3K views, 13:38

Data Science by ODS.ai 🦜

The new year is a good reason to rearrange things

From now on we will post all reports, ML trainings, and other videos in English on the YouTube channel ODS AI Global 🌐. All English videos from Data Fest Online 2020 are already there – check them out and don't forget to subscribe! 👀

P.S.
All content in Russian will be posted on ODS AI RU 🇷🇺 as always.

18.4K views, 14:47

👀 18 🌐 12 🔥 30

Data Science by ODS.ai 🦜

JigsawGAN: Self-supervised Learning for Solving Jigsaw Puzzles with Generative Adversarial Networks

The authors suggest a GAN-based approach for solving jigsaw puzzles. JigsawGAN is a self-supervised method with a multi-task pipeline: the classification branch classifies jigsaw permutations, and the GAN branch recovers features to images with the correct order.
The proposed method can solve jigsaw puzzles efficiently by utilizing both semantic information and edge information simultaneously.
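
To make the "classifies jigsaw permutations" part concrete: the self-supervised label is simply the index of the permutation used to shuffle the tiles, so no manual annotation is needed. The following is a minimal sketch of that labelling idea only, with no networks involved; the 2x2 grid, the function names and the use of every possible permutation as a class are assumptions for illustration, not details taken from the paper.

```python
from itertools import permutations

# Assumption for illustration: a 2x2 puzzle, so 4 tiles and 24 permutation classes.
PERMS = list(permutations(range(4)))


def shuffle_tiles(tiles, perm_index):
    """Reorder tiles with the permutation at perm_index; that index is the class label."""
    perm = PERMS[perm_index]
    return [tiles[src] for src in perm]


def restore_tiles(shuffled, perm_index):
    """Invert the shuffle once the permutation index is known (e.g. predicted)."""
    perm = PERMS[perm_index]
    restored = [None] * len(shuffled)
    for position, src in enumerate(perm):
        restored[src] = shuffled[position]
    return restored


tiles = ["top-left", "top-right", "bottom-left", "bottom-right"]
label = 5  # would be sampled at random when building training pairs
shuffled = shuffle_tiles(tiles, label)
solved = restore_tiles(shuffled, label)
```

A classifier trained on (shuffled image, label) pairs learns to predict `label`; with a correct prediction, `restore_tiles` puts the pieces back in order.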

Paper: https://arxiv.org/abs/2101.07555

#deeplearning #jigsaw #selfsupervised #gan

19.9K views, 12:35

🔝 37 👀 26

Data Science by ODS.ai 🦜

Forwarded from Towards NLP🇺🇦

Open Datasets for Research

Last week brought several announcements about newly opened datasets for researchers.

1. Twitter opened the “full history of public conversation” to academics (specifically, to academics):
https://www.theverge.com/2021/1/26/22250203/twitter-academic-research-public-tweet-archive-free-access
We can happily conduct research on social network graphs, user behavior and fake news (especially fake news🙃) without fighting with the Twitter API.

2. Papers with Code is now also Papers with Datasets:
https://www.paperswithcode.com/datasets
Not only for NLP, but for all fields, structured for easy search and download.

The Verge

Twitter is opening up its full tweet archive to academic researchers for free

A full searchable archive of public tweets will now be available for free.

17.5K views, 22:19

Data Science by ODS.ai 🦜

ObjectAug: Object-level Data Augmentation for Semantic Image Segmentation

The authors propose ObjectAug, which performs object-level augmentation for semantic image segmentation.
This approach has the following steps:
- decouple the image into individual objects and the background using the semantic labels;
- augment each object separately;
- fill in the black areas left by object augmentation using image inpainting;
- assemble the augmented objects and background.

Thanks to the fact that objects are separate, we can apply different augmentations to different categories and combine them with image-level augmentation methods.
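
The pipeline can be sketched on a tiny grid. This is a toy illustration under stated assumptions, not the authors' code: images are plain 2D lists, the per-object augmentation is a hypothetical per-class translation, and real image inpainting is replaced by filling holes with the mean background value. The hole is filled before pasting so that objects always land on a complete background.

```python
def object_aug(image, labels, shifts, background=0):
    """Toy object-level augmentation on 2D lists.

    image:  H x W grid of pixel values
    labels: H x W grid of semantic class ids (`background` marks non-object pixels)
    shifts: hypothetical per-class (dy, dx) translations standing in for
            any per-object augmentation
    """
    h, w = len(image), len(image[0])

    # 1. Decouple: collect the pixels of each object class using the labels.
    objects = {}
    for y in range(h):
        for x in range(w):
            cls = labels[y][x]
            if cls != background:
                objects.setdefault(cls, []).append((y, x, image[y][x]))

    # 2. Fill the holes left by removed objects. Mean background value is a
    #    simple stand-in for image inpainting (an assumption of this sketch).
    bg = [image[y][x] for y in range(h) for x in range(w)
          if labels[y][x] == background]
    fill = sum(bg) / len(bg) if bg else 0
    out_image = [[image[y][x] if labels[y][x] == background else fill
                  for x in range(w)] for y in range(h)]
    out_labels = [[background] * w for _ in range(h)]

    # 3. Augment each object separately, then 4. assemble onto the background.
    for cls, pixels in objects.items():
        dy, dx = shifts.get(cls, (0, 0))
        for y, x, value in pixels:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:  # drop pixels shifted out of frame
                out_image[ny][nx] = value
                out_labels[ny][nx] = cls
    return out_image, out_labels


image = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
labels = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
aug_image, aug_labels = object_aug(image, labels, shifts={5: (0, 1)})
```

Because `shifts` is keyed by class id, different categories can receive different augmentations, and the label map is transformed together with the image so the segmentation target stays aligned.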

Paper: https://arxiv.org/abs/2102.00221

#deeplearning #augmentation #imageinpainting #imagesegmentation

17.8K views, 15:49

👀 16 👍 23

Data Science by ODS.ai 🦜

Call for speakers for Machine Learning REPA Week 2021

The ML REPA and LeanDS communities are organizing an international online conference, Machine Learning REPA Week 2021.

We are inviting speakers to give talks or workshops on Machine Learning Engineering, Automation, MLOps and Management topics.

CALL FOR SPEAKERS

Conference language: ENGLISH
Dates: 5 - 11 April 2021 (7 pm - 9 pm Moscow time, GMT+3)
Format: Online, Zoom
Content: Talks up to 30 min, workshops / demos up to 60 min
Topics: Management, Version Control, Pipelines Automation, MLOps, Testing, Monitoring
Deadline: 15 March 2021

URL to apply: https://mlrepa.com/mlrepa-week-2021

#conference #callforspeakers

21.5K views, 16:10

Data Science by ODS.ai 🦜

Ultimate post on where to start learning DS. The most common request we received through the years was to share insights and advice on how to start a career in data science and to recommend decent courses. Apparently, using the hashtag #wheretostart wasn't enough…

Hands on ML notebook series

We updated our ultimate post with a series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn and TensorFlow.

Link: https://github.com/ageron/handson-ml

#wheretostart #opensource #jupyter

GitHub

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead. - ageron/handson-ml

18.7K views, edited 18:45

Data Science by ODS.ai 🦜

neuroplanets
taken from the channel https://t.iss.one/NeuralShit

full video: https://www.youtube.com/watch?v=tPyPwW7W1GM

👍1

17.1K views, 21:38

Data Science by ODS.ai 🦜

Pie chart naming in different languages.

Credit: https://twitter.com/ElephantEating/status/1360988590814023683

18.7K views, 06:52

Data Science by ODS.ai 🦜

Forwarded from Towards NLP🇺🇦

2020 in ML and NLP publications

By conferences, countries, companies, universities and most productive scientists:
https://www.marekrei.com/blog/ml-and-nlp-publications-in-2020/

Marek Rei

ML and NLP Publications in 2020 - Marek Rei

I ran my paper analysis pipeline once again in order to get statistics for 2020. It certainly was an unusual year. While ML and NLP…

🔥1

17.9K views, 08:36

Data Science by ODS.ai 🦜

Introducing Model Search: An Open Source Platform for Finding Optimal ML Models

#Google has released an open source #AutoML framework capable of hyperparameter tuning and ensembling.

Blog post: https://ai.googleblog.com/2021/02/introducing-model-search-open-source.html
Repo: https://github.com/google/model_search

👍1

16.1K views, 12:06