Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach the editors, contact @malev.
​​iRobot with poop detection

iRobot (a company building house-cleaning robots) had a problem with its robots running into pet poop. So they built a special detection model, along with physical models of poop, to test the product.

iRobot official YouTube: https://www.youtube.com/watch?v=2rj3VUmRNnU
TechCrunch: https://techcrunch.com/2021/09/09/actuator-4/

#aiproduct #marketinggurus
New attempt at proving P≠NP

Martin Dowd published a 5-page paper claiming to contain a proof that P ≠ NP. This is a fundamental question comparing quickly checkable problems against quickly solvable ones.

Basically, proving P != NP would mean unlimited demand for AlphaGo-like solutions in different spheres, because it would establish (as a scientific fact) that there are problems with no fast [enough] analytical solutions.
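
To make the "quickly checkable vs. quickly solvable" distinction concrete, here is a toy subset-sum sketch (not from the paper): verifying a proposed answer is cheap, while the only obvious way to find one is exponential search.

```python
from itertools import chain, combinations

def verify(numbers, target, certificate):
    # Checking a proposed subset takes polynomial time.
    return sum(certificate) == target and all(x in numbers for x in certificate)

def solve(numbers, target):
    # Brute-force search over all 2^n subsets -- exponential time.
    subsets = chain.from_iterable(combinations(numbers, r) for r in range(len(numbers) + 1))
    return next((s for s in subsets if sum(s) == target), None)

numbers = [3, 34, 4, 12, 5, 2]
print(verify(numbers, 9, (4, 5)))  # fast check: True
print(solve(numbers, 9))           # slow search: (4, 5)
```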

ResearchGate: https://www.researchgate.net/publication/354423778_P_Does_Not_Equal_NP
Wiki on the problem: https://en.wikipedia.org/wiki/P_versus_NP_problem

#fundamental #pnenp #computerscience
​​Counting Happiness and Where it Comes From

Researchers asked 10,000 Mechanical Turk participants to name 10 things that make them happy, resulting in the creation of HappyDB.

And since that DB is open, Nathan Yau analyzed and visualized it from the perspective of subjects and actions, producing an interesting visualization.

Hope that reading @opendatascience daily makes you at least content, if not happy.

Happiness reason visualization link: https://flowingdata.com/2021/07/29/counting-happiness
HappyDB link: https://megagon.ai/projects/happydb-a-happiness-database-of-100000-happy-moments/

#dataset #emotions #visualization
πŸ‘2
​​SwinIR: Image Restoration Using Swin Transformer

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy, and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers, which show impressive performance on high-level vision tasks.

The authors propose SwinIR, a model based on the Swin Transformer. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks (image super-resolution, image denoising, and JPEG compression artifact reduction) by up to 0.14–0.45 dB, while the total number of parameters can be reduced by up to 67%.
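
For context, those dB numbers are PSNR gains; a minimal sketch of how PSNR is computed for 8-bit images (assuming numpy arrays of the same shape):

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# A gain of ~0.3 dB corresponds to the restored image's MSE dropping by roughly 7%.
```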

Paper: https://arxiv.org/abs/2108.10257
Code: https://github.com/JingyunLiang/SwinIR

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-swinir

#deeplearning #cv #transformer #superresolution #imagerestoration
πŸ‘2πŸ”₯1
​​Summarizing Books with Human Feedback

#OpenAI fine-tuned #GPT3 to summarize books well enough to be human-readable. Main approach: recursively split the text into parts, summarize them, and then meta-summarize the summaries.

This is really important because once there is a great summarization #SOTA, we won't need editors to write posts for you. And researchers will ultimately have some assistance interpreting models' results.
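
The recursive idea can be sketched like this (a rough sketch; `summarize_passage` stands in for the fine-tuned model and is purely hypothetical):

```python
def summarize_book(text, summarize_passage, chunk_size=2000, max_len=2000):
    """Recursively summarize: split into chunks, summarize each, then summarize the summaries.

    Assumes summarize_passage() returns something shorter than its input,
    so the recursion terminates.
    """
    if len(text) <= max_len:
        return summarize_passage(text)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partial = " ".join(summarize_passage(chunk) for chunk in chunks)
    return summarize_book(partial, summarize_passage, chunk_size, max_len)

# Usage: summarize_book(book_text, summarize_passage=my_model)  # my_model is a hypothetical callable
```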

BlogPost: https://openai.com/blog/summarizing-books/
ArXiV: https://arxiv.org/abs/2109.10862

#summarization #NLU #NLP
πŸ‘2
​​AI Generated Pokemon Sprites with GPT-2

The author trained a #GPT2 model to generate #pokemon sprites, encoding them as lines of characters (including color). Surprisingly, the results were decent, which leaves us wondering whether #GPT3 results would be better.
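
A hypothetical illustration of such an encoding (not the author's exact scheme): map each palette color to a character and serialize the sprite row by row, so the model sees plain text.

```python
PALETTE = {0: '.', 1: 'r', 2: 'g', 3: 'b'}  # hypothetical 4-color palette

def sprite_to_text(sprite):
    """sprite: 2D list of palette indices -> newline-separated rows of characters."""
    return "\n".join("".join(PALETTE[p] for p in row) for row in sprite)

def text_to_sprite(text):
    inverse = {ch: idx for idx, ch in PALETTE.items()}
    return [[inverse[ch] for ch in line] for line in text.splitlines()]

print(sprite_to_text([[0, 1, 1, 0],
                      [1, 2, 2, 1]]))
# .rr.
# rggr
```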

YouTube: https://www.youtube.com/watch?v=Z9K3cwSL6uM
GitHub: https://github.com/MatthewRayfield/pokemon-gpt-2
Article: https://matthewrayfield.com/articles/ai-generated-pokemon-sprites-with-gpt-2/
Example: https://matthewrayfield.com/projects/ai-pokemon/

#NLU #NLP #generation #neuralart
​​This Olesya doesn't exist

The author trained a StyleGAN2-ADA network on 2,445 personal photos to generate a new photo on the site each time there is a refresh or click.


Website: https://thisolesyadoesnotexist.glitch.me
Olesya's personal site: https://monolesan.com

#StyleGAN2 #StyleGAN2ADA #generation #thisXdoesntexist
​​Real numbers, data science and chaos: How to fit any dataset with a single parameter

A gentle reminder that the measure of information is the bit, and that a single parameter can contain more information than multiple parameters.
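
A toy illustration of why this is possible (much simpler than the paper's actual construction): pack two numbers into a single parameter by interleaving their decimal digits, then recover them.

```python
def pack(a, b, digits=6):
    """Interleave the decimal digits of two numbers in [0, 1) into one parameter."""
    da = f"{a:.{digits}f}"[2:2 + digits]
    db = f"{b:.{digits}f}"[2:2 + digits]
    return float("0." + "".join(x + y for x, y in zip(da, db)))

def unpack(theta, digits=6):
    s = f"{theta:.{2 * digits}f}"[2:2 + 2 * digits]
    return float("0." + s[0::2]), float("0." + s[1::2])

theta = pack(0.123456, 0.654321)
print(theta)          # ~0.162534435261
print(unpack(theta))  # (0.123456, 0.654321)
```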

ArXiV: https://arxiv.org/abs/1904.12320

#cs #bits #math
Experimenting with CLIP+VQGAN to Create AI Generated Art

Tips and tricks on prompts to #vqclip. TLDR:

* Adding "rendered in unreal engine", "trending on artstation", or "top of /r/art" to the prompt improves image quality significantly.
* Splitting a prompt with the pipe character into separate prompts that the image is steered toward independently may be counterproductive (see the sketch below).
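
A small sketch of what these two tips look like as prompt strings (purely illustrative; the `|` separator is the convention the common VQGAN+CLIP notebooks use for independently weighted prompts):

```python
base_prompt = "a lighthouse in a storm"
modifiers = ["rendered in unreal engine", "trending on artstation", "top of /r/art"]

# Tip 1: append quality modifiers to a single prompt.
single_prompt = ", ".join([base_prompt] + modifiers)

# Tip 2 (often counterproductive): split into separate prompts steered toward independently.
piped_prompt = " | ".join([base_prompt] + modifiers)

print(single_prompt)
print(piped_prompt)
```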

Article: https://blog.roboflow.com/ai-generated-art/
Colab Notebook: https://colab.research.google.com/drive/1go6YwMFe5MX6XM9tv-cnQiSTU50N9EeT

#visualization #gan #generation #generatinveart #vqgan #clip
RoBERTa English Toxicity Classifier

We have released our fine-tuned RoBERTa-based toxicity classifier for the English language on 🤗:
https://huggingface.co/SkolkovoInstitute/roberta_toxicity_classifier

The model was trained on a merge of the English parts of the three Jigsaw datasets. On the test set of the first Jigsaw competition it reaches an AUC-ROC of 0.98 and an F1-score of 0.76.
So you can now conveniently use it for any of your research or industrial tasks ☺️
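
A minimal usage sketch with the 🤗 transformers library (the toxic-class index below is an assumption; double-check the label mapping on the model page):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "SkolkovoInstitute/roberta_toxicity_classifier"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("You are amazing!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prob_toxic = torch.softmax(logits, dim=-1)[0, 1].item()  # assumed: index 1 = toxic
print(f"toxicity probability: {prob_toxic:.3f}")
```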
πŸ‘1
It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

Researchers from #Yandex have discovered that the reasoning capabilities of cross-lingual Transformers are concentrated in a small set of attention heads. A new multilingual dataset could encourage research on commonsense reasoning in Russian, French, Chinese and other languages.

Link: https://research.yandex.com/news/a-few-attention-heads-for-reasoning-in-multiple-languages

ArXiV: https://arxiv.org/abs/2106.12066

#transformer #nlu #nlp
Forwarded from Silero News (Alexander)
We Have Published a Model For Text Repunctuation and Recapitalization

The model works with SINGLE sentences (albeit long ones) and:

- Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
- Works for 4 languages (Russian, English, German, Spanish) and can be extended;
- By design is domain agnostic and is not based on any hard-coded rules;
- Has non-trivial metrics and succeeds in the task of improving text readability;

Links:

- Model repo - https://github.com/snakers4/silero-models#text-enhancement
- Colab notebook - https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_te.ipynb
- Russian article - https://habr.com/ru/post/581946/
- English article - https://habr.com/ru/post/581960/
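
A rough usage sketch based on the torch.hub pattern from the repo README (the exact entry-point names are assumptions, so check the README):

```python
import torch

# Load the text-enhancement model via torch.hub (names as assumed from the repo README).
model, example_texts, languages, punct, apply_te = torch.hub.load(
    repo_or_dir="snakers4/silero-models", model="silero_te"
)

print(apply_te("hi how are you doing today", lan="en"))
# expected output along the lines of: "Hi, how are you doing today?"
```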
πŸ‘3
AI for Earth Monitoring course

The course is about how to apply data science to datasets of Earth images collected by satellites. It would benefit people interested in jumping into real-world applications and working with real Earth observation image data.

Start date: 18 Oct. 2021
Duration: 6 weeks
Cost: Free
Link: https://bit.ly/3lerMti
πŸ‘2
Entropy and complexity unveil the landscape of memes evolution

Sunday research on how memes have evolved from 2011 to the present.
TLDR: memes are getting more complex and require more contextual knowledge to understand.

Link: https://www.nature.com/articles/s41598-021-99468-6
Data: https://github.com/cdcslab/MemesEvolution

#memes #openresearch
​​A Recipe For Arbitrary Text Style Transfer with Large Language Models

Text style transfer is rewriting text to incorporate additional or alternative stylistic elements while preserving the overall semantics and structure.

Large language models are trained only for continuation, but many recent approaches have shown that it is possible to perform other NLP tasks by expressing them as prompts that encourage the model to output the desired answer as the continuation.

The authors present a new prompting method (augmented zero-shot learning), which frames style transfer as a sentence rewriting task and requires only natural language instruction.

There are many great examples in the paper and on the project page, both formal and informal.
For example, "include the word 'oregano'" and "in the style of a pirate".
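
A rough sketch of how such an augmented zero-shot prompt can be assembled (the template and the example rewrites below are a paraphrase for illustration, not the paper's exact wording):

```python
# Illustrative demonstrations (not from the paper): varied rewrites, then the target instruction.
examples = [
    ("the food was fine", "more positive", "the food was absolutely delicious"),
    ("see you at noon", "more formal", "I look forward to meeting you at midday"),
]

def build_prompt(sentence, instruction):
    parts = [
        f"Here is some text: {{{src}}}. Here is a rewrite of the text, which is {style}: {{{tgt}}}"
        for src, style, tgt in examples
    ]
    parts.append(f"Here is some text: {{{sentence}}}. Here is a rewrite of the text, which is {instruction}: {{")
    return "\n".join(parts)

print(build_prompt("I'd like a cup of coffee.", "in the style of a pirate"))
```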

Paper: https://arxiv.org/abs/2109.03910
Code: https://storage.googleapis.com/style-transfer-paper-123/index.html

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-llmdialog

#deeplearning #nlp #styletransfer
πŸ‘2
🔥 Alias-Free Generative Adversarial Networks (StyleGAN3) release

The king is dead, long live the king! #StyleGAN2 was the #SOTA and the default standard for generating images. #Nvidia has released an updated version, which will lead to more realistic images generated by the community.

Article: https://nvlabs.github.io/stylegan3/
GitHub: https://github.com/NVlabs/stylegan3
Colab: https://colab.research.google.com/drive/1BXNHZBai-pXtP-ncliouXo_kUiG1Pq7M

#GAN #dl
Dear advertisers who spammed @opendatasciencebot: you are kindly welcome to advertise on this channel for 1 ETH (~$4,300).

This might seem unreasonably overpriced, but don't fall for it: it is. We do not promote anything we wouldn't post here for free, because we are privileged and blessed to work in a sphere with :goodenough: compensation, which lets us put a higher price tag on advertising.

😌
πŸ‘2❀1