Data Science by ODS.ai 🦜
45.7K subscribers
707 photos
79 videos
7 files
1.78K links
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @malev
Forwarded from Professor M
“Simplicity is the ultimate sophistication”

In early math classes, we learn that division by zero is a banned operation. If you could violate the rule and stealthily divide by zero in your derivations, you could, for example, prove absurdities such as 2=1. Later on, calculus introduces limits. Division by a number indistinguishable from zero (in the limit) is no longer banned; it produces infinity (in the limit).

When I code in Python, however, I divide by zero all the time. Why do I do this?

Occasionally, you want the script to stop execution when certain conditions occur. Perhaps you're still developing the program and, before moving forward, want to report the values of some variables when reaching a specific juncture in the code and then halt. How would you terminate the Python script?

The textbook way of doing so is with the command sys.exit(). Plus, you have to import the sys library with import sys. Twenty characters in total; pretty simple. But what if you could achieve that same objective of halting the script with three characters? How?

Divide by zero. Type 1/0 and the script would stop at this line because of the division-by-zero error. If you can achieve an objective with three characters, why would you use twenty?
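The two ways of halting are not quite equivalent, and a minimal sketch can show the difference. The helper names below (halt_with_division, halt_with_exit, exception_name) are hypothetical and exist only for this illustration; the sketch catches each exception instead of letting it end the script, so both cases can be shown in one run.

```python
import sys


def halt_with_division():
    # Raises ZeroDivisionError; if uncaught, the script ends with a traceback.
    1 / 0


def halt_with_exit():
    # Raises SystemExit; if uncaught, the script ends without a traceback.
    sys.exit()


def exception_name(func):
    """Run func and return the name of the exception it raises, or None."""
    try:
        func()
    except BaseException as exc:  # SystemExit does not derive from Exception
        return type(exc).__name__
    return None


print(exception_name(halt_with_division))
print(exception_name(halt_with_exit))
```

One practical consequence: 1/0 is swallowed by any surrounding `except Exception` block, while sys.exit() is not, so the three-character trick is best kept for quick debugging at the top level of a script.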

When I use STATA for statistical analysis, I also sometimes want the script to terminate at specific points. In STATA, however, I don't even know the textbook way of doing so. So how do I do it? I use a one-word command: stop. It turns out there is no pre-programmed command stop in STATA. When the software reaches this line, it doesn't know what to do. It reports an error and then... stops.

I had frowned upon such seemingly non-elegant solutions before acquiring a taste for them. Wearing a tuxedo for a black-tie event is elegant; wearing one for breakfast reveals confusing priorities.
News on the new MacBook Pro 13:

* Apple M1 chip with built-in hardware for ML, but you won't build models on a laptop anyway
* Max 16 GB RAM, so you won't be able to open more tabs in Chrome / Safari
* 100% recycled aluminium, which is good for nature
* Improved microphones and camera, so colleagues will see a better picture of you and hear your cats meowing more clearly

And still no reason to upgrade if you are doing any DS.

#Apple
Benford’s Law, DS and the 2020 Election

This law can be used for a very basic check on whether data was artificially generated or not. It states that, in many naturally occurring datasets, lower leading digits have a higher probability of occurring.
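A minimal sketch of such a check, assuming the standard form of the law, where leading digit d is expected with probability log10(1 + 1/d). The function names are hypothetical, and the sample is a sequence of powers of two chosen only for illustration; it is not election data and not the code from the repo below.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(values):
    """Observed share of each leading digit among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Illustrative input only: powers of two, not real poll data.
sample = [2 ** k for k in range(1000)]
expected = benford_expected()
observed = leading_digit_shares(sample)

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

A large gap between the two columns is only a hint that data deserves a closer look; many legitimate datasets do not follow the law, so a mismatch by itself proves nothing.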

Nothing promotes the #reproducibleresearch concept better than #openresearch on poll data, because it shows that such data can and should be transparent and open.

With the help of the repo below, anyone can check whether poll results comply with #BenfordsLaw, using unofficial data (or official data, if you are able to get it).

KDnuggets tutorial: https://www.kdnuggets.com/2020/09/diy-election-fraud-analysis-benfords-law.html
Github repo with examples on unofficial US election data: https://github.com/cjph8914/2020_benfords

#statistics
Three-dimensional residual channel attention networks denoise and sharpen fluorescence microscopy image volumes

#3DRCAN for denoising, super resolution and expansion microscopy.

GitHub: https://github.com/AiviaCommunity/3D-RCAN
bioRxiv: https://www.biorxiv.org/content/10.1101/2020.08.27.270439v1

#biolearning #cv #dl
Tutorial on Generative Adversarial Networks (GANs) with Keras and TensorFlow

Nice tutorial with enough theory to understand what you are doing and code to get it done.

Link: https://www.pyimagesearch.com/2020/11/16/gans-with-keras-and-tensorflow/

#Keras #TensorFlow #tutorial #wheretostart #GAN
DeepMind significantly (+100%) improved protein folding modelling

Why this is important: protein folding = protein structure = protein function = how a protein works in a living organism and what it does.
What this means: better vaccines, better meds, more curable diseases and more conditions eased by medication or better understanding.

Dataset: ~170000 available protein structures from PDB
Hardware: 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs)

Link: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

#DL #NLU #proteinmodelling #bio #biolearning #insilico #deepmind #AlphaFold
Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

This technology lets you shift the camera slightly in any video, slow down time, or do both. A great application for video producers and motion designers.

Website: https://www.cs.cornell.edu/~zl548/NSFF/
ArXiV: https://arxiv.org/abs/2011.13084
YouTube: https://youtu.be/qsMIH7gYRCc

#Nerf #videointerpolation #DL
👩‍🎓 Online lectures on Special Topics in AI: Deep Learning

A fresh, free and open playlist on special topics in #DL from the University of Wisconsin-Madison. Topics include reliable deep learning, generalization, learning with less supervision, lifelong learning, deep generative models and more.

Overview Lecture: https://www.youtube.com/watch?v=6LSErxKe634&list=PLKvO2FVLnI9SYLe1umkXsOfIWmEez04Ii
YouTube Playlist: https://www.youtube.com/playlist?list=PLKvO2FVLnI9SYLe1umkXsOfIWmEez04Ii
Syllabus: https://pages.cs.wisc.edu/~sharonli/courses/cs839_fall2020/schedule.html

#wheretostart #lectures #YouTube
Opinion: Remote jobs are here to stay even after the pandemic

As Packy McCormick (author of the Notboring blog) writes in his recent post: we are never going back. The pandemic has catalyzed the global switch to remote jobs and the acceptance of them (including consideration of stock options for remote workers).

We at the @opendatascience community acknowledge that these are only opinions and that the future is more complex and unpredictable. Still, we are posting a list of remote job aggregators so every reader can explore these opportunities if needed.

Blog entry: https://notboring.substack.com/p/were-never-going-back
Audio version: https://open.spotify.com/show/6k1YLBvORRMyosKy3x1xIl?si=_Z7mdecqTSSYrwhHexwGEA

Telegram bots:

@sixnomads_bot: a bot that connects you with relevant remote and full-time jobs that fit your tech stack, desired time zone and salary
@remotejobss: a channel that posts new remote opportunities daily
@datasciencejobseeker: a chat for jobs in data science
@remotejobpositions: a channel with interesting remote jobs for developers.

Here are some websites that might be a good alternative for you:

remoteok.io: a colorful job board with remote jobs in tech companies (from the creators of Nomadlist)
weworkremotely.com: another job board with new remote opportunities updated every day.
remote.co: a hub to learn new tips on working remotely and find your new remote job.

#hr #career #job #remote
Tool for restoration of pixelated images

The tool uses a De Bruijn sequence to restore the original information.
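For readers unfamiliar with the term: a De Bruijn sequence over an alphabet of size k with window length n is a cyclic string of length k^n in which every possible n-character combination appears exactly once. The sketch below uses a standard recursive construction based on Lyndon words; it is not taken from the Depix code, and the function names are hypothetical.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic De Bruijn sequence for the given alphabet and window n."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # Append the current Lyndon word if its length divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def cyclic_windows(seq, n):
    """All length-n substrings of seq, treating it as a cycle."""
    wrapped = seq + seq[:n - 1]
    return [wrapped[i:i + n] for i in range(len(seq))]


seq = de_bruijn("ab", 3)
print(seq, sorted(cyclic_windows(seq, 3)))
```

The property that matters here is compactness: one short string covers every n-character combination, so a single rendered reference image can contain all the character neighbourhoods that a matching step might need.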

Github: https://github.com/beurtschipper/Depix

#pixelization #github
MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs

Work on conditional image generation

ArXiV: https://arxiv.org/abs/2012.02821

#GAN #DL #food2vec
Yandex team talk at NeurIPS. The talk will be most interesting to those working on critical aspects of successful data collection and labeling.

The moderation team will focus on:
- Remoteness. A discussion about the effectiveness and efficiency of remote work on crowdsourcing platforms.
- Fairness. How the working environment (e.g., a crowdsourcing platform) may help give performers flexibility in choosing or switching tasks and working hours.
- Mechanisms. A discussion of bilateral mechanisms that not only provide flexibility to the performers, but also guarantee the quality of the result and the efficiency of the process to the customers.

Toloka's workshop info: https://clck.ru/SNwi3

#NeurIPS2020 #labeling #Yandex
Supporting content decision makers with machine learning

#Netflix shared a post providing information about how they research and prepare data for new title production.

Link: https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f

#NLU #NLP #recommendation #embeddings