#AI #DataScience #MachineLearning #ML:
7 Steps for Data Preparation Using #Python
Link => https://bit.ly/PyDataPrep
#datamining #statistics #bigdata #artificialintelligence
✴️ @AI_Python_EN
"Data in the Life: Authorship Attribution in Lennon-McCartney Songs" was just published in the first issue of the Harvard Data Science Review (HDSR), the inaugural publication of the Harvard Data Science Initiative, published by the MIT Press. Combining features of a premier research journal, a leading educational publication, and a popular magazine, HDSR leverages digital technologies and data visualizations to facilitate author-reader interactions globally. Besides our article, the first issue features articles on topics ranging from machine learning models for predicting drug approvals to artificial intelligence. Read it now:
https://bit.ly/2Kuze2q.
#datascience #bigdata #machinelearning #statistics #AI
✴️ @AI_Python_EN
Artificial Intelligence: the global landscape of ethics guidelines
Researchers: Anna Jobin, Marcello Ienca, Effy Vayena
Paper: https://ow.ly/mDA430p2R0q
#artificialintelligence #ai #ml #machinelearning #bigdata #deeplearning #technology #datascience
✴️ @AI_Python_EN
A PyTorch implementation of "SimGNN: A Neural Network Approach to Fast Graph Similarity Computation" (WSDM 2019). This is a lightweight graph convolutional neural network for the fast calculation of approximate graph similarity at scale. Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound.
https://lnkd.in/gA5tfuC
#datamining #machinelearning #deeplearning #datascience #bigdata
✴️ @AI_Python_EN
Understanding the Backpropagation Algorithm.
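Below is a minimal plain-Python sketch of what backpropagation computes. It assumes a toy network with one sigmoid hidden unit, one linear output and a squared-error loss, and all weights and inputs are made up. The gradients come from applying the chain rule backwards from the loss, and they are checked against central finite differences:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x, y):
    """Toy network: x -> h = sigmoid(w1 * x) -> y_hat = w2 * h, squared-error loss."""
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    loss = 0.5 * (y_hat - y) ** 2
    return h, y_hat, loss

def backward(w1, w2, x, y):
    """Backpropagation: pass dL/d(output) back through each operation (chain rule)."""
    h, y_hat, _ = forward(w1, w2, x, y)
    d_yhat = y_hat - y          # dL/dy_hat for L = 0.5 * (y_hat - y)^2
    d_w2 = d_yhat * h           # y_hat = w2 * h  ->  dy_hat/dw2 = h
    d_h = d_yhat * w2           # dy_hat/dh = w2
    d_z = d_h * h * (1.0 - h)   # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x              # z = w1 * x  ->  dz/dw1 = x
    return d_w1, d_w2

def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check the backprop gradients."""
    g1 = (forward(w1 + eps, w2, x, y)[2] - forward(w1 - eps, w2, x, y)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y)[2] - forward(w1, w2 - eps, x, y)[2]) / (2 * eps)
    return g1, g2

analytic = backward(0.5, -1.2, x=2.0, y=1.0)
numeric = numerical_grad(0.5, -1.2, x=2.0, y=1.0)
print(analytic, numeric)
```

Frameworks such as PyTorch and TensorFlow automate this same chain-rule bookkeeping for networks with many layers and weights.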
#BigData #DataScience #AI #MachineLearning #IoT #IIoT #PyTorch #Python #TensorFlow #CloudComputing #Algorithms
https://bit.ly/2ASKwqx
❇️ @AI_Python_EN
What's the purpose of statistics?
"Do you think the purpose of existence is to pass out of existence is the purpose of existence?" - Ray Manzarek
The former Doors organist poses some fundamental questions to which definitive answers remain elusive. Happily, the purpose of statistics is easier to fathom, since humans created it. Put simply, it is to enhance decision making.
These decisions could be those made by scientists, businesspeople, politicians and other government officials, by medical and legal professionals, or even by religious authorities. In informal ways, ordinary folks also use statistics to help make better decisions.
How does it do this?
One way is by providing basic information, such as how many, how much and how often. The "stat" in statistics is derived from the word state, as in nation state. As statistics emerged as a formal discipline, describing nations quantitatively (e.g., population size, number of citizens working in manufacturing) became one of its fundamental purposes. Frequencies, means, medians and standard deviations are now familiar to almost everyone.
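Python's standard library can compute these basic summaries. Here is a minimal sketch that uses made-up survey answers (weekly snack purchases per respondent):

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical survey answers: weekly snack purchases per respondent
purchases = [0, 2, 1, 3, 2, 2, 5, 1, 0, 2]

frequencies = Counter(purchases)   # how often each value occurs
avg = mean(purchases)              # arithmetic mean
mid = median(purchases)            # middle value (average of the two middle values here)
spread = stdev(purchases)          # sample standard deviation (n - 1 denominator)

print(sorted(frequencies.items()))
print(avg, mid, round(spread, 3))
```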
Often we must rely on samples to make inferences about our population of interest. From a consumer survey, for example, we might estimate mean annual household expenditures on snack foods. This is known as inferential statistics, and confidence intervals will be familiar to anyone who has taken an introductory course in statistics. So will methods such as t-tests and chi-squared tests, which can be used to make population inferences about groups (e.g., are males more likely than females to eat pretzels?).
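Both ideas can be sketched with the standard library alone. The interval below uses a normal (z) approximation, which assumes a large sample. The chi-squared statistic is Pearson's, computed without continuity correction. Its p-value relies on the fact that a chi-squared variable with one degree of freedom is a squared standard normal. The spending figures and pretzel counts are invented for illustration:

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean.
    Assumes a reasonably large sample; small samples call for a t quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    se = stdev(sample) / math.sqrt(len(sample))    # standard error of the mean
    m = mean(sample)
    return m - z * se, m + z * se

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table
    (no continuity correction). Returns (statistic, p_value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # One degree of freedom: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical annual snack spending (dollars) for 8 households; a toy sample,
# far smaller than the normal approximation really calls for
spending = [310, 295, 402, 260, 350, 330, 288, 375]
low, high = mean_confidence_interval(spending)

# Hypothetical counts. Rows: males, females. Columns: eat pretzels, do not.
pretzels = [[60, 40],
            [45, 55]]
stat, p = chi_squared_2x2(pretzels)
print(f"95% CI for mean spending: ({low:.1f}, {high:.1f})")
print(f"chi-squared = {stat:.2f}, p = {p:.3f}")
```

With these invented counts, the statistic is about 4.51 and p is roughly 0.03.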
Another way statistics helps us make decisions is by exploring relationships among variables through the use of cross tabulations, correlations and data visualizations. Exploratory data analysis (EDA) can also take on more complex forms and draw upon methods such as principal components analysis, regression and cluster analysis. EDA is often used to develop hypotheses that will be assessed more rigorously in subsequent research.
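A cross tabulation and a correlation coefficient are the simplest of these exploratory tools, and each takes only a few lines of plain Python. The respondent data below are hypothetical:

```python
import math
from collections import Counter

def crosstab(rows, cols):
    """Count co-occurrences of two categorical variables."""
    return Counter(zip(rows, cols))

def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical respondents
gender = ["M", "F", "F", "M", "F", "M"]
snacker = ["yes", "no", "yes", "yes", "no", "no"]
table = crosstab(gender, snacker)
print(table[("M", "yes")], table[("F", "no")])  # -> 2 2

income = [40, 55, 62, 70, 85, 90]               # thousands of dollars
snack_spend = [200, 240, 250, 300, 330, 360]    # dollars per year
print(round(pearson_r(income, snack_spend), 3))
```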
These hypotheses are often causal in nature, for example, why some people avoid snacks. Randomized experiments are generally considered the best approach in causal analysis but are not always possible or appropriate; see "Why experiment?" for more thoughts on this subject. Hypotheses can also be developed and refined with the data, not merely tested through Null Hypothesis Significance Testing, though this has traditionally been frowned upon because the same data are then used for multiple purposes.
Many statisticians are actively involved in designing research, not merely using secondary data. This is a large subject but briefly summarized in Preaching About Primary Research.
Making classifications, predictions and forecasts is another traditional role of statistics. In a data science context, the first two are often called predictive analytics and employ methods such as random forests and standard (OLS) regression. Forecasting sales for the next year is a different matter and normally requires the use of time-series analysis. There is also unsupervised learning, which aims to find previously unknown patterns in unlabeled data. Using K-means clustering to partition consumer survey respondents into segments based on their attitudes is an example of this.
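Here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. Each respondent is assumed to have two made-up attitude scores. To keep the result deterministic, the centroids are seeded by farthest-first traversal, which is an assumption made for this example. In practice, random restarts or k-means++ seeding are more common:

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: new centroid = coordinate-wise mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:              # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical respondents scored 1-10 on two attitude questions
respondents = [(1, 2), (2, 1), (1, 1), (2, 2), (8, 9), (9, 8), (9, 9), (8, 8)]
labels, centers = kmeans(respondents, k=2)
print(labels, centers)
```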
Quality control, operations research, what-if simulations and risk assessment are other areas where statistics plays a key role. There are many others, as this page illustrates.
The fuzzy, buzzy term "analytics" is frequently used interchangeably with statistics, an offense to which I also plead guilty.
"The best thing about being a statistician is that you get to play in everyone's backyard." - John Tukey
#ai #artificialintelligence #ml #statistics #bigdata #machinelearning
#datascience
❇️ @AI_Python_EN
What is a Time Series?
Many data sets are cross-sectional and represent a single slice of time. However, we also have data collected over many periods, weekly sales data, for instance. This is an example of time series data. Time series analysis is a…
Multiple Time Series
You might need to analyze multiple time series simultaneously, e.g., sales of your brands and key competitors. Figure 2 below is an example and shows weekly sales data for three brands over a one-year period. Since sales movements of brands competing with each other will typically be correlated over time, it often will make sense, and be more statistically rigorous, to include data for all key brands in one model instead of running separate models for each brand.
Vector Autoregression (VAR), the Vector Error Correction Model (VECM) and the more general State Space framework are three frequently used approaches to multiple time series analysis. Causal variables can also be included, which allows Market Response/Marketing Mix modeling to be conducted.
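Here is a minimal sketch of the VAR idea. It simulates weekly sales for two hypothetical competing brands from a known VAR(1), then recovers the coefficients by ordinary least squares, fitting one equation per brand. The parameters, noise level and helper names are illustrative assumptions. Dedicated libraries, such as statsmodels in Python, also handle lag selection, VECM and diagnostics:

```python
import random

def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_var1(series):
    """Fit y_t = c + A y_{t-1} + e_t by OLS, one equation per series.
    `series` is a list of observation tuples, one per period.
    Returns one coefficient list [c_i, a_i1, ..., a_ik] per equation."""
    k = len(series[0])
    X = [[1.0, *series[t - 1]] for t in range(1, len(series))]   # intercept + lagged values
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k + 1)] for i in range(k + 1)]
    coefs = []
    for eq in range(k):
        y_eq = [series[t][eq] for t in range(1, len(series))]
        Xty = [sum(row[i] * v for row, v in zip(X, y_eq)) for i in range(k + 1)]
        coefs.append(solve(XtX, Xty))
    return coefs

# Simulate two competing brands from a known VAR(1) (hypothetical parameters)
rng = random.Random(42)
A_true = [[0.5, 0.2], [0.1, 0.4]]
c_true = [1.0, 2.0]
y = [(3.5, 3.9)]                                   # start near the long-run mean
for _ in range(5000):
    prev = y[-1]
    y.append(tuple(c_true[i] + sum(A_true[i][j] * prev[j] for j in range(2)) + rng.gauss(0, 0.1)
                   for i in range(2)))

coefs = fit_var1(y)
for i, (c, *a) in enumerate(coefs):
    print(f"brand {i}: intercept={c:.2f}, lag coefficients={[round(v, 2) for v in a]}")
```

Each row of `A_true` describes how one brand's sales respond to last period's sales of both brands. The off-diagonal terms are the cross-brand effects that separate single-brand models would miss.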
Other Methods
There are several additional methods relevant to marketing research and data science I'll now briefly describe.
Panel Models include cross sections in a time series analysis. Sales and marketing data for several brands, for instance, can be stacked on top of one another and analyzed simultaneously. Panel modeling permits category-level analysis and also comes in handy when data are infrequent (e.g., monthly or quarterly).
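One common way to analyze stacked panel data is the fixed-effects ("within") estimator. It subtracts each brand's own averages before estimating a slope shared by all brands. This sketch assumes a single regressor (hypothetical promotional spend) and uses invented figures. It illustrates one panel estimator and is not necessarily the model the author has in mind:

```python
from collections import defaultdict

def within_estimator(rows):
    """Fixed-effects (within) slope for a single regressor.
    rows: (brand, x, y) tuples stacked across brands and periods.
    Demeaning x and y within each brand absorbs brand-level differences
    in baseline sales, so only within-brand variation identifies the slope."""
    by_brand = defaultdict(list)
    for brand, x, y in rows:
        by_brand[brand].append((x, y))
    num = den = 0.0
    for obs in by_brand.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        num += sum((x - mx) * (y - my) for x, y in obs)
        den += sum((x - mx) ** 2 for x, _ in obs)
    return num / den

# Hypothetical stacked panel: (brand, promotional spend, sales)
rows = [("A", 1, 102), ("A", 2, 104), ("A", 3, 106),
        ("B", 4, 58), ("B", 5, 60), ("B", 6, 62)]
fe_slope = within_estimator(rows)
# Treating everything as one "brand" gives the naive pooled OLS slope for comparison
pooled_slope = within_estimator([("all", x, y) for _, x, y in rows])
print(fe_slope, round(pooled_slope, 3))
```

In this example both brands share a slope of 2. Brand B combines lower baseline sales with higher spend, so naively pooling the data produces a negative slope instead.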
Longitudinal Analysis is a generic and sometimes confusingly used term that can refer to Panel modeling with a small number of periods ("short panels"), as well as to Repeated Measures, Growth Curve Analysis or Multilevel Analysis. In a literal sense it subsumes time series analysis, but many authorities reserve that term for analysis of data with many time periods (e.g., >25). Structural Equation Modeling (SEM) is one method widely used in Growth Curve modeling and other longitudinal analyses.
Survival Analysis is a branch of #statistics for analyzing the expected length of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. It's also called Duration Analysis in Economics and Event History Analysis in Sociology. It is often used in customer churn analysis.
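A classic survival-analysis tool for churn is the Kaplan-Meier estimator. It handles customers who have not yet churned, which are right-censored observations. Here is a minimal sketch with made-up tenure data:

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival curve.
    durations: months each customer was observed.
    churned: True if the customer churned at that time, False if still
    active when observation ended (right-censored).
    Returns [(time, survival_probability)] at each time with a churn event."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        leaving = sum(1 for d in durations if d == t)   # churned or censored at t
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical customers: months observed and whether they churned
durations = [2, 3, 3, 5, 6, 8, 8, 12]
churned = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(durations, churned)
print([(t, round(s, 3)) for t, s in curve])
```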
In some instances one model will not fit an entire series well because of structural changes within the series, and model parameters will vary across time. There are numerous breakpoint tests and models (e.g., State Space, Switching Regression) available for these circumstances.
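Here is a sketch of the simplest kind of breakpoint search, assuming a single shift in the mean and hypothetical weekly sales. It chooses the split that minimizes the total sum of squared residuals. It is not a formal test such as Chow or Bai-Perron, which also assess whether a break is statistically significant:

```python
def best_single_break(series, min_size=3):
    """Least-squares search for one break in the mean: try every split point
    (keeping at least min_size observations on each side) and return the
    index of the first observation after the best split."""
    def ssr(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    best_k, best_total = None, float("inf")
    for k in range(min_size, len(series) - min_size + 1):
        total = ssr(series[:k]) + ssr(series[k:])
        if total < best_total:
            best_k, best_total = k, total
    return best_k

# Hypothetical weekly sales with a level shift after week 6
sales = [10, 11, 9, 10, 12, 10, 20, 21, 19, 22, 20, 21]
print(best_single_break(sales))
```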
You may also notice that sales, call center activity or other data series you are tracking exhibit clusters of volatility. That is, there may be periods in which the figures move up and down in much more extreme fashion than other periods.
In these cases, you should consider a class of models with the forbidding name of GARCH (Generalized Autoregressive Conditional Heteroskedasticity). ARCH and GARCH models were originally developed for financial markets but can be used for other kinds of time series data when volatility is of interest. Volatility can fall into many patterns and, accordingly, there are many flavors of GARCH models. Causal variables can be included. There are also multivariate extensions (MGARCH) if you have two or more series you wish to analyze jointly.
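The core of a GARCH(1,1) model is a recursion for the conditional variance: sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1]. A large shock therefore raises the next period's variance, and its influence then fades gradually, with persistence governed by alpha + beta. The sketch below only evaluates this recursion for given, made-up parameters. In practice the parameters are estimated, typically by maximum likelihood:

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model:
        sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]
    started at the unconditional variance omega / (1 - alpha - beta)."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals: calm, one large shock, then calm again
path = garch11_variance([0.0, 3.0, 0.0, 0.0, 0.0], omega=0.1, alpha=0.2, beta=0.7)
print([round(v, 3) for v in path])
```

The variance drops from 1.0 to 0.8 during the calm period. It jumps to 2.46 right after the shock and then decays back toward its long-run level.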
Non-Parametric Econometrics is a very different approach to studying time series and longitudinal data that is now receiving a lot of attention because of #bigdata and the greater computing power we now enjoy. These methods are increasingly feasible and useful as alternatives to the more familiar methods such as those described in this article.
#MachineLearning (e.g., #ArtificialNeuralNetworks) is also useful in some circumstances, but the results can be hard to interpret: such models may predict well yet not help us understand the mechanism that generated the data (the Why). To some extent, this drawback also applies to non-parametric techniques.
Most of the methods I've mentioned are Time Domain techniques. Another group, known as Frequency Domain methods, plays a more limited role in marketing research.
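Frequency domain methods describe a series by how much of its variance sits at each cycle length. Here is a minimal sketch of the raw periodogram, computed with a direct discrete Fourier transform. The input is a made-up monthly series with a 12-month cycle:

```python
import math

def periodogram(series):
    """Raw periodogram via a direct DFT: power at Fourier frequencies j / n
    (cycles per observation) for j = 1 .. n // 2, after removing the mean."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    power = {}
    for j in range(1, n // 2 + 1):
        real = sum(v * math.cos(2 * math.pi * j * t / n) for t, v in enumerate(x))
        imag = sum(v * math.sin(2 * math.pi * j * t / n) for t, v in enumerate(x))
        power[j / n] = (real ** 2 + imag ** 2) / n
    return power

# Hypothetical 4 years of monthly data with a 12-month seasonal cycle
monthly = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(48)]
power = periodogram(monthly)
peak = max(power, key=power.get)
print(f"dominant frequency {peak:.4f} -> cycle length {1 / peak:.1f} periods")
```

The peak falls at frequency 1/12, which corresponds to a 12-period cycle. For long series, an FFT would replace this O(n²) loop.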
❇️ @AI_Python_EN
How to deliver on Machine Learning projects
A guide to the ML Engineering Loop.
By Emmanuel Ameisen and Adam Coates:
https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0
#ArtificialIntelligence #BigData #DataScience #DeepLearning #MachineLearning
❇️ @AI_Python_EN
A good introduction to #MachineLearning and its 4 approaches:
https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0?gi=10a5fcd4decd
#BigData #DataScience #AI #Algorithms #ReinforcementLearning
❇️ @AI_Python_EN
Decision trees are extremely fast when it comes to classifying unknown records. Watch this video for an easy-to-follow explanation of how the decision tree algorithm works: https://bit.ly/2Ggsb9l
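Most of the work in a decision tree happens during training, when the algorithm searches for the split that best separates the classes. Classifying a new record afterwards takes only a handful of comparisons from root to leaf. Here is a minimal sketch of the split search on one numeric feature. It assumes Gini impurity as the criterion, as in CART, and uses invented data. The video may use a different measure, such as entropy:

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, labels):
    """Find the threshold on one numeric feature that minimises the weighted
    Gini impurity of the two child nodes (x <= threshold vs x > threshold)."""
    best = (None, gini(labels))          # no split: impurity of the parent node
    for threshold in sorted(set(xs))[:-1]:
        left = [lab for x, lab in zip(xs, labels) if x <= threshold]
        right = [lab for x, lab in zip(xs, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical customers: age and whether they bought a product
ages = [22, 25, 31, 35, 42, 50]
buys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, buys))
```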
#DataScience #MachineLearning #AI #ML #ReinforcementLearning #Analytics #CloudComputing #Python #DeepLearning #BigData #Hadoop
Breast cancer classification with Keras and Deep Learning
To analyze the cellular structures in the breast histology images, we were instead leveraging basic computer vision and image processing algorithms, combined in a novel way.
Researcher: Adrian Rosebrock
Paper & code: https://ow.ly/yngq30qjLye
#artificialintelligence #ai #machinelearning #deeplearning #bigdata #datascience
❇️ @AI_Python_EN
