Data Science Machine Learning Data Analysis

Topic: Python SciPy – From Easy to Top: Part 5 of 6: Working with SciPy Statistics

---

1. Introduction to `scipy.stats`

• The scipy.stats module contains a large number of probability distributions and statistical functions.
• You can perform tasks like descriptive statistics, hypothesis testing, sampling, and fitting distributions.

---

2. Descriptive Statistics

Use these functions to summarize and describe data characteristics:

from scipy import stats
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data, keepdims=True)
std_dev = np.std(data)

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode.mode[0])
print("Standard Deviation:", std_dev)

---

3. Probability Distributions

SciPy has built-in continuous and discrete distributions such as normal, binomial, Poisson, etc.

Normal Distribution Example

from scipy.stats import norm

# PDF at x = 0
print("PDF at 0:", norm.pdf(0, loc=0, scale=1))

# CDF at x = 1
print("CDF at 1:", norm.cdf(1, loc=0, scale=1))

# Generate 5 random numbers
samples = norm.rvs(loc=0, scale=1, size=5)
print("Random Samples:", samples)

---

4. Hypothesis Testing

One-sample t-test – test if the mean of a sample is equal to a known value:

sample = [5.1, 5.3, 5.5, 5.7, 5.9]
t_stat, p_val = stats.ttest_1samp(sample, popmean=5.0)

print("T-statistic:", t_stat)
print("P-value:", p_val)

Interpretation: If the p-value is less than 0.05, reject the null hypothesis.

---

5. Two-sample t-test

Test if two samples come from populations with equal means:

group1 = [20, 22, 19, 24, 25]
group2 = [28, 27, 26, 30, 31]

t_stat, p_val = stats.ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("P-value:", p_val)

---

6. Chi-Square Test for Independence

Use to test independence between two categorical variables:

# Example contingency table
data = [[10, 20], [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(data)

print("Chi-square statistic:", chi2)
print("P-value:", p)

---

7. Correlation and Covariance

Measure linear relationship between variables:

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

corr, _ = stats.pearsonr(x, y)
print("Pearson Correlation Coefficient:", corr)

Covariance:

cov_matrix = np.cov(x, y)
print("Covariance Matrix:\n", cov_matrix)

---

8. Fitting Distributions to Data

You can fit a distribution to real-world data:

data = np.random.normal(loc=50, scale=10, size=1000)
params = norm.fit(data)  # returns mean and std dev

print("Fitted mean:", params[0])
print("Fitted std dev:", params[1])

---

9. Sampling from Distributions

Generate random numbers from different distributions:

# Binomial distribution
samples = stats.binom.rvs(n=10, p=0.5, size=10)
print("Binomial Samples:", samples)

# Poisson distribution
samples = stats.poisson.rvs(mu=3, size=10)
print("Poisson Samples:", samples)

---

10. Summary

• scipy.stats is a powerful tool for statistical analysis.
• You can compute summaries, perform tests, model distributions, and generate random samples.

---

Exercise

• Generate 1000 samples from a normal distribution and compute mean, median, std, and mode.
• Test if a sample has a mean significantly different from 5.
• Fit a normal distribution to your own dataset and plot the histogram with the fitted PDF curve.

---

#Python #SciPy #Statistics #HypothesisTesting #DataAnalysis

https://t.iss.one/DataScienceM

❤3

1.75K views18:34

About

Blog

Apps

Platform