Topic: Python SciPy – From Easy to Top: Part 5 of 6: Working with SciPy Statistics
---
1. Introduction to `scipy.stats`
• The `scipy.stats` module contains a large number of probability distributions and statistical functions.
• You can perform tasks like descriptive statistics, hypothesis testing, sampling, and fitting distributions.
---
2. Descriptive Statistics
Use these functions to summarize and describe data characteristics:
from scipy import stats
import numpy as np
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = np.mean(data)
median = np.median(data)
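mode = stats.mode(data, keepdims=True)  # keepdims needs SciPy 1.9 or newer
std_dev = np.std(data)  # population std dev (ddof=0); pass ddof=1 for the sample estimate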
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode.mode[0])
print("Standard Deviation:", std_dev)
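As a complement to the individual calls above, `stats.describe` returns several summaries in one result. This is a minimal sketch reusing the same `data` list. Note that `describe` reports the sample variance (ddof=1), while `np.std` defaults to ddof=0.

```python
import numpy as np
from scipy import stats

data = [2, 4, 4, 4, 5, 5, 7, 9]

# describe() returns count, min/max, mean, variance, skewness and kurtosis at once
summary = stats.describe(data)
print("Count:", summary.nobs)
print("Min/Max:", summary.minmax)
print("Mean:", summary.mean)

# describe() uses ddof=1 by default, so this is the sample variance
print("Sample variance:", summary.variance)

# np.std defaults to ddof=0; pass ddof=1 to get the matching sample std dev
print("Sample std dev:", np.std(data, ddof=1))
```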
---
3. Probability Distributions
SciPy has built-in continuous and discrete distributions such as normal, binomial, Poisson, etc.
Normal Distribution Example
from scipy.stats import norm
# PDF at x = 0
print("PDF at 0:", norm.pdf(0, loc=0, scale=1))
# CDF at x = 1
print("CDF at 1:", norm.cdf(1, loc=0, scale=1))
# Generate 5 random numbers
samples = norm.rvs(loc=0, scale=1, size=5)
print("Random Samples:", samples)
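Beyond `pdf`, `cdf`, and `rvs`, distribution objects share a few other methods. The sketch below shows `ppf` (the inverse of `cdf`), `sf` (the survival function), and `pmf` for a discrete distribution. The specific values (0.975, 5 heads in 10 flips) are chosen here only for illustration.

```python
from scipy.stats import norm, binom

# ppf is the inverse of cdf: the point below which 97.5% of the mass lies
z = norm.ppf(0.975)
print("97.5th percentile of N(0, 1):", z)  # about 1.96

# sf is the survival function, i.e. 1 - cdf
print("P(X > 1):", norm.sf(1))

# Discrete distributions expose pmf instead of pdf
p_five = binom.pmf(5, n=10, p=0.5)
print("P(5 heads in 10 fair flips):", p_five)
```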
---
4. Hypothesis Testing
One-sample t-test: test whether the population mean behind a sample equals a known value:
sample = [5.1, 5.3, 5.5, 5.7, 5.9]
t_stat, p_val = stats.ttest_1samp(sample, popmean=5.0)
print("T-statistic:", t_stat)
print("P-value:", p_val)
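Interpretation: If the p-value is less than 0.05, reject the null hypothesis (that the population mean equals 5.0) at the 5% significance level. A larger p-value means the data do not give enough evidence to reject it, not that the null hypothesis is proven.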
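To see what the test computes, the t-statistic and two-sided p-value can be reproduced by hand. This is a sketch for the same sample. The names `t_manual` and `p_manual` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 5.3, 5.5, 5.7, 5.9])
popmean = 5.0
n = len(sample)

# t = (sample mean - hypothesized mean) / standard error of the mean
se = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_stat, p_val = stats.ttest_1samp(sample, popmean=popmean)
print("Manual:", t_manual, p_manual)
print("SciPy: ", t_stat, p_val)
```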
---
5. Two-sample t-test
Test whether two independent samples come from populations with equal means (by default, `ttest_ind` assumes the two populations have equal variances):
group1 = [20, 22, 19, 24, 25]
group2 = [28, 27, 26, 30, 31]
t_stat, p_val = stats.ttest_ind(group1, group2)
print("T-statistic:", t_stat)
print("P-value:", p_val)
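If equal variances cannot be assumed, `ttest_ind` accepts `equal_var=False`, which runs Welch's t-test instead. Here is a short sketch comparing the two on the same groups.

```python
from scipy import stats

group1 = [20, 22, 19, 24, 25]
group2 = [28, 27, 26, 30, 31]

# Default: Student's t-test, which assumes both groups share one variance
student = stats.ttest_ind(group1, group2)

# equal_var=False switches to Welch's t-test, which drops that assumption
welch = stats.ttest_ind(group1, group2, equal_var=False)

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

With equal group sizes, as here, the two t-statistics coincide. Only the degrees of freedom differ, and with them the p-value.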
---
6. Chi-Square Test for Independence
Use this test to check whether two categorical variables are independent:
# Example contingency table
data = [[10, 20], [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(data)
print("Chi-square statistic:", chi2)
print("P-value:", p)
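The `expected` array returned by the test holds the counts you would see if the two variables were independent. A minimal sketch that recomputes it from the row and column totals (the name `expected_manual` is introduced here for illustration):

```python
import numpy as np
from scipy import stats

observed = np.array([[10, 20], [20, 40]])

# Expected count under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

chi2, p, dof, expected = stats.chi2_contingency(observed)
print("Manual expected:\n", expected_manual)
print("SciPy expected:\n", expected)
print("Degrees of freedom:", dof)
```

In this table the rows are exactly proportional, so the observed counts equal the expected counts. The statistic is 0 and the p-value is 1.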
---
7. Correlation and Covariance
Measure the linear relationship between two variables:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
corr, _ = stats.pearsonr(x, y)
print("Pearson Correlation Coefficient:", corr)
Covariance:
cov_matrix = np.cov(x, y)
print("Covariance Matrix:\n", cov_matrix)
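The two quantities are linked: the Pearson coefficient is the covariance divided by the product of the two standard deviations. A small sketch checking this on the same `x` and `y` (the name `r_manual` is introduced here for illustration):

```python
import numpy as np
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# np.cov returns [[var(x), cov(x, y)], [cov(x, y), var(y)]] with ddof=1
cov_matrix = np.cov(x, y)

# r = cov(x, y) / (std(x) * std(y))
r_manual = cov_matrix[0, 1] / np.sqrt(cov_matrix[0, 0] * cov_matrix[1, 1])

corr, p_value = stats.pearsonr(x, y)
print("Manual r:", r_manual)
print("SciPy r: ", corr)
```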
---
8. Fitting Distributions to Data
You can fit a distribution to real-world data:
data = np.random.normal(loc=50, scale=10, size=1000)
params = norm.fit(data)  # returns (loc, scale): the fitted mean and std dev
print("Fitted mean:", params[0])
print("Fitted std dev:", params[1])
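For the normal distribution, the maximum-likelihood fit has a closed form: the fitted mean is the sample mean, and the fitted standard deviation is the ddof=0 standard deviation. The sketch below checks this. The seed value 0 is an arbitrary choice made here for reproducibility.

```python
import numpy as np
from scipy.stats import norm

# Seeded generator so the run is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)

mu, sigma = norm.fit(data)

# The fit should match the closed-form estimates up to floating-point error
print("Fitted mean:   ", mu, "vs sample mean:", data.mean())
print("Fitted std dev:", sigma, "vs np.std (ddof=0):", data.std(ddof=0))
```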
---
9. Sampling from Distributions
Generate random numbers from different distributions:
# Binomial distribution
samples = stats.binom.rvs(n=10, p=0.5, size=10)
print("Binomial Samples:", samples)
# Poisson distribution
samples = stats.poisson.rvs(mu=3, size=10)
print("Poisson Samples:", samples)
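The samples above change on every run. For repeatable results, `rvs` accepts a `random_state` argument. This sketch passes an integer seed (42, an arbitrary choice made here) and also shows a "frozen" distribution with its parameters fixed up front.

```python
import numpy as np
from scipy import stats

# Same seed, same samples
a = stats.binom.rvs(n=10, p=0.5, size=10, random_state=42)
b = stats.binom.rvs(n=10, p=0.5, size=10, random_state=42)
print("Binomial, run 1:", a)
print("Binomial, run 2:", b)

# A frozen distribution stores its parameters, so later calls stay short
poisson3 = stats.poisson(mu=3)
c = poisson3.rvs(size=10, random_state=42)
print("Poisson samples:", c)
print("Poisson mean (theoretical):", poisson3.mean())
```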
---
10. Summary
• `scipy.stats` is a powerful tool for statistical analysis.
• You can compute summaries, perform tests, model distributions, and generate random samples.
---
Exercise
• Generate 1000 samples from a normal distribution and compute mean, median, std, and mode.
• Test if a sample has a mean significantly different from 5.
• Fit a normal distribution to your own dataset and plot the histogram with the fitted PDF curve.
---
#Python #SciPy #Statistics #HypothesisTesting #DataAnalysis
https://t.iss.one/DataScienceM