Classical Regression Analysis: Overview
Standard Assumptions of Classical Linear Regression Model $${\displaystyle \mathbf{y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} }$$ $${\displaystyle \text{Standard Normality: }\; {\boldsymbol{\varepsilon}} \mid \mathbf{X} \sim {\mathcal {N}}(\mathbf{0},\sigma _{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n}) \; \begin{cases} \text{Exogeneity:}\; &\operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \mathbf{0} \\ \text{Uncorrelated Spherical Error:}\; &\operatorname{Cov}[{\boldsymbol{\varepsilon}} \mid \mathbf{X}] = \sigma_{\boldsymbol{\varepsilon}}^{2} \mathbf{I}_{n} \\ \qquad \text{- Homoscedasticity:}\; & \operatorname{Var}[\varepsilon_{i} \mid \mathbf{X}] = \sigma_{\boldsymbol{\varepsilon}}^{2} \\ \qquad \text{- No Autocorrelation:}\; & \operatorname{Cov}[\varepsilon_{i}, \varepsilon_{j} \mid \mathbf{X}] = 0, \; i \neq j \\ \end{cases} }$$ Estimator properties that fail to hold when the assumptions are violated $${\displaystyle \begin{aligned} \operatorname {E} [\hat{\theta}] \neq \theta \quad &\mid \underbrace{\operatorname{Var}[\hat{\theta}]}_{= \operatorname{E}[(\hat{\theta} - \operatorname{E}[\hat{\theta}])^{2}]} > \operatorname{Var}[\hat{\theta}_{\text{BLUE}}], \qquad \theta \in \{ \beta, \varepsilon \} \\ & \downarrow \\ {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} \\ \; &= \boldsymbol{\beta} + \underbrace{\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}}_{\text{multicollinearity}} \overbrace{\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon}}^{\text{non-orthogonality} } , \quad \underbrace{\boldsymbol{\varepsilon} \mid \mathbf {X}}_{\text{non-standard normality}} \nsim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n})\\ \operatorname{E}[{\hat {\boldsymbol {\beta }}} ] &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} {\color{Red}{ \operatorname{E}[ \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ]}} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}} ] &= \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} {\color{Red}{\operatorname{Var}[\mathbf{X}^{\mathsf {T}}\boldsymbol{\varepsilon} ]}} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \end{aligned} }$$ $${\displaystyle \begin{cases} \text{by endogeneity (violated exogeneity)} \\ \quad \text{[Issue] Biased Estimator}\\ \quad \text{[Detection] Durbin-Wu-Hausman test, ...}\\ \quad \text{[Solution] Include omitted variables, Instrumental Variables, ...}\\ \text{by multicollinearity} \\ \quad \text{[Issue] Estimator still BLUE, but coefficient estimates have high variance} \\ \quad \text{[Detection] VIF, Condition Number, ...} \\ \quad \text{[Solution] PCA, ...}\\ \text{by autocorrelated errors} \\ \quad \text{[Issue] Inefficient Estimator} \\ \quad \text{[Detection] Durbin-Watson Test, Breusch-Godfrey LM Test, ...} \\ \quad \text{[Solution] GLS, FGLS, MLE, ...}\\ \text{by heteroskedastic errors} \\ \quad \text{[Issue] Inefficient Estimator} \\ \quad \text{[Detection] Goldfeld-Quandt Test, White’s Test, ...} \\ \quad \text{[Solution] GLS, ...}\\ \end{cases} }$$
Covariance Structure $${\displaystyle \begin{aligned} \boldsymbol{\Sigma}{\text{ positive-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} >0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n}\setminus \{\mathbf {0} \} \\ \boldsymbol{\Sigma}{\text{ positive semi-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} \geq 0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n} \\ \boldsymbol{\Sigma}{\text{ negative-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} <0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n} \setminus \{\mathbf {0} \} \\ \boldsymbol{\Sigma}{\text{ negative semi-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} \leq 0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n} \\ \end{aligned} }$$
$${\displaystyle \begin{aligned} & \quad \text{structures for positive-definite} \; \boldsymbol{\Sigma}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \boldsymbol{\Sigma}) \\ \boldsymbol{\Sigma} &\rightarrow \sigma_{\varepsilon}^{2}\mathbf{I}_{n} = \underbrace{\begin{bmatrix} \sigma_{\varepsilon}^{2}& 0 &\cdots & 0 &0\\ 0&\sigma_{\varepsilon}^{2}&\cdots & 0 &0 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0&0&\cdots &\sigma_{\varepsilon}^{2}& 0 \\ 0&0&\cdots &0 & \sigma_{\varepsilon}^{2} \\ \end{bmatrix}}_{\text{OLS}}, \quad \boldsymbol{\Sigma} \rightarrow \boldsymbol{\Omega} = \underbrace{{\begin{bmatrix} \sigma_{1}^{2}& 0 &\cdots & 0 &0\\ 0&\sigma_{2}^{2}&\cdots & 0 &0 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0&0&\cdots &\sigma_{n-1}^{2}& 0 \\ 0&0&\cdots &0 & \sigma_{n}^{2} \\ \end{bmatrix}}}_{\text{GLS: Cross-Heteroskedasticity}} \\ \boldsymbol{\Sigma} &\rightarrow \boldsymbol{\Sigma} = \underbrace{{\begin{bmatrix} \gamma_{0} = \sigma_{\varepsilon}^{2}& \gamma_{-1} &\cdots & \gamma_{-(p-1)} &\gamma_{-p}\\ \gamma_{1}&\gamma_{0} =\sigma_{\varepsilon}^{2}&\cdots & \gamma_{-(p-2)} & \gamma_{-(p-1)} \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ \gamma_{p-1}&\gamma_{p-2}&\cdots &\gamma_{0} =\sigma_{\varepsilon}^{2}& \gamma_{-1} \\ \gamma_{p}&\gamma_{p-1}&\cdots &\gamma_{1} & \gamma_{0} =\sigma_{\varepsilon}^{2} \\ \end{bmatrix}}}_{\text{ARMA: Autocorrelation}}\quad : \text{structure for the conditional mean} \\ \boldsymbol{\Sigma} &\rightarrow \boldsymbol{\Sigma}_{t} = \underbrace{{\begin{bmatrix} \sigma_{1}^{2}& 0 &\cdots & 0 & 0 \\ 0 &\sigma_{2}^{2}&\cdots & 0 & 0 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0 & 0 &\cdots & \sigma_{t-1}^{2}& 0 \\ 0 & 0 &\cdots & 0 & \sigma_{t}^{2} \\ \end{bmatrix}}}_{\text{ARCH: Auto-Heteroskedasticity}}\quad : \text{structure for the conditional variance} \\ \end{aligned} }$$
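The structures above can be written down directly; a minimal numpy sketch (the AR(1) autocovariance and ARCH(1) recursion used here are illustrative assumptions, not from the original):
import numpy as np
from scipy.linalg import toeplitz
n = 5
sigma2 = 2.0
Sigma_ols = sigma2 * np.eye(n) # OLS: spherical errors, sigma^2 * I_n
Omega = np.diag(np.random.uniform(0.5, 3.0, size=n)) # GLS/WLS: observation-specific variances on the diagonal
phi = 0.6
gamma = sigma2 * phi ** np.arange(n) # illustrative AR(1) autocovariances gamma_k = sigma^2 * phi^k
Sigma_arma = toeplitz(gamma) # symmetric Toeplitz autocovariance structure
omega, alpha = 0.2, 0.5
eps = np.random.normal(0, 1, size=n)
sigma_t2 = omega + alpha * np.r_[0.0, eps[:-1]**2] # illustrative ARCH(1) conditional variances
Sigma_arch = np.diag(sigma_t2)
for S in (Sigma_ols, Omega, Sigma_arma, Sigma_arch):
    assert np.all(np.linalg.eigvalsh(S) > 0) # each structure is positive-definite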
Analysis Example
import numpy as np
import pandas as pd
import statsmodels.tsa.api as smt
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
data = pd.DataFrame(np.random.normal(10, 1, size=1000), columns=['Y'])
n_groups = 10
for g in range(n_groups):
    data.loc[data.index[g::n_groups], 'G'] = f'G{g}'
data = data.set_index('G').sort_index()
# ANOVA Data
index_func = lambda key: data.index.get_level_values(0) == key
for i in range(n_groups):
    data.loc[index_func(f'G{i}'), 'Y'] = data.loc[index_func(f'G{i}'), 'Y'] + np.random.normal(i, 1, size=index_func(f'G{i}').sum())
# Simple Regression Data
np.random.seed(0)
data['X1'] = np.random.normal(0, 1, size=1000)
data['X2'] = smt.ArmaProcess(ar=[1, -.3, -.2], ma=[1]).generate_sample(nsample=1000, burnin=50).cumsum()
data['Y'] = data['Y'] + 3*data['X1'] + 2*data['X2']
data = data.reset_index()
# Analysis
result = smf.ols('Y ~ 1 + C(G) + X1 + X2', data=data).fit()
display(result.summary().tables[1])
display(sms.anova_lm(result, typ=2))
#result.get_robustcov_results(cov_type='HC0', use_t=None).cov_params()
#result.cov_params() # result.cov_type
# Variance Components
data['Y'].plot()
result.fittedvalues.plot()
result.resid.plot()
sms.anova_lm(result, typ=1)['sum_sq'].sum(), result.centered_tss, result.ess, result.ssr
Model Diagnostic
# https://www.statsmodels.org/dev/diagnostic.html
# https://www.statsmodels.org/stable/stats.html#module-statsmodels.stats.stattools
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
import statsmodels
import statsmodels.tsa.stattools # needed for the fully qualified breakvar_heteroskedasticity_test call below
import statsmodels.api as sm
import statsmodels.stats.api as sms
X, y = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=10, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=False, random_state=np.random.randint(1000))
data = pd.DataFrame(np.c_[X, y], columns= [ 'X'+str(i) for i in range(0, X.shape[1])]+['y'])
model = sm.OLS(endog=data['y'], exog=sm.add_constant(data[data.columns.difference(['y'], sort=False)]))
result = model.fit()
#display(result.summary())
# normality
# https://www.statsmodels.org/stable/generated/statsmodels.stats.stattools.jarque_bera.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.stattools.omni_normtest.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.normal_ad.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.kstest_normal.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.lilliefors.html
chi2_stats, two_sided_pval = sms.omni_normtest(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
jb_stats, jb_pval, skew, kurtosis = sms.jarque_bera(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
ad_stats, ad_pval = sms.normal_ad(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
ks_stats, ks_pval = sms.lilliefors(result.resid, dist='norm', pvalmethod='table') # Null (H0): Normality is present. / Alternative (HA): Normality is not present. # kstest_normal is an alias of lilliefors
two_sided_pval, jb_pval, ad_pval, ks_pval
# heteroscedasticity
# https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.breakvar_heteroskedasticity_test.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.het_breuschpagan.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.het_white.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.het_goldfeldquandt.html
bv_stats, bv_pval = statsmodels.tsa.stattools.breakvar_heteroskedasticity_test(result.resid, subset_length=0.3333333333333333, alternative='two-sided', use_f=True) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
bp_lm_stats, bp_lm_pval, bp_f_stats, bp_f_pval = sms.het_breuschpagan(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
w_lm_stats, w_lm_pval, w_f_stats, w_f_pval = sms.het_white(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
stats, pval, _ = sms.het_goldfeldquandt(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
bp_lm_pval, bp_f_pval, w_lm_pval, w_f_pval, pval
# autocorrelation
# https://www.statsmodels.org/stable/generated/statsmodels.stats.stattools.durbin_watson.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_breusch_godfrey.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_ljungbox.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_lm.html
dw_stats = sms.durbin_watson(resids=result.resid) # if statistic is within the range of 1.5 and 2.5, No correlation. # Null (H0): There is no correlation among the residuals / Alternative (HA): The residuals are autocorrelated.
bg_lm_stats, bg_lm_pval, bg_f_stats, bg_f_pval = sms.acorr_breusch_godfrey(res=result, nlags=None, store=False) # Null (H0): There is no autocorrelation at any order less than or equal to p. / Alternative (HA): There exists autocorrelation at some order less than or equal to p.
engle_lm_stats, engle_lm_pval, engle_f_stats, engle_f_pval = sms.acorr_lm(resid=result.resid, nlags=None, store=False, period=None, ddof=0, cov_type='nonrobust', cov_kwargs=None) # Null (H0): There is no autocorrelation. / Alternative (HA): Autocorrelation is present.
lb_stats_and_pval = sms.acorr_ljungbox(x=result.resid, lags=None, boxpierce=False, model_df=0, period=None, return_df=True, auto_lag=False) # Null (H0): The residuals are independently distributed / Alternative (HA): The residuals are not independently distributed; they exhibit serial correlation.
bg_lm_pval, bg_f_pval, engle_lm_pval, engle_f_pval, lb_stats_and_pval['lb_pvalue']
# linearity
hc_stat, hc_pval = sms.linear_harvey_collier(res=result)
w_stat, w_pval, w_dof = sms.spec_white(resid=result.resid, exog=sm.add_constant(data[data.columns.difference(['y'], sort=False)])) # ols
Training Model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# population space
P = 30; N = 500
beta = np.random.randint(-10, 11, P+1)[:, np.newaxis]
population_X = np.c_[np.ones((N,1)), np.random.normal(0, 1, size=(N, P))]
varepsilon = np.random.normal(0, 10, size=N)[:, np.newaxis]
f = lambda X: X@beta + varepsilon
population_y = f(population_X)
population_data = np.c_[population_X, population_y]; np.random.shuffle(population_data)
population_data = pd.DataFrame(population_data, columns=[ 'X' + str(i) for i in range(population_X.shape[1])] +['y'])
population_index = population_data.index
population_X = population_data.loc[:, population_data.columns != 'y'].values
population_y = population_data.loc[:, population_data.columns == 'y'].values.ravel()
# sampling space
p = 5; n = 50
sampling_index = np.random.choice(population_data.index, n, replace=False)
sampling_columns = np.random.choice(population_data.columns.difference(['y'], sort=False), p, replace=False)
sampling_columns = np.concatenate([sampling_columns, ['y']])
sampling_data = population_data.loc[sampling_index, sampling_columns].copy()
sampling_data.insert(0, 'const', 1)
sampling_X = sampling_data.loc[:, sampling_data.columns != 'y'].values
sampling_y = sampling_data.loc[:, sampling_data.columns == 'y'].values.ravel()
# simulation for population interpretation
simulation_data = population_data[sampling_columns].copy()
simulation_X = simulation_data.loc[:, simulation_data.columns != 'y'].values
simulation_y = simulation_data.loc[:, simulation_data.columns == 'y'].values.ravel()
# training
result = sm.OLS(endog=sampling_y, exog=sampling_X).fit()
np.allclose(result.model.endog, result.fittedvalues + result.resid) # identity check: y = y_hat + residual
display(result.summary())
# total sample variance/residual sample variance analysis
influence = result.get_influence(); leverage = influence.hat_matrix_diag
studentized_residuals = result.resid / (result.model.endog.std(ddof=1)*np.sqrt(1-leverage))
print(studentized_residuals.mean(), studentized_residuals.var(ddof=0)) # approximately mean 0 and variance 1
# visualization
plt.figure(figsize=(30,5))
plt.plot(population_index, population_y, lw=0, marker='o', color='black', label='population_y')
plt.plot(sampling_index, sampling_y, lw=0, marker='o', markerfacecolor='w', color='black', label='sampling_y')
plt.plot(population_index, result.predict(np.c_[np.ones((simulation_X.shape[0], 1)), simulation_X]), lw=0, ms=5, marker='x', color='b', label='estimated_population_y')
plt.plot(sampling_index, result.predict(sampling_X), lw=0, ms=10, marker='x', color='r', label='estimated_sampling_y')
plt.legend()
# Homoscedasticity/Heteroscedasticity
plt.figure(figsize=(10,10))
plt.scatter(result.fittedvalues, result.resid, label='Homoscedasticity/Heteroscedasticity', color='black')
boundary = max(result.fittedvalues.std(), result.resid.std())
plt.xlim(-boundary*10, boundary*10)
plt.ylim(-boundary*10, boundary*10)
plt.legend()
TABLE OF CONTENTS
Regression Analysis
Ordinary Least Squares (OLS)
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
Maximum Likelihood Estimation
Regression Noise: Residual Analysis
Classification: MLE
Logistic Regression
Probit Regression
Tobit Regression
Robust Linear Regression
Regression Analysis
Classical Linear Regression Model: Vector Form $${\displaystyle y_{i}= \mathbf {x} _{i}^{\mathsf {T}}{\boldsymbol {\beta }}+\varepsilon _{i} =\beta _{0} + \beta _{1}\ x_{i1}+\beta _{2}\ x_{i2}+\cdots +\beta _{p}\ x_{ip}+\varepsilon _{i}, \qquad i=1,\ldots ,n \quad \And \quad j=0,\ldots ,p }$$Classical Linear Regression Model: Matrix Form $${\displaystyle \mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }} }$$ $${\displaystyle {\begin{bmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{bmatrix}} = {\begin{bmatrix}1&x_{11}&\cdots &x_{1p}\\1&x_{21}&\cdots &x_{2p}\\\vdots &\vdots &\ddots &\vdots \\1&x_{n1}&\cdots &x_{np}\end{bmatrix}} {\begin{bmatrix}\beta _{0}\\\beta _{1}\\\beta _{2}\\\vdots \\\beta _{p}\end{bmatrix}} + {\begin{bmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\vdots \\\varepsilon _{n}\end{bmatrix}} }$$ $${\displaystyle \begin{align} \mathbf {X} & : \text{Design matrix} \\ {\boldsymbol {\beta }} & : \text{Deterministic variable (Regression coefficient or Population parameters)} \\ {\boldsymbol {\varepsilon }} &: \text{Random variable (Disturbance or Noise) } \\ \end{align} }$$
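A minimal numpy sketch of the matrix form (toy dimensions and coefficient values assumed for illustration):
import numpy as np
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.5, 0.7],
              [1.0, 2.0, 3.1],
              [1.0, 3.5, 2.4]]) # design matrix: leading column of ones, then x_{i1}, x_{i2}
beta = np.array([2.0, 3.0, -1.0]) # deterministic coefficients (beta_0, beta_1, beta_2)
varepsilon = np.random.normal(0, 0.1, size=X.shape[0]) # random disturbance
y = X @ beta + varepsilon # y = X beta + varepsilon, one row per observation i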
$${\displaystyle \begin{aligned} \rho &= \frac{\operatorname{Cov}(\mathbf{X}\boldsymbol{\beta},\mathbf{y})}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} = \frac{\operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\mathbf{y}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{y}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{\operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon})] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{\operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\mathbf{X}\boldsymbol{\beta}] + \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\boldsymbol{\varepsilon}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{X}\boldsymbol{\beta}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\boldsymbol{\varepsilon}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{ \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\mathbf{X}\boldsymbol{\beta}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{X}\boldsymbol{\beta}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} = \frac{ \operatorname{Var}[\mathbf{X}\boldsymbol{\beta}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{ \sqrt{ \operatorname{Var}[\mathbf{X}\boldsymbol{\beta}]}}{\sqrt{\operatorname{Var}[\mathbf{y}]}} = \frac{\sigma_{\mathbf{X}\boldsymbol{\beta}}}{\sqrt{\sigma_{\mathbf{X}\boldsymbol{\beta}}^{2} + \sigma_{\boldsymbol{\varepsilon}}^{2} + 2\sigma_{\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\varepsilon}}}}\\ \; &= \frac{\sigma_{\mathbf{X}\boldsymbol{\beta}}}{\sigma_{\mathbf{y}}} = \frac{\sigma_{\mathbf{X}\boldsymbol{\beta}}}{\sigma_{\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}}} \\ \end{aligned} }$$ $${\displaystyle \mathbf{X}, \mathbf{y} \quad \rightarrow \quad \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\varepsilon}} \quad \rightarrow \quad \hat{\rho} \\ }$$
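A quick numerical check of the last line (assumed simulated data): for an OLS fit with an intercept, the sample correlation between the fitted values and y equals the square root of the coefficient of determination.
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))
beta = np.array([1.0, 2.0, -3.0])
y = X @ beta + rng.normal(scale=2.0, size=500)
result = sm.OLS(y, X).fit()
rho_hat = np.corrcoef(result.fittedvalues, y)[0, 1] # corr(X beta_hat, y)
print(rho_hat, np.sqrt(result.rsquared)) # the two agree for a model with an intercept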
Ordinary Least Squares (OLS)
Objective Function: Sum of Squared Residuals(SSR, RSS:Residual Sum of Squares) $${\displaystyle S({\hat {\boldsymbol {\beta }}}) \equiv \sum _{i=1}^{n}\left|y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right|^{2}=\left\|\mathbf {y} -\mathbf {X} {\hat {\boldsymbol {\beta }}} \right\|^{2} =\left\|\mathbf {y} - {\hat {\mathbf {y}}} \right\|^{2} = \left\| {\hat{\boldsymbol {\varepsilon }}} \right\|^{2} \quad \And \quad {\frac {\partial}{\partial {\hat {\beta }}}}S({\hat {\boldsymbol {\beta }}}) = 0 }$$ $${\displaystyle \textit{Normal Equation} }$$ $${\displaystyle \begin{aligned} {\frac {\partial}{\partial {\hat {\beta }_{0}}}}S({\hat {\boldsymbol {\beta }}}) &= {\frac {\partial}{\partial {\hat {\beta }_{0}}}} \sum _{i=1}^{n}\left|y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right|^{2} \\ &= 2\sum _{i=1}^{n} \underbrace{\left( y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right)}_{\hat{\varepsilon}_{i}=\epsilon_{i}} \left( - \underline{X_{i0}}_{\text{dummy:1}} \right) = 0 \\ &\qquad \therefore \sum _{i=1}^{n} \epsilon_{i} = 0 \quad \cdots\text{(1)} \\ {\frac {\partial}{\partial {\hat {\beta }_{j^{*}|j \neq 0}}}}S({\hat {\boldsymbol {\beta }}}) &= {\frac {\partial}{\partial {\hat {\beta }_{j^{*}|j \neq 0}}}} \sum _{i=1}^{n}\left|y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right|^{2} \\ &= 2\sum _{i=1}^{n} \underbrace{\left( y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right)}_{\hat{\varepsilon}_{i}=\epsilon_{i}} \left( -X_{ij^{*}} \right) = 0 \\ &\qquad \therefore \sum _{i=1}^{n} \epsilon_{i}X_{ij^{*}} = 0 \quad \cdots\text{(2)} \\ \end{aligned} }$$ $${\displaystyle \text{Sample Perpendicularity/Orthogonality} \quad \begin{cases} \begin{aligned} \quad \sum _{i=1}^{n}& \epsilon_{i} = 0 \\ \quad \sum _{i=1}^{n}& \epsilon_{i}X_{ij} = 0 \\ \end{aligned} \\ \end{cases} }$$ Projection Properties $${\displaystyle \begin{aligned} \underbrace{\mathbf{y}}_{\text{observation}}, & \quad \underbrace{\mathbf{X}}_{\text{given explanatory variables}} \\ &\downarrow \\ \mathbf{y} = &\mathbf{X}\underbrace{\boldsymbol{\beta}}_{?}+\boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n}) \\ &\downarrow \\ \left\| \mathbf{y} - \hat{\mathbf{y}} \right\|^{2} &= \left\| \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right\|^{2} \\ \; &= \left( \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right)^{T}\left( \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right) \\ \; &= \mathbf{y}^{T}\mathbf{y} - \mathbf{y}^{T}\mathbf{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}^{T}\mathbf{X}^{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^{T}\mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{\beta}} \\ &\downarrow \\ \frac{\partial}{\partial \hat{\boldsymbol{\beta}}} \left\| \mathbf{y} - \hat{\mathbf{y}} \right\|^{2} &= - 2 \mathbf{X}^{T}\mathbf{y} +2 \mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{\beta}} = 0 \\ \therefore \hat{\boldsymbol{\beta}} &= \left( \mathbf{X}^{T}\mathbf{X}\right)^{-1} \mathbf{X}^{T}\mathbf{y} \\ \end{aligned} }$$$${\displaystyle \text{Corollary about the assumptions} \; \begin{cases} \; \operatorname{E}[\hat{\boldsymbol{\beta}}] &= \boldsymbol{\beta} \quad &:\text{Unbiasedness} \\ \; \operatorname{Var}[\hat{\boldsymbol{\beta}}] &= \sigma_{\boldsymbol{\varepsilon}}^{2}\left( \mathbf{X}^{T}\mathbf{X} \right)^{-1} \quad &:\text{Efficiency} \\ \end{cases} \\ }$$ $${\displaystyle \begin{aligned} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} \\ \; &= \left(\mathbf 
{X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}(\mathbf {X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\ \; &= \boldsymbol{\beta} + \underbrace{\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon}}_{\text{orthogonality}(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon}=0) \rightarrow 0} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \operatorname{E}[{\hat {\boldsymbol {\beta }}}] &= \boldsymbol{\beta} + \operatorname{E}[ \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ] \\ \; & = \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\operatorname{E}[ \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ] \quad \text{by exogeneity (nonstochastic X)}\\ \; & = \boldsymbol{\beta} \quad \text{by perpendicularity} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}}] &= \operatorname{Var}[\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ]\\ &= \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}} \operatorname{Var}[\boldsymbol{\varepsilon} ] \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n}) \\ \; & = \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} (\sigma_{\boldsymbol{\varepsilon}}^{2} \mathbf {X} ^{\mathsf {T}} \mathbf {X}) \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \; \text{by spherical errors (homoscedasticity)} \\ \; & = \sigma_{\boldsymbol{\varepsilon}}^{2}\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \end{aligned} }$$
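The unbiasedness and covariance corollaries can be checked by simulation; a minimal sketch (assumed fixed design, repeated sampling of the noise only):
import numpy as np
rng = np.random.default_rng(1)
n, sigma = 200, 1.5
X = np.c_[np.ones(n), rng.normal(size=(n, 3))] # fixed (nonstochastic) design
beta = np.array([1.0, 2.0, -1.0, 0.5])
betas = []
for _ in range(2000): # repeated sampling of the noise only
    y = X @ beta + rng.normal(scale=sigma, size=n)
    betas.append(np.linalg.solve(X.T @ X, X.T @ y)) # beta_hat = (X'X)^{-1} X'y
betas = np.array(betas)
print(betas.mean(axis=0)) # ~ beta (unbiasedness)
print(np.cov(betas, rowvar=False)) # ~ sigma^2 (X'X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))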
$${\displaystyle \textit{Projection Summary} }$$ $${\displaystyle \begin{align} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} = \mathbf {X}^{+} \mathbf {y} \quad \sim \text{equivalent to MLE} \\ \; & \left( \approx \frac{\operatorname{Cov}[\mathbf{X}, \mathbf{y}]}{\operatorname{Var}[\mathbf{X}]} = \frac{\sigma_{\mathbf{X},\mathbf{y}}}{\sigma_{\mathbf{X}}^{2}} = \frac{\rho_{\mathbf{X},\mathbf{y}}\sigma_{\mathbf{X}}\sigma_{\mathbf{y}}}{\sigma_{\mathbf{X}}^{2}} = \rho_{\mathbf{X},\mathbf{y}}\frac{\sigma_{\mathbf{y}}}{\sigma_{\mathbf{X}}} \right) \\ {\hat {\mathbf {y}}} &= \mathbf {H} \mathbf {y} \qquad \left( h_{i'i}={\boldsymbol {x}}_{i'}^{\top }\left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}{\boldsymbol {x}}_{i} \approx {\frac {\operatorname {Cov} \left[{\hat {y}}_{i'},y_{i}\right]}{\operatorname {Var} \left[y_{i}\right]}} \right) \\ {\hat{\boldsymbol {\varepsilon }}} &= \mathbf {M} \mathbf {y} = \mathbf {M} \left( \mathbf {X} {\boldsymbol {\beta }} + {\boldsymbol {\varepsilon }} \right) = \mathbf {M}{\boldsymbol {\varepsilon }} \\ \end{align} }$$ $${\displaystyle \begin{cases} \; & \mathbf {X}^{+} & \equiv \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}}\\ \; & \mathbf {H} & \equiv \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}} = \mathbf {X} \mathbf {X}^{+} \\ \; & \mathbf {M} & \equiv \mathbf {I} - \mathbf {H} = \mathbf {I} - \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}} = \mathbf {I} - \mathbf {X} \mathbf {X}^{+} \\ \; & {\hat {\mathbf {y}}} & \equiv \mathbf {X} {\hat {\boldsymbol {\beta }}} = \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}} \mathbf {y} = \mathbf {X} \mathbf {X}^{+} \mathbf {y} = \mathbf {H} \mathbf {y} \\ \; & {\hat{\boldsymbol {\varepsilon }}} & \equiv \mathbf {y} - {\hat {\mathbf {y}}} = \mathbf {y} - \mathbf {H}\mathbf {y} = \left( \mathbf {I} - \mathbf {H} \right) \mathbf {y} = \mathbf {M}\mathbf {y} \\ \end{cases} }$$ $${\displaystyle \begin{align} {\hat{\boldsymbol {\varepsilon }}} & : \text{Residual}({\boldsymbol {\epsilon }}) \\ \mathbf {X}^{+} & : \text{Pseudoinverse Matrix, Moore–Penrose Inverse Matrix} \\ \mathbf {H} & : \text{Projection Matrix, Influence} \\ \mathbf {M} & : \text{Residual Maker Matrix} \\ \mathbf {I} & : \text{Identity Matrix} \\ \end{align} }$$ $${\displaystyle \begin{aligned} \hat{\mathbf{y}} = \mathbf{H} \mathbf{y} \\ \left \langle \hat{\mathbf{y}} \right | = \left \langle \mathbf{y} \right | \mathbf{H} \\ \left \langle \hat{\mathbf{y}} | \mathbf{y} \right \rangle = \left \langle \mathbf{y} | \mathbf{H} | \mathbf{y} \right \rangle \\ \text{since } \mathbf{H} \text{ is symmetric and idempotent,}\quad \therefore \; \left \langle \hat{\mathbf{y}} | \mathbf{y} \right \rangle = \left \langle \mathbf{y} | \mathbf{H} | \mathbf{y} \right \rangle = \left \langle \hat{\mathbf{y}} | \hat{\mathbf{y}} \right \rangle \\ \end{aligned} }$$ $${\displaystyle \text{Leverage Property} \; \begin{cases} \; & 0 \le h_{ii} \le 1 \\ \; & \sum _{i=1}^{n}h_{ii}=\operatorname {Tr} (\mathbf {H} )=\operatorname {Tr} \left(\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)=\operatorname {Tr} \left(\mathbf {X} ^{\top }\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\right)=\operatorname {Tr} (\mathbf {I} _{(p+1)})=(p+1) \\ \; & \operatorname E[h_{diagonal}] = (p+1)/n \\ \end{cases} }$$ $${\displaystyle \begin{align} symmetric & : \quad 
\mathbf{M}^{T} = \mathbf{M} \quad && \And \quad \mathbf{H}^{T} = \mathbf{H} \\ idempotent & : \quad \mathbf{M}^{2} = \mathbf{M} \quad && \And \quad \mathbf{H}^{2} = \mathbf{H} \\ orthogonal & : \quad \mathbf{M} \mathbf{H} = 0 \quad && \And \quad \mathbf{M} \mathbf{X} = 0 \\ projection & : \quad \mathbf{H} \mathbf{X} = \mathbf{X} \quad && \; \\ \end{align} }$$ Assumptions of Classical Linear Regression Model $${\displaystyle \textit{Assumptions} }$$ $${\displaystyle \begin{align} \text{Strict exogeneity } & : \operatorname {E} [\,{\boldsymbol{\varepsilon}} \mid \mathbf{X}\,] = 0 \\ \text{No linear dependence } & : \Pr \!{\big [}\,\operatorname {rank} (\mathbf{X}) = (p+1)\,{\big ]} =1 \\ \text{Spherical errors(Homoscedasticity) } & : \operatorname {Var} [\,{\boldsymbol{\varepsilon}} \mid \mathbf{X}\,] = \sigma ^{2}\mathbf{I}_{n} \\ \text{Normality } & : {\boldsymbol{\varepsilon}} \mid \mathbf{X} \sim {\mathcal {N}}(0,\sigma ^{2}\mathbf{I}_{n}) \\ \; & \; \\ \end{align} }$$ $${\displaystyle \textit{Corollaries} }$$ $${\displaystyle \operatorname {E}[\varepsilon] = 0 \quad \And \quad \operatorname {E}[ \mathbf{X}^{T}\varepsilon] = 0 }$$ $${\displaystyle \mathbf{y} \mid \mathbf{X} \sim {\mathcal {N}}(\mathbf{X} {\boldsymbol{\beta}}, \sigma ^{2}\mathbf{I}_{n}) \quad \And \quad {\hat{\boldsymbol{\beta}}} \mid \mathbf{X} \sim {\mathcal {N}} \left( {\boldsymbol{\beta}}, \sigma ^{2} (\mathbf{X}^{T} \mathbf{X})^{-1} \right) \quad \And \quad {\hat{\boldsymbol {\varepsilon }}} \mid \mathbf{X} \sim {\mathcal {N}} \left(0, \sigma^{2} (\mathbf{I}_{n} - \mathbf{H}) \right) }$$ $${\displaystyle \frac{\hat{\beta_{j}} - \beta_{j} }{ \sigma \sqrt{(\mathbf{X}^{T}\mathbf{X})^{-1}_{jj} }} \mid \mathbf{X} \sim {\mathcal {N}}(0, 1^{2}) \quad \And \quad \frac{\hat{\beta_{j}} - \beta_{j} }{ s_{\alpha} \sqrt{(\mathbf{X}^{T}\mathbf{X})^{-1}_{jj \mid \alpha = n-(p+1)} }} \mid \mathbf{X} \sim {t}_{\alpha = n-(p+1) }, \; \left( \frac{\hat{\beta_{j}} - \beta_{j} }{ s_{\alpha} \sqrt{(\mathbf{X}^{T}\mathbf{X})^{-1}_{jj \mid \alpha = n-(p+1)} }} \right)^{2} \mid \mathbf{X} \sim F_{1, \alpha = n-(p+1)} }$$ $${\displaystyle \because \; \frac{\alpha s_{\alpha_{=num-1}}^{2}}{\sigma^{2}} \; \sim \; \chi^{2}_{\alpha_{=num-1}} \quad \text{for} \; \alpha s_{\alpha_{=num-1}}^{2} \equiv \sum_{i=1}^{num} (X_{i} - \overline{X} )^{2} \; \text{with} \; X \sim {\mathcal {N}}(*, \sigma^{2}) }$$ Stochastic Regressor Properties by Assumption $${\displaystyle \begin{aligned} \text{coefficient mean vector with unbiasedness for } {\boldsymbol {\beta}} & \qquad \operatorname {E} [{\hat{{\boldsymbol {\beta}}}}] = {\boldsymbol {\beta}} \\ \text{coefficient covariance matrix} & \qquad \operatorname {Var} [{\hat{{\boldsymbol {\beta}}}}] = \sigma^{2} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)_{p+1}^{-1} \simeq s_{total}^{2} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)_{p+1}^{-1} \equiv \mathbf{S}_{explained}^{2} \\ \text{noise covariance matrix} & \qquad \operatorname {Var} [{\boldsymbol {\varepsilon}} ] = \sigma^{2}\mathbf{I}_{n} \simeq s_{total}^{2}\mathbf{I}_{n} \equiv \mathbf{S}_{total}^{2} \\ \text{residual covariance matrix} & \qquad \operatorname {Var} [{\boldsymbol {\epsilon}} ] = \sigma^{2}(\mathbf{I}_{n} - \mathbf{H}) \simeq s_{total}^{2}(\mathbf{I}_{n} - \mathbf{H}) \equiv \mathbf{S}_{residual}^{2} \\ \; & \; \\ \end{aligned} }$$ $${\displaystyle \text{Population variances}}$$ $${\displaystyle \begin{align} \text{noise variance } & \qquad \operatorname {Var} \left[ \varepsilon \right] = \sigma^{2} \\ \text{studentized residual 
variance } & \qquad \operatorname {Var} \left[ \epsilon_{studentized,i} \equiv \frac{\epsilon_{i}}{s_{total} \sqrt{1-h_{ii}} } \right] \approx 1^{2} \\ \; & \qquad \operatorname {Var} \left[ s_{total}\epsilon_{studentized,i}=\frac{\epsilon_{i}}{\sqrt{1-h_{ii}} } \right] \approx \sigma^{2} \\ \; & \; \\ \end{align} }$$ $${\displaystyle \text{Sample variances}}$$ $${\displaystyle \begin{align} \text{total sample variance with unbiasedness for } \sigma^{2} & \qquad \operatorname {E} \left[s_{total}^{2} \right] = \operatorname {Var} \left[\varepsilon \right] = \sigma^{2} \\ \text{residual sample variance with biasedness for } \sigma^{2} & \qquad \operatorname {E} \left[s_{residual}^{2} \right] = \operatorname {Var} \left[\epsilon \right] \approx \sigma^{2}(1-h_{ii}) \\ \; & \; \\ \end{align} }$$ $${\displaystyle \text{ Decision boundary for outliers}}$$ $${\displaystyle \begin{align} \; \text{Cook's Distance } &: D_{i} = \frac{r_{i}^{2}}{RSS} \left[ \frac{h_{ii}}{(1-h_{ii})^{2}} \right] \\ \; \text{Fox's Recommendation } &: D_{i} \ge \frac{4}{N - K -1} \\ \; & \; \\ \end{align} }$$ $${\displaystyle \begin{align} \text{(z-statistic) endog-variable confidence interval:}\quad & \bar{y} \pm Z_{1-\alpha/2} \times \sqrt{ \frac{\sigma^{2}}{n}} \\ \text{(t-statistic) endog-variable confidence interval:}\quad & \bar{y} \pm t_{1-\alpha/2, n-1} \times \sqrt{ \frac{s_{total}^{2}}{n}} \\ \text{slope confidence interval:}\quad & \hat{\beta}_{j\neq0} \pm t_{1-\alpha/2, n-(p+1)}\times \sqrt{MS_{residual} \times \left( \frac{1}{\sum(x_i-\bar{x})^2} \right)} \\ \text{bias confidence interval:}\quad & \hat{\beta}_0 \pm t_{1-\alpha/2, n-(p+1)} \times \sqrt{MS_{residual} \times \left( \dfrac{1}{n}+\dfrac{\bar{x}^2}{\sum(x_i-\bar{x})^2} \right) } \\ \text{fitted-value confidence interval:}\quad & \hat{y}_i^{*} \pm t_{1-\alpha/2, n-(p+1)} \times \sqrt{MS_{residual} \times \left(\dfrac{1}{n} + \dfrac{(x_i^{*}-\bar{x})^2}{\sum(x_i-\bar{x})^2}\right)} \\ \text{prediction interval:}\quad & \hat{y}_i^{*} \pm t_{1-\alpha/2, n-(p+1)} \times \sqrt{MS_{residual} \times \left(1+ \frac{1}{n} + \dfrac{(x_i^{*}-\bar{x})^2}{\sum(x_i-\bar{x})^2}\right)} \\ \end{align} }$$ Analysis of Variance $${\displaystyle \text{Fixed Effect Model} }$$ $${\displaystyle \begin{aligned} \operatorname{Var} [\mathbf{y}] = \operatorname{Var} [\mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}] &= \boldsymbol{\beta}^{T}\operatorname{Var} [\mathbf{X}]\boldsymbol{\beta} + \operatorname{Var} [\boldsymbol{\varepsilon}] + 2 \operatorname{Cov} [\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\varepsilon}] \\ \; =\operatorname{Var} [\mathbf{X}\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}] &= \operatorname{Var} [\mathbf{X}\hat{\boldsymbol{\beta}}] + \operatorname{Var} [\boldsymbol{\epsilon}] + 2\operatorname{Cov} [\mathbf{X}\hat{\boldsymbol{\beta}}, \boldsymbol{\epsilon}]\\ \; =\operatorname{Var} [\hat{\mathbf{y}} + \boldsymbol{\epsilon}] &= \operatorname{Var} [\hat{\mathbf{y}}] + \operatorname{Var} [\boldsymbol{\epsilon}] + 2\underbrace{\operatorname{Cov} [\hat{\mathbf{y}}, \boldsymbol{\epsilon}]}_{\text{Perpendicularity}} \\ \end{aligned} }$$
$${\displaystyle \begin{align} \underbrace{\left(\sum\limits_{i=1}^{n}(y_i-\bar{y})^2\right)}_{\underset{\text{Total Sum of Squares}}{\text{TSS}}} &= \underbrace{\sum\limits_{i=1} ^{n} \left( \hat{y}_{i} - \overline{y} \right)^{2}}_{\underset{\text{Explained Sum of Squares/Regression Sum of Squares}}{\text{ESS(SSR)}}} + \underbrace{\sum\limits_{i=1} ^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}_{\underset{\text{Residual Sum of Squares/Error Sum of Squares}}{\text{RSS(SSE)}}} \\ \end{align} }$$ $${\displaystyle \begin{aligned} \left\langle \mathbf{y} | \mathbf{y} \right\rangle & = \left\langle {\hat{\mathbf{y}}} | {\hat{\mathbf{y}}} \right\rangle + \left\langle{\boldsymbol {\epsilon }} | {\boldsymbol {\epsilon }} \right\rangle &\quad TSS & = ESS + RSS \\ 1 & = \frac{\left\langle {\hat{\mathbf{y}}} | {\hat{\mathbf{y}}} \right\rangle}{\left\langle \mathbf{y} | \mathbf{y} \right\rangle} + \frac{\left\langle{\boldsymbol {\epsilon }} | {\boldsymbol {\epsilon }} \right\rangle}{\left\langle \mathbf{y} | \mathbf{y} \right\rangle} &\quad 1 & = r^{2} + k^{2} \\ \end{aligned} }$$ $${\displaystyle \begin{cases} \mathrm{TSS} & \equiv \sum _{i=1}^{n} \left( y_{i} - \overline{y} \right)^{2} &&= \Vert \mathbf{y} - \overline{\mathbf{y}} \Vert^{2} &&& = SS_{\text{total}} \\ \mathrm{ESS}(\text{Between Groups, SSB}) & \equiv \sum _{i=1}^{n} \left( {\hat{y}}_{i} - \overline{y} \right)^{2} &&= \Vert \hat{\mathbf{y}} - \overline{\mathbf{y}} \Vert^{2} &&&= SS_{\text{explained}} \\ \mathrm{RSS}(\text{Within Groups, SSW}) & \equiv \sum _{i=1}^{n} \left( y_{i} - {\hat{y}}_{i} \right)^{2} &&= \Vert \mathbf{y} - \hat{\mathbf{y}} \Vert^{2} = {\boldsymbol {\epsilon }}^{T} {\boldsymbol {\epsilon }} &&&= SS_{\text{residual}} \\ \end{cases} }$$ $${\displaystyle \begin{aligned} SS_{\text{total}} & = \sum _{i=1}^{n} \left( y_{i} - \overline{y} \right)^{2} && = MS_{\text{total}} \underbrace{\left( n - 1 \right)}_{\text{df}_{\text{total}}} && \equiv s_{total}^{2} \left( n - 1 \right) \\ SS_{\text{explained}} & = \sum _{i=1}^{n} \left( {\hat{y}}_{i} - \overline{y} \right)^{2} &&= MS_{\text{explained}} \underbrace{\left( (p+1) - 1 \right)}_{\text{df}_{\text{explained}}} && \equiv s_{explained}^{2}(n-1) \\ SS_{\text{residual}} & = \sum _{i=1}^{n} \left( y_{i} - {\hat{y}}_{i} \right)^{2} &&= \underbrace{MS_{\text{residual}}}_{\text{prediction interval-square}} \underbrace{\left( n - (p+1) \right)}_{\text{df}_{\text{residual}}} && \equiv s_{residual}^{2}(n-1) \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} MSE& \equiv MS_{residual} = \dfrac{\sum(y_i-\hat{y}_i)^2}{n-(p+1)}=\dfrac{RSS}{n-(p+1)} \\ MSR& \equiv MS_{explained} = \dfrac{\sum(\hat{y}_i-\bar{y})^2}{(p+1)-1}=\dfrac{ESS}{(p+1)-1} \\ \end{aligned} }$$
$${\displaystyle \begin{aligned} \text{Source} & \quad \text{Degree of Freedom} && \quad SS && \quad MS && \quad F \\ \hline \\ \text{total} & \quad n-1 && \quad SS_{\text{total}} && \quad \; && \quad \\ \text{explained} & \quad (p+1) - 1 && \quad SS_{\text{explained}} && \quad SS_{\text{explained}} / \text{df}_{\text{explained}} && \quad MS_{\text{explained}} / MS_{\text{residual}}\\ \text{residual} & \quad n-(p+1) && \quad SS_{\text{residual}} && \quad SS_{\text{residual}} / \text{df}_{\text{residual}} && \quad \\ \hline \\ \end{aligned} }$$ $${\displaystyle \text{The centered total sum of squares for the case with an intercept} }$$ $${\displaystyle {\begin{aligned} \sum _{i=1}^{n}(y_{i}-{\overline {y}})^{2}&=\sum _{i=1}^{n}(y_{i}-{\overline {y}}+{\hat {y}}_{i}-{\hat {y}}_{i})^{2}=\sum _{i=1}^{n}(({\hat {y}}_{i}-{\bar {y}})+\underbrace {(y_{i}-{\hat {y}}_{i})} _{\epsilon_{i}})^{2}\\ &=\sum _{i=1}^{n}(({\hat {y}}_{i}-{\bar {y}})^{2}+2\epsilon_{i}({\hat {y}}_{i}-{\bar {y}})+\epsilon_{i}^{2})\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}({\hat {y}}_{i}-{\bar {y}})\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}({\hat {\beta }}_{0}+{\hat {\beta }}_{1}x_{i1}+\cdots +{\hat {\beta }}_{p}x_{ip}-{\overline {y}})\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2} + {\color{Red} \underbrace{ 2({\hat {\beta }}_{0}-{\overline {y}})\sum _{i=1}^{n}\epsilon_{i}} _{0 \,(\text{centered term with intercept})}} +2{\hat {\beta }}_{1}\underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{i1}} _{0}+\cdots +2{\hat {\beta }}_{p} \underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{ip}} _{0}\\&=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2} =\sum _{i=1}^{n} {\color{Red} \underbrace{ \left((\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}}) -{\bar {y}}\right)^{2} }_{\text{degree of freedom: }(p+1)-1} }+ \underbrace{ \sum _{i=1}^{n}\epsilon_{i}^{2} }_{\text{degree of freedom: } n - (p+1)} \\ &= \mathrm {ESS} +\mathrm {RSS} \\ \end{aligned}} }$$ $${\displaystyle \text{The uncentered total sum of squares for the case without an intercept} }$$ $${\displaystyle {\begin{aligned} \sum _{i=1}^{n} y_{i}^{2}&=\sum _{i=1}^{n}(y_{i}+{\hat {y}}_{i}-{\hat {y}}_{i})^{2}=\sum _{i=1}^{n}({\hat {y}}_{i}+\underbrace {(y_{i}-{\hat {y}}_{i})} _{\epsilon_{i}})^{2}\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}^{2}+2\epsilon_{i}{\hat {y}}_{i}+\epsilon_{i}^{2})\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}{\hat {y}}_{i}\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}({\hat {\beta }}_{1}x_{i1}+\cdots +{\hat {\beta }}_{p}x_{ip})\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2} + 2{\hat {\beta }}_{1}\underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{i1}} _{0}+\cdots +2{\hat {\beta }}_{p}\underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{ip}} _{0}\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}=\mathrm {ESS} +\mathrm {RSS} \\ \end{aligned}} }$$
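A small numerical check of the two decompositions (assumed toy data): with an intercept the residuals sum to zero and the centered TSS splits into ESS + RSS; without an intercept only the uncentered decomposition holds.
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 5 + 2*x + rng.normal(size=300)
with_const = sm.OLS(y, sm.add_constant(x)).fit()
no_const = sm.OLS(y, x[:, None]).fit()
print(with_const.resid.sum()) # ~ 0: residuals sum to zero when an intercept is included
print(with_const.centered_tss, with_const.ess + with_const.ssr) # centered TSS = ESS + RSS
print(no_const.uncentered_tss, no_const.ess + no_const.ssr) # uncentered TSS = ESS + RSS without an intercept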
Bias-Variance Trade-Off $${\displaystyle \text{Theoretical Assumption}:\quad y=f(x)+\varepsilon }$$ $${\displaystyle \begin{aligned} {\text{MSE}}&\triangleq \operatorname {E} {\big [}(y-{\hat {f}})^{2}{\big ]} \\ &=\operatorname {E} {\big [}y^{2}-2y{\hat {f}}+{\hat {f}}^{2}{\big ]}=\underbrace{\operatorname {E} {\big [}y^{2}{\big ]}}_{(1)}-2\underbrace{\operatorname {E} {\big [}y{\hat {f}}{\big ]}}_{(2)}+ \underbrace{\operatorname {E} {\big [}{\hat {f}}^{2}{\big ]}}_{(3)} \\ &=f^{2}+\sigma ^{2}-2f\operatorname {E} [{\hat {f}}]+\operatorname {Var} [{\hat {f}}]+\operatorname {E} [{\hat {f}}]^{2}\\&=(f-\operatorname {E} [{\hat {f}}])^{2}+\sigma ^{2}+\operatorname {Var} {\big [}{\hat {f}}{\big ]}\\[5pt]&=\operatorname {Bias} [{\hat {f}}]^{2}+\operatorname {Var} {\big [}{\hat {f}}{\big ]} +\sigma ^{2} \\ \end{aligned} }$$ $${\displaystyle {\begin{aligned} (1):\quad \operatorname {E} {\big [}y^{2}{\big ]}&=\operatorname {E} {\big [}(f+\varepsilon )^{2}{\big ]}\\&=\operatorname {E} [f^{2}]+2\operatorname {E} [f\varepsilon ]+\operatorname {E} [\varepsilon ^{2}]&&{\text{by linearity of }}\operatorname {E} \\&=f^{2}+2f\operatorname {E} [\varepsilon ]+\operatorname {E} [\varepsilon ^{2}]&&{\text{since }}f{\text{ does not depend on the data}}\\&=f^{2}+2f\cdot 0+\sigma ^{2}&&{\text{since }}\varepsilon {\text{ has zero mean and variance }}\sigma ^{2} \end{aligned}} }$$ $${\displaystyle {\begin{aligned} (2):\quad \operatorname {E} {\big [}y{\hat {f}}{\big ]}&=\operatorname {E} {\big [}(f+\varepsilon ){\hat {f}}{\big ]}\\&=\operatorname {E} [f{\hat {f}}]+\operatorname {E} [\varepsilon {\hat {f}}]&&{\text{by linearity of }}\operatorname {E} \\&=\operatorname {E} [f{\hat {f}}]+\operatorname {E} [\varepsilon ]\operatorname {E} [{\hat {f}}]&&{\text{since }}{\hat {f}}{\text{ and }}\varepsilon {\text{ are independent}}\\&=f\operatorname {E} [{\hat {f}}]&&{\text{since }}\operatorname {E} [\varepsilon ]=0 \end{aligned}} }$$ $${\displaystyle {\begin{aligned} (3):\quad \operatorname {E} {\big [}{\hat {f}}^{2}{\big ]}&=\operatorname {Var} ({\hat {f}})+\operatorname {E} [{\hat {f}}]^{2}&&{\text{since }}\operatorname {Var} [X]\triangleq \operatorname {E} {\Big [}(X-\operatorname {E} [X])^{2}{\Big ]}=\operatorname {E} [X^{2}]-\operatorname {E} [X]^{2}{\text{ for any random variable }}X \end{aligned}} }$$
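A minimal simulation of the decomposition (assumed setup: a quadratic f estimated at a single point by a misspecified straight-line fit), comparing the Monte Carlo MSE with Bias^2 + Var + sigma^2:
import numpy as np
rng = np.random.default_rng(3)
f = lambda x: x**2 # true regression function
sigma = 0.5 # noise standard deviation
x_train = np.linspace(-1, 1, 30)
x0 = 0.8 # evaluation point
preds, sq_errors = [], []
for _ in range(5000):
    y_train = f(x_train) + rng.normal(scale=sigma, size=x_train.size)
    b1, b0 = np.polyfit(x_train, y_train, deg=1) # misspecified straight-line fit
    f_hat = b0 + b1*x0
    preds.append(f_hat)
    sq_errors.append((f(x0) + rng.normal(scale=sigma) - f_hat)**2) # (y - f_hat)^2 at x0
preds = np.array(preds)
bias2 = (preds.mean() - f(x0))**2
variance = preds.var()
print(np.mean(sq_errors), bias2 + variance + sigma**2) # MSE ~ Bias^2 + Var + sigma^2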
Model Evaluation $${\displaystyle {\begin{aligned} \text{Coefficient of determination:} &\quad r^{2} && \equiv 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = \frac{\mathrm{ESS}}{\mathrm{TSS}} \\ \text{Coefficient of alienation:} &\quad k^{2} && \equiv \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - r^{2} \\ \text{F-test statistic:} &\quad F && \equiv \frac{\mathrm{ESS}}{(p+1)-1} \div \frac{\mathrm{RSS}}{n-(p+1)} \\ \; & \; &&= \frac{\mathrm{ESS}/[(p+1)-1]}{\mathrm{RSS}/[n-(p+1)]} \\ \; & \; &&= \underbrace{ \frac{\mathrm{MSR}}{\mathrm{MSE}} \sim \mathbf{F}((p+1)-1, n-(p+1))}_{\text{treatment effect}} \\ \; & \; && \; \\ \end{aligned}} }$$
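A short tie-in to the OLS summary table (assumed toy regression): the coefficient of determination and the F statistic with its p-value, recomputed from ESS and RSS, agree with result.rsquared, result.fvalue, and result.f_pvalue.
import numpy as np
import statsmodels.api as sm
from scipy import stats
rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(size=200)
result = sm.OLS(y, X).fit()
rsquared = result.ess / result.centered_tss # ESS/TSS = 1 - RSS/TSS
fvalue = (result.ess / result.df_model) / (result.ssr / result.df_resid) # MSR / MSE
f_pvalue = stats.f.sf(fvalue, result.df_model, result.df_resid)
print(rsquared, result.rsquared, fvalue, result.fvalue, f_pvalue, result.f_pvalue) # each pair agrees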
Implementation: numpy
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from statsmodels.stats.outliers_influence import variance_inflation_factor
bias = 10.0
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
X = np.c_[np.ones((X.shape[0], 1)), X]
coef = [bias] + coef.tolist()
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
unbiased_coef = np.linalg.inv(X.T @ X) @ X.T @ y
display(pd.DataFrame(data=dict(coef=coef, unbiased_coef=unbiased_coef)))
H = X@np.linalg.inv(X.T@X)@X.T # projection matrix
h = np.diag(H) # matrix diagonal: influence hat
M = np.eye(X.shape[0]) - H # residual marker matrix
nobs = X.shape[0]
rank = X.shape[1]
outlier_index = h > 3*(rank / nobs)
outliers = np.c_[X, y][outlier_index]
prediction = lambda X: X@unbiased_coef
y_pred = prediction(X)
y_true = y
residuals = y_true - y_pred
studentized_residuals = residuals / (y_true.std(ddof=1)*np.sqrt(1-h))
intercept = True
if intercept:
    # [centered case with intercept]
    # tss = rss + ess
    tss = ((y_true - y_true.mean())**2).sum() # result.centered_tss ~ ((model.endog - model.endog.mean())**2).sum()
    ess = ((y_pred - y_true.mean())**2).sum() # result.ess ~ ((result.fittedvalues - model.endog.mean())**2).sum()
    rss = ((y_true - y_pred)**2).sum() # result.ssr ~ (result.resid**2).sum()
else:
    # [uncentered case without intercept]
    # tss = rss + ess
    tss = (y_true**2).sum() # result.uncentered_tss ~ (model.endog**2).sum()
    ess = (y_pred**2).sum() # result.ess ~ (result.fittedvalues**2).sum()
    rss = ((y_true - y_pred)**2).sum() # result.ssr ~ (result.resid**2).sum()
rsquared = 1 - (rss/tss)
# normality test
stat_normality, p_normality = stats.shapiro(residuals) # Shapiro-Wilk normality test on raw residuals
stat_normality_stud, p_normality_stud = stats.shapiro(studentized_residuals) # Shapiro-Wilk normality test on studentized residuals
stats.probplot(studentized_residuals, plot=plt) # qqplot
Implementation: sklearn
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from statsmodels.stats.outliers_influence import variance_inflation_factor
bias = 10.0
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
coef = [bias] + coef.tolist()
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
model = LinearRegression(fit_intercept=True)
#model = Lasso(alpha=0.01)
#model = Ridge(alpha=0.01)
#model = ElasticNet(alpha=0.01, l1_ratio=0.5)
model = model.fit(X, y)
unbiased_coef = [model.intercept_] + model.coef_.tolist()
display(pd.DataFrame(data=dict(coef=coef, unbiased_coef=unbiased_coef)))
y_pred = model.predict(X)
y_true = y
Implementation: statsmodels (OLS class)
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# data
bias = 10.0
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
X = np.c_[np.ones((X.shape[0], 1)), X]
coef = [bias] + coef.tolist()
data = pd.DataFrame(np.c_[X, y], columns= ['const']+[ 'X'+str(i) for i in range(1, X.shape[1])]+['Y'])
# multicollinearity
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
sm.graphics.plot_partregress('Y', 'X1', data.columns.difference(['Y', 'X1'], sort=False).tolist(), data=data, obs_labels=False, ret_coords=False) # endog first, then the exog of interest
# training
model = sm.OLS(endog=data['Y'], exog=data[data.columns.difference(['Y'], sort=False)])
result = model.fit()
display(result.summary()) # null hypothesis: coefficient beta = 0
# result visualization
sm.graphics.plot_ccpr(result, 'X1')
sm.graphics.plot_regress_exog(result, 'X1')
# prediction
y_pred = result.predict(X) # result.fittedvalues
y_true = model.endog
# Coefficient Estimation
result.t_test('const = 9.5')
result.t_test('X1 = 39.4')
result.t_test('X3 = 39.4')
result.t_test('X5 = 39.8')
result.t_test('X1 = X2')
result.t_test('X4 = X1')
# Covariance Matrix of Model Coefficients
# If no argument is specified returns the covariance matrix of a model (scale)*(X.T X)^(-1)
# If contrast is specified it pre and post-multiplies as follows (scale) * r_matrix (X.T X)^(-1) r_matrix.T
# If contrast and other are specified returns (scale) * r_matrix (X.T X)^(-1) other.T
# If column is specified returns (scale) * (X.T X)^(-1)[column,column] if column is 0d OR (scale) * (X.T X)^(-1)[column][:,column] if column is 1d
cov_matrix = result.cov_params()
# Projection Properties
influence = result.get_influence()
leverage = influence.hat_matrix_diag
cooks_d, pvals = influence.cooks_distance
# result.resid / result.resid.std(ddof=int(result.model.rank))
studentized_residual = result.outlier_test()['student_resid']
#estimated_internal_noise_deviation = np.sqrt((result.resid**2).sum() / (result.nobs - model.rank))
#studentized_internal_residual = result.resid / ((estimated_internal_noise_deviation)*np.sqrt(1 - h_ii))
studentized_internal_residual = influence.resid_studentized_internal
studentized_external_residual = influence.resid_studentized_external
# pearson_residual normality
# result.resid / np.sqrt(result.mse_resid * (1 - result.get_influence().hat_matrix_diag))
# result.resid / (result.resid.std(ddof=int(model.rank))*np.sqrt(1 - h_ii))
pearson_residual = result.resid_pearson
sm.qqplot(pearson_residual, line='45') # residual normality
# outliers1
h_ii = influence.hat_matrix_diag
outlier_index = h_ii > 3*(model.rank/model.nobs)
outliers = np.c_[X, y][outlier_index]
sm.graphics.plot_leverage_resid2(result) # residual & leverage
# outliers2
K = influence.k_vars # model.rank
cooks_d2, pvals = influence.cooks_distance
outlier_boundary = 4 / (model.nobs - K - 1)
outlier_index = cooks_d2 > outlier_boundary
outliers = np.c_[X, y][outlier_index]
sm.graphics.influence_plot(result, plot_alpha=0.3) # Cook’s distance
# Core Parameters
intercept = True
tss = result.centered_tss if intercept else result.uncentered_tss
ess = result.ess
rss = result.ssr # ((result.resid)**2).sum()
rsquared = result.rsquared # ~ ess/tss # result.ess/result.centered_tss or result.ess/result.uncentered_tss
df_model = result.df_model # model.rank - model.k_constant
df_resid = result.df_resid # result.nobs - model.rank
mse_tss = result.mse_total # result.centered_tss / (result.df_model + result.df_resid) or result.uncentered_tss / (result.df_model + result.df_resid)
mse_ess = result.mse_model # result.ess / result.df_model
mse_rss = result.mse_resid # result.ssr / result.df_resid
fvalue = result.fvalue # result.mse_model / result.mse_resid
# Additional Parameters
summary_scalar_table = dict()
summary_scalar_table['Log-Likelihood'] = result.llf
summary_scalar_table['R-squared'] = result.rsquared
summary_scalar_table['Adj. R-squared'] = result.rsquared_adj
summary_scalar_table['F-statistic'] = result.fvalue
summary_scalar_table['Prob (F-statistic)'] = result.f_pvalue
summary_scalar_table['AIC'] = result.aic # -2*result.llf + 2*model.rank
summary_scalar_table['BIC'] = result.bic # -2*result.llf + np.log(result.nobs)*model.rank
summary_scalar_table['Omnibus'] = sm.stats.omni_normtest(result.resid).statistic
summary_scalar_table['Prob(Omnibus)'] = sm.stats.omni_normtest(result.resid).pvalue
summary_scalar_table['Jarque-Bera (JB)'] = sm.stats.jarque_bera(result.resid)[0]
summary_scalar_table['Prob(JB)'] = sm.stats.jarque_bera(result.resid)[1]
summary_scalar_table['Skew'] = sm.stats.jarque_bera(result.resid)[2]
summary_scalar_table['Kurtosis'] = sm.stats.jarque_bera(result.resid)[3]
summary_scalar_table['Cond. No.'] = result.condition_number
summary_scalar_table['No. Observations'] = result.nobs
summary_scalar_table['Df Residuals'] = result.df_resid # result.nobs - model.rank
summary_scalar_table['Df Model'] = result.df_model # model.rank - model.k_constant
summary_scalar_table['Rank'] = model.rank
summary_scalar_table['Constraint'] = model.k_constant
summary_scalar_table['tss'] = result.centered_tss # with intercept
summary_scalar_table['tss_'] = result.uncentered_tss # without intercept
summary_scalar_table['ess'] = result.ess
summary_scalar_table['rss'] = result.ssr # (result.resid**2).sum()
summary_scalar_table['mse_total'] = result.mse_total # (result.centered_tss or result.uncentered_tss) / ( result.df_model + result.df_resid )
summary_scalar_table['mse_model'] = result.mse_model # result.ess / result.df_model
summary_scalar_table['mse_resid'] = result.mse_resid # result.ssr / result.df_resid
summary_vector_table = dict()
summary_vector_table['coef'] = result.params.squeeze()
summary_vector_table['cov_matrix'] = result.cov_params()
summary_vector_table['y_true'] = y # > tss
summary_vector_table['y_pred'] = result.fittedvalues # > ess # result.predict(df[df.columns.difference(['Y'], sort=False)])
summary_vector_table['residual'] = result.resid # > rss
summary_vector_table['pearson_residual'] = result.resid_pearson
summary_vector_table['hat'] = result.get_influence().hat_matrix_diag
Implementation: statsmodels (from_formula; patsy formula)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
import statsmodels.api as sm
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
bias = 100
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=3, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=50.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
df = pd.DataFrame(data=np.c_[X, y], columns=list(map(lambda x: 'X'+str(x), range(1,6))) + ['Y'])
df.insert(0, 'const', 1.0)
coef = [bias] + coef.tolist()
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
model = sm.OLS.from_formula("Y ~ X1 + X2 + X3 + X4 + X5 ", data=df) # smf.ols("Y ~ X1 + X2 + X3 + X4 + X5 ", data=df)
result = model.fit()
sm.graphics.plot_partregress_grid(result, fig=plt.figure(figsize=(30,10))) # partial regression
sm.graphics.plot_ccpr_grid(result, fig=plt.figure(figsize=(30,10))) # CCPR: component-component plus residual
h_ii = result.get_influence().hat_matrix_diag
outlier_index = h_ii > 3*(model.rank/model.nobs)
outliers = np.c_[X, y][outlier_index]
sm.qqplot(result.resid_pearson, line='45') # residual normality
display(result.summary())
evaluation = dict()
evaluation['rsquared'] = result.rsquared
evaluation['fvalue'] = result.fvalue
param_space = dict()
param_space['target'] = coef
param_space['naive'] = result.params.tolist()
param_space['ridge'] = model.fit_regularized(alpha=0.01, L1_wt=0).params.tolist() # Ridge Constraint: w ** 2
param_space['elastic'] = model.fit_regularized(alpha=0.01, L1_wt=.5).params.tolist() # Elastic Constraint: absolute w & w ** 2
param_space['lasso'] = model.fit_regularized(alpha=0.01, L1_wt=1).params.tolist() # Lasso Constraint: absolute w
# F-Test for Coefficient-Hypothesis
# sms.anova_lm(result, typ=2)
sms.anova_lm(sm.OLS.from_formula("Y ~ X1 + X2 + X3 + X4 + X5 ", data=df).fit(), typ=2)
# T-Test for Coefficient-Hypothesis
sm.OLS(endog=df['Y'], exog=df[df.columns.difference(['Y'], sort=False)]).fit().t_test('const = 0')
# heteroscedasticity
lm_stats, lm_pval, f_stats, f_pval = sms.het_breuschpagan(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
lm_stats, lm_pval, f_stats, f_pval = sms.het_white(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
stats, pval, _ = sms.het_goldfeldquandt(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
$${\displaystyle \begin{aligned} \text{OLS} \;&\begin{cases} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} \\ \; &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} , \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n})\\ \operatorname{E}[{\hat {\boldsymbol {\beta }}}] &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \operatorname{E}[ \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ] = \boldsymbol{\beta} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}}] &= \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \operatorname{Var}[\mathbf{X}^{\mathsf {T}}\boldsymbol{\varepsilon} ] \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \; &= \sigma_{\boldsymbol{\varepsilon}}^{2} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \end{cases} \\ \text{GLS} \;&\begin{cases} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X}^{\mathsf {T}} \boldsymbol{\Omega}^{-1} \mathbf {X} \right)^{-1}\mathbf {X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {y} \\ \; &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon} , \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Omega})\quad \boldsymbol{\Omega}:\text{positive-definite covariance (diagonal for WLS)} \\ \operatorname{E}[{\hat {\boldsymbol {\beta }}}] &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \operatorname{E}[ \mathbf {X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon} ] = \boldsymbol{\beta} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}}] &= \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \operatorname{Var}[\mathbf{X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon} ] \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \\ \; &= \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \underbrace{\mathbf{X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\operatorname{Var}[\boldsymbol{\varepsilon} ]\boldsymbol{\Omega}^{-\mathsf {T}} \mathbf{X} }_{= \mathbf{X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1} \mathbf{X}}\left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \\ \; &= \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \\ \end{cases} \\ \end{aligned} }$$
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import matplotlib.pyplot as plt
X = np.random.normal(0, 1, size=3000)
y = 3*X + np.random.normal(0, 1, size=3000)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit()
gls = sm.GLS(y, X, sigma=np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit()
regularized_ols = sm.OLS(y, X).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_gls = sm.GLS(y, X, sigma=np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
ols = sm.WLS(y, X, weights=1).fit()
wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit()
gls = sm.WLS(y, X, weights=1/np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit()
regularized_ols = sm.WLS(y, X, weights=1).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_gls = sm.WLS(y, X, weights=1/np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
# prediction intervals
prstd, obs_ci_lower, obs_ci_upper = wls_prediction_std(ols)
prstd, obs_ci_lower, obs_ci_upper = wls_prediction_std(wls)
prstd, obs_ci_lower, obs_ci_upper = wls_prediction_std(gls)
ols.get_prediction().summary_frame(alpha=.05)
wls.get_prediction().summary_frame(alpha=.05)
gls.get_prediction().summary_frame(alpha=.05)
# visualization
fig, axes = plt.subplots(4, 1, figsize=(30,10))
axes[0].plot(ols.resid, label=f'OLS[{ols.resid.mean().round(6)}, {ols.resid.var(ddof=0).round(6)}]', alpha=.3)
axes[0].plot(wls.resid, label=f'WLS[{wls.resid.mean().round(6)}, {wls.resid.var(ddof=0).round(6)}]', alpha=.3)
axes[0].plot(gls.resid, label=f'GLS[{gls.resid.mean().round(6)}, {gls.resid.var(ddof=0).round(6)}]', alpha=.3)
axes[0].legend(loc='upper right', frameon=False); axes[0].set_title('Residuals')
axes[1].plot(X); axes[1].set_title('X')
axes[2].plot(y); axes[2].set_title('y')
axes[3].plot(y - 3*X); axes[3].set_title('e')
plt.tight_layout()
pd.DataFrame(np.c_[y, X, ols.fittedvalues, ols.resid, wls.fittedvalues, wls.resid, gls.fittedvalues, gls.resid], columns=['y_true','X','y_ols','resid_ols','y_wls','resid_wls','y_gls','resid_gls']).corr()
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
X = np.random.normal(0, 1, size=3000)
y = 3*X + np.random.normal(0, 1, size=3000)
ols_sm = sm.WLS(y, X, weights=1).fit()
ols_sklearn = LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y, sample_weight=1)
print(ols_sm.params, ols_sklearn.coef_)
weights = 1/sm.OLS(y, X).fit().resid**2
sample_weight = 1/(y - LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y).predict(X[:, np.newaxis]))**2
wls_sm = sm.WLS(y, X, weights=weights).fit()
wls_sklearn = LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y, sample_weight=sample_weight)
print(wls_sm.params, wls_sklearn.coef_)
weights = 1/np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))
sample_weight = 1/np.diag(np.kron(((y - LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y).predict(X[:, np.newaxis]))**2).var(ddof=0), np.identity(y.shape[0])))
gls_sm = sm.WLS(y, X, weights=weights).fit()
gls_sklearn = LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y, sample_weight=sample_weight)
print(gls_sm.params, gls_sklearn.coef_)
Maximum Likelihood Estimation
$${\displaystyle \text{Likelihood}}$$ $${\displaystyle \begin{aligned} {\mathcal {L}}({\boldsymbol{\theta}} \mid \mathbf{y}) &= p (\mathbf{y} \mid {\boldsymbol{\theta}} ), \quad \mathbf{y} \sim {\mathcal {N}}(\mathbf{X}{\boldsymbol{\beta}}, \sigma^{2}\mathbf{I}_{n}) \\ \; &= \frac{1}{({2\pi \sigma^{2}})^{n/2}} \exp \left\{ - \frac{1}{2\sigma^{2}} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} )^{T}(\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}}) \right\} \\ \; & \; \\ \log {\mathcal {L}}({\boldsymbol{\theta}} \mid \mathbf{y}) &= -\frac{1}{2\sigma^{2}} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} )^{T} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} ) - \frac{n}{2}\log 2\pi - n \log \sigma \\ \end{aligned} }$$ $${\displaystyle \text{Estimation}}$$ $${\displaystyle \begin{aligned} {\underset {{\boldsymbol{\beta}} \in \mathbb {R}^{p+1} }{\operatorname {argmax} }}\, \log {\mathcal {L}}({\boldsymbol{\theta}} = {\boldsymbol{\beta}} \mid \mathbf{y}) &= {\underset {{\boldsymbol{\beta}} \in \mathbb {R}^{p+1} }{\operatorname {argmax} }}\, \left\{ -\frac{1}{2\sigma^{2}} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} )^{T} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} ) - \frac{n}{2}\log 2\pi - n \log \sigma \right\} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \frac{\partial}{\partial {\boldsymbol{\beta}}}\log {\mathcal {L}}({\boldsymbol{\theta}} = {\boldsymbol{\beta}} \mid \mathbf{y}) &= -\frac{1}{2\sigma^{2}} (2\mathbf{X}^{T}\mathbf{X} {\boldsymbol{\beta}} - 2\mathbf{X}^{T}\mathbf{y}) = 0 \\ \therefore\; {\hat {\boldsymbol {\beta }}} &\equiv {\boldsymbol{\beta}}_{MLE} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \\ {\hat{\mathbf{y}}} &\equiv \mathbf{X}{\hat {\boldsymbol {\beta }}} \\ \end{aligned} }$$
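As a quick numerical check (a sketch, not part of the original derivation), the closed-form estimator above can be compared with sm.OLS on simulated data; the design and the coefficients 1 and 3 below are arbitrary illustration values:
import numpy as np
import statsmodels.api as sm
X = np.c_[np.ones(1000), np.random.normal(size=1000)] # design matrix with an intercept column
y = X @ np.array([1.0, 3.0]) + np.random.normal(size=1000) # simulated response (illustration values)
beta_mle = np.linalg.inv(X.T @ X) @ X.T @ y # closed-form (X'X)^{-1} X'y
print(beta_mle, sm.OLS(y, X).fit().params) # the two estimates coincide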
Generalized Linear Regression: MLE
Assumption: an exponential-family probability distribution and a link function that linearizes the conditional mean
- https://www.statsmodels.org/dev/examples/index.html#generalized-linear-models
- https://www.statsmodels.org/dev/examples/notebooks/generated/glm.html
# https://en.wikipedia.org/wiki/Generalized_linear_model
import numpy as np
import pandas as pd
import statsmodels.api as sm
X = np.linspace(-3, 3, 100)
y = -(.03*X + .0001*X**2 - 1.0) + .001 * np.random.random(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gaussian(sm.families.links.Identity())).fit()
result.summary()
X = np.linspace(-3, 3, 100)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + .1 * np.random.random(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gaussian(sm.families.links.Log())).fit()
result.summary()
X = np.linspace(-3, 3, 100)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + .001 * np.random.gamma(shape=1, scale=1.0, size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gamma(sm.families.links.Log())).fit()
result.summary()
X = np.linspace(-3, 3, 100)
y = -(.03*X + .0001*X**2 - 1.0) + .001 * np.random.random(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Tweedie(sm.families.links.Identity())).fit()
result.summary()
# https://en.wikipedia.org/wiki/Generalized_linear_model
import numpy as np
import pandas as pd
import statsmodels.api as sm
X = np.linspace(-3, 3, 100)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + .001 * np.random.normal(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gaussian(sm.families.links.Log())).fit()
result.summary()
# residuals
result.resid_response # result.model.endog = result.fittedvalues + result.resid_response
result.resid_pearson
result.resid_working
result.resid_deviance
result.resid_anscombe
result.resid_anscombe_scaled
result.resid_anscombe_unscaled
# canonical relations between distribution and link function
sm.families.Binomial(sm.families.links.Logit()) # y range: [0, 1, 2, ..., n] / count of # of "yes" occurrences out of N yes/no occurrences
sm.families.Poisson(sm.families.links.Log()) # y range: [0, 1, 2, ..., np.inf) / count of occurrences in fixed amount of time/space
sm.families.Gamma(sm.families.links.InversePower()) # y range: (0, np.inf) / Exponential-response data, scale parameters
sm.families.Gaussian(sm.families.links.Identity()) # y range: (-np.inf, np.inf) / Linear-response data
sm.families.InverseGaussian(sm.families.links.InverseSquared()) # y range: (0, np.inf)
# distributions
sm.families.Binomial
sm.families.NegativeBinomial
sm.families.Poisson
sm.families.Gaussian
sm.families.Gamma
sm.families.InverseGaussian
sm.families.Tweedie
# link functions
sm.families.links.CDFLink()
sm.families.links.CLogLog()
sm.families.links.Cauchy()
sm.families.links.Identity()
sm.families.links.InversePower()
sm.families.links.InverseSquared()
sm.families.links.Log()
sm.families.links.LogC()
sm.families.links.LogLog()
sm.families.links.Logit()
sm.families.links.NegativeBinomial()
sm.families.links.Power()
sm.families.links.Probit()
sm.families.links.Sqrt()
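For count data, a minimal Poisson/Log sketch in the same style as the examples above (the coefficients .5 and 1.0 are arbitrary illustration values):
import numpy as np
import statsmodels.api as sm
X = np.linspace(-3, 3, 100)
y = np.random.poisson(np.exp(.5*X + 1.0)) # counts with a log-linear mean (illustration values)
result = sm.GLM(y, np.c_[np.ones_like(X), X], family=sm.families.Poisson(sm.families.links.Log())).fit()
result.summary()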
# https://en.wikipedia.org/wiki/Generalized_linear_model
import numpy as np
import pandas as pd
import statsmodels.api as sm
X = np.random.uniform(-3, 3, 1000)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + 0.01*np.random.gamma(1, 2, size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gamma(sm.families.links.Log()), freq_weights=None, var_weights=y).fit()
weighted_glm_corr = pd.DataFrame(np.c_[y, X, result.fittedvalues, result.resid_response], columns=['y', 'X', 'fittedvalues', 'resid']).corr().round(4)
weighted_glm_corr.columns = pd.MultiIndex.from_product([['WeightGLM'], weighted_glm_corr.columns.tolist()])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gamma(sm.families.links.Log()), freq_weights=None, var_weights=None).fit()
glm_corr = pd.DataFrame(np.c_[y, X, result.fittedvalues, result.resid_response], columns=['y', 'X', 'fittedvalues', 'resid']).corr().round(4)
glm_corr.columns = pd.MultiIndex.from_product([['GLM'], glm_corr.columns.tolist()])
result = sm.OLS(np.log(y), np.c_[np.ones_like(X), X, X**2]).fit()
ols_corr = pd.DataFrame(np.c_[y, X, result.fittedvalues, result.resid], columns=['y', 'X', 'fittedvalues', 'resid']).corr().round(4)
ols_corr.columns = pd.MultiIndex.from_product([['OLS'], ols_corr.columns.tolist()])
corr = pd.concat([weighted_glm_corr, glm_corr, ols_corr], axis=1)
corr
Regression Noise: Residual Analysis
import numpy as np
import pandas as pd
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, ARX, ConstantVariance, GARCH, Normal
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
def residual_analysis(y, nlags=40):
y_m1 = y; y_m2 = y**2
fig = plt.figure(figsize=(30, 10)); gs = mpl.gridspec.GridSpec(4, 12)
fig.add_subplot(gs[0, 0:6]).plot(y_m1); fig.axes[0].set_title('Time Series: 1st Moment'); [fig.axes[0].spines[i].set_visible(False) for i in ['top', 'right']]
fig.add_subplot(gs[0, 6:12]).plot(y_m2); fig.axes[1].set_title('Time Series: 2nd Moment'); [fig.axes[1].spines[i].set_visible(False) for i in ['top', 'right']]
smt.graphics.plot_acf(y_m1, lags=nlags, ax=fig.add_subplot(gs[1, 0:4])); [fig.axes[2].spines[i].set_visible(False) for i in ['top', 'right']]
smt.graphics.plot_pacf(y_m1, lags=nlags, ax=fig.add_subplot(gs[2, 0:4])); [fig.axes[3].spines[i].set_visible(False) for i in ['top', 'right']]
sns.heatmap(pd.DataFrame(np.c_[[np.roll(y_m1, shift=i) for i in range(nlags)]].T).corr(), cmap="RdBu", cbar_kws={"shrink": 0.5}, square=True, center=0, linewidth=.5, ax=fig.add_subplot(gs[1:3, 4:6])).set_title('Autocorrelation')
smt.graphics.plot_acf(y_m2, lags=nlags, ax=fig.add_subplot(gs[1, 6:10])); [fig.axes[6].spines[i].set_visible(False) for i in ['top', 'right']]
smt.graphics.plot_pacf(y_m2, lags=nlags, ax=fig.add_subplot(gs[2, 6:10])); [fig.axes[7].spines[i].set_visible(False) for i in ['top', 'right']]
sns.heatmap(pd.DataFrame(np.c_[[np.roll(y_m2, shift=i) for i in range(nlags)]].T).corr(), cmap="RdBu", cbar_kws={"shrink": 0.5}, square=True, center=0, linewidth=.5, ax=fig.add_subplot(gs[1:3, 10:12])).set_title('Autocorrelation')
list(map(lambda lag: pd.plotting.lag_plot(pd.Series(y_m1), lag=lag, ax=fig.add_subplot(gs[3, lag-1])), range(1, 7)))
list(map(lambda lag: pd.plotting.lag_plot(pd.Series(y_m2), lag=lag, ax=fig.add_subplot(gs[3, 6+lag-1])), range(1, 7)))
plt.tight_layout()
y_m1_acf = smt.acf(y_m1, adjusted=False, nlags=nlags, qstat=False, fft=True, alpha=None, bartlett_confint=True, missing='none')
y_m1_pacf = smt.pacf(y_m1, nlags=nlags, method='ywadjusted', alpha=None)
y_m2_acf = smt.acf(y_m2, adjusted=False, nlags=nlags, qstat=False, fft=True, alpha=None, bartlett_confint=True, missing='none')
y_m2_pacf = smt.pacf(y_m2, nlags=nlags, method='ywadjusted', alpha=None)
autocorr = pd.DataFrame(np.c_[y_m1_acf, y_m1_pacf, y_m2_acf, y_m2_pacf])
autocorr.index.name = 'Lags'
autocorr.columns = pd.MultiIndex.from_tuples([('1M', 'ACF'), ('1M', 'PACF'), ('2M', 'ACF'), ('2M', 'PACF')])
display(autocorr.T)
white_noise = np.random.normal(size=1000) # white noise
arma_noise = smt.ArmaProcess(ar=[1, -.3, -.2, .3], ma=[1]).generate_sample(nsample=1000, burnin=50)
arx_garch_noise = ARX(lags=list(range(1, 3)), volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0.5, 0.8] + [0.1, 0.1, 0.1, 0.1], nobs=1000, burn=500)['data'].values
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [0.1, .8, .1], nobs=1000, burn=500)['data'].values
y = cont_garch_noise
residual_analysis(y, nlags=40)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from arch.univariate import ARX, GARCH, Normal
import statsmodels.api as sm
cont_garch_noise = ARX(volatility=GARCH(p=1, o=0, q=1), distribution=Normal(), lags=2).simulate([0] + [.4, .3] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
X1 = np.random.normal(size=1000)
X2 = .5*X1 + np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = X1 + X2 + X3 + cont_garch_noise + 5*epsilon
resid = sm.OLS(y, np.c_[X1, X2, X3]).fit().resid
# Heteroscedasticity
pd.Series(resid**2).plot(figsize=(30,5))
# Autocorrelation[residual(t) <-> residual(t-1): Correlation]
# Multicollinearity[X <-> X': Correlation]
# Endogeneity[X <-> resid: No Correlation & Z <-> resid: Correlation]
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[[np.roll(resid, i) for i in range(1,3)]].T).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # Autocorrealtion
display(pd.DataFrame(np.c_[X1, X2]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # Multicollinearity
display(pd.DataFrame(np.c_[X3, Z, resid]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # Endogeneity
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.api as smt
y = smt.ArmaProcess(ar=[1, -.3, -.2], ma=[1]).generate_sample(nsample=1000, burnin=50).cumsum()
y = pd.Series(y, index=pd.date_range('00:00:00', periods=y.shape[0], freq='D'), name='Regression')
result = sm.OLS(endog=y, exog=np.c_[np.ones_like(y), np.arange(y.shape[0]), y.shift(1).ewm(com=10).var().bfill(), y.shift(1).bfill(), y.shift(2).bfill()]).fit()
y_true = y.copy()
y_pred = pd.Series(result.predict(), index=y.index)
y_resid = result.resid
plt.figure(figsize=(30,5)); [plt.gca().spines[spine].set_visible(False) for spine in ['top', 'right']]
plt.plot(y_resid, c='green', alpha=.4, label='residual')
plt.plot(y_true, c='black', label='data')
plt.plot(y_pred, c='r', label='prediction')
plt.legend(frameon=False)
Estimators when the standard assumptions do not hold
Exogeneity: IV2SLS
import numpy as np
import pandas as pd
import statsmodels.api as sm
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + epsilon # target
data = pd.DataFrame(np.c_[X1, X2, X3, Z, y], columns=['X1', 'X2', 'X3', 'Z', 'y'])
# Endogeneity
resid = sm.OLS(data['y'], exog=data.drop(['y', 'Z'], axis=1)).fit().resid
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[X3, Z, resid]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
#display(pd.DataFrame(np.c_[X3, Z, epsilon]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
import numpy as np
import pandas as pd
import statsmodels.api as sm
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + epsilon # target
data = pd.DataFrame(np.c_[X1, X2, X3, Z, y], columns=['X1', 'X2', 'X3', 'Z', 'y'])
print('[ISSUE:Endogeneity] corr(X3, eps), corr(Z, eps):', np.corrcoef(X3, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
result = sm.OLS(data['y'], exog=data.drop(['y', 'Z'], axis=1)).fit()
display(result.summary().tables[1]) # Check the X3 coefficient; R-squared is not a useful diagnostic here.
first_stage_ols_result = sm.OLS(endog=X3, exog=data.drop(['y', 'X3'], axis=1)).fit()
X_hat = first_stage_ols_result.fittedvalues
last_stage_ols_result = sm.OLS(endog=y, exog=np.c_[data.drop(['y', 'X3', 'Z'], axis=1), X_hat]).fit()
display(last_stage_ols_result.summary().tables[1]) # Check the X3 coefficient; R-squared is not a useful diagnostic here.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.sandbox.regression import gmm
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + epsilon # target
data = pd.DataFrame(np.c_[X1, X2, X3, Z, y], columns=['X1', 'X2', 'X3', 'Z', 'y'])
print('[ISSUE:Endogeneity] corr(X3, eps), corr(Z, eps):', np.corrcoef(X3, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
result = gmm.IV2SLS(y, exog=data.drop(['y', 'Z'], axis=1), instrument=data.drop(['y', 'X3'], axis=1)).fit()
display(result.summary().tables[1]) # Check the X3 coefficient; R-squared is not a useful diagnostic here.
Multicollinearity: PCA
import numpy as np
import pandas as pd
X1 = np.random.normal(size=1000)
X2 = .5*X1 + np.random.normal(size=1000)
X3 = .5*X1 + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + np.random.normal(0, 10, size=1000)
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
# Multicollinearity
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(data.drop('y', axis=1).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
#display(pd.DataFrame(np.c_[X1, X2, X3]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
import numpy as np
import pandas as pd
import statsmodels.api as sm
X1 = np.random.normal(size=1000)
X2 = .5*X1 + np.random.normal(size=1000)
X3 = .5*X1 + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + np.random.normal(0, 10, size=1000)
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
print('[ISSUE:Multicollinearity]', data[['X1', 'X2', 'X3']].corr().iloc[0, :].round(4).tolist())
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[2]) # Check Cond. No.
pca = sm.PCA(data=data.drop('y', axis=1), ncomp=2, demean=True, standardize=True, normalize=False, method='svd', gls=False, weights=None, missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100, svd_full_matrices=False)
result2 = sm.OLS(data['y'], exog=pca.factors).fit()
display(result2.summary().tables[2]) # Check Cond. No.
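The variance inflation factor (VIF) is another common multicollinearity diagnostic; a sketch computed on the data frame above:
from statsmodels.stats.outliers_influence import variance_inflation_factor
exog = sm.add_constant(data.drop('y', axis=1))
vif = pd.Series([variance_inflation_factor(exog.values, i) for i in range(exog.shape[1])], index=exog.columns)
display(vif) # VIF values well above 10 are a common rule-of-thumb signal of multicollinearity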
Autocorrelation: ARIMA
import numpy as np
import pandas as pd
import statsmodels.api as sm
white_noise = np.random.normal(size=1000)
y = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
# Autocorrelation
resid = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit().resid
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[[np.roll(resid, i) for i in range(1,3)]].T).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
import numpy as np
import pandas as pd
import statsmodels.api as sm
white_noise = np.random.normal(size=1000)
y = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Durbin-Watson and R-squared in result1.summary()
print('[ISSUE:Autocorrelation]', pd.DataFrame(np.c_[[np.roll(result1.resid, i) for i in range(1,4)]].T).corr().iloc[0,:].round(4).tolist())
data['shift1_r'] = result1.resid.shift(1).fillna(result1.resid.iloc[:1])
data['shift2_r'] = result1.resid.shift(2).fillna(result1.resid.iloc[:2])
result2 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result2.summary().tables[1]) # coefficient table; check Durbin-Watson and R-squared in result2.summary()
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
white_noise = np.random.normal(size=1000)
y = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Durbin-Watson and R-squared in result1.summary()
print('[ISSUE:Autocorrelation]', pd.DataFrame(np.c_[[np.roll(result1.resid, i) for i in range(1,4)]].T).corr().iloc[0,:].round(4).tolist())
result2 = smt.SARIMAX(data['y'], exog=data.drop(['y'], axis=1), order=(2,0,0)).fit(disp=0)
display(result2.summary().tables[1])
Heteroscedasticity: GLS
import numpy as np
import pandas as pd
import statsmodels.api as sm
from arch.univariate import ConstantMean, GARCH, Normal
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + cont_garch_noise
data = pd.DataFrame(np.c_[np.ones_like(y), X1, X2, X3, y], columns=['Const', 'X1', 'X2', 'X3', 'y'])
# Heteroscedasticity
resid = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit().resid
pd.Series(resid**2).plot(figsize=(30,5))
#pd.Series(cont_garch_noise**2).plot(figsize=(30,5))
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, GARCH, Normal
from scipy.linalg import toeplitz
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + cont_garch_noise
data = pd.DataFrame(np.c_[np.ones_like(y), X1, X2, X3, y], columns=['Const', 'X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Omnibus and Jarque-Bera in result1.summary()
print('[ISSUE:Heteroscedasticity]', round(result1.resid.iloc[:500].var(ddof=1), 4), round(result1.resid.iloc[500:].var(ddof=1), 4))
rho = sm.OLS(np.asarray(result1.resid)[1:], sm.add_constant(np.asarray(result1.resid)[:-1])).fit().params[1]
order = toeplitz(range(len(result1.resid)))
sigma = rho ** order
result2 = sm.GLS(data['y'], data.drop('y', axis=1), sigma=sigma).fit()
display(result2.summary().tables[1]) # for efficient estimation under the estimated error covariance
Autocorrelation & Heteroscedasticity: GLSAR
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, GARCH, Normal
from scipy.linalg import toeplitz
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
y = np.empty_like(cont_garch_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(cont_garch_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
# Heteroscedasticity & Autocorrelation
resid = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit().resid
pd.Series(resid**2).plot(figsize=(30,5)) #pd.Series(cont_garch_noise**2).plot(figsize=(30,5))
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[[np.roll(resid, i) for i in range(1,3)]].T).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, GARCH, Normal
from scipy.linalg import toeplitz
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
y = np.empty_like(cont_garch_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(cont_garch_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Omnibus and Jarque-Bera in result1.summary()
print('[ISSUE:Heteroscedasticity]', round(result1.resid.iloc[:500].var(ddof=1), 4), round(result1.resid.iloc[500:].var(ddof=1), 4))
print('[ISSUE:Autocorrelation]', pd.DataFrame(np.c_[[np.roll(result1.resid, i) for i in range(1,4)]].T).corr().iloc[0,:].round(4).tolist())
result2 = sm.GLSAR(data['y'], exog=data.drop('y', axis=1), rho=2).iterative_fit(1)
display(result2.summary().tables[1]) # for efficient estimation under the AR-adjusted error structure
Covariance Estimation
# https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.get_robustcov_results.html
# https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.cov_params.html
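A minimal sketch of robust (sandwich) covariance estimation applied to an existing fit; result1 below is assumed to be a fitted OLS result such as the ones in the sections above:
robust_hc3 = result1.get_robustcov_results(cov_type='HC3') # heteroscedasticity-consistent standard errors
robust_hac = result1.get_robustcov_results(cov_type='HAC', maxlags=2) # heteroscedasticity- and autocorrelation-consistent standard errors
display(robust_hc3.summary().tables[1])
display(robust_hac.summary().tables[1])
result1.cov_params() # parameter covariance under the default (non-robust) estimator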
Confidence intervals
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
X = np.random.normal(size=100)
y = 3*X + 5 + np.random.normal(size=100)
result = sm.OLS(endog=y, exog=np.c_[np.ones_like(y), X]).fit()
fig, axes = plt.subplots(2, 1, figsize=(30,7)); [axes[i].spines[spine].set_visible(False) for spine in ['top', 'right'] for i in range(0,2)]
y_true = result.model.endog
y_pred = result.fittedvalues
y_resid = result.resid
axes[0].plot(y_true, c='black', marker='o', label='data')
axes[0].plot(y_resid, c='green', alpha=.4, label='residual')
axes[0].plot(y_pred, c='r', label='prediction')
axes[0].legend(frameon=False)
sf = result.get_prediction().summary_frame(alpha=.05)
sf.insert(0, 'obs_X', X)
sf.insert(1, 'obs_y', result.model.endog)
sf = sf.sort_values(by=['obs_X'], ascending=True)
axes[1].plot(sf['obs_X'], sf['obs_y'], c='black', lw=0, marker='o', label='data')
axes[1].plot(sf['obs_X'], sf['mean'], c='red', label='prediction')
axes[1].errorbar(sf['obs_X'], sf['mean'], yerr=sf['mean_se'], alpha=.4, color='blue')
axes[1].fill_between(sf['obs_X'], y1=sf['mean_ci_lower'], y2=sf['mean_ci_upper'], alpha=.5, color='green', label='prediction ci')
axes[1].fill_between(sf['obs_X'], y1=sf['obs_ci_lower'], y2=sf['obs_ci_upper'], alpha=.1, color='green', label='observation ci')
axes[1].legend(frameon=False)
Decision boundary
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, StandardScaler, Normalizer, RobustScaler, MinMaxScaler, MaxAbsScaler
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X4 = np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + np.random.normal(0, 1, size=1000)
numerical_columns = ['X1', 'X2', 'X3', 'X4']
data = pd.DataFrame(np.c_[X1, X2, X3, X4, y], columns=numerical_columns + ['y'])
data['boundary'] = data['y'].apply(lambda y: 'positive' if y - data['y'].mean() > 0 else 'negative')
numerical_columns += ['y']
# decision boundary
#sns.pairplot(pd.DataFrame(np.c_[StandardScaler().fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
#sns.pairplot(pd.DataFrame(np.c_[MaxAbsScaler().fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
#sns.pairplot(pd.DataFrame(np.c_[PowerTransformer(method='yeo-johnson').fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
#sns.pairplot(pd.DataFrame(np.c_[QuantileTransformer(output_distribution='normal').fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(data[numerical_columns].corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
with warnings.catch_warnings():
warnings.simplefilter(action='ignore', category=FutureWarning)
sns.pairplot(data[numerical_columns+['boundary']], hue='boundary')
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, StandardScaler, Normalizer, RobustScaler, MinMaxScaler, MaxAbsScaler
white_noise = np.random.normal(size=1000)
y1 = np.empty_like(white_noise)
y2 = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X4 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y1[t] = X[t] + .5*(y1[t-1] - X[t-1]) + .2*(y1[t-2]-X[t-2]) + noise
y2[t] = .5*(y2[t-1] - X[t-1]) + .2*(y2[t-2]-X[t-2]) + noise
y = y2
numerical_columns = ['X1', 'X2', 'X3', 'X4']
data = pd.DataFrame(np.c_[X1, X2, X3, X4, y], columns=numerical_columns + ['y'])
data['shift1_X1'] = data['X1'].shift(1).fillna(data['X1'].iloc[:1])
data['shift1_X2'] = data['X2'].shift(1).fillna(data['X2'].iloc[:1])
data['shift1_X3'] = data['X3'].shift(1).fillna(data['X3'].iloc[:1])
data['shift1_X4'] = data['X4'].shift(1).fillna(data['X4'].iloc[:1])
data['residual'] = sm.OLS(y, exog=np.c_[X1, X2, X3, X4]).fit().resid
data['shift1_r'] = data['residual'].shift(1).fillna(data['residual'].iloc[:1])
data['shift1_y'] = data['y'].shift(1).fillna(data['y'].iloc[:1])
data['boundary'] = data['y'].apply(lambda y: 'positive' if y - data['y'].mean() > 0 else 'negative')
numerical_columns += ['shift1_X1', 'shift1_X2', 'shift1_X3', 'shift1_X4', 'shift1_y', 'shift1_r', 'residual', 'y']
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(data[numerical_columns].corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
with warnings.catch_warnings():
warnings.simplefilter(action='ignore', category=FutureWarning)
sns.pairplot(data[numerical_columns+['boundary']], hue='boundary')
Classification: MLE
$${\displaystyle \begin{aligned} \text{Sigmoid Function:}\quad \sigma (x) &={\frac {1}{1+e^{-x}}}={\frac {e^{x}}{1+e^{x}}}=1-\sigma (-x) \\ \text{Softmax Function:}\quad\sigma (\mathbf {z} )_{i}&={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}} \\ \; & \quad {\text{ for }}i=1,\dotsc ,K{\text{ and }}\mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K} \\ \end{aligned} }$$
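A tiny numerical check of the two definitions (a sketch; the sample vector z is arbitrary):
import numpy as np
z = np.array([-2.0, 0.0, 3.0]) # arbitrary illustration values
sigmoid = 1/(1 + np.exp(-z))
softmax = np.exp(z)/np.exp(z).sum()
print(np.allclose(sigmoid, 1 - 1/(1 + np.exp(z)))) # sigma(x) = 1 - sigma(-x)
print(softmax.sum()) # softmax components sum to 1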
Logistic Regression
$${\displaystyle \begin{aligned} f_{p}(x)&={\frac {1}{1+e^{-(x-\mu )/s}}}, \quad \beta_{0}=-\frac{\mu}{s}, \; \beta_{1}= \frac{1}{s}\\ \; &={\frac {1}{1+e^{-(\beta_{0} + \beta_{1}x)}}}, \quad \mu=-\frac{\beta_{0}}{\beta_{1}}, \; s=\frac{1}{\beta_{1}} \\ \end{aligned} }$$
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
def logistic(X, w0=3, w1=6):
Z = w0 + w1*X
y = 1/(1+np.exp(-Z))
return y.round()
X = np.random.normal(2, 1, size=3000)
X = X[X.argsort()].copy()
y = logistic(X)
model = sm.Logit(y, np.c_[np.ones_like(X), X])
model = sm.MNLogit(y, np.c_[np.ones_like(X), X])
result = model.fit(start_params=None, method=['newton', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg', 'basinhopping', 'minimize'][7], maxiter=1000, full_output=1, disp=0, callback=None)
result = model.fit_regularized(start_params=None, method=['l1', 'l1_cvxopt_cp'][0], maxiter='defined_by_method', full_output=1, disp=0, callback=None, alpha=.3, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03, qc_verbose=False)
result.summary()
w0, w1 = result.params
plt.plot(X, y, lw=0, marker='o', c='black')
plt.plot(X, logistic(X, w0, w1), marker='x') #plt.plot(x, np.exp(model.fittedvalues)/(1+np.exp(model.fittedvalues)))
result.summary()
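The fitted coefficients can be mapped back to the location/scale form above; a short sketch using w0 and w1 from the fit:
mu, s = -w0/w1, 1/w1 # mu = -beta0/beta1, s = 1/beta1
print(mu, s)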
Analysis of a classification task
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
data = pd.DataFrame(np.c_[
stats.binom.rvs(n=2, p=2/3, size=1000), # X0: categorical explanatory variable1
stats.binom.rvs(n=2, p=2/3, size=1000), # X1: categorical explanatory variable2
stats.norm.rvs(1,1, size=1000), # X2: numerical explanatory variable1
stats.norm.rvs(1,1, size=1000), # X3: numerical explanatory variable2
stats.bernoulli.rvs(p=2/3, size=1000), # Y: response variable
], columns=['X0', 'X1', 'X2', 'X3', 'y'])
categorical_columns = ['X0', 'X1']
numerical_columns = ['X2', 'X3']
centering_data = data.copy()
for column in numerical_columns:
centering_data[column] = centering_data[column] - centering_data[column].mean()
standardizing_data = data.copy()
for column in numerical_columns:
standardizing_data[column] = (standardizing_data[column] - standardizing_data[column].mean()) / standardizing_data[column].std(ddof=1)
# No Interaction
patsy_formula = "y ~ " + ' + '.join(numerical_columns) + ' + ' + ' + '.join(list(map(lambda column: 'C('+column+')', categorical_columns))) + " - 1"
centering_model = sm.Logit.from_formula(patsy_formula, data=centering_data).fit()
centering_summary = pd.DataFrame(centering_model.summary().tables[1].data)
centering_summary.columns = ['feature'] + centering_summary.loc[0, 1:].tolist()
centering_summary = centering_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(centering_model, fig=plt.figure(figsize=(30,10))) # partial regression
standardizing_model = sm.Logit.from_formula(patsy_formula, data=standardizing_data).fit()
standardizing_summary = pd.DataFrame(standardizing_model.summary().tables[1].data)
standardizing_summary.columns = ['feature'] + standardizing_summary.loc[0, 1:].tolist()
standardizing_summary = standardizing_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(standardizing_model, fig=plt.figure(figsize=(30,10))) # partial regression
summary = pd.concat([centering_summary, standardizing_summary], axis=1)
summary.columns = pd.MultiIndex.from_tuples(map(lambda column: ('centering', column[1]) if column[0] < 7 else ('standardizing', column[1]), enumerate(summary.columns)))
summary
Multinomial Logistic Regression
#
Conditional Logistic Regression
#https://rinterested.github.io/statistics/logistic_regression.html
#https://rinterested.github.io/statistics/conditional_logistic_regression.html
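statsmodels provides ConditionalLogit in statsmodels.discrete.conditional_models; a minimal sketch with simulated strata (all coefficients and group sizes below are illustration values):
import numpy as np
from statsmodels.discrete.conditional_models import ConditionalLogit
groups = np.repeat(np.arange(100), 10) # 100 strata with 10 observations each (illustration values)
x = np.random.normal(size=groups.shape[0])
group_effect = np.random.normal(size=100)[groups] # stratum-specific intercepts, conditioned away by the model
p = 1/(1 + np.exp(-(group_effect + 1.5*x)))
y = np.random.binomial(1, p)
result = ConditionalLogit(y, x[:, None], groups=groups).fit()
result.summary()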
Probit Regression
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
def logistic(X, w0=3, w1=6):
Z = w0 + w1*X
y = 1/(1+np.exp(-Z))
return y.round()
X = np.random.normal(2, 1, size=3000)
X = X[X.argsort()].copy()
y = logistic(X)
model = sm.Probit(y, np.c_[np.ones_like(X), X])
result = model.fit(start_params=None, method=['newton', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg', 'basinhopping', 'minimize'][7], maxiter=1000, full_output=1, disp=0, callback=None)
result = model.fit_regularized(start_params=None, method=['l1', 'l1_cvxopt_cp'][0], maxiter='defined_by_method', full_output=1, disp=0, callback=None, alpha=.3, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03, qc_verbose=False)
result.summary()
w0, w1 = result.params
plt.plot(X, y, lw=0, marker='o', c='black')
plt.plot(X, logistic(X, w0, w1), marker='x') #plt.plot(x, np.exp(model.fittedvalues)/(1+np.exp(model.fittedvalues)))
result.summary()
Tobit Regression
#
Robust Linear Regression
- https://www.statsmodels.org/dev/rlm.html
- https://www.statsmodels.org/dev/examples/notebooks/generated/robust_models_0.html
- https://www.statsmodels.org/dev/examples/notebooks/generated/robust_models_1.html
import statsmodels.api as sm
sm.RLM
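A minimal sketch of sm.RLM on simulated data with a few injected outliers (HuberT is the default M-estimator norm; all values below are illustration values):
import numpy as np
import statsmodels.api as sm
X = np.random.normal(size=300)
y = 3*X + 5 + np.random.normal(size=300)
y[:10] += 30 # inject outliers
exog = sm.add_constant(X)
ols_result = sm.OLS(y, exog).fit()
rlm_result = sm.RLM(y, exog, M=sm.robust.norms.HuberT()).fit()
print(ols_result.params, rlm_result.params) # the robust fit is less affected by the outliers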
TABLE OF CONTENTS
Linear Mixed Effects Models
Varying-Coefficient Models
Variance Components Model
Linear Mixed Effect Model Structure
Multilevel Regression: Hierarchical and Nested Group Analysis
Generalized Linear Regression
Generalized Additive Models (GAM)
Generalized Linear Mixed Effects (GLIMMIX)
Generalized Method of Moments (GMM)
Generalized Estimating Equation (GEE)
General Linear Regression: OLS
Seemingly Unrelated Regression (SUR/SURE)
Linear Mixed Effects Models
Varying-Coefficient Models
- https://www.statsmodels.org/dev/mixed_linear.html
- https://www.statsmodels.org/dev/examples/notebooks/generated/mixed_lm_example.html
REML: Mixed Effects Model
# https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.regression.mixed_linear_model import MixedLMParams, VCSpec, Penalty
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts and slopes
result = sm.MixedLM(endog=data['y'], exog=data[['const', 'X']], exog_re=data[['const', 'X']], exog_vc=None, groups=data['group']).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = sm.MixedLM.from_formula(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
# MixedLM regularization
result = sm.MixedLM(endog=data['y'], exog=data[['const', 'X']], exog_re=data[['const', 'X']], exog_vc=None, groups=data['group']).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = sm.MixedLM.from_formula(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result.fe_params, result.random_effects
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(random_effect.values, index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
REML: Instrumental Random Effects Model
# https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.regression.mixed_linear_model import MixedLMParams, VCSpec, Penalty
np.random.seed(0)
N1 = np.random.normal(size=1000)
N2 = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N1, N2, C]
y0 = 2 * X[np.where(C==0)[0], 0] + 1 + 2.5 * X[np.where(C==0)[0], 1] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 * X[np.where(C==1)[0], 0] + 2 + 3.5 * X[np.where(C==1)[0], 1] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 2 * X[np.where(C==2)[0], 0] + 3 + 4.5 * X[np.where(C==2)[0], 1] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'Z', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts and slopes
result = sm.MixedLM(endog=data['y'], exog=data[['X']], exog_re=data[['const', 'Z']], exog_vc=None, groups=data['group']).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = sm.MixedLM.from_formula(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = smf.mixedlm(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
# MixedLM regularization
result = sm.MixedLM(endog=data['y'], exog=data[['X']], exog_re=data[['const', 'Z']], exog_vc=None, groups=data['group']).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = sm.MixedLM.from_formula(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = smf.mixedlm(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result.fe_params, result.random_effects
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
fixed_effect = pd.Series(np.r_[np.array([0]), result.fe_params.values], index=pd.Index(random_effect.index.to_series().replace('Z', 'X')))
fixed_params[group] = fixed_effect
random_params[group] = random_effect
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([fixed_params, random_params], axis=1)
params
Variance Components Model
1-Level Variance Decomposition
$${\displaystyle \text{Variance Components Models} }$$ $${\displaystyle \text{Components of variance:} \quad \operatorname {Var} [\mathbf{y}] =\operatorname {E} \left[\operatorname {Var} [\mathbf{y}\mid \mathbf{X}] \right]+\operatorname {Var} \left[ \operatorname {E} [\mathbf{y}\mid \mathbf{X}] \right]}$$ $${\displaystyle \begin{aligned} \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \underbrace{\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma_{\varepsilon}^{2}\mathbf{I}_{n})}_{\text{Uncertainty}} & \;& \;\\ &\downarrow & \; & \; \\ \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \overbrace{\mathbf{v}}^{\text{Variance Source Term}}, &\quad \mathbf{v}=f_{v}(\mathbf{u}) \sim \mathcal{N}( \overbrace{\boldsymbol{\alpha}^{*}}^{\text{Population-Randomness}} , \sigma^{2}_{v} \mathbf{I}), \quad &\operatorname{Cov}[\mathbf{X}, \mathbf{v}] = \mathbf{0} \\ \; &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\alpha}^{*} + \mathbf{u}, &\quad \boldsymbol{\alpha}^{*} \sim \mathcal{N}( \boldsymbol{\mu} , \sigma^{2}_{\alpha} \mathbf{I}), \quad \mathbf{u} \sim \mathcal{N}( \mathbf{0} , \sigma^{2}_{u} \mathbf{I}), \quad &\operatorname{Cov}[\boldsymbol{\alpha}^{*}, \mathbf{u}] = \mathbf{0} \\ \; &= \mathbf{X}\boldsymbol{\beta} + \underbrace{\boldsymbol{\mu} + \boldsymbol{\alpha}}_{=\boldsymbol{\mu}_{\alpha}} + \mathbf{u}, &\quad \boldsymbol{\alpha} \sim \mathcal{N}( \mathbf{0} , \sigma^{2}_{\alpha}\mathbf{I}), \quad \mathbf{u} \sim \mathcal{N}( \mathbf{0} , \sigma^{2}_{u} \mathbf{I}), \quad &\operatorname{Cov}[\boldsymbol{\alpha}, \mathbf{u}] = \mathbf{0} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= \boldsymbol{\mu} \\ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \boldsymbol{\mu} + \boldsymbol{\alpha} \\ \operatorname{Cov} \left[ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{\alpha} \mathbf{I} \\ \\ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= ( \sigma^{2}_{\alpha} + \sigma^{2}_{u}) \mathbf{I} \\ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \sigma^{2}_{u}\mathbf{I} \\ \operatorname{E} \left[ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{u}\mathbf{I} \\ \end{aligned} }$$
Scoring Group $${\displaystyle \begin{aligned} M(G) & : \text{a score of the group}, \; G \in \{1, 2, \dots, N_{G} \} \\ N=\sum_{G=1}^{N_{G}} n(G) & : \text{total number of instances} \\ N_{G} & : \text{number of groups} \\ n(G) & : \text{number of instances in the group} \\ \overline{M}_{\text{micro}} &= \frac{1}{1} \sum_{G=1}^{N_{G}=1} M(G) = M(1) \\ \overline{M}_{\text{macro}} &= \frac{1}{N_{G}} \sum_{G=1}^{N_{G}} M(G) \\ \overline{M}_{\text{weighted}} &= \sum_{G=1}^{N_{G}} \frac{n(G)}{N} M(G) \\ \end{aligned} }$$
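A small sketch of the macro vs. weighted group averages defined above (group sizes and scores are made-up illustration values):
import numpy as np
import pandas as pd
scores = pd.DataFrame({'G': np.repeat([1, 2, 3], [100, 50, 10]), 'M': np.repeat([0.9, 0.5, 0.1], [100, 50, 10])})
macro = scores.groupby('G')['M'].mean().mean() # every group weighted equally
weighted = scores.groupby('G')['M'].mean().mul(scores['G'].value_counts(normalize=True)).sum() # weighted by n(G)/N
print(macro, weighted, scores['M'].mean()) # the weighted average equals the pooled mean here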
import numpy as np
import pandas as pd
# Components of variance
data = pd.DataFrame(np.c_[np.kron(np.arange(5), np.ones(200)), np.random.normal(0, .1, size=1000)], columns=['X', 'Y']).set_index('X').sort_index()
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(y.index[0], 1, size=y.shape[0])) # group mean effect
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(0, y.index[0], size=y.shape[0])) # group variance effect
data.groupby('X')['Y'].mean().mean() # expected value: population mean >>>>> data['M']
data.groupby('X')['Y'].mean().var(ddof=0) # expected value: population variance / number of sampling by group >>>>> data['V'] / 200
data.groupby('X')['Y'].var(ddof=1).mean() # expected value: population variance >>>>> data['V']
data['M'] = data['Y'].mean()
data['m'] = data.groupby('X')['Y'].mean()
data['V'] = data['Y'].var(ddof=0)
data['v'] = data.groupby('X')['Y'].var(ddof=1)
data['treatment effect'] = data['m'] - data['M']
data['variance effect'] = data['V'] - data['v']
aggregation = data[['M', 'm', 'treatment effect', 'V', 'v', 'variance effect']].drop_duplicates()
total_variance = aggregation['m'].var(ddof=0) + aggregation['v'].mean()
total_variance, data['Y'].var(ddof=0)
import numpy as np
import pandas as pd
# Components of variance
data = pd.DataFrame(np.c_[np.random.randint(0, 5, size=1000), np.random.normal(0, .1, size=1000)], columns=['X', 'Y']).set_index('X').sort_index()
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(y.index[0], 1, size=y.shape[0])) # group mean effect
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(0, y.index[0], size=y.shape[0])) # group variance effect
data['M'] = data['Y'].mean()
data['V'] = data['Y'].var(ddof=0)
data['m'] = data.groupby('X')['Y'].mean()
data['v'] = data.groupby('X')['Y'].var(ddof=1)
data['treatment effect'] = data['m'] - data['M']
data['variance effect'] = data['V'] - data['v']
aggregation = data[['M', 'm', 'treatment effect', 'V', 'v', 'variance effect']].drop_duplicates()
total_variance = aggregation['m'].var(ddof=0) + aggregation['v'].mean()
total_variance, data['Y'].var(ddof=0)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
# Components of variance
data = pd.DataFrame(np.c_[np.random.randint(0, 5, size=1000), np.random.normal(0, .1, size=1000)], columns=['X', 'Y']).set_index('X').sort_index()
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(y.index[0], 1, size=y.shape[0])) # group mean effect
data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(0, y.index[0], size=y.shape[0])) # group variance effect
result = smf.ols(formula='Y ~ 0 + C(X)', data=data.reset_index()).fit()
display(result.summary())
display(sms.anova_lm(result, typ=2))
mc = sms.multicomp.MultiComparison(data['Y'], data.index)
posthoc_results = mc.tukeyhsd() # TUKEY HONESTLY SIGNIFICANT DIFFERENCE (HSD)
display(posthoc_results.summary())
REML: 1-Level Group Effect
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
n_group1 = 20
data = pd.DataFrame(data=0, columns=['e'], index=pd.Index(np.arange(n_group1), name='G1'))
G1_effect = data.groupby(['G1'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 2)).rename(columns={'e':'G1_Effect'})
data = pd.concat([data, G1_effect], axis=1)
data = pd.concat([data]*10, axis=0)
data['y'] = data['G1_Effect'] + np.random.normal(0, 1, size=data.shape[0])
result = sm.MixedLM(endog=data['y'], exog=np.ones_like(data['y']), groups=data.index).fit()
result = smf.mixedlm(
"y ~ 1",
re_formula="1",
vc_formula=None,
groups="G1",
data=data.reset_index(),
).fit()
print(data['y'].var(ddof=1))
print(data.groupby(['G1'])['y'].mean().var(ddof=1))
print(data.groupby(['G1'])['y'].var(ddof=1).mean())
result.summary()
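The fitted variance decomposition can be summarized as an intraclass correlation; a sketch using result.cov_re (group-level variance) and result.scale (residual variance) from the fit above:
group_var = np.asarray(result.cov_re)[0, 0] # variance of the random intercept
resid_var = result.scale # residual variance
print(group_var, resid_var, group_var/(group_var + resid_var)) # share of total variance due to the group effect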
LS/MLE: One-way ANCOVA
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# https://patsy.readthedocs.io/en/latest/formulas.html
result = sm.OLS.from_formula(formula='y ~ 0 + group + group:X', data=data, missing='none', hasconst=None).fit()
result = sm.WLS.from_formula(formula='y ~ 0 + group + group:X', data=data, weights=1.0, missing='none', hasconst=None).fit()
result = sm.GLS.from_formula(formula='y ~ 0 + group + group:X', data=data, sigma=1.0, missing='none', hasconst=None).fit()
result = sm.GLM.from_formula(formula='y ~ 0 + group + group:X', data=data, family=sm.families.Gaussian(sm.families.links.Identity())).fit(); result.ssr = result.resid_response.apply(lambda residual: residual**2).sum()
result = smf.ols(formula='y ~ 0 + group + group:X', data=data, missing='none', hasconst=None).fit()
result = smf.wls(formula='y ~ 0 + group + group:X', data=data, weights=1.0, missing='none', hasconst=None).fit()
result = smf.gls(formula='y ~ 0 + group + group:X', data=data, sigma=1.0, missing='none', hasconst=None).fit()
result = smf.glm(formula='y ~ 0 + group + group:X', data=data, family=sm.families.Gaussian(sm.families.links.Identity())).fit(); result.ssr = result.resid_response.apply(lambda residual: residual**2).sum()
display(result.summary())
display(sms.anova_lm(result, typ=2))
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
def individual_regression(df):
result = sm.OLS(endog=df['y'], exog=df[['const', 'X']], missing='none', hasconst=None).fit()
result = sm.WLS(endog=df['y'], exog=df[['const', 'X']], weights=1.0, missing='none', hasconst=None).fit()
result = sm.GLS(endog=df['y'], exog=df[['const', 'X']], sigma=1.0, missing='none', hasconst=None).fit()
result = sm.GLM(endog=df['y'], exog=df[['const', 'X']], family=sm.families.Gaussian(sm.families.links.Identity())).fit()
return result.params
data.groupby(['group']).apply(individual_regression)
2-Level Variance Decomposition
$${\displaystyle \begin{cases} \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= \boldsymbol{\mu} \\ \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \boldsymbol{\mu} + \boldsymbol{\alpha}_{1} + \boldsymbol{\alpha}_{2} \\ \; \operatorname{Cov} \left[ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= (\sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}} + \sigma^{2}_{\alpha_{1} \times \alpha_{2}}) \mathbf{I} \\ \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= ( \sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}} + \sigma^{2}_{\alpha_{1} \times \alpha_{2}} + \sigma^{2}_{u}) \mathbf{I} \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \sigma^{2}_{u}\mathbf{I} \\ \; \operatorname{E} \left[ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{u}\mathbf{I} \\ \end{cases} }$$ $${\displaystyle \operatorname{Cov}[\boldsymbol{\alpha}_{1}, \boldsymbol{\alpha}_{2}] = \mathbf{0} \; \begin{cases} \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= \boldsymbol{\mu} \\ \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \boldsymbol{\mu} + \boldsymbol{\alpha}_{1} + \boldsymbol{\alpha}_{2} \\ \; \operatorname{Cov} \left[ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= (\sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}}) \mathbf{I} \\ \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= ( \sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}} + \sigma^{2}_{u}) \mathbf{I} \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \sigma^{2}_{u}\mathbf{I} \\ \; \operatorname{E} \left[ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{u}\mathbf{I} \\ \end{cases} }$$
REML: 2-Level Nested Group Effect
: a hierarchical or nested structure where individuals are grouped within larger categories
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
n_group1 = 20; n_group2 = 100
data = pd.DataFrame(data=0, columns=['e'], index=pd.MultiIndex.from_product([np.arange(n_group1), np.arange(n_group2)], names=['G1', 'G2']))
G1_effect = data.groupby(['G1'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 2)).rename(columns={'e':'G1_Effect'})
G1G2_effect = data.groupby(['G1', 'G2'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 3)).rename(columns={'e':'G1G2_Effect'})
data = pd.concat([data, G1_effect, G1G2_effect], axis=1)
data = pd.concat([data]*10, axis=0)
data['y'] = data['G1_Effect'] + data['G1G2_Effect'] + np.random.normal(0, 1, size=data.shape[0])
result = smf.mixedlm(
"y ~ 1",
re_formula="1",
vc_formula={"G2": "0 + C(G2)"},
groups="G1",
data=data.reset_index(),
).fit() # nested effect
print(data['y'].var())
print(data.groupby('G1')['y'].mean().var())
print(data.groupby('G2')['y'].mean().var()) # ~ 0
print(data.groupby(['G1', 'G2'])['y'].mean().var() - data.groupby('G1')['y'].mean().var())
result.summary()
REML: 2-Level Crossed Group Effect
: individuals belonging to multiple independent groups simultaneously
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
n_group1 = 20; n_group2 = 100
group = pd.DataFrame(data=0, columns=['e'], index=pd.MultiIndex.from_product([np.arange(n_group1), np.arange(n_group2)], names=['G1', 'G2']))
G1_effect = group.groupby(['G1'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 2)).rename(columns={'e':'G1_Effect'})
G2_effect = group.groupby(['G2'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 3)).rename(columns={'e':'G2_Effect'})
group = pd.concat([group, G1_effect, G2_effect], axis=1)
group = pd.concat([group]*10, axis=0)
group['y'] = group['G1_Effect'] + group['G2_Effect'] + np.random.normal(0, 1, size=group.shape[0])
result = smf.mixedlm(
"y ~ 1",
re_formula=None,
vc_formula={"G1": "0 + C(G1)", "G2": "0 + C(G2)"},
groups=np.ones_like(group['y']),
data=group.reset_index(),
).fit() # crossed effect
print(group['y'].var(ddof=1))
print(group.groupby(['G1'])['y'].mean().var(ddof=1), group.groupby(['G1'])['y'].var(ddof=1).mean())
print(group.groupby(['G2'])['y'].mean().var(ddof=1), group.groupby(['G2'])['y'].var(ddof=1).mean())
print(group.groupby(['G1', 'G2'])['y'].mean().var(ddof=1), group.groupby(['G1', 'G2'])['y'].var(ddof=1).mean())
result.summary()
LS/MLE: Two-way ANOVA
#
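The placeholder above can be sketched in the same style as the other sections: a minimal two-way ANOVA on assumed synthetic data with two categorical factors A and B (names and effect sizes are illustrative), using smf.ols and sms.anova_lm.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
A = np.random.randint(2, size=1000) # factor A: 2 levels
B = np.random.randint(3, size=1000) # factor B: 3 levels
y = 1 + 2*A + 3*B + 1.5*A*B + np.random.normal(0, 1, size=1000)
data = pd.DataFrame({'y': y, 'A': A, 'B': B})
result = smf.ols(formula='y ~ C(A) + C(B) + C(A):C(B)', data=data).fit()
sms.anova_lm(result, typ=2)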
LS/MLE: Two-way ANCOVA
#
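Likewise, a minimal two-way ANCOVA sketch on assumed synthetic data: the same two factors A and B plus a numerical covariate X entering the model linearly.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
A = np.random.randint(2, size=1000) # factor A: 2 levels
B = np.random.randint(3, size=1000) # factor B: 3 levels
X = np.random.normal(size=1000) # numerical covariate
y = 1 + 2*A + 3*B + 2*X + np.random.normal(0, 1, size=1000)
data = pd.DataFrame({'y': y, 'A': A, 'B': B, 'X': X})
result = smf.ols(formula='y ~ C(A) + C(B) + C(A):C(B) + X', data=data).fit()
sms.anova_lm(result, typ=2)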
Linear Mixed Effect Model Structure
REML: Varying-coefficient model for intercepts
- Intercept: Mixed Effect (Fixed Effect + Random Effect)
- Slope: Fixed Effect
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 3 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 3 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts
result = smf.mixedlm(formula='y ~ 1 + X', re_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1', groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(np.r_[random_effect.values, np.array([0])], index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
REML: Varying-coefficient model for slopes
- Intercept: Fixed Effect
- Slope: Mixed Effect (Fixed Effect + Random Effect)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 3 + 1 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 3 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for slopes
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ - 1 + X', groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(np.r_[np.array([0]), random_effect.values], index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
REML: Varying-coefficient model for intercepts and slopes
- Intercept: Mixed Effect (Fixed Effect + Random Effect)
- Slope: Mixed Effect (Fixed Effect + Random Effect)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts and slopes
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1 + X', groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(random_effect.values, index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
LS/MLE: Variance components model with covariate factors
- Intercept: Fixed Effect
- Slope: Fixed Effect
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 3 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 3 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
result = smf.ols(formula='y ~ 0 + group + X', data=data, missing='none', hasconst=None).fit()
result = smf.wls(formula='y ~ 0 + group + X', data=data, weights=1.0, missing='none', hasconst=None).fit()
result = smf.gls(formula='y ~ 0 + group + X', data=data, sigma=1.0, missing='none', hasconst=None).fit()
result = smf.glm(formula='y ~ 0 + group + X', data=data, family=sm.families.Gaussian(sm.families.links.Identity())).fit(); result.ssr = result.resid_response.apply(lambda residual: residual**2).sum()
sms.anova_lm(result, typ=2)
Multilevel Regression: Hierarchical and Nested Group Analysis
Generalized Linear Regression
Analysis on a regression task
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
data = pd.DataFrame(np.c_[
stats.binom.rvs(n=2, p=2/3, size=1000), # X0: categorical explanatory variable1
stats.binom.rvs(n=2, p=2/3, size=1000), # X1: categorical explanatory variable2
stats.norm.rvs(1,1, size=1000), # X2: numerical explanatory variable1
stats.norm.rvs(1,1, size=1000), # X3: numerical explanatory variable2
stats.norm.rvs(1,1, size=1000), # Y: response variable
], columns=['X0', 'X1', 'X2', 'X3', 'y'])
categorical_columns = ['X0', 'X1']
numerical_columns = ['X2', 'X3']
centering_data = data.copy()
for column in numerical_columns:
centering_data[column] = centering_data[column] - centering_data[column].mean()
standardizing_data = data.copy()
for column in numerical_columns:
standardizing_data[column] = (standardizing_data[column] - standardizing_data[column].mean()) / standardizing_data[column].std(ddof=1)
# No Interaction
patsy_formula = "y ~ " + ' + '.join(numerical_columns) + ' + ' + ' + '.join(list(map(lambda column: 'C('+column+')', categorical_columns))) + " - 1"
centering_model = sm.GLM.from_formula(patsy_formula, data=centering_data, family=sm.families.Gaussian(sm.families.links.Log())).fit()
centering_summary = pd.DataFrame(centering_model.summary().tables[1].data)
centering_summary.columns = ['feature'] + centering_summary.loc[0, 1:].tolist()
centering_summary = centering_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(centering_model, fig=plt.figure(figsize=(30,10))) # partial regression
standardizing_model = sm.GLM.from_formula(patsy_formula, data=standardizing_data, family=sm.families.Gaussian(sm.families.links.Log())).fit()
standardizing_summary = pd.DataFrame(standardizing_model.summary().tables[1].data)
standardizing_summary.columns = ['feature'] + standardizing_summary.loc[0, 1:].tolist()
standardizing_summary = standardizing_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(standardizing_model, fig=plt.figure(figsize=(30,10))) # partial regression
summary = pd.concat([centering_summary, standardizing_summary], axis=1)
summary.columns = pd.MultiIndex.from_tuples(map(lambda column: ('centering', column[1]) if column[0] < 7 else ('standardizing', column[1]), enumerate(summary.columns)))
summary
Generalized Additive Models (GAM)
import statsmodels.api as sm
sm.GLMGam
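A minimal GLMGam sketch on assumed synthetic data: the B-spline basis (BSplines from statsmodels.gam.api) carries the smooth term in x, while z enters as a linear (parametric) term; variable names and the penalty weight alpha are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines
np.random.seed(0)
x = np.random.uniform(-3, 3, size=500)
z = np.random.normal(size=500)
y = np.sin(x) + 0.5 * z + np.random.normal(0, 0.3, size=500) # nonlinear in x, linear in z
data = pd.DataFrame({'y': y, 'x': x, 'z': z})
bs = BSplines(data[['x']], df=[8], degree=[3]) # B-spline basis for the smooth term in x
result = GLMGam.from_formula('y ~ z', data=data, smoother=bs, alpha=1.0, family=sm.families.Gaussian()).fit()
result.summary()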
Generalized Linear Mixed Effects (GLIMMIX)
https://www.statsmodels.org/dev/mixed_glm.html
import statsmodels.api as sm
sm.BinomialBayesMixedGLM
sm.PoissonBayesMixedGLM
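A minimal Bayesian mixed GLM sketch following the linked statsmodels page; the synthetic data, column names, and the variance-component formula below are assumptions for illustration, and fit_vb performs the variational Bayes fit.
import numpy as np
import pandas as pd
import statsmodels.api as sm
np.random.seed(0)
n_groups, n_per = 30, 40
group = np.repeat(np.arange(n_groups), n_per)
u = np.random.normal(0, 0.5, size=n_groups) # random intercept per group
x = np.random.normal(size=n_groups * n_per)
data = pd.DataFrame({'y': np.random.poisson(np.exp(0.2 + 0.3 * x + u[group])), 'x': x, 'group': group})
vc_formulas = {'group': '0 + C(group)'} # variance component: random intercept by group
result = sm.PoissonBayesMixedGLM.from_formula('y ~ x', vc_formulas, data).fit_vb() # variational Bayes fit
result.summary()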
Generalized Method of Moments (GMM)
from statsmodels.sandbox.regression import gmm
gmm.GMM#(endog, exog, instrument, k_moms=None, k_params=None, missing='none')
gmm.IVGMM#(endog, exog, instrument, k_moms=None, k_params=None, missing='none')
gmm.IV2SLS#(endog, exog, instrument=None)
gmm.DistQuantilesGMM#(endog, exog, instrument)
gmm.LikelihoodModel#(endog, exog=None)
gmm.LinearIVGMM#(endog, exog, instrument, k_moms=None, k_params=None, missing='none')
gmm.NonlinearIVGMM#(endog, exog, instrument, func)
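As a minimal sketch, gmm.LinearIVGMM can be fit on the same kind of endogenous-regressor data used in the 2SLS example below; the data-generating step and default fit settings here are assumptions for illustration.
import numpy as np
from statsmodels.sandbox.regression import gmm
np.random.seed(0)
Z = np.random.normal(0, 1, size=1000) # instrument
epsilon = np.random.normal(0, 1, size=1000)
X = 3 * Z + 1 + epsilon + np.random.normal(size=1000) # endogenous regressor, correlated with the error
y = 5 * X + 4 + epsilon
model = gmm.LinearIVGMM(endog=y, exog=np.c_[np.ones_like(y), X], instrument=np.c_[np.ones_like(y), Z])
result = model.fit() # closed-form moment solution with iterated GMM weighting (default settings)
result.summary()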
Two-Stage Least Squares
$${\displaystyle \begin{aligned} \mathbf{y} &= \underbrace{\mathbf{X}}_{\text{Uncertainty}}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma_{\varepsilon}^{2}\mathbf{I}_{n}) \\ &\downarrow \quad \mathbf{X} = [X_{np}]_{n \ge 1, p \ge 0}, \quad \mathbf{Z} = [Z_{nm}]_{m \le p-1} \\ \text{IV2SLS Equation}& \quad \begin{cases} \quad \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \operatorname{Cov}[\mathbf{X}, \boldsymbol{\varepsilon}] \neq \mathbf{0} \\ \quad \mathbf{X} &= \mathbf{Z}\boldsymbol{\delta} + \mathbf{v}, \quad \operatorname{Cov}[\mathbf{Z}, \boldsymbol{\varepsilon}] = \mathbf{0} \\ \end{cases} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \text{Stage 1)}\quad &{\hat {\boldsymbol{\delta} }}=(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{X} \\ & \rightarrow \quad {\mathbf{\widehat {X}}}=\mathbf{Z}{\hat {\boldsymbol{\delta} }}={\color {Red}\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }}\mathbf{X} \equiv {\color {Red}\mathbf{P}_{\mathbf{Z}}}\mathbf{X}.\, \\ & \quad \quad \quad {\color {Red}\mathbf{P}_{\mathbf{Z}}} = \mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} } : \; \text{symmetric and idempotent matrix} \\ & \quad \quad \quad \quad \rightarrow \; \mathbf{P}_{\mathbf{Z}}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}=\mathbf{P}_{\mathbf{Z}}\mathbf{P}_{\mathbf{Z}}=\mathbf{P}_{\mathbf{Z}} \\ \text{Stage 2)}\quad & \boldsymbol{\hat{\beta}} _{\text{2SLS}}=({\widehat {\mathbf{X}}}^{\mathrm {T} }{\widehat {\mathbf{X}}})^{-1}{\widehat {\mathbf{X}}}^{\mathrm {T} }\mathbf{y} \quad \rightarrow \quad \mathbf{y}={\widehat {\mathbf{X}}}\tilde{\boldsymbol{\beta}} + \boldsymbol{\varepsilon}^{\prime} \\ &=\left(\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}^{\mathrm {T} }\mathbf{y} \\ &=\left(\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}\mathbf{y} \\ &= (\mathbf{X}^{\mathrm {T} }\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y}\\ &=(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})(\mathbf{X}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y} \\ &=(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y}\\ &=(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y} \\ &= (\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} } (\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} )\\ &= (\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{X}\boldsymbol{\beta} +(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\boldsymbol{\varepsilon} \\ &= \boldsymbol{\beta} + \underbrace{(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\boldsymbol{\varepsilon}}_{\text{orthogonality}(\mathbf{Z}^{\mathrm {T} }\boldsymbol{\varepsilon}=\mathbf{0}) \rightarrow \mathbf{0} } \\ & \therefore \quad \operatorname{E}[\boldsymbol{\hat{\beta}} _{\text{2SLS}}] = \boldsymbol{\tilde{\beta}} = \boldsymbol{\beta} \\ \end{aligned} }$$
(The simplification from the third line of Stage 2 onward assumes the just-identified case, where the number of instruments equals the number of regressors so that Z^T X is square and invertible.)
import numpy as np
import pandas as pd
import statsmodels.api as sm
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X = 3*Z + 1 + epsilon + np.random.normal(size=1000)
y = 5*X + 4 + epsilon
true_beta0 = 4.0000
true_beta1 = 5.0000
true_beta_corr_Xe = np.corrcoef(X, epsilon)[0,1].round(4)
print('%25s'%'True Orthogonality:', np.corrcoef(X, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
print('%25s'%'True Beta Coefficients:', true_beta1, true_beta0)
# OLS
ols_result = sm.OLS(endog=y, exog=np.c_[np.ones_like(X), X]).fit()
estimated_beta0 = ols_result.params[0].round(4)
estimated_beta1 = ols_result.params[1].round(4)
estimated_corr_Ze = np.corrcoef(Z, ols_result.resid)[0,1].round(4)
estimated_corr_Xe = np.corrcoef(ols_result.fittedvalues, ols_result.resid)[0,1].round(4)
print('%50s'%'OLS Correlation[instrumental vs residual]: ', estimated_corr_Ze)
print('%50s'%'OLS Correlation[estimate vs residual]: ', estimated_corr_Xe)
print('%50s'%'OLS Beta: ', estimated_beta1, estimated_beta0)
# 2SLS
first_stage_ols_result = sm.OLS(endog=X, exog=np.c_[np.ones_like(y), Z]).fit()
X_hat = first_stage_ols_result.fittedvalues
last_stage_ols_result = sm.OLS(endog=y, exog=np.c_[np.ones_like(y), X_hat]).fit()
estimated_beta0 = last_stage_ols_result.params[0].round(4)
estimated_beta1 = last_stage_ols_result.params[1].round(4)
estimated_corr_Zv = np.corrcoef(Z, last_stage_ols_result.resid)[0,1].round(4)
estimated_corr_Xv = np.corrcoef(last_stage_ols_result.fittedvalues, last_stage_ols_result.resid)[0,1].round(4)
print('%50s'%'IV2SLS Correlation[instrumental vs residual]: ', estimated_corr_Zv)
print('%50s'%'IV2SLS Correlation[estimate vs residual]: ', estimated_corr_Xv)
print('%50s'%'IV2SLS Beta: ', estimated_beta1, estimated_beta0)
import numpy as np
import pandas as pd
from statsmodels.sandbox.regression import gmm
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X = 3*Z + 1 + epsilon + np.random.normal(size=1000)
y = 5*X + 4 + epsilon
print('%25s'%'True Orthogonality:', np.corrcoef(X, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
result = gmm.IV2SLS(y, exog=np.c_[np.ones_like(y), X], instrument=np.c_[np.ones_like(y), Z]).fit()
result.summary()
# https://www.datascienceconcepts.com/tutorials/python-programming-language/exogeneity-wu-hausman-and-sargan-tests-in-python/
import statsmodels.api as sm
import statsmodels.formula.api as smf
import linearmodels.iv.model as lm
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
mlr1 = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
mdatac = sm.add_constant(data=houseprices, prepend=False)
mlr2 = lm.IV2SLS(dependent=mdatac["price"], exog=mdatac[["const", "bedrooms"]], endog=mdatac["lotsize"], instruments=mdatac[["driveway", "garage"]]).fit(cov_type="homoskedastic", debiased=True)
mlr2.wu_hausman() # Wu-Hausman test of exogeneity
mlr2.wooldridge_regression # Wooldridge's regression test of exogeneity(Wu-Hausman test results)
mlr2.sargan # Sargan's test
Generalized Estimating Equation (GEE)
Assumption: Covariance Structure
- https://www.statsmodels.org/dev/examples/index.html#generalized-estimating-equations
- https://www.statsmodels.org/dev/examples/notebooks/generated/gee_nested_simulation.html
import statsmodels.api as sm
sm.GEE
sm.NominalGEE
sm.OrdinalGEE
sm.cov_struct.Exchangeable
sm.cov_struct.GlobalOddsRatio
sm.cov_struct.Nested
sm.cov_struct.Autoregressive
sm.cov_struct.Stationary
sm.cov_struct.Independence
sm.cov_struct.Equivalence
sm.cov_struct.Unstructured
sm.cov_struct.OrdinalIndependence
sm.cov_struct.NominalIndependence
sm.cov_struct.CovStruct
sm.cov_struct.CategoricalCovStruct
# (Appender, OutputWarning, NotImplementedWarning, ConvergenceWarning in sm.cov_struct are utility/warning classes, not covariance structures)
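A minimal GEE sketch on assumed synthetic clustered data (column names y, x, group are placeholders): a Gaussian family with an exchangeable working correlation within groups.
import numpy as np
import pandas as pd
import statsmodels.api as sm
np.random.seed(0)
n_groups, n_per = 100, 5
group = np.repeat(np.arange(n_groups), n_per)
b = np.random.normal(0, 1, size=n_groups) # group-level shift inducing within-group correlation
x = np.random.normal(size=n_groups * n_per)
data = pd.DataFrame({'y': 1 + 2 * x + b[group] + np.random.normal(size=n_groups * n_per), 'x': x, 'group': group})
ex = sm.cov_struct.Exchangeable() # working covariance structure: constant within-group correlation
result = sm.GEE.from_formula('y ~ x', groups='group', data=data, family=sm.families.Gaussian(), cov_struct=ex).fit()
result.summary()
print(ex.summary()) # estimated within-group working correlation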
General Linear Regression: OLS
| | General Linear Model | Generalized Linear Model |
|---|---|---|
| Typical Estimation | Least Squares | Maximum Likelihood, Bayesian |
| Examples | ANOVA, ANCOVA | Logistic Regression, Poisson Regression |
| Extensions | MANOVA, MANCOVA | GLMM, GEE |
statsmodels
# https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_custom_models.html
# https://www.statsmodels.org/stable/statespace.html#custom-state-space-models
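For comparison with the sklearn, pytorch, and tensorflow blocks below, a minimal statsmodels sketch on the same kind of assumed synthetic data: sm.OLS takes a single response, so each output column is fitted as its own equation.
import numpy as np
import statsmodels.api as sm
np.random.seed(0)
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
exog = sm.add_constant(np.c_[X1, X2])
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
results = [sm.OLS(endog=Y[:, i], exog=exog).fit() for i in range(Y.shape[1])] # one OLS fit per response column
print(np.vstack([result.params for result in results])) # rows: equations / columns: const, X1, X2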
sklearn
import numpy as np
from sklearn.linear_model import LinearRegression
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
model = LinearRegression(fit_intercept=True)
model.fit(X, Y)
print(model.coef_)
print(model.intercept_)
pytorch
import torch
import torch.nn as nn
from torch import optim
X1 = torch.normal(0,1,size=(1000,1))
X2 = torch.normal(0,1,size=(1000,1))
X = torch.cat([X1, X2], dim=1)
Y = torch.cat([
1 + 2 * X1 + 3 * X2 + torch.normal(0,1,size=(1000,1)),
1 + 4 * X1 - 2 * X2 + torch.normal(0,1,size=(1000,1))
], dim=1)
model = nn.Linear(2,2)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
epochs = 1000
for epoch in range(epochs):
hypothesis = model(X)
cost = criterion(hypothesis, Y)
optimizer.zero_grad()
cost.backward()
optimizer.step()
print(model.weight.detach().numpy())
print(model.bias.detach().numpy())
tensorflow
import tensorflow as tf
from tensorflow.keras import layers, losses, optimizers
X1 = tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32)
X2 = tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32)
X = tf.concat([X1, X2], axis=1)
Y = tf.concat([
1 + 2 * X1 + 3 * X2 + tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32),
1 + 4 * X1 - 2 * X2 + tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32)
], axis=1)
model = layers.Dense(2, activation='linear')
criterion = losses.MeanSquaredError()
optimizer = optimizers.SGD(0.1)
epochs = 1000
for epoch in range(epochs):
with tf.GradientTape() as tape:
hypothesis = model(X, training=True)
cost = criterion(Y, hypothesis)
gradients = tape.gradient(cost, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
print(model.weights[0].numpy().T)
print(model.weights[1].numpy())
Seemingly Unrelated Regression (SUR/SURE)
- https://en.wikipedia.org/wiki/General_linear_model
- https://en.wikipedia.org/wiki/Simultaneous_equations_model
- https://en.wikipedia.org/wiki/Seemingly_unrelated_regressions
from collections import OrderedDict
import numpy as np
import pandas as pd
from linearmodels.system import SUR
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
data = pd.DataFrame(data=np.c_[np.ones(X.shape[0]), X, Y], columns=['const', 'X0', 'X1', 'Y0', 'Y1'])
equations = OrderedDict()
equations['Y0'] = {'dependent': data['Y0'],
'exog': data[['const', 'X0', 'X1']]}
equations['Y1'] = {'dependent': data['Y1'],
'exog': data[['const', 'X0', 'X1']]}
# [cov_type]: [1]'unadjusted':'homoskedastic', [2]'robust':'heteroskedastic', [3]'kernel':'heteroskedastic' + 'autocorrelation', [4]'clustered': 2-way clustering of errors
model = SUR(equations)
model.fit(iterate=True, method=['ols', 'gls', None][1], cov_type='unadjusted')
import numpy as np
import pandas as pd
from linearmodels.system import SUR
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
data = pd.DataFrame(data=np.c_[np.ones(X.shape[0]), X, Y], columns=['const', 'X0', 'X1', 'Y0', 'Y1'])
equations = dict()
equations['Y0'] = 'Y0 ~ 1 + X0 + X1'
equations['Y1'] = 'Y1 ~ 1 + X0 + X1'
# [cov_type]: [1]'unadjusted':'homoskedastic', [2]'robust':'heteroskedastic', [3]'kernel':'heteroskedastic' + 'autocorrelation', [4]'clustered': 2-way clustering of errors
model = SUR.from_formula(equations, data)
model.fit(iterate=True, method=['ols', 'gls', None][1], cov_type='unadjusted')
import numpy as np
import pandas as pd
from linearmodels.system import SUR
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
data = pd.DataFrame(data=np.c_[np.ones(X.shape[0]), X, Y], columns=['const', 'X0', 'X1', 'Y0', 'Y1'])
# [cov_type]: [1]'unadjusted':'homoskedastic', [2]'robust':'heteroskedastic', [3]'kernel':'heteroskedastic' + 'autocorrelation', [4]'clustered': 2-way clustering of errors
# [kernel]: [1]"bartlett":Newey-West, [2]"parzen":Gallant, [3]"qs":Quadratic Spectral, Andrews
model = SUR.multivariate_ls(data[['Y0', 'Y1']], data[['const', 'X0', 'X1']])
model.fit(iterate=True, method=['ols', 'gls', None][1], cov_type='kernel', kernel='parzen')
Reference(1)
statsmodels api for modeling and testing
import numpy as np
from scipy import stats
import statsmodels
import statsmodels.api as sm
import statsmodels.tsa.api as smt
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
import statsmodels.graphics.api as smg
# stats
X = np.random.normal(0, 3, size=3000)
#X = np.c_[np.ones_like(X), np.arange(X.shape[0]), X]
stats.skew(X, axis=0, keepdims=False)
stats.kurtosis(X, axis=0, keepdims=False, fisher=True) # fisher=True: gaussian-based kurtosis is 0, fisher=False: gaussian-based kurtosis is 3
stats.moment(X, moment=2, axis=0, keepdims=False)
# sm: statsmodels.api
# https://www.statsmodels.org/dev/examples/index.html#linear-regression-models
sm.PCA#(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd', missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100, svd_full_matrices=False)
sm.Factor#(endog=None, n_factor=1, corr=None, method='pa', smc=True, endog_names=None, nobs=None, missing='drop')
sm.Logit#(endog, exog, offset=None, check_rank=True)
sm.MNLogit#(endog, exog, check_rank=True)
sm.Probit#(endog, exog, offset=None, check_rank=True)
sm.ConditionalLogit#(endog, exog, groups, missing='none')
sm.ConditionalMNLogit#(endog, exog, groups, missing='none')
sm.OLS#(endog, exog=None, missing='none', hasconst=None)
sm.WLS#(endog, exog, weights=1.0, missing='none', hasconst=None) # core argument: weights
sm.GLS#(endog, exog, sigma=None, missing='none', hasconst=None) # core argument: sigma
sm.GLSAR#(endog, exog=None, rho=1, missing='none', hasconst=None) # core argument: rho
sm.GLM#(endog, exog, family=None, offset=None, exposure=None, freq_weights=None, var_weights=None, missing='none') # core argument: family, freq_weights, var_weights
sm.GLMGam#(endog, exog=None, smoother=None, alpha=0, family=None, offset=None, exposure=None, missing='none') # core argument: family
sm.RLM#(endog, exog, M=None, missing='none')
sm.GEE#(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, exposure=None, dep_data=None, constraint=None, update_dep=True, weights=None) # core argument: cov_struct
sm.NominalGEE#(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, dep_data=None, constraint=None) # core argument: cov_struct
sm.OrdinalGEE#(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, dep_data=None, constraint=None) # core argument: cov_struct
sm.MixedLM#(endog, exog, groups, exog_re=None, exog_vc=None, use_sqrt=True, missing='none') # core argument: groups, exog_re
sm.MANOVA#(endog, exog, missing='none', hasconst=None) # core argument: endog
#sms.anova_lm#(args, typ=2, scale=None, robust=None)
#sms.AnovaRM#(data, depvar, subject, within=None, between=None, aggregate_func=None)
sm.Factor.from_formula#(formula, data, subset=None, drop_cols=None)
sm.Logit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.MNLogit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.Probit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.ConditionalLogit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.ConditionalMNLogit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.OLS.from_formula#(formula, data, subset=None, drop_cols=None)
sm.WLS.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLS.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLSAR.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLM.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLMGam.from_formula#(formula, data, subset=None, drop_cols=None)
sm.RLM.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GEE.from_formula#(formula, groups, data, subset=None, time=None, offset=None, exposure=None)
sm.NominalGEE.from_formula#(formula, groups, data, subset=None, time=None, offset=None, exposure=None)
sm.OrdinalGEE.from_formula#(formula, groups, data, subset=None, time=None, offset=None, exposure=None)
sm.MixedLM.from_formula#(formula, data, re_formula=None, vc_formula=None, subset=None, use_sparse=False, missing='none')
sm.MANOVA.from_formula#(formula, data, subset=None, drop_cols=None)
sm.BayesGaussMI
sm.MI
sm.MICE
sm.PHReg
sm.SurvfuncRight
sm.RecursiveLS
sm.ConditionalPoisson
sm.GeneralizedPoisson
sm.HurdleCountModel
sm.NegativeBinomial
sm.NegativeBinomialP
sm.TruncatedLFNegativeBinomialP
sm.TruncatedLFPoisson
sm.ZeroInflatedNegativeBinomialP
sm.ZeroInflatedPoisson
sm.ZeroInflatedGeneralizedPoisson
sm.Poisson
sm.PoissonBayesMixedGLM
sm.BinomialBayesMixedGLM
# smt: statsmodels.tsa.api
smt.UnobservedComponents#(endog=y, irregular=True, level=True, stochastic_level=True, trend=True, stochastic_trend=True, cycle=True, damped_cycle=True, stochastic_cycle=True, cycle_period_bounds=None, seasonal=None, stochastic_seasonal=True, stochastic_freq_seasonal=None)
smt.SARIMAX#(endog=y, order=(2,0,0), seasonal_order=(0,0,0,0))
smt.ETSModel#(endog=y, error=['add', 'mul'][0], trend=['add', 'mul', None][-1], damped_trend=False, seasonal=["add", "mul", None][-1], seasonal_periods=None)
smt.AutoReg#(endog=y, lags=2)
smt.ARIMA#(endog=y, order=(2,0,0), seasonal_order=(0,0,0,0))
smt.ExponentialSmoothing#(endog=y, trend=['add', 'mul', None][-1], damped_trend=False, seasonal=["add", "mul", None][-1], seasonal_periods=None) # [level, trend, seasonal]: Exponential smoothing with trend and seasonal components.
smt.SimpleExpSmoothing#(endog=y) # [level] Basic exponential smoothing with only a level component.
smt.Holt#(endog=y, exponential=False, damped_trend=False)
smt.SVAR#(endog=Y, svar_type=['A', 'B', 'AB'][0])
smt.VAR#(endog=Y)
smt.VARMAX#(endog=Y)
smt.VECM#(endog=Y)
smt.UECM#(endog=Y, lags=1)
smt.ARDL#(endog=Y, lags=1)
smt.DynamicFactorMQ#(endog=Y)
smt.DynamicFactor#(endog=Y, k_factors=1, factor_order=2)
smt.MarkovRegression#(y, k_regimes=2)
smt.MarkovAutoregression#(y, k_regimes=2, order=4, switching_ar=False).fit()
# smf: statsmodels.formula.api
smf.logit
smf.mnlogit
smf.probit
smf.conditional_logit
smf.conditional_mnlogit
smf.ols
smf.wls
smf.gls
smf.glm
smf.mixedlm
smf.quantreg
smf.gee
smf.nominal_gee
smf.ordinal_gee
# sms: statsmodels.stats.api
sms.omni_normtest#(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
sms.jarque_bera#(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
sms.normal_ad#(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
sms.lilliefors#(result.resid, dist='norm', pvalmethod='table') # Null (H0): Normality is present. / Alternative (HA): Normality is not present. # Kolmogorov-Smirnov test with estimated parameters (alias of kstest_fit)
sms.het_breuschpagan#(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
sms.het_white#(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
sms.het_goldfeldquandt#(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
sms.durbin_watson#(resids=result.resid) # if statistic is within the range of 1.5 and 2.5, No correlation. # Null (H0): There is no correlation among the residuals / Alternative (HA): The residuals are autocorrelated.
sms.acorr_breusch_godfrey#(res=result, nlags=None, store=False) # Null (H0): There is no autocorrelation at any order less than or equal to p. / Alternative (HA): There exists autocorrelation at some order less than or equal to p.
sms.acorr_lm#(resid=result.resid, nlags=None, store=False, period=None, ddof=0, cov_type='nonrobust', cov_kwargs=None)
sms.acorr_ljungbox#(x=result.resid, lags=None, boxpierce=False, model_df=0, period=None, return_df=True, auto_lag=False) # Null (H0): The residuals are independently distributed / Alternative (HA): The residuals are not independently distributed; they exhibit serial correlation.
sms.proportion_confint#(count, nobs, alpha=0.05, method='normal') # count: number of successes, nobs: number of observations, alpha: significance level (default 0.05)
# smg: statsmodels.graphics.api
smg.influence_plot#(results, external=True, alpha=0.05, criterion='cooks', size=48, plot_alpha=0.75, ax=None)
smg.plot_ccpr#(results, exog_idx, ax=None)
smg.plot_ccpr_grid#(results, exog_idx=None, grid=None, fig=None)
smg.plot_corr#(dcorr)
smg.plot_corr_grid#(dcorrs)
smg.plot_partregress#(endog, exog_i, exog_others)
smg.plot_partregress_grid#(results, exog_idx=None, grid=None, fig=None)
smg.plot_fit#(results, exog_idx, y_true=None)
smg.plot_leverage_resid2#(results, alpha=0.05)
smg.qqplot#(data)
smg.tsa.plot_acf#(x, lags=None, alpha=0.05, use_vlines=True, adjusted=False, fft=False, missing='none', title='Autocorrelation', zero=True, auto_ylims=False, bartlett_confint=True)
smg.tsa.plot_pacf#(x, lags=None, alpha=0.05, method='ywm', use_vlines=True, title='Partial Autocorrelation', zero=True)
patsy formula and design matrix
import numpy as np
import pandas as pd
from patsy import dmatrix, dmatrices
def function(x):
return 2*x
# design matrix(1): numpy ndarray
np.random.seed(0)
x1 = np.random.rand(5) + 10
x2 = np.random.rand(5) * 10
dmatrix("x1")
dmatrix("x1 - 1")
dmatrix("x1 + 0")
dmatrix("x1 + x2")
dmatrix("x1 + x2 - 1")
dmatrix("function(x1)")
np.asarray(dmatrix("function(x1)")) # designmatrix to ndarray
pd.DataFrame(dmatrix("function(x1)")) # designmatrix to dataframe
# design matrix(2): pandas dataframe
df = pd.DataFrame(data=np.c_[np.random.normal(0, size=5), np.random.normal(1, size=5), np.random.poisson(10, size=5)] ,columns=['N1', 'N2', 'N3'])
df['C1'] = np.array(['P', 'Q', 'R', 'S', 'P'], dtype=str)
dmatrix("N1 + N2 - 1", data=df)
dmatrix("N1 + np.log(np.abs(N2))", data=df)
dmatrix("N1 + N2 + np.log(np.abs(N2)) + I(np.log(np.abs(N2)))", data=df)
dmatrix("N1 + function(N1)", data=df)
dmatrix("N1 + center(N1) + standardize(N1)", data=df)
dmatrix("N2", data=df)
dmatrix("N3", data=df)
dmatrix("C1", data=df)
dmatrix("C1 + 0", data=df)
dmatrix("C(N3)", data=df)
dmatrix("C(N3) + 0", data=df)
np.asarray(dmatrix("C1", data=df)) # designmatrix to ndarray
pd.DataFrame(dmatrix("C1", data=df)) # designmatrix to dataframe
# patsy formula(0): Intercepts(Constant)
dmatrix("1", data=df, return_type='dataframe') # intercept(constant)
dmatrix("0", data=df, return_type='dataframe') # zero intercept(no intercept)
dmatrix("-1", data=df, return_type='dataframe') # zero intercept(no intercept)
# patsy formula(1-1): Intercepts(by Categorical Variable or by Group, 'C1')
# with an intercept in the model, patsy codes C1 with k-1 columns (reduced rank); without an intercept it uses all k level columns (full rank)
dmatrix("C1", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("1 + C1", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("0 + C1", data=df, return_type='dataframe') # full-rank intercept
dmatrix("C(C1)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("1 + C(C1)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("0 + C(C1)", data=df, return_type='dataframe') # full-rank intercept
dmatrix('C(C1, Treatment(reference="P"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
dmatrix('C(C1, Treatment(reference="Q"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
dmatrix('C(C1, Treatment(reference="R"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
dmatrix('C(C1, Treatment(reference="S"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
# patsy formula(1-2): Intercepts(by Numerical Variable or by Group, 'N3')
dmatrix("C(N3)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("1 + C(N3)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("0 + C(N3)", data=df, return_type='dataframe') # full-rank intercept
# patsy formula(1-3): Intercepts(Nested Intercepts)
dmatrix("0 + C1 + C(N3)", data=df, return_type='dataframe')
dmatrix("0 + C1 + C(N3, Sum)", data=df, return_type='dataframe')
dmatrix("0 + C1 + C1:C(N3)", data=df, return_type='dataframe')
# patsy formula(1-4): Intercepts(Contrast)
dmatrix("C(C1, Treatment)", data=df, return_type='dataframe')
dmatrix("C(C1, Sum)", data=df, return_type='dataframe')
dmatrix("C(C1, Diff)", data=df, return_type='dataframe')
dmatrix("C(C1, Poly)", data=df, return_type='dataframe')
dmatrix("C(C1, Helmert)", data=df, return_type='dataframe')
# patsy formula(2): Intercepts(by Categorical Variable or by Group, 'C1') + Slope(Constant)
dmatrix("0 + C1 + N1", data=df, return_type='dataframe') # full-rank intercept / slope(constant)
dmatrix("C1 + N1", data=df, return_type='dataframe') # reduced-rank intercept / slope(constant)
dmatrix("1 + C1 + N1", data=df, return_type='dataframe') # reduced-rank intercept / slope(constant)
# patsy formula(3): Intercepts(by Categorical Variable or by Group, 'C1') + Slopes(by Categorical Variable or by Group, 'C1')
dmatrix("0 + C1 + C1:N1", data=df, return_type='dataframe') # full-rank intercept / full-rank slopes
dmatrix("0 + C1 + N1 + C1:N1", data=df, return_type='dataframe') # full-rank intercept / reduced-rank slopes
dmatrix("1 + C1 + C1:N1", data=df, return_type='dataframe') # reduced-rank intercept / full-rank slopes
dmatrix("1 + C1 + N1 + C1:N1", data=df, return_type='dataframe') # reduced-rank intercept / reduced-rank slopes
# patsy formula(4): No Intercept + Slopes(by Categorical Variable or by Group, 'C1')
dmatrix("0 + C1:N1", data=df, return_type='dataframe') # no intercept / full-rank slopes
dmatrix("0 + N1 + C1:N1", data=df, return_type='dataframe') # no intercept / reduced-rank slopes
# patsy formula(5): Intercept(Constant) + Slope(Constant)
dmatrix("-1 + N1", data=df, return_type='dataframe') # zero intercept(no intercept) / slope(constant)
dmatrix("0 + N1", data=df, return_type='dataframe') # zero intercept(no intercept) / slope(constant)
dmatrix("1 + N1", data=df, return_type='dataframe') # intercept(constant) / slope(constant)
Reference(2)
- https://online.stat.psu.edu/stat501/
- https://www.statology.org/python-guides/
- https://www.tutorialspoint.com/statistics/
- https://mathworld.wolfram.com/CorrelationCoefficient.html
- https://en.wikipedia.org/wiki/Category:Regression_models
- https://en.wikipedia.org/wiki/Linear_regression
- https://en.wikipedia.org/wiki/Errors_and_residuals
- https://en.wikipedia.org/wiki/Leverage_(statistics)
- https://en.wikipedia.org/wiki/Ordinary_least_squares
- https://en.wikipedia.org/wiki/Weighted_least_squares
- https://en.wikipedia.org/wiki/Generalized_least_squares
- https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares
- https://en.wikipedia.org/wiki/Homoscedasticity_and_heteroscedasticity
- https://en.wikipedia.org/wiki/Partition_of_sums_of_squares
- https://en.wikipedia.org/wiki/Squared_deviations_from_the_mean
- https://en.wikipedia.org/wiki/Projection_matrix
- https://en.wikipedia.org/wiki/Studentized_residual
- https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
- https://en.wikipedia.org/wiki/Lack-of-fit_sum_of_squares
- https://en.wikipedia.org/wiki/Coefficient_of_determination
- https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- https://en.wikipedia.org/wiki/Empirical_distribution_function
- https://en.wikipedia.org/wiki/Mixed_model
- https://en.wikipedia.org/wiki/Generalized_linear_mixed_model
- https://en.wikipedia.org/wiki/Hierarchical_generalized_linear_model
- https://en.wikipedia.org/wiki/Multilevel_model
- https://en.wikipedia.org/wiki/Nonlinear_mixed-effects_model
- https://en.wikipedia.org/wiki/Best_linear_unbiased_prediction
- https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem
- https://en.wikipedia.org/wiki/Errors-in-variables_models
- https://en.wikipedia.org/wiki/Multilevel_modeling_for_repeated_measures