Classical Regression Analysis: Overview
Standard Assumptions of Classical Linear Regression Model $${\displaystyle \mathbf{y} = \mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} }$$ $${\displaystyle \text{Standard Normality: }\; {\boldsymbol{\varepsilon}} \mid \mathbf{X} \sim {\mathcal {N}}(\mathbf{0},\sigma _{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n}) \; \begin{cases} \text{Exogeneity:}\; &\operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \mathbf{0} \\ \text{Uncorrelated Spherical Error:}\; &\operatorname{Cov}[{\boldsymbol{\varepsilon}} \mid \mathbf{X}] = \sigma_{\boldsymbol{\varepsilon}}^{2} \mathbf{I}_{n} \\ \qquad \text{- Homoscedasticity:}\; & \operatorname{Var}[\varepsilon_{i} \mid \mathbf{X}] = \sigma_{\boldsymbol{\varepsilon}}^{2} \\ \qquad \text{- No Autocorrelation:}\; & \operatorname{Cov}[\varepsilon_{i}, \varepsilon_{j} \mid \mathbf{X}] = 0, \; i \neq j \\ \end{cases} }$$ Estimator properties that fail to hold when the assumptions are violated $${\displaystyle \begin{aligned} \operatorname {E} [\hat{\theta}] \neq \theta \quad &\mid \underbrace{\operatorname{Var}[\hat{\theta}]}_{= \operatorname{E}[(\hat{\theta} - \operatorname{E}[\hat{\theta}])^{2}]} > \operatorname{Var}[\hat{\theta}_{\text{BLUE}}], \qquad \theta \in \{ \beta, \varepsilon \} \\ & \downarrow \\ {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} \\ \; &= \boldsymbol{\beta} + \underbrace{\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}}_{\text{multicollinearity}} \overbrace{\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon}}^{\text{non-orthogonality} } , \quad \underbrace{\boldsymbol{\varepsilon} \mid \mathbf {X}}_{\text{non-standard normality}} \nsim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n})\\ \operatorname{E}[{\hat {\boldsymbol {\beta }}} ] &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} {\color{Red}{ \operatorname{E}[ \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ]}} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}} ] &= \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} {\color{Red}{\operatorname{Var}[\mathbf{X}^{\mathsf {T}}\boldsymbol{\varepsilon} ]}} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \end{aligned} }$$ $${\displaystyle \begin{cases} \text{by endogeneity (violated exogeneity)} \\ \quad \text{[Issue] Biased Estimator}\\ \quad \text{[Detection] Durbin-Wu-Hausman test, ...}\\ \quad \text{[Solution] Include omitted variables, Instrumental Variables, ...}\\ \text{by multicollinearity} \\ \quad \text{[Issue] Estimator still BLUE, but coefficient estimates have high variance} \\ \quad \text{[Detection] VIF, Condition Number, ...} \\ \quad \text{[Solution] PCA, ...}\\ \text{by autocorrelated errors} \\ \quad \text{[Issue] Inefficient Estimator} \\ \quad \text{[Detection] Durbin-Watson Test, Breusch-Godfrey LM Test, ...} \\ \quad \text{[Solution] GLS, FGLS, MLE, ...}\\ \text{by heteroskedastic errors} \\ \quad \text{[Issue] Inefficient Estimator} \\ \quad \text{[Detection] Goldfeld-Quandt Test, White’s Test, ...} \\ \quad \text{[Solution] GLS, ...}\\ \end{cases} }$$
Covariance Structure $${\displaystyle \begin{aligned} \boldsymbol{\Sigma}{\text{ positive-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} >0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n}\setminus \{\mathbf {0} \} \\ \boldsymbol{\Sigma}{\text{ positive semi-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} \geq 0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n} \\ \boldsymbol{\Sigma}{\text{ negative-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} <0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n} \setminus \{\mathbf {0} \} \\ \boldsymbol{\Sigma}{\text{ negative semi-definite}}\quad &\iff \quad \mathbf {x} ^{\operatorname {T} } \boldsymbol{\Sigma} \mathbf {x} \leq 0{\text{ for all }}\mathbf {x} \in \mathbb {R} ^{n} \\ \end{aligned} }$$
$${\displaystyle \begin{aligned} & \quad \text{structures for positive-definite} \; \boldsymbol{\Sigma}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \boldsymbol{\Sigma}) \\ \boldsymbol{\Sigma} &\rightarrow \sigma_{\varepsilon}^{2}\mathbf{I}_{n} = \underbrace{\begin{bmatrix} \sigma_{\varepsilon}^{2}& 0 &\cdots & 0 &0\\ 0&\sigma_{\varepsilon}^{2}&\cdots & 0 &0 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0&0&\cdots &\sigma_{\varepsilon}^{2}& 0 \\ 0&0&\cdots &0 & \sigma_{\varepsilon}^{2} \\ \end{bmatrix}}_{\text{OLS}}, \quad \boldsymbol{\Sigma} \rightarrow \boldsymbol{\Omega} = \underbrace{{\begin{bmatrix} \sigma_{1}^{2}& 0 &\cdots & 0 &0\\ 0&\sigma_{2}^{2}&\cdots & 0 &0 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0&0&\cdots &\sigma_{n-1}^{2}& 0 \\ 0&0&\cdots &0 & \sigma_{n}^{2} \\ \end{bmatrix}}}_{\text{GLS: Cross-Heteroskedasticity}} \\ \boldsymbol{\Sigma} &\rightarrow \boldsymbol{\Sigma} = \underbrace{{\begin{bmatrix} \gamma_{0} = \sigma_{\varepsilon}^{2}& \gamma_{-1} &\cdots & \gamma_{-(p-1)} &\gamma_{-p}\\ \gamma_{1}&\gamma_{0} =\sigma_{\varepsilon}^{2}&\cdots & \gamma_{-(p-2)} & \gamma_{-(p-1)} \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ \gamma_{p-1}&\gamma_{p-2}&\cdots &\gamma_{0} =\sigma_{\varepsilon}^{2}& \gamma_{-1} \\ \gamma_{p}&\gamma_{p-1}&\cdots &\gamma_{1} & \gamma_{0} =\sigma_{\varepsilon}^{2} \\ \end{bmatrix}}}_{\text{ARMA: Autocorrelation}}\quad : \text{structure for the conditional mean} \\ \boldsymbol{\Sigma} &\rightarrow \boldsymbol{\Sigma}_{t} = \underbrace{{\begin{bmatrix} \sigma_{1}^{2}& 0 &\cdots & 0 & 0 \\ 0 &\sigma_{2}^{2}&\cdots & 0 & 0 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ 0 & 0 &\cdots & \sigma_{t-1}^{2}& 0 \\ 0 & 0 &\cdots & 0 & \sigma_{t}^{2} \\ \end{bmatrix}}}_{\text{ARCH: Auto-Heteroskedasticity}}\quad : \text{structure for the conditional variance} \\ \end{aligned} }$$
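The structures above can be written down directly; a minimal numpy sketch (the AR(1) autocovariance and ARCH(1) recursion used here are illustrative assumptions, not from the original):
import numpy as np
from scipy.linalg import toeplitz
n = 5
sigma2 = 2.0
Sigma_ols = sigma2 * np.eye(n) # OLS: spherical errors, sigma^2 * I_n
Omega = np.diag(np.random.uniform(0.5, 3.0, size=n)) # GLS/WLS: observation-specific variances on the diagonal
phi = 0.6
gamma = sigma2 * phi ** np.arange(n) # illustrative AR(1) autocovariances gamma_k = sigma^2 * phi^k
Sigma_arma = toeplitz(gamma) # symmetric Toeplitz autocovariance structure
omega, alpha = 0.2, 0.5
eps = np.random.normal(0, 1, size=n)
sigma_t2 = omega + alpha * np.r_[0.0, eps[:-1]**2] # illustrative ARCH(1) conditional variances
Sigma_arch = np.diag(sigma_t2)
for S in (Sigma_ols, Omega, Sigma_arma, Sigma_arch):
    assert np.all(np.linalg.eigvalsh(S) > 0) # each structure is positive-definite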
Analysis Example
import numpy as np
import pandas as pd
import statsmodels.tsa.api as smt
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
data = pd.DataFrame(np.random.normal(10, 1, size=1000), columns=['Y'])
n_groups = 10
for g in range(n_groups):
    data.loc[data.index[g::n_groups], 'G'] = f'G{g}'
data = data.set_index('G').sort_index()
# ANOVA Data
index_func = lambda key: data.index.get_level_values(0) == key
for i in range(n_groups):
    data.loc[index_func(f'G{i}'), 'Y'] = data.loc[index_func(f'G{i}'), 'Y'] + np.random.normal(i, 1, size=index_func(f'G{i}').sum())
# Simple Regression Data
np.random.seed(0)
data['X1'] = np.random.normal(0, 1, size=1000)
data['X2'] = smt.ArmaProcess(ar=[1, -.3, -.2], ma=[1]).generate_sample(nsample=1000, burnin=50).cumsum()
data['Y'] = data['Y'] + 3*data['X1'] + 2*data['X2']
data = data.reset_index()
# Analysis
result = smf.ols('Y ~ 1 + C(G) + X1 + X2', data=data).fit()
display(result.summary().tables[1])
display(sms.anova_lm(result, typ=2))
#result.get_robustcov_results(cov_type='HC0', use_t=None).cov_params()
#result.cov_params() # result.cov_type
# Variance Components
data['Y'].plot()
result.fittedvalues.plot()
result.resid.plot()
sms.anova_lm(result, typ=1)['sum_sq'].sum(), result.centered_tss, result.ess, result.ssr
Model Diagnostic
# https://www.statsmodels.org/dev/diagnostic.html
# https://www.statsmodels.org/stable/stats.html#module-statsmodels.stats.stattools
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
import statsmodels
import statsmodels.tsa.stattools # needed for the fully qualified breakvar_heteroskedasticity_test call below
import statsmodels.api as sm
import statsmodels.stats.api as sms
X, y = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=10, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=False, random_state=np.random.randint(1000))
data = pd.DataFrame(np.c_[X, y], columns= [ 'X'+str(i) for i in range(0, X.shape[1])]+['y'])
model = sm.OLS(endog=data['y'], exog=sm.add_constant(data[data.columns.difference(['y'], sort=False)]))
result = model.fit()
#display(result.summary())
# normality
# https://www.statsmodels.org/stable/generated/statsmodels.stats.stattools.jarque_bera.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.stattools.omni_normtest.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.normal_ad.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.kstest_normal.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.lilliefors.html
chi2_stats, two_sided_pval = sms.omni_normtest(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
jb_stats, jb_pval, skew, kurtosis = sms.jarque_bera(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
ad_stats, ad_pval = sms.normal_ad(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
ks_stats, ks_pval = sms.lilliefors(result.resid, dist='norm', pvalmethod='table') # Null (H0): Normality is present. / Alternative (HA): Normality is not present. # kstest_normal is an alias of lilliefors
two_sided_pval, jb_pval, ad_pval, ks_pval
# heteroscedasticity
# https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.breakvar_heteroskedasticity_test.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.het_breuschpagan.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.het_white.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.het_goldfeldquandt.html
bv_stats, bv_pval = statsmodels.tsa.stattools.breakvar_heteroskedasticity_test(result.resid, subset_length=0.3333333333333333, alternative='two-sided', use_f=True) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
bp_lm_stats, bp_lm_pval, bp_f_stats, bp_f_pval = sms.het_breuschpagan(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
w_lm_stats, w_lm_pval, w_f_stats, w_f_pval = sms.het_white(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
stats, pval, _ = sms.het_goldfeldquandt(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
bp_lm_pval, bp_f_pval, w_lm_pval, w_f_pval, pval
# autocorrelation
# https://www.statsmodels.org/stable/generated/statsmodels.stats.stattools.durbin_watson.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_breusch_godfrey.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_ljungbox.html
# https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_lm.html
dw_stats = sms.durbin_watson(resids=result.resid) # if statistic is within the range of 1.5 and 2.5, No correlation. # Null (H0): There is no correlation among the residuals / Alternative (HA): The residuals are autocorrelated.
bg_lm_stats, bg_lm_pval, bg_f_stats, bg_f_pval = sms.acorr_breusch_godfrey(res=result, nlags=None, store=False) # Null (H0): There is no autocorrelation at any order less than or equal to p. / Alternative (HA): There exists autocorrelation at some order less than or equal to p.
engle_lm_stats, engle_lm_pval, engle_f_stats, engle_f_pval = sms.acorr_lm(resid=result.resid, nlags=None, store=False, period=None, ddof=0, cov_type='nonrobust', cov_kwargs=None) # Null (H0): There is no autocorrelation. / Alternative (HA): Autocorrelation is present.
lb_stats_and_pval = sms.acorr_ljungbox(x=result.resid, lags=None, boxpierce=False, model_df=0, period=None, return_df=True, auto_lag=False) # Null (H0): The residuals are independently distributed / Alternative (HA): The residuals are not independently distributed; they exhibit serial correlation.
bg_lm_pval, bg_f_pval, engle_lm_pval, engle_f_pval, lb_stats_and_pval['lb_pvalue']
# linearity
hc_stat, hc_pval = sms.linear_harvey_collier(res=result)
w_stat, w_pval, w_dof = sms.spec_white(resid=result.resid, exog=sm.add_constant(data[data.columns.difference(['y'], sort=False)])) # ols
Training Model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# population space
P = 30; N = 500
beta = np.random.randint(-10, 11, P+1)[:, np.newaxis]
population_X = np.c_[np.ones((N,1)), np.random.normal(0, 1, size=(N, P))]
varepsilon = np.random.normal(0, 10, size=N)[:, np.newaxis]
f = lambda X: X@beta + varepsilon
population_y = f(population_X)
population_data = np.c_[population_X, population_y]; np.random.shuffle(population_data)
population_data = pd.DataFrame(population_data, columns=[ 'X' + str(i) for i in range(population_X.shape[1])] +['y'])
population_index = population_data.index
population_X = population_data.loc[:, population_data.columns != 'y'].values
population_y = population_data.loc[:, population_data.columns == 'y'].values.ravel()
# sampling space
p = 5; n = 50
sampling_index = np.random.choice(population_data.index, n, replace=False)
sampling_columns = np.random.choice(population_data.columns.difference(['y'], sort=False), p, replace=False)
sampling_columns = np.concatenate([sampling_columns, ['y']])
sampling_data = population_data.loc[sampling_index, sampling_columns].copy()
sampling_data.insert(0, 'const', 1)
sampling_X = sampling_data.loc[:, sampling_data.columns != 'y'].values
sampling_y = sampling_data.loc[:, sampling_data.columns == 'y'].values.ravel()
# simulation for population interpretation
simulation_data = population_data[sampling_columns].copy()
simulation_X = simulation_data.loc[:, simulation_data.columns != 'y'].values
simulation_y = simulation_data.loc[:, simulation_data.columns == 'y'].values.ravel()
# training
result = sm.OLS(endog=sampling_y, exog=sampling_X).fit()
np.allclose(result.model.endog, result.fittedvalues + result.resid) # identity check: y = y_hat + residual
display(result.summary())
# total sample variance/residual sample variance analysis
influence = result.get_influence(); leverage = influence.hat_matrix_diag
studentized_residuals = result.resid / (result.model.endog.std(ddof=1)*np.sqrt(1-leverage))
print(studentized_residuals.mean(), studentized_residuals.var(ddof=0)) # approximately mean 0 and variance 1
# visualization
plt.figure(figsize=(30,5))
plt.plot(population_index, population_y, lw=0, marker='o', color='black', label='population_y')
plt.plot(sampling_index, sampling_y, lw=0, marker='o', markerfacecolor='w', color='black', label='sampling_y')
plt.plot(population_index, result.predict(np.c_[np.ones((simulation_X.shape[0], 1)), simulation_X]), lw=0, ms=5, marker='x', color='b', label='estimated_population_y')
plt.plot(sampling_index, result.predict(sampling_X), lw=0, ms=10, marker='x', color='r', label='estimated_sampling_y')
plt.legend()
# Homoscedasticity/Heteroscedasticity
plt.figure(figsize=(10,10))
plt.scatter(result.fittedvalues, result.resid, label='Homoscedasticity/Heteroscedasticity', color='black')
boundary = max(result.fittedvalues.std(), result.resid.std())
plt.xlim(-boundary*10, boundary*10)
plt.ylim(-boundary*10, boundary*10)
plt.legend()
TABLE OF CONTENTS
Regression Analysis
Ordinary Least Squares (OLS)
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
Maximum Likelihood Estimation
Regression Noise: Residual Analysis
Classification: MLE
Logistic Regression
Probit Regression
Tobit Regression
Robust Linear Regression
Regression Analysis
Classical Linear Regression Model: Vector Form $${\displaystyle y_{i}= \mathbf {x} _{i}^{\mathsf {T}}{\boldsymbol {\beta }}+\varepsilon _{i} =\beta _{0} + \beta _{1}\ x_{i1}+\beta _{2}\ x_{i2}+\cdots +\beta _{p}\ x_{ip}+\varepsilon _{i}, \qquad i=1,\ldots ,n \quad \And \quad j=0,\ldots ,p }$$Classical Linear Regression Model: Matrix Form $${\displaystyle \mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }} }$$ $${\displaystyle {\begin{bmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{bmatrix}} = {\begin{bmatrix}1&x_{11}&\cdots &x_{1p}\\1&x_{21}&\cdots &x_{2p}\\\vdots &\vdots &\ddots &\vdots \\1&x_{n1}&\cdots &x_{np}\end{bmatrix}} {\begin{bmatrix}\beta _{0}\\\beta _{1}\\\beta _{2}\\\vdots \\\beta _{p}\end{bmatrix}} + {\begin{bmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\vdots \\\varepsilon _{n}\end{bmatrix}} }$$ $${\displaystyle \begin{align} \mathbf {X} & : \text{Design matrix} \\ {\boldsymbol {\beta }} & : \text{Deterministic variable (Regression coefficient or Population parameters)} \\ {\boldsymbol {\varepsilon }} &: \text{Random variable (Disturbance or Noise) } \\ \end{align} }$$
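A minimal numpy sketch of the matrix form (toy dimensions and coefficient values assumed for illustration):
import numpy as np
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.5, 0.7],
              [1.0, 2.0, 3.1],
              [1.0, 3.5, 2.4]]) # design matrix: leading column of ones, then x_{i1}, x_{i2}
beta = np.array([2.0, 3.0, -1.0]) # deterministic coefficients (beta_0, beta_1, beta_2)
varepsilon = np.random.normal(0, 0.1, size=X.shape[0]) # random disturbance
y = X @ beta + varepsilon # y = X beta + varepsilon, one row per observation i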
$${\displaystyle \begin{aligned} \rho &= \frac{\operatorname{Cov}(\mathbf{X}\boldsymbol{\beta},\mathbf{y})}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} = \frac{\operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\mathbf{y}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{y}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{\operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon})] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{\operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\mathbf{X}\boldsymbol{\beta}] + \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\boldsymbol{\varepsilon}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{X}\boldsymbol{\beta}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\boldsymbol{\varepsilon}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{ \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}\mathbf{X}\boldsymbol{\beta}] - \operatorname{E}[(\mathbf{X}\boldsymbol{\beta})^{T}]\operatorname{E}[\mathbf{X}\boldsymbol{\beta}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} = \frac{ \operatorname{Var}[\mathbf{X}\boldsymbol{\beta}]}{\sqrt{\operatorname{Var}[(\mathbf{X}\boldsymbol{\beta})^{T}]}\sqrt{\operatorname{Var}[\mathbf{y}]}} \\ \; &= \frac{ \sqrt{ \operatorname{Var}[\mathbf{X}\boldsymbol{\beta}]}}{\sqrt{\operatorname{Var}[\mathbf{y}]}} = \frac{\sigma_{\mathbf{X}\boldsymbol{\beta}}}{\sqrt{\sigma_{\mathbf{X}\boldsymbol{\beta}}^{2} + \sigma_{\boldsymbol{\varepsilon}}^{2} + 2\sigma_{\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\varepsilon}}}}\\ \; &= \frac{\sigma_{\mathbf{X}\boldsymbol{\beta}}}{\sigma_{\mathbf{y}}} = \frac{\sigma_{\mathbf{X}\boldsymbol{\beta}}}{\sigma_{\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}}} \\ \end{aligned} }$$ $${\displaystyle \mathbf{X}, \mathbf{y} \quad \rightarrow \quad \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\varepsilon}} \quad \rightarrow \quad \hat{\rho} \\ }$$
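A quick numerical check of the last line (assumed simulated data): for an OLS fit with an intercept, the sample correlation between the fitted values and y equals the square root of the coefficient of determination.
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))
beta = np.array([1.0, 2.0, -3.0])
y = X @ beta + rng.normal(scale=2.0, size=500)
result = sm.OLS(y, X).fit()
rho_hat = np.corrcoef(result.fittedvalues, y)[0, 1] # corr(X beta_hat, y)
print(rho_hat, np.sqrt(result.rsquared)) # the two agree for a model with an intercept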
Ordinary Least Squares (OLS)
Objective Function: Sum of Squared Residuals(SSR, RSS:Residual Sum of Squares) $${\displaystyle S({\hat {\boldsymbol {\beta }}}) \equiv \sum _{i=1}^{n}\left|y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right|^{2}=\left\|\mathbf {y} -\mathbf {X} {\hat {\boldsymbol {\beta }}} \right\|^{2} =\left\|\mathbf {y} - {\hat {\mathbf {y}}} \right\|^{2} = \left\| {\hat{\boldsymbol {\varepsilon }}} \right\|^{2} \quad \And \quad {\frac {\partial}{\partial {\hat {\beta }}}}S({\hat {\boldsymbol {\beta }}}) = 0 }$$ $${\displaystyle \textit{Normal Equation} }$$ $${\displaystyle \begin{aligned} {\frac {\partial}{\partial {\hat {\beta }_{0}}}}S({\hat {\boldsymbol {\beta }}}) &= {\frac {\partial}{\partial {\hat {\beta }_{0}}}} \sum _{i=1}^{n}\left|y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right|^{2} \\ &= 2\sum _{i=1}^{n} \underbrace{\left( y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right)}_{\hat{\varepsilon}_{i}=\epsilon_{i}} \left( - \underline{X_{i0}}_{\text{dummy:1}} \right) = 0 \\ &\qquad \therefore \sum _{i=1}^{n} \epsilon_{i} = 0 \quad \cdots\text{(1)} \\ {\frac {\partial}{\partial {\hat {\beta }_{j^{*}|j \neq 0}}}}S({\hat {\boldsymbol {\beta }}}) &= {\frac {\partial}{\partial {\hat {\beta }_{j^{*}|j \neq 0}}}} \sum _{i=1}^{n}\left|y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right|^{2} \\ &= 2\sum _{i=1}^{n} \underbrace{\left( y_{i}-\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}} \right)}_{\hat{\varepsilon}_{i}=\epsilon_{i}} \left( -X_{ij^{*}} \right) = 0 \\ &\qquad \therefore \sum _{i=1}^{n} \epsilon_{i}X_{ij^{*}} = 0 \quad \cdots\text{(2)} \\ \end{aligned} }$$ $${\displaystyle \text{Sample Perpendicularity/Orthogonality} \quad \begin{cases} \begin{aligned} \quad \sum _{i=1}^{n}& \epsilon_{i} = 0 \\ \quad \sum _{i=1}^{n}& \epsilon_{i}X_{ij} = 0 \\ \end{aligned} \\ \end{cases} }$$ Projection Properties $${\displaystyle \begin{aligned} \underbrace{\mathbf{y}}_{\text{observation}}, & \quad \underbrace{\mathbf{X}}_{\text{given explanatory variables}} \\ &\downarrow \\ \mathbf{y} = &\mathbf{X}\underbrace{\boldsymbol{\beta}}_{?}+\boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n}) \\ &\downarrow \\ \left\| \mathbf{y} - \hat{\mathbf{y}} \right\|^{2} &= \left\| \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right\|^{2} \\ \; &= \left( \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right)^{T}\left( \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} \right) \\ \; &= \mathbf{y}^{T}\mathbf{y} - \mathbf{y}^{T}\mathbf{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}^{T}\mathbf{X}^{T}\mathbf{y} + \hat{\boldsymbol{\beta}}^{T}\mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{\beta}} \\ &\downarrow \\ \frac{\partial}{\partial \hat{\boldsymbol{\beta}}} \left\| \mathbf{y} - \hat{\mathbf{y}} \right\|^{2} &= - 2 \mathbf{X}^{T}\mathbf{y} +2 \mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{\beta}} = 0 \\ \therefore \hat{\boldsymbol{\beta}} &= \left( \mathbf{X}^{T}\mathbf{X}\right)^{-1} \mathbf{X}^{T}\mathbf{y} \\ \end{aligned} }$$$${\displaystyle \text{Corollary about the assumptions} \; \begin{cases} \; \operatorname{E}[\hat{\boldsymbol{\beta}}] &= \boldsymbol{\beta} \quad &:\text{Unbiasedness} \\ \; \operatorname{Var}[\hat{\boldsymbol{\beta}}] &= \sigma_{\boldsymbol{\varepsilon}}^{2}\left( \mathbf{X}^{T}\mathbf{X} \right)^{-1} \quad &:\text{Efficiency} \\ \end{cases} \\ }$$ $${\displaystyle \begin{aligned} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} \\ \; &= \left(\mathbf 
{X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}(\mathbf {X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\ \; &= \boldsymbol{\beta} + \underbrace{\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon}}_{\text{orthogonality}(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon}=0) \rightarrow 0} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \operatorname{E}[{\hat {\boldsymbol {\beta }}}] &= \boldsymbol{\beta} + \operatorname{E}[ \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ] \\ \; & = \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\operatorname{E}[ \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ] \quad \text{by exogeneity (nonstochastic X)}\\ \; & = \boldsymbol{\beta} \quad \text{by perpendicularity} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}}] &= \operatorname{Var}[\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ]\\ &= \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}} \operatorname{Var}[\boldsymbol{\varepsilon} ] \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n}) \\ \; & = \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} (\sigma_{\boldsymbol{\varepsilon}}^{2} \mathbf {X} ^{\mathsf {T}} \mathbf {X}) \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \; \text{by spherical errors (homoscedasticity)} \\ \; & = \sigma_{\boldsymbol{\varepsilon}}^{2}\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \end{aligned} }$$
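The unbiasedness and covariance corollaries can be checked by simulation; a minimal sketch (assumed fixed design, repeated sampling of the noise only):
import numpy as np
rng = np.random.default_rng(1)
n, sigma = 200, 1.5
X = np.c_[np.ones(n), rng.normal(size=(n, 3))] # fixed (nonstochastic) design
beta = np.array([1.0, 2.0, -1.0, 0.5])
betas = []
for _ in range(2000): # repeated sampling of the noise only
    y = X @ beta + rng.normal(scale=sigma, size=n)
    betas.append(np.linalg.solve(X.T @ X, X.T @ y)) # beta_hat = (X'X)^{-1} X'y
betas = np.array(betas)
print(betas.mean(axis=0)) # ~ beta (unbiasedness)
print(np.cov(betas, rowvar=False)) # ~ sigma^2 (X'X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))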
$${\displaystyle \textit{Projection Summary} }$$ $${\displaystyle \begin{align} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} = \mathbf {X}^{+} \mathbf {y} \quad \sim \text{equivalent to MLE} \\ \; & \left( \approx \frac{\operatorname{Cov}[\mathbf{X}, \mathbf{y}]}{\operatorname{Var}[\mathbf{X}]} = \frac{\sigma_{\mathbf{X},\mathbf{y}}}{\sigma_{\mathbf{X}}^{2}} = \frac{\rho_{\mathbf{X},\mathbf{y}}\sigma_{\mathbf{X}}\sigma_{\mathbf{y}}}{\sigma_{\mathbf{X}}^{2}} = \rho_{\mathbf{X},\mathbf{y}}\frac{\sigma_{\mathbf{y}}}{\sigma_{\mathbf{X}}} \right) \\ {\hat {\mathbf {y}}} &= \mathbf {H} \mathbf {y} \qquad \left( h_{i'i}={\boldsymbol {x}}_{i'}^{\top }\left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}{\boldsymbol {x}}_{i} \approx {\frac {\operatorname {Cov} \left[{\hat {y}}_{i'},y_{i}\right]}{\operatorname {Var} \left[y_{i}\right]}} \right) \\ {\hat{\boldsymbol {\varepsilon }}} &= \mathbf {M} \mathbf {y} = \mathbf {M} \left( \mathbf {X} {\boldsymbol {\beta }} + {\boldsymbol {\varepsilon }} \right) = \mathbf {M}{\boldsymbol {\varepsilon }} \\ \end{align} }$$ $${\displaystyle \begin{cases} \; & \mathbf {X}^{+} & \equiv \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}}\\ \; & \mathbf {H} & \equiv \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}} = \mathbf {X} \mathbf {X}^{+} \\ \; & \mathbf {M} & \equiv \mathbf {I} - \mathbf {H} = \mathbf {I} - \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}} = \mathbf {I} - \mathbf {X} \mathbf {X}^{+} \\ \; & {\hat {\mathbf {y}}} & \equiv \mathbf {X} {\hat {\boldsymbol {\beta }}} = \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}} \mathbf {y} = \mathbf {X} \mathbf {X}^{+} \mathbf {y} = \mathbf {H} \mathbf {y} \\ \; & {\hat{\boldsymbol {\varepsilon }}} & \equiv \mathbf {y} - {\hat {\mathbf {y}}} = \mathbf {y} - \mathbf {H}\mathbf {y} = \left( \mathbf {I} - \mathbf {H} \right) \mathbf {y} = \mathbf {M}\mathbf {y} \\ \end{cases} }$$ $${\displaystyle \begin{align} {\hat{\boldsymbol {\varepsilon }}} & : \text{Residual}({\boldsymbol {\epsilon }}) \\ \mathbf {X}^{+} & : \text{Pseudoinverse Matrix, Moore–Penrose Inverse Matrix} \\ \mathbf {H} & : \text{Projection Matrix, Influence} \\ \mathbf {M} & : \text{Residual Maker Matrix} \\ \mathbf {I} & : \text{Identity Matrix} \\ \end{align} }$$ $${\displaystyle \begin{aligned} \hat{\mathbf{y}} = \mathbf{H} \mathbf{y} \\ \left \langle \hat{\mathbf{y}} \right | = \left \langle \mathbf{y} \right | \mathbf{H} \\ \left \langle \hat{\mathbf{y}} | \mathbf{y} \right \rangle = \left \langle \mathbf{y} | \mathbf{H} | \mathbf{y} \right \rangle \\ \text{since } \mathbf{H} \text{ is symmetric and idempotent,}\quad \therefore \; \left \langle \hat{\mathbf{y}} | \mathbf{y} \right \rangle = \left \langle \mathbf{y} | \mathbf{H} | \mathbf{y} \right \rangle = \left \langle \hat{\mathbf{y}} | \hat{\mathbf{y}} \right \rangle \\ \end{aligned} }$$ $${\displaystyle \text{Leverage Property} \; \begin{cases} \; & 0 \le h_{ii} \le 1 \\ \; & \sum _{i=1}^{n}h_{ii}=\operatorname {Tr} (\mathbf {H} )=\operatorname {Tr} \left(\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\mathbf {X} ^{\top }\right)=\operatorname {Tr} \left(\mathbf {X} ^{\top }\mathbf {X} \left(\mathbf {X} ^{\top }\mathbf {X} \right)^{-1}\right)=\operatorname {Tr} (\mathbf {I} _{(p+1)})=(p+1) \\ \; & \operatorname E[h_{diagonal}] = (p+1)/n \\ \end{cases} }$$ $${\displaystyle \begin{align} symmetric & : \quad 
\mathbf{M}^{T} = \mathbf{M} \quad && \And \quad \mathbf{H}^{T} = \mathbf{H} \\ idempotent & : \quad \mathbf{M}^{2} = \mathbf{M} \quad && \And \quad \mathbf{H}^{2} = \mathbf{H} \\ orthogonal & : \quad \mathbf{M} \mathbf{H} = 0 \quad && \And \quad \mathbf{M} \mathbf{X} = 0 \\ projection & : \quad \mathbf{H} \mathbf{X} = \mathbf{X} \quad && \; \\ \end{align} }$$ Assumptions of Classical Linear Regression Model $${\displaystyle \textit{Assumptions} }$$ $${\displaystyle \begin{align} \text{Strict exogeneity } & : \operatorname {E} [\,{\boldsymbol{\varepsilon}} \mid \mathbf{X}\,] = 0 \\ \text{No linear dependence } & : \Pr \!{\big [}\,\operatorname {rank} (\mathbf{X}) = (p+1)\,{\big ]} =1 \\ \text{Spherical errors(Homoscedasticity) } & : \operatorname {Var} [\,{\boldsymbol{\varepsilon}} \mid \mathbf{X}\,] = \sigma ^{2}\mathbf{I}_{n} \\ \text{Normality } & : {\boldsymbol{\varepsilon}} \mid \mathbf{X} \sim {\mathcal {N}}(0,\sigma ^{2}\mathbf{I}_{n}) \\ \; & \; \\ \end{align} }$$ $${\displaystyle \textit{Corollaries} }$$ $${\displaystyle \operatorname {E}[\varepsilon] = 0 \quad \And \quad \operatorname {E}[ \mathbf{X}^{T}\varepsilon] = 0 }$$ $${\displaystyle \mathbf{y} \mid \mathbf{X} \sim {\mathcal {N}}(\mathbf{X} {\boldsymbol{\beta}}, \sigma ^{2}\mathbf{I}_{n}) \quad \And \quad {\hat{\boldsymbol{\beta}}} \mid \mathbf{X} \sim {\mathcal {N}} \left( {\boldsymbol{\beta}}, \sigma ^{2} (\mathbf{X}^{T} \mathbf{X})^{-1} \right) \quad \And \quad {\hat{\boldsymbol {\varepsilon }}} \mid \mathbf{X} \sim {\mathcal {N}} \left(0, \sigma^{2} (\mathbf{I}_{n} - \mathbf{H}) \right) }$$ $${\displaystyle \frac{\hat{\beta_{j}} - \beta_{j} }{ \sigma \sqrt{(\mathbf{X}^{T}\mathbf{X})^{-1}_{jj} }} \mid \mathbf{X} \sim {\mathcal {N}}(0, 1^{2}) \quad \And \quad \frac{\hat{\beta_{j}} - \beta_{j} }{ s_{\alpha} \sqrt{(\mathbf{X}^{T}\mathbf{X})^{-1}_{jj \mid \alpha = n-(p+1)} }} \mid \mathbf{X} \sim {t}_{\alpha = n-(p+1) }, \; \left( \frac{\hat{\beta_{j}} - \beta_{j} }{ s_{\alpha} \sqrt{(\mathbf{X}^{T}\mathbf{X})^{-1}_{jj \mid \alpha = n-(p+1)} }} \right)^{2} \mid \mathbf{X} \sim F_{1, \alpha = n-(p+1)} }$$ $${\displaystyle \because \; \frac{\alpha s_{\alpha_{=num-1}}^{2}}{\sigma^{2}} \; \sim \; \chi^{2}_{\alpha_{=num-1}} \quad \text{for} \; \alpha s_{\alpha_{=num-1}}^{2} \equiv \sum_{i=1}^{num} (X_{i} - \overline{X} )^{2} \; \text{with} \; X \sim {\mathcal {N}}(*, \sigma^{2}) }$$ Stochastic Regressor Properties by Assumption $${\displaystyle \begin{aligned} \text{coefficient mean vector with unbiasedness for } {\boldsymbol {\beta}} & \qquad \operatorname {E} [{\hat{{\boldsymbol {\beta}}}}] = {\boldsymbol {\beta}} \\ \text{coefficient covariance matrix} & \qquad \operatorname {Var} [{\hat{{\boldsymbol {\beta}}}}] = \sigma^{2} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)_{p+1}^{-1} \simeq s_{total}^{2} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)_{p+1}^{-1} \equiv \mathbf{S}_{explained}^{2} \\ \text{noise covariance matrix} & \qquad \operatorname {Var} [{\boldsymbol {\varepsilon}} ] = \sigma^{2}\mathbf{I}_{n} \simeq s_{total}^{2}\mathbf{I}_{n} \equiv \mathbf{S}_{total}^{2} \\ \text{residual covariance matrix} & \qquad \operatorname {Var} [{\boldsymbol {\epsilon}} ] = \sigma^{2}(\mathbf{I}_{n} - \mathbf{H}) \simeq s_{total}^{2}(\mathbf{I}_{n} - \mathbf{H}) \equiv \mathbf{S}_{residual}^{2} \\ \; & \; \\ \end{aligned} }$$ $${\displaystyle \text{Population variances}}$$ $${\displaystyle \begin{align} \text{noise variance } & \qquad \operatorname {Var} \left[ \varepsilon \right] = \sigma^{2} \\ \text{studentized residual 
variance } & \qquad \operatorname {Var} \left[ \epsilon_{studentized,i} \equiv \frac{\epsilon_{i}}{s_{total} \sqrt{1-h_{ii}} } \right] \approx 1^{2} \\ \; & \qquad \operatorname {Var} \left[ s_{total}\epsilon_{studentized,i}=\frac{\epsilon_{i}}{\sqrt{1-h_{ii}} } \right] \approx \sigma^{2} \\ \; & \; \\ \end{align} }$$ $${\displaystyle \text{Sample variances}}$$ $${\displaystyle \begin{align} \text{total sample variance with unbiasedness for } \sigma^{2} & \qquad \operatorname {E} \left[s_{total}^{2} \right] = \operatorname {Var} \left[\varepsilon \right] = \sigma^{2} \\ \text{residual sample variance with biasedness for } \sigma^{2} & \qquad \operatorname {E} \left[s_{residual}^{2} \right] = \operatorname {Var} \left[\epsilon \right] \approx \sigma^{2}(1-h_{ii}) \\ \; & \; \\ \end{align} }$$ $${\displaystyle \text{ Decision boundary for outliers}}$$ $${\displaystyle \begin{align} \; \text{Cook's Distance } &: D_{i} = \frac{r_{i}^{2}}{RSS} \left[ \frac{h_{ii}}{(1-h_{ii})^{2}} \right] \\ \; \text{Fox's Recommendation } &: D_{i} \ge \frac{4}{N - K -1} \\ \; & \; \\ \end{align} }$$ $${\displaystyle \begin{align} \text{(z-statistic) endog-variable confidence interval:}\quad & \bar{y} \pm Z_{1-\alpha/2} \times \sqrt{ \frac{\sigma^{2}}{n}} \\ \text{(t-statistic) endog-variable confidence interval:}\quad & \bar{y} \pm t_{1-\alpha/2, n-1} \times \sqrt{ \frac{s_{total}^{2}}{n}} \\ \text{slope confidence interval:}\quad & \hat{\beta}_{j\neq0} \pm t_{1-\alpha/2, n-(p+1)}\times \sqrt{MS_{residual} \times \left( \frac{1}{\sum(x_i-\bar{x})^2} \right)} \\ \text{bias confidence interval:}\quad & \hat{\beta}_0 \pm t_{1-\alpha/2, n-(p+1)} \times \sqrt{MS_{residual} \times \left( \dfrac{1}{n}+\dfrac{\bar{x}^2}{\sum(x_i-\bar{x})^2} \right) } \\ \text{fitted-value confidence interval:}\quad & \hat{y}_i^{*} \pm t_{1-\alpha/2, n-(p+1)} \times \sqrt{MS_{residual} \times \left(\dfrac{1}{n} + \dfrac{(x_i^{*}-\bar{x})^2}{\sum(x_i-\bar{x})^2}\right)} \\ \text{prediction interval:}\quad & \hat{y}_i^{*} \pm t_{1-\alpha/2, n-(p+1)} \times \sqrt{MS_{residual} \times \left(1+ \frac{1}{n} + \dfrac{(x_i^{*}-\bar{x})^2}{\sum(x_i-\bar{x})^2}\right)} \\ \end{align} }$$ Analysis of Variance $${\displaystyle \text{Fixed Effect Model} }$$ $${\displaystyle \begin{aligned} \operatorname{Var} [\mathbf{y}] = \operatorname{Var} [\mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}] &= \boldsymbol{\beta}^{T}\operatorname{Var} [\mathbf{X}]\boldsymbol{\beta} + \operatorname{Var} [\boldsymbol{\varepsilon}] + 2 \operatorname{Cov} [\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\varepsilon}] \\ \; =\operatorname{Var} [\mathbf{X}\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}] &= \operatorname{Var} [\mathbf{X}\hat{\boldsymbol{\beta}}] + \operatorname{Var} [\boldsymbol{\epsilon}] + 2\operatorname{Cov} [\mathbf{X}\hat{\boldsymbol{\beta}}, \boldsymbol{\epsilon}]\\ \; =\operatorname{Var} [\hat{\mathbf{y}} + \boldsymbol{\epsilon}] &= \operatorname{Var} [\hat{\mathbf{y}}] + \operatorname{Var} [\boldsymbol{\epsilon}] + 2\underbrace{\operatorname{Cov} [\hat{\mathbf{y}}, \boldsymbol{\epsilon}]}_{\text{Perpendicularity}} \\ \end{aligned} }$$
$${\displaystyle \begin{align} \underbrace{\left(\sum\limits_{i=1}^{n}(y_i-\bar{y})^2\right)}_{\underset{\text{Total Sum of Squares}}{\text{TSS}}} &= \underbrace{\sum\limits_{i=1} ^{n} \left( \hat{y}_{i} - \overline{y} \right)^{2}}_{\underset{\text{Explained Sum of Squares/Regression Sum of Squares}}{\text{ESS(SSR)}}} + \underbrace{\sum\limits_{i=1} ^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}_{\underset{\text{Residual Sum of Squares/Error Sum of Squares}}{\text{RSS(SSE)}}} \\ \end{align} }$$ $${\displaystyle \begin{aligned} \left\langle \mathbf{y} | \mathbf{y} \right\rangle & = \left\langle {\hat{\mathbf{y}}} | {\hat{\mathbf{y}}} \right\rangle + \left\langle{\boldsymbol {\epsilon }} | {\boldsymbol {\epsilon }} \right\rangle &\quad TSS & = ESS + RSS \\ 1 & = \frac{\left\langle {\hat{\mathbf{y}}} | {\hat{\mathbf{y}}} \right\rangle}{\left\langle \mathbf{y} | \mathbf{y} \right\rangle} + \frac{\left\langle{\boldsymbol {\epsilon }} | {\boldsymbol {\epsilon }} \right\rangle}{\left\langle \mathbf{y} | \mathbf{y} \right\rangle} &\quad 1 & = r^{2} + k^{2} \\ \end{aligned} }$$ $${\displaystyle \begin{cases} \mathrm{TSS} & \equiv \sum _{i=1}^{n} \left( y_{i} - \overline{y} \right)^{2} &&= \Vert \mathbf{y} - \overline{\mathbf{y}} \Vert^{2} &&& = SS_{\text{total}} \\ \mathrm{ESS}(\text{Between Groups, SSB}) & \equiv \sum _{i=1}^{n} \left( {\hat{y}}_{i} - \overline{y} \right)^{2} &&= \Vert \hat{\mathbf{y}} - \overline{\mathbf{y}} \Vert^{2} &&&= SS_{\text{explained}} \\ \mathrm{RSS}(\text{Within Groups, SSW}) & \equiv \sum _{i=1}^{n} \left( y_{i} - {\hat{y}}_{i} \right)^{2} &&= \Vert \mathbf{y} - \hat{\mathbf{y}} \Vert^{2} = {\boldsymbol {\epsilon }}^{T} {\boldsymbol {\epsilon }} &&&= SS_{\text{residual}} \\ \end{cases} }$$ $${\displaystyle \begin{aligned} SS_{\text{total}} & = \sum _{i=1}^{n} \left( y_{i} - \overline{y} \right)^{2} && = MS_{\text{total}} \underbrace{\left( n - 1 \right)}_{\text{df}_{\text{total}}} && \equiv s_{total}^{2} \left( n - 1 \right) \\ SS_{\text{explained}} & = \sum _{i=1}^{n} \left( {\hat{y}}_{i} - \overline{y} \right)^{2} &&= MS_{\text{explained}} \underbrace{\left( (p+1) - 1 \right)}_{\text{df}_{\text{explained}}} && \equiv s_{explained}^{2}(n-1) \\ SS_{\text{residual}} & = \sum _{i=1}^{n} \left( y_{i} - {\hat{y}}_{i} \right)^{2} &&= \underbrace{MS_{\text{residual}}}_{\text{prediction interval-square}} \underbrace{\left( n - (p+1) \right)}_{\text{df}_{\text{residual}}} && \equiv s_{residual}^{2}(n-1) \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} MSE& \equiv MS_{residual} = \dfrac{\sum(y_i-\hat{y}_i)^2}{n-(p+1)}=\dfrac{RSS}{n-(p+1)} \\ MSR& \equiv MS_{explained} = \dfrac{\sum(\hat{y}_i-\bar{y})^2}{(p+1)-1}=\dfrac{ESS}{(p+1)-1} \\ \end{aligned} }$$
$${\displaystyle \begin{aligned} \text{Source} & \quad \text{Degree of Freedom} && \quad SS && \quad MS && \quad F \\ \hline \\ \text{total} & \quad n-1 && \quad SS_{\text{total}} && \quad \; && \quad \\ \text{explained} & \quad (p+1) - 1 && \quad SS_{\text{explained}} && \quad SS_{\text{explained}} / \text{df}_{\text{explained}} && \quad MS_{\text{explained}} / MS_{\text{residual}}\\ \text{residual} & \quad n-(p+1) && \quad SS_{\text{residual}} && \quad SS_{\text{residual}} / \text{df}_{\text{residual}} && \quad \\ \hline \\ \end{aligned} }$$ $${\displaystyle \text{The centered total sum of squares for the case with an intercept} }$$ $${\displaystyle {\begin{aligned} \sum _{i=1}^{n}(y_{i}-{\overline {y}})^{2}&=\sum _{i=1}^{n}(y_{i}-{\overline {y}}+{\hat {y}}_{i}-{\hat {y}}_{i})^{2}=\sum _{i=1}^{n}(({\hat {y}}_{i}-{\bar {y}})+\underbrace {(y_{i}-{\hat {y}}_{i})} _{\epsilon_{i}})^{2}\\ &=\sum _{i=1}^{n}(({\hat {y}}_{i}-{\bar {y}})^{2}+2\epsilon_{i}({\hat {y}}_{i}-{\bar {y}})+\epsilon_{i}^{2})\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}({\hat {y}}_{i}-{\bar {y}})\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}({\hat {\beta }}_{0}+{\hat {\beta }}_{1}x_{i1}+\cdots +{\hat {\beta }}_{p}x_{ip}-{\overline {y}})\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2} + {\color{Red} \underbrace{ 2({\hat {\beta }}_{0}-{\overline {y}})\sum _{i=1}^{n}\epsilon_{i}} _{0 \,(\text{centered term with intercept})}} +2{\hat {\beta }}_{1}\underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{i1}} _{0}+\cdots +2{\hat {\beta }}_{p} \underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{ip}} _{0}\\&=\sum _{i=1}^{n}({\hat {y}}_{i}-{\bar {y}})^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2} =\sum _{i=1}^{n} {\color{Red} \underbrace{ \left((\sum _{j=0}^{p}X_{ij}{\hat \beta_{j}}) -{\bar {y}}\right)^{2} }_{\text{degree of freedom: }(p+1)-1} }+ \underbrace{ \sum _{i=1}^{n}\epsilon_{i}^{2} }_{\text{degree of freedom: } n - (p+1)} \\ &= \mathrm {ESS} +\mathrm {RSS} \\ \end{aligned}} }$$ $${\displaystyle \text{The uncentered total sum of squares for the case without an intercept} }$$ $${\displaystyle {\begin{aligned} \sum _{i=1}^{n} y_{i}^{2}&=\sum _{i=1}^{n}(y_{i}+{\hat {y}}_{i}-{\hat {y}}_{i})^{2}=\sum _{i=1}^{n}({\hat {y}}_{i}+\underbrace {(y_{i}-{\hat {y}}_{i})} _{\epsilon_{i}})^{2}\\ &=\sum _{i=1}^{n}({\hat {y}}_{i}^{2}+2\epsilon_{i}{\hat {y}}_{i}+\epsilon_{i}^{2})\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}{\hat {y}}_{i}\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}+2\sum _{i=1}^{n}\epsilon_{i}({\hat {\beta }}_{1}x_{i1}+\cdots +{\hat {\beta }}_{p}x_{ip})\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2} + 2{\hat {\beta }}_{1}\underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{i1}} _{0}+\cdots +2{\hat {\beta }}_{p}\underbrace {\sum _{i=1}^{n}\epsilon_{i}x_{ip}} _{0}\\ &=\sum _{i=1}^{n}{\hat {y}}_{i}^{2}+\sum _{i=1}^{n}\epsilon_{i}^{2}=\mathrm {ESS} +\mathrm {RSS} \\ \end{aligned}} }$$
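A small numerical check of the two decompositions (assumed toy data): with an intercept the residuals sum to zero and the centered TSS splits into ESS + RSS; without an intercept only the uncentered decomposition holds.
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 5 + 2*x + rng.normal(size=300)
with_const = sm.OLS(y, sm.add_constant(x)).fit()
no_const = sm.OLS(y, x[:, None]).fit()
print(with_const.resid.sum()) # ~ 0: residuals sum to zero when an intercept is included
print(with_const.centered_tss, with_const.ess + with_const.ssr) # centered TSS = ESS + RSS
print(no_const.uncentered_tss, no_const.ess + no_const.ssr) # uncentered TSS = ESS + RSS without an intercept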
Bias-Variance Trade-Off $${\displaystyle \text{Theoretical Assumption}:\quad y=f(x)+\varepsilon }$$ $${\displaystyle \begin{aligned} {\text{MSE}}&\triangleq \operatorname {E} {\big [}(y-{\hat {f}})^{2}{\big ]} \\ &=\operatorname {E} {\big [}y^{2}-2y{\hat {f}}+{\hat {f}}^{2}{\big ]}=\underbrace{\operatorname {E} {\big [}y^{2}{\big ]}}_{(1)}-2\underbrace{\operatorname {E} {\big [}y{\hat {f}}{\big ]}}_{(2)}+ \underbrace{\operatorname {E} {\big [}{\hat {f}}^{2}{\big ]}}_{(3)} \\ &=f^{2}+\sigma ^{2}-2f\operatorname {E} [{\hat {f}}]+\operatorname {Var} [{\hat {f}}]+\operatorname {E} [{\hat {f}}]^{2}\\&=(f-\operatorname {E} [{\hat {f}}])^{2}+\sigma ^{2}+\operatorname {Var} {\big [}{\hat {f}}{\big ]}\\[5pt]&=\operatorname {Bias} [{\hat {f}}]^{2}+\operatorname {Var} {\big [}{\hat {f}}{\big ]} +\sigma ^{2} \\ \end{aligned} }$$ $${\displaystyle {\begin{aligned} (1):\quad \operatorname {E} {\big [}y^{2}{\big ]}&=\operatorname {E} {\big [}(f+\varepsilon )^{2}{\big ]}\\&=\operatorname {E} [f^{2}]+2\operatorname {E} [f\varepsilon ]+\operatorname {E} [\varepsilon ^{2}]&&{\text{by linearity of }}\operatorname {E} \\&=f^{2}+2f\operatorname {E} [\varepsilon ]+\operatorname {E} [\varepsilon ^{2}]&&{\text{since }}f{\text{ does not depend on the data}}\\&=f^{2}+2f\cdot 0+\sigma ^{2}&&{\text{since }}\varepsilon {\text{ has zero mean and variance }}\sigma ^{2} \end{aligned}} }$$ $${\displaystyle {\begin{aligned} (2):\quad \operatorname {E} {\big [}y{\hat {f}}{\big ]}&=\operatorname {E} {\big [}(f+\varepsilon ){\hat {f}}{\big ]}\\&=\operatorname {E} [f{\hat {f}}]+\operatorname {E} [\varepsilon {\hat {f}}]&&{\text{by linearity of }}\operatorname {E} \\&=\operatorname {E} [f{\hat {f}}]+\operatorname {E} [\varepsilon ]\operatorname {E} [{\hat {f}}]&&{\text{since }}{\hat {f}}{\text{ and }}\varepsilon {\text{ are independent}}\\&=f\operatorname {E} [{\hat {f}}]&&{\text{since }}\operatorname {E} [\varepsilon ]=0 \end{aligned}} }$$ $${\displaystyle {\begin{aligned} (3):\quad \operatorname {E} {\big [}{\hat {f}}^{2}{\big ]}&=\operatorname {Var} ({\hat {f}})+\operatorname {E} [{\hat {f}}]^{2}&&{\text{since }}\operatorname {Var} [X]\triangleq \operatorname {E} {\Big [}(X-\operatorname {E} [X])^{2}{\Big ]}=\operatorname {E} [X^{2}]-\operatorname {E} [X]^{2}{\text{ for any random variable }}X \end{aligned}} }$$
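A minimal simulation of the decomposition (assumed setup: a quadratic f estimated at a single point by a misspecified straight-line fit), comparing the Monte Carlo MSE with Bias^2 + Var + sigma^2:
import numpy as np
rng = np.random.default_rng(3)
f = lambda x: x**2 # true regression function
sigma = 0.5 # noise standard deviation
x_train = np.linspace(-1, 1, 30)
x0 = 0.8 # evaluation point
preds, sq_errors = [], []
for _ in range(5000):
    y_train = f(x_train) + rng.normal(scale=sigma, size=x_train.size)
    b1, b0 = np.polyfit(x_train, y_train, deg=1) # misspecified straight-line fit
    f_hat = b0 + b1*x0
    preds.append(f_hat)
    sq_errors.append((f(x0) + rng.normal(scale=sigma) - f_hat)**2) # (y - f_hat)^2 at x0
preds = np.array(preds)
bias2 = (preds.mean() - f(x0))**2
variance = preds.var()
print(np.mean(sq_errors), bias2 + variance + sigma**2) # MSE ~ Bias^2 + Var + sigma^2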
Model Evaluation $${\displaystyle {\begin{aligned} \text{Coefficient of determination:} &\quad r^{2} && \equiv 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = \frac{\mathrm{ESS}}{\mathrm{TSS}} \\ \text{Coefficient of alienation:} &\quad k^{2} && \equiv \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - r^{2} \\ \text{F-test statistic:} &\quad F && \equiv \frac{\mathrm{ESS}}{(p+1)-1} \div \frac{\mathrm{RSS}}{n-(p+1)} \\ \; & \; &&= \frac{\mathrm{ESS}/[(p+1)-1]}{\mathrm{RSS}/[n-(p+1)]} \\ \; & \; &&= \underbrace{ \frac{\mathrm{MSR}}{\mathrm{MSE}} \sim \mathbf{F}((p+1)-1, n-(p+1))}_{\text{treatment effect}} \\ \; & \; && \; \\ \end{aligned}} }$$
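A short tie-in to the OLS summary table (assumed toy regression): the coefficient of determination and the F statistic with its p-value, recomputed from ESS and RSS, agree with result.rsquared, result.fvalue, and result.f_pvalue.
import numpy as np
import statsmodels.api as sm
from scipy import stats
rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(size=200)
result = sm.OLS(y, X).fit()
rsquared = result.ess / result.centered_tss # ESS/TSS = 1 - RSS/TSS
fvalue = (result.ess / result.df_model) / (result.ssr / result.df_resid) # MSR / MSE
f_pvalue = stats.f.sf(fvalue, result.df_model, result.df_resid)
print(rsquared, result.rsquared, fvalue, result.fvalue, f_pvalue, result.f_pvalue) # each pair agrees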
Implementation: numpy
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from statsmodels.stats.outliers_influence import variance_inflation_factor
bias = 10.0
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
X = np.c_[np.ones((X.shape[0], 1)), X]
coef = [bias] + coef.tolist()
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
unbiased_coef = np.linalg.inv(X.T @ X) @ X.T @ y
display(pd.DataFrame(data=dict(coef=coef, unbiased_coef=unbiased_coef)))
H = X@np.linalg.inv(X.T@X)@X.T # projection matrix
h = np.diag(H) # matrix diagonal: influence hat
M = np.eye(X.shape[0]) - H # residual marker matrix
nobs = X.shape[0]
rank = X.shape[1]
outlier_index = h > 3*(rank / nobs)
outliers = np.c_[X, y][outlier_index]
prediction = lambda X: X@unbiased_coef
y_pred = prediction(X)
y_true = y
residuals = y_true - y_pred
studentized_residuals = residuals / (y_true.std(ddof=1)*np.sqrt(1-h))
intercept = True
if intercept:
    # [centered case with intercept]
    # tss = rss + ess
    tss = ((y_true - y_true.mean())**2).sum() # result.centered_tss ~ ((model.endog - model.endog.mean())**2).sum()
    ess = ((y_pred - y_true.mean())**2).sum() # result.ess ~ ((result.fittedvalues - model.endog.mean())**2).sum()
    rss = ((y_true - y_pred)**2).sum() # result.ssr ~ (result.resid**2).sum()
else:
    # [uncentered case without intercept]
    # tss = rss + ess
    tss = (y_true**2).sum() # result.uncentered_tss ~ (model.endog**2).sum()
    ess = (y_pred**2).sum() # result.ess ~ (result.fittedvalues**2).sum()
    rss = ((y_true - y_pred)**2).sum() # result.ssr ~ (result.resid**2).sum()
rsquared = 1 - (rss/tss)
# normality test
stat_normality, p_normality = stats.shapiro(residuals) # Shapiro-Wilk normality test on raw residuals
stat_normality_stud, p_normality_stud = stats.shapiro(studentized_residuals) # Shapiro-Wilk normality test on studentized residuals
stats.probplot(studentized_residuals, plot=plt) # qqplot
Implementation: sklearn
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from statsmodels.stats.outliers_influence import variance_inflation_factor
bias = 10.0
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
coef = [bias] + coef.tolist()
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
model = LinearRegression(fit_intercept=True)
#model = Lasso(alpha=0.01)
#model = Ridge(alpha=0.01)
#model = ElasticNet(alpha=0.01, l1_ratio=0.5)
model = model.fit(X, y)
unbiased_coef = [model.intercept_] + model.coef_.tolist()
display(pd.DataFrame(data=dict(coef=coef, unbiased_coef=unbiased_coef)))
y_pred = model.predict(X)
y_true = y
Implementation: statsmodels (OLS class)
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# data
bias = 10.0
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=5, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
X = np.c_[np.ones((X.shape[0], 1)), X]
coef = [bias] + coef.tolist()
data = pd.DataFrame(np.c_[X, y], columns= ['const']+[ 'X'+str(i) for i in range(1, X.shape[1])]+['Y'])
# multicollinearity
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
sm.graphics.plot_partregress('Y', 'X1', data.columns.difference(['Y', 'X1'], sort=False).tolist(), data=data, obs_labels=False, ret_coords=False) # endog first, then the exog of interest
# training
model = sm.OLS(endog=data['Y'], exog=data[data.columns.difference(['Y'], sort=False)])
result = model.fit()
display(result.summary()) # null hypothesis: coefficient beta = 0
# result visualization
sm.graphics.plot_ccpr(result, 'X1')
sm.graphics.plot_regress_exog(result, 'X1')
# prediction
y_pred = result.predict(X) # result.fittedvalues
y_true = model.endog
# Coefficient Estimation
result.t_test('const = 9.5')
result.t_test('X1 = 39.4')
result.t_test('X3 = 39.4')
result.t_test('X5 = 39.8')
result.t_test('X1 = X2')
result.t_test('X4 = X1')
# Covariance Matrix of Model Coefficients
# If no argument is specified returns the covariance matrix of a model (scale)*(X.T X)^(-1)
# If contrast is specified it pre and post-multiplies as follows (scale) * r_matrix (X.T X)^(-1) r_matrix.T
# If contrast and other are specified returns (scale) * r_matrix (X.T X)^(-1) other.T
# If column is specified returns (scale) * (X.T X)^(-1)[column,column] if column is 0d OR (scale) * (X.T X)^(-1)[column][:,column] if column is 1d
cov_matrix = result.cov_params()
# Projection Properties
influence = result.get_influence()
leverage = influence.hat_matrix_diag
cooks_d, pvals = influence.cooks_distance
# result.resid / result.resid.std(ddof=int(result.model.rank))
studentized_residual = result.outlier_test()['student_resid']
#estimated_internal_noise_deviation = np.sqrt((result.resid**2).sum() / (result.nobs - model.rank))
#studentized_internal_residual = result.resid / ((estimated_internal_noise_deviation)*np.sqrt(1 - h_ii))
studentized_internal_residual = influence.resid_studentized_internal
studentized_external_residual = influence.resid_studentized_external
# pearson_residual normality
# result.resid / np.sqrt(result.mse_resid * (1 - result.get_influence().hat_matrix_diag))
# result.resid / (result.resid.std(ddof=int(model.rank))*np.sqrt(1 - h_ii))
pearson_residual = result.resid_pearson
sm.qqplot(pearson_residual, line='45') # residual normality
# outliers1
h_ii = influence.hat_matrix_diag
outlier_index = h_ii > 3*(model.rank/model.nobs)
outliers = np.c_[X, y][outlier_index]
sm.graphics.plot_leverage_resid2(result) # residual & leverage
# outliers2
K = influence.k_vars # model.rank
cooks_d2, pvals = influence.cooks_distance
outlier_boundary = 4 / (model.nobs - K - 1)
outlier_index = cooks_d2 > outlier_boundary
outliers = np.c_[X, y][outlier_index]
sm.graphics.influence_plot(result, plot_alpha=0.3) # Cook’s distance
# Core Parameters
intercept = True
tss = result.centered_tss if intercept else result.uncentered_tss
ess = result.ess
rss = result.ssr # ((result.resid)**2).sum()
rsquared = result.rsquared # ~ ess/tss # result.ess/result.centered_tss or result.ess/result.uncentered_tss
df_model = result.df_model # model.rank - model.k_constant
df_resid = result.df_resid # result.nobs - model.rank
mse_tss = result.mse_total # result.centered_tss / (result.df_model + result.df_resid) or result.uncentered_tss / (result.df_model + result.df_resid)
mse_ess = result.mse_model # result.ess / result.df_model
mse_rss = result.mse_resid # result.ssr / result.df_resid
fvalue = result.fvalue # result.mse_model / result.mse_resid
# Additional Parameters
summary_scalar_table = dict()
summary_scalar_table['Log-Likelihood'] = result.llf
summary_scalar_table['R-squared'] = result.rsquared
summary_scalar_table['Adj. R-squared'] = result.rsquared_adj
summary_scalar_table['F-statistic'] = result.fvalue
summary_scalar_table['Prob (F-statistic)'] = result.f_pvalue
summary_scalar_table['AIC'] = result.aic # -2*result.llf + 2*model.rank
summary_scalar_table['BIC'] = result.bic # -2*result.llf + np.log(result.nobs)*model.rank
summary_scalar_table['Omnibus'] = sm.stats.omni_normtest(result.resid).statistic
summary_scalar_table['Prob(Omnibus)'] = sm.stats.omni_normtest(result.resid).pvalue
summary_scalar_table['Jarque-Bera (JB)'] = sm.stats.jarque_bera(result.resid)[0]
summary_scalar_table['Prob(JB)'] = sm.stats.jarque_bera(result.resid)[1]
summary_scalar_table['Skew'] = sm.stats.jarque_bera(result.resid)[2]
summary_scalar_table['Kurtosis'] = sm.stats.jarque_bera(result.resid)[3]
summary_scalar_table['Cond. No.'] = result.condition_number
summary_scalar_table['No. Observations'] = result.nobs
summary_scalar_table['Df Residuals'] = result.df_resid # result.nobs - model.rank
summary_scalar_table['Df Model'] = result.df_model # model.rank - model.k_constant
summary_scalar_table['Rank'] = model.rank
summary_scalar_table['Constraint'] = model.k_constant
summary_scalar_table['tss'] = result.centered_tss # with intercept
summary_scalar_table['tss_'] = result.uncentered_tss # without intercept
summary_scalar_table['ess'] = result.ess
summary_scalar_table['rss'] = result.ssr # (result.resid**2).sum()
summary_scalar_table['mse_total'] = result.mse_total # (result.centered_tss or result.uncentered_tss) / ( result.df_model + result.df_resid )
summary_scalar_table['mse_model'] = result.mse_model # result.ess / result.df_model
summary_scalar_table['mse_resid'] = result.mse_resid # result.ssr / result.df_resid
summary_vector_table = dict()
summary_vector_table['coef'] = result.params.squeeze()
summary_vector_table['cov_matrix'] = result.cov_params()
summary_vector_table['y_true'] = y # > tss
summary_vector_table['y_pred'] = result.fittedvalues # > ess # result.predict(df[df.columns.difference(['Y'], sort=False)])
summary_vector_table['residual'] = result.resid # > rss
summary_vector_table['pearson_residual'] = result.resid_pearson
summary_vector_table['hat'] = result.get_influence().hat_matrix_diag
Implementation: statsmodels (from_formula; patsy formula)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
import statsmodels.api as sm
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
bias = 100
X, y, coef = make_regression(n_samples=3000, n_features=5, n_informative=3, n_targets=1, bias=bias, effective_rank=None, tail_strength=0.5, noise=50.0, shuffle=True, coef=True, random_state=np.random.randint(1000))
df = pd.DataFrame(data=np.c_[X, y], columns=list(map(lambda x: 'X'+str(x), range(1,6))) + ['Y'])
df.insert(0, 'const', 1.0)
coef = [bias] + coef.tolist()
vif = pd.Series(map(lambda i: variance_inflation_factor(X, i), range(X.shape[1])))
model = sm.OLS.from_formula("Y ~ X1 + X2 + X3 + X4 + X5 ", data=df) # smf.ols("Y ~ X1 + X2 + X3 + X4 + X5 ", data=df)
result = model.fit()
sm.graphics.plot_partregress_grid(result, fig=plt.figure(figsize=(30,10))) # partial regression
sm.graphics.plot_ccpr_grid(result, fig=plt.figure(figsize=(30,10))) # CCPR: component-component plus residual
h_ii = result.get_influence().hat_matrix_diag
outlier_index = h_ii > 3*(model.rank/model.nobs)
outliers = np.c_[X, y][outlier_index]
sm.qqplot(result.resid_pearson, line='45') # residual normality
display(result.summary())
evaluation = dict()
evaluation['rsquared'] = result.rsquared
evaluation['fvalue'] = result.fvalue
param_space = dict()
param_space['target'] = coef
param_space['naive'] = result.params.tolist()
param_space['ridge'] = model.fit_regularized(alpha=0.01, L1_wt=0).params.tolist() # Ridge Constraint: w ** 2
param_space['elastic'] = model.fit_regularized(alpha=0.01, L1_wt=.5).params.tolist() # Elastic Constraint: absolute w & w ** 2
param_space['lasso'] = model.fit_regularized(alpha=0.01, L1_wt=1).params.tolist() # Lasso Constraint: absolute w
# F-Test for Coefficient-Hypothesis
# sms.anova_lm(result, typ=2)
sms.anova_lm(sm.OLS.from_formula("Y ~ X1 + X2 + X3 + X4 + X5 ", data=df).fit(), typ=2)
# T-Test for Coefficient-Hypothesis
sm.OLS(endog=df['Y'], exog=df[df.columns.difference(['Y'], sort=False)]).fit().t_test('const = 0')
# heteroscedasticity
lm_stats, lm_pval, f_stats, f_pval = sms.het_breuschpagan(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
lm_stats, lm_pval, f_stats, f_pval = sms.het_white(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
stats, pval, _ = sms.het_goldfeldquandt(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
$${\displaystyle \begin{aligned} \text{OLS} \;&\begin{cases} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} \\ \; &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} , \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_{\boldsymbol{\varepsilon}}^{2}\mathbf{I}_{n})\\ \operatorname{E}[{\hat {\boldsymbol {\beta }}}] &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \operatorname{E}[ \mathbf {X} ^{\mathsf {T}}\boldsymbol{\varepsilon} ] = \boldsymbol{\beta} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}}] &= \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \operatorname{Var}[\mathbf{X}^{\mathsf {T}}\boldsymbol{\varepsilon} ] \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \; &= \sigma_{\boldsymbol{\varepsilon}}^{2} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1} \\ \end{cases} \\ \text{GLS} \;&\begin{cases} {\hat {\boldsymbol {\beta }}} &=\left(\mathbf {X}^{\mathsf {T}} \boldsymbol{\Omega}^{-1} \mathbf {X} \right)^{-1}\mathbf {X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {y} \\ \; &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon} , \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Omega})\quad \boldsymbol{\Omega}:\text{positive-definite covariance (diagonal for WLS)} \\ \operatorname{E}[{\hat {\boldsymbol {\beta }}}] &= \boldsymbol{\beta} + \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \operatorname{E}[ \mathbf {X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon} ] = \boldsymbol{\beta} \\ \operatorname{Var}[{\hat {\boldsymbol {\beta }}}] &= \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \operatorname{Var}[\mathbf{X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\boldsymbol{\varepsilon} ] \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \\ \; &= \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \underbrace{\mathbf{X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\operatorname{Var}[\boldsymbol{\varepsilon} ]\boldsymbol{\Omega}^{-\mathsf {T}} \mathbf{X} }_{= \mathbf{X}^{\mathsf {T}}\boldsymbol{\Omega}^{-1} \mathbf{X}}\left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \\ \; &= \left(\mathbf {X} ^{\mathsf {T}}\boldsymbol{\Omega}^{-1}\mathbf {X} \right)^{-1} \\ \end{cases} \\ \end{aligned} }$$
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import matplotlib.pyplot as plt
X = np.random.normal(0, 1, size=3000)
y = 3*X + np.random.normal(0, 1, size=3000)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit()
gls = sm.GLS(y, X, sigma=np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit()
regularized_ols = sm.OLS(y, X).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_gls = sm.GLS(y, X, sigma=np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
ols = sm.WLS(y, X, weights=1).fit()
wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit()
gls = sm.WLS(y, X, weights=1/np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit()
regularized_ols = sm.WLS(y, X, weights=1).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_wls = sm.WLS(y, X, weights=1/sm.OLS(y, X).fit().resid**2).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
regularized_gls = sm.WLS(y, X, weights=1/np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))).fit_regularized(method=['elastic_net', 'sqrt_lasso'][0], alpha=0.1, L1_wt=1.0, start_params=None, profile_scale=False, refit=False)
# prediction intervals
prstd, obs_ci_lower, obs_ci_upper = wls_prediction_std(ols)
prstd, obs_ci_lower, obs_ci_upper = wls_prediction_std(wls)
prstd, obs_ci_lower, obs_ci_upper = wls_prediction_std(gls)
ols.get_prediction().summary_frame(alpha=.05)
wls.get_prediction().summary_frame(alpha=.05)
gls.get_prediction().summary_frame(alpha=.05)
# visualization
fig, axes = plt.subplots(4, 1, figsize=(30,10))
axes[0].plot(ols.resid, label=f'OLS[{ols.resid.mean().round(6)}, {ols.resid.var(ddof=0).round(6)}]', alpha=.3)
axes[0].plot(wls.resid, label=f'WLS[{wls.resid.mean().round(6)}, {wls.resid.var(ddof=0).round(6)}]', alpha=.3)
axes[0].plot(gls.resid, label=f'GLS[{gls.resid.mean().round(6)}, {gls.resid.var(ddof=0).round(6)}]', alpha=.3)
axes[0].legend(loc='upper right', frameon=False); axes[0].set_title('Residuals')
axes[1].plot(X); axes[1].set_title('X')
axes[2].plot(y); axes[2].set_title('y')
axes[3].plot(y - 3*X); axes[3].set_title('e')
plt.tight_layout()
pd.DataFrame(np.c_[y, X, ols.fittedvalues, ols.resid, wls.fittedvalues, wls.resid, gls.fittedvalues, gls.resid], columns=['y_true','X','y_ols','resid_ols','y_wls','resid_wls','y_gls','resid_gls']).corr()
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
X = np.random.normal(0, 1, size=3000)
y = 3*X + np.random.normal(0, 1, size=3000)
ols_sm = sm.WLS(y, X, weights=1).fit()
ols_sklearn = LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y, sample_weight=1)
print(ols_sm.params, ols_sklearn.coef_)
weights = 1/sm.OLS(y, X).fit().resid**2
sample_weight = 1/(y - LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y).predict(X[:, np.newaxis]))**2
wls_sm = sm.WLS(y, X, weights=weights).fit()
wls_sklearn = LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y, sample_weight=sample_weight)
print(wls_sm.params, wls_sklearn.coef_)
weights = 1/np.diag(np.kron(sm.OLS(y, X).fit().resid.var(ddof=0), np.identity(y.shape[0])))
sample_weight = 1/np.diag(np.kron(((y - LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y).predict(X[:, np.newaxis]))**2).var(ddof=0), np.identity(y.shape[0])))
gls_sm = sm.WLS(y, X, weights=weights).fit()
gls_sklearn = LinearRegression(fit_intercept=False).fit(X[:, np.newaxis], y, sample_weight=sample_weight)
print(gls_sm.params, gls_sklearn.coef_)
Maximum Likelihood Estimation
$${\displaystyle \text{Likelihood}}$$ $${\displaystyle \begin{aligned} {\mathcal {L}}({\boldsymbol{\theta}} \mid \mathbf{y}) &= p (\mathbf{y} \mid {\boldsymbol{\theta}} ), \quad \mathbf{y} \sim {\mathcal {N}}(\mathbf{X}{\boldsymbol{\beta}}, \sigma^{2}\mathbf{I}_{n}) \\ \; &= \frac{1}{({2\pi \sigma^{2}})^{n/2}} \exp \left\{ - \frac{1}{2\sigma^{2}} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} )^{T}(\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}}) \right\} \\ \; & \; \\ \log {\mathcal {L}}({\boldsymbol{\theta}} \mid \mathbf{y}) &= -\frac{1}{2\sigma^{2}} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} )^{T} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} ) - \frac{n}{2}\log 2\pi - n \log \sigma \\ \end{aligned} }$$ $${\displaystyle \text{Estimation}}$$ $${\displaystyle \begin{aligned} {\underset {{\boldsymbol{\beta}} \in \mathbb {R}^{p+1} }{\operatorname {argmax} }}\, \log {\mathcal {L}}({\boldsymbol{\theta}} = {\boldsymbol{\beta}} \mid \mathbf{y}) &= {\underset {{\boldsymbol{\beta}} \in \mathbb {R}^{p+1} }{\operatorname {argmax} }}\, \left\{ -\frac{1}{2\sigma^{2}} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} )^{T} (\mathbf{y} - \mathbf{X}{\boldsymbol{\beta}} ) - \frac{n}{2}\log 2\pi - n \log \sigma \right\} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \frac{\partial}{\partial {\boldsymbol{\beta}}}\log {\mathcal {L}}({\boldsymbol{\theta}} = {\boldsymbol{\beta}} \mid \mathbf{y}) &= -\frac{1}{2\sigma^{2}} (2\mathbf{X}^{T}\mathbf{X} {\boldsymbol{\beta}} - 2\mathbf{X}^{T}\mathbf{y}) = 0 \\ \therefore\; {\hat {\boldsymbol {\beta }}} &\equiv {\boldsymbol{\beta}}_{MLE} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \\ {\hat{\mathbf{y}}} &\equiv \mathbf{X}{\hat {\boldsymbol {\beta }}} \\ \end{aligned} }$$
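As a quick numerical check (a sketch, not part of the original derivation), the closed-form estimator above can be compared with sm.OLS on simulated data; the design and the coefficients 1 and 3 below are arbitrary illustration values:
import numpy as np
import statsmodels.api as sm
X = np.c_[np.ones(1000), np.random.normal(size=1000)] # design matrix with an intercept column
y = X @ np.array([1.0, 3.0]) + np.random.normal(size=1000) # simulated response (illustration values)
beta_mle = np.linalg.inv(X.T @ X) @ X.T @ y # closed-form (X'X)^{-1} X'y
print(beta_mle, sm.OLS(y, X).fit().params) # the two estimates coincide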
Generalized Linear Regression: MLE
Assumption: an exponential-family probability distribution and a link function that linearizes the conditional mean
- https://www.statsmodels.org/dev/examples/index.html#generalized-linear-models
- https://www.statsmodels.org/dev/examples/notebooks/generated/glm.html
# https://en.wikipedia.org/wiki/Generalized_linear_model
import numpy as np
import pandas as pd
import statsmodels.api as sm
X = np.linspace(-3, 3, 100)
y = -(.03*X + .0001*X**2 - 1.0) + .001 * np.random.random(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gaussian(sm.families.links.Identity())).fit()
result.summary()
X = np.linspace(-3, 3, 100)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + .1 * np.random.random(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gaussian(sm.families.links.Log())).fit()
result.summary()
X = np.linspace(-3, 3, 100)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + .001 * np.random.gamma(shape=1, scale=1.0, size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gamma(sm.families.links.Log())).fit()
result.summary()
X = np.linspace(-3, 3, 100)
y = -(.03*X + .0001*X**2 - 1.0) + .001 * np.random.random(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Tweedie(sm.families.links.Identity())).fit()
result.summary()
# https://en.wikipedia.org/wiki/Generalized_linear_model
import numpy as np
import pandas as pd
import statsmodels.api as sm
X = np.linspace(-3, 3, 100)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + .001 * np.random.normal(size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gaussian(sm.families.links.Log())).fit()
result.summary()
# residuals
result.resid_response # result.model.endog = result.fittedvalues + result.resid_response
result.resid_pearson
result.resid_working
result.resid_deviance
result.resid_anscombe
result.resid_anscombe_scaled
result.resid_anscombe_unscaled
# canonical relations between distribution and link function
sm.families.Binomial(sm.families.links.Logit()) # y range: [0, 1, 2, ..., n] / count of # of "yes" occurrences out of N yes/no occurrences
sm.families.Poisson(sm.families.links.Log()) # y range: [0, 1, 2, ..., np.inf) / count of occurrences in fixed amount of time/space
sm.families.Gamma(sm.families.links.InversePower()) # y range: (0, np.inf) / Exponential-response data, scale parameters
sm.families.Gaussian(sm.families.links.Identity()) # y range: (-np.inf, np.inf) / Linear-response data
sm.families.InverseGaussian(sm.families.links.InverseSquared()) # y range: (0, np.inf)
# distributions
sm.families.Binomial
sm.families.NegativeBinomial
sm.families.Poisson
sm.families.Gaussian
sm.families.Gamma
sm.families.InverseGaussian
sm.families.Tweedie
# link functions
sm.families.links.CDFLink()
sm.families.links.CLogLog()
sm.families.links.Cauchy()
sm.families.links.Identity()
sm.families.links.InversePower()
sm.families.links.InverseSquared()
sm.families.links.Log()
sm.families.links.LogC()
sm.families.links.LogLog()
sm.families.links.Logit()
sm.families.links.NegativeBinomial()
sm.families.links.Power()
sm.families.links.Probit()
sm.families.links.Sqrt()
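For count data, a minimal Poisson/Log sketch in the same style as the examples above (the coefficients .5 and 1.0 are arbitrary illustration values):
import numpy as np
import statsmodels.api as sm
X = np.linspace(-3, 3, 100)
y = np.random.poisson(np.exp(.5*X + 1.0)) # counts with a log-linear mean (illustration values)
result = sm.GLM(y, np.c_[np.ones_like(X), X], family=sm.families.Poisson(sm.families.links.Log())).fit()
result.summary()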
# https://en.wikipedia.org/wiki/Generalized_linear_model
import numpy as np
import pandas as pd
import statsmodels.api as sm
X = np.random.uniform(-3, 3, 1000)
y = np.exp(-(.03*X + .0001*X**2 - 1.0)) + 0.01*np.random.gamma(1, 2, size=X.shape[0])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gamma(sm.families.links.Log()), freq_weights=None, var_weights=y).fit()
weighted_glm_corr = pd.DataFrame(np.c_[y, X, result.fittedvalues, result.resid_response], columns=['y', 'X', 'fittedvalues', 'resid']).corr().round(4)
weighted_glm_corr.columns = pd.MultiIndex.from_product([['WeightGLM'], weighted_glm_corr.columns.tolist()])
result = sm.GLM(y, np.c_[np.ones_like(X), X, X**2], family=sm.families.Gamma(sm.families.links.Log()), freq_weights=None, var_weights=None).fit()
glm_corr = pd.DataFrame(np.c_[y, X, result.fittedvalues, result.resid_response], columns=['y', 'X', 'fittedvalues', 'resid']).corr().round(4)
glm_corr.columns = pd.MultiIndex.from_product([['GLM'], glm_corr.columns.tolist()])
result = sm.OLS(np.log(y), np.c_[np.ones_like(X), X, X**2]).fit()
ols_corr = pd.DataFrame(np.c_[y, X, result.fittedvalues, result.resid], columns=['y', 'X', 'fittedvalues', 'resid']).corr().round(4)
ols_corr.columns = pd.MultiIndex.from_product([['OLS'], ols_corr.columns.tolist()])
corr = pd.concat([weighted_glm_corr, glm_corr, ols_corr], axis=1)
corr
Regression Noise: Residual Analysis
import numpy as np
import pandas as pd
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, ARX, ConstantVariance, GARCH, Normal
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
def residual_analysis(y, nlags=40):
y_m1 = y; y_m2 = y**2
fig = plt.figure(figsize=(30, 10)); gs = mpl.gridspec.GridSpec(4, 12)
fig.add_subplot(gs[0, 0:6]).plot(y_m1); fig.axes[0].set_title('Time Series: 1st Moment'); [fig.axes[0].spines[i].set_visible(False) for i in ['top', 'right']]
fig.add_subplot(gs[0, 6:12]).plot(y_m2); fig.axes[1].set_title('Time Series: 2nd Moment'); [fig.axes[1].spines[i].set_visible(False) for i in ['top', 'right']]
smt.graphics.plot_acf(y_m1, lags=nlags, ax=fig.add_subplot(gs[1, 0:4])); [fig.axes[2].spines[i].set_visible(False) for i in ['top', 'right']]
smt.graphics.plot_pacf(y_m1, lags=nlags, ax=fig.add_subplot(gs[2, 0:4])); [fig.axes[3].spines[i].set_visible(False) for i in ['top', 'right']]
sns.heatmap(pd.DataFrame(np.c_[[np.roll(y_m1, shift=i) for i in range(nlags)]].T).corr(), cmap="RdBu", cbar_kws={"shrink": 0.5}, square=True, center=0, linewidth=.5, ax=fig.add_subplot(gs[1:3, 4:6])).set_title('Autocorrelation')
smt.graphics.plot_acf(y_m2, lags=nlags, ax=fig.add_subplot(gs[1, 6:10])); [fig.axes[6].spines[i].set_visible(False) for i in ['top', 'right']]
smt.graphics.plot_pacf(y_m2, lags=nlags, ax=fig.add_subplot(gs[2, 6:10])); [fig.axes[7].spines[i].set_visible(False) for i in ['top', 'right']]
sns.heatmap(pd.DataFrame(np.c_[[np.roll(y_m2, shift=i) for i in range(nlags)]].T).corr(), cmap="RdBu", cbar_kws={"shrink": 0.5}, square=True, center=0, linewidth=.5, ax=fig.add_subplot(gs[1:3, 10:12])).set_title('Autocorrelation')
list(map(lambda lag: pd.plotting.lag_plot(pd.Series(y_m1), lag=lag, ax=fig.add_subplot(gs[3, lag-1])), range(1, 7)))
list(map(lambda lag: pd.plotting.lag_plot(pd.Series(y_m2), lag=lag, ax=fig.add_subplot(gs[3, 6+lag-1])), range(1, 7)))
plt.tight_layout()
y_m1_acf = smt.acf(y_m1, adjusted=False, nlags=nlags, qstat=False, fft=True, alpha=None, bartlett_confint=True, missing='none')
y_m1_pacf = smt.pacf(y_m1, nlags=nlags, method='ywadjusted', alpha=None)
y_m2_acf = smt.acf(y_m2, adjusted=False, nlags=nlags, qstat=False, fft=True, alpha=None, bartlett_confint=True, missing='none')
y_m2_pacf = smt.pacf(y_m2, nlags=nlags, method='ywadjusted', alpha=None)
autocorr = pd.DataFrame(np.c_[y_m1_acf, y_m1_pacf, y_m2_acf, y_m2_pacf])
autocorr.index.name = 'Lags'
autocorr.columns = pd.MultiIndex.from_tuples([('1M', 'ACF'), ('1M', 'PACF'), ('2M', 'ACF'), ('2M', 'PACF')])
display(autocorr.T)
white_noise = np.random.normal(size=1000) # white noise
arma_noise = smt.ArmaProcess(ar=[1, -.3, -.2, .3], ma=[1]).generate_sample(nsample=1000, burnin=50)
arx_garch_noise = ARX(lags=list(range(1, 3)), volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0.5, 0.8] + [0.1, 0.1, 0.1, 0.1], nobs=1000, burn=500)['data'].values
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [0.1, .8, .1], nobs=1000, burn=500)['data'].values
y = cont_garch_noise
residual_analysis(y, nlags=40)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from arch.univariate import ARX, GARCH, Normal
import statsmodels.api as sm
cont_garch_noise = ARX(volatility=GARCH(p=1, o=0, q=1), distribution=Normal(), lags=2).simulate([0] + [.4, .3] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
X1 = np.random.normal(size=1000)
X2 = .5*X1 + np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = X1 + X2 + X3 + cont_garch_noise + 5*epsilon
resid = sm.OLS(y, np.c_[X1, X2, X3]).fit().resid
# Heteroscedasticity
pd.Series(resid**2).plot(figsize=(30,5))
# Autocorrelation[residual(t) <-> residual(t-1): Correlation]
# Multicollinearity[X <-> X': Correlation]
# Endogeneity[X <-> resid: No Correlation & Z <-> resid: Correlation]
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[[np.roll(resid, i) for i in range(1,3)]].T).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # Autocorrealtion
display(pd.DataFrame(np.c_[X1, X2]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # Multicollinearity
display(pd.DataFrame(np.c_[X3, Z, resid]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # Endogeneity
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.tsa.api as smt
y = smt.ArmaProcess(ar=[1, -.3, -.2], ma=[1]).generate_sample(nsample=1000, burnin=50).cumsum()
y = pd.Series(y, index=pd.date_range('00:00:00', periods=y.shape[0], freq='D'), name='Regression')
result = sm.OLS(endog=y, exog=np.c_[np.ones_like(y), np.arange(y.shape[0]), y.shift(1).ewm(com=10).var().bfill(), y.shift(1).bfill(), y.shift(2).bfill()]).fit()
y_true = y.copy()
y_pred = pd.Series(result.predict(), index=y.index)
y_resid = result.resid
plt.figure(figsize=(30,5)); [plt.gca().spines[spine].set_visible(False) for spine in ['top', 'right']]
plt.plot(y_resid, c='green', alpha=.4, label='residual')
plt.plot(y_true, c='black', label='data')
plt.plot(y_pred, c='r', label='prediction')
plt.legend(frameon=False)
Estimators when the standard assumptions do not hold
Exogeneity: IV2SLS
import numpy as np
import pandas as pd
import statsmodels.api as sm
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + epsilon # target
data = pd.DataFrame(np.c_[X1, X2, X3, Z, y], columns=['X1', 'X2', 'X3', 'Z', 'y'])
# Endogeneity
resid = sm.OLS(data['y'], exog=data.drop(['y', 'Z'], axis=1)).fit().resid
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[X3, Z, resid]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
#display(pd.DataFrame(np.c_[X3, Z, epsilon]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
import numpy as np
import pandas as pd
import statsmodels.api as sm
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + epsilon # target
data = pd.DataFrame(np.c_[X1, X2, X3, Z, y], columns=['X1', 'X2', 'X3', 'Z', 'y'])
print('[ISSUE:Endogeneity] corr(X3, eps), corr(Z, eps):', np.corrcoef(X3, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
result = sm.OLS(data['y'], exog=data.drop(['y', 'Z'], axis=1)).fit()
display(result.summary().tables[1]) # Check the X3 coefficient; R-squared is not a useful diagnostic here.
first_stage_ols_result = sm.OLS(endog=X3, exog=data.drop(['y', 'X3'], axis=1)).fit()
X_hat = first_stage_ols_result.fittedvalues
last_stage_ols_result = sm.OLS(endog=y, exog=np.c_[data.drop(['y', 'X3', 'Z'], axis=1), X_hat]).fit()
display(last_stage_ols_result.summary().tables[1]) # Check the X3 coefficient; R-squared is not a useful diagnostic here.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.sandbox.regression import gmm
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X3 = 3*Z + epsilon + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + epsilon # target
data = pd.DataFrame(np.c_[X1, X2, X3, Z, y], columns=['X1', 'X2', 'X3', 'Z', 'y'])
print('[ISSUE:Endogeneity] corr(X3, eps), corr(Z, eps):', np.corrcoef(X3, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
result = gmm.IV2SLS(y, exog=data.drop(['y', 'Z'], axis=1), instrument=data.drop(['y', 'X3'], axis=1)).fit()
display(result.summary().tables[1]) # Check the X3 coefficient; R-squared is not a useful diagnostic here.
Multicollinearity: PCA
import numpy as np
import pandas as pd
X1 = np.random.normal(size=1000)
X2 = .5*X1 + np.random.normal(size=1000)
X3 = .5*X1 + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + np.random.normal(0, 10, size=1000)
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
# Multicollinearity
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(data.drop('y', axis=1).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
#display(pd.DataFrame(np.c_[X1, X2, X3]).corr().style.background_gradient().set_properties(**{'font-size': '10pt'}))
import numpy as np
import pandas as pd
import statsmodels.api as sm
X1 = np.random.normal(size=1000)
X2 = .5*X1 + np.random.normal(size=1000)
X3 = .5*X1 + np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + np.random.normal(0, 10, size=1000)
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
print('[ISSUE:Multicollinearity]', data[['X1', 'X2', 'X3']].corr().iloc[0, :].round(4).tolist())
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[2]) # Check Cond. No.
pca = sm.PCA(data=data.drop('y', axis=1), ncomp=2, demean=True, standardize=True, normalize=False, method='svd', gls=False, weights=None, missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100, svd_full_matrices=False)
result2 = sm.OLS(data['y'], exog=pca.factors).fit()
display(result2.summary().tables[2]) # Check Cond. No.
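The variance inflation factor (VIF) is another common multicollinearity diagnostic; a sketch computed on the data frame above:
from statsmodels.stats.outliers_influence import variance_inflation_factor
exog = sm.add_constant(data.drop('y', axis=1))
vif = pd.Series([variance_inflation_factor(exog.values, i) for i in range(exog.shape[1])], index=exog.columns)
display(vif) # VIF values well above 10 are a common rule-of-thumb signal of multicollinearity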
Autocorrelation: ARIMA
import numpy as np
import pandas as pd
import statsmodels.api as sm
white_noise = np.random.normal(size=1000)
y = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
# Autocorrelation
resid = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit().resid
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[[np.roll(resid, i) for i in range(1,3)]].T).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
import numpy as np
import pandas as pd
import statsmodels.api as sm
white_noise = np.random.normal(size=1000)
y = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Durbin-Watson and R-squared in result1.summary()
print('[ISSUE:Autocorrelation]', pd.DataFrame(np.c_[[np.roll(result1.resid, i) for i in range(1,4)]].T).corr().iloc[0,:].round(4).tolist())
data['shift1_r'] = result1.resid.shift(1).fillna(result1.resid.iloc[:1])
data['shift2_r'] = result1.resid.shift(2).fillna(result1.resid.iloc[:2])
result2 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result2.summary().tables[1]) # coefficient table; check Durbin-Watson and R-squared in result2.summary()
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
white_noise = np.random.normal(size=1000)
y = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Durbin-Watson and R-squared in result1.summary()
print('[ISSUE:Autocorrelation]', pd.DataFrame(np.c_[[np.roll(result1.resid, i) for i in range(1,4)]].T).corr().iloc[0,:].round(4).tolist())
result2 = smt.SARIMAX(data['y'], exog=data.drop(['y'], axis=1), order=(2,0,0)).fit(disp=0)
display(result2.summary().tables[1])
Heteroscedasticity: GLS
import numpy as np
import pandas as pd
import statsmodels.api as sm
from arch.univariate import ConstantMean, GARCH, Normal
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + cont_garch_noise
data = pd.DataFrame(np.c_[np.ones_like(y), X1, X2, X3, y], columns=['Const', 'X1', 'X2', 'X3', 'y'])
# Heteroscedasticity
resid = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit().resid
pd.Series(resid**2).plot(figsize=(30,5))
#pd.Series(cont_garch_noise**2).plot(figsize=(30,5))
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, GARCH, Normal
from scipy.linalg import toeplitz
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + cont_garch_noise
data = pd.DataFrame(np.c_[np.ones_like(y), X1, X2, X3, y], columns=['Const', 'X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Omnibus and Jarque-Bera in result1.summary()
print('[ISSUE:Heteroscedasticity]', round(result1.resid.iloc[:500].var(ddof=1), 4), round(result1.resid.iloc[500:].var(ddof=1), 4))
rho = sm.OLS(np.asarray(result1.resid)[1:], sm.add_constant(np.asarray(result1.resid)[:-1])).fit().params[1]
order = toeplitz(range(len(result1.resid)))
sigma = rho ** order
result2 = sm.GLS(data['y'], data.drop('y', axis=1), sigma=sigma).fit()
display(result2.summary().tables[1]) # for efficient estimation under the estimated error covariance
Autocorrelation & Heteroscedasticity: GLSAR
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, GARCH, Normal
from scipy.linalg import toeplitz
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
y = np.empty_like(cont_garch_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(cont_garch_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
# Heteroscedasticity & Autocorrelation
resid = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit().resid
pd.Series(resid**2).plot(figsize=(30,5)) #pd.Series(cont_garch_noise**2).plot(figsize=(30,5))
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(pd.DataFrame(np.c_[[np.roll(resid, i) for i in range(1,3)]].T).corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from arch.univariate import ConstantMean, GARCH, Normal
from scipy.linalg import toeplitz
cont_garch_noise = ConstantMean(volatility=GARCH(p=1, o=0, q=1), distribution=Normal()).simulate([0] + [.8, .2, .7], nobs=1000, burn=500)['data'].values
y = np.empty_like(cont_garch_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(cont_garch_noise):
y[t] = X[t] + .5*(y[t-1] - X[t-1]) + .2*(y[t-2]-X[t-2]) + noise
data = pd.DataFrame(np.c_[X1, X2, X3, y], columns=['X1', 'X2', 'X3', 'y'])
result1 = sm.OLS(data['y'], exog=data.drop('y', axis=1)).fit()
display(result1.summary().tables[1]) # coefficient table; check Omnibus and Jarque-Bera in result1.summary()
print('[ISSUE:Heteroscedasticity]', round(result1.resid.iloc[:500].var(ddof=1), 4), round(result1.resid.iloc[500:].var(ddof=1), 4))
print('[ISSUE:Autocorrelation]', pd.DataFrame(np.c_[[np.roll(result1.resid, i) for i in range(1,4)]].T).corr().iloc[0,:].round(4).tolist())
result2 = sm.GLSAR(data['y'], exog=data.drop('y', axis=1), rho=2).iterative_fit(1)
display(result2.summary().tables[1]) # for efficient estimation under the AR-adjusted error structure
Covariance Estimation
# https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.get_robustcov_results.html
# https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.RegressionResults.cov_params.html
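A minimal sketch of robust (sandwich) covariance estimation applied to an existing fit; result1 below is assumed to be a fitted OLS result such as the ones in the sections above:
robust_hc3 = result1.get_robustcov_results(cov_type='HC3') # heteroscedasticity-consistent standard errors
robust_hac = result1.get_robustcov_results(cov_type='HAC', maxlags=2) # heteroscedasticity- and autocorrelation-consistent standard errors
display(robust_hc3.summary().tables[1])
display(robust_hac.summary().tables[1])
result1.cov_params() # parameter covariance under the default (non-robust) estimator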
Confidence intervals
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
X = np.random.normal(size=100)
y = 3*X + 5 + np.random.normal(size=100)
result = sm.OLS(endog=y, exog=np.c_[np.ones_like(y), X]).fit()
fig, axes = plt.subplots(2, 1, figsize=(30,7)); [axes[i].spines[spine].set_visible(False) for spine in ['top', 'right'] for i in range(0,2)]
y_true = result.model.endog
y_pred = result.fittedvalues
y_resid = result.resid
axes[0].plot(y_true, c='black', marker='o', label='data')
axes[0].plot(y_resid, c='green', alpha=.4, label='residual')
axes[0].plot(y_pred, c='r', label='prediction')
axes[0].legend(frameon=False)
sf = result.get_prediction().summary_frame(alpha=.05)
sf.insert(0, 'obs_X', X)
sf.insert(1, 'obs_y', result.model.endog)
sf = sf.sort_values(by=['obs_X'], ascending=True)
axes[1].plot(sf['obs_X'], sf['obs_y'], c='black', lw=0, marker='o', label='data')
axes[1].plot(sf['obs_X'], sf['mean'], c='red', label='prediction')
axes[1].errorbar(sf['obs_X'], sf['mean'], yerr=sf['mean_se'], alpha=.4, color='blue')
axes[1].fill_between(sf['obs_X'], y1=sf['mean_ci_lower'], y2=sf['mean_ci_upper'], alpha=.5, color='green', label='prediction ci')
axes[1].fill_between(sf['obs_X'], y1=sf['obs_ci_lower'], y2=sf['obs_ci_upper'], alpha=.1, color='green', label='observation ci')
axes[1].legend(frameon=False)
Decision boundary
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, StandardScaler, Normalizer, RobustScaler, MinMaxScaler, MaxAbsScaler
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X4 = np.random.normal(size=1000)
y = 3*X1 + 2*X2 - 1*X3 + np.random.normal(0, 1, size=1000)
numerical_columns = ['X1', 'X2', 'X3', 'X4']
data = pd.DataFrame(np.c_[X1, X2, X3, X4, y], columns=numerical_columns + ['y'])
data['boundary'] = data['y'].apply(lambda y: 'positive' if y - data['y'].mean() > 0 else 'negative')
numerical_columns += ['y']
# decision boundary
#sns.pairplot(pd.DataFrame(np.c_[StandardScaler().fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
#sns.pairplot(pd.DataFrame(np.c_[MaxAbsScaler().fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
#sns.pairplot(pd.DataFrame(np.c_[PowerTransformer(method='yeo-johnson').fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
#sns.pairplot(pd.DataFrame(np.c_[QuantileTransformer(output_distribution='normal').fit_transform(data[numerical_columns]), data['boundary']], columns=numerical_columns + ['boundary']), hue='boundary')
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(data[numerical_columns].corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
with warnings.catch_warnings():
warnings.simplefilter(action='ignore', category=FutureWarning)
sns.pairplot(data[numerical_columns+['boundary']], hue='boundary')
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, StandardScaler, Normalizer, RobustScaler, MinMaxScaler, MaxAbsScaler
white_noise = np.random.normal(size=1000)
y1 = np.empty_like(white_noise)
y2 = np.empty_like(white_noise)
X1 = np.random.normal(size=1000)
X2 = np.random.normal(size=1000)
X3 = np.random.normal(size=1000)
X4 = np.random.normal(size=1000)
X = 3*X1 + 2*X2 - 1*X3
for t, noise in enumerate(white_noise):
y1[t] = X[t] + .5*(y1[t-1] - X[t-1]) + .2*(y1[t-2]-X[t-2]) + noise
y2[t] = .5*(y2[t-1] - X[t-1]) + .2*(y2[t-2]-X[t-2]) + noise
y = y2
numerical_columns = ['X1', 'X2', 'X3', 'X4']
data = pd.DataFrame(np.c_[X1, X2, X3, X4, y], columns=numerical_columns + ['y'])
data['shift1_X1'] = data['X1'].shift(1).fillna(data['X1'].iloc[:1])
data['shift1_X2'] = data['X2'].shift(1).fillna(data['X2'].iloc[:1])
data['shift1_X3'] = data['X3'].shift(1).fillna(data['X3'].iloc[:1])
data['shift1_X4'] = data['X4'].shift(1).fillna(data['X4'].iloc[:1])
data['residual'] = sm.OLS(y, exog=np.c_[X1, X2, X3, X4]).fit().resid
data['shift1_r'] = data['residual'].shift(1).fillna(data['residual'].iloc[:1])
data['shift1_y'] = data['y'].shift(1).fillna(data['y'].iloc[:1])
data['boundary'] = data['y'].apply(lambda y: 'positive' if y - data['y'].mean() > 0 else 'negative')
numerical_columns += ['shift1_X1', 'shift1_X2', 'shift1_X3', 'shift1_X4', 'shift1_y', 'shift1_r', 'residual', 'y']
with pd.option_context('display.float_format', '{:0.2f}'.format):
display(data[numerical_columns].corr().style.background_gradient().set_properties(**{'font-size': '10pt'})) # np.cov(df.T.values)
with warnings.catch_warnings():
warnings.simplefilter(action='ignore', category=FutureWarning)
sns.pairplot(data[numerical_columns+['boundary']], hue='boundary')
Classification: MLE
$${\displaystyle \begin{aligned} \text{Sigmoid Function:}\quad \sigma (x) &={\frac {1}{1+e^{-x}}}={\frac {e^{x}}{1+e^{x}}}=1-\sigma (-x) \\ \text{Softmax Function:}\quad\sigma (\mathbf {z} )_{i}&={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}} \\ \; & \quad {\text{ for }}i=1,\dotsc ,K{\text{ and }}\mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K} \\ \end{aligned} }$$
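A tiny numerical check of the two definitions (a sketch; the sample vector z is arbitrary):
import numpy as np
z = np.array([-2.0, 0.0, 3.0]) # arbitrary illustration values
sigmoid = 1/(1 + np.exp(-z))
softmax = np.exp(z)/np.exp(z).sum()
print(np.allclose(sigmoid, 1 - 1/(1 + np.exp(z)))) # sigma(x) = 1 - sigma(-x)
print(softmax.sum()) # softmax components sum to 1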
Logistic Regression
$${\displaystyle \begin{aligned} f_{p}(x)&={\frac {1}{1+e^{-(x-\mu )/s}}}, \quad \beta_{0}=-\frac{\mu}{s}, \; \beta_{1}= \frac{1}{s}\\ \; &={\frac {1}{1+e^{-(\beta_{0} + \beta_{1}x)}}}, \quad \mu=-\frac{\beta_{0}}{\beta_{1}}, \; s=\frac{1}{\beta_{1}} \\ \end{aligned} }$$
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
def logistic(X, w0=3, w1=6):
Z = w0 + w1*X
y = 1/(1+np.exp(-Z))
return y.round()
X = np.random.normal(2, 1, size=3000)
X = X[X.argsort()].copy()
y = logistic(X)
model = sm.Logit(y, np.c_[np.ones_like(X), X])
model = sm.MNLogit(y, np.c_[np.ones_like(X), X])
result = model.fit(start_params=None, method=['newton', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg', 'basinhopping', 'minimize'][7], maxiter=1000, full_output=1, disp=0, callback=None)
result = model.fit_regularized(start_params=None, method=['l1', 'l1_cvxopt_cp'][0], maxiter='defined_by_method', full_output=1, disp=0, callback=None, alpha=.3, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03, qc_verbose=False)
result.summary()
w0, w1 = result.params
plt.plot(X, y, lw=0, marker='o', c='black')
plt.plot(X, logistic(X, w0, w1), marker='x') #plt.plot(x, np.exp(model.fittedvalues)/(1+np.exp(model.fittedvalues)))
result.summary()
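The fitted coefficients can be mapped back to the location/scale form above; a short sketch using w0 and w1 from the fit:
mu, s = -w0/w1, 1/w1 # mu = -beta0/beta1, s = 1/beta1
print(mu, s)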
Analysis of a classification task
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
data = pd.DataFrame(np.c_[
stats.binom.rvs(n=2, p=2/3, size=1000), # X0: categorical explanatory variable1
stats.binom.rvs(n=2, p=2/3, size=1000), # X1: categorical explanatory variable2
stats.norm.rvs(1,1, size=1000), # X2: numerical explanatory variable1
stats.norm.rvs(1,1, size=1000), # X3: numerical explanatory variable2
stats.bernoulli.rvs(p=2/3, size=1000), # Y: response variable
], columns=['X0', 'X1', 'X2', 'X3', 'y'])
categorical_columns = ['X0', 'X1']
numerical_columns = ['X2', 'X3']
centering_data = data.copy()
for column in numerical_columns:
centering_data[column] = centering_data[column] - centering_data[column].mean()
standardizing_data = data.copy()
for column in numerical_columns:
standardizing_data[column] = (standardizing_data[column] - standardizing_data[column].mean()) / standardizing_data[column].std(ddof=1)
# No Interaction
patsy_formula = "y ~ " + ' + '.join(numerical_columns) + ' + ' + ' + '.join(list(map(lambda column: 'C('+column+')', categorical_columns))) + " - 1"
centering_model = sm.Logit.from_formula(patsy_formula, data=centering_data).fit()
centering_summary = pd.DataFrame(centering_model.summary().tables[1].data)
centering_summary.columns = ['feature'] + centering_summary.loc[0, 1:].tolist()
centering_summary = centering_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(centering_model, fig=plt.figure(figsize=(30,10))) # partial regression
standardizing_model = sm.Logit.from_formula(patsy_formula, data=standardizing_data).fit()
standardizing_summary = pd.DataFrame(standardizing_model.summary().tables[1].data)
standardizing_summary.columns = ['feature'] + standardizing_summary.loc[0, 1:].tolist()
standardizing_summary = standardizing_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(standardizing_model, fig=plt.figure(figsize=(30,10))) # partial regression
summary = pd.concat([centering_summary, standardizing_summary], axis=1)
summary.columns = pd.MultiIndex.from_tuples(map(lambda column: ('centering', column[1]) if column[0] < 7 else ('standardizing', column[1]), enumerate(summary.columns)))
summary
Multinomial Logistic Regression
#
Conditional Logistic Regression
#https://rinterested.github.io/statistics/logistic_regression.html
#https://rinterested.github.io/statistics/conditional_logistic_regression.html
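statsmodels provides ConditionalLogit in statsmodels.discrete.conditional_models; a minimal sketch with simulated strata (all coefficients and group sizes below are illustration values):
import numpy as np
from statsmodels.discrete.conditional_models import ConditionalLogit
groups = np.repeat(np.arange(100), 10) # 100 strata with 10 observations each (illustration values)
x = np.random.normal(size=groups.shape[0])
group_effect = np.random.normal(size=100)[groups] # stratum-specific intercepts, conditioned away by the model
p = 1/(1 + np.exp(-(group_effect + 1.5*x)))
y = np.random.binomial(1, p)
result = ConditionalLogit(y, x[:, None], groups=groups).fit()
result.summary()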
Probit Regression
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
def logistic(X, w0=3, w1=6):
Z = w0 + w1*X
y = 1/(1+np.exp(-Z))
return y.round()
X = np.random.normal(2, 1, size=3000)
X = X[X.argsort()].copy()
y = logistic(X)
model = sm.Probit(y, np.c_[np.ones_like(X), X])
result = model.fit(start_params=None, method=['newton', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg', 'basinhopping', 'minimize'][7], maxiter=1000, full_output=1, disp=0, callback=None)
result = model.fit_regularized(start_params=None, method=['l1', 'l1_cvxopt_cp'][0], maxiter='defined_by_method', full_output=1, disp=0, callback=None, alpha=.3, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03, qc_verbose=False)
result.summary()
w0, w1 = result.params
plt.plot(X, y, lw=0, marker='o', c='black')
plt.plot(X, logistic(X, w0, w1), marker='x') #plt.plot(x, np.exp(model.fittedvalues)/(1+np.exp(model.fittedvalues)))
result.summary()
Tobit Regression
#
Robust Linear Regression
- https://www.statsmodels.org/dev/rlm.html
- https://www.statsmodels.org/dev/examples/notebooks/generated/robust_models_0.html
- https://www.statsmodels.org/dev/examples/notebooks/generated/robust_models_1.html
import statsmodels.api as sm
sm.RLM
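A minimal sketch of sm.RLM on simulated data with a few injected outliers (HuberT is the default M-estimator norm; all values below are illustration values):
import numpy as np
import statsmodels.api as sm
X = np.random.normal(size=300)
y = 3*X + 5 + np.random.normal(size=300)
y[:10] += 30 # inject outliers
exog = sm.add_constant(X)
ols_result = sm.OLS(y, exog).fit()
rlm_result = sm.RLM(y, exog, M=sm.robust.norms.HuberT()).fit()
print(ols_result.params, rlm_result.params) # the robust fit is less affected by the outliers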
TABLE OF CONTENTS
Linear Mixed Effects Models
Varying-Coefficient Models
Variance Components Model
Linear Mixed Effect Model Structure
Multilevel Regression: Hierarchical and Nested Group Analysis
Generalized Linear Regression
Generalized Additive Models (GAM)
Generalized Linear Mixed Effects (GLIMMIX)
Generalized Method of Moments (GMM)
Generalized Estimating Equation (GEE)
General Linear Regression: OLS
Seemingly Unrelated Regression (SUR/SURE)
Linear Mixed Effects Models
Varying-Coefficient Models
- https://www.statsmodels.org/dev/mixed_linear.html
- https://www.statsmodels.org/dev/examples/notebooks/generated/mixed_lm_example.html
REML: Mixed Effects Model
# https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.regression.mixed_linear_model import MixedLMParams, VCSpec, Penalty
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts and slopes
result = sm.MixedLM(endog=data['y'], exog=data[['const', 'X']], exog_re=data[['const', 'X']], exog_vc=None, groups=data['group']).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = sm.MixedLM.from_formula(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
# MixedLM regularization
result = sm.MixedLM(endog=data['y'], exog=data[['const', 'X']], exog_re=data[['const', 'X']], exog_vc=None, groups=data['group']).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = sm.MixedLM.from_formula(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1 + X', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result.fe_params, result.random_effects
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(random_effect.values, index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
REML: Instrumental Random Effects Model
# https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.regression.mixed_linear_model import MixedLMParams, VCSpec, Penalty
np.random.seed(0)
N1 = np.random.normal(size=1000)
N2 = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N1, N2, C]
y0 = 2 * X[np.where(C==0)[0], 0] + 1 + 2.5 * X[np.where(C==0)[0], 1] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 * X[np.where(C==1)[0], 0] + 2 + 3.5 * X[np.where(C==1)[0], 1] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 2 * X[np.where(C==2)[0], 0] + 3 + 4.5 * X[np.where(C==2)[0], 1] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'Z', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts and slopes
result = sm.MixedLM(endog=data['y'], exog=data[['X']], exog_re=data[['const', 'Z']], exog_vc=None, groups=data['group']).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = sm.MixedLM.from_formula(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = smf.mixedlm(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
# MixedLM regularization
result = sm.MixedLM(endog=data['y'], exog=data[['X']], exog_re=data[['const', 'Z']], exog_vc=None, groups=data['group']).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = sm.MixedLM.from_formula(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result = smf.mixedlm(formula='y ~ -1 + X', re_formula='~ 1 + Z', vc_formula=None, groups=data['group'], data=data).fit_regularized(start_params=None, method='l1', alpha=0.01, ceps=0.0001, ptol=1e-06, maxit=200)
result.fe_params, result.random_effects
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
fixed_effect = pd.Series(np.r_[np.array([0]), result.fe_params.values], index=pd.Index(random_effect.index.to_series().replace('Z', 'X')))
fixed_params[group] = fixed_effect
random_params[group] = random_effect
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([fixed_params, random_params], axis=1)
params
Variance Components Model
1-Level Variance Decomposition
$${\displaystyle \text{Variance Components Models} }$$ $${\displaystyle \text{Components of variance:} \quad \operatorname {Var} [\mathbf{y}] =\operatorname {E} \left[\operatorname {Var} [\mathbf{y}\mid \mathbf{X}] \right]+\operatorname {Var} \left[ \operatorname {E} [\mathbf{y}\mid \mathbf{X}] \right]}$$ $${\displaystyle \begin{aligned} \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \underbrace{\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma_{\varepsilon}^{2}\mathbf{I}_{n})}_{\text{Uncertainty}} & \;& \;\\ &\downarrow & \; & \; \\ \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \overbrace{\mathbf{v}}^{\text{Variance Source Term}}, &\quad \mathbf{v}=f_{v}(\mathbf{u}) \sim \mathcal{N}( \overbrace{\boldsymbol{\alpha}^{*}}^{\text{Population-Randomness}} , \sigma^{2}_{v} \mathbf{I}), \quad &\operatorname{Cov}[\mathbf{X}, \mathbf{v}] = \mathbf{0} \\ \; &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\alpha}^{*} + \mathbf{u}, &\quad \boldsymbol{\alpha}^{*} \sim \mathcal{N}( \boldsymbol{\mu} , \sigma^{2}_{\alpha} \mathbf{I}), \quad \mathbf{u} \sim \mathcal{N}( \mathbf{0} , \sigma^{2}_{u} \mathbf{I}), \quad &\operatorname{Cov}[\boldsymbol{\alpha}^{*}, \mathbf{u}] = \mathbf{0} \\ \; &= \mathbf{X}\boldsymbol{\beta} + \underbrace{\boldsymbol{\mu} + \boldsymbol{\alpha}}_{=\boldsymbol{\mu}_{\alpha}} + \mathbf{u}, &\quad \boldsymbol{\alpha} \sim \mathcal{N}( \mathbf{0} , \sigma^{2}_{\alpha}\mathbf{I}), \quad \mathbf{u} \sim \mathcal{N}( \mathbf{0} , \sigma^{2}_{u} \mathbf{I}), \quad &\operatorname{Cov}[\boldsymbol{\alpha}, \mathbf{u}] = \mathbf{0} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= \boldsymbol{\mu} \\ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \boldsymbol{\mu} + \boldsymbol{\alpha} \\ \operatorname{Cov} \left[ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{\alpha} \mathbf{I} \\ \\ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= ( \sigma^{2}_{\alpha} + \sigma^{2}_{u}) \mathbf{I} \\ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \sigma^{2}_{u}\mathbf{I} \\ \operatorname{E} \left[ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{u}\mathbf{I} \\ \end{aligned} }$$
Scoring Group $${\displaystyle \begin{aligned} M(G) & : \text{a score of the group}, \; G \in \{1, 2, \dots, N_{G} \} \\ N=\sum_{G=1}^{N_{G}} n(G) & : \text{total number of instances} \\ N_{G} & : \text{number of groups} \\ n(G) & : \text{number of instances in the group} \\ \overline{M}_{\text{micro}} &= \frac{1}{1} \sum_{G=1}^{N_{G}=1} M(G) = M(1) \\ \overline{M}_{\text{macro}} &= \frac{1}{N_{G}} \sum_{G=1}^{N_{G}} M(G) \\ \overline{M}_{\text{weighted}} &= \sum_{G=1}^{N_{G}} \frac{n(G)}{N} M(G) \\ \end{aligned} }$$
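A small sketch of the macro vs. weighted group averages defined above (group sizes and scores are made-up illustration values):
import numpy as np
import pandas as pd
scores = pd.DataFrame({'G': np.repeat([1, 2, 3], [100, 50, 10]), 'M': np.repeat([0.9, 0.5, 0.1], [100, 50, 10])})
macro = scores.groupby('G')['M'].mean().mean() # every group weighted equally
weighted = scores.groupby('G')['M'].mean().mul(scores['G'].value_counts(normalize=True)).sum() # weighted by n(G)/N
print(macro, weighted, scores['M'].mean()) # the weighted average equals the pooled mean here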
import numpy as np
import pandas as pd
# Components of variance
data = pd.DataFrame(np.c_[np.kron(np.arange(5), np.ones(200)), np.random.normal(0, .1, size=1000)], columns=['X', 'Y']).set_index('X').sort_index()
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(y.index[0], 1, size=y.shape[0])) # group mean effect
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(0, y.index[0], size=y.shape[0])) # group variance effect
data.groupby('X')['Y'].mean().mean() # expected value: population mean >>>>> data['M']
data.groupby('X')['Y'].mean().var(ddof=0) # expected value: population variance / number of sampling by group >>>>> data['V'] / 200
data.groupby('X')['Y'].var(ddof=1).mean() # expected value: population variance >>>>> data['V']
data['M'] = data['Y'].mean()
data['m'] = data.groupby('X')['Y'].mean()
data['V'] = data['Y'].var(ddof=0)
data['v'] = data.groupby('X')['Y'].var(ddof=1)
data['treatment effect'] = data['m'] - data['M']
data['variance effect'] = data['V'] - data['v']
aggregation = data[['M', 'm', 'treatment effect', 'V', 'v', 'variance effect']].drop_duplicates()
total_variance = aggregation['m'].var(ddof=0) + aggregation['v'].mean()
total_variance, data['Y'].var(ddof=0)
import numpy as np
import pandas as pd
# Components of variance
data = pd.DataFrame(np.c_[np.random.randint(0, 5, size=1000), np.random.normal(0, .1, size=1000)], columns=['X', 'Y']).set_index('X').sort_index()
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(y.index[0], 1, size=y.shape[0])) # group mean effect
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(0, y.index[0], size=y.shape[0])) # group variance effect
data['M'] = data['Y'].mean()
data['V'] = data['Y'].var(ddof=0)
data['m'] = data.groupby('X')['Y'].mean()
data['v'] = data.groupby('X')['Y'].var(ddof=1)
data['treatment effect'] = data['m'] - data['M']
data['variance effect'] = data['V'] - data['v']
aggregation = data[['M', 'm', 'treatment effect', 'V', 'v', 'variance effect']].drop_duplicates()
total_variance = aggregation['m'].var(ddof=0) + aggregation['v'].mean()
total_variance, data['Y'].var(ddof=0)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
# Components of variance
data = pd.DataFrame(np.c_[np.random.randint(0, 5, size=1000), np.random.normal(0, .1, size=1000)], columns=['X', 'Y']).set_index('X').sort_index()
#data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(y.index[0], 1, size=y.shape[0])) # group mean effect
data['Y'] = data.groupby('X', group_keys=False)['Y'].apply(lambda y: y + np.random.normal(0, y.index[0], size=y.shape[0])) # group variance effect
result = smf.ols(formula='Y ~ 0 + C(X)', data=data.reset_index()).fit()
display(result.summary())
display(sms.anova_lm(result, typ=2))
mc = sms.multicomp.MultiComparison(data['Y'], data.index)
posthoc_results = mc.tukeyhsd() # TUKEY HONESTLY SIGNIFICANT DIFFERENCE (HSD)
display(posthoc_results.summary())
REML: 1-Level Group Effect
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
n_group1 = 20
data = pd.DataFrame(data=0, columns=['e'], index=pd.Index(np.arange(n_group1), name='G1'))
G1_effect = data.groupby(['G1'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 2)).rename(columns={'e':'G1_Effect'})
data = pd.concat([data, G1_effect], axis=1)
data = pd.concat([data]*10, axis=0)
data['y'] = data['G1_Effect'] + np.random.normal(0, 1, size=data.shape[0])
result = sm.MixedLM(endog=data['y'], exog=np.ones_like(data['y']), groups=data.index).fit()
result = smf.mixedlm(
"y ~ 1",
re_formula="1",
vc_formula=None,
groups="G1",
data=data.reset_index(),
).fit()
print(data['y'].var(ddof=1))
print(data.groupby(['G1'])['y'].mean().var(ddof=1))
print(data.groupby(['G1'])['y'].var(ddof=1).mean())
result.summary()
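The fitted variance decomposition can be summarized as an intraclass correlation; a sketch using result.cov_re (group-level variance) and result.scale (residual variance) from the fit above:
group_var = np.asarray(result.cov_re)[0, 0] # variance of the random intercept
resid_var = result.scale # residual variance
print(group_var, resid_var, group_var/(group_var + resid_var)) # share of total variance due to the group effect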
LS/MLE: One-way ANCOVA
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# https://patsy.readthedocs.io/en/latest/formulas.html
result = sm.OLS.from_formula(formula='y ~ 0 + group + group:X', data=data, missing='none', hasconst=None).fit()
result = sm.WLS.from_formula(formula='y ~ 0 + group + group:X', data=data, weights=1.0, missing='none', hasconst=None).fit()
result = sm.GLS.from_formula(formula='y ~ 0 + group + group:X', data=data, sigma=1.0, missing='none', hasconst=None).fit()
result = sm.GLM.from_formula(formula='y ~ 0 + group + group:X', data=data, family=sm.families.Gaussian(sm.families.links.Identity())).fit(); result.ssr = result.resid_response.apply(lambda residual: residual**2).sum()
result = smf.ols(formula='y ~ 0 + group + group:X', data=data, missing='none', hasconst=None).fit()
result = smf.wls(formula='y ~ 0 + group + group:X', data=data, weights=1.0, missing='none', hasconst=None).fit()
result = smf.gls(formula='y ~ 0 + group + group:X', data=data, sigma=1.0, missing='none', hasconst=None).fit()
result = smf.glm(formula='y ~ 0 + group + group:X', data=data, family=sm.families.Gaussian(sm.families.links.Identity())).fit(); result.ssr = result.resid_response.apply(lambda residual: residual**2).sum()
display(result.summary())
display(sms.anova_lm(result, typ=2))
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
def individual_regression(df):
result = sm.OLS(endog=df['y'], exog=df[['const', 'X']], missing='none', hasconst=None).fit()
result = sm.WLS(endog=df['y'], exog=df[['const', 'X']], weights=1.0, missing='none', hasconst=None).fit()
result = sm.GLS(endog=df['y'], exog=df[['const', 'X']], sigma=1.0, missing='none', hasconst=None).fit()
result = sm.GLM(endog=df['y'], exog=df[['const', 'X']], family=sm.families.Gaussian(sm.families.links.Identity())).fit()
return result.params
data.groupby(['group']).apply(individual_regression)
2-Level Variance Decomposition
$${\displaystyle \begin{cases} \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= \boldsymbol{\mu} \\ \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \boldsymbol{\mu} + \boldsymbol{\alpha}_{1} + \boldsymbol{\alpha}_{2} \\ \; \operatorname{Cov} \left[ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= (\sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}} + \sigma^{2}_{\alpha_{1} \times \alpha_{2}}) \mathbf{I} \\ \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= ( \sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}} + \sigma^{2}_{\alpha_{1} \times \alpha_{2}} + \sigma^{2}_{u}) \mathbf{I} \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \sigma^{2}_{u}\mathbf{I} \\ \; \operatorname{E} \left[ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{u}\mathbf{I} \\ \end{cases} }$$ $${\displaystyle \operatorname{Cov}[\boldsymbol{\alpha}_{1}, \boldsymbol{\alpha}_{2}] = \mathbf{0} \; \begin{cases} \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= \boldsymbol{\mu} \\ \; \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \boldsymbol{\mu} + \boldsymbol{\alpha}_{1} + \boldsymbol{\alpha}_{2} \\ \; \operatorname{Cov} \left[ \operatorname{E}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= (\sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}}) \mathbf{I} \\ \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta}] &= ( \sigma^{2}_{\alpha_{1}} + \sigma^{2}_{\alpha_{2}} + \sigma^{2}_{u}) \mathbf{I} \\ \; \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] &= \sigma^{2}_{u}\mathbf{I} \\ \; \operatorname{E} \left[ \operatorname{Cov}[\mathbf{y} - \mathbf{X}\boldsymbol{\beta} \mid \mathbf{X}] \right] &= \sigma^{2}_{u}\mathbf{I} \\ \end{cases} }$$
REML: 2-Level Nested Group Effect
: a hierarchical or nested structure where individuals are grouped within larger categories
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
n_group1 = 20; n_group2 = 100
data = pd.DataFrame(data=0, columns=['e'], index=pd.MultiIndex.from_product([np.arange(n_group1), np.arange(n_group2)], names=['G1', 'G2']))
G1_effect = data.groupby(['G1'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 2)).rename(columns={'e':'G1_Effect'})
G1G2_effect = data.groupby(['G1', 'G2'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 3)).rename(columns={'e':'G1G2_Effect'})
data = pd.concat([data, G1_effect, G1G2_effect], axis=1)
data = pd.concat([data]*10, axis=0)
data['y'] = data['G1_Effect'] + data['G1G2_Effect'] + np.random.normal(0, 1, size=data.shape[0])
result = smf.mixedlm(
"y ~ 1",
re_formula="1",
vc_formula={"G2": "0 + C(G2)"},
groups="G1",
data=data.reset_index(),
).fit() # nested effect
print(data['y'].var())
print(data.groupby('G1')['y'].mean().var())
print(data.groupby('G2')['y'].mean().var()) # ~ 0
print(data.groupby(['G1', 'G2'])['y'].mean().var() - data.groupby('G1')['y'].mean().var())
result.summary()
REML: 2-Level Crossed Group Effect
: individuals belonging to multiple independent groups simultaneously
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
n_group1 = 20; n_group2 = 100
group = pd.DataFrame(data=0, columns=['e'], index=pd.MultiIndex.from_product([np.arange(n_group1), np.arange(n_group2)], names=['G1', 'G2']))
G1_effect = group.groupby(['G1'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 2)).rename(columns={'e':'G1_Effect'})
G2_effect = group.groupby(['G2'], group_keys=False)[['e']].apply(lambda e: e + np.random.normal(0, 3)).rename(columns={'e':'G2_Effect'})
group = pd.concat([group, G1_effect, G2_effect], axis=1)
group = pd.concat([group]*10, axis=0)
group['y'] = group['G1_Effect'] + group['G2_Effect'] + np.random.normal(0, 1, size=group.shape[0])
result = smf.mixedlm(
"y ~ 1",
re_formula=None,
vc_formula={"G1": "0 + C(G1)", "G2": "0 + C(G2)"},
groups=np.ones_like(group['y']),
data=group.reset_index(),
).fit() # crossed effect
print(group['y'].var(ddof=1))
print(group.groupby(['G1'])['y'].mean().var(ddof=1), group.groupby(['G1'])['y'].var(ddof=1).mean())
print(group.groupby(['G2'])['y'].mean().var(ddof=1), group.groupby(['G2'])['y'].var(ddof=1).mean())
print(group.groupby(['G1', 'G2'])['y'].mean().var(ddof=1), group.groupby(['G1', 'G2'])['y'].var(ddof=1).mean())
result.summary()
LS/MLE: Two-way ANOVA
#
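The placeholder above can be sketched in the same style as the other sections: a minimal two-way ANOVA on assumed synthetic data with two categorical factors A and B (names and effect sizes are illustrative), using smf.ols and sms.anova_lm.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
A = np.random.randint(2, size=1000) # factor A: 2 levels
B = np.random.randint(3, size=1000) # factor B: 3 levels
y = 1 + 2*A + 3*B + 1.5*A*B + np.random.normal(0, 1, size=1000)
data = pd.DataFrame({'y': y, 'A': A, 'B': B})
result = smf.ols(formula='y ~ C(A) + C(B) + C(A):C(B)', data=data).fit()
sms.anova_lm(result, typ=2)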
LS/MLE: Two-way ANCOVA
#
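Likewise, a minimal two-way ANCOVA sketch on assumed synthetic data: the same two factors A and B plus a numerical covariate X entering the model linearly.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
A = np.random.randint(2, size=1000) # factor A: 2 levels
B = np.random.randint(3, size=1000) # factor B: 3 levels
X = np.random.normal(size=1000) # numerical covariate
y = 1 + 2*A + 3*B + 2*X + np.random.normal(0, 1, size=1000)
data = pd.DataFrame({'y': y, 'A': A, 'B': B, 'X': X})
result = smf.ols(formula='y ~ C(A) + C(B) + C(A):C(B) + X', data=data).fit()
sms.anova_lm(result, typ=2)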
Linear Mixed Effect Model Structure
REML: Varying-coefficient model for intercepts
- Intercept: Mixed Effect (Fixed Effect + Random Effect)
- Slope: Fixed Effect
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 3 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 3 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts
result = smf.mixedlm(formula='y ~ 1 + X', re_formula=None, groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1', groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(np.r_[random_effect.values, np.array([0])], index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
REML: Varying-coefficient model for slopes
- Intercept: Fixed Effect
- Slope: Mixed Effect (Fixed Effect + Random Effect)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 3 + 1 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 3 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for slopes
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ - 1 + X', groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(np.r_[np.array([0]), random_effect.values], index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
REML: Varying-coefficient model for intercepts and slopes
- Intercept: Mixed Effect (Fixed Effect + Random Effect)
- Slope: Mixed Effect (Fixed Effect + Random Effect)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 2 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 5 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
# REML: restricted maximum likelihood
# Varying-coefficient model for intercepts and slopes
result = smf.mixedlm(formula='y ~ 1 + X', re_formula='~ 1 + X', groups=data['group'], data=data).fit(method=['nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'basinhopping', 'minimize'][1], start_params=None, reml=True, niter_sa=0, do_cg=True, fe_pen=None, cov_pen=None, free=None, full_output=False)
result.fe_params, result.random_effects
result.summary()
mixed_params = dict()
fixed_params = dict()
random_params = dict()
for group, random_effect in result.random_effects.items():
random_effect = pd.Series(random_effect.values, index=result.fe_params.index)
mixed_params[group] = result.fe_params + random_effect
fixed_params[group] = result.fe_params
random_params[group] = random_effect
mixed_params = pd.DataFrame(mixed_params).T; mixed_params.columns = pd.MultiIndex.from_product([['Mixed'], mixed_params.columns.tolist()])
fixed_params = pd.DataFrame(fixed_params).T; fixed_params.columns = pd.MultiIndex.from_product([['Fixed'], fixed_params.columns.tolist()])
random_params = pd.DataFrame(random_params).T; random_params.columns = pd.MultiIndex.from_product([['Random'], random_params.columns.tolist()])
params = pd.concat([mixed_params, fixed_params, random_params], axis=1)
params
LS/MLE: Variance components model with covariate factors
- Intercept: Fixed Effect
- Slope: Fixed Effect
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
np.random.seed(0)
N = np.random.normal(size=1000)
C = np.random.randint(3, size=1000)
X = np.c_[N, C]
y0 = 1 + 3 * X[np.where(C==0)[0], 0] + np.random.normal(0, 1, size=np.where(C==0)[0].shape[0])
y1 = 2 + 3 * X[np.where(C==1)[0], 0] + np.random.normal(0, 1, size=np.where(C==1)[0].shape[0])
y2 = 3 + 3 * X[np.where(C==2)[0], 0] + np.random.normal(0, 1, size=np.where(C==2)[0].shape[0])
y = np.r_[y0, y1, y2]
X = np.r_[X[np.where(C==0)[0]], X[np.where(C==1)[0]], X[np.where(C==2)[0]]]
data = pd.DataFrame(np.c_[y, X], columns=['y', 'X', 'group'])
data['group'] = data['group'].astype('category')
data['const'] = np.ones_like(data['y'])
result = smf.ols(formula='y ~ 0 + group + X', data=data, missing='none', hasconst=None).fit()
result = smf.wls(formula='y ~ 0 + group + X', data=data, weights=1.0, missing='none', hasconst=None).fit()
result = smf.gls(formula='y ~ 0 + group + X', data=data, sigma=1.0, missing='none', hasconst=None).fit()
result = smf.glm(formula='y ~ 0 + group + X', data=data, family=sm.families.Gaussian(sm.families.links.Identity())).fit(); result.ssr = result.resid_response.apply(lambda residual: residual**2).sum()
sms.anova_lm(result, typ=2)
Multilevel Regression: Hierarchical and Nested Group Analysis
Generalized Linear Regression
Analysis on a regression task
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
data = pd.DataFrame(np.c_[
stats.binom.rvs(n=2, p=2/3, size=1000), # X0: categorical explanatory variable1
stats.binom.rvs(n=2, p=2/3, size=1000), # X1: categorical explanatory variable2
stats.norm.rvs(1,1, size=1000), # X2: numerical explanatory variable1
stats.norm.rvs(1,1, size=1000), # X3: numerical explanatory variable2
stats.norm.rvs(1,1, size=1000), # Y: response variable
], columns=['X0', 'X1', 'X2', 'X3', 'y'])
categorical_columns = ['X0', 'X1']
numerical_columns = ['X2', 'X3']
centering_data = data.copy()
for column in numerical_columns:
centering_data[column] = centering_data[column] - centering_data[column].mean()
standardizing_data = data.copy()
for column in numerical_columns:
standardizing_data[column] = (standardizing_data[column] - standardizing_data[column].mean()) / standardizing_data[column].std(ddof=1)
# No Interaction
patsy_formula = "y ~ " + ' + '.join(numerical_columns) + ' + ' + ' + '.join(list(map(lambda column: 'C('+column+')', categorical_columns))) + " - 1"
centering_model = sm.GLM.from_formula(patsy_formula, data=centering_data, family=sm.families.Gaussian(sm.families.links.Log())).fit()
centering_summary = pd.DataFrame(centering_model.summary().tables[1].data)
centering_summary.columns = ['feature'] + centering_summary.loc[0, 1:].tolist()
centering_summary = centering_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(centering_model, fig=plt.figure(figsize=(30,10))) # partial regression
standardizing_model = sm.GLM.from_formula(patsy_formula, data=standardizing_data, family=sm.families.Gaussian(sm.families.links.Log())).fit()
standardizing_summary = pd.DataFrame(standardizing_model.summary().tables[1].data)
standardizing_summary.columns = ['feature'] + standardizing_summary.loc[0, 1:].tolist()
standardizing_summary = standardizing_summary.drop(0, axis=0)
sm.graphics.plot_partregress_grid(standardizing_model, fig=plt.figure(figsize=(30,10))) # partial regression
summary = pd.concat([centering_summary, standardizing_summary], axis=1)
summary.columns = pd.MultiIndex.from_tuples(map(lambda column: ('centering', column[1]) if column[0] < 7 else ('standardizing', column[1]), enumerate(summary.columns)))
summary
Generalized Additive Models (GAM)
import statsmodels.api as sm
sm.GLMGam
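A minimal GLMGam sketch on assumed synthetic data: the B-spline basis (BSplines from statsmodels.gam.api) carries the smooth term in x, while z enters as a linear (parametric) term; variable names and the penalty weight alpha are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines
np.random.seed(0)
x = np.random.uniform(-3, 3, size=500)
z = np.random.normal(size=500)
y = np.sin(x) + 0.5 * z + np.random.normal(0, 0.3, size=500) # nonlinear in x, linear in z
data = pd.DataFrame({'y': y, 'x': x, 'z': z})
bs = BSplines(data[['x']], df=[8], degree=[3]) # B-spline basis for the smooth term in x
result = GLMGam.from_formula('y ~ z', data=data, smoother=bs, alpha=1.0, family=sm.families.Gaussian()).fit()
result.summary()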
Generalized Linear Mixed Effects (GLIMMIX)
https://www.statsmodels.org/dev/mixed_glm.html
import statsmodels.api as sm
sm.BinomialBayesMixedGLM
sm.PoissonBayesMixedGLM
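A minimal Bayesian mixed GLM sketch following the linked statsmodels page; the synthetic data, column names, and the variance-component formula below are assumptions for illustration, and fit_vb performs the variational Bayes fit.
import numpy as np
import pandas as pd
import statsmodels.api as sm
np.random.seed(0)
n_groups, n_per = 30, 40
group = np.repeat(np.arange(n_groups), n_per)
u = np.random.normal(0, 0.5, size=n_groups) # random intercept per group
x = np.random.normal(size=n_groups * n_per)
data = pd.DataFrame({'y': np.random.poisson(np.exp(0.2 + 0.3 * x + u[group])), 'x': x, 'group': group})
vc_formulas = {'group': '0 + C(group)'} # variance component: random intercept by group
result = sm.PoissonBayesMixedGLM.from_formula('y ~ x', vc_formulas, data).fit_vb() # variational Bayes fit
result.summary()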
Generalized Method of Moments (GMM)
from statsmodels.sandbox.regression import gmm
gmm.GMM#(endog, exog, instrument, k_moms=None, k_params=None, missing='none')
gmm.IVGMM#(endog, exog, instrument, k_moms=None, k_params=None, missing='none')
gmm.IV2SLS#(endog, exog, instrument=None)
gmm.DistQuantilesGMM#(endog, exog, instrument)
gmm.LikelihoodModel#(endog, exog=None)
gmm.LinearIVGMM#(endog, exog, instrument, k_moms=None, k_params=None, missing='none')
gmm.NonlinearIVGMM#(endog, exog, instrument, func)
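As a minimal sketch, gmm.LinearIVGMM can be fit on the same kind of endogenous-regressor data used in the 2SLS example below; the data-generating step and default fit settings here are assumptions for illustration.
import numpy as np
from statsmodels.sandbox.regression import gmm
np.random.seed(0)
Z = np.random.normal(0, 1, size=1000) # instrument
epsilon = np.random.normal(0, 1, size=1000)
X = 3 * Z + 1 + epsilon + np.random.normal(size=1000) # endogenous regressor, correlated with the error
y = 5 * X + 4 + epsilon
model = gmm.LinearIVGMM(endog=y, exog=np.c_[np.ones_like(y), X], instrument=np.c_[np.ones_like(y), Z])
result = model.fit() # closed-form moment solution with iterated GMM weighting (default settings)
result.summary()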
Two-Stage Least Squares
$${\displaystyle \begin{aligned} \mathbf{y} &= \underbrace{\mathbf{X}}_{\text{Uncertainty}}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma_{\varepsilon}^{2}\mathbf{I}_{n}) \\ &\downarrow \quad \mathbf{X} = [X_{np}]_{n \ge 1, p \ge 0}, \quad \mathbf{Z} = [Z_{nm}]_{m \le p-1} \\ \text{IV2SLS Equation}& \quad \begin{cases} \quad \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \operatorname{Cov}[\mathbf{X}, \boldsymbol{\varepsilon}] \neq \mathbf{0} \\ \quad \mathbf{X} &= \mathbf{Z}\boldsymbol{\delta} + \mathbf{v}, \quad \operatorname{Cov}[\mathbf{Z}, \boldsymbol{\varepsilon}] = \mathbf{0} \\ \end{cases} \\ \end{aligned} }$$ $${\displaystyle \begin{aligned} \text{Stage 1)}\quad &{\hat {\boldsymbol{\delta} }}=(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{X} \\ & \rightarrow \quad {\mathbf{\widehat {X}}}=\mathbf{Z}{\hat {\boldsymbol{\delta} }}={\color {Red}\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }}\mathbf{X} \equiv {\color {Red}\mathbf{P}_{\mathbf{Z}}}\mathbf{X}.\, \\ & \quad \quad \quad {\color {Red}\mathbf{P}_{\mathbf{Z}}} = \mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} } : \; \text{symmetric and idempotent matrix} \\ & \quad \quad \quad \quad \rightarrow \; \mathbf{P}_{\mathbf{Z}}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}=\mathbf{P}_{\mathbf{Z}}\mathbf{P}_{\mathbf{Z}}=\mathbf{P}_{\mathbf{Z}} \\ \text{Stage 2)}\quad & \boldsymbol{\hat{\beta}} _{\text{2SLS}}=({\widehat {\mathbf{X}}}^{\mathrm {T} }{\widehat {\mathbf{X}}})^{-1}{\widehat {\mathbf{X}}}^{\mathrm {T} }\mathbf{y} \quad \rightarrow \quad \mathbf{y}={\widehat {\mathbf{X}}}\tilde{\boldsymbol{\beta}} + \boldsymbol{\varepsilon}^{\prime} \\ &=\left(\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}^{\mathrm {T} }\mathbf{y} \\ &=\left(\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{P}_{\mathbf{Z}}\mathbf{y} \\ &= (\mathbf{X}^{\mathrm {T} }\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y}\\ &=(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})(\mathbf{X}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{X}^{\mathrm {T} }\mathbf{Z}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y} \\ &=(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})(\mathbf{Z}^{\mathrm {T} }\mathbf{Z})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y}\\ &=(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{y} \\ &= (\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} } (\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} )\\ &= (\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\mathbf{X}\boldsymbol{\beta} +(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\boldsymbol{\varepsilon} \\ &= \boldsymbol{\beta} + \underbrace{(\mathbf{Z}^{\mathrm {T} }\mathbf{X})^{-1}\mathbf{Z}^{\mathrm {T} }\boldsymbol{\varepsilon}}_{\text{orthogonality}(\mathbf{Z}^{\mathrm {T} }\boldsymbol{\varepsilon}=\mathbf{0}) \rightarrow \mathbf{0} } \\ & \therefore \quad \operatorname{E}[\boldsymbol{\hat{\beta}} _{\text{2SLS}}] = \boldsymbol{\tilde{\beta}} = \boldsymbol{\beta} \\ \end{aligned} }$$
(The simplification from the third line of Stage 2 onward assumes the just-identified case, where the number of instruments equals the number of regressors so that Z^T X is square and invertible.)
import numpy as np
import pandas as pd
import statsmodels.api as sm
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X = 3*Z + 1 + epsilon + np.random.normal(size=1000)
y = 5*X + 4 + epsilon
true_beta0 = 4.0000
true_beta1 = 5.0000
true_beta_corr_Xe = np.corrcoef(X, epsilon)[0,1].round(4)
print('%25s'%'True Orthogonality:', np.corrcoef(X, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
print('%25s'%'True Beta Coefficients:', true_beta1, true_beta0)
# OLS
ols_result = sm.OLS(endog=y, exog=np.c_[np.ones_like(X), X]).fit()
estimated_beta0 = ols_result.params[0].round(4)
estimated_beta1 = ols_result.params[1].round(4)
estimated_corr_Ze = np.corrcoef(Z, ols_result.resid)[0,1].round(4)
estimated_corr_Xe = np.corrcoef(ols_result.fittedvalues, ols_result.resid)[0,1].round(4)
print('%50s'%'OLS Correlation[instrumental vs residual]: ', estimated_corr_Ze)
print('%50s'%'OLS Correlation[estimate vs residual]: ', estimated_corr_Xe)
print('%50s'%'OLS Beta: ', estimated_beta1, estimated_beta0)
# 2SLS
first_stage_ols_result = sm.OLS(endog=X, exog=np.c_[np.ones_like(y), Z]).fit()
X_hat = first_stage_ols_result.fittedvalues
last_stage_ols_result = sm.OLS(endog=y, exog=np.c_[np.ones_like(y), X_hat]).fit()
estimated_beta0 = last_stage_ols_result.params[0].round(4)
estimated_beta1 = last_stage_ols_result.params[1].round(4)
estimated_corr_Zv = np.corrcoef(Z, last_stage_ols_result.resid)[0,1].round(4)
estimated_corr_Xv = np.corrcoef(last_stage_ols_result.fittedvalues, last_stage_ols_result.resid)[0,1].round(4)
print('%50s'%'IV2SLS Correlation[instrumental vs residual]: ', estimated_corr_Zv)
print('%50s'%'IV2SLS Correlation[estimate vs residual]: ', estimated_corr_Xv)
print('%50s'%'IV2SLS Beta: ', estimated_beta1, estimated_beta0)
import numpy as np
import pandas as pd
from statsmodels.sandbox.regression import gmm
Z = np.random.normal(0, 1, size=1000)
epsilon = np.random.normal(0, 1, size=1000)
epsilon = epsilon - ( np.cov(epsilon, Z)[0,1]/ np.cov(Z, Z)[0,1] ) * Z
X = 3*Z + 1 + epsilon + np.random.normal(size=1000)
y = 5*X + 4 + epsilon
print('%25s'%'True Orthogonality:', np.corrcoef(X, epsilon)[0,1].round(4), np.corrcoef(Z, epsilon)[0,1].round(4))
result = gmm.IV2SLS(y, exog=np.c_[np.ones_like(y), X], instrument=np.c_[np.ones_like(y), Z]).fit()
result.summary()
# https://www.datascienceconcepts.com/tutorials/python-programming-language/exogeneity-wu-hausman-and-sargan-tests-in-python/
import statsmodels.api as sm
import statsmodels.formula.api as smf
import linearmodels.iv.model as lm
houseprices = sm.datasets.get_rdataset(dataname="HousePrices", package="AER", cache=True).data
mlr1 = smf.ols(formula="price ~ lotsize + bedrooms", data=houseprices).fit()
mdatac = sm.add_constant(data=houseprices, prepend=False)
mlr2 = lm.IV2SLS(dependent=mdatac["price"], exog=mdatac[["const", "bedrooms"]], endog=mdatac["lotsize"], instruments=mdatac[["driveway", "garage"]]).fit(cov_type="homoskedastic", debiased=True)
mlr2.wu_hausman() # Wu-Hausman test of exogeneity
mlr2.wooldridge_regression # Wooldridge's regression test of exogeneity(Wu-Hausman test results)
mlr2.sargan # Sargan's test
Generalized Estimating Equation (GEE)
Assumption: Covariance Structure
- https://www.statsmodels.org/dev/examples/index.html#generalized-estimating-equations
- https://www.statsmodels.org/dev/examples/notebooks/generated/gee_nested_simulation.html
import statsmodels.api as sm
sm.GEE
sm.NominalGEE
sm.OrdinalGEE
sm.cov_struct.Exchangeable
sm.cov_struct.GlobalOddsRatio
sm.cov_struct.Nested
sm.cov_struct.Autoregressive
sm.cov_struct.Stationary
sm.cov_struct.Independence
sm.cov_struct.Equivalence
sm.cov_struct.Unstructured
sm.cov_struct.OrdinalIndependence
sm.cov_struct.NominalIndependence
sm.cov_struct.CovStruct
sm.cov_struct.CategoricalCovStruct
# (Appender, OutputWarning, NotImplementedWarning, ConvergenceWarning in sm.cov_struct are utility/warning classes, not covariance structures)
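A minimal GEE sketch on assumed synthetic clustered data (column names y, x, group are placeholders): a Gaussian family with an exchangeable working correlation within groups.
import numpy as np
import pandas as pd
import statsmodels.api as sm
np.random.seed(0)
n_groups, n_per = 100, 5
group = np.repeat(np.arange(n_groups), n_per)
b = np.random.normal(0, 1, size=n_groups) # group-level shift inducing within-group correlation
x = np.random.normal(size=n_groups * n_per)
data = pd.DataFrame({'y': 1 + 2 * x + b[group] + np.random.normal(size=n_groups * n_per), 'x': x, 'group': group})
ex = sm.cov_struct.Exchangeable() # working covariance structure: constant within-group correlation
result = sm.GEE.from_formula('y ~ x', groups='group', data=data, family=sm.families.Gaussian(), cov_struct=ex).fit()
result.summary()
print(ex.summary()) # estimated within-group working correlation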
General Linear Regression: OLS
| | General Linear Model | Generalized Linear Model |
|---|---|---|
| Typical Estimation | Least Squares | Maximum Likelihood, Bayesian |
| Examples | ANOVA, ANCOVA | Logistic Regression, Poisson Regression |
| Extensions | MANOVA, MANCOVA | GLMM, GEE |
statsmodels
# https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_custom_models.html
# https://www.statsmodels.org/stable/statespace.html#custom-state-space-models
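For comparison with the sklearn, pytorch, and tensorflow blocks below, a minimal statsmodels sketch on the same kind of assumed synthetic data: sm.OLS takes a single response, so each output column is fitted as its own equation.
import numpy as np
import statsmodels.api as sm
np.random.seed(0)
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
exog = sm.add_constant(np.c_[X1, X2])
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
results = [sm.OLS(endog=Y[:, i], exog=exog).fit() for i in range(Y.shape[1])] # one OLS fit per response column
print(np.vstack([result.params for result in results])) # rows: equations / columns: const, X1, X2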
sklearn
import numpy as np
from sklearn.linear_model import LinearRegression
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
model = LinearRegression(fit_intercept=True)
model.fit(X, Y)
print(model.coef_)
print(model.intercept_)
pytorch
import torch
import torch.nn as nn
from torch import optim
X1 = torch.normal(0,1,size=(1000,1))
X2 = torch.normal(0,1,size=(1000,1))
X = torch.cat([X1, X2], dim=1)
Y = torch.cat([
1 + 2 * X1 + 3 * X2 + torch.normal(0,1,size=(1000,1)),
1 + 4 * X1 - 2 * X2 + torch.normal(0,1,size=(1000,1))
], dim=1)
model = nn.Linear(2,2)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
epochs = 1000
for epoch in range(epochs):
hypothesis = model(X)
cost = criterion(hypothesis, Y)
optimizer.zero_grad()
cost.backward()
optimizer.step()
print(model.weight.detach().numpy())
print(model.bias.detach().numpy())
tensorflow
import tensorflow as tf
from tensorflow.keras import layers, losses, optimizers
X1 = tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32)
X2 = tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32)
X = tf.concat([X1, X2], axis=1)
Y = tf.concat([
1 + 2 * X1 + 3 * X2 + tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32),
1 + 4 * X1 - 2 * X2 + tf.random.normal(shape=(1000, 1), mean=0.0, stddev=1.0, dtype=tf.float32)
], axis=1)
model = layers.Dense(2, activation='linear')
criterion = losses.MeanSquaredError()
optimizer = optimizers.SGD(0.1)
epochs = 1000
for epoch in range(epochs):
with tf.GradientTape() as tape:
hypothesis = model(X, training=True)
cost = criterion(Y, hypothesis)
gradients = tape.gradient(cost, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
print(model.weights[0].numpy().T)
print(model.weights[1].numpy())
Seemingly Unrelated Regression (SUR/SURE)
- https://en.wikipedia.org/wiki/General_linear_model
- https://en.wikipedia.org/wiki/Simultaneous_equations_model
- https://en.wikipedia.org/wiki/Seemingly_unrelated_regressions
from collections import OrderedDict
import numpy as np
import pandas as pd
from linearmodels.system import SUR
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
data = pd.DataFrame(data=np.c_[np.ones(X.shape[0]), X, Y], columns=['const', 'X0', 'X1', 'Y0', 'Y1'])
equations = OrderedDict()
equations['Y0'] = {'dependent': data['Y0'],
'exog': data[['const', 'X0', 'X1']]}
equations['Y1'] = {'dependent': data['Y1'],
'exog': data[['const', 'X0', 'X1']]}
# [cov_type]: [1]'unadjusted':'homoskedastic', [2]'robust':'heteroskedastic', [3]'kernel':'heteroskedastic' + 'autocorrelation', [4]'clustered': 2-way clustering of errors
model = SUR(equations)
model.fit(iterate=True, method=['ols', 'gls', None][1], cov_type='unadjusted')
import numpy as np
import pandas as pd
from linearmodels.system import SUR
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
data = pd.DataFrame(data=np.c_[np.ones(X.shape[0]), X, Y], columns=['const', 'X0', 'X1', 'Y0', 'Y1'])
equations = dict()
equations['Y0'] = 'Y0 ~ 1 + X0 + X1'
equations['Y1'] = 'Y1 ~ 1 + X0 + X1'
# [cov_type]: [1]'unadjusted':'homoskedastic', [2]'robust':'heteroskedastic', [3]'kernel':'heteroskedastic' + 'autocorrelation', [4]'clustered': 2-way clustering of errors
model = SUR.from_formula(equations, data)
model.fit(iterate=True, method=['ols', 'gls', None][1], cov_type='unadjusted')
import numpy as np
import pandas as pd
from linearmodels.system import SUR
X1 = np.random.normal(size=3000)
X2 = np.random.normal(size=3000)
X = np.c_[X1, X2]
Y = np.c_[
1 + 2 * X1 + 3 * X2 + np.random.normal(size=3000),
1 + 4 * X1 - 2 * X2 + np.random.normal(size=3000)
]
data = pd.DataFrame(data=np.c_[np.ones(X.shape[0]), X, Y], columns=['const', 'X0', 'X1', 'Y0', 'Y1'])
# [cov_type]: [1]'unadjusted':'homoskedastic', [2]'robust':'heteroskedastic', [3]'kernel':'heteroskedastic' + 'autocorrelation', [4]'clustered': 2-way clustering of errors
# [kernel]: [1]"bartlett":Newey-West, [2]"parzen":Gallant, [3]"qs":Quadratic Spectral, Andrews
model = SUR.multivariate_ls(data[['Y0', 'Y1']], data[['const', 'X0', 'X1']])
model.fit(iterate=True, method=['ols', 'gls', None][1], cov_type='kernel', kernel='parzen')
Reference(1)
statsmodels api for modeling and testing
import numpy as np
from scipy import stats
import statsmodels
import statsmodels.api as sm
import statsmodels.tsa.api as smt
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
import statsmodels.graphics.api as smg
# stats
X = np.random.normal(0, 3, size=3000)
#X = np.c_[np.ones_like(X), np.arange(X.shape[0]), X]
stats.skew(X, axis=0, keepdims=False)
stats.kurtosis(X, axis=0, keepdims=False, fisher=True) # fisher=True: gaussian-based kurtosis is 0, fisher=False: gaussian-based kurtosis is 3
stats.moment(X, moment=2, axis=0, keepdims=False)
# sm: statsmodels.api
# https://www.statsmodels.org/dev/examples/index.html#linear-regression-models
sm.PCA#(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd', missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100, svd_full_matrices=False)
sm.Factor#(endog=None, n_factor=1, corr=None, method='pa', smc=True, endog_names=None, nobs=None, missing='drop')
sm.Logit#(endog, exog, offset=None, check_rank=True)
sm.MNLogit#(endog, exog, check_rank=True)
sm.Probit#(endog, exog, offset=None, check_rank=True)
sm.ConditionalLogit#(endog, exog, groups, missing='none')
sm.ConditionalMNLogit#(endog, exog, groups, missing='none')
sm.OLS#(endog, exog=None, missing='none', hasconst=None)
sm.WLS#(endog, exog, weights=1.0, missing='none', hasconst=None) # core argument: weights
sm.GLS#(endog, exog, sigma=None, missing='none', hasconst=None) # core argument: sigma
sm.GLSAR#(endog, exog=None, rho=1, missing='none', hasconst=None) # core argument: rho
sm.GLM#(endog, exog, family=None, offset=None, exposure=None, freq_weights=None, var_weights=None, missing='none') # core argument: family, freq_weights, var_weights
sm.GLMGam#(endog, exog=None, smoother=None, alpha=0, family=None, offset=None, exposure=None, missing='none') # core argument: family
sm.RLM#(endog, exog, M=None, missing='none')
sm.GEE#(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, exposure=None, dep_data=None, constraint=None, update_dep=True, weights=None) # core argument: cov_struct
sm.NominalGEE#(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, dep_data=None, constraint=None) # core argument: cov_struct
sm.OrdinalGEE#(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, dep_data=None, constraint=None) # core argument: cov_struct
sm.MixedLM#(endog, exog, groups, exog_re=None, exog_vc=None, use_sqrt=True, missing='none') # core argument: groups, exog_re
sm.MANOVA#(endog, exog, missing='none', hasconst=None) # core argument: endog
#sms.anova_lm#(args, typ=2, scale=None, robust=None)
#sms.AnovaRM#(data, depvar, subject, within=None, between=None, aggregate_func=None)
sm.Factor.from_formula#(formula, data, subset=None, drop_cols=None)
sm.Logit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.MNLogit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.Probit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.ConditionalLogit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.ConditionalMNLogit.from_formula#(formula, data, subset=None, drop_cols=None)
sm.OLS.from_formula#(formula, data, subset=None, drop_cols=None)
sm.WLS.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLS.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLSAR.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLM.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GLMGam.from_formula#(formula, data, subset=None, drop_cols=None)
sm.RLM.from_formula#(formula, data, subset=None, drop_cols=None)
sm.GEE.from_formula#(formula, groups, data, subset=None, time=None, offset=None, exposure=None)
sm.NominalGEE.from_formula#(formula, groups, data, subset=None, time=None, offset=None, exposure=None)
sm.OrdinalGEE.from_formula#(formula, groups, data, subset=None, time=None, offset=None, exposure=None)
sm.MixedLM.from_formula#(formula, data, re_formula=None, vc_formula=None, subset=None, use_sparse=False, missing='none')
sm.MANOVA.from_formula#(formula, data, subset=None, drop_cols=None)
sm.BayesGaussMI
sm.MI
sm.MICE
sm.PHReg
sm.SurvfuncRight
sm.RecursiveLS
sm.ConditionalPoisson
sm.GeneralizedPoisson
sm.HurdleCountModel
sm.NegativeBinomial
sm.NegativeBinomialP
sm.TruncatedLFNegativeBinomialP
sm.TruncatedLFPoisson
sm.ZeroInflatedNegativeBinomialP
sm.ZeroInflatedPoisson
sm.ZeroInflatedGeneralizedPoisson
sm.Poisson
sm.PoissonBayesMixedGLM
sm.BinomialBayesMixedGLM
# smt: statsmodels.tsa.api
smt.UnobservedComponents#(endog=y, irregular=True, level=True, stochastic_level=True, trend=True, stochastic_trend=True, cycle=True, damped_cycle=True, stochastic_cycle=True, cycle_period_bounds=None, seasonal=None, stochastic_seasonal=True, stochastic_freq_seasonal=None)
smt.SARIMAX#(endog=y, order=(2,0,0), seasonal_order=(0,0,0,0))
smt.ETSModel#(endog=y, error=['add', 'mul'][0], trend=['add', 'mul', None][-1], damped_trend=False, seasonal=["add", "mul", None][-1], seasonal_periods=None)
smt.AutoReg#(endog=y, lags=2)
smt.ARIMA#(endog=y, order=(2,0,0), seasonal_order=(0,0,0,0))
smt.ExponentialSmoothing#(endog=y, trend=['add', 'mul', None][-1], damped_trend=False, seasonal=["add", "mul", None][-1], seasonal_periods=None) # [level, trend, seasonal]: Exponential smoothing with trend and seasonal components.
smt.SimpleExpSmoothing#(endog=y) # [level] Basic exponential smoothing with only a level component.
smt.Holt#(endog=y, exponential=False, damped_trend=False)
smt.SVAR#(endog=Y, svar_type=['A', 'B', 'AB'][0])
smt.VAR#(endog=Y)
smt.VARMAX#(endog=Y)
smt.VECM#(endog=Y)
smt.UECM#(endog=Y, lags=1)
smt.ARDL#(endog=Y, lags=1)
smt.DynamicFactorMQ#(endog=Y)
smt.DynamicFactor#(endog=Y, k_factors=1, factor_order=2)
smt.MarkovRegression#(y, k_regimes=2)
smt.MarkovAutoregression#(y, k_regimes=2, order=4, switching_ar=False).fit()
# smf: statsmodels.formula.api
smf.logit
smf.mnlogit
smf.probit
smf.conditional_logit
smf.conditional_mnlogit
smf.ols
smf.wls
smf.gls
smf.glm
smf.mixedlm
smf.quantreg
smf.gee
smf.nominal_gee
smf.ordinal_gee
# sms: statsmodels.stats.api
sms.omni_normtest#(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
sms.jarque_bera#(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
sms.normal_ad#(result.resid) # Null (H0): Normality is present. / Alternative (HA): Normality is not present.
sms.lilliefors#(result.resid, dist='norm', pvalmethod='table') # Null (H0): Normality is present. / Alternative (HA): Normality is not present. # Kolmogorov-Smirnov test with estimated parameters (alias of kstest_fit)
sms.het_breuschpagan#(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
sms.het_white#(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
sms.het_goldfeldquandt#(result.resid, result.model.exog) # Null (H0): Homoscedasticity is present. / Alternative (HA): Heteroscedasticity is present.
sms.durbin_watson#(resids=result.resid) # if statistic is within the range of 1.5 and 2.5, No correlation. # Null (H0): There is no correlation among the residuals / Alternative (HA): The residuals are autocorrelated.
sms.acorr_breusch_godfrey#(res=result, nlags=None, store=False) # Null (H0): There is no autocorrelation at any order less than or equal to p. / Alternative (HA): There exists autocorrelation at some order less than or equal to p.
sms.acorr_lm#(resid=result.resid, nlags=None, store=False, period=None, ddof=0, cov_type='nonrobust', cov_kwargs=None)
sms.acorr_ljungbox#(x=result.resid, lags=None, boxpierce=False, model_df=0, period=None, return_df=True, auto_lag=False) # Null (H0): The residuals are independently distributed / Alternative (HA): The residuals are not independently distributed; they exhibit serial correlation.
sms.proportion_confint#(count, nobs, alpha=0.05, method='normal') # count: number of successes, nobs: number of observations, alpha: significance level (default 0.05)
# smg: statsmodels.graphics.api
smg.influence_plot#(results, external=True, alpha=0.05, criterion='cooks', size=48, plot_alpha=0.75, ax=None)
smg.plot_ccpr#(results, exog_idx, ax=None)
smg.plot_ccpr_grid#(results, exog_idx=None, grid=None, fig=None)
smg.plot_corr#(dcorr)
smg.plot_corr_grid#(dcorrs)
smg.plot_partregress#(endog, exog_i, exog_others)
smg.plot_partregress_grid#(results, exog_idx=None, grid=None, fig=None)
smg.plot_fit#(results, exog_idx, y_true=None)
smg.plot_leverage_resid2#(results, alpha=0.05)
smg.qqplot#(data)
smg.tsa.plot_acf#(x, lags=None, alpha=0.05, use_vlines=True, adjusted=False, fft=False, missing='none', title='Autocorrelation', zero=True, auto_ylims=False, bartlett_confint=True)
smg.tsa.plot_pacf#(x, lags=None, alpha=0.05, method='ywm', use_vlines=True, title='Partial Autocorrelation', zero=True)
patsy formula and design matrix
import numpy as np
import pandas as pd
from patsy import dmatrix, dmatrices
def function(x):
return 2*x
# design matrix(1): numpy ndarray
np.random.seed(0)
x1 = np.random.rand(5) + 10
x2 = np.random.rand(5) * 10
dmatrix("x1")
dmatrix("x1 - 1")
dmatrix("x1 + 0")
dmatrix("x1 + x2")
dmatrix("x1 + x2 - 1")
dmatrix("function(x1)")
np.asarray(dmatrix("function(x1)")) # designmatrix to ndarray
pd.DataFrame(dmatrix("function(x1)")) # designmatrix to dataframe
# design matrix(2): pandas dataframe
df = pd.DataFrame(data=np.c_[np.random.normal(0, size=5), np.random.normal(1, size=5), np.random.poisson(10, size=5)] ,columns=['N1', 'N2', 'N3'])
df['C1'] = np.array(['P', 'Q', 'R', 'S', 'P'], dtype=str)
dmatrix("N1 + N2 - 1", data=df)
dmatrix("N1 + np.log(np.abs(N2))", data=df)
dmatrix("N1 + N2 + np.log(np.abs(N2)) + I(np.log(np.abs(N2)))", data=df)
dmatrix("N1 + function(N1)", data=df)
dmatrix("N1 + center(N1) + standardize(N1)", data=df)
dmatrix("N2", data=df)
dmatrix("N3", data=df)
dmatrix("C1", data=df)
dmatrix("C1 + 0", data=df)
dmatrix("C(N3)", data=df)
dmatrix("C(N3) + 0", data=df)
np.asarray(dmatrix("C1", data=df)) # designmatrix to ndarray
pd.DataFrame(dmatrix("C1", data=df)) # designmatrix to dataframe
# patsy formula(0): Intercepts(Constant)
dmatrix("1", data=df, return_type='dataframe') # intercept(constant)
dmatrix("0", data=df, return_type='dataframe') # zero intercept(no intercept)
dmatrix("-1", data=df, return_type='dataframe') # zero intercept(no intercept)
# patsy formula(1-1): Intercepts(by Categorical Variable or by Group, 'C1')
# with an intercept in the model, patsy codes C1 with k-1 columns (reduced rank); without an intercept it uses all k level columns (full rank)
dmatrix("C1", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("1 + C1", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("0 + C1", data=df, return_type='dataframe') # full-rank intercept
dmatrix("C(C1)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("1 + C(C1)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("0 + C(C1)", data=df, return_type='dataframe') # full-rank intercept
dmatrix('C(C1, Treatment(reference="P"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
dmatrix('C(C1, Treatment(reference="Q"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
dmatrix('C(C1, Treatment(reference="R"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
dmatrix('C(C1, Treatment(reference="S"))', data=df, return_type='dataframe') # reduced-rank intercept with reference
# patsy formula(1-2): Intercepts(by Numerical Variable or by Group, 'N3')
dmatrix("C(N3)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("1 + C(N3)", data=df, return_type='dataframe') # reduced-rank intercept
dmatrix("0 + C(N3)", data=df, return_type='dataframe') # full-rank intercept
# patsy formula(1-3): Intercepts(Nested Intercepts)
dmatrix("0 + C1 + C(N3)", data=df, return_type='dataframe')
dmatrix("0 + C1 + C(N3, Sum)", data=df, return_type='dataframe')
dmatrix("0 + C1 + C1:C(N3)", data=df, return_type='dataframe')
# patsy formula(1-4): Intercepts(Contrast)
dmatrix("C(C1, Treatment)", data=df, return_type='dataframe')
dmatrix("C(C1, Sum)", data=df, return_type='dataframe')
dmatrix("C(C1, Diff)", data=df, return_type='dataframe')
dmatrix("C(C1, Poly)", data=df, return_type='dataframe')
dmatrix("C(C1, Helmert)", data=df, return_type='dataframe')
# patsy formula(2): Intercepts(by Categorical Variable or by Group, 'C1') + Slope(Constant)
dmatrix("0 + C1 + N1", data=df, return_type='dataframe') # full-rank intercept / slope(constant)
dmatrix("C1 + N1", data=df, return_type='dataframe') # reduced-rank intercept / slope(constant)
dmatrix("1 + C1 + N1", data=df, return_type='dataframe') # reduced-rank intercept / slope(constant)
# patsy formula(3): Intercepts(by Categorical Variable or by Group, 'C1') + Slopes(by Categorical Variable or by Group, 'C1')
dmatrix("0 + C1 + C1:N1", data=df, return_type='dataframe') # full-rank intercept / full-rank slopes
dmatrix("0 + C1 + N1 + C1:N1", data=df, return_type='dataframe') # full-rank intercept / reduced-rank slopes
dmatrix("1 + C1 + C1:N1", data=df, return_type='dataframe') # reduced-rank intercept / full-rank slopes
dmatrix("1 + C1 + N1 + C1:N1", data=df, return_type='dataframe') # reduced-rank intercept / reduced-rank slopes
# patsy formula(4): No Intercept + Slopes(by Categorical Variable or by Group, 'C1')
dmatrix("0 + C1:N1", data=df, return_type='dataframe') # no intercept / full-rank slopes
dmatrix("0 + N1 + C1:N1", data=df, return_type='dataframe') # no intercept / reduced-rank slopes
# patsy formula(5): Intercept(Constant) + Slope(Constant)
dmatrix("-1 + N1", data=df, return_type='dataframe') # zero intercept(no intercept) / slope(constant)
dmatrix("0 + N1", data=df, return_type='dataframe') # zero intercept(no intercept) / slope(constant)
dmatrix("1 + N1", data=df, return_type='dataframe') # intercept(constant) / slope(constant)
Reference(2)
- https://online.stat.psu.edu/stat501/
- https://www.statology.org/python-guides/
- https://www.tutorialspoint.com/statistics/
- https://mathworld.wolfram.com/CorrelationCoefficient.html
- https://en.wikipedia.org/wiki/Category:Regression_models
- https://en.wikipedia.org/wiki/Linear_regression
- https://en.wikipedia.org/wiki/Errors_and_residuals
- https://en.wikipedia.org/wiki/Leverage_(statistics)
- https://en.wikipedia.org/wiki/Ordinary_least_squares
- https://en.wikipedia.org/wiki/Weighted_least_squares
- https://en.wikipedia.org/wiki/Generalized_least_squares
- https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares
- https://en.wikipedia.org/wiki/Homoscedasticity_and_heteroscedasticity
- https://en.wikipedia.org/wiki/Partition_of_sums_of_squares
- https://en.wikipedia.org/wiki/Squared_deviations_from_the_mean
- https://en.wikipedia.org/wiki/Projection_matrix
- https://en.wikipedia.org/wiki/Studentized_residual
- https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
- https://en.wikipedia.org/wiki/Lack-of-fit_sum_of_squares
- https://en.wikipedia.org/wiki/Coefficient_of_determination
- https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- https://en.wikipedia.org/wiki/Empirical_distribution_function
- https://en.wikipedia.org/wiki/Mixed_model
- https://en.wikipedia.org/wiki/Generalized_linear_mixed_model
- https://en.wikipedia.org/wiki/Hierarchical_generalized_linear_model
- https://en.wikipedia.org/wiki/Multilevel_model
- https://en.wikipedia.org/wiki/Nonlinear_mixed-effects_model
- https://en.wikipedia.org/wiki/Best_linear_unbiased_prediction
- https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem
- https://en.wikipedia.org/wiki/Errors-in-variables_models
- https://en.wikipedia.org/wiki/Multilevel_modeling_for_repeated_measures