{
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 0,
|
|||
|
"metadata": {
|
|||
|
"colab": {
|
|||
|
"name": "Chapter 7: Analysis of Variance (Anova).ipynb",
|
|||
|
"provenance": [],
|
|||
|
"collapsed_sections": [
|
|||
|
"1mrsPw8OSpPG",
|
|||
|
"dthsvFWS-UmB",
|
|||
|
"f2CRhyLNZZ5N",
|
|||
|
"EVRQF0HkZhxd",
|
|||
|
"6r7fJg2TiQlk"
|
|||
|
]
|
|||
|
},
|
|||
|
"kernelspec": {
|
|||
|
"name": "python3",
|
|||
|
"display_name": "Python 3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"name": "python"
|
|||
|
}
|
|||
|
},
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"# **`Chapter 7: Analysis of Variance (Anova)`**"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "Wb9LyXR91KSv"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Table of Content:**\n",
|
|||
|
"\n",
|
|||
|
"- [Import Libraries](#Import_Libraries)\n",
|
|||
|
"- [7.1. One-Way Analysis of Variance](#One-Way_Analysis_of_Variance)\n",
|
|||
|
" - [7.1.1. Equal Sample Sizes](#Equal_Sample_Sizes)\n",
|
|||
|
" - [7.1.2. Unequal Sample Sizes](#Unequal_Sample_Sizes)\n",
|
|||
|
"- [7.2. Two-Way Analysis of Variance](#Two-Way_Analysis_of_Variance) "
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "w5zb7kZV85_J"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"<a name='Import_Libraries'></a>\n",
|
|||
|
"\n",
|
|||
|
"## **Import Libraries**"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "1mrsPw8OSpPG"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"source": [
|
|||
|
"!pip install --upgrade scipy"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"colab": {
|
|||
|
"base_uri": "https://localhost:8080/"
|
|||
|
},
|
|||
|
"id": "1GroeqIoC6fr",
|
|||
|
"outputId": "139f4f13-d57b-472c-9ff7-e83fea5ac7d8"
|
|||
|
},
|
|||
|
"execution_count": null,
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"output_type": "stream",
|
|||
|
"name": "stdout",
|
|||
|
"text": [
|
|||
|
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
|
|||
|
"Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (1.7.3)\n",
|
|||
|
"Requirement already satisfied: numpy<1.23.0,>=1.16.5 in /usr/local/lib/python3.7/dist-packages (from scipy) (1.21.6)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import matplotlib.patches as mpatches\n",
|
|||
|
"from matplotlib import collections as mc\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import math\n",
|
|||
|
"from scipy import stats\n",
|
|||
|
"from scipy.stats import norm\n",
|
|||
|
"from scipy.stats import chi2\n",
|
|||
|
"from scipy.stats import t\n",
|
|||
|
"from scipy.stats import f\n",
|
|||
|
"from scipy.stats import bernoulli\n",
|
|||
|
"from scipy.stats import binom\n",
|
|||
|
"from scipy.stats import nbinom\n",
|
|||
|
"from scipy.stats import geom\n",
|
|||
|
"from scipy.stats import poisson\n",
|
|||
|
"from scipy.stats import uniform\n",
|
|||
|
"from scipy.stats import randint\n",
|
|||
|
"from scipy.stats import expon\n",
|
|||
|
"from scipy.stats import gamma\n",
|
|||
|
"from scipy.stats import beta\n",
|
|||
|
"from scipy.stats import weibull_min\n",
|
|||
|
"from scipy.stats import hypergeom\n",
|
|||
|
"from scipy.stats import shapiro\n",
|
|||
|
"from scipy.stats import pearsonr\n",
|
|||
|
"from scipy.stats import normaltest\n",
|
|||
|
"from scipy.stats import anderson\n",
|
|||
|
"from scipy.stats import spearmanr\n",
|
|||
|
"from scipy.stats import kendalltau\n",
|
|||
|
"from scipy.stats import chi2_contingency\n",
|
|||
|
"from scipy.stats import ttest_ind\n",
|
|||
|
"from scipy.stats import ttest_rel\n",
|
|||
|
"from scipy.stats import mannwhitneyu\n",
|
|||
|
"from scipy.stats import wilcoxon\n",
|
|||
|
"from scipy.stats import kruskal\n",
|
|||
|
"from scipy.stats import friedmanchisquare\n",
|
|||
|
"from statsmodels.tsa.stattools import adfuller\n",
|
|||
|
"from statsmodels.tsa.stattools import kpss\n",
|
|||
|
"from statsmodels.stats.weightstats import ztest\n",
|
|||
|
"import statsmodels.api as sm\n",
|
|||
|
"from sklearn.linear_model import LinearRegression\n",
|
|||
|
"from scipy.stats import f_oneway\n",
|
|||
|
"from statsmodels.formula.api import ols\n",
|
|||
|
"from statsmodels.stats.anova import anova_lm\n",
|
|||
|
"from scipy.integrate import quad\n",
|
|||
|
"from statsmodels.stats.outliers_influence import summary_table\n",
|
|||
|
"from statsmodels.sandbox.regression.predstd import wls_prediction_std\n",
|
|||
|
"from statsmodels.stats.outliers_influence import variance_inflation_factor\n",
|
|||
|
"from IPython.display import display, Latex\n",
|
|||
|
"\n",
|
|||
|
"import warnings\n",
|
|||
|
"warnings.filterwarnings('ignore')\n",
|
|||
|
"warnings.simplefilter(action='ignore', category=FutureWarning)"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "o60rxBbwmJM5"
|
|||
|
},
|
|||
|
"execution_count": null,
|
|||
|
"outputs": []
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"<a name='One-Way_Analysis_of_Variance'></a>\n",
|
|||
|
"\n",
|
|||
|
"## **7.1. One-Way Analysis of Variance:**"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "dthsvFWS-UmB"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"This technique, which is rather general and can be used to make inferences about a multitude of parameters relating to population means, is known as the analysis of variance.\n",
|
|||
|
"\n",
|
|||
|
"We suppose that we have been provided samples of size $n$ from $m$ distinct populations and that we want to use these data to test the hypothesis that the $m$ population means are equal."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "yRiF35t89W4t"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"<a name='Equal_Sample_Sizes'></a>\n",
|
|||
|
"\n",
|
|||
|
"### **7.1.1. Equal Sample Sizes:**"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "f2CRhyLNZZ5N"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"Since the mean of a random variable depends only on a single factor, namely, the sample the variable is from, this scenario is said to constitute a one-way analysis of variance.\n",
|
|||
|
"\n",
|
|||
|
"One way of thinking about this is to imagine that we have $m$ different treatments, where the result of applying treatment $i$ on an item is a normal random variable with mean $\\mu_i$ and variance $\\sigma^2$. We are then interested in testing the hypothesis that all treatments have the same effect, by applying each treatment to a (different) sample of $n$ items and then analyzing the result.\n",
|
|||
|
"\n",
|
|||
|
"Consider $m$ independent samples, each of size $n$, where the members of the ith sample $X_{i1}, X_{i2}, . . . , X_{in}$ are normal random variables with unknown mean $\\mu_i$ and unknown variance $\\sigma^2$.\n",
|
|||
|
"\n",
|
|||
|
"$X_{ij} \\sim N(\\mu_i, \\sigma^2) \\quad i=1,...,m,\\ j=1,...,n$\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"$H_0: \\mu_1=\\mu_2=...=\\mu_m$\n",
|
|||
|
"\n",
|
|||
|
"$H_1:$ not all the means are equal (at least two of them differ.)"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "MfiPoYrp-4U4"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Within samples sum of squares:**\n",
|
|||
|
"\n",
|
|||
|
"Since there are a total of $nm$ independent normal random variables $X_{ij}$, it follows that the sum of the squares of their standardized versions will be a chi-square random variable with $nm$ degrees of freedom.\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}-E[X_{ij}])^2}{\\sigma^2} = \\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}- \\mu_i)^2}{\\sigma^2} \\sim \\chi^2_{nm}$\n",
|
|||
|
"\n",
|
|||
|
"To obtain estimators for the $m$ unknown parameters $\\mu_1, . . . ,\\mu_m$, let $X_{i.}$ denote the average of all the elements in sample $i$:\n",
|
|||
|
"\n",
|
|||
|
"$X_{i.} = \\sum_{j=1}^n \\frac{X_{ij}}{n}$\n",
|
|||
|
"\n",
|
|||
|
"The variable $X_{i.}$ is the sample mean of the ith population, and as such is the estimator of the population mean $\\mu_i$ for $i=1,...,m$.\n",
|
|||
|
"\n",
|
|||
|
"Then if we substitute the $\\mu$ with $X_{i.}$ the following variable will have chi-square distribution with $nm − m$ degrees of freedom. (Recall that 1 degree of freedom is lost for each parameter that is estimated.)\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}- X_{i.})^2}{\\sigma^2} \\sim \\chi^2_{nm-m}$\n",
|
|||
|
"\n",
|
|||
|
"$SS_W = \\sum_{i=1}^m \\sum_{j=1}^n (X_{ij}- X_{i.})^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{E[SS_W]}{\\sigma^2} = nm-m \\quad \\rightarrow \\quad \\frac{E[SS_W]}{nm-m} = \\sigma^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{SS_W}{nm-m}$ is an estimator of $\\sigma^2$."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "IsmBOUo2EB6_"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**between samples sum of squares:**\n",
|
|||
|
"\n",
|
|||
|
"assume that $H_0$ is true and so all the population means $μ_i$ are equal, say, $μ_i = μ$ for all $i$. Under this condition it follows that the $m$ sample means $X_{1.}, X_{2.}, . . ., X_m$. will all be normally distributed with the same mean $\\mu$ and the same variance $\\frac{\\sigma^2}{n}$. Hence, the sum of squares of the $m$ standardized variables $\\frac{X_{i.}-\\mu}{\\sqrt{\\frac{\\sigma^2}{n}}} = \\frac{\\sqrt{n}(X_{i.}-\\mu)}{\\sigma}$ will be a chi-square random variable with $m$ degrees of freedoms.\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\frac{n(X_{i.}-\\mu)^2}{\\sigma^2} \\sim \\chi_m^2$\n",
|
|||
|
"\n",
|
|||
|
"Now, when all the population means are equal to $\\mu$, then the estimator of $\\mu$ is the average of all the nm data values. That is, the estimator of $\\mu$ is $X_{..}$.\n",
|
|||
|
"\n",
|
|||
|
"$X_{..} = \\frac{\\sum_{i=1}^m \\sum_{j=1}^n X_{ij}}{nm} = \\frac{\\sum_{i=1}^m X_{i.}}{m}$\n",
|
|||
|
"\n",
|
|||
|
"If we now substitute $X_{..}$ for the unknown parameter μ in expression $\\sum_{i=1}^m \\frac{n(X_{i.}-\\mu)^2}{\\sigma^2}$ it follows, when $H_0$ is true, that the resulting quantity will be a chi-square random variable with $m − 1$ degrees of freedom.\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\frac{n(X_{i.}-X_{..})^2}{\\sigma^2} \\sim \\chi_{m-1}^2$\n",
|
|||
|
"\n",
|
|||
|
"$SS_b = n \\sum_{i=1}^m (X_{i.}-X_{..})^2$\n",
|
|||
|
"\n",
|
|||
|
"When $H_0$ is true:\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{E[SS_b]}{\\sigma^2} = m-1 \\quad \\rightarrow \\quad \\frac{E[SS_b]}{m-1} = \\sigma^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{SS_b}{m-1}$ is an estimator of $\\sigma^2$.\n",
|
|||
|
"\n",
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th>Estimators of $\\sigma^2$</th><th>Conditions</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>$\\frac{SS_W}{nm-m}$</td><td>Always true</td></tr>\n",
|
|||
|
"<tr><td>$\\frac{SS_b}{m-1}$</td><td>Only when $H_0$ is true</td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "bS5UmLn3IhDF"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"Because it can be shown that $\\frac{SS_b}{m-1}$ will tend to exceed $\\sigma^2$ when $H_0$ is not true, the test statistic is:\n",
|
|||
|
"\n",
|
|||
|
"$F_0 = \\frac{\\frac{SS_b}{m-1}}{\\frac{SS_W}{nm-m}}$\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"Significance level = $\\alpha$\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"We accept $H_0$ if:\n",
|
|||
|
"\n",
|
|||
|
"1. $F_0 < F_{m-1,\\ nm-m,\\ \\alpha}$\n",
|
|||
|
"\n",
|
|||
|
"2. P_value = $P(F_{m-1,\\ nm-m} \\geq F_0) > \\alpha$"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "icLzqh0lRl3g"
|
|||
|
}
|
|||
|
},
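{
"cell_type": "markdown",
"source": [
"As a quick illustration of the decision rule above, the following sketch computes the critical value $F_{m-1,\\ nm-m,\\ \\alpha}$ and the p-value with `scipy.stats.f` (imported above as `f`). The values of $m$, $n$, $F_0$, and $\\alpha$ below are made up purely for illustration."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Illustrative values (not taken from the text): m samples, each of size n\n",
"m, n = 3, 5\n",
"alpha = 0.05\n",
"F0 = 2.6  # an example value of the test statistic\n",
"\n",
"df_between = m - 1     # degrees of freedom of SS_b\n",
"df_within = n * m - m  # degrees of freedom of SS_W\n",
"\n",
"# Critical value F_{m-1, nm-m, alpha}: accept H0 if F0 is below it\n",
"critical_value = f.ppf(1 - alpha, df_between, df_within)\n",
"\n",
"# p-value: P(F_{m-1, nm-m} >= F0)\n",
"p_value = f.sf(F0, df_between, df_within)\n",
"\n",
"print(f'Critical value: {critical_value:.4f}')\n",
"print(f'p-value: {p_value:.4f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},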
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"The sum of squares identity:\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^n X_{ij}^2 = nmX_{..}^2 + SS_b + SS_W$"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "xlWt1DIPYtaY"
|
|||
|
}
|
|||
|
},
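{
"cell_type": "markdown",
"source": [
"The identity can be checked numerically; a minimal sketch using the same three samples as the `f_oneway` example below:"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# The three samples from the example below, arranged as an m x n array\n",
"X = np.array([[220, 251, 226, 246, 260],\n",
"              [244, 235, 232, 242, 225],\n",
"              [252, 272, 250, 238, 256]], dtype=float)\n",
"m, n = X.shape\n",
"\n",
"row_means = X.mean(axis=1)  # X_i.\n",
"grand_mean = X.mean()       # X_..\n",
"\n",
"SS_W = ((X - row_means[:, None]) ** 2).sum()\n",
"SS_b = n * ((row_means - grand_mean) ** 2).sum()\n",
"\n",
"lhs = (X ** 2).sum()                         # sum of all X_ij^2\n",
"rhs = n * m * grand_mean ** 2 + SS_b + SS_W  # right-hand side of the identity\n",
"print(lhs, rhs)  # the two values agree (up to floating-point error)\n",
"print(np.isclose(lhs, rhs))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},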
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Summary:**\n",
|
|||
|
"\n",
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th>--Source of Variation--</th><th>----------------------Sum of Squares----------------------</th><th>--Degrees of Freedom--</th><th>--Mean of Squares--</th><th>--Value of Test Statistic--</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>Between Samples</td><td>$SS_b=n\\sum_{i=1}^m(X_{i.}-X_{..})^2$</td><td>$m-1$</td><td>$MS_b = \\frac{SS_b}{m-1}$</td><td>$F_0 = \\frac{\\frac{SS_b}{m-1}}{\\frac{SS_W}{nm-m}}$\n",
|
|||
|
"</td></tr>\n",
|
|||
|
"<tr><td>Within Samples</td><td>$SS_W = \\sum_{i=1}^m \\sum_{j=1}^n (X_{ij}- X_{i.})^2$</td><td>$nm-m$</td><td>$MS_W = \\frac{SS_W}{nm-m}$</td><td></td></tr>\n",
|
|||
|
"<tr><td>Total</td><td>$SS_T = SS_W + SS_b = \\sum_{i=1}^m \\sum_{j=1}^n (X_{ij}- X_{..})^2$</td><td>$nm-1$</td><td></td><td></td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "Ti5EqBy9JrrG"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"You can do this test with f_oneway from scipy library."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "dq1LdNgzQcq-"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"source": [
|
|||
|
"Sample1 = [220, 251, 226, 246, 260]\n",
|
|||
|
"Sample2 = [244, 235, 232, 242, 225]\n",
|
|||
|
"Sample3 = [252, 272, 250, 238, 256]\n",
|
|||
|
"\n",
|
|||
|
"alpha = 0.05\n",
|
|||
|
"results = f_oneway(Sample1, Sample2, Sample3)\n",
|
|||
|
"\n",
|
|||
|
"print(results, '\\n')\n",
|
|||
|
"\n",
|
|||
|
"if results[1] < alpha:\n",
|
|||
|
" print(f'Since p_value < {alpha}, reject null hypothesis.')\n",
|
|||
|
"else:\n",
|
|||
|
" print(f'Since p_value > {alpha}, the null hypothesis cannot be rejected.')"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"colab": {
|
|||
|
"base_uri": "https://localhost:8080/"
|
|||
|
},
|
|||
|
"id": "RoAxmqus7_B2",
|
|||
|
"outputId": "e84caf74-683b-4402-cd9c-75fd43a202c1"
|
|||
|
},
|
|||
|
"execution_count": null,
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"output_type": "stream",
|
|||
|
"name": "stdout",
|
|||
|
"text": [
|
|||
|
"F_onewayResult(statistic=2.6009238802972487, pvalue=0.11524892355706169) \n",
|
|||
|
"\n",
|
|||
|
"Since p_value > 0.05, the null hypothesis cannot be rejected.\n"
|
|||
|
]
|
|||
|
}
|
|||
|
]
|
|||
|
},
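{
"cell_type": "markdown",
"source": [
"For reference, the quantities in the summary table above can also be computed directly from their definitions. The following sketch (assuming the previous cell has been run, so that `Sample1`, `Sample2`, and `Sample3` are defined) reproduces the `f_oneway` statistic and p-value."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Same samples as above, arranged as an m x n array (equal sample sizes)\n",
"X = np.array([Sample1, Sample2, Sample3], dtype=float)\n",
"m, n = X.shape\n",
"\n",
"row_means = X.mean(axis=1)  # X_i.\n",
"grand_mean = X.mean()       # X_..\n",
"\n",
"SS_b = n * ((row_means - grand_mean) ** 2).sum()  # between-samples sum of squares\n",
"SS_W = ((X - row_means[:, None]) ** 2).sum()      # within-samples sum of squares\n",
"\n",
"MS_b = SS_b / (m - 1)\n",
"MS_W = SS_W / (n * m - m)\n",
"\n",
"F0 = MS_b / MS_W\n",
"p_value = f.sf(F0, m - 1, n * m - m)  # P(F_{m-1, nm-m} >= F0)\n",
"\n",
"print(f'F0 = {F0:.4f}, p-value = {p_value:.4f}')  # should match f_oneway above"
],
"metadata": {},
"execution_count": null,
"outputs": []
},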
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"<a name='Unequal_Sample_Sizes'></a>\n",
|
|||
|
"\n",
|
|||
|
"### **7.1.2. Unequal Sample Sizes:**"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "EVRQF0HkZhxd"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"Suppose that we have $m$ normal samples of respective sizes $n_1, n_2, ... , n_m$. That is, the data consist of the $\\sum_{i=1}^m n_i$\n",
|
|||
|
"independent random variables $Xij,\\ j = 1, ... , n_i,\\ i = 1, ... , m$, where $X_{ij} ∼ N (\\mu_i, \\sigma^2)$\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"$H_0: \\mu_1=\\mu_2=...=\\mu_m$\n",
|
|||
|
"\n",
|
|||
|
"$H_1:$ not all the means are equal (at least two of them differ.)"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "qdRARCxRZttP"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Within samples sum of squares:**\n",
|
|||
|
"\n",
|
|||
|
"Since there are a total of $\\sum_{i=1}^m n_i$ independent normal random variables $X_{ij}$, it follows that the sum of the squares of their standardized versions will be a chi-square random variable with $\\sum_{i=1}^m n_i$ degrees of freedom.\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^{n_i} \\frac{(X_{ij}-E[X_{ij}])^2}{\\sigma^2} = \\sum_{i=1}^m \\sum_{j=1}^{n_i} \\frac{(X_{ij}- \\mu_i)^2}{\\sigma^2} \\sim \\chi^2_{\\sum_{i=1}^m n_i}$\n",
|
|||
|
"\n",
|
|||
|
"To obtain estimators for the $m$ unknown parameters $\\mu_1, . . . ,\\mu_m$, let $X_{i.}$ denote the average of all the elements in sample $i$:\n",
|
|||
|
"\n",
|
|||
|
"$X_{i.} = \\sum_{j=1}^{n_i} \\frac{X_{ij}}{n}$\n",
|
|||
|
"\n",
|
|||
|
"The variable $X_{i.}$ is the sample mean of the ith population, and as such is the estimator of the population mean $\\mu_i$ for $i=1,...,m$.\n",
|
|||
|
"\n",
|
|||
|
"Then if we substitute the $\\mu$ with $X_{i.}$ the following variable will have chi-square distribution with $\\sum_{i=1}^m n_i - m$ degrees of freedom. (Recall that 1 degree of freedom is lost for each parameter that is estimated.)\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^{n_i} \\frac{(X_{ij}- X_{i.})^2}{\\sigma^2} \\sim \\chi^2_{\\sum_{i=1}^m n_i - m}$\n",
|
|||
|
"\n",
|
|||
|
"$SS_W = \\sum_{i=1}^m \\sum_{j=1}^{n_i} (X_{ij}- X_{i.})^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{E[SS_W]}{\\sigma^2} = \\sum_{i=1}^m n_i - m \\quad \\rightarrow \\quad \\frac{E[SS_W]}{\\sum_{i=1}^m n_i - m} = \\sigma^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{SS_W}{\\sum_{i=1}^m n_i - m}$ is an estimator of $\\sigma^2$."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "PHvjr5IDarxc"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**between samples sum of squares:**\n",
|
|||
|
"\n",
|
|||
|
"assume that $H_0$ is true and so all the population means $μ_i$ are equal, say, $μ_i = μ$ for all $i$. Under this condition it follows that the $m$ sample means $X_{1.}, X_{2.}, . . ., X_m$. will all be normally distributed with the same mean $\\mu$ and the same variance $\\frac{\\sigma^2}{n_i}$. Hence, the sum of squares of the $m$ standardized variables $\\frac{X_{i.}-\\mu}{\\sqrt{\\frac{\\sigma^2}{n_i}}} = \\frac{\\sqrt{n_i}(X_{i.}-\\mu)}{\\sigma}$ will be a chi-square random variable with $m$ degrees of freedoms.\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\frac{n_i(X_{i.}-\\mu)^2}{\\sigma^2} \\sim \\chi_m^2$\n",
|
|||
|
"\n",
|
|||
|
"Now, when all the population means are equal to $\\mu$, then the estimator of $\\mu$ is the average of all the nm data values. That is, the estimator of $\\mu$ is $X_{..}$.\n",
|
|||
|
"\n",
|
|||
|
"$X_{..} = \\frac{\\sum_{i=1}^m \\sum_{j=1}^{n_i} X_{ij}}{\\sum_{i=1}^m n_i} = \\frac{\\sum_{i=1}^m X_{i.}}{m}$\n",
|
|||
|
"\n",
|
|||
|
"If we now substitute $X_{..}$ for the unknown parameter $\\mu$ in expression $\\sum_{i=1}^m \\frac{n_i(X_{i.}-\\mu)^2}{\\sigma^2}$ it follows, when $H_0$ is true, that the resulting quantity will be a chi-square random variable with $m − 1$ degrees of freedom.\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\frac{n_i(X_{i.}-X_{..})^2}{\\sigma^2} \\sim \\chi_{m-1}^2$\n",
|
|||
|
"\n",
|
|||
|
"$SS_b = n_i \\sum_{i=1}^m (X_{i.}-X_{..})^2$\n",
|
|||
|
"\n",
|
|||
|
"When $H_0$ is true:\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{E[SS_b]}{\\sigma^2} = m-1 \\quad \\rightarrow \\quad \\frac{E[SS_b]}{m-1} = \\sigma^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{SS_b}{m-1}$ is an estimator of $\\sigma^2$.\n",
|
|||
|
"\n",
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th>Estimators of $\\sigma^2$</th><th>Conditions</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>$\\frac{SS_W}{\\sum_{i=1}^m n_i - m}$</td><td>Always true</td></tr>\n",
|
|||
|
"<tr><td>$\\frac{SS_b}{m-1}$</td><td>Only when $H_0$ is true</td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "krC3IvY8dJZ2"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"Because it can be shown that $\\frac{SS_b}{m-1}$ will tend to exceed $\\sigma^2$ when $H_0$ is not true, the test statistic is:\n",
|
|||
|
"\n",
|
|||
|
"$F_0 = \\frac{\\frac{SS_b}{m-1}}{\\frac{SS_W}{\\sum_{i=1}^m n_i - m}}$\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"Significance level = $\\alpha$\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"We accept $H_0$ if:\n",
|
|||
|
"\n",
|
|||
|
"1. $F_0 < F_{m-1,\\ \\sum_{i=1}^m n_i - m,\\ \\alpha}$\n",
|
|||
|
"\n",
|
|||
|
"2. P_value = $P(F_{m-1,\\ \\sum_{i=1}^m n_i - m} \\geq F_0) > \\alpha$"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "7f34W5C3fr-I"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Summary:**\n",
|
|||
|
"\n",
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th>--Source of Variation--</th><th>----------------------Sum of Squares----------------------</th><th>--Degrees of Freedom--</th><th>---Mean of Squares---</th><th>--Value of Test Statistic--</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>Between Samples</td><td>$SS_b = n_i \\sum_{i=1}^m (X_{i.}-X_{..})^2$</td><td>$m-1$</td><td>$MS_b = \\frac{SS_b}{m-1}$</td><td>$F_0 = \\frac{\\frac{SS_b}{m-1}}{\\frac{SS_W}{\\sum_{i=1}^m n_i - m}}$\n",
|
|||
|
"</td></tr>\n",
|
|||
|
"<tr><td>Within Samples</td><td>$SS_W = \\sum_{i=1}^m \\sum_{j=1}^{n_i} (X_{ij}- X_{i.})^2$</td><td>$\\sum_{i=1}^m n_i - m$</td><td>$MS_W = \\frac{SS_W}{\\sum_{i=1}^m n_i - m}$</td><td></td></tr>\n",
|
|||
|
"<tr><td>Total</td><td>$SS_T = SS_W + SS_b = \\sum_{i=1}^m \\sum_{j=1}^{n_i} (X_{ij}- X_{..})^2$</td><td>$\\sum_{i=1}^m n_i-1$</td><td></td><td></td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "qpx-mpOhgAtR"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"source": [
|
|||
|
"Sample1 = [220, 251, 226, 246, 260]\n",
|
|||
|
"Sample2 = [244, 235, 232, 242]\n",
|
|||
|
"Sample3 = [252, 272, 250]\n",
|
|||
|
"\n",
|
|||
|
"alpha = 0.05\n",
|
|||
|
"results = f_oneway(Sample1, Sample2, Sample3)\n",
|
|||
|
"\n",
|
|||
|
"print(results, '\\n')\n",
|
|||
|
"\n",
|
|||
|
"if results[1] < alpha:\n",
|
|||
|
" print(f'Since p_value < {alpha}, reject null hypothesis.')\n",
|
|||
|
"else:\n",
|
|||
|
" print(f'Since p_value > {alpha}, the null hypothesis cannot be rejected.')"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"colab": {
|
|||
|
"base_uri": "https://localhost:8080/"
|
|||
|
},
|
|||
|
"id": "sRUSJO4YZkFc",
|
|||
|
"outputId": "44bd1ace-4eb4-46dd-bb68-6792dd9fc594"
|
|||
|
},
|
|||
|
"execution_count": null,
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"output_type": "stream",
|
|||
|
"name": "stdout",
|
|||
|
"text": [
|
|||
|
"F_onewayResult(statistic=2.2667346740503254, pvalue=0.15949612861261475) \n",
|
|||
|
"\n",
|
|||
|
"Since p_value > 0.05, the null hypothesis cannot be rejected.\n"
|
|||
|
]
|
|||
|
}
|
|||
|
]
|
|||
|
},
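{
"cell_type": "markdown",
"source": [
"The same computation can be done by hand for unequal sample sizes using the formulas above, with $X_{..}$ taken as the average of all $\\sum_{i=1}^m n_i$ observations. A minimal sketch, assuming the previous cell has been run so that `Sample1`, `Sample2`, and `Sample3` are defined:"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Samples of unequal sizes n_i (from the previous cell)\n",
"samples = [Sample1, Sample2, Sample3]\n",
"m = len(samples)\n",
"n_i = np.array([len(s) for s in samples])\n",
"N = n_i.sum()  # total number of observations\n",
"\n",
"sample_means = np.array([np.mean(s) for s in samples])  # X_i.\n",
"grand_mean = np.concatenate(samples).mean()             # X_..\n",
"\n",
"SS_b = (n_i * (sample_means - grand_mean) ** 2).sum()\n",
"SS_W = sum(((np.array(s) - xbar) ** 2).sum() for s, xbar in zip(samples, sample_means))\n",
"\n",
"F0 = (SS_b / (m - 1)) / (SS_W / (N - m))\n",
"p_value = f.sf(F0, m - 1, N - m)\n",
"\n",
"print(f'F0 = {F0:.4f}, p-value = {p_value:.4f}')  # should match f_oneway above"
],
"metadata": {},
"execution_count": null,
"outputs": []
},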
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"<a name='Two-Way_Analysis_of_Variance'></a>\n",
|
|||
|
"\n",
|
|||
|
"## **7.2. Two-Way Analysis of Variance:**"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "6r7fJg2TiQlk"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"We suppose that each data value is affected by two factors. We will refer to the first factor as the \"row\" factor, and the second factor as\n",
|
|||
|
"the \"column\" factor. we will suppose that the data $X_{ij},\\ i = 1, ... , m,\\ j = 1, ... , n$ are independent normal random variables with a common variance $\\sigma^2$ and we suppose that the mean value of data depends in an additive manner on both its row and its column.\n",
|
|||
|
"\n",
|
|||
|
"We let $X_{ij}$ represent the value of the jth member of\n",
|
|||
|
"sample $i$, then that model could be symbolically represented as: $E[X_{ij}] = \\mu_i$\n",
|
|||
|
"\n",
|
|||
|
"However, if we let $\\mu$ denote the average value of the $\\mu_i$ $(\\mu = \\frac{\\sum_{i=1}^m \\mu_i}{m})$ then we can rewrite the model as $E[X_{ij}] = \\mu + \\alpha_i$ where $\\alpha_i = \\mu_i -\\mu$\n",
|
|||
|
"\n",
|
|||
|
"With this definition of $\\alpha_i$ as the deviation of $\\mu_i$ from the average mean value, it is easy to see that $\\sum_{i=1}^m \\alpha_i = 0$\n",
|
|||
|
"\n",
|
|||
|
"A two-factor additive model can also be expressed in terms of row and column deviations.\n",
|
|||
|
"\n",
|
|||
|
"If we let $\\mu_{ij} = E[X_{ij}]$, then the additive model supposes that for some constants $a_i,\\ i = 1, ... , m$ and $b_j,\\ j = 1, ... , n$\n",
|
|||
|
"\n",
|
|||
|
"$\\mu_{ij} = \\alpha_i + b_j$\n",
|
|||
|
"\n",
|
|||
|
"Continuing our use of the \"dot\" (or averaging) notation, we let\n",
|
|||
|
"\n",
|
|||
|
"$\\mu_{i.} = \\sum_{j=1}^n \\frac{\\mu_{ij}}{n} \\qquad \\mu_{.j} = \\sum_{i=1}^m \\frac{\\mu_{ij}}{m} \\qquad \\mu_{..} = \\sum_{i=1}^m \\sum_{j=1}^n\\frac{\\mu_{ij}}{nm}$\n",
|
|||
|
"\n",
|
|||
|
"$a_. = \\sum_{i=1}^m \\frac{a_i}{m} \\qquad b_. = \\sum_{j=1}^n \\frac{b_j}{n}$\n",
|
|||
|
"\n",
|
|||
|
"Note that:\n",
|
|||
|
"\n",
|
|||
|
"$\\mu_{i.} = \\sum_{j=1}^n \\frac{(a_i + b_j)}{n} = a_i + b_. \\qquad \\mu_{.j} = a_. + b_j \\qquad \\mu_{..} = a_. + b_.$\n",
|
|||
|
"\n",
|
|||
|
"If we now set\n",
|
|||
|
"\n",
|
|||
|
"$\\mu = \\mu_{..} = a_. + b_. \\qquad \\alpha_i = \\mu_{i.}-\\mu = \\alpha_i -\\alpha_. \\qquad \\beta_j = \\mu_{.j}-\\mu = b_j - b_.$\n",
|
|||
|
"\n",
|
|||
|
"Then the model can be written as \n",
|
|||
|
"\n",
|
|||
|
"$\\mu_{ij} = E[X_{ij}] = \\mu + \\alpha_i + \\beta_j$\n",
|
|||
|
"\n",
|
|||
|
"The value $\\mu$ is called the grand mean, $\\alpha_i$ is the deviation from the grand mean due to row $i$, and $\\beta_j$ is the deviation from the grand mean due to column $j$.\n",
|
|||
|
"\n",
|
|||
|
"$X_{i.}=\\frac{\\sum_{j=1}^n X_{ij}}{n} \\qquad X_{.j}=\\frac{\\sum_{i=1}^m X_{ij}}{m} \\qquad X_{..}=\\frac{\\sum_{i=1}^m \\sum_{j=1}^n X_{ij}}{nm}$\n",
|
|||
|
"\n",
|
|||
|
"Unbiased estimators of $\\mu, \\alpha_i, \\beta_j$ — call them $\\widehat{\\mu},\\ \\widehat{\\alpha_i},\\ \\widehat{\\beta_j}$ — are given by\n",
|
|||
|
"\n",
|
|||
|
"$\\widehat{\\mu} = X_{..} \\qquad \\widehat{\\alpha_i} = X_{i.} - X_{..} \\qquad \\widehat{\\beta_j} = X_{.j} - X_{..}$"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "XqazMIWzi6Wr"
|
|||
|
}
|
|||
|
},
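{
"cell_type": "markdown",
"source": [
"A small numerical sketch of these estimators on a made-up $3 \\times 4$ data matrix (hypothetical values, for illustration only):"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Hypothetical m x n data matrix: rows are levels of the row factor,\n",
"# columns are levels of the column factor\n",
"X = np.array([[9.0, 6.0, 8.0, 4.0],\n",
"              [10.0, 4.0, 4.0, 6.0],\n",
"              [1.0, 2.0, 2.0, 3.0]])\n",
"m, n = X.shape\n",
"\n",
"mu_hat = X.mean()                    # grand mean estimate, X_..\n",
"alpha_hat = X.mean(axis=1) - mu_hat  # row effects, X_i. - X_..\n",
"beta_hat = X.mean(axis=0) - mu_hat   # column effects, X_.j - X_..\n",
"\n",
"print('mu_hat    =', round(mu_hat, 4))\n",
"print('alpha_hat =', np.round(alpha_hat, 4))  # these deviations sum to 0\n",
"print('beta_hat  =', np.round(beta_hat, 4))   # these deviations sum to 0"
],
"metadata": {},
"execution_count": null,
"outputs": []
},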
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Hypothesis Tests:**\n",
|
|||
|
"\n",
|
|||
|
"Test 1:\n",
|
|||
|
"\n",
|
|||
|
"$H_0:$ all $\\alpha_i = 0$ \n",
|
|||
|
"\n",
|
|||
|
"$H_1:$ not all the $\\alpha_i$ are equal to 0\n",
|
|||
|
"\n",
|
|||
|
"This null hypothesis states that there is no row effect, in that the value of a data is not affected by its row factor level.\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"Test 2:\n",
|
|||
|
"\n",
|
|||
|
"$H_0:$ all $\\beta_i = 0$ \n",
|
|||
|
"\n",
|
|||
|
"$H_1:$ not all the $\\beta_i$ are equal to 0\n",
|
|||
|
"\n",
|
|||
|
"This null hypothesis states that there is no column effect, in that the value of a data is not affected by its column factor level."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "0NWDjsqbwXtn"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Error Sum of Squares:**\n",
|
|||
|
"\n",
|
|||
|
"To obtain tests for the above null hypotheses, we will apply the analysis of variance approach in which two different estimators are derived for the variance $\\sigma^2$. The first will always be a valid estimator, whereas the second will only be a valid estimator when the null hypothesis is true. In addition, the second estimator will tend to overestimate $\\sigma^2$ when the null hypothesis is not true.\n",
|
|||
|
"\n",
|
|||
|
"To obtain our first estimator of σ2, we start with the fact that:\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}-E[X_{ij}])^2}{\\sigma^2} = \\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}-\\mu-\\alpha_i-\\beta_j)^2}{\\sigma^2} \\sim \\chi_{nm}^2$\n",
|
|||
|
"\n",
|
|||
|
"If in the above expression we now replace the unknown parameters $\\mu, \\alpha_1, \\alpha_2, ... , \\alpha_m, \\beta_1, \\beta_2, ... , \\beta_n$ by their estimators $\\widehat{\\mu}, \\widehat{\\alpha_1}, \\widehat{\\alpha_2}, ... , \\widehat{\\alpha_m}, \\widehat{\\beta_1}, \\widehat{\\beta_2}, ... , \\widehat{\\beta_n}$, then it turns out that the resulting expression will remain chi-square but will lose $1$ degree of freedom for each parameter that is estimated. Therefore,\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}-\\widehat{\\mu}-\\widehat{\\alpha_i}-\\widehat{\\beta_j})^2}{\\sigma^2} = \\sum_{i=1}^m \\sum_{j=1}^n \\frac{(X_{ij}-X_{i.}-X_{.j}+X_{..})^2}{\\sigma^2} \\sim \\chi_{nm-(n+m-1)=(n-1)(m-1)}^2$\n",
|
|||
|
"\n",
|
|||
|
"$SS_e = \\sum_{i=1}^m \\sum_{j=1}^n (X_{ij}-X_{i.}-X_{.j}+X_{..})^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{E[SS_e]}{\\sigma^2} = (n-1)(m-1) \\quad \\rightarrow \\quad \\frac{E[SS_e]}{(n-1)(m-1)} = \\sigma^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{SS_e}{(n-1)(m-1)}$ is an unbiased estimator of $\\sigma^2$."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "3PUDFUAEEbwi"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Row Sum of Squares:**\n",
|
|||
|
"\n",
|
|||
|
"Suppose now that we want to test the null hypothesis that there is no row effect.\n",
|
|||
|
"\n",
|
|||
|
"To obtain a second estimator of $\\sigma^2$, consider the row averages $X_{i.},\\ i = 1, ... , m$. Note that, when $H_0$ is true, each $\\alpha_i$ is equal to 0, and so $E[X_{i.}] = \\mu+\\alpha_i = \\mu$.\n",
|
|||
|
"Because each $X_{i.}$ is the average of $n$ random variables, each having variance $\\sigma^2$, it follows that $Var(X_{i.})=\\frac{\\sigma^2}{n}$.\n",
|
|||
|
"\n",
|
|||
|
"Thus, we see that when $H_0$ is true:\n",
|
|||
|
"\n",
|
|||
|
"$\\sum_{i=1}^m \\frac{(X_{i.}-E[X_{i.}])^2}{Var(X_{i.})} = n \\sum_{i=1}^m \\frac{(X_{i.}-\\mu)^2}{\\sigma^2} \\sim \\chi_m^2$\n",
|
|||
|
"\n",
|
|||
|
"$SS_r = n \\sum_{i=1}^m (X_{i.}-X_{..})^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{E[SS_r]}{\\sigma^2} = m-1 \\quad \\rightarrow \\quad \\frac{E[SS_r]}{m-1} = \\sigma^2$\n",
|
|||
|
"\n",
|
|||
|
"$\\frac{SS_r}{m-1}$ is an estimator of $\\sigma^2$."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "O-BKFeclL9bv"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th>Estimators of $\\sigma^2$</th><th>Conditions</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>$\\frac{SS_e}{(n−1)(m−1)}$</td><td>Always true</td></tr>\n",
|
|||
|
"<tr><td>$\\frac{SS_r}{m-1}$</td><td>Only when $H_0$ is true</td></tr>\n",
|
|||
|
"<tr><td>$\\frac{SS_c}{n-1}$</td><td>Only when $H_0$ is true</td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "dA34-5xZPGyq"
|
|||
|
}
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"**Summary:**\n",
|
|||
|
"\n",
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th></th><th>----------------------Sum of Squares----------------------</th><th>--Degrees of Freedom--</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>Row</td><td>$SS_r = n \\sum_{i=1}^m (X_{i.}-X_{..})^2$</td><td>$m-1$</td></tr>\n",
|
|||
|
"<tr><td>Column</td><td>$SS_c = \\sum_{j=1}^n (X_{.j}-X_{..})^2$</td><td>$m-1$</td></tr>\n",
|
|||
|
"<tr><td>Error</td><td>$\\sum_{i=1}^m \\sum_{j=1}^n (X_{ij}-X_{i.}-X_{.j}+X_{..})^2$</td><td>$(n-1)(m-1)$</td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"\n",
|
|||
|
"$\\\\ $\n",
|
|||
|
"\n",
|
|||
|
"<table>\n",
|
|||
|
"<thead>\n",
|
|||
|
"<tr><th>--Null Hypothesis--</th><th>--Test Statistics--</th><th>----Significance Level $\\alpha$ Test----</th><th>------------P_Value------------</th></tr>\n",
|
|||
|
"</thead>\n",
|
|||
|
"<tbody>\n",
|
|||
|
"<tr><td>All $\\alpha_i = 0$</td><td>$\\frac{\\frac{SS_r}{m-1}}{\\frac{SS_e}{(n-1)(m-1)}}$</td><td>Reject if $F_0 \\geq F_{m-1,(n-1)(m-1),\\alpha}$</td><td>$P(F_{m-1,(n-1)(m-1)} \\geq F_0)$</td></tr>\n",
|
|||
|
"<tr><td>All $\\beta_j = 0$</td><td>$\\frac{\\frac{SS_c}{n-1}}{\\frac{SS_e}{(n-1)(m-1)}}$</td><td>Reject if $F_0 \\geq F_{n-1,(n-1)(m-1),\\alpha}$</td><td>$P(F_{n-1,(n-1)(m-1)} \\geq F_0)$</td></tr>\n",
|
|||
|
"</tbody>\n",
|
|||
|
"</table>"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "jZfYFeZSQGIh"
|
|||
|
}
|
|||
|
},
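{
"cell_type": "markdown",
"source": [
"Before turning to the `statsmodels` formula interface below, here is a direct computation of the two tables above on the same hypothetical $3 \\times 4$ data matrix: it computes $SS_r$, $SS_c$, and $SS_e$ and carries out both F tests. The same table can also be obtained from `anova_lm` by reshaping the matrix into long format and fitting `ols('value ~ C(row) + C(col)', ...)` with both factors treated as categorical (the column names here are placeholders)."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Hypothetical m x n data matrix (rows = row-factor levels, columns = column-factor levels)\n",
"X = np.array([[9.0, 6.0, 8.0, 4.0],\n",
"              [10.0, 4.0, 4.0, 6.0],\n",
"              [1.0, 2.0, 2.0, 3.0]])\n",
"m, n = X.shape\n",
"alpha = 0.05\n",
"\n",
"row_means = X.mean(axis=1)  # X_i.\n",
"col_means = X.mean(axis=0)  # X_.j\n",
"grand_mean = X.mean()       # X_..\n",
"\n",
"SS_r = n * ((row_means - grand_mean) ** 2).sum()\n",
"SS_c = m * ((col_means - grand_mean) ** 2).sum()\n",
"SS_e = ((X - row_means[:, None] - col_means[None, :] + grand_mean) ** 2).sum()\n",
"\n",
"MS_e = SS_e / ((n - 1) * (m - 1))\n",
"\n",
"F_row = (SS_r / (m - 1)) / MS_e  # test of H0: all alpha_i = 0\n",
"F_col = (SS_c / (n - 1)) / MS_e  # test of H0: all beta_j = 0\n",
"\n",
"p_row = f.sf(F_row, m - 1, (n - 1) * (m - 1))\n",
"p_col = f.sf(F_col, n - 1, (n - 1) * (m - 1))\n",
"\n",
"print(f'Row effect:    F0 = {F_row:.4f}, p-value = {p_row:.4f}')\n",
"print(f'Column effect: F0 = {F_col:.4f}, p-value = {p_col:.4f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},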
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"source": [
|
|||
|
"A = [9,6,8,4,6]\n",
|
|||
|
"B = [10,4,4,6,5]\n",
|
|||
|
"C = [1,2,2,3,1]\n",
|
|||
|
"\n",
|
|||
|
"data = pd.DataFrame()\n",
|
|||
|
"data['A'] = A\n",
|
|||
|
"data['B'] = B\n",
|
|||
|
"data['C'] = C\n",
|
|||
|
"\n",
|
|||
|
"model = ols('C ~ A + B + A:B', data=data).fit()\n",
|
|||
|
"aov_table = anova_lm(model, type=2)\n",
|
|||
|
"print(aov_table.round(4))"
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"colab": {
|
|||
|
"base_uri": "https://localhost:8080/"
|
|||
|
},
|
|||
|
"id": "17Z_Tu2jidrS",
|
|||
|
"outputId": "b10d2ecb-2c94-4b1a-8ee7-21f62bf1d4fb"
|
|||
|
},
|
|||
|
"execution_count": null,
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"output_type": "stream",
|
|||
|
"name": "stdout",
|
|||
|
"text": [
|
|||
|
" df sum_sq mean_sq F PR(>F)\n",
|
|||
|
"A 1.0 1.2737 1.2737 1.1071 0.4838\n",
|
|||
|
"B 1.0 0.0253 0.0253 0.0220 0.9062\n",
|
|||
|
"A:B 1.0 0.3506 0.3506 0.3047 0.6789\n",
|
|||
|
"Residual 1.0 1.1504 1.1504 NaN NaN\n"
|
|||
|
]
|
|||
|
}
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"source": [
|
|||
|
"The p-values for A and B turn out to be greater than 0.05 which implies that the means of both the factors don't possess a statistically significant effect on C."
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"id": "lq5KxFRFARch"
|
|||
|
}
|
|||
|
}
|
|||
|
]
|
|||
|
}
|