{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1 Hands-On Mathematical Statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.1 Population and Sample"
]
},
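{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Population: the set of all possible observations of an experiment is called the **population**. The number of possible observations may be finite or infinite, giving a finite population or an infinite population respectively. Each possible observation is called an **individual**.\n",
"> Each individual in the population is an observed value of a random experiment, and hence a value of some random variable $X$. A population therefore corresponds to a random variable $X$, and studying the population amounts to studying $X$: the population and $X$ share the same distribution function and the same numerical characteristics."
]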
},
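{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Sample: let $X$ be a random variable with distribution function $F$. If $X_{1}, X_{2}, \\cdots, X_{n}$ are mutually independent random variables that all have the distribution function $F$, then $X_{1}, X_{2}, \\cdots, X_{n}$ is called a simple random sample of size $n$ from the distribution $F$ (or from the population $F$, or from the population $X$), or a **sample** for short. Their observed values $x_{1}, x_{2}, \\cdots, x_{n}$ are called the **sample values**, also known as $n$ independent observations of $X$."
]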
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> By the definition of a sample (the $n$ random variables are mutually independent and share the distribution of $X$):\n",
"> - 1. The joint distribution function of the sample ($X_{1}, X_{2}, \\cdots, X_{n}$) is $$F^{*}(x_{1}, x_{2}, \\cdots, x_{n})=\\prod_{i=1}^{n}F(x_{i})$$\n",
"> - 2. If $X$ has probability density $f$, the joint probability density of the sample ($X_{1}, X_{2}, \\cdots, X_{n}$) is $$f^{*}(x_{1}, x_{2}, \\cdots, x_{n})=\\prod_{i=1}^{n}f(x_{i})$$"
]
},
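{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of property 2, assuming for illustration that the population is standard normal (the text above does not fix a distribution). `normal_pdf` and `joint_pdf` are hypothetical helper names, and the three sample values are made up for the demonstration: the joint density of the sample is simply the product of the marginal densities evaluated at each sample value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def normal_pdf(x, mu=0.0, sigma=1.0):\n",
"    # Density of N(mu, sigma^2) at x (assumed population, for illustration only)\n",
"    z = (x - mu) / sigma\n",
"    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))\n",
"\n",
"def joint_pdf(sample, pdf):\n",
"    # Joint density of an i.i.d. sample: product of the marginal densities.\n",
"    # For large samples the product can underflow; summing log-densities\n",
"    # is a common alternative.\n",
"    result = 1.0\n",
"    for xi in sample:\n",
"        result *= pdf(xi)\n",
"    return result\n",
"\n",
"sample = [0.0, 1.0, -1.0]  # illustrative sample values\n",
"f_star = joint_pdf(sample, normal_pdf)\n",
"print(f_star)"
]
},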
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.2 Empirical Distribution Function, Histogram and Box Plot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Empirical distribution function: let $x_{1}, x_{2}, \\cdots, x_{n}$ be a sample drawn from a population with distribution function $F(x)$. Arranging the observed values in increasing order gives $x_{(1)}, x_{(2)}, \\cdots, x_{(n)}$, called the ordered sample. The ordered sample defines the function\n",
"$$\n",
"F_{n}(x)=\\left\\{\\begin{array}{ll}\n",
"0, & \\text { if } x<x_{(1)}, \\\\\n",
"k / n, & \\text { if } x_{(k)} \\leqslant x<x_{(k+1)}, k=1,2, \\cdots, n-1, \\\\\n",
"1, & \\text { if } x \\geqslant x_{(n)}.\n",
"\\end{array}\\right.\n",
"$$\n",
"Then $F_{n}(x)$ is a non-decreasing, right-continuous function satisfying\n",
"$$\n",
"F_{n}(-\\infty)=0 \\text { and } F_{n}(\\infty)=1 .\n",
"$$\n",
"\n",
"$F_{n}(x)$ is called the empirical distribution function of the sample.\n"
]
},
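{
"cell_type": "markdown",
"metadata": {},
"source": [
"> The empirical distribution function $F_{n}(x)$ is a good approximation of the population distribution function $F(x)$ when $n$ is large: by the Glivenko-Cantelli theorem, $F_{n}(x)$ converges to $F(x)$ uniformly in $x$ with probability $1$ as $n \\to \\infty$."
]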
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"🔥Example: a sample of size 10 is drawn at random from the population $X$:\n",
"$$\n",
"3.2, \\quad 2.5, \\quad -2, \\quad 2.5, \\quad 0, \\quad 3, \\quad 2, \\quad 2.5, \\quad 2, \\quad 4\n",
"$$\n",
"Find the empirical distribution function of $X$.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"🦊Solution: \n",
"1. Sort the observations: $$-2, \\quad 0, \\quad 2, \\quad 2, \\quad 2.5, \\quad 2.5, \\quad 2.5, \\quad 3, \\quad 3.2, \\quad 4 $$\n",
"2. Apply the definition:\n",
"$$\n",
"F_{n}(x)=\\left\\{\\begin{array}{ll}\n",
"0, & \\text { if } x<x_{(1)}, \\\\\n",
"k / n, & \\text { if } x_{(k)} \\leqslant x<x_{(k+1)}, k=1,2, \\cdots, n-1, \\\\\n",
"1, & \\text { if } x \\geqslant x_{(n)}.\n",
"\\end{array}\\right.\n",
"$$\n",
"3. Repeated values make $F_{n}$ jump by more than $1/n$ at that point, which gives:\n",
"$$\n",
"F_{10}(x)=\\left\\{\\begin{array}{cc}\n",
"0, & x<-2 \\\\\n",
"1 / 10, & -2 \\leq x<0 \\\\\n",
"2 / 10, & 0 \\leq x<2 \\\\\n",
"4 / 10, & 2 \\leq x<2.5 \\\\\n",
"7 / 10, & 2.5 \\leq x<3 \\\\\n",
"8 / 10, & 3 \\leq x<3.2 \\\\\n",
"9 / 10, & 3.2 \\leq x<4 \\\\\n",
"1, & x \\geq 4\n",
"\\end{array}\\right.\n",
"$$"
]
},
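{
"cell_type": "markdown",
"metadata": {},
"source": [
"The worked example above can be checked numerically. The sketch below is one possible implementation, not a library routine: `ecdf` is a hypothetical helper that counts the observations less than or equal to $t$ with `np.searchsorted(..., side='right')`, which makes the resulting step function right-continuous, as the definition requires."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def ecdf(sample):\n",
"    # Return F_n with F_n(t) = (number of observations <= t) / n\n",
"    xs = np.sort(np.asarray(sample, dtype=float))\n",
"    n = xs.size\n",
"    def F_n(t):\n",
"        # side='right' returns the count of sorted values <= t\n",
"        return np.searchsorted(xs, t, side='right') / n\n",
"    return F_n\n",
"\n",
"# The ten observations from the example above\n",
"x = [3.2, 2.5, -2, 2.5, 0, 3, 2, 2.5, 2, 4]\n",
"F10 = ecdf(x)\n",
"for t in [-3, -2, 0, 2, 2.5, 3, 3.2, 4]:\n",
"    print(t, F10(t))"
]
},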
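{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Histogram: to study the distribution of a population, we obtain sample observations $x_{1}, x_{2}, \\cdots, x_{n}$ through independent repeated trials, organize the data, and present them as a table or a figure, from which the population distribution can be inferred. A histogram approximates the shape of the population's probability density: because every observation follows the population distribution, the proportion of observations falling in an interval estimates the probability that the population assigns to that interval. Histograms come in two forms, the **count histogram** and the **relative-frequency histogram**."
]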
},
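{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Steps for drawing a histogram, given a sample with $n$ sample values $(x_{1}, x_{2}, \\cdots, x_{n})$:\n",
"> - 1. Choose an interval $[a, b]$, with $a$ smaller than the smallest sample value and $b$ larger than the largest sample value;\n",
"> - 2. Divide the interval into $k$ subintervals of equal length $\\bigtriangleup = \\frac{b-a}{k}$; 💡tips: when $n< 50$, take $k$ around $5 \\sim 6$; when $n$ is larger, take $k$ around $10 \\sim 20$. If $k$ is too large, some subintervals will contain no observations, which should be avoided where possible;\n",
"> - 3. Count the number of sample values falling in each subinterval $[a+i\\bigtriangleup , a+(i+1)\\bigtriangleup ), i = 0, 1, \\cdots,k-1$, which gives the counts $\\{f_{j}, j = 1, 2, \\cdots, k \\}$ (with $f_{j}$ belonging to the subinterval $i = j-1$), or the relative frequencies $\\{ f_{j}/n, j = 1, 2, \\cdots, k \\}$. The subintervals are taken half-open so that a value on a shared endpoint is counted only once;\n",
"> - 4. Use the interval $[a, b]$ as the horizontal axis and the counts $\\{ f_{j}, j = 1, 2, \\cdots, k \\}$ or the relative frequencies $\\{ f_{j}/n, j = 1, 2, \\cdots, k \\}$ as the vertical axis;\n",
"> - 5. Draw a bar over each subinterval whose height is the corresponding count (or relative frequency); the result is the histogram.\n",
"\n",
"> With the counts $\\{ f_{j}, j = 1, 2, \\cdots, k\\}$ on the vertical axis the figure is a count histogram; with the relative frequencies $\\{f_{j}/n, j = 1, 2, \\cdots, k\\}$ on the vertical axis it is a relative-frequency histogram."
]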
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"🔥Example: draw the histogram of the following sample\n",
"$$\n",
"\\begin{aligned}\n",
" &138, \\quad 142, \\quad 148, \\quad 145, \\quad 140, \\quad 141\\\\\n",
" &138, \\quad 139, \\quad 144, \\quad 138, \\quad 139, \\quad 136\\\\\n",
" &138, \\quad 137, \\quad 137, \\quad 133, \\quad 140, \\quad 130\\\\\n",
" &145, \\quad 141, \\quad 135, \\quad 131, \\quad 136, \\quad 131\\\\\n",
" &134, \\quad 132, \\quad 135, \\quad 134, \\quad 132, \\quad 134\\\\\n",
" &130, \\quad 135, \\quad 135, \\quad 134, \\quad 136, \\quad 131\\\\\n",
" &139, \\quad 140, \\quad 141, \\quad 138, \\quad 137, \\quad 137\\\\\n",
" &131, \\quad 127, \\quad 136, \\quad 128, \\quad 138, \\quad 132\\\\\n",
" &134, \\quad 136, \\quad 137, \\quad 133, \\quad 121, \\quad 129\\\\\n",
" &137, \\quad 132, \\quad 131, \\quad 139, \\quad 136, \\quad 135\\\\\n",
" \\end{aligned}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python code (solving the exercise)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAESCAYAAAD+GW7gAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQqUlEQVR4nO3df7BcZX3H8fcnCSAkqBFuo1hibIU6KuLUqJBGjRZKJbQV6ogzWjtiG0dHR0utglItqB3wBzrVihPHsdRREadK1aioEyLRIDXRau0ILXWCiujEagi0KDb59o89ketlk3u53N29e5/3a+bOPXv23H2+D0/ms88+55wlVYUkaeFbNOoCJEnDYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwNfYS3JzksPSc9iU5w5JsijJS5PcL8nmJCck+ask90/yniRPmUEbRyRZPMN6FiVZN7veSIOzZNQFSLOV5FXAp4GfA3uBlcCHk+wFHgd8HVgMvILev/ULgP8DlgLPqaq3JPld4M19XvvzwP2BX3S7jgPuTPL97vEhwK1V9cw+pR0BXJHkD6vqXw5Q+9e6mu46SBePraoVB3leulcMfI2zE4AvAfuALcC2qlqT5DeBt1TVWQBJ7gc8HChgObAOuLab2S+rqu90xx1WVT8HqKpTJzeU5LPA66vq+qlFJFkELKmqu7q/vSPJucCRU45bAuyrqn303kjOqqqdSdYAfw2cXt2dkN2xO+/jfx/pVxj4Gmf7uh+qai1AkvOB9cDhSa4CHgycBjwFeCKwGvg28BPgxcCiJNvpfTq4Pcnjqur27rWeD5zbtXUc8A9Jft49/sequrTbfgLwgST76M3ulwO3dK8xud7FwFnAv9F9ckjyAOC9wJ3AV5M8Engt8Pf0Po1Ic8bA11hK8nvAycCjgMVJttFbHvkm8EZga1X9T5Ibgf8FvgM8FPgZsBv4KbAGeFdVvSHJFcDb9od954HAldxzyef5wPH7H3Sz/uO7uk4BNlTVs2fQjcXAJ+m9+WwFvgL8GfBOPL+mATDwNZaq6nNJTqc3w38vcDq9dfy3dodcl+S59JZxfh34MfAq4CLg9cAz6a3DP747fiXwX1Oa2Qds6F57sl+j90bQz1LgtCQ3TNl/U1WdMWXfXuBlXW1foneuYX1V7euWiaQ5ZeBrnF0AXEUv0B9Jb2b8te65AvafYH0Q8BfAhfRmzp+kt6SyBtiS5IHAkVX1kz5tfB24esq+kw9S00p6nxQu2r8jydqu7amOBf6m2/4Q8GXgRUk2Ai89SBvSrDiL0Fjq1r5PBD5BL9xfA7yL3qwdejPt3cCiqtoBrO0eH09viebGqroT+Bjwz8DmAzT138BNU35+dJDSTqF3AnmyBwO39jn2e/SWh04BbgT+FriW3qeOqa8h3WfO8DWWquq2JI/vlj+WAHur6sokdwLLgO/SWyO/olseOQbYAXyW3qeB13Uv9THgYuANB2jq9+ld4jnZUcAHpx6Y5InAw+itx0+2grs/bfzy8K4ftyR5NfDHwD8BX6yqPV2fgjSHnOFrnD0xyReA64EXdvu+AZxPb8b/ceBH3WWQNwEfAQ6jdxXMo5Oc1e17JbAxyZOgd0lkkv2fFN5VVasn/9Atz3THLem2H0HvTeDPJ11auSzJSuDMrv3J9r8+VXUJcAZwB/Bbk45xQqY5Ff8HKBpHSR4DvJteIF/Z7XsevTXxDVW1OcmR9AJ9E/As4F+BS6rqh90J3RcAL6mq/+jujH0NvZO5z6B3YvdgN0VBL7TfSu/mry8CL62qLZNqfAe9sN8MvLyq9kx67mvA/aZp4yHeeKW5ZOBrweiWbpZNCdZD6d3sNNBr2pMcuv/GqxkefxSwu6r2DrAs6VcY+JLUCNfwJakRBr4kNWLeXQVw9NFH16pVq0ZdhiSNlR07dvy4qiYOdsy8C/xVq1axffv2UZchSWMlyc3THeOSjiQ1wsCXpEYY+JLUCANfkhph4E
tSIwx8SWqEgS9JjTDwJakRBr4kNWLe3WkrzWerzts06hL62nnx+lGXoDHgDF+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRsx54Cd5QJLPJPl8ko8nOTTJ+5JsS3LBXLcnSZqZQczwnwtcWlWnAj8EngMsrqo1wDFJjhtAm5KkaSyZ6xesqndPejgBPA94R/d4M7AW+M+5bleSdHADW8NPcjKwHPgecEu3ew+wos+xG5JsT7J9165dgypJkpo2kMBP8iDgncA5wB3A4d1Ty/q1WVUbq2p1Va2emJgYREmS1LxBnLQ9FLgSOL+qbgZ20FvGATgR2DnXbUqSpjeIGf4LgccDr02yBQjwJ0kuBZ4NbBpAm5KkaQzipO1lwGWT9yX5BHAq8Oaqum2u25QkTW/OA7+fqvopvWUeSdKIeKetJDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJasSSURcg9bPqvE2jLkFacJzhS1IjDHxJaoSBL0mNMPAlqREDCfwkK5Js7bYfmuT7SbZ0PxODaFOSdHBzfpVOkuXA5cDSbteTgDdV1WVz3ZYkaeYGMcPfC5wN7OkenwS8JMl1Sd4+gPYkSTMw54FfVXuq6rZJuz4DrKmqk4Hjkzx2rtuUJE1vGCdtt1XV7d32DcBxUw9IsiHJ9iTbd+3aNYSSJKk9wwj8q5M8JMkRwGnAt6YeUFUbq2p1Va2emPCcriQNwjC+WuFC4BrgLuA9VXXjENqUJE0xsMCvqnXd72uARw6qHUnSzHjjlSQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjlhzsySSXVtW5Sa4Bav9uoKrq6QOvTpI0Zw4a+FV1bvf7acMpR5I0KAdd0knyR93vo4ZTjiRpUKZbw3959/ujgy5EkjRYB13SASrJRcDDk7zuV56oumhwZUmS5tp0gX8mcCLwB8AWeidsJUljaLqTtnuArUneX1XXDqkmSdIAzOg6/Kr6u0EXIkkaLG+8kqRGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRgwk8JOsSLK12z4kyaeSbEtyziDakyRNb84DP8ly4HJgabfrZcD2qloDnJHkyLluU5I0vUHM8PcCZwN7usfrgCu77W3A6gG0KUmaxnRfj3yvdd+wSfLLb1JeCtzSbe8BVkz9myQbgA0AK1eunOuSpAVv1XmbRl1CXzsvXj/qEjTJME7a3gEc3m0v69dmVW2sqtVVtXpiYmIIJUlSe4YR+DuAtd32icDOIbQpSZpizpd0+rgc+HSSJwOPAq4fQpuSpCkGNsOvqnXd75uBU4EvA6dU1d5BtSlJOrBhzPCpqh9w95U6kqQR8E5bSWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQMP/CRLknw3yZbu54RBtylJuqclQ2jjscCHq+rVQ2hLknQAw1jSOQk4M8mXknwwyTDeZCRJUwwj8L8KPLWq1g
K7gdOH0KYkaYphBP43q+rWbvsG4LipByTZkGR7ku27du0aQkmS1J5hBP4HkpyYZDFwJvCNqQdU1caqWl1VqycmJoZQkiS1Zxjr6RcBHwICfKKqvjCENiVJUww88KvqW/Su1JEkjZA3XklSIwx8SWqEgS9JjTDwJakRBr4kNcKvOWjcqvM2jboELWDz9d/XzovXj7qEkXCGL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqRFLRl3AXFp13qZRlyBpDMzXrNh58fqBvr4zfElqhIEvSY0w8CWpEQa+JDXCwJekRgwt8JO8L8m2JBcMq01J0t2GEvhJzgIWV9Ua4Jgkxw2jXUnS3YY1w18HXNltbwbWDqldSVJnWDdeLQVu6bb3AI+Y/GSSDcCG7uEdSW6cZTtHAz+e5d/OVwutTwutP7Dw+rTQ+gNj0qdcMuND+/XnYdP90bAC/w7g8G57GVM+WVTVRmDjfW0kyfaqWn1fX2c+WWh9Wmj9gYXXp4XWH1h4fZptf4a1pLODu5dxTgR2DqldSVJnWDP8q4CtSY4BngGcNKR2JUmdoczwq2oPvRO3XwGeVlW3Daip+7wsNA8ttD4ttP7AwuvTQusPLLw+zao/qaq5LkSSNA95p62GJslDkpyS5MhR1zIXFlp/tPCNbeAnWZFka7e9MsmWJJuTbEzPIUk+1d3de86o653ODPrz0CTf7/ZvSTIx6pqnM6VPjwU+AvwO8MUkh475GPXrz1iP0aR9j0nyuW57bMdo0r7J/RnrMTpQ/TP9JoOxDPwky4HL6V3fD/Ai4MVV9XTgWOAE4GXA9u7u3jPm8yxshv15EvCmqlrX/ewaTbUz06dPjwJeUFUXAt8BHs54j1G//oz7GJEkwKXAod2ucR6jfv0Z9zG6R/335psMxjLwgb3A2fRu4qKqXltV3+6eO4reDQnruPvu3m3AfL4Gdyb9OQl4SZLrkrx9NGXeK1P7dAVwc5L1wHLgJsZ7jPr1Z6zHqPMC4JpJj9cxpmPUmdqfcR+jfvWvY4bfZDCWgV9Ve/pd6ZPkbODfq+oH3PPu3hVDLPFemWF/PgOsqaqTgeO7JYV56wB9WgY8G/gJUIz/GE3tz1iPUZKjgOcBb5102NiO0QH6M9ZjRP/6ZzxGYxn4/ST5DeCVwCu6XQe9u3e+69OfbVV1e7d9AzB2X0BXVbur6k+BnwFPYMzHqE9/xn2MLgbOr6pfTNo3zmPUrz/jPkb96p/xGI3T4B1Qt871YeCcSe+GY3t37wH6c3V3VcgRwGnAt0ZW4CwkuSzJU7qHDwR2M95j1K8/Yz1GwFOBS5JsAR6X5I2M8RjRvz/jPkb96p/xGA3rTttBOw9YCbyzd46G19M70fHpJE+md4Lt+tGVd6/168+F9NYi7wLeU1Wz/YK5UXkz8IEkBXyuqm5MMs5j1K8/Yz1GVXX8/u0kW6rqgiQPY0zH6AD9eRpjPEb0yYEktzLDbzJY0Ddedf8B1gJXD/DuXt0HjtH85xjNf92qwKnAtVX1wwMet5ADX5J0twWxhi9Jmp6BL0mNMPAlqREGviQ1wsCXpEYY+NI0kqxN8tEki7rvMDl21DVJs+FlmdIMJHk/cCdwc1VdMup6pNkw8KUZSLIauA6YqKrdIy5HmhUDX5qBJFcB3wSWVtVfjrgcaVZcw5emkeRZwA+q6nXAo5P89qhrkmbDGb4kNcIZviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9Jjfh/5GHjZx/G4qIAAAAASUVORK5CYII=",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 1. Draw the histogram step by step, following the procedure above\n",
"import matplotlib.pyplot as plt\n",
"# Render figures inline in the notebook\n",
"%matplotlib inline\n",
"plt.rcParams['font.sans-serif']=['SimHei','Songti SC','STFangsong']\n",
"plt.rcParams['axes.unicode_minus'] = False # render the minus sign correctly\n",
"import numpy as np\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"# Sample values\n",
"x = [138, 142, 148, 145, 140, 141,\n",
"     138, 139, 144, 138, 139, 136,\n",
"     138, 137, 137, 133, 140, 130,\n",
"     145, 141, 135, 131, 136, 131,\n",
"     134, 132, 135, 134, 132, 134,\n",
"     130, 135, 135, 134, 136, 131,\n",
"     139, 140, 141, 138, 137, 137,\n",
"     131, 127, 136, 128, 138, 132,\n",
"     134, 136, 137, 133, 121, 129,\n",
"     137, 132, 131, 139, 136, 135]\n",
"\n",
"# 1. Choose the interval [a, b]\n",
"a = np.min(x) - 1\n",
"b = np.max(x) + 1\n",
"\n",
"# 2. Split [a, b] into k subintervals of equal width\n",
"n = len(x)\n",
"if n < 50:\n",
"    k = 6\n",
"elif n < 100:\n",
"    k = 8\n",
"else:\n",
"    k = 15\n",
"\n",
"delta = (b - a) / k\n",
"\n",
"# 3. Count the observations in each subinterval\n",
"region_ab = np.zeros(k) # centers of the k subintervals of [a, b]\n",
"fi = np.zeros(k) # count of sample values in each subinterval\n",
"for i in range(k):\n",
"    region_ab[i] = a+i*delta + (delta / 2)\n",
"\n",
"for idx, cen in enumerate(region_ab):\n",
"    for data in x:\n",
"        # Half-open interval [left, right): a value on a shared boundary is counted once\n",
"        if (cen - delta/2) <= data < (cen + delta/2):\n",
"            fi[idx] += 1\n",
"\n",
"fi_n = fi / n # relative frequencies\n",
"# 4. Plot\n",
"\n",
"# plt.figure(figsize=(10, 8))\n",
"plt.bar(region_ab, fi, width=delta) # count histogram\n",
"plt.title('频数直方图')\n",
"plt.xlabel('x')\n",
"plt.ylabel('fi')\n",
"plt.show()\n",
"# plt.figure(figsize=(10, 8))\n",
"plt.bar(region_ab, fi_n, width=delta) # relative-frequency histogram\n",
"plt.title('频率直方图')\n",
"plt.xlabel('x')\n",
"plt.ylabel('fi/n')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAESCAYAAAD+GW7gAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQqklEQVR4nO3df7BcZX3H8feHBBAS1Ai3USwRW6GOijg1KqRRg4VSEVqhjjijtSO2cXRMtdRqUaoFtSNq0WmsOHEcSx0VcapU/IU6IZIYpCZarR2hpQ6oiE6shkCLYpNv/9gTc71s7r2Eu7t37/N+zdy5Z8+eu8/34cl89tnnnLOkqpAkLXwHjboASdJwGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8DX2ktya5ND0HDrluYOTHJTk5UkekGRjkhOS/EWSByZ5T5KnzaKNw5MsmmU9ByVZc2C9kQZn8agLkA5UklcDnwZ+BuwGVgAfTrIbeALwNWAR8Ep6/9YvBP4PWAI8r6reluS3gbf2ee3PAw8Eft7tOg64O8n3uscHA7dX1bP7lHY4cEWS36uqf9lP7V/tarpnmi4eU1XLp3leuk8MfI2zE4AtwB5gE7C1qlYl+XXgbVV1DkCSBwCPBApYBqwBrutm9kur6tvdcYdW1c8Aquq0yQ0l+Szwhqq6YWoRSQ4CFlfVPd3f3pXkfOCIKcctBvZU1R56byTnVNUtSVYBfwWcUd2dkN2xt9zP/z7SLzHwNc72dD9U1WqAJBcAzwIOS3IV8FDgdOBpwJOBlcC3gB8DLwUOSrKN3qeDO5M8oaru7F7rhcD5XVvHAf+Q5Gfd43+sqku77ScBH0iyh97sfhlwW/cak+tdBJwD/BvdJ4ckDwLeC9wNfCXJo4HXAX9P79OINGcMfI2lJL8DnAw8BliUZCu95ZFvAG8CNlfV/yS5Cfhf4NvAw4GfAjuBnwCrgHdV1RuTXAH87d6w7zwYuJJ7L/m8EDh+74Nu1n98V9epwNqqeu4surEIuJrem89m4MvAHwPr8fyaBsDA11iqqs8lOYPeDP+9wBn01vHf3h1yfZLn01vG+VXgR8CrgYuBNwDPprcO/8Tu+BXAf01pZg+wtnvtyX6F3htBP0uA05PcOGX/zVV15pR9u4F1XW1b6J1reFZV7emWiaQ5ZeBrnF0IXEUv0B9Nb2b81e65AvaeYH0I8GfARfRmzlfTW1JZBWxK8mDgiKr6cZ82vgZcM2XfydPUtILeJ4WL9+5Isrpre6pjgL/utj8EfAl4SZINwMunaUM6IM4iNJa6te8TgU/QC/fXAu+iN2uH3kx7J3BQVW0HVnePj6e3RHNTVd0NfAz4Z2Djfpr6b+DmKT8/nKa0U+mdQJ7socDtfY79Lr3loVOBm4C/Aa6j96lj6mtI95szfI2lqrojyRO75Y/FwO6qujLJ3cBS4Dv01siv6JZHjga2A5+l92ng9d1LfQx4C/DG/TT1u/Qu8ZzsSOCDUw9M8mTgEfTW4ydbzr5PG784vOvHbUleA/wB8E/AF6tqV9enIM0hZ/gaZ09O8gXgBuDF3b6vAxfQm/F/HPhhdxnkzcBHgEPpXQXz2CTndPteBWxI8hToXRKZZO8nhXdV1crJP3TLM91xi7vtR9F7E/iTSZdWLk2yAji7a3+yva9PVV0CnAncBfzGpGOckGlOxf8BisZRkscB76YXyFd2+15Ab018bVVtTHIEvUD/FPAc4F+BS6rqB90J3RcBL6uq/+jujH0tvZO5z6R3Yne6m6KgF9pvp3fz1xeBl1fVpkk1vpNe2G8EXlFVuyY991XgATO08TBvvNJcMvC1YHRLN0unBOsh9G52Gug17UkO2Xvj1SyPPxLYWVW7B1iW9EsMfElqhGv4ktQIA1+SGjHvrgI46qij6thjjx11GZI0VrZv3/6jqpqY7ph5F/jHHnss27ZtG3UZkjRWktw60zEu6UhSIwx8SWqEgS9JjTDwJakRBr
4kNcLAl6RGGPiS1AgDX5IaYeBLUiPm3Z220nx21votoy6hr6vXrR51CRoDzvAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJasScB36SByX5TJLPJ/l4kkOSvC/J1iQXznV7kqTZGcQM//nApVV1GvAD4HnAoqpaBRyd5LgBtClJmsHiuX7Bqnr3pIcTwAuAd3aPNwKrgf+c63YlSdMb2Bp+kpOBZcB3gdu63buA5X2OXZtkW5JtO3bsGFRJktS0gQR+kocA64HzgLuAw7qnlvZrs6o2VNXKqlo5MTExiJIkqXmDOGl7CHAlcEFV3Qpsp7eMA3AicMtctylJmtkgZvgvBp4IvC7JJiDAHya5FHgu8KkBtClJmsEgTtpeBlw2eV+STwCnAW+tqjvmuk1J0szmPPD7qaqf0FvmkSSNiHfaSlIjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYsHnUBUj9nrd8y6hKkBccZviQ1wsCXpEYY+JLUCANfkhoxkMBPsjzJ5m774Um+l2RT9zMxiDYlSdOb86t0kiwDLgeWdLueAry5qi6b67YkSbM3iBn+buBcYFf3+CTgZUmuT/KOAbQnSZqFOQ/8qtpVVXdM2vUZYFVVnQwcn+Txc92mJGlmwzhpu7Wq7uy2bwSOm3pAkrVJtiXZtmPHjiGUJEntGUbgX5PkYUkOB04Hvjn1gKraUFUrq2rlxITndCVpEIbx1QoXAdcC9wDvqaqbhtCmJGmKgQV+Va3pfl8LPHpQ7UiSZscbrySpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IasXi6J5NcWlXnJ7kWqL27gaqqZwy8OknSnJk28Kvq/O73KcMpR5I0KNMu6ST5/e73kcMpR5I0KDOt4b+i+/3RQRciSRqsaZd0gEpyMfDIJK//pSeqLh5cWZKkuTZT4J8NnAicBWyid8JWkjSGZjppuwvYnOT9VXXdkGqSJA3ArK7Dr6q/G3QhkqTB8sYrSWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqxEACP8nyJJu77YOTfDLJ1iTnDaI9SdLM5jzwkywDLgeWdLvWAduqahVwZpIj5rpNSdLMBjHD3w2cC+zqHq8Bruy2twIrB9CmJGkGM3098n3WfcMmyS++SXkJcFu3vQtYPvVvkqwF1gKsWLFirkuSFryz1m8ZdQl9Xb1u9ahL0CTDOGl7F3BYt720X5tVtaGqVlbVyomJiSGUJEntGUbgbwf2vs2fCNwyhDYlSVPM+ZJOH5cDn07yVOAxwA1DaFOSNMXAZvhVtab7fStwGvAl4NSq2j2oNiVJ+zeMGT5V9X32XakjSRoB77SVpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaMfDAT7I4yXeSbOp+Thh0m5Kke1s8hDYeD3y4ql4zhLYkSfsxjCWdk4Czk2xJ8sEkw3iTkSRNMYzA/wrw9KpaDe
wEzhhCm5KkKYYR+N+oqtu77RuB46YekGRtkm1Jtu3YsWMIJUlSe4YR+B9IcmKSRcDZwNenHlBVG6pqZVWtnJiYGEJJktSeYaynXwx8CAjwiar6whDalCRNMfDAr6pv0rtSR5I0Qt54JUmNMPAlqREGviQ1wsCXpEYY+JLUCL/moHFnrd8y6hK0gM3Xf19Xr1s96hJGwhm+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRiwedQFz6az1W0ZdgqQxMF+z4up1qwf6+s7wJakRBr4kNcLAl6RGGPiS1AgDX5IaMbTAT/K+JFuTXDisNiVJ+wwl8JOcAyyqqlXA0UmOG0a7kqR9hjXDXwNc2W1vBAZ7sakk6V6GdePVEuC2bnsX8KjJTyZZC6ztHt6V5KYDbOco4EcH+Lfz1ULr00LrDyy8Pi20/sCY9Cl/OutD+/XnETP90bAC/y7gsG57KVM+WVTVBmDD/W0kybaqWnl/X2c+WWh9Wmj9gYXXp4XWH1h4fTrQ/gxrSWc7+5ZxTgRuGVK7kqTOsGb4VwGbkxwNPBM4aUjtSpI6Q5nhV9UueiduvwycUlV3DKip+70sNA8ttD4ttP7AwuvTQusPLLw+HVB/UlVzXYgkaR7yTlsNTZKHJTk1yRGjrmUuLLT+aOEb28BPsjzJ5m57RZJNSTYm2ZCeg5N8sru797xR1zuTWfTn4Um+1+3flGRi1DXPZEqfHg98BPgt4ItJDhnzMerXn7Eeo0n7Hpfkc9322I7RpH2T+zPWY7S/+mf7TQZjGfhJlgGX07u+H+AlwEur6hnAMcAJwDpgW3d375nzeRY2y/48BXhzVa3pfnaMptrZ6dOnxwAvqqqLgG8Dj2S8x6hff8Z9jEgS4FLgkG7XOI9Rv/6M+xjdq/778k0GYxn4wG7gXHo3cVFVr6uqb3XPHUnvhoQ17Lu7dyswn6/BnU1/TgJeluT6JO8YTZn3ydQ+XQHcmuRZwDLgZsZ7jPr1Z6zHqPMi4NpJj9cwpmPUmdqfcR+jfvWvYZbfZDCWgV9Vu/pd6ZPkXODfq+r73Pvu3uVDLPE+mWV/PgOsqqqTgeO7JYV5az99Wgo8F/gxUIz/GE3tz1iPUZIjgRcAb5902NiO0X76M9ZjRP/6Zz1GYxn4/ST5NeBVwCu7XdPe3Tvf9enP1qq6s9u+ERi7L6Crqp1V9UfAT4EnMeZj1Kc/4z5GbwEuqKqfT9o3zmPUrz/jPkb96p/1GI3T4O1Xt871YeC8Se+GY3t37376c013VcjhwOnAN0dW4AFIclmSp3UPHwzsZLzHqF9/xnqMgKcDlyTZBDwhyZsY4zGif3/GfYz61T/rMRrWnbaD9pfACmB97xwNb6B3ouPTSZ5K7wTbDaMr7z7r15+L6K1F3gO8p6oO9AvmRuWtwAeSFPC5qropyTiPUb/+jPUYVdXxe7eTbKqqC5M8gjEdo/305xTGeIzokwNJbmeW32SwoG+86v4DrAauGeDdvbofHKP5zzGa/7pVgdOA66rqB/s9biEHviRpnwWxhi9JmpmBL0mNMPAlqREGviQ1wsCXpEYY+NIMkqxO8tEkB3XfYXLMqGuSDoSXZUqzkOT9wN3ArVV1yajrkQ6EgS/NQpKVwPXARFXtHHE50gEx8KVZSHIV8A1gSVX9+YjLkQ6Ia/jSDJI8B/h+Vb0eeGyS3xx1TdKBcIYvSY1whi9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiP+H9hE5FGkcqAJAAAAAElFTkSuQmCC",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAESCAYAAAAWtRmOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAATaUlEQVR4nO3df7Dld13f8eeL3cRJdiPE5Lp0NRAoGxkQg+0Vw7rRDU3UAKllGyUDSEtKV6sTpNDaUFcQZmVCGtKOi2IXYuRHDazTGidORJwu0Y2boHdlorWGmtGNEsj0gkmuASqYvPvH+S73eD/n3nvu5pw999w8HzN39nvOeX+/5/3Z7+593c/3x7mpKiRJ6veUSTcgSVp/DAdJUsNwkCQ1DAdJUsNwkCQ1DAdJUsNw0JNGkvuTfF16vm7Ja6claf4/JHlbktes4T3OTLJpyNqnJNk97LalUyne56CNLslPArcD/wN4HvBNwC3AY8ALgU8Bm4A3Aj8K7AAe71Y/DzgT+HT3eBPw+ap6Rbft3wa+Hvhq9/oO4MvAZ7rHpwGfq6p/NqCvrcB9wD+tqt9fpvc/BDYDX1lhiOdV1bYVXpfWbPOkG5BOgRcAd9L7hn8HcLSqdib5h8B/qqo9JwqTfAH4UlU9mOQC4P3AZcBTqurLSZ5PL1QAqKrL+t8oyceAt1XVJ5c20c1MNlfVV7p1H03yJuCsJXWbgcer6nF6obOnqo4n2Qn8NPDS6n6q62qPP4G/G2kgw0FPBo93X1TVLoAkbwFeBpyR5Fbg6cB3dV9vTrIH+C/AfwT2AruS7AMOAfuBe09sPMlrgTd1D3cAv5zkb7vHH6yqG7vl7wA+lORxerORs4EHum3097sJ2AP8Md2MJMlTgffRm5X8QZLnAj8F/Dzwd0/kL0caxHDQhpbke4EX0zuctCnJUXqHaP6I3jf5I1X1xSSfrqrH6H3z/j/AxcB24N3APPBK4LnA3qr6vSVv8zR6oXH9kudfC1xw4kE3m7ig6+vSbls/NMQwNgG3AX8NHAHuBl4PHMDzhhoTw0EbWlV9PMlL6c0c3ge8lN5hoRu6kruSvBo4cZjmLHpBcIjezOHFwM8AnwD+LXBdkh+rqj/ue5vH6c0uXrrk7b+x284gW4DvS3Lvkufvq6qXL3nuMeAa4PP0Do+9EXhZVT0+6CS6NAr+w9KTwT7g24BvpvfT/+G+14ru5HF3ldER4E/pHR56M70T1h8A/gJ4BfDv6f3UvtSngA8v+bprhZ6eAby7qp574qvb7hkDas8DbgQ+CPwKcCXwI0n+DPgnKw9dOjnOHLShdcfqLwSuBq6jdw7hPcD3dCVbgIfp/aAU4JKqeqh77UNJ9gMPAv+V3knix5IcS7K5qvqP9X+B3pVH/Z69QmuX0jtk1e/pwOcG1P4VvUNUnwV+GHgncJDebOVvB9RLT5jhoA2tqh5J8o+7QzCbgceq6lCSLwNbgb+kd0z/I/ROUF/bdzIZ4Fn0zlFcCV87cbyJ3ongj/TVfT+9WUa/c4D/trSnJC8CnklvltJvG4uXwH6tvBvHA0n+A/DPgf8O/E5VLXRjCtKIGQ56MnhRNwP4JIuHhO4Bfg348e7Pj1XVrwO/3r/iiZlDVb1n6UaXfGN+T1Vdt+T1fwk8p6ujqv4uyXPoBcar+i5H3Qp8A73DVv2BA737JOjWf1eSm4FXA98C/En3kv+PNXL+o9KGluRb6V1F9J6qOtQ99xp6J5n3VtXdSV4FfDTJl6rq5iWbOI2+b9BL/ADwNrob1JJcuUzdFcANSW4HbgX+dVX9Qd/r++kFw2HaE9ibgduTNDfBJXlrX400Ut4hrSed7gqfrVW10Pfc6fTOKYz1noEkp5+4CW7I+nOAh7vLbKVTxnCQJDW8lFWS1DAcJEmNqT2Rde6559b5558/6TYkaaocO3bs81U1s1rd1IbD+eefz9zc3KTbkKSpkuT+Yeo8rCRJahgOkqSG4SBJahgOkq
SG4SBJahgOkqSG4SBJahgOkqSG4SBJakztHdLSenbFgTsn3cJAt12za9ItaEo4c5AkNQwHSVLDcJAkNQwHSVLDcJAkNQwHSVLDcJAkNcYSDkluSnI0yb4VarYlOdL3+LQkv9Gtd/U4+pIkDWfk4ZBkD7CpqnYC25PsGFBzNvABYEvf09cAc916L09y1qh7kyQNZxwzh93AoW75MDDolszHgFcCC8usdxSYHUNvkqQhjCMctgAPdMsLwLalBVW1UFWPrHW9JHuTzCWZm5+fH2HLkqR+4wiHR4EzuuWta3iPVderqoNVNVtVszMzM0+4UUnSYOMIh2MsHkq6EDg+5vUkSSM2jk9lvRU4kmQ7cDlwVZL9VbXslUudDwC3J7kYeB7wyTH0JkkawshnDlW1QO/k8t3AJVV1z3LBUFW7+5bvBy4Dfg+4tKoeG3VvkqThjOX3OVTVQyxeebSW9T57MutJkkbLO6QlSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUMBwkSQ3DQZLUGEs4JLkpydEk+4atSXJ2ktuTHEnyi+PoS5I0nJGHQ5I9wKaq2glsT7JjyJofBj5cVRcDZyWZHXVvkqThjGPmsBs41C0fBnYNWfMF4FuSPA04D/jLMfQmSRrCOMJhC/BAt7wAbBuy5k5gB/AG4F7goaUrJdmbZC7J3Pz8/Kj7liR1xhEOjwJndMtbl3mPQTXvBH60qt5BLxxet3SlqjpYVbNVNTszMzPyxiVJPeMIh2MsHkq6EDg+ZM2ZwAuSbAK+E6gx9CZJGsLmMWzzVuBIku3A5cBVSfZX1b4Vai4C7gNuBp4J3AXcMobeJElDGHk4VNVCkt3AZcD1VfUgcM8qNY8Avw88f9T9SJLWbhwzB6rqIRavRjrpGknSZHiHtCSpYThIkhqGgySpYThIkhqGgySpYThIkhqGgySpMZb7HKRT5YoDd066BWlDcuYgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkhuEgSWoYDpKkxljCIclNSY4m2bfWmiS/kOSKcfQlSRrOyMMhyR5gU1XtBLYn2TFsTZKLgadX1W2j7kuSNLxxzBx2A4e65cPArmFqkpwGvA84nuQHxtCXJGlI4wiHLcAD3fICsG3ImtcC/xu4HnhRkmuWrpRkb5K5JHPz8/Mjb1yS1LNqOCT5xiRXJXntia9VVnkUOKNb3rrMewyq+XbgYFU9CHwYuGTpSlV1sKpmq2p2ZmZmtdYlSSdpmJnDx4BvBtL3tZJjLB5KuhA4PmTNfcCzu+dmgfuH6E2SNAabh6hZqKob1rDNW4EjSbYDlwNXJdlfVftWqLkIeBz4pSRXAacBV67hPSVJIzRMONyZ5Bbgg8AXAarqd5crrqqFJLuBy4Dru8NE96xS80j30g+usX9J0hgMEw5fBe4FXtQ9LmDZcACoqodYvBrppGskSZMxMBy6k86Hq+ozVfX2U9yTJGnCljshfQtweZJrk3hZkCQ9yQycOVTVV4H3JTkT+FdJTgfe33duQJK0ga14zqGqvgQcSPJU4PVJvgLc1D0vSdqghrpDuqoeqap3Ax8B3pDk3PG2JUmapGGuVvqaqpoHrhtTL5KkdcLf5yBJaiw7c0hyY1W9Kckn6N3bAL2Pzqiqeskp6U6SNBHLhkNVvan7s/kAPEnSxrbsYaUTv1MhyTmnrh1J0nqw0jmHn+j+/NVT0Ygkaf1Y6WqlSvIO4FlJ3vr3Xqh6x3jbkiRN0krh8Ap6v2vhCuAOVv89DpKkDWKlE9IL9H7nws0rfUS3JG
njWfU+h6r6uVPRiCRp/fAmOElSw3CQJDUMB0lSw3CQJDUMB0lSw3CQJDUMB0lSY02/7EfSdLviwJ2TbmGg267ZNekWtIQzB0lSw3CQJDUMB0lSw3CQJDUMB0lSw3CQJDUMB0lSw3CQJDXGEg5JbkpyNMm+tdYk2ZbkU+PoS5I0nJGHQ5I9wKaq2glsT7JjjTU3AGeMui9J0vDGMXPYDRzqlg8Dg+6LH1iT5CXAF4EHB204yd4kc0nm5ufnR9iyJKnfOMJhC/BAt7wAbBumJsnpwFuBa5fbcFUdrKrZqpqdmZkZYcuSpH7jCIdHWTwstHWZ9xhUcy3w81X18Bh6kiStwTjC4RiLh5IuBI4PWXMp8ONJ7gBemOT9Y+hNkjSEcXxk963AkSTbgcuBq5Lsr6p9K9RcVFW/cuLFJHdU1evH0JskaQgjnzlU1QK9E853A5dU1T1LgmFQzSNLXt896r4kScMbyy/7qaqHWLwa6aRrJEmT4R3SkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJahgOkqSG4SBJaowlHJLclORokn3D1iR5apLfTPLbSX4tyenj6E2StLqRh0OSPcCmqtoJbE+yY8iaVwM3VtVlwIPA94+6N0nScDaPYZu7gUPd8mFgF/Bnq9VU1S/0vT4D/N8x9CZJGsI4DittAR7olheAbWupSfJi4OyqunvpSkn2JplLMjc/Pz/ariVJXzOOcHgUOKNb3rrMewysSfINwAHg6kEbrqqDVTVbVbMzMzMjbVqStGgc4XCM3qEkgAuB48PUdCegDwFvqar7x9CXJGlIqarRbjD5euAI8D+By4GrgB+sqn0r1FwEvAp4J3BPV/beqvrocu8zOztbc3NzI+1dy7viwJ2TbkE65W67ZtfqRVMmybGqml2tbuQnpKtqIclu4DLg+qp6kMVv+MvVPAK8t/uSJE3YOK5WoqoeYvFqpJOukSRNhndIS5IahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqWE4SJIahoMkqbF50g1MyhUH7px0C5LWufX6feK2a3aN/T2cOUiSGoaDJKlhOEiSGoaDJKlhOEiSGmMJhyQ3JTmaZN9aaoZZT5I0fiMPhyR7gE1VtRPYnmTHMDXDrCdJOjXGMXPYDRzqlg8Dgy7IHVQzzHqSpFNgHDfBbQEe6JYXgOcMWbPqekn2Anu7h48m+fQT6PNc4PNPYP31ZqONBzbemDbaeGDjjWkqxpM3DF06aDzPHGbFcYTDo8AZ3fJWBs9OBtWsul5VHQQOjqLJJHNVNTuKba0HG208sPHGtNHGAxtvTI5n0TgOKx1j8ZDQhcDxIWuGWU+SdAqMY+ZwK3AkyXbgcuCqJPurat8KNRcBNeA5SdIEjHzmUFUL9E4u3w1cUlX3LAmGQTWPDHpu1L0tMZLDU+vIRhsPbLwxbbTxwMYbk+PppKpG2YgkaQPwDmmtO0n+QZJLk5w16V5GYaONR08OGz4ckmxLcqRbfkaSO5IcTnIwPacl+Y3uzuyrJ93vaoYYzzcl+Uz3/B1JZibd82qWjOnbgI8C3wX8TpLTp3wfDRrPVO+jvue+NcnHu+Wp3Ud9z/WPZ6r30XL9r+VTKDZ0OCQ5G/gAvXsoAH4E+DdV9RLgPOAFwDXAXHdn9svX8093Q47nO4Gfrard3df8ZLodzoAxPQ94XVW9Hfhz4FlM9z4aNJ5p30ckCXAjcHr31DTvo0HjmfZ91PS/1k+h2NDhADwGvJLeTXVU1U9V1Z92r51D7+aQ3SzemX0UWM
/XOA8znouAH0tyV5L/PJk212TpmD4C3J/kZcDZwH1M9z4aNJ6p3ked1wGf6Hu8myndR52l45n2fTSo/92s4VMoNnQ4VNXCoKuekrwS+JOq+iztndnbTmGLazLkeH4T2FlVLwYu6A5rrFvLjGkr8EPAX9O7xHna99HS8Uz1PkpyDvAa4Ia+sqndR8uMZ6r3EYP7X9M+2tDhMEiSZwP/Dnhj99Qwd3SvWwPGc7Sq/qZbvheYug8wrKqHq+pfAP8P+A6mfB8NGM+076PrgLdU1Vf7npvmfTRoPNO+jwb1v6Z9NE078AnrjsvdAlzdl7JTe2f2MuP5re7qmDOB7wP+18QaPAlJ3pvku7uHTwMeZrr30aDxTPU+Ar4HeFeSO4AXJtnPFO8jBo9n2vfRoP7XtI/GcYf0enYt8AzgQO/8E2+jdxLn9iQX0zt5+MnJtbdmg8bzdnrHTr8C/GJVPZEPJ5yE64EPJSng41X16STTvI8GjWeq91FVXXBiOckdVbUvyTOZ0n20zHguYYr3EQO+DyT5HGv4FApvggO6v6xdwG+dgjuzdRLcR+uf+2j96442XAb8blU9uGKt4SBJWupJdc5BkjQcw0GS1DAcJEkNw0GS1DAcJEkNw0EaoSS7kvxqkqd0n2tz3qR7kk6Gl7JKI5bkZuDLwP1V9a5J9yOdDMNBGrEks8BdwExVPTzhdqSTYjhII5bkVuCPgC1V9eYJtyOdFM85SCOU5Ergs1X1VuD5Sf7RpHuSToYzB0lSw5mDJKlhOEiSGoaDJKlhOEiSGoaDJKlhOEiSGoaDJKnx/wGD/GZscaUXWAAAAABJRU5ErkJggg==",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 2. Draw the histogram directly with matplotlib.pyplot.hist\n",
"import matplotlib.pyplot as plt\n",
"# Render figures inline in the notebook\n",
"%matplotlib inline\n",
"plt.rcParams['font.sans-serif']=['SimHei','Songti SC','STFangsong']\n",
"plt.rcParams['axes.unicode_minus'] = False # render the minus sign correctly\n",
"import numpy as np\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"# Sample values\n",
"x = [138, 142, 148, 145, 140, 141,\n",
"     138, 139, 144, 138, 139, 136,\n",
"     138, 137, 137, 133, 140, 130,\n",
"     145, 141, 135, 131, 136, 131,\n",
"     134, 132, 135, 134, 132, 134,\n",
"     130, 135, 135, 134, 136, 131,\n",
"     139, 140, 141, 138, 137, 137,\n",
"     131, 127, 136, 128, 138, 132,\n",
"     134, 136, 137, 133, 121, 129,\n",
"     137, 132, 131, 139, 136, 135]\n",
"\n",
"a = np.min(x) - 1\n",
"b = np.max(x) + 1\n",
"k = 8\n",
"# plt.figure(figsize=(10, 8))\n",
"plt.hist(x, bins=k, alpha=0.8, range=(a, b), density=None) # density=None: count histogram\n",
"plt.title('频数直方图')\n",
"plt.xlabel('x')\n",
"plt.ylabel('fi')\n",
"plt.show()\n",
"# plt.figure(figsize=(10, 8))\n",
"# density=True rescales the bars so that their total area is 1:\n",
"# each bar has height fi/(n*delta) rather than the bare relative frequency fi/n\n",
"plt.hist(x, bins=k, alpha=0.8, range=(a, b), density=True)\n",
"plt.title('频率直方图')\n",
"plt.xlabel('x')\n",
"plt.ylabel('fi/(n*delta)')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Box plot"
]
},
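{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first introduce the **sample quantile**: given sample observations $x_{1}, x_{2}, \\cdots, x_{n}$ of size $n$, the sample $p$-quantile $(0<p<1)$, written $x_{p}$, has the following properties: (1) at least $np$ observations are less than or equal to $x_{p}$; (2) at least $n(1-p)$ observations are greater than or equal to $x_{p}$."
]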
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Steps for computing a sample quantile:\n",
"> - 1. Arrange $x_{1}, x_{2}, \\cdots, x_{n}$ in increasing order as $x_{(1)}\\le x_{(2)}\\le \\cdots\\le x_{(n)}$\n",
"> - 2. Compute the $p$-quantile $x_{p}$ from $$x_{p}=\\left \\{ \\begin{aligned} &x_{([np]+1)}, &\\text{if } np \\text{ is not an integer}\\\\&\\frac{1}{2}[x_{(np)}+x_{(np+1)}], &\\text{if } np \\text{ is an integer} \\end{aligned}\\right.$$where $[\\cdot]$ denotes the integer part (rounding down)."
]
},
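{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two-case rule above can be sketched as follows. `sample_quantile` is a hypothetical helper name, and the tolerance-based integer check with `math.isclose` is an implementation choice added here to guard against floating-point error in $np$. The data are the ten observations from the empirical distribution function example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def sample_quantile(sample, p):\n",
"    # Sample p-quantile (0 < p < 1) following the two-case rule above\n",
"    if not 0 < p < 1:\n",
"        raise ValueError('p must lie strictly between 0 and 1')\n",
"    xs = sorted(sample)\n",
"    pos = len(xs) * p\n",
"    k = round(pos)\n",
"    if math.isclose(pos, k):\n",
"        # n*p is an integer: average x_(np) and x_(np+1) (1-based order statistics)\n",
"        return 0.5 * (xs[k - 1] + xs[k])\n",
"    # Otherwise take x_([np]+1), which is xs[floor(np)] with 0-based indexing\n",
"    return xs[math.floor(pos)]\n",
"\n",
"x = [3.2, 2.5, -2, 2.5, 0, 3, 2, 2.5, 2, 4]\n",
"for p in (0.25, 0.5, 0.75):\n",
"    print(p, sample_quantile(x, p))"
]
},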
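{
"cell_type": "markdown",
"metadata": {},
"source": [
"> In particular, for $p=0.25$ the $0.25$-quantile $x_{0.25}$ is also written $Q_{1}$ and called the first quartile; for $p=0.5$ the $0.5$-quantile $x_{0.5}$ is also written $Q_{2}$ or $M$ and called the sample median; for $p=0.75$ the $0.75$-quantile $x_{0.75}$ is also written $Q_{3}$ and called the third quartile."
]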
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **How to draw a box plot**: a box plot summarizes the data with $5$ numbers: the minimum $Min$, the first quartile $Q_{1}$, the median $M$, the third quartile $Q_{3}$ and the maximum $Max$. A box plot has the following form\n",
"<div align=center>\n",
"<img src=\"figures/box.jpg\"/>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"🔥Example: the following are the blood pressure readings (systolic pressure, $mmHg$) of $8$ patients; draw the box plot\n",
"$$\n",
"110 \\quad 102 \\quad 117 \\quad 122 \\quad 118 \\quad 150 \\quad 132 \\quad 123\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"🦊Solution:\n",
"1. Sort the observations\n",
"$$\n",
"102 \\quad 110 \\quad 117 \\quad 118 \\quad 122 \\quad 123 \\quad 132 \\quad 150\n",
"$$\n",
"\n",
"2. Compute the quartiles, the minimum and the maximum\n",
"$$\n",
"\\begin{aligned}\n",
"&\\because np=8\\times 0.25 = 2, \\quad &\\therefore Q_{1}=\\frac{1}{2}(110+117)=113.5 \\\\\n",
"&\\because np=8\\times 0.5 = 4, \\quad &\\therefore Q_{2}=\\frac{1}{2}(118+122)=120 \\\\\n",
"&\\because np=8\\times 0.75 = 6, \\quad &\\therefore Q_{3}=\\frac{1}{2}(123+132)=127.5 \\\\\n",
"& Min = 102, Max = 150.\n",
"\\end{aligned}\n",
"$$\n",
"3. Draw the plot"
]
},
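{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check of the hand computation above, the sketch below recomputes the five-number summary with the textbook rule and compares it with `np.percentile`. NumPy's default method interpolates linearly between order statistics, so its quartiles for these data (115.25 and 125.25) differ from the textbook values 113.5 and 127.5, while the median agrees. `textbook_quantile` is a hypothetical helper written for this illustration. A plotting routine that relies on the linear-interpolation default may therefore draw slightly different box edges than the hand-computed ones."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def textbook_quantile(xs_sorted, p):\n",
"    # Two-case rule from the text, applied to an already sorted list\n",
"    pos = len(xs_sorted) * p\n",
"    k = int(round(pos))\n",
"    if abs(pos - k) < 1e-9:\n",
"        # n*p is an integer: average the two neighbouring order statistics\n",
"        return 0.5 * (xs_sorted[k - 1] + xs_sorted[k])\n",
"    return xs_sorted[int(np.floor(pos))]\n",
"\n",
"bp = sorted([110, 102, 117, 122, 118, 150, 132, 123])\n",
"summary = {\n",
"    'Min': bp[0],\n",
"    'Q1': textbook_quantile(bp, 0.25),\n",
"    'M': textbook_quantile(bp, 0.5),\n",
"    'Q3': textbook_quantile(bp, 0.75),\n",
"    'Max': bp[-1],\n",
"}\n",
"# Default np.percentile uses linear interpolation between order statistics\n",
"numpy_quartiles = np.percentile(bp, [25, 50, 75])\n",
"print(summary)\n",
"print(numpy_quartiles)"
]
},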
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python code (drawing the box plot)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXMAAAD2CAYAAAAksGdNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAALs0lEQVR4nO3dUYidd1rH8e/PNgsl6e5Ot8NgxW5UumxKbPdiRI3TOpEWWeyVC4aAINtARCFeCbsyy3YrBmQVvcjFSiBC8CJaerNYKrsXm2qyscXJ5RILXd2AqeCUZjObi2oNjxfz1k6mMzsnJ2dmOk+/Hwjzvv/35D3PRfjm5T1zzklVIUna3X5ipweQJN09Yy5JDRhzSWrAmEtSA8Zckhq4dyee9MEHH6z9+/fvxFNL0q51+fLlt6pqer1jOxLz/fv3s7i4uBNPLUm7VpKrGx3zNoskNWDMJakBYy5JDRhzSWrAmEtSA8ZcGpw7d46DBw9yzz33cPDgQc6dO7fTI0kjGynmSWaSXBi2fyrJfyR5ZfgzPayfSXIpyVe2cmBpK5w7d46FhQVOnTrFO++8w6lTp1hYWDDo2jU2jXmSKeAssHdY+kXgZFXND3+WkvwmcE9VHQIeSvLI1o0sTd7Jkyc5c+YMhw8fZs+ePRw+fJgzZ85w8uTJnR5NGskoV+a3gCPA8rD/S8DvJ/nnJH85rM0DLwzb3wHm1p4kyfEki0kWl5aW7m5qacKuXLnC3Nzt/2zn5ua4cuXKDk0k3ZlNY15Vy1V1Y9XSPwCHquqXgc8keYyVq/Zrw/FlYGad85yuqtmqmp2eXvfdqNKOOXDgABcvXrxt7eLFixw4cGCHJpLuzDgvgF6qqh8N2/8KPALcBO4b1vaNeV5pxywsLHDs2DHOnz/Pu+++y/nz5zl27BgLCws7PZo0knE+m+VbSY4CN4BfB06zcmU+B7wKPA68PrEJpW1w9OhRAE6cOMGVK1c4cOAAJ0+e/P916cNunJg/D5wH/gf4q6p6Pcl/AheSPAR8npX76tKucvToUeOtXWvkmFfV/PDzPPDZNceWk8wDTwNfX3OPXZK0xSb2EbhVdZ33f6NFkrSNfKFSkhow5pLUgDGXpAaMuSQ1YMwlqQFjLkkNGHNJasCYS1IDxlySGjDmktSAMZekBoy5JDVgzCWpAWMuSQ0Yc0lqwJhLUgPGXJIaMOaS1IAxl6QGjLkkNWDMJakBYy5JDRhzSWrAmEtSA8Zckhow5pLUgDGXpAaMuSQ1YMwlqQFjLkkNGHNJasCYS1IDI8U8yUySC2vWDib59rC9J8lLSS4leXYrBpUkbWzTmCeZAs4Ce1etBfgL4GPD0glgsaoOAc8kuX8LZpUkbWCUK/NbwBFgedXaF4Hzq/bngReG7UvA7NqTJDmeZDHJ4tLS0njTSpLWtWnMq2q5qm68t5/kU8BvA3++6mF7gWvD9jIws855TlfVbFXNTk9P393UkqTbjPMC6J8Cf1RV765auwncN2zvG/O8kqQx3TvG3/lV4JGV2+Z8LsmfAJeBOeBF4HHg1YlNKEna1B3HvKo+8952kleq6itJPg28nOQJ4FHgtQnOKEnaxMi3Q6pqfqO1qroKPA18F3iqqm5NaD5J0gjGuc2yrqp6k/d/o0WStI18oVKSGjDmktSAMZekBoy5JDVgzCWpAWMuSQ0Yc0lqwJhLUgPGXJIaMOaS1IAxl6QGjLkkNWDMJakBYy5JDRhzSWrAmEtSA8Zckhow5pLUgDGXpAaMuSQ1YMwlqQFjLkkNGHNJasCYS1IDxlySGjDmktTAvTs9gLSVkmzL81TVtjyPtBFjrtbGiWwS46xdx9ssktSAMZekBoy5JDVgzCWpgbFinuQnkzyV5P5JDyRJunMjxTzJTJILw/ZjwN8BvwL8Y5KPJdmT5KUkl5I8u4XzSpLWsWnMk0wBZ4G9w9KjwBer6nng34CfAU4Ai1V1CHjGK3ZJ2l6jXJnfAo4Ayw
BV9bfA1SS/AUwBbwDzwAvD4y8Bs2tPkuR4ksUki0tLSxMYXZL0nk1jXlXLVXVjzfI+4LeAt4Fi5ar92nBsGZhZ5zynq2q2qmanp6fvbmpJ0m3GegG0qn5YVb8DvAP8AnATuG84vG/c80qSxnPH0U3yjSRPDrufBH4IXAbmhrXHgR9MYDZJ0ojG+WyWrwN/k6SAb1fV60nOAi8neYKVF0hfm+SQkqQfb+SYV9X88PPfef8q/L1jV5M8Pax/tapuTXJISdKPN7FPTayqN3n/N1okSdvIFyolqQFjLkkNGHNJasCYS1IDxlySGjDmktSAMZekBoy5JDVgzCWpAWMuSQ0Yc0lqwJhLUgMT+6Ataas98MADXL9+fVueK8mWnn9qaoq33357S59DHy3GXLvG9evXqaqdHmMitvo/C330eJtFkhow5pLUgDGXpAaMuSQ1YMwlqQFjLkkNGHNJasCYS1IDxlySGjDmktSAMZekBoy5JDVgzCWpAWMuSQ0Yc0lqwJhLUgPGXJIaMOaS1MBIMU8yk+TCsP1wkleSfCfJ6azYk+SlJJeSPLu1I0uS1to05kmmgLPA3mHpd4Hfq6pfA34a+HngBLBYVYeAZ5Lcv0XzSpLWMcoXOt8CjgDfBKiqhVXHPgW8BcwDXx7WLgGzwPnVJ0lyHDgO8PDDD9/NzPqIquc+Dl/7xE6PMRH13Md3egQ1s2nMq2oZPvht4kmOAN+rqjeT7AWuDYeWgZl1znMaOA0wOzvb4yvWta3y/DJVPf7pJKG+ttNTqJNRrsw/IMnPAn8IPDUs3QTuA24A+4Z9SdI2uePfZhnuoZ8Dnq2qG8PyZWBu2H4c+MFEppMkjWScK/MvAw8Dp4ZbL8+x8gLpy0meAB4FXpvYhJKkTY0c86qaH35+CfjS2uNJnmbl6vyrVXVrUgNKkjY31j3z9VTVm8ALkzqfJGl0vgNUkhow5pLUgDGXpAaMuSQ1YMwlqQFjLkkNGHNJasCYS1IDxlySGjDmktSAMZekBoy5JDUwsQ/akrbD2m+82q2mpqZ2egQ1Y8y1a2zXV8YlafP1dPro8DaLJDVgzCWpAWMuSQ0Yc0lqwJhLUgPGXJIaMOaS1IAxl6QGjLkkNWDMJakBYy5JDRhzSWrAmEtSA8Zckhow5pLUgDGXpAaMuSQ1YMwlqYGRYp5kJsmFVfsHknxz1f4nk/xTku8m+fxWDCpJ2timMU8yBZwF9g77Pwf8GfCJVQ/7Y+CvgSeBP0iXb92VpF1ilCvzW8ARYHnY/xHwhTWPeRJ4sapuAd8H9q89SZLjSRaTLC4tLY0/sSTpAzaNeVUtV9WNVfv/VVX/veZh/1tVN4ftZWBmnfOcrqrZqpqdnp6+q6ElSbeb1Augt1Zt75vgeSVJI5hUdL+XZHbYfgy4OqHzSpJGcO+EzvMN4EyS14CbVXVtQueVJI1g5JhX1fxG+1X1L0m+AHwO+PsJzSZJGtGkrsypqjeANyZ1PknS6HyhUpIaMOaS1IAxl6QGjLkkNWDMJakBYy5JDRhzSWrAmEtSA8Zckhow5pLUgDGXpAaMuSQ1YMwlqQFjLkkNGHNJasCYS1IDxlySGjDmktSAMZekBoy5JDVgzCWpAWMuSQ0Yc0lqwJhLUgPGXJIaMOaS1MC9Oz2AtJWSbMvfq6qxnkeaFGOu1oysPiq8zSJJDRhzSWrAmEtSA8ZckhoYKeZJZpJcGLb3JHkpyaUkz260JknaPpvGPMkUcBbYOyydABar6hDwTJL7N1iTJG2TUa7MbwFHgOVhfx54Ydi+BMxusHabJMeTLCZZXFpauouRJUlrbRrzqlquqhurlvYC14btZWBmg7W15zldVbNVNTs9PX13U0uSbjPOm4ZuAvcBN4B9w/56axu6fPnyW0mujvHc0nZ4EHhrp4eQ1vHpjQ6ME/PLwBzwIvA48OoGaxuqKi/N9aGVZLGqPnCrUPowGyfmZ4GXkzwBPAq8xsotlrVrkqRtknE+uyLJQ6xciX
/rvfvp661Ju5FX5tqNxoq51FmS41V1eqfnkO6EMZekBnw7vyQ1YMwlqQFjLq2y+nOIpN3EmEuDdT6HSNo1jLn0vrWfQyTtGn4HqDSoqmUY/0ugpZ3klbkkNWDMJakBYy5JDfgOUElqwCtzSWrAmEtSA8Zckhow5pLUgDGXpAaMuSQ18H+3+JbWWKjgDwAAAABJRU5ErkJggg==",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 0 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"import matplotlib.pyplot as plt \n",
|
||
"%matplotlib inline \n",
|
||
"plt.rcParams['font.sans-serif']=['SimHei','Songti SC','STFangsong']\n",
"plt.rcParams['axes.unicode_minus'] = False  # keep minus signs rendering correctly with a CJK font\n",
"\n",
"x = [102, 110, 117, 118, 122, 123, 132, 150]\n",
"\n",
"# boxplot marks outliers automatically: any point below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, with IQR = Q3 - Q1\n",
"fig, ax = plt.subplots(figsize=(6, 4))  # one figure only; a separate plt.figure() call would open an empty second one\n",
"ax.boxplot(x)\n",
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.3 Statistics and the Three Major Sampling Distributions"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Statistic: let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the population $X$, and let $g(X_{1}, X_{2}, \\cdots, X_{n})$ be a function of $X_{1}, X_{2}, \\cdots, X_{n}$. If $g$ contains no unknown parameters, then $g(X_{1}, X_{2}, \\cdots, X_{n})$ is called a **statistic**.\n",
"> Common statistics: let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the population $X$, and let $x_{1}, x_{2}, \\cdots, x_{n}$ be the observed values of that sample.\n",
"> - 1. Sample mean $$\\overline{X} = \\frac{1}{n} \\sum_{i=1}^{n}X_{i}$$ with observed value $\\overline{x} = \\frac{1}{n} \\sum_{i=1}^{n}x_{i}$\n",
"> - 2. Sample variance $$\\begin{aligned} &1)\\ S_{n}^{2} = \\frac{1}{n} \\sum_{i=1}^{n}(X_{i} - \\overline{X})^{2} \\\\ &2)\\ S^{2} = \\frac{1}{n-1} \\sum_{i=1}^{n}(X_{i} - \\overline{X})^{2} \\quad \\text{(unbiased, and the more commonly used form)}\\end{aligned}$$ with observed values $s_{n}^{2} = \\frac{1}{n} \\sum_{i=1}^{n}(x_{i} - \\overline{x})^{2}$ and $s^{2} = \\frac{1}{n-1} \\sum_{i=1}^{n}(x_{i} - \\overline{x})^{2}$\n",
"> - 3. Sample standard deviation $$S = \\sqrt{S^{2}} = \\sqrt{\\frac{1}{n-1} \\sum_{i=1}^{n}(X_{i} - \\overline{X})^{2}}$$ with observed value $s = \\sqrt{\\frac{1}{n-1} \\sum_{i=1}^{n}(x_{i} - \\overline{x})^{2}}$\n",
"> - 4. Sample $k$-th (raw) moment $$A_{k} = \\frac{1}{n}\\sum_{i=1}^{n}X_{i}^{k}, k =1, 2, \\cdots$$ with observed value $a_{k} = \\frac{1}{n}\\sum_{i=1}^{n}x_{i}^{k}, k =1, 2, \\cdots$\n",
"> - 5. Sample $k$-th central moment $$B_{k} = \\frac{1}{n}\\sum_{i=1}^{n}(X_{i} - \\overline{X})^{k}, k =1, 2, \\cdots$$ with observed value $b_{k} = \\frac{1}{n}\\sum_{i=1}^{n}(x_{i} - \\overline{x})^{k}, k =1, 2, \\cdots$"
|
||
]
|
||
},
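{
"cell_type": "markdown",
"metadata": {},
"source": [
"The statistics listed above can be sketched in a few lines of NumPy. This is only an illustrative sketch: it reuses the eight observations from the box plot example as data, and the variable names are assumptions rather than part of the original notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x = np.array([102, 110, 117, 118, 122, 123, 132, 150], dtype=float)\n",
"n = x.size\n",
"\n",
"x_bar = x.mean()                  # sample mean\n",
"s2_n = x.var(ddof=0)              # S_n^2, divides by n\n",
"s2 = x.var(ddof=1)                # S^2, divides by n - 1 (unbiased)\n",
"s = x.std(ddof=1)                 # sample standard deviation S\n",
"a_2 = np.mean(x ** 2)             # 2nd raw moment A_2\n",
"b_2 = np.mean((x - x_bar) ** 2)   # 2nd central moment B_2\n",
"\n",
"print(x_bar, s2_n, s2, s, a_2, b_2)\n",
"```\n",
"\n",
"The printed values can be used to check the identities $B_{2} = S_{n}^{2} = A_{2} - \\overline{X}^{2}$ and $S^{2} = \\frac{n}{n-1}S_{n}^{2}$."
]
},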
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* The three major sampling distributions\n",
"\n",
"(1) $\\chi ^{2}$ distribution: let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the population $N(0, 1)$. Then the statistic\n",
"$$\\chi ^{2} = X_{1}^{2} + X_{2}^{2} + \\cdots + X_{n}^{2}$$\n",
"is said to follow the $\\chi ^{2}$ distribution with $n$ degrees of freedom, written $\\chi ^{2} \\sim \\chi ^{2}(n)$. The degrees of freedom count the independent variables on the right-hand side of the formula above.\n",
"\n",
"The probability density function of the $\\chi ^{2}(n)$ distribution (no need to memorize it) is\n",
"$$\n",
"f(y) = \n",
"\\left \\{ \n",
" \\begin{aligned}\n",
" & \\frac{1}{2^{n/2}\\Gamma(n/2)}y^{n/2-1}e^{-y/2}, &y>0 \\\\\n",
" & 0, & \\text{otherwise}\n",
" \\end{aligned}\n",
"\\right.\n",
"$$\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (plotting the $\\chi ^{2}$ density)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"import matplotlib.pyplot as plt\n",
|
||
"%matplotlib inline\n",
|
||
"from scipy.stats import chi2\n",
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"fig, ax = plt.subplots(1, 1)\n",
|
||
"x = np.linspace(0.01, 30, 10000)\n",
|
||
"ax.plot(x, chi2.pdf(x, df=2), '-', label='n = 2')\n",
|
||
"ax.plot(x, chi2.pdf(x, 4), '--', label='n = 4')\n",
|
||
"ax.plot(x, chi2.pdf(x, df=10), '-.', label='n = 10')\n",
|
||
"ax.set_ylim([0, 0.5])\n",
|
||
"ax.set_xlabel(\"y\")\n",
|
||
"ax.set_ylabel(\"f(y)\")\n",
|
||
"ax.legend()\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 720x360 with 3 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"# Build chi-square samples from the definition (sum of n squared N(0, 1) draws) and compare them with the density\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"from scipy.stats import norm, chi2\n",
"import numpy as np\n",
"\n",
"def demonstate_chi(n):\n",
"    x = 0\n",
"    for i in range(n):\n",
"        # each pass adds 10000 independent squared standard normal draws\n",
"        x += np.square(norm(loc=0, scale=1).rvs(size=10000))\n",
"\n",
"    return x\n",
|
||
"\n",
|
||
"x = np.linspace(0.01, 30, 10000)\n",
|
||
"\n",
|
||
"n_2 = demonstate_chi(2)\n",
|
||
"n_4 = demonstate_chi(4)\n",
|
||
"n_10 = demonstate_chi(10)\n",
|
||
"\n",
|
||
"plt.figure(figsize=(10, 5))\n",
|
||
"plt.subplot(1,3, 1)\n",
|
||
"plt.plot(x, chi2.pdf(x, 2), '-', label='n = 2', c='blue')\n",
|
||
"plt.hist(n_2, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(1,3, 2)\n",
|
||
"plt.plot(x, chi2.pdf(x, df = 4), '--', label='n = 4', c='gray')\n",
|
||
"plt.hist(n_4, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(1,3, 3)\n",
|
||
"plt.plot(x, chi2.pdf(x, 10), '-.', label='n = 10', c='red')\n",
|
||
"plt.hist(n_10, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.tight_layout(w_pad=3)\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"> Properties of the $\\chi ^{2}$ distribution\n",
"> - 1. **Additivity**: if $\\chi_{1}^{2} \\sim \\chi ^{2}(n_{1})$ and $\\chi_{2}^{2} \\sim \\chi ^{2}(n_{2})$ are independent, then $$\\chi_{1}^{2} + \\chi_{2}^{2} \\sim \\chi ^{2} (n_{1} + n_{2})$$\n",
"> - 2. **Expectation** and **variance**: if $\\chi ^{2} \\sim \\chi ^{2}(n)$, then $$E(\\chi ^{2}) = n, \\quad D(\\chi ^{2}) = 2n$$ Proof: $$\\begin{aligned} &\\chi ^{2} = X_{1}^{2} + X_{2}^{2} + \\cdots + X_{n}^{2}, \\quad X_{i} \\sim N(0, 1) \\text{ independent} \\\\ & \\text{so } E(X_{i})=0, \\quad E(X_{i}^{2}) = D(X_{i}) = 1 \\\\ & E(\\chi ^{2}) = \\sum_{i=1}^{n}E(X_{i}^{2}) = n \\\\ &D(X_{i}^{2}) = E(X_{i}^{4}) - E^{2}(X_{i}^{2}) = 3 - 1 = 2 \\\\ & D(\\chi ^{2}) = \\sum_{i=1}^{n}D(X_{i}^{2}) = 2n\\end{aligned}$$\n",
"> - 3. **Quantile**: for a given $\\alpha$ with $0 <\\alpha <1$, the point $\\chi_{\\alpha} ^{2}(n)$ satisfying $$P\\{\\chi^{2} > \\chi_{\\alpha} ^{2}(n)\\} = \\int_{\\chi_{\\alpha} ^{2}(n)}^{\\infty}f(y)dy = \\alpha$$ is called the upper $\\alpha$ quantile of the $\\chi ^{2}(n)$ distribution."
|
||
]
|
||
},
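{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged sketch of property 3, the upper $\\alpha$ quantile can be looked up with SciPy instead of a table. The values $n = 10$ and $\\alpha = 0.05$ are arbitrary illustrative choices; `isf` is the inverse survival function, so it solves $P\\{\\chi^{2} > q\\} = \\alpha$ directly.\n",
"\n",
"```python\n",
"from scipy.stats import chi2\n",
"\n",
"n, alpha = 10, 0.05\n",
"\n",
"# upper alpha quantile: the point q with P(chi2 > q) = alpha\n",
"q = chi2.isf(alpha, df=n)\n",
"# the same point through the ordinary (lower-tail) quantile function\n",
"q_alt = chi2.ppf(1 - alpha, df=n)\n",
"\n",
"# mean and variance, to compare with E = n and D = 2n\n",
"mean, var = chi2.stats(df=n, moments='mv')\n",
"print(q, q_alt, chi2.sf(q, df=n), mean, var)\n",
"```\n",
"\n",
"Printed tables of $\\chi_{\\alpha}^{2}(n)$ use the same upper-tail convention, which is why `isf` rather than `ppf` matches them directly."
]
},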
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"(2) $t$ distribution: let $X \\sim N(0, 1)$ and $Y \\sim \\chi^{2}(n)$ be independent. Then the random variable\n",
"$$\n",
"t = \\frac{X}{\\sqrt{Y/n}}\n",
"$$\n",
"is said to follow the $t$ distribution with $n$ degrees of freedom, written $t \\sim t(n)$.\n",
"\n",
"The probability density function of the $t(n)$ distribution is\n",
"$$\n",
"h(t) = \\frac{\\Gamma [(n+1)/2]}{\\sqrt{\\pi n}\\, \\Gamma (n/2)}\\left(1 + \\frac{t^{2}}{n}\\right)^{-(n+1)/2}, \\quad -\\infty < t < \\infty\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (plotting the $t$ density)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"import matplotlib.pyplot as plt\n",
|
||
"%matplotlib inline\n",
|
||
"from scipy.stats import t\n",
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"fig, ax = plt.subplots(1, 1)\n",
|
||
"x = np.linspace(-10, 10, 10000)\n",
|
||
"ax.plot(x, t.pdf(x, df=2), '-', label='n = 2', c='blue')\n",
|
||
"ax.plot(x, t.pdf(x, 9), '--', label='n = 9', c='gray')\n",
|
||
"ax.plot(x, t.pdf(x, df=10000), '-.', label='n = 10000', c='red')\n",
|
||
"ax.set_xlabel(\"t\")\n",
|
||
"ax.set_ylabel(\"h(t)\")\n",
|
||
"ax.legend()\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 720x360 with 3 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"# Build t samples from the definition X / sqrt(Y / n) and compare them with the t density\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"from scipy.stats import norm, chi2, t\n",
"import numpy as np\n",
"\n",
"def demonstate_t(n):\n",
"    x = norm(loc=0, scale=1).rvs(size=10000)\n",
"    # one independent chi-square draw per x; a single scalar draw would only rescale the normal sample\n",
"    y = chi2.rvs(df=n, size=10000)\n",
"    return x / np.sqrt(y / n)\n",
"\n",
"x = np.linspace(-10, 10, 10000)\n",
"\n",
"n_2 = demonstate_t(2)\n",
"n_9 = demonstate_t(9)\n",
"n_10000 = demonstate_t(10000)\n",
"\n",
"plt.figure(figsize=(10, 5))\n",
"plt.subplot(1,3, 1)\n",
"plt.plot(x, t.pdf(x, 2), '-', label='n = 2', c='blue')\n",
"# t(2) has heavy tails, so restrict the histogram to the plotted range\n",
"plt.hist(n_2, bins=50, range=(-10, 10), density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(1,3, 2)\n",
|
||
"plt.plot(x, t.pdf(x, df = 9), '--', label='n = 9', c='gray')\n",
|
||
"plt.hist(n_9, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(1,3, 3)\n",
|
||
"plt.plot(x, t.pdf(x, 10000), '-.', label='n = 10000', c='red')\n",
|
||
"plt.hist(n_10000, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend(loc=\"upper right\")\n",
|
||
"plt.tight_layout(w_pad=3)\n",
|
||
"plt.show()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"> As $n \\rightarrow \\infty$, the $t(n)$ distribution converges to the $N(0 ,1)$ distribution."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"> - Quantile of the $t$ distribution: for a given $\\alpha$ with $0 <\\alpha <1$, the point $t_{\\alpha}(n)$ satisfying $$P\\{t > t_{\\alpha}(n)\\} = \\int_{t_{\\alpha}(n)}^{\\infty}h(t)dt = \\alpha$$ is called the upper $\\alpha$ quantile of the $t(n)$ distribution.\n",
"> - The graph of $h(t)$ is symmetric about $t = 0$, so $t_{1 - \\alpha}(n) = -t_{\\alpha}(n)$"
|
||
]
|
||
},
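{
"cell_type": "markdown",
"metadata": {},
"source": [
"The symmetry relation and the normal limit can be checked numerically. The sketch below is illustrative only: $\\alpha = 0.05$ and the degrees of freedom are arbitrary choices, and `isf` is SciPy's inverse survival function, which returns upper quantiles.\n",
"\n",
"```python\n",
"from scipy.stats import norm, t\n",
"\n",
"alpha = 0.05\n",
"for n in (2, 9, 10000):\n",
"    upper = t.isf(alpha, df=n)        # t_alpha(n)\n",
"    lower = t.isf(1 - alpha, df=n)    # t_{1 - alpha}(n), expected to equal -t_alpha(n)\n",
"    print(n, upper, lower)\n",
"\n",
"z = norm.isf(alpha)                   # upper alpha quantile of N(0, 1)\n",
"print(z)\n",
"```\n",
"\n",
"For large $n$ the printed $t_{\\alpha}(n)$ should be close to the normal quantile `z`, in line with the limit stated above."
]
},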
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"(3) $F$ distribution: let $U \\sim \\chi ^{2}(n_{1})$ and $V \\sim \\chi ^{2}(n_{2})$ be independent. Then the random variable\n",
"$$\n",
"F = \\frac{U/n_{1}}{V/n_{2}}\n",
"$$\n",
"is said to follow the $F$ distribution with $(n_{1}, n_{2})$ degrees of freedom, written $F \\sim F(n_{1}, n_{2})$.\n",
"\n",
"The probability density function of the $F(n_{1}, n_{2})$ distribution is\n",
"$$\n",
"\\psi (y) = \n",
"\\left \\{\n",
" \\begin{aligned}\n",
" & \\frac{\\Gamma [(n_{1} +n_{2})/2](n_{1}/n_{2})^{n_{1}/2}y^{(n_{1}/2)-1}}{\\Gamma (n_{1}/2)\\Gamma (n_{2}/2)[1+(n_{1}y/n_{2})]^{(n_{1}+n_{2})/2}}, &y>0 \\\\\n",
" & 0, &\\text{otherwise}\n",
" \\end{aligned}\n",
"\\right.\n",
"$$\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (plotting the $F$ density)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 432x288 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"import matplotlib.pyplot as plt\n",
|
||
"%matplotlib inline\n",
|
||
"from scipy.stats import f\n",
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"fig, ax = plt.subplots(1, 1)\n",
|
||
"x = np.linspace(0.01, 10, 10000)\n",
|
||
"ax.plot(x, f.pdf(x, dfn=10, dfd=40), '-', label='F~(10, 40)', c='blue')\n",
|
||
"ax.plot(x, f.pdf(x, dfn=40, dfd=10), '--', label='F~(40, 10)', c='orange')\n",
|
||
"ax.plot(x, f.pdf(x, dfn=11, dfd=3), '-.', label='F~(11, 3)', c='red')\n",
|
||
"ax.set_xlabel(\"y\")\n",
|
||
"ax.set_ylabel(\"f(y)\")\n",
|
||
"ax.legend()\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 720x360 with 3 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"# Build F samples from the definition (U/n1) / (V/n2) and compare them with the F density\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"from scipy.stats import chi2, f\n",
"import numpy as np\n",
"\n",
"def demonstate_f(n1, n2):\n",
"    u = chi2.rvs(df=n1, size=10000)\n",
"    v = chi2.rvs(df=n2, size=10000)\n",
"    # return the ratio directly so the imported scipy.stats.f is not shadowed by a local name\n",
"    return (u/n1) / (v/n2)\n",
|
||
"\n",
|
||
"x = np.linspace(0.01, 10, 10000)\n",
|
||
"\n",
|
||
"n_10_40 = demonstate_f(10, 40)\n",
|
||
"n_40_10 = demonstate_f(40 ,10)\n",
|
||
"n_11_3 = demonstate_f(11, 3)\n",
|
||
"\n",
|
||
"plt.figure(figsize=(10, 5))\n",
|
||
"plt.subplot(1,3, 1)\n",
|
||
"plt.plot(x, f.pdf(x, dfn=10, dfd=40), '-', label='F~(10, 40)', c='blue')\n",
|
||
"plt.hist(n_10_40, bins=300, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(1,3, 2)\n",
|
||
"plt.plot(x, f.pdf(x, dfn=40, dfd=10), '--', label='F~(40, 10)', c='orange')\n",
|
||
"plt.hist(n_40_10, bins=300, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(1,3, 3)\n",
|
||
"plt.plot(x, f.pdf(x, dfn=11, dfd=3), '-.', label='F~(11, 3)', c='red')\n",
|
||
"plt.hist(n_11_3,bins=550, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.xlim([0, 10])\n",
|
||
"plt.legend(loc=\"upper right\")\n",
|
||
"plt.tight_layout(w_pad=3)\n",
|
||
"plt.show()\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"> - Quantile of the $F$ distribution: for a given $\\alpha$ with $0 <\\alpha <1$, the point $F_{\\alpha}(n_{1}, n_{2})$ satisfying $$P\\{F > F_{\\alpha}(n_{1}, n_{2})\\} = \\int_{F_{\\alpha}(n_{1}, n_{2})}^{\\infty}\\psi (y)dy = \\alpha$$ is called the upper $\\alpha$ quantile of the $F(n_{1}, n_{2})$ distribution.\n",
"> - If $F \\sim F(n_{1}, n_{2})$, then $$\\frac{1}{F} \\sim F(n_{2}, n_{1})$$\n",
"> - $F_{1-\\alpha}(n_{1}, n_{2}) =\\frac{1} {F_{\\alpha}(n_{2}, n_{1})}$."
|
||
]
|
||
},
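{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reciprocal relation between $F$ quantiles can be checked with a short sketch. The degrees of freedom $(10, 40)$ and $\\alpha = 0.05$ are arbitrary illustrative values; `dfn` and `dfd` are SciPy's names for the numerator and denominator degrees of freedom.\n",
"\n",
"```python\n",
"from scipy.stats import f\n",
"\n",
"n1, n2, alpha = 10, 40, 0.05\n",
"\n",
"lhs = f.isf(1 - alpha, dfn=n1, dfd=n2)     # F_{1 - alpha}(n1, n2)\n",
"rhs = 1.0 / f.isf(alpha, dfn=n2, dfd=n1)   # 1 / F_alpha(n2, n1)\n",
"print(lhs, rhs)\n",
"```\n",
"\n",
"This identity is what lets a table that lists only small $\\alpha$ be used to find quantiles for $1 - \\alpha$ as well."
]
},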
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* 📕 Key theorems: distributions of the sample mean and the sample variance of a normal population"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Theorem 1**: let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the normal population $N(\\mu, \\sigma^{2})$ and let $\\overline{X}$ be the sample mean. Then\n",
"$$\n",
"\\overline{X} \\sim N(\\mu, \\sigma^{2}/n).\n",
"$$\n",
"\n",
"**Theorem 2**: let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the normal population $N(\\mu, \\sigma^{2})$, and let $\\overline{X}$ and $S^{2}$ be the sample mean and sample variance. Then\n",
"$$\n",
"\\begin{aligned}\n",
"& 1.\\ \\frac{(n-1)S^{2}}{\\sigma^{2}} \\sim \\chi^{2}(n-1) \\\\\n",
"& 2.\\ \\overline{X} \\text{ and } S^{2} \\text{ are independent}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"**Theorem 3**: let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the normal population $N(\\mu, \\sigma^{2})$, and let $\\overline{X}$ and $S^{2}$ be the sample mean and sample variance. Then\n",
"$$\n",
"\\frac{\\overline{X} - \\mu}{S/\\sqrt{n}} \\sim t(n-1).\n",
"$$\n",
"\n",
"**Theorem 4**: let $X_{1}, X_{2}, \\cdots, X_{n_{1}}$ and $Y_{1}, Y_{2}, \\cdots, Y_{n_{2}}$ be samples from the normal populations $N(\\mu_1, \\sigma_{1}^{2})$ and $N(\\mu_2, \\sigma_{2}^{2})$ respectively, with the two samples independent of each other, and let $\\overline{X}, \\overline{Y}$ and $S_{1}^{2}, S_{2}^{2}$ be their sample means and sample variances. Then\n",
"$$\n",
"\\begin{aligned}\n",
"& 1.\\ \\frac{S_{1}^{2}/S_{2}^{2}}{\\sigma_{1}^{2}/\\sigma_{2}^{2}} \\sim F(n_{1}-1, n_{2}-1) \\\\\n",
"& 2.\\ \\text{if } \\sigma_{1}^{2} = \\sigma_{2}^{2} = \\sigma^{2}, \\text{ then } \\frac{(\\overline{X} - \\overline{Y}) - (\\mu_{1} - \\mu_{2})}{S_{w}\\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}}} \\sim t(n_{1}+n_{2}-2) \n",
"\\end{aligned}\n",
"$$\n",
"where $S_{w}^{2} = \\frac{(n_{1}-1)S_{1}^{2}+(n_{2}-1)S_{2}^{2}}{n_{1}+n_{2}-2}$."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (checking the theorems by simulation)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 720x576 with 4 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"import matplotlib.pyplot as plt\n",
|
||
"%matplotlib inline\n",
|
||
"from scipy.stats import chi2, t, norm, f\n",
|
||
"import numpy as np\n",
|
||
"\n",
"def theory_1(mu, sigma, n):\n",
"    # Theorem 1: sample means of 10000 normal samples of size n\n",
"    x_mean = []\n",
"    for i in range(10000):\n",
"        x_mean.append(np.sum(norm.rvs(loc=mu, scale=sigma, size=n))/n)\n",
"    return x_mean\n",
"\n",
"def theory_2(mu, sigma, n):\n",
"    # Theorem 2: (n-1) S^2 / sigma^2 for each simulated sample\n",
"    res = []\n",
"    for i in range(10000):\n",
"        x = norm.rvs(loc=mu, scale=sigma, size=n)\n",
"        x_mean = np.mean(x)\n",
"        s2 = np.sum(np.square(x - x_mean))/(n-1)\n",
"        res.append((n-1)*s2/(sigma**2))\n",
"    return res\n",
"\n",
"def theory_3(mu, sigma, n):\n",
"    # Theorem 3: (x_mean - mu) / (S / sqrt(n)) for each simulated sample\n",
"    res = []\n",
"    for i in range(10000):\n",
"        x = norm.rvs(loc=mu, scale=sigma, size=n)\n",
"        x_mean = np.mean(x)\n",
"        s = np.sqrt(np.sum(np.square(x - x_mean))/(n-1))\n",
"        res.append((x_mean-mu)/(s/np.sqrt(n)))\n",
"    return res\n",
"\n",
"def theory_4(mu1, mu2, sigma1, sigma2, n1, n2):\n",
"    # Theorem 4, part 1: (S1^2 / S2^2) / (sigma1^2 / sigma2^2) for each pair of samples\n",
"    res = []\n",
"    for i in range(10000):\n",
"        x1 = norm.rvs(loc=mu1, scale=sigma1, size=n1)\n",
"        x1_mean = np.mean(x1)\n",
"        x2 = norm.rvs(loc=mu2, scale=sigma2, size=n2)\n",
"        x2_mean = np.mean(x2)\n",
"        s1_2 = np.sum(np.square(x1-x1_mean)) / (n1-1)\n",
"        s2_2 = np.sum(np.square(x2-x2_mean)) / (n2-1)\n",
"        temp1 = (s1_2/s2_2)\n",
"        temp2 = (sigma1**2/sigma2**2)\n",
"        res.append(temp1/temp2)\n",
"    return res\n",
|
||
"\n",
|
||
"mu = 5\n",
|
||
"sigma = 10\n",
|
||
"n = 5\n",
|
||
"mu1, mu2 = 1, 2\n",
|
||
"sigma1, sigma2 = 3, 4\n",
|
||
"n1, n2 = 10, 40\n",
|
||
"x_mean = theory_1(mu, sigma, n)\n",
|
||
"t2 = theory_2(mu, sigma, n)\n",
|
||
"t_ = theory_3(mu, sigma, n)\n",
|
||
"f_ = theory_4(mu1, mu2, sigma1, sigma2, n1, n2)\n",
|
||
"\n",
|
||
"x1 =np.linspace(-10, 20, 10000)\n",
|
||
"x2 = np.linspace(0.01, 30, 10000)\n",
|
||
"x3 = np.linspace(-5, 5, 10000)\n",
|
||
"x4 = np.linspace(0.01, 10, 10000)\n",
|
||
"\n",
|
||
"plt.figure(figsize=(10, 8))\n",
|
||
"plt.subplot(2,2, 1)\n",
|
||
"plt.plot(x1, norm.pdf(x1,loc=mu, scale=sigma/np.sqrt(n)), '-', label='N({}, {})'.format(mu, sigma**2/n), c='blue')\n",
|
||
"plt.hist(x_mean,bins=50, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.title(\"Theory_1\")\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"p(x)\")\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(2,2, 2)\n",
"plt.plot(x2, chi2.pdf(x2, df=n-1), '--', label='chi2({})'.format(n-1), c='orange')\n",
|
||
"plt.hist(t2, bins=50, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.title(\"Theory_2\")\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"p(x)\")\n",
|
||
"# plt.xlim([0, 30])\n",
|
||
"plt.legend()\n",
|
||
"plt.subplot(2,2, 3)\n",
|
||
"plt.plot(x3, t.pdf(x3, df=n-1), '-.', label='t({})'.format(n-1), c='red')\n",
|
||
"plt.hist(t_,bins=50, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.title(\"Theory_3\")\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"p(x)\")\n",
|
||
"plt.legend(loc=\"upper right\")\n",
|
||
"plt.subplot(2,2, 4)\n",
|
||
"plt.plot(x4, f.pdf(x4, dfn=n1-1, dfd=n2-1), '--', label='F({}, {})'.format(n1-1, n2-1), c='orange')\n",
|
||
"plt.hist(f_, bins=50, density=True, histtype='stepfilled', alpha=0.5)\n",
|
||
"plt.title(\"Theory_4\")\n",
|
||
"plt.xlabel(\"x\")\n",
|
||
"plt.ylabel(\"p(x)\")\n",
|
||
"plt.xlim([0, 10])\n",
|
||
"plt.legend()\n",
|
||
"\n",
|
||
"plt.tight_layout(w_pad=3)\n",
|
||
"plt.show()"
|
||
]
|
||
},
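{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simulation above covers part 1 of Theorem 4. Part 2 (the pooled $t$ statistic under a common variance) can be checked in the same spirit. The sketch below is an assumption-laden illustration, not part of the original notebook: the parameter values, the fixed seed, and the use of a Kolmogorov-Smirnov test as the comparison are all choices made here.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.stats import kstest, t\n",
"\n",
"rng = np.random.default_rng(0)      # fixed seed, an arbitrary choice\n",
"mu1, mu2, sigma = 1.0, 2.0, 3.0     # common sigma, as part 2 requires\n",
"n1, n2, reps = 10, 40, 10000\n",
"\n",
"# each row is one simulated sample\n",
"x = rng.normal(mu1, sigma, size=(reps, n1))\n",
"y = rng.normal(mu2, sigma, size=(reps, n2))\n",
"\n",
"s1_2 = x.var(axis=1, ddof=1)\n",
"s2_2 = y.var(axis=1, ddof=1)\n",
"s_w = np.sqrt(((n1 - 1) * s1_2 + (n2 - 1) * s2_2) / (n1 + n2 - 2))\n",
"\n",
"stat = ((x.mean(axis=1) - y.mean(axis=1)) - (mu1 - mu2)) / (s_w * np.sqrt(1 / n1 + 1 / n2))\n",
"\n",
"# compare the simulated statistic with the t(n1 + n2 - 2) distribution function\n",
"result = kstest(stat, t(df=n1 + n2 - 2).cdf)\n",
"print(result.statistic, result.pvalue)\n",
"```\n",
"\n",
"A small KS statistic indicates that the empirical distribution of the pooled statistic stays close to $t(n_{1}+n_{2}-2)$; a histogram against `t.pdf`, as in the cell above, would show the same thing visually."
]
},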
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.4 Parameter Estimation: The Concept of Point Estimation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
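"* Point estimation: suppose the form of the distribution function $F(x;\\theta)$ of the population $X$ is known and $\\theta$ is the parameter to be estimated. Let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$ and $x_{1}, x_{2}, \\cdots, x_{n}$ the corresponding sample values. The point estimation problem is to construct a suitable statistic $\\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ and to use its observed value $\\hat{\\theta}(x_{1}, x_{2}, \\cdots, x_{n})$ as an approximation of the unknown parameter $\\theta$. We call $\\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ an **estimator** of $\\theta$ and $\\hat{\\theta}(x_{1}, x_{2}, \\cdots, x_{n})$ an **estimate** of $\\theta$. Both are referred to as **estimates** for short and written $\\hat{\\theta}$.\n",
"> Point estimation uses a sample statistic to estimate an unknown parameter of the population distribution. Because an estimator is a function of the sample, different sample values generally give different estimates of $\\theta$."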
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.5 Point Estimation Methods: The Method of Moments"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Method of moments: let $X$ be a continuous random variable with density $f(x;\\theta_{1},\\theta_{2}, \\cdots, \\theta_{k})$, or a discrete random variable with probability mass function $P\\{X=x\\}=p(x;\\theta_{1},\\theta_{2}, \\cdots, \\theta_{k})$, where $\\theta_{1},\\theta_{2}, \\cdots, \\theta_{k}$ are the parameters to be estimated, and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. Suppose the first $k$ population moments exist:\n",
"$$\n",
"\\mu_{l} = E(X^{l}) = \\int_{-\\infty}^{\\infty}x^{l}f(x;\\theta_{1},\\theta_{2}, \\cdots, \\theta_{k})dx\n",
"$$\n",
"or\n",
"$$\n",
"\\mu_{l} = E(X^{l}) = \\sum_{x} x^{l}p(x;\\theta_{1},\\theta_{2}, \\cdots, \\theta_{k})\n",
"$$\n",
"for $l = 1, 2, \\cdots, k$. **Then set each sample moment $A_{l}$ equal to the corresponding population moment $\\mu_{l}$, that is, $A_{l} = \\mu_{l}$.** Estimating the population moments by the sample moments, and through them the unknown parameters, is called the **method of moments**.\n",
"> Sample moment formulas\n",
"> - 1. Sample raw moment $$A_{k} = \\frac{1}{n}\\sum_{i=1}^{n}X_{i}^{k}, k =1, 2, \\cdots$$\n",
"> - 2. Sample central moment $$B_{k} = \\frac{1}{n}\\sum_{i=1}^{n}(X_{i} - \\overline{X})^{k}, k =1, 2, \\cdots$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
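"> Steps for solving a method-of-moments problem:\n",
"> - 1. Determine the number $k$ of parameters $\\theta_{i}$ to be estimated (not to be confused with the sample size $n$)\n",
"> - 2. Write down the first $k$ population moments $\\mu_{1}, \\cdots, \\mu_{k}$; each $\\mu_{l}$ is a function of the parameters $\\theta_{i}$\n",
"> - 3. Treat these $k$ expressions as a system of equations and solve it for the parameters $\\theta_{i}$ in terms of $\\mu_{1}, \\cdots, \\mu_{k}$\n",
"> - 4. Replace each $\\mu_{l}$ in the solution with the corresponding sample moment $A_{l}$ to obtain the estimators $\\hat{\\theta}_{i}$"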
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🔥Example: let the population $X$ be uniformly distributed on $[a, b]$ with $a, b$ unknown, and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. Find the moment estimators of $a$ and $b$."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution: \n",
"1. Count the parameters to be estimated: $a$ and $b$, so $k=2$\n",
"2. Compute the first $2$ population moments\n",
"$$\n",
"\\begin{aligned}\n",
"&\\mu_{1} = E(X) = \\frac{a+b}{2} \\\\\n",
"&\\mu_{2} = E(X^{2}) = D(X) + E^{2}(X) = \\frac{(b-a)^{2}}{12} + \\frac{(a+b)^{2}}{4}\n",
"\\end{aligned}\n",
"$$\n",
"3. Set up the system of equations and solve it\n",
"$$\n",
"\\left \\{\n",
"\\begin{aligned}\n",
"&\\mu_{1} = \\frac{a+b}{2} \\\\\n",
"&\\mu_{2} = \\frac{(b-a)^{2}}{12} + \\frac{(a+b)^{2}}{4}\n",
"\\end{aligned}\n",
"\\right.\n",
"$$\n",
"The first equation gives $a+b = 2\\mu_{1}$, and substituting it into the second gives $(b-a)^{2} = 12(\\mu_{2}-\\mu_{1}^{2})$. With $a < b$ this yields\n",
"$$\n",
"a = \\mu_{1} - \\sqrt{3(\\mu_{2}-\\mu_{1}^{2})}, \\quad b = \\mu_{1} + \\sqrt{3(\\mu_{2}-\\mu_{1}^{2})}\n",
"$$\n",
"4. Replace each $\\mu_{l}$ with the corresponding $A_{l}$\n",
"$$\n",
"\\begin{aligned}\n",
"&\\hat{a} = A_{1} - \\sqrt{3(A_{2}-A_{1}^{2})} = \\frac{1}{n}\\sum_{i=1}^{n}X_{i} - \\sqrt{3\\left(\\frac{1}{n}\\sum_{i=1}^{n}X_{i}^{2}-\\left(\\frac{1}{n}\\sum_{i=1}^{n}X_{i}\\right)^{2}\\right)} \\\\\n",
"&\\hat{b} = A_{1} + \\sqrt{3(A_{2}-A_{1}^{2})} = \\frac{1}{n}\\sum_{i=1}^{n}X_{i} + \\sqrt{3\\left(\\frac{1}{n}\\sum_{i=1}^{n}X_{i}^{2}-\\left(\\frac{1}{n}\\sum_{i=1}^{n}X_{i}\\right)^{2}\\right)}\n",
"\\end{aligned}\n",
"$$"
|
||
]
|
||
},
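{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 3 of the solution can also be delegated to a computer algebra system. The following is a minimal sketch, assuming SymPy is installed; it uses the standard uniform moments $E(X) = \\frac{a+b}{2}$ and $D(X) = \\frac{(b-a)^{2}}{12}$, and the symbol names `mu1` and `mu2` are illustrative stand-ins for $\\mu_{1}$ and $\\mu_{2}$.\n",
"\n",
"```python\n",
"from sympy import symbols, solve\n",
"\n",
"a, b, mu1, mu2 = symbols('a b mu1 mu2')\n",
"\n",
"# moment equations for U[a, b], each written as expression = 0\n",
"eqs = [\n",
"    (a + b) / 2 - mu1,                            # E(X) = mu1\n",
"    (b - a) ** 2 / 12 + (a + b) ** 2 / 4 - mu2,   # E(X^2) = D(X) + E(X)^2 = mu2\n",
"]\n",
"\n",
"solutions = solve(eqs, [a, b], dict=True)\n",
"for sol in solutions:\n",
"    print(sol[a], sol[b])\n",
"```\n",
"\n",
"The solver reports both roots of the quadratic; only the one with $a < b$ corresponds to the estimators derived above."
]
},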
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (solving the example above)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"a的真实值:1, b的真实值:6\n",
|
||
"a的矩估计值:1.08, b的矩估计值:6.07\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"import numpy as np\n",
|
||
"from scipy.stats import uniform\n",
|
||
"\n",
|
||
"a_real = 1\n",
|
||
"b_real = 6\n",
|
||
"n = 1000\n",
"x = uniform.rvs(loc=a_real, scale=b_real - a_real, size=n)  # scipy parameterizes U[a, b] as loc=a, scale=b-a\n",
|
||
"\n",
|
||
"A1 = np.sum(x) / n\n",
|
||
"A2 = np.sum(np.square(x)) / n\n",
|
||
"\n",
|
||
"a_estimate = A1 - np.sqrt(3 *(A2-A1**2))\n",
|
||
"b_estimate = A1 + np.sqrt(3 *(A2-A1**2))\n",
|
||
"print(\"a的真实值:{}, b的真实值:{}\".format(a_real, b_real))\n",
|
||
"print(\"a的矩估计值:{:.2f}, b的矩估计值:{:.2f}\".format(a_estimate, b_estimate))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.6 Point Estimation Methods: Maximum Likelihood Estimation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Maximum likelihood estimation:\n",
"\n",
"For a discrete random variable: suppose the form of its probability mass function $P\\{X=x\\}=p(x;\\theta), \\theta \\in \\Theta$ is known, where $\\theta$ is the parameter to be estimated and $\\Theta$ is the set of possible values of $\\theta$. Let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. Then the joint probability mass function of $X_{1}, X_{2}, \\cdots, X_{n}$ is\n",
"$$\n",
"\\prod_{i=1}^{n}p(x_{i};\\theta).\n",
"$$\n",
"Let $x_{1}, x_{2}, \\cdots, x_{n}$ be the observed values of the sample $X_{1}, X_{2}, \\cdots, X_{n}$. The probability that the sample takes these values, that is, the probability of the event $\\{X_{1}=x_{1}, X_{2}=x_{2}, \\cdots, X_{n}=x_{n}\\}$, is\n",
"$$\n",
"L(\\theta) = L(x_{1}, x_{2}, \\cdots, x_{n}; \\theta) = \\prod_{i=1}^{n}p(x_{i};\\theta), \\theta \\in \\Theta.\n",
"$$\n",
"This probability varies with $\\theta$, so it is a function of $\\theta$; $L(\\theta)$ is called the **likelihood function** of the sample (note that $x_{1}, x_{2}, \\cdots, x_{n}$ are known sample values here, that is, constants).\n",
"\n",
"The idea of maximum likelihood estimation: with the observed values $x_{1}, x_{2}, \\cdots, x_{n}$ held fixed, choose within the admissible range $\\Theta$ the value $\\hat{\\theta}$ that maximizes the likelihood function $L(x_{1}, x_{2}, \\cdots, x_{n}; \\theta)$, and use it as the estimate of $\\theta$. That is, choose $\\hat{\\theta}$ such that\n",
"$$\n",
"L(x_{1}, x_{2}, \\cdots, x_{n}; \\hat{\\theta}) = \\max_{\\theta \\in \\Theta} L(x_{1}, x_{2}, \\cdots, x_{n}; \\theta)\n",
"$$\n",
"The resulting $\\hat{\\theta}$ depends on the sample values $x_{1}, x_{2}, \\cdots, x_{n}$; it is written $\\hat{\\theta}(x_{1}, x_{2}, \\cdots, x_{n})$ and called the **maximum likelihood estimate** of $\\theta$. The corresponding statistic $\\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ is called the **maximum likelihood estimator** of $\\theta$.\n",
"\n",
"For a continuous random variable: the likelihood function is\n",
"$$\n",
"L(\\theta) = L(x_{1}, x_{2}, \\cdots, x_{n}; \\theta) = \\prod_{i=1}^{n}f(x_{i};\\theta)\n",
"$$\n",
"where $f(x;\\theta)$ is the probability density function of the random variable."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
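"> Steps for solving a maximum likelihood problem\n",
"> - 1. Write down the probability mass function (or density) of the random variable\n",
"> - 2. Write down the likelihood function\n",
"> - 3. Set $\\frac{d}{d\\theta}L(\\theta) = 0$ and solve for the stationary points; this yields the estimate $\\hat{\\theta}$ at which $L(\\theta)$ attains its extremum\n",
"> - 4. When $L(\\theta)$ contains many product or exponential terms, set $\\frac{d}{d\\theta}\\ln L(\\theta) = 0$ instead. This yields the same $\\hat{\\theta}$: $\\ln$ is strictly increasing, so $L(\\theta)$ and $\\ln L(\\theta)$ attain their maxima at the same point."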
|
||
]
|
||
},
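{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the likelihood equation has no convenient closed form, the same steps can be carried out numerically by minimizing $-\\ln L(\\theta)$. The sketch below is illustrative only: it assumes a 0-1 distributed sample with a hypothetical true parameter `p_true = 0.4`, a fixed random seed, and SciPy's bounded scalar minimizer as the optimizer.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.optimize import minimize_scalar\n",
"\n",
"rng = np.random.default_rng(0)          # fixed seed, an arbitrary choice\n",
"p_true = 0.4                            # hypothetical true parameter\n",
"x = rng.binomial(1, p_true, size=1000)  # 0-1 distributed sample\n",
"\n",
"def neg_log_likelihood(p):\n",
"    # -ln L(p) for b(1, p); minimizing it maximizes L(p)\n",
"    k = x.sum()\n",
"    return -(k * np.log(p) + (x.size - k) * np.log(1 - p))\n",
"\n",
"# keep p strictly inside (0, 1) so both logarithms stay finite\n",
"res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method='bounded')\n",
"print(res.x, x.mean())\n",
"```\n",
"\n",
"The numerical maximizer `res.x` should agree with the sample mean up to the optimizer's tolerance, which is the closed-form answer for this distribution."
]
},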
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
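"🔥Example 1: let $X\\sim b(1, p)$ (the 0-1, or Bernoulli, distribution), and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. Find the maximum likelihood estimator of the parameter $p$."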
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"1. Write down the probability mass function\n",
"$$\n",
"P\\{X=x\\} = p^{x}(1-p)^{1 - x}, x=0, 1\n",
"$$\n",
"2. Write down the likelihood function\n",
"$$\n",
"L(p) = \\prod_{i=1}^{n}P\\{X=x_{i}\\} = \\prod_{i=1}^{n}p^{x_{i}}(1-p)^{1 - x_{i}} = p^{\\sum_{i=1}^{n}x_{i}}(1-p)^{\\sum_{i=1}^{n}(1-x_{i})}\n",
"$$\n",
"Because of the product, take logarithms\n",
"$$\n",
"\\ln L(p) = \\ln\\left[p^{\\sum_{i=1}^{n}x_{i}}(1-p)^{\\sum_{i=1}^{n}(1-x_{i})}\\right] = \\sum_{i=1}^{n}x_{i}\\ln p + \\sum_{i=1}^{n}(1-x_{i})\\ln(1-p)\n",
"$$\n",
"3. Set $\\frac{d}{dp}\\ln L(p) = \\frac{1}{p}\\sum_{i=1}^{n}x_{i} - \\frac{1}{1-p}\\sum_{i=1}^{n}(1-x_{i}) = 0$ and solve to obtain\n",
"$$\n",
"\\hat{p} = \\frac{1}{n}\\sum_{i=1}^{n}x_{i} = \\overline{x}\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (solving the example above)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"p的真实值:0.4\n",
|
||
"p的最大似然估计值:389/1000\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"from sympy import diff, solve\n",
|
||
"from sympy.abc import p\n",
|
||
"from scipy.stats import bernoulli\n",
|
||
"\n",
|
||
"p_real = 0.4\n",
|
||
"\n",
|
||
"x = bernoulli.rvs(p_real, size=1000)\n",
|
||
"\n",
|
||
"Lp = p ** sum(x) * (1-p)**sum(1-x)\n",
|
||
"\n",
|
||
"dLp = diff(Lp, p, 1)\n",
|
||
"p_estimate = solve(dLp)\n",
"# keep the root that lies in the open interval (0, 1)\n",
"for i in p_estimate:\n",
"    if i > 0 and i < 1:\n",
"        p_e = i\n",
|
||
"print(\"p的真实值:{}\".format(p_real))\n",
|
||
"print(\"p的最大似然估计值:{}\".format(p_e))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🔥Example 2: let $X\\sim N(\\mu, \\sigma^{2})$ with $\\mu, \\sigma^{2}$ unknown, and let $x_{1}, x_{2}, \\cdots, x_{n}$ be sample values from $X$. Find the maximum likelihood estimates of $\\mu$ and $\\sigma^{2}$."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"1. Write down the probability density function\n",
"$$\n",
"f(x;\\mu, \\sigma^{2}) = \\frac{1}{\\sqrt{2\\pi }\\sigma}e^{-\\frac{(x-\\mu)^{2}}{2\\sigma^{2}}}\n",
"$$\n",
"2. Write down the likelihood function\n",
"$$\n",
"L(\\mu, \\sigma) = \\prod_{i=1}^{n} f(x_{i};\\mu, \\sigma^{2}) = \\prod_{i=1}^{n}\\left(\\frac{1}{\\sqrt{2\\pi }\\sigma}e^{-\\frac{(x_{i}-\\mu)^{2}}{2\\sigma^{2}}}\\right) = \\left(\\frac{1}{\\sqrt{2\\pi }\\sigma}\\right)^{n}e^{-\\sum_{i=1}^{n}\\frac{(x_{i}-\\mu)^{2}}{2\\sigma^{2}}}\n",
"$$\n",
"Because a sum appears in the exponent, switch to the log-likelihood\n",
"$$\n",
"\\ln L(\\mu, \\sigma) = \\ln \\left[\\left(\\frac{1}{\\sqrt{2\\pi }\\sigma}\\right)^{n}e^{-\\sum_{i=1}^{n}\\frac{(x_{i}-\\mu)^{2}}{2\\sigma^{2}}}\\right] = -n \\ln\\left(\\sqrt{2\\pi }\\sigma\\right) - \\sum_{i=1}^{n}\\frac{(x_{i}-\\mu)^{2}}{2\\sigma^{2}}\n",
"$$\n",
"3. Set $\\frac{\\partial }{\\partial \\mu}\\ln L(\\mu, \\sigma) = 0$ and $\\frac{\\partial }{\\partial \\sigma}\\ln L(\\mu, \\sigma) = 0$ to obtain the likelihood equations\n",
"$$\n",
"\\left \\{\n",
"\\begin{aligned}\n",
"&\\frac{\\partial }{\\partial \\mu}\\ln L(\\mu, \\sigma) = \\sum_{i=1}^{n}\\frac{x_{i}-\\mu}{\\sigma^{2}} = 0 \\\\\n",
"&\\frac{\\partial }{\\partial \\sigma}\\ln L(\\mu, \\sigma) = - \\frac{n}{\\sigma} + \\sum_{i=1}^{n}\\frac{(x_{i}-\\mu)^{2}}{\\sigma^{3}} = 0\n",
"\\end{aligned}\n",
"\\right.\n",
"$$\n",
"4. Solving gives\n",
"$$\n",
"\\left \\{\n",
"\\begin{aligned}\n",
"&\\hat{\\mu} = \\frac{1}{n}\\sum_{i=1}^{n}x_{i} = \\overline{x} \\\\\n",
"&\\hat{\\sigma}^{2} = \\frac{1}{n}\\sum_{i=1}^{n}(x_{i} - \\overline{x})^{2}\n",
"\\end{aligned}\n",
"\\right.\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (verifying the result of the example above)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
"True mean of the random variable: 2, variance: 9\n",
"Maximum likelihood estimate of the mean: 2.01, variance: 9.39\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"mu_real = 2\n",
"sigma_real = 3\n",
"n = 1000\n",
"\n",
"x = norm.rvs(loc=mu_real, scale=sigma_real, size=n)\n",
"\n",
"# Maximum likelihood estimates derived above:\n",
"# the sample mean and (1/n) * sum of squared deviations from it\n",
"mu_estimate = np.sum(x) / n\n",
"sigma2_estimate = np.sum((x - mu_estimate)**2) / n\n",
"\n",
"print(\"True mean of the random variable: {}, variance: {}\".format(mu_real, sigma_real**2))\n",
"print(\"Maximum likelihood estimate of the mean: {:.2f}, variance: {:.2f}\".format(mu_estimate, sigma2_estimate))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
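"> 💡Property of maximum likelihood estimation: suppose the function $u=u(\\theta), \\theta \\in \\Theta$ has a *single-valued inverse* $\\theta=\\theta(u), u\\in \\mathcal{U}$, and suppose $\\hat{\\theta}$ is the maximum likelihood estimate of the parameter $\\theta$ in the probability distribution of $X$. Then $\\hat{u}=u(\\hat{\\theta})$ is the maximum likelihood estimate of $u(\\theta)$. This property is called the **invariance** of maximum likelihood estimation."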
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.7 Criteria for Evaluating Estimators: Unbiasedness, Efficiency, and Consistency"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
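"Let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from the population $X$, and let $\\theta \\in \\Theta$ be a parameter to be estimated that appears in the distribution of $X$, where $\\Theta$ is the range of possible values of $\\theta$."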
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
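"* Unbiasedness: if the mathematical expectation $E(\\hat{\\theta})$ of the estimator $\\hat{\\theta} = \\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ exists and, for every $\\theta \\in \\Theta$,\n",
"$$\n",
"E(\\hat{\\theta}) = \\theta\n",
"$$\n",
"then $\\hat{\\theta}$ is called an **unbiased estimator** of $\\theta$. In other words, the expectation of the estimator equals the true value of the parameter being estimated."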
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
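"> Unbiasedness means the following: for any particular sample, the estimate produced by an estimator generally deviates from the true value, but if the estimator is applied repeatedly to many samples, the expectation (long-run average) of the resulting estimates has zero deviation $E(\\hat{\\theta})-\\theta$ from the true value. In science and engineering, $E(\\hat{\\theta})-\\theta$ is called the systematic error of using $\\hat{\\theta}$ as an estimate of $\\theta$. The practical meaning of an unbiased estimator is that it has no systematic error."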
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
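"* Efficiency: suppose we want to compare two unbiased estimators $\\hat{\\theta}_{1}$ and $\\hat{\\theta}_{2}$ of the parameter $\\theta$. If, for the same sample size $n$, the observed values of $\\hat{\\theta}_{1}$ are more tightly concentrated around the true value $\\theta$ than those of $\\hat{\\theta}_{2}$, then $\\hat{\\theta}_{1}$ is considered preferable to $\\hat{\\theta}_{2}$. Variance measures how far a random variable deviates from its expectation (here $E(\\hat{\\theta}_{1})=E(\\hat{\\theta}_{2}) = \\theta$), so among unbiased estimators the one with the smaller variance is better. Let $\\hat{\\theta}_{1} = \\hat{\\theta}_{1}(X_{1}, X_{2}, \\cdots, X_{n})$ and $\\hat{\\theta}_{2} = \\hat{\\theta}_{2}(X_{1}, X_{2}, \\cdots, X_{n})$ both be unbiased estimators of $\\theta$. If, for every $\\theta \\in \\Theta$,\n",
"$$\n",
"D(\\hat{\\theta}_{1}) \\le D(\\hat{\\theta}_{2})\n",
"$$\n",
"and the strict inequality holds for at least one $\\theta \\in \\Theta$, then $\\hat{\\theta}_{1}$ is said to be **more efficient** than $\\hat{\\theta}_{2}$."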
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Consistency: let $\\hat{\\theta} = \\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ be an estimator of the parameter $\\theta$. If, for every $\\theta \\in \\Theta$, $\\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ converges in probability to $\\theta$ as $n \\rightarrow \\infty$, then $\\hat{\\theta}$ is called a **consistent estimator** of $\\theta$. That is, for every $\\varepsilon > 0$,\n",
"$$\n",
"\\lim_{n\\rightarrow \\infty}P\\{| \\hat{\\theta} - \\theta| < \\varepsilon\\} = 1\n",
"$$"
|
||
]
|
||
},
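{
"cell_type": "markdown",
"metadata": {},
"source": [
"> A short worked derivation, added as an illustration rather than taken from the original notes. It applies the three criteria to the estimators $\\hat{\\mu}=\\overline{X}$ and $\\hat{\\sigma}^{2}=\\frac{1}{n}\\sum_{i=1}^{n}(X_{i}-\\overline{X})^{2}$ obtained by maximum likelihood for the normal population, assuming only that $X_{1}, \\cdots, X_{n}$ are independent with common mean $\\mu$ and finite variance $\\sigma^{2}$.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"E(\\overline{X}) &= \\frac{1}{n}\\sum_{i=1}^{n}E(X_{i}) = \\mu, \\qquad D(\\overline{X}) = \\frac{1}{n^{2}}\\sum_{i=1}^{n}D(X_{i}) = \\frac{\\sigma^{2}}{n} \\\\\n",
"E(\\hat{\\sigma}^{2}) &= \\frac{1}{n}E\\left(\\sum_{i=1}^{n}X_{i}^{2} - n\\overline{X}^{2}\\right) = \\frac{1}{n}\\left[n(\\sigma^{2}+\\mu^{2}) - n\\left(\\frac{\\sigma^{2}}{n}+\\mu^{2}\\right)\\right] = \\frac{n-1}{n}\\sigma^{2} \\\\\n",
"P\\{|\\overline{X}-\\mu| \\ge \\varepsilon\\} &\\le \\frac{D(\\overline{X})}{\\varepsilon^{2}} = \\frac{\\sigma^{2}}{n\\varepsilon^{2}} \\rightarrow 0 \\quad (n \\rightarrow \\infty)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"So $\\overline{X}$ is an unbiased estimator of $\\mu$, and by Chebyshev's inequality (third line) it is also consistent. The maximum likelihood estimator $\\hat{\\sigma}^{2}$ is biased: on average it underestimates $\\sigma^{2}$ by the factor $\\frac{n-1}{n}$, which is why the sample variance $S^{2}=\\frac{1}{n-1}\\sum_{i=1}^{n}(X_{i}-\\overline{X})^{2}$ is used when an unbiased estimator is wanted. As an instance of efficiency, both $\\overline{X}$ and the single observation $X_{1}$ are unbiased for $\\mu$, but $D(\\overline{X})=\\sigma^{2}/n < \\sigma^{2}=D(X_{1})$ for $n \\ge 2$, so $\\overline{X}$ is the more efficient of the two."
]
},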
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.8 Parameter Estimation: Interval Estimation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
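"* Background: when estimating an unknown quantity or parameter, an approximate value alone is usually not enough; we also need to estimate the error, that is, to know how precise the approximation is (the range within which the true value lies). Likewise, for an unknown parameter $\\theta$, besides its point estimate $\\hat{\\theta}$ we would like to estimate a range and to know how confident we can be that this range contains the true value of $\\theta$. Such a range is usually given as an interval, together with the degree of confidence that the interval contains the true value of $\\theta$. An estimate of this form is called an **interval estimate**, and such an interval is called a **confidence interval**.\n",
"\n",
"* Confidence interval: let the distribution function $F(x;\\theta)$ of the population $X$ contain an unknown parameter $\\theta \\in \\Theta$ ($\\Theta$ is the set of possible values of $\\theta$). For a given value $\\alpha$ ($0 < \\alpha < 1$), if two statistics $\\underline{\\theta} = \\underline{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ and $\\overline{\\theta} = \\overline{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ with $\\underline{\\theta}<\\overline{\\theta}$, determined by a sample $X_{1}, X_{2}, \\cdots, X_{n}$ from $X$, satisfy for every $\\theta \\in \\Theta$\n",
"$$\n",
"P\\{\\underline{\\theta}(X_{1}, X_{2}, \\cdots, X_{n}) <\\theta <\\overline{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})\\} \\ge 1 -\\alpha\n",
"$$\n",
"then the random interval $(\\underline{\\theta}, \\overline{\\theta})$ is called a **confidence interval** for $\\theta$ with confidence level $1 -\\alpha$. The statistics $\\underline{\\theta}$ and $\\overline{\\theta}$ are called the **lower and upper confidence limits** of the two-sided confidence interval with confidence level $1 -\\alpha$, and $1 -\\alpha$ is called the confidence level."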
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
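"The expression above means the following: draw repeated samples from the population (each of the same size $n$). Each sample determines an interval $(\\underline{\\theta}, \\overline{\\theta})$ for the parameter, and each such interval either contains the true value $\\theta$ or does not. By the law of large numbers, about $100(1-\\alpha)\\%$ of these intervals contain the true value $\\theta$ and about $100\\alpha \\%$ do not. For example, if $\\alpha=0.01$ and the sampling is repeated $1000$ times, about $10$ of the resulting $1000$ intervals will fail to contain the true value of $\\theta$."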
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🔥Example: let the population $X \\sim N(\\mu, \\sigma^{2})$ with $\\sigma^{2}$ known and $\\mu$ unknown, and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. Find a confidence interval for $\\mu$ with confidence level $1-\\alpha$."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"\n",
"Since $\\overline{X}$ is an unbiased estimator of $\\mu$, and\n",
"$$\n",
"\\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\sim N(0, 1)\n",
"$$\n",
"the distribution $N(0, 1)$ of $\\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}}$ does not depend on any unknown parameter. By the definition of the upper $\\alpha$ quantile of the standard normal distribution,\n",
"$$\n",
"P\\left\\{\\left|\\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}}\\right| < z_{\\alpha/2}\\right\\} = 1-\\alpha\n",
"$$\n",
"that is,\n",
"$$\n",
"P\\left\\{ \\overline{X} - \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha/2}< \\mu < \\overline{X} + \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha/2}\\right\\} = 1-\\alpha\n",
"$$\n",
"By definition, a confidence interval for $\\mu$ with confidence level $1-\\alpha$ is\n",
"\n",
"$$\n",
"\\left(\\overline{X} - \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha/2}, \\overline{X} + \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha/2}\\right)\n",
"$$\n",
"which can also be written as\n",
"$$\n",
"\\left(\\overline{X} \\pm \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha/2}\\right)\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (understanding confidence intervals through the example)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
"Out of 100 confidence intervals, 89 contain the unknown parameter mu\n",
"Intervals containing mu: 89; expected about 95.0 [number of intervals x (1-alpha)]\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 720x576 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"from scipy.stats import norm\n",
"\n",
"# In the example above take mu = 0.1, sigma = 1, alpha = 0.05, then build confidence intervals for mu\n",
"\n",
"def get_confident_interval(mu, sigma, n, interval_num, alpha):\n",
"    # z_{alpha/2} is the upper alpha/2 quantile of the standard normal N(0, 1),\n",
"    # i.e. the (1 - alpha/2) quantile; norm.ppf uses loc=0, scale=1 by default\n",
"    z = norm.ppf(1 - alpha/2)\n",
"    confident_intervals = []\n",
"    for i in range(interval_num):\n",
"        x = norm.rvs(loc=mu, scale=sigma, size=n)\n",
"        # Lower confidence limit\n",
"        left = np.sum(x)/n - (sigma/np.sqrt(n))*z\n",
"        # Upper confidence limit\n",
"        right = np.sum(x)/n + (sigma/np.sqrt(n))*z\n",
"        confident_intervals.append((left, right))\n",
"    return confident_intervals\n",
"\n",
"mu, sigma = 0.1, 1\n",
"n, alpha = 1000, 0.05\n",
"interval_num = 100\n",
"confident_intervals = get_confident_interval(mu, sigma, n, interval_num, alpha)\n",
"count = 0\n",
"plt.figure(figsize=(10, 8))\n",
"for idx, temp in enumerate(confident_intervals):\n",
"    plt.vlines(x=idx+1, ymin=temp[0], ymax=temp[1])\n",
"    plt.scatter(x=np.array([idx+1]*2),y=np.array([temp[0], temp[1]]), c='r')\n",
"    # Count the intervals that contain the true mean\n",
"    if mu >= temp[0] and mu <= temp[1]:\n",
"        count += 1\n",
"\n",
"\n",
"print(\"Out of {} confidence intervals, {} contain the unknown parameter mu\".format(interval_num, count))\n",
"print(\"Intervals containing mu: {}; expected about {} [number of intervals x (1-alpha)]\".format(count, interval_num*(1-alpha)))\n",
"# Dashed horizontal line marks the true mean\n",
"plt.axhline(y=mu, ls='--', c='r')\n",
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
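"> Steps for finding a confidence interval for an unknown parameter $\\theta$:\n",
"> - 1. Find a **pivotal quantity $W$** whose distribution does not depend on $\\theta$ or on any other unknown parameter. The pivotal quantity is a function of the sample $X_{1}, X_{2}, \\cdots, X_{n}$ and the unknown parameter $\\theta$, i.e. $W =W(X_{1}, X_{2}, \\cdots, X_{n};\\theta)$, where $X_{1}, X_{2}, \\cdots, X_{n}$ are known once the sample is observed.\n",
"> - 2. For the given confidence level $1-\\alpha$, determine two constants $a, b$ such that $$P\\left\\{ a <W(X_{1}, X_{2}, \\cdots, X_{n};\\theta) < b\\right \\} = 1-\\alpha $$\n",
"> - 3. Solve the inequality $a <W(X_{1}, X_{2}, \\cdots, X_{n};\\theta) < b$ for $\\theta$ to obtain the equivalent form $\\underline{\\theta} < \\theta < \\overline{\\theta}$, where $\\underline{\\theta}$ and $\\overline{\\theta}$ are statistics. Then $(\\underline{\\theta}, \\overline{\\theta})$ is a confidence interval for $\\theta$ with confidence level $1-\\alpha$."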
|
||
]
|
||
},
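{
"cell_type": "markdown",
"metadata": {},
"source": [
"> As an illustration of these steps, here is a sketch for the mean $\\mu$ of a normal population whose variance $\\sigma^{2}$ is unknown. It relies on the standard result that $\\frac{\\overline{X}-\\mu}{S/\\sqrt{n}} \\sim t(n-1)$, where $S$ is the sample standard deviation and $t_{\\alpha/2}(n-1)$ denotes the upper $\\alpha/2$ quantile of the $t(n-1)$ distribution.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"&\\text{Step 1:} \\quad W = \\frac{\\overline{X}-\\mu}{S/\\sqrt{n}} \\sim t(n-1) \\\\\n",
"&\\text{Step 2:} \\quad P\\left\\{ -t_{\\alpha/2}(n-1) < \\frac{\\overline{X}-\\mu}{S/\\sqrt{n}} < t_{\\alpha/2}(n-1) \\right\\} = 1-\\alpha \\\\\n",
"&\\text{Step 3:} \\quad P\\left\\{ \\overline{X} - \\frac{S}{\\sqrt{n}}t_{\\alpha/2}(n-1) < \\mu < \\overline{X} + \\frac{S}{\\sqrt{n}}t_{\\alpha/2}(n-1) \\right\\} = 1-\\alpha\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"The pivotal quantity involves the sample only through $\\overline{X}$ and $S$, so the resulting interval $\\left(\\overline{X} \\pm \\frac{S}{\\sqrt{n}}t_{\\alpha/2}(n-1)\\right)$ can be computed without knowing $\\sigma$. The symmetric choice $a=-b$ in step 2 is a convention; any $a, b$ with total tail probability $\\alpha$ would give a valid interval."
]
},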
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🧭Table 1.8: Confidence intervals and one-sided confidence limits for the mean and variance of normal populations (confidence level $1-\\alpha$)\n",
"\n",
"| |Parameter to estimate|Other parameters|Distribution of the pivotal quantity|Confidence interval|One-sided confidence limits|\n",
"|:---:|:---:|:---:|:---:|:---:|:---:|\n",
"|One normal population|$\\mu$|$\\sigma^{2}$ known|$Z=\\frac{\\bar{X}-\\mu}{\\sigma / \\sqrt{n}} \\sim N(0,1)$|$\\left(\\bar{X} \\pm \\frac{\\sigma}{\\sqrt{n}} z_{\\alpha / 2}\\right)$|$\\bar{\\mu}=\\bar{X}+\\frac{\\sigma}{\\sqrt{n}} z_{\\alpha} \\quad \\underline{\\mu}=\\bar{X}-\\frac{\\sigma}{\\sqrt{n}} z_{\\alpha}$|\n",
"|One normal population|$\\mu$|$\\sigma^{2}$ unknown|$t=\\frac{\\bar{X}-\\mu}{S / \\sqrt{n}} \\sim t(n-1)$|$\\left(\\bar{X} \\pm \\frac{S}{\\sqrt{n}} t_{\\alpha / 2}(n-1)\\right)$|$\\bar{\\mu}=\\bar{X}+\\frac{S}{\\sqrt{n}} t_{\\alpha}(n-1) \\quad \\underline{\\mu}=\\bar{X}-\\frac{S}{\\sqrt{n}} t_{\\alpha}(n-1)$|\n",
"|One normal population|$\\sigma^{2}$|$\\mu$ unknown|$\\chi^{2} =\\frac{(n-1) S^{2}}{\\sigma^{2}} \\sim \\chi^{2}(n-1)$|$\\left(\\frac{(n-1) S^{2}}{\\chi_{\\alpha / 2}^{2}(n-1)}, \\frac{(n-1) S^{2}}{\\chi_{1-\\alpha / 2}^{2}(n-1)}\\right)$|$\\overline{\\sigma^{2}}=\\frac{(n-1) S^{2}}{\\chi_{1-\\alpha}^{2}(n-1)} \\quad \\underline{ \\sigma^{2}}=\\frac{(n-1) S^{2}}{\\chi_{\\alpha}^{2}(n-1)}$|\n",
"|Two normal populations|$\\mu_{1}-\\mu_{2}$|$\\sigma_{1}^{2}, \\sigma_{2}^{2}$ known|$\\begin{aligned}Z &=\\frac{\\bar{X}-\\bar{Y}-\\left(\\mu_{1}-\\mu_{2}\\right)}{\\sqrt{\\frac{\\sigma_{1}^{2}}{n_{1}}+\\frac{\\sigma_{2}^{2}}{n_{2}}}} \\\\& \\sim N(0,1)\\end{aligned}$|$\\left(\\bar{X}-\\bar{Y} \\pm z_{\\alpha / 2} \\sqrt{\\frac{\\sigma_{1}^{2}}{n_{1}}+\\frac{\\sigma_{2}^{2}}{n_{2}}}\\right)$|$\\begin{array}{l}\\overline{\\mu_{1}-\\mu_{2}}=\\bar{X}-\\bar{Y}+z_{\\alpha} \\sqrt{\\frac{\\sigma_{1}^{2}}{n_{1}}+\\frac{\\sigma_{2}^{2}}{n_{2}}}\\\\\\underline{\\mu_{1}-\\mu_{2}}=\\bar{X}-\\bar{Y}-z_{\\alpha} \\sqrt{\\frac{\\sigma_{1}^{2}}{n_{1}}+\\frac{\\sigma_{2}^{2}}{n_{2}}}\\end{array}$|\n",
"|Two normal populations|$\\mu_{1}-\\mu_{2}$|$\\sigma_{1}^{2}=\\sigma_{2}^{2}=\\sigma^{2}$ unknown|$\\begin{array}{c}t=\\frac{(\\bar{X}-\\bar{Y})-\\left(\\mu_{1}-\\mu_{2}\\right)}{S_{w} \\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}}} \\\\\\sim t\\left(n_{1}+n_{2}-2\\right) \\\\S_{w}^{2}=\\frac{\\left(n_{1}-1\\right) S_{1}^{2}+\\left(n_{2}-1\\right) S_{2}^{2}}{n_{1}+n_{2}-2}\\end{array}$|$\\left(\\bar{X}-\\bar{Y} \\pm t_{\\alpha / 2}\\left(n_{1}+n_{2}-2\\right) S_{w} \\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}}\\right)$|$\\begin{array}{l}\\overline{\\mu_{1}-\\mu_{2}}=\\bar{X}-\\bar{Y} \\\\\\quad+t_{\\alpha}\\left(n_{1}+n_{2}-2\\right) S_{w} \\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}} \\\\\\underline{\\mu_{1}-\\mu_{2}}=\\bar{X}-\\bar{Y} \\\\\\quad-t_{\\alpha}\\left(n_{1}+n_{2}-2\\right) S_{w} \\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}}\\end{array}$|\n",
"|Two normal populations|$\\frac{\\sigma_{1}^{2}}{\\sigma_{2}^{2}}$|$\\mu_{1}, \\mu_{2}$ unknown|$\\begin{aligned}F &=\\frac{S_{1}^{2} / S_{2}^{2}}{\\sigma_{1}^{2} / \\sigma_{2}^{2}} \\\\& \\sim F\\left(n_{1}-1, n_{2}-1\\right)\\end{aligned}$|$\\begin{array}{c}\\left(\\frac{S_{1}^{2}}{S_{2}^{2}} \\frac{1}{F_{\\alpha / 2}\\left(n_{1}-1, n_{2}-1\\right)}\\right. ,\\\\\\left.\\frac{S_{1}^{2}}{S_{2}^{2}} \\frac{1}{F_{1-\\alpha / 2}\\left(n_{1}-1, n_{2}-1\\right)}\\right)\\end{array}$|$\\begin{aligned}&\\overline{\\sigma_{1}^{2}/\\sigma_{2}^{2}}= \\frac{S_{1}^{2}}{S_{2}^{2}}\\frac{1}{F_{1-\\alpha}\\left(n_{1}-1, n_{2}-1\\right)} \\\\ &\\underline{\\sigma_{1}^{2}/\\sigma_{2}^{2}} = \\frac{S_{1}^{2}}{S_{2}^{2}} \\frac{1}{F_{\\alpha}\\left(n_{1}-1, n_{2}-1\\right)} \\end{aligned}$|"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.9 Hypothesis Testing: Basic Ideas"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
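"* The hypothesis testing problem: when the distribution function of a population is completely unknown, or its form is known but its parameters are not, we put forward hypotheses about the population in order to infer some of its unknown properties. For example, we may hypothesize that the candies produced by a workshop have a mean weight of $0.5kg$. Based on an observed sample (for instance, the candies produced on a given day), we must decide whether to accept the hypothesis (the mean is $0.5kg$) or to reject it (the mean is not $0.5kg$). Hypothesis testing is the process of making this decision."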
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
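"* 🔥Introductory example: a workshop packs glucose into bags. The weight of a bag is a random variable that follows a normal distribution. When the workshop operates normally, the mean weight of a bag is $0.5kg$ and the standard deviation is $0.015kg$. On a certain day the workshop produced $9$ bags of glucose with the following weights (in kg):\n",
"$$\n",
"0.479 \\quad 0.506 \\quad 0.518 \\quad 0.524 \\quad 0.498 \\quad 0.511 \\quad 0.520 \\quad 0.515 \\quad 0.512\n",
"$$\n",
"Was the machine working normally that day?\n",
"\n",
"Assume the standard deviation is a known quantity. To judge whether the machine was working normally that day, it suffices to judge whether the mean $\\mu$ of that day's production equals the mean under normal operation. We therefore state the hypotheses\n",
"$$\n",
"H_{0}: \\mu = 0.5 ; \\quad H_{1}:\\mu \\ne 0.5\n",
"$$\n",
"Hypothesis $H_{0}$ states that the mean of that day's production equals the mean under normal operation, i.e. the machine is working normally. Hypothesis $H_{1}$ states that the mean of that day's production differs from the mean under normal operation, i.e. the machine is not working normally."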
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
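"* 🦊Analysis: **How do we decide between these hypotheses? The hypotheses are statements about an unknown parameter, and the decision must be made from a sample, so we need a function that links the sample to the unknown parameter. The interval estimation of the previous section contained a similar step: the pivotal quantity, which connects the sample and the unknown parameter. In hypothesis testing the pivotal quantity, with the parameter set to its hypothesized value, is called the test statistic.** In this example we construct the test statistic\n",
"$$\n",
"\\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}}, \\quad \\mu_{0} = 0.5\n",
"$$\n",
"In fact, the decision could be based on the size of $|\\overline{x} - \\mu_{0}|$ alone, but using the test statistic above makes the decision procedure simpler.\n",
"\n",
"This is because, when $H_{0}$ is true, $\\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\sim N(0, 1)$, and judging the size of $|\\overline{x} - \\mu_{0}|$ is equivalent to judging the size of $\\left|\\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right|$. Choose a suitable positive number $k$: reject $H_{0}$ when the observed value $\\overline{x}$ satisfies $\\left|\\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right|\\ge k$, and accept $H_{0}$ when $\\left|\\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right|< k$."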
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
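"However, the decision is based on a sample from the population, so we may reject $H_{0}$ even when $H_{0}$ is true (*that is, we make a wrong decision: we judge the hypothesis to be false although it is actually true*). The probability of this kind of error is written\n",
"$$\n",
"P\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\} \\quad \\text{or} \\quad P_{\\theta \\in H_{0}}\\{\\text{reject } H_{0}\\}, \\quad \\theta \\text{ being the unknown parameter in the hypothesis}\n",
"$$\n",
"We want to keep the probability of this error within a bound: choose a small number $\\alpha(0<\\alpha<1)$ and require that the probability of this error not exceed $\\alpha$, i.e.\n",
"$$\n",
"P\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\} \\le \\alpha\n",
"$$"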
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"To determine the positive number $k$ introduced above, let $\\alpha$ be the largest probability of this error that we allow, and take equality in the expression above:\n",
"$$\n",
"P\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\} = \\alpha\n",
"$$\n",
"For the example, this means choosing $k$ such that\n",
"$$\n",
"P\\left\\{\\left | \\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\right| \\ge k\\right\\} = \\alpha\n",
"$$\n",
"Since $\\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\sim N(0, 1)$ when $H_{0}$ is true, we take $k = z_{\\alpha/2}$, which can be looked up in a table. That is, if\n",
"$$\n",
"\\left | \\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\right| \\ge k = z_{\\alpha/2}\n",
"$$\n",
"we reject $H_{0}$, and if\n",
"$$\n",
"\\left | \\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\right| < k = z_{\\alpha/2}\n",
"$$\n",
"we accept $H_{0}$. The number $\\alpha$ is called the **significance level**."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (testing the introductory example)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
"Test statistic 1.84 < 1.96, accept H0: mu = 0.5, the machine is working normally\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"from scipy.stats import norm\n",
"import numpy as np\n",
"\n",
"# Significance level alpha = 0.05\n",
"# H0: mu = 0.5; H1: mu != 0.5\n",
"# sigma = 0.015\n",
"\n",
"x = [0.479, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512]\n",
"\n",
"mu = 0.5\n",
"sigma = 0.015\n",
"alpha = 0.05\n",
"\n",
"# Critical value k = z_{alpha/2}: the upper alpha/2 quantile of N(0, 1)\n",
"k = norm.ppf(1 - alpha/2)\n",
"# Absolute value of the test statistic (x_bar - mu0) / (sigma / sqrt(n))\n",
"check = np.abs((np.mean(x) - mu)/(sigma/np.sqrt(len(x))))\n",
"\n",
"if check >= k:\n",
"    print(\"Test statistic {:.2f} >= {:.2f}, reject H0: mu != 0.5, the machine is not working normally\".format(check, k))\n",
"else:\n",
"    print(\"Test statistic {:.2f} < {:.2f}, accept H0: mu = 0.5, the machine is working normally\".format(check, k))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
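"In the example above, the null hypothesis states that the unknown parameter $\\theta$ equals a hypothesized constant and the alternative states that it does not; a test of this form is called a **two-sided test**. In other problems the null hypothesis states that $\\theta$ is greater than or equal to, or less than or equal to, a hypothesized constant; these are called a **left-sided test** and a **right-sided test**, respectively."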
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Right-sided test: test the hypotheses\n",
"$$\n",
"H_{0}:\\theta \\le \\theta_{0}; \\quad H_{1}:\\theta > \\theta_{0}\n",
"$$\n",
"* Left-sided test: test the hypotheses\n",
"$$\n",
"H_{0}:\\theta \\ge \\theta_{0}; \\quad H_{1}:\\theta < \\theta_{0}\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* **Rejection regions of one-sided tests**:\n",
"\n",
"🔥(Right-sided test) Let the population $X \\sim N(\\mu, \\sigma^{2})$ with $\\mu$ unknown and $\\sigma$ known, and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. For a given significance level $\\alpha$, find the rejection region of the test\n",
"$$\n",
"H_{0}:\\mu \\le \\mu_{0}; \\quad H_{1}:\\mu > \\mu_{0}\n",
"$$\n",
"(taking $H_{0}$ as the default hypothesis)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"Finding the rejection region means finding the set of sample outcomes that support $H_{1}$. Every $\\mu$ in $H_{0}$ is smaller than every $\\mu$ in $H_{1}$, so when $H_{0}$ should be rejected the observed value $\\overline{x}$ leans toward $H_{1}$, i.e. $\\overline{x}$ tends to be large. The rejection region therefore has the form\n",
"$$\n",
"\\overline{x} \\ge k \\quad (k \\text{ a suitable constant})\n",
"$$\n",
"Next we determine the constant $k$:\n",
"$$\n",
"\\begin{aligned}\n",
"P\\left\\{ \\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\right\\} &= P\\left\\{\\overline{X} \\ge k \\right \\} \\\\\n",
"& = P\\left\\{ \\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} \\\\ \n",
"&\\le P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\}\n",
"\\end{aligned}\n",
"$$\n",
"The inequality holds because, when $\\mu \\le \\mu_{0}$, we have $\\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}}$, so the event $\\left\\{ \\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right \\} \\subset \\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right \\}$. To ensure $P\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\} \\le \\alpha$, it suffices to set\n",
"$$\n",
"P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} = \\alpha\n",
"$$\n",
"Since $\\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\sim N(0, 1)$, take $\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}} = z_{\\alpha}$, the upper $\\alpha$ quantile. The rejection region of the test is therefore\n",
"$$\n",
"\\overline{x} \\ge k =\\frac{\\sigma}{\\sqrt{n}}z_{\\alpha} + \\mu_{0}\n",
"$$\n",
"that is,\n",
"$$\n",
"\\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\ge z_{\\alpha}\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🔥(Left-sided test) Let the population $X \\sim N(\\mu, \\sigma^{2})$ with $\\mu$ unknown and $\\sigma$ known, and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. For a given significance level $\\alpha$, find the rejection region of the test\n",
"$$\n",
"H_{0}:\\mu \\ge \\mu_{0}; \\quad H_{1}:\\mu < \\mu_{0}\n",
"$$\n",
"(taking $H_{0}$ as the default hypothesis)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"\n",
"When $H_{0}$ should be rejected, the mean $\\mu$ leans toward $H_{1}$, i.e. the unbiased estimate $\\overline{x}$ of $\\mu$ tends to be small. Taking a suitable constant $k$, the rejection region of $H_{0}$ has the form\n",
"$$\n",
"\\overline{x} \\le k\n",
"$$\n",
"\n",
"With $\\mu$ the true mean,\n",
"$$\n",
"\\frac{\\overline{X}- \\mu}{\\sigma/\\sqrt{n}} \\sim N(0, 1)\n",
"$$\n",
"Determine the value of $k$:\n",
"$$\n",
"\\begin{aligned}\n",
"P\\left\\{ \\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\right\\} &= P\\left\\{\\overline{X} \\le k\\right\\} \\\\\n",
"& = P\\left\\{\\frac{\\overline{X}- \\mu_{0}}{\\sigma/\\sqrt{n}} \\le \\frac{k- \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} \\\\\n",
"& \\le P\\left\\{\\frac{\\overline{X}- \\mu}{\\sigma/\\sqrt{n}} \\le \\frac{k- \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\}, \\quad (\\mu \\ge \\mu_{0} \\text{ when } H_{0} \\text{ is true}) \n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Set\n",
"$$\n",
"P\\left\\{\\frac{\\overline{X}- \\mu}{\\sigma/\\sqrt{n}} \\le \\frac{k- \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} = \\alpha\n",
"$$\n",
"Then \n",
"$$\n",
"\\begin{aligned}\n",
"&\\frac{k- \\mu_{0}}{\\sigma/\\sqrt{n}} = -z_{\\alpha} \\\\\n",
"& k = -z_{\\alpha}\\frac{\\sigma}{\\sqrt{n}} + \\mu_{0}\n",
"\\end{aligned}\n",
"$$\n",
"which gives the rejection region\n",
"$$\n",
"\\begin{aligned}\n",
"&\\overline{x} \\le k= -z_{\\alpha}\\frac{\\sigma}{\\sqrt{n}} + \\mu_{0} \\\\\n",
"&\\frac{\\overline{x}- \\mu_{0}}{\\sigma/\\sqrt{n}} \\le -z_{\\alpha}\n",
"\\end{aligned}\n",
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"> 💡Discussion: why do we set $\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}} = z_{\\alpha}$ in the right-sided test but $\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}} = -z_{\\alpha}$ in the left-sided test?  \n",
"> Because $z_{\\alpha}$ is the upper $\\alpha$ quantile of $N(0, 1)$. In the left-sided test $$P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\le \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} = \\int_{-\\infty}^{\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}}f(x)dx, \\quad (f(x) \\text{ is the standard normal density})$$ is a left-tail probability, so to obtain $$P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\le \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\}=\\alpha$$ we take $$\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}} = -z_{\\alpha}$$  \n",
"> Similarly, in the right-sided test $$P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} = \\int_{\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}}^{\\infty}f(x)dx, \\quad (f(x) \\text{ is the standard normal density})$$ is a right-tail probability, so to obtain $$P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\}=\\alpha$$ we take $$\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}} = z_{\\alpha}$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
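"> Only the Type I error (rejecting $H_{0}$ when $H_{0}$ is true) has been discussed here. The Type II error (accepting $H_{0}$ when $H_{0}$ is false) can be analyzed in a similar way, so it is enough to master one of them. A test that controls only the probability of a Type I error, without considering the probability of a Type II error, is called a **significance test (with significance level $\\alpha$)**."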
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
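"🔥Example: a company bought $5$ batches of milk and suspects that the milk has been watered down. The freezing point of natural milk is known to follow a normal distribution with mean $-0.545$ and standard deviation $0.008$. Adding water raises the freezing point of milk toward that of water. The mean freezing point measured over the $5$ batches is $\\overline{x}=-0.535$. Has the milk been watered down? Take $\\alpha=0.05$."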
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"1. State the hypotheses\n",
"$$\n",
"H_{0}: \\mu \\le \\mu_{0}=-0.545; \\quad H_{1}: \\mu > \\mu_{0}=-0.545\n",
"$$\n",
"The problem is to decide whether $H_{0}$ should be rejected; the form of the hypotheses shows that this is a **right-sided test**.\n",
"\n",
"2. Find the form of the rejection region\n",
"When $H_{0}$ is rejected, $\\mu$ is greater than $\\mu_{0}$, so the rejection region has the form\n",
"$$ \\overline{x} \\ge k$$\n",
"\n",
"3. Determine the constant $k$\n",
"$$\n",
"\\begin{aligned}\n",
"P\\left\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\right\\} &= P\\left\\{ \\overline{X} \\ge k \\right\\} \\\\\n",
"& = P\\left\\{ \\frac{\\overline{X} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} \\\\\n",
"&\\le P\\left\\{ \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}} \\ge \\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}\\right\\} \\quad (\\mu \\le \\mu_{0})\\\\\n",
"&= \\alpha\n",
"\\end{aligned}\n",
"$$\n",
"4. Take $\\frac{k - \\mu_{0}}{\\sigma/\\sqrt{n}}=z_{\\alpha}$, which gives\n",
"$$\n",
"k = \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha} + \\mu_{0}\n",
"$$\n",
"That is, when\n",
"$$\n",
"\\begin{aligned}\n",
"&\\overline{x} \\ge \\frac{\\sigma}{\\sqrt{n}}z_{\\alpha} + \\mu_{0} \\\\\n",
"&\\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}} \\ge z_{\\alpha}\n",
"\\end{aligned}\n",
"$$\n",
"holds (the two inequalities are equivalent), $H_{0}$ is rejected."
|
||
]
|
||
},
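{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a numerical check of the rule just derived, here is the arithmetic for this example (added for illustration; $z_{0.05} \\approx 1.645$ is the standard table value of the upper $0.05$ quantile of $N(0, 1)$):\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\frac{\\sigma}{\\sqrt{n}} &= \\frac{0.008}{\\sqrt{5}} \\approx 0.003578 \\\\\n",
"\\frac{\\overline{x} - \\mu_{0}}{\\sigma/\\sqrt{n}} &= \\frac{-0.535 - (-0.545)}{0.003578} \\approx 2.795 \\ge 1.645 \\approx z_{0.05} \\\\\n",
"k &= 0.003578 \\times 1.645 + (-0.545) \\approx -0.5391\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Both forms of the rejection rule agree: $\\overline{x} = -0.535 \\ge k \\approx -0.5391$, so $H_{0}$ is rejected at level $\\alpha = 0.05$, and the data support the conclusion that the milk has been watered down."
]
},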
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (solving the example)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
"Sample mean -0.5350 >= -0.5391, reject H0: the milk has been watered down\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"from scipy.stats import norm\n",
"import numpy as np\n",
"\n",
"n = 5\n",
"alpha = 0.05\n",
"mu, sigma = -0.545, 0.008\n",
"x_mean = -0.535\n",
"\n",
"# z_alpha: upper alpha quantile of the standard normal N(0, 1)\n",
"z_alpha = norm.ppf(1 - alpha)\n",
"# Critical value k for the sample mean\n",
"k = (sigma/np.sqrt(n))*z_alpha + mu\n",
"\n",
"if x_mean >= k:\n",
"    print(\"Sample mean {:.4f} >= {:.4f}, reject H0: the milk has been watered down\".format(x_mean, k))\n",
"else:\n",
"    print(\"Sample mean {:.4f} < {:.4f}, accept H0: the milk has not been watered down\".format(x_mean, k))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.10 Hypothesis Testing: Tests for the Parameters of Normal Populations"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Tests for the mean $\\mu$ of a single normal population $N(\\mu, \\sigma^{2})$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"(1) **$\\sigma^{2}$ known, test for $\\mu$ ($Z$ test)**: this case was discussed above; the test statistic is $Z = \\frac{\\overline{X}-\\mu_{0}}{\\sigma/\\sqrt{n}}$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"(2) **$\\sigma^{2}$ unknown, test for $\\mu$ ($t$ test)**: let the population $X\\sim N(\\mu, \\sigma^{2})$ with $\\mu, \\sigma^{2}$ unknown. Find the rejection region of the test\n",
"$$\n",
"H_{0}: \\mu =\\mu_{0}; \\quad H_{1}:\\mu \\ne \\mu_{0}\n",
"$$\n",
"at significance level $\\alpha$."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊: In this case take the test statistic\n",
"$$\n",
"\\frac{\\overline{X}-\\mu_{0}}{S/\\sqrt{n}} , \\quad (S \\text{ is the sample standard deviation})\n",
"$$\n",
"By the theorem studied earlier, when $H_{0}$ is true\n",
"$$\n",
"\\frac{\\overline{X}-\\mu_{0}}{S/\\sqrt{n}} \\sim t(n-1)\n",
"$$\n",
"Choose a suitable positive number $k$ such that\n",
"$$\n",
"\\begin{aligned}\n",
"P\\left\\{ \\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\right\\} &= P\\left\\{ \\left| \\frac{\\overline{X}-\\mu_{0}}{S/\\sqrt{n}} \\right| \\ge k \\right\\} \\\\\n",
"&= \\alpha\n",
"\\end{aligned}\n",
"$$\n",
"Taking $k = t_{\\alpha/2}(n-1)$,\n",
"the rejection region is\n",
"$$\n",
"\\left| \\frac{\\overline{x}-\\mu_{0}}{s/\\sqrt{n}} \\right | \\ge k = t_{\\alpha/2}(n-1)\n",
"$$\n",
"This method is called the **$t$ test**."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🔥Example: the lifetime $X$ (unit: h) of a certain component follows a normal distribution $N(\\mu, \\sigma^{2})$ with $\\mu, \\sigma^{2}$ unknown. The lifetimes of 16 components were measured as follows:\n",
"$$\n",
"\\begin{aligned}\n",
"& 159 \\quad 280 \\quad 101 \\quad 212 \\quad 224 \\quad 379 \\quad 179 \\quad 264 \\\\\n",
"& 222 \\quad 362 \\quad 168 \\quad 250 \\quad 149 \\quad 260 \\quad 485 \\quad 170\n",
"\\end{aligned}\n",
"$$\n",
"Is the mean lifetime of the component greater than $225h$ (take $\\alpha = 0.05$)?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"\n",
"1. State the hypotheses\n",
"$$\n",
"H_{0}: \\mu \\le \\mu_{0} = 225; \\quad H_{1}: \\mu > \\mu_{0} = 225\n",
"$$\n",
"When $H_{0}$ should be rejected, the observed value $\\overline{x}$ tends to be large. Taking a suitable constant $k$, the rejection region has the form\n",
"$$\n",
"\\overline{x} \\ge k\n",
"$$\n",
"2. Determine the constant $k$\n",
"$$\n",
"\\begin{aligned}\n",
"P\\left\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\right\\} &= P\\left\\{\\overline{X} \\ge k\\right\\} \\\\\n",
"&= P\\left\\{\\frac{\\overline{X} - \\mu_{0}}{S/\\sqrt{n}} \\ge \\frac{k- \\mu_{0}}{S/\\sqrt{n}}\\right\\} \\\\\n",
"&\\le P\\left\\{\\frac{\\overline{X} - \\mu}{S/\\sqrt{n}} \\ge \\frac{k- \\mu_{0}}{S/\\sqrt{n}}\\right\\}, \\quad (\\mu \\le \\mu_{0}) \\\\\n",
"&= \\alpha\n",
"\\end{aligned}\n",
"$$\n",
"Set $\\frac{k- \\mu_{0}}{S/\\sqrt{n}} = t_{\\alpha}(n-1)$.\n",
"\n",
"3. The rejection region is then\n",
"$$\n",
"\\overline{x} \\ge k = t_{\\alpha}(n-1)\\frac{s}{\\sqrt{n}} + \\mu_{0}\n",
"$$\n",
"that is,\n",
"$$\n",
"\\frac{\\overline{x} - \\mu_{0}}{s/\\sqrt{n}} \\ge t_{\\alpha}(n-1)\n",
"$$\n",
"If this inequality holds, reject $H_{0}$ and accept $H_{1}$, i.e. the mean lifetime of the component is greater than $225h$. Otherwise accept $H_{0}$ and reject $H_{1}$, i.e. the mean lifetime of the component is not greater than $225h$.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (solving the problem above)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
"Test statistic: 0.6685 < 1.7531 (t_alpha(n-1)), accept H0 and reject H1: the mean lifetime is not greater than 225 h\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"from scipy.stats import t\n",
"import numpy as np\n",
"\n",
"# H0: mu <= 225; H1: mu > 225, mu0 = 225, alpha = 0.05\n",
"alpha = 0.05\n",
"mu0 = 225\n",
"x = np.array([159, 280, 101, 212, 224, 379, 179, 264, 222, 362, 168, 250, 149, 260, 485, 170])\n",
"n = len(x)\n",
"x_mean = np.mean(x)\n",
"# Sample standard deviation (divisor n-1)\n",
"S = np.std(x, ddof=1)\n",
"\n",
"# Test statistic (x_bar - mu0) / (S / sqrt(n))\n",
"check = (x_mean - mu0)/(S/np.sqrt(n))\n",
"# t_alpha(n-1): upper alpha quantile of the t distribution with n-1 degrees of freedom\n",
"t_alpha = t.ppf(1 - alpha, df=n-1)\n",
"\n",
"if check >= t_alpha:\n",
"    print(\"Test statistic: {:.4f} >= {:.4f} (t_alpha(n-1)), reject H0 and accept H1: the mean lifetime is greater than 225 h\".format(check, t_alpha))\n",
"else:\n",
"    print(\"Test statistic: {:.4f} < {:.4f} (t_alpha(n-1)), accept H0 and reject H1: the mean lifetime is not greater than 225 h\".format(check, t_alpha))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Test for the difference of the means of two normal populations ($t$ test)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Let $X_{1}, X_{2}, \\cdots, X_{n_{1}}$ be a sample from the normal population $N(\\mu_{1}, \\sigma_{1}^{2})$ and $Y_{1}, Y_{2}, \\cdots, Y_{n_{2}}$ a sample from the normal population $N(\\mu_{2}, \\sigma_{2}^{2})$. Assume the two samples are independent, with sample means $\\overline{X}, \\overline{Y}$ and sample variances $S_{1}^{2}, S_{2}^{2}$. Consider the test\n",
"$$\n",
"H_{0}:\\mu_{1} - \\mu_{2} = \\delta; \\quad H_{1}:\\mu_{1} - \\mu_{2} \\ne \\delta\n",
"$$\n",
"where $\\delta$ is a known constant and the significance level is $\\alpha$.\n",
"\n",
"**(1) $\\mu_{1}, \\mu_{2}$ unknown, $\\sigma_{1}^{2} = \\sigma_{2}^{2} = \\sigma^{2}$ unknown**: take the test statistic\n",
"$$\n",
"t = \\frac{\\overline{X} - \\overline{Y} - \\delta}{S_{w}\\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}}}, \\quad S_{w}^{2} = \\frac{(n_{1}-1)S_{1}^{2}+(n_{2}-1)S_{2}^{2}}{n_{1}+n_{2}-2} , \\quad S_{w} = \\sqrt{S_{w}^{2}}\n",
"$$\n",
"When $H_{0}$ is true, $t \\sim t(n_{1}+n_{2}-2)$, and the rejection region is\n",
"$$\n",
"\\frac{\\left|\\overline{x} - \\overline{y} - \\delta \\right |}{s_{w}\\sqrt{\\frac{1}{n_{1}}+\\frac{1}{n_{2}}}} \\ge t_{\\alpha/2}(n_{1}+n_{2}-2)\n",
"$$\n",
"**(2) $\\mu_{1}, \\mu_{2}$ unknown, $\\sigma_{1}^{2}, \\sigma_{2}^{2}$ known**: take the test statistic\n",
"$$\n",
"Z = \\frac{\\overline{X} - \\overline{Y} - \\delta}{\\sqrt{\\frac{\\sigma_{1}^{2}}{n_{1}} + \\frac{\\sigma_{2}^{2}}{n_{2}}}}\n",
"$$\n",
"When $H_{0}$ is true, $Z \\sim N(0, 1)$, and the rejection region is\n",
"$$\n",
"\\frac{\\left|\\overline{x} - \\overline{y} - \\delta\\right|}{\\sqrt{\\frac{\\sigma_{1}^{2}}{n_{1}} + \\frac{\\sigma_{2}^{2}}{n_{2}}}} \\ge z_{\\alpha/2}\n",
"$$\n"
|
||
]
|
||
},
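{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of case (1), assuming two small made-up samples (`x` and `y` are hypothetical data, not taken from the text) and $\\delta = 0$. It computes the pooled statistic by hand and, because $\\delta = 0$, cross-checks it against `scipy.stats.ttest_ind` with `equal_var=True`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Hypothetical measurements for two independent samples (illustration only)\n",
"x = np.array([78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3])\n",
"y = np.array([79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1])\n",
"delta, alpha = 0.0, 0.05\n",
"\n",
"n1, n2 = len(x), len(y)\n",
"s1_sq, s2_sq = np.var(x, ddof=1), np.var(y, ddof=1)\n",
"# Pooled standard deviation S_w\n",
"sw = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))\n",
"t_stat = (x.mean() - y.mean() - delta) / (sw * np.sqrt(1 / n1 + 1 / n2))\n",
"# Upper alpha/2 quantile of t(n1 + n2 - 2)\n",
"t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)\n",
"reject = abs(t_stat) >= t_crit\n",
"\n",
"# Cross-check with SciPy (only comparable because delta = 0)\n",
"t_ref, p_ref = stats.ttest_ind(x, y, equal_var=True)\n",
"print(f't = {t_stat:.4f}, critical value = {t_crit:.4f}, reject H0: {reject}')\n",
"print(f'scipy: t = {t_ref:.4f}, p-value = {p_ref:.4f}')\n",
"```\n",
"\n",
"For a nonzero $\\delta$ the hand-computed statistic still applies; the SciPy cross-check would then need `x - delta` in place of `x`."
]
},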
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Hypothesis test for the variance of a normal population (single-population case)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Let the population be $X \\sim N(\\mu, \\sigma^{2})$ with $\\mu$ and $\\sigma^{2}$ both unknown, and let $X_{1}, X_{2}, \\cdots, X_{n}$ be a sample from $X$. We want to test\n",
"$$\n",
"H_{0}:\\sigma^{2}=\\sigma_{0}^{2}; \\quad H_{1}:\\sigma^{2} \\ne \\sigma_{0}^{2}\n",
"$$\n",
"where $\\sigma_{0}^{2}$ is a known constant.\n",
"\n",
"Since $S^{2}$ is an unbiased estimator of $\\sigma^{2}$, when $H_{0}$ is true the ratio $s^{2}/\\sigma_{0}^{2}$ of the observed sample variance to $\\sigma_{0}^{2}$ should fluctuate around $1$ and should be neither much larger nor much smaller than $1$. By the sampling-distribution theorem covered earlier, when $H_{0}$ is true\n",
"$$\n",
"\\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\sim \\chi^{2}(n-1)\n",
"$$\n",
"so we take the test statistic\n",
"$$\n",
"\\chi^{2} = \\frac{(n-1)S^{2}}{\\sigma_{0}^{2}}\n",
"$$\n",
"By the argument above, the rejection region has the form\n",
"$$\n",
"\\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\le k_{1} \\quad \\text{or} \\quad \\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\ge k_{2}\n",
"$$\n",
"where $k_{1}, k_{2}$ are determined by\n",
"$$\n",
"\\begin{aligned}\n",
"&P\\left\\{\\text{reject } H_{0} \\text{ when } H_{0} \\text{ is true}\\right\\} \\\\\n",
"&= P_{\\sigma_{0}^{2}}\\left\\{\\left(\\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\le k_{1}\\right) \\cup \\left(\\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\ge k_{2}\\right) \\right\\} \\\\\n",
"&= \\alpha\n",
"\\end{aligned}\n",
"$$\n",
"For computational convenience, split $\\alpha$ equally between the two tails:\n",
"$$\n",
"P\\left\\{\\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\le k_{1}\\right\\} =\\frac{\\alpha}{2}, \\quad P\\left\\{ \\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\ge k_{2}\\right\\}=\\frac{\\alpha}{2}\n",
"$$\n",
"This gives $k_{1}=\\chi_{1-\\alpha/2}^{2}(n-1)$ and $k_{2}=\\chi_{\\alpha/2}^{2}(n-1)$ (two separate quantiles are needed because the $\\chi^{2}$ density is not symmetric), so the rejection region is\n",
"$$\n",
"\\frac{(n-1)s^{2}}{\\sigma_{0}^{2}} \\le \\chi_{1-\\alpha/2}^{2}(n-1) \\quad \\text{or} \\quad \\frac{(n-1)s^{2}}{\\sigma_{0}^{2}} \\ge \\chi_{\\alpha/2}^{2}(n-1)\n",
"$$"
|
||
]
|
||
},
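{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two-sided variance test can be sketched as follows. The sample `x` and the hypothesized variance `sigma0_sq` are hypothetical values chosen for illustration. Note that the upper-quantile notation $\\chi_{\\alpha/2}^{2}(n-1)$ corresponds to `chi2.ppf(1 - alpha / 2, df=n - 1)`, because `ppf` is parameterized by the lower-tail probability.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.stats import chi2\n",
"\n",
"# Hypothetical sample and hypothesized variance (illustration only)\n",
"x = np.array([4.8, 5.3, 5.1, 4.6, 5.4, 5.0, 4.7, 5.2, 4.9, 5.5])\n",
"sigma0_sq = 0.09\n",
"alpha = 0.05\n",
"\n",
"n = len(x)\n",
"s_sq = np.var(x, ddof=1)              # unbiased sample variance S^2\n",
"chi2_stat = (n - 1) * s_sq / sigma0_sq\n",
"k1 = chi2.ppf(alpha / 2, df=n - 1)      # chi^2_{1 - alpha/2}(n - 1)\n",
"k2 = chi2.ppf(1 - alpha / 2, df=n - 1)  # chi^2_{alpha/2}(n - 1)\n",
"reject = (chi2_stat <= k1) or (chi2_stat >= k2)\n",
"\n",
"print(f'chi2 = {chi2_stat:.4f}, rejection region: (0, {k1:.4f}] or [{k2:.4f}, oo)')\n",
"print('reject H0' if reject else 'do not reject H0')\n",
"```\n",
"\n",
"The two tail probabilities add up to $\\alpha$ by construction, which is the equal-tail convention used in the derivation above."
]
},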
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"📕**Table 1.10 Tests for the means and variances of normal populations (significance level $\\alpha$)**"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"<style>\n",
|
||
"table\n",
|
||
"{\n",
|
||
" margin: auto;\n",
|
||
"}\n",
|
||
"</style>\n",
|
||
"\n",
"| |Null hypothesis $H_{0}$|Test statistic|Alternative hypothesis $H_{1}$|Rejection region|\n",
"|:--:|:--:|:--:|:--:|:--:|\n",
"|$1$|$\\begin{aligned} &\\mu \\leqslant \\mu_{0} \\\\ &\\mu \\geqslant \\mu_{0} \\\\ &\\mu=\\mu_{0}\\\\ &(\\sigma^{2} \\text{ known})\\end{aligned}$|$ Z=\\frac{\\bar{X}-\\mu_{0}}{\\sigma / \\sqrt{n}}$|$\\begin{aligned} &\\mu>\\mu_{0} \\\\ & \\mu<\\mu_{0} \\\\ & \\mu\\ne\\mu_{0}\\end{aligned}$|$\\begin{aligned} &z\\ge z_{\\alpha} \\\\ &z \\le -z_{\\alpha} \\\\ & \\|z\\| \\ge z_{\\alpha / 2}\\end{aligned}$|\n",
"|$2$|$\\begin{aligned}&\\mu \\leqslant \\mu_{0} \\\\ &\\mu \\geqslant \\mu_{0} \\\\ &\\mu=\\mu_{0}\\\\ &(\\sigma^{2} \\text{ unknown})\\end{aligned}$|$\\begin{aligned} t =\\frac{\\overline{X}-\\mu_{0}}{S/\\sqrt{n}} \\end{aligned}$|$\\begin{aligned} &\\mu>\\mu_{0} \\\\ & \\mu<\\mu_{0} \\\\ & \\mu\\ne\\mu_{0}\\end{aligned}$|$\\begin{aligned}& t \\geqslant t_{\\alpha}(n-1) \\\\&t \\leqslant-t_{\\alpha}(n-1) \\\\& \\|t\\| \\geqslant t_{\\alpha / 2}(n-1) \\\\ \\end{aligned}$|\n",
"|$3$|$\\begin{aligned} &\\mu_{1}-\\mu_{2}\\le \\delta \\\\&\\mu_{1}-\\mu_{2}\\ge \\delta \\\\&\\mu_{1}-\\mu_{2}= \\delta\\\\&(\\sigma_{1}^{2},\\sigma_{2}^{2} \\text{ known}) \\end{aligned}$|$\\begin{aligned}Z=\\frac{\\overline{X}-\\overline{Y}-\\delta}{\\sqrt{\\frac{\\sigma_{1}^{2}}{n1}+\\frac{\\sigma_{2}^{2}}{n2}}} \\end{aligned}$|$\\begin{aligned}&\\mu_{1}-\\mu_{2}> \\delta \\\\&\\mu_{1}-\\mu_{2}< \\delta \\\\&\\mu_{1}-\\mu_{2}\\ne \\delta \\end{aligned}$|$\\begin{aligned} &z\\ge z_{\\alpha} \\\\ & z\\le -z_{\\alpha}\\\\ & \\| z\\| \\ge z_{\\alpha/2}\\end{aligned}$|\n",
"|$4$|$\\begin{aligned}&\\mu_{1}-\\mu_{2}\\le \\delta \\\\&\\mu_{1}-\\mu_{2}\\ge \\delta \\\\&\\mu_{1}-\\mu_{2}= \\delta\\\\&(\\sigma_{1}^{2}=\\sigma_{2}^{2}=\\sigma^{2} \\text{ unknown}) \\end{aligned}$|$\\begin{aligned}&t=\\frac{\\overline{X}-\\overline{Y}-\\delta}{S_{w}\\sqrt{\\frac{1}{n1}+\\frac{1}{n2}}} \\\\ & S_{w}^{2}=\\frac{(n1-1)S_{1}^{2}+(n2-1)S_{2}^{2}}{n1+n2-2} \\end{aligned}$|$\\begin{aligned}&\\mu_{1}-\\mu_{2}> \\delta \\\\&\\mu_{1}-\\mu_{2}< \\delta \\\\&\\mu_{1}-\\mu_{2}\\ne \\delta \\end{aligned}$|$\\begin{aligned}&t\\ge t_{\\alpha}(n1+n2-2) \\\\&t \\le -t_{\\alpha}(n1+n2-2)\\\\&\\|t\\| \\ge t_{\\alpha/2}(n1+n2-2) \\end{aligned}$|\n",
"|$5$|$\\begin{aligned}&\\sigma^{2}\\le \\sigma_{0}^{2} \\\\&\\sigma^{2}\\ge \\sigma_{0}^{2}\\\\ &\\sigma^{2}= \\sigma_{0}^{2}\\\\ &(\\mu \\text{ unknown}) \\end{aligned}$|$\\begin{aligned}\\chi^{2}=\\frac{(n-1)S^{2}}{\\sigma_{0}^{2}} \\end{aligned}$|$\\begin{aligned}&\\sigma^{2}>\\sigma_{0}^{2} \\\\&\\sigma^{2}< \\sigma_{0}^{2}\\\\ &\\sigma^{2}\\ne \\sigma_{0}^{2} \\end{aligned}$|$\\begin{aligned}&\\chi^{2}\\ge \\chi_{\\alpha}^{2}(n-1) \\\\&\\chi^{2}\\le \\chi_{1-\\alpha}^{2}(n-1) \\\\&\\chi^{2}\\ge \\chi_{\\alpha/2}^{2}(n-1) \\text{ or}\\\\&\\chi^{2}\\le \\chi_{1-\\alpha/2}^{2}(n-1) \\end{aligned}$|\n",
"|$6$|$\\begin{aligned}&\\sigma_{1}^{2}\\le \\sigma_{2}^{2} \\\\&\\sigma_{1}^{2}\\ge \\sigma_{2}^{2}\\\\ &\\sigma_{1}^{2}= \\sigma_{2}^{2}\\\\ &(\\mu_{1},\\mu_{2} \\text{ unknown})\\end{aligned}$|$\\begin{aligned}F=\\frac{S_{1}^{2}}{S_{2}^{2}} \\end{aligned}$|$\\begin{aligned}&\\sigma_{1}^{2}>\\sigma_{2}^{2} \\\\&\\sigma_{1}^{2}< \\sigma_{2}^{2}\\\\ &\\sigma_{1}^{2}\\ne \\sigma_{2}^{2} \\end{aligned}$|$\\begin{aligned}&F\\ge F_{\\alpha}(n1-1, n2-1) \\\\&F\\le F_{1-\\alpha}(n1-1, n2-1) \\\\ & F\\ge F_{\\alpha/2}(n1-1, n2-1) \\text{ or}\\\\&F\\le F_{1-\\alpha/2}(n1-1, n2-1)\\end{aligned}$|\n",
"|$7$|$\\begin{aligned}&\\mu_{D}\\le 0\\\\&\\mu_{D}\\ge 0 \\\\&\\mu_{D}= 0\\\\&(\\text{paired data})\\end{aligned}$|$\\begin{aligned}t=\\frac{\\overline{D}-0}{S_{D}/\\sqrt{n}} \\end{aligned}$|$\\begin{aligned}&\\mu_{D}> 0\\\\&\\mu_{D}< 0 \\\\&\\mu_{D}\\ne 0 \\end{aligned}$|$\\begin{aligned}&t\\ge t_{\\alpha}(n-1) \\\\&t\\le -t_{\\alpha}(n-1)\\\\&\\|t\\|\\ge t_{\\alpha/2}(n-1) \\end{aligned}$|"
|
||
]
|
||
},
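{
"cell_type": "markdown",
"metadata": {},
"source": [
"Row 6 of the table (the two-sided $F$ test for equal variances) has no worked example in this section, so here is a minimal sketch with hypothetical samples `x` and `y`. As with the other tests, the upper-quantile notation $F_{\\alpha/2}(n1-1, n2-1)$ maps to `f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.stats import f\n",
"\n",
"# Hypothetical samples from two independent normal populations\n",
"x = np.array([20.5, 19.8, 19.7, 20.4, 20.1, 20.0, 19.0, 19.9])\n",
"y = np.array([20.7, 19.8, 19.5, 20.8, 20.4, 19.6, 20.2])\n",
"alpha = 0.05\n",
"\n",
"n1, n2 = len(x), len(y)\n",
"F_stat = np.var(x, ddof=1) / np.var(y, ddof=1)\n",
"lower = f.ppf(alpha / 2, n1 - 1, n2 - 1)      # F_{1 - alpha/2}(n1-1, n2-1)\n",
"upper = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # F_{alpha/2}(n1-1, n2-1)\n",
"reject = (F_stat <= lower) or (F_stat >= upper)\n",
"\n",
"print(f'F = {F_stat:.4f}, rejection region: (0, {lower:.4f}] or [{upper:.4f}, oo)')\n",
"print('reject H0' if reject else 'do not reject H0')\n",
"```\n",
"\n",
"The lower critical value can also be obtained from the reciprocal identity $F_{1-\\alpha}(n1, n2) = 1 / F_{\\alpha}(n2, n1)$, which is how printed $F$ tables are usually used."
]
},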
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
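"> Relationship between confidence intervals and hypothesis tests. Let $(\\underline{\\theta}, \\overline{\\theta})$ be a confidence interval for $\\theta$ with confidence level $1-\\alpha$ (for one-sided tests, use the corresponding one-sided confidence bound), and let the significance level be $\\alpha$.\n",
"> - 1. For the two-sided test $$H_{0}: \\theta=\\theta_{0};\\quad H_{1}: \\theta\\ne \\theta_{0}$$ accept $H_{0}$ if $\\theta_{0}$ lies in the interval $(\\underline{\\theta}, \\overline{\\theta})$; otherwise reject $H_{0}$.\n",
"> - 2. For the left-sided test $$H_{0}: \\theta\\ge \\theta_{0};\\quad H_{1}: \\theta < \\theta_{0}$$ accept $H_{0}$ if $\\theta_{0}$ lies in the interval $(-\\infty, \\overline{\\theta})$; otherwise reject $H_{0}$.\n",
"> - 3. For the right-sided test $$H_{0}: \\theta\\le \\theta_{0};\\quad H_{1}: \\theta > \\theta_{0}$$ accept $H_{0}$ if $\\theta_{0}$ lies in the interval $(\\underline{\\theta}, \\infty)$; otherwise reject $H_{0}$."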
|
||
]
|
||
},
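{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two-sided case of this correspondence can be checked numerically. The sketch below assumes a one-sample $t$ setting with a hypothetical sample `x` and hypothesized mean `mu0`; it makes the decision once through the test statistic and once through the confidence interval and confirms that the two agree.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Hypothetical sample and hypothesized mean (illustration only)\n",
"x = np.array([10.2, 9.7, 10.5, 10.1, 9.9, 10.8, 10.4, 9.6, 10.3, 10.0])\n",
"mu0, alpha = 10.5, 0.05\n",
"\n",
"n = len(x)\n",
"se = x.std(ddof=1) / np.sqrt(n)                # standard error of the mean\n",
"t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{alpha/2}(n - 1)\n",
"\n",
"# Decision via the test statistic\n",
"t_stat = (x.mean() - mu0) / se\n",
"reject_by_test = abs(t_stat) >= t_crit\n",
"\n",
"# Decision via the (1 - alpha) confidence interval for mu\n",
"ci = (x.mean() - t_crit * se, x.mean() + t_crit * se)\n",
"reject_by_ci = not (ci[0] < mu0 < ci[1])\n",
"\n",
"print(f't = {t_stat:.4f}, critical value = {t_crit:.4f}')\n",
"print(f'confidence interval = ({ci[0]:.4f}, {ci[1]:.4f})')\n",
"print(f'same decision: {reject_by_test == reject_by_ci}')\n",
"```\n",
"\n",
"The agreement is algebraic: $|\\overline{x}-\\mu_{0}| \\ge t_{\\alpha/2}(n-1)\\, s/\\sqrt{n}$ is exactly the statement that $\\mu_{0}$ lies outside the interval."
]
},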
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## 1.11 Hypothesis Testing: the Likelihood Ratio Test and the Bootstrap Method"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"* Likelihood ratio test:\n",
"\n",
"  Definition (generalized likelihood ratio): let $x_{1}, \\cdots, x_{n}$ be a sample from a population with density $p(x ; \\theta), \\theta \\in \\Theta$, and consider the testing problem\n",
"  $$\n",
"  H_{0}: \\theta \\in \\Theta_{0} \\quad \\text{vs} \\quad H_{1}: \\theta \\in \\Theta_{1}=\\Theta-\\Theta_{0}\n",
"  $$\n",
"  Define\n",
"  $$\\Lambda\\left(x_{1}, \\cdots, x_{n}\\right)=\\frac{\\sup _{\\theta \\in \\Theta} p\\left(x_{1}, \\cdots, x_{n} ; \\theta\\right)}{\\sup _{\\theta \\in \\Theta_{0}} p\\left(x_{1}, \\cdots, x_{n} ; \\theta\\right)}$$\n",
"  where $\\sup$ denotes the supremum (least upper bound). $\\Lambda$ is called the generalized likelihood ratio of the testing problem."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
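"The numerator and the denominator of the generalized likelihood ratio are both suprema ($\\sup$): the numerator is the maximum of the joint density over the **full parameter space**, and the denominator is the maximum of the joint density over the **parameter space of the null hypothesis**, so the ratio compares the likelihoods attained at the two maximum likelihood estimates. Intuitively, if the null hypothesis is true, the maximizer of the numerator should lie in or near $\\Theta_{0}$, so the ratio stays close to $1$. Conversely, if the null hypothesis is false, the true parameter lies in $\\Theta_{1}$, the maximum over the full parameter space tends to be attained at some $\\theta \\in \\Theta_{1}$, and the ratio becomes large. The natural rejection region is therefore\n",
"$$\n",
"W=\\left\\{\\Lambda\\left(x_{1}, \\cdots, x_{n}\\right) \\geq c\\right\\}\n",
"$$\n",
"where the critical value $c$ satisfies $P_{\\theta}\\left(\\Lambda\\left(x_{1}, \\cdots, x_{n}\\right) \\geq c\\right) \\leq \\alpha$ for all $\\theta \\in \\Theta_{0}$."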
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
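">💡 Under $H_{0}$ and suitable regularity conditions, $2\\ln \\Lambda\\left(x_{1}, \\cdots, x_{n}\\right)$ is asymptotically $\\chi^{2}$-distributed as $n \\to \\infty$ (Wilks' theorem). The degrees of freedom equal the number of free parameters in $\\Theta$ minus the number in $\\Theta_{0}$, i.e. the number of independent parameters being tested."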
|
||
]
|
||
},
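{
"cell_type": "markdown",
"metadata": {},
"source": [
"The $\\chi^{2}$ approximation can be checked by simulation. The sketch below assumes a binomial model with illustrative values `n = 100` and `p0 = 0.2`, draws data under $H_{0}$ with a fixed seed, and estimates how often $2\\ln\\Lambda$ exceeds the upper $\\alpha$ quantile of $\\chi^{2}(1)$. `scipy.special.xlogy` is used so that the term $0 \\cdot \\ln 0$ evaluates to $0$.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.special import xlogy\n",
"from scipy.stats import chi2\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n, p0, alpha = 100, 0.2, 0.05   # illustrative values\n",
"n_sim = 20000\n",
"\n",
"# Simulate event counts under H0 and compute the MLE for each replicate\n",
"k = rng.binomial(n, p0, size=n_sim)\n",
"p_hat = k / n\n",
"\n",
"# 2 ln(Lambda) = 2 [k ln(p_hat / p0) + (n - k) ln((1 - p_hat) / (1 - p0))]\n",
"lr = 2 * (xlogy(k, p_hat / p0) + xlogy(n - k, (1 - p_hat) / (1 - p0)))\n",
"\n",
"crit = chi2.ppf(1 - alpha, df=1)\n",
"rate = np.mean(lr >= crit)\n",
"print(f'empirical rejection rate under H0: {rate:.4f} (nominal {alpha})')\n",
"```\n",
"\n",
"Because the binomial distribution is discrete, the empirical rate is only expected to be close to $\\alpha$, not equal to it."
]
},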
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
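"🔥Example: the occurrence of a disease is observed, and $k=10$ events occur among $n=100$ people. Assume the data follow a binomial distribution, and that theory gives the probability that any one person experiences the event as $\\pi_{0}=0.2$. Test this hypothesis with a likelihood ratio test (significance level $\\alpha=0.05$)."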
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🦊Solution:\n",
"1. State the hypotheses\n",
"$$\n",
"H_{0}: \\pi = \\pi_{0}=0.2; \\quad H_{1}: \\pi\\ne \\pi_{0}=0.2\n",
"$$\n",
"2. Write down the parameter spaces\n",
"$$\n",
"\\Theta_{0}=\\{\\pi_{0}\\}, \\quad \\Theta = \\{\\pi: 0 \\le \\pi \\le 1\\}\n",
"$$\n",
"3. Compute the maximum likelihood estimate over the full parameter space $\\Theta$. Writing $p$ for the unknown probability $\\pi$,\n",
"$$\n",
"\\begin{aligned}\n",
"&f(k; p) = C_{n}^{k}p^{k}(1-p)^{n-k}\\\\\n",
"&\\ln f(k; p) = \\ln C_{n}^{k} + k\\ln p+ (n-k)\\ln(1-p)\\\\\n",
"&\\frac{d}{dp}\\ln f(k; p) = \\frac{k}{p} - \\frac{n-k}{1-p} = 0 \\\\\n",
"&\\Rightarrow \\hat{p} = \\frac{k}{n}\n",
"\\end{aligned}\n",
"$$\n",
"4. Under $H_{0}$, $p = \\pi_{0} = 0.2$. The statistic $2\\ln\\frac{\\sup_{\\theta\\in\\Theta }p(x_{1}, x_{2}, \\cdots, x_{n})}{\\sup_{\\theta\\in\\Theta_{0}} p(x_{1}, x_{2}, \\cdots, x_{n})}$ compares the likelihood at $\\hat{p}$ with the likelihood at $\\pi_{0}$ (the binomial coefficients cancel):\n",
"$$\n",
"check = 2\\ln\\frac{\\hat{p}^{k}(1-\\hat{p})^{n-k}}{\\pi_{0}^{k}(1-\\pi_{0})^{n-k}} = 2\\left[k\\ln\\frac{\\hat{p}}{\\pi_{0}} + (n-k)\\ln\\frac{1-\\hat{p}}{1-\\pi_{0}}\\right]\n",
"$$\n",
"5. Under $H_{0}$, approximately\n",
"$$2\\ln\\frac{\\sup_{\\theta\\in\\Theta }p(x_{1}, x_{2}, \\cdots, x_{n})}{\\sup_{\\theta\\in\\Theta_{0}} p(x_{1}, x_{2}, \\cdots, x_{n})}\\sim \\chi^{2}(1)$$\n",
"6. Only large values of $\\Lambda$ are evidence against $H_{0}$, so the rejection region is one-sided:\n",
"$$\n",
"check \\ge \\chi_{\\alpha}^{2}(1)\n",
"$$\n",
"7. If $check$ falls in the rejection region, reject $H_{0}$; otherwise accept $H_{0}$. Here $\\hat{p}=0.1$, so $check \\approx 7.338 \\ge \\chi_{0.05}^{2}(1) \\approx 3.841$ and $H_{0}$ is rejected."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (solving the problem above)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
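"Likelihood-ratio statistic: 7.3380\n",
"Rejection region: [3.8415, oo)\n",
"Reject H0: the per-person event probability is not 0.2\n"
]
}
],
"source": [
"import numpy as np\n",
"from scipy.stats import chi2\n",
"\n",
"p0 = 0.2    # event probability under H0\n",
"n = 100     # number of people observed\n",
"k = 10      # number of events observed\n",
"alpha = 0.05\n",
"\n",
"p_hat = k / n  # MLE over the full parameter space\n",
"# 2 ln(Lambda): twice the log-likelihood at p_hat minus the log-likelihood at p0\n",
"check = 2 * (k * np.log(p_hat / p0) + (n - k) * np.log((1 - p_hat) / (1 - p0)))\n",
"# Only large values of the statistic contradict H0, so the test is one-sided\n",
"right = chi2.ppf(df=1, q=1 - alpha)\n",
"print(\"Likelihood-ratio statistic: {:.4f}\".format(check))\n",
"print(\"Rejection region: [{:.4f}, oo)\".format(right))\n",
"if check >= right:\n",
"    print(\"Reject H0: the per-person event probability is not 0.2\")\n",
"else:\n",
"    print(\"Accept H0: the data are consistent with an event probability of 0.2\")"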
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
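"* Bootstrap method: the likelihood ratio test above relies on knowing the (asymptotic) distribution of the test statistic; if its degrees of freedom are hard to determine, the method cannot be carried through. More generally, how can a hypothesis be tested when the distribution of a statistic is unknown or hard to derive? The bootstrap method was developed for this situation. Suppose the population distribution $F$ is unknown, but a data sample of size $n$ from $F$ is available. Drawing a sample of size $n$ from this sample by sampling with replacement gives a **bootstrap** sample (also called a resample). Successively and independently drawing many bootstrap samples from the original sample and using them for statistical inference about $F$ is called the **nonparametric bootstrap method**."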
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
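"> Bootstrap confidence interval: let $X=(X_{1}, X_{2}, \\cdots, X_{n})$ be a sample of size $n$ from the population $F$, let $x=(x_{1}, x_{2}, \\cdots, x_{n})$ be an observed sample value, let $F$ contain an unknown parameter $\\theta$, and let $\\hat{\\theta}=\\hat{\\theta}(X_{1}, X_{2}, \\cdots, X_{n})$ be an estimator of $\\theta$. A confidence interval for $\\theta$ with confidence level $1-\\alpha$ (significance level $\\alpha$) is obtained as follows: successively and independently draw $B$ bootstrap samples of size $n$ from $x$, compute the bootstrap estimate of $\\theta$ from each sample, $\\hat{\\theta}_{1}, \\hat{\\theta}_{2}, \\cdots, \\hat{\\theta}_{B}$, and sort them in increasing order: $$\\hat{\\theta}_{(1)}\\le\\hat{\\theta}_{(2)}\\le \\cdots\\le \\hat{\\theta}_{(B)}$$ The confidence interval is $$(\\hat{\\theta}_{(k_{1})}, \\hat{\\theta}_{(k_{2})})$$ where $k_{1}=[B \\frac{\\alpha}{2}], k_{2}=[B (1-\\frac{\\alpha}{2})]$ and $[\\cdot]$ denotes the integer part."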
|
||
]
|
||
},
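{
"cell_type": "markdown",
"metadata": {},
"source": [
"The percentile construction above can be sketched as a small helper. The function name `bootstrap_percentile_ci`, its parameters, and the sample values are hypothetical and chosen for illustration; the indices follow $k_{1}=[B\\alpha/2]$ and $k_{2}=[B(1-\\alpha/2)]$, shifted by one because Python arrays are zero-based.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def bootstrap_percentile_ci(x, stat=np.mean, B=2000, alpha=0.05, seed=0):\n",
"    # Percentile bootstrap confidence interval for stat(x)\n",
"    rng = np.random.default_rng(seed)\n",
"    x = np.asarray(x)\n",
"    # B resamples of size n drawn with replacement, one estimate per resample\n",
"    estimates = np.sort([\n",
"        stat(rng.choice(x, size=len(x), replace=True)) for _ in range(B)\n",
"    ])\n",
"    k1 = max(int(B * alpha / 2), 1)   # guard against index 0 - 1\n",
"    k2 = int(B * (1 - alpha / 2))\n",
"    return estimates[k1 - 1], estimates[k2 - 1]\n",
"\n",
"# Hypothetical sample (illustration only)\n",
"sample = [12.1, 9.8, 11.4, 10.7, 10.2, 11.9, 9.5, 10.9, 11.1, 10.4]\n",
"lo, hi = bootstrap_percentile_ci(sample)\n",
"print(f'95% bootstrap interval for the mean: ({lo:.3f}, {hi:.3f})')\n",
"```\n",
"\n",
"Passing a different `stat` (for example `np.median`) gives an interval for another parameter without any change to the resampling logic, which is the main appeal of the nonparametric bootstrap."
]
},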
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"🔥Example: a factory produces a luminous product whose emission time follows a normal distribution $N\\left(\\mu, \\sigma^{2}\\right)$, with a target mean emission time of $250 \\mathrm{~h}$. Ten products are drawn from a batch and their emission times (in $h$) are measured:\n",
"\n",
"$$248.8, \\quad 249.2, \\quad 250.7, \\quad 251.2, \\quad 248.0, \\quad 253.0, \\quad 248.9, \\quad 250.2, \\quad 251.2, \\quad 249.2$$\n",
"\n",
"Does the factory's product meet the requirement (significance level $\\alpha = 0.05$)?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
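"🦊Solution: this is a left-sided test, so by the correspondence between tests and confidence intervals the one-sided confidence interval has the form $(-\\infty, b)$.\n",
"\n",
"1. State the hypotheses\n",
"$$\n",
"H_{0}:\\mu \\ge \\mu_{0}=250; \\quad H_{1}:\\mu < \\mu_{0}=250\n",
"$$\n",
"2. Estimate $\\mu$ by the sample mean $\\overline{x}$.\n",
"3. Draw $B$ bootstrap samples, compute the mean $\\mu_{i}, (i=1, 2, \\cdots, B)$ of each, and sort the results in increasing order: $\\mu_{(1)}\\le\\mu_{(2)}\\le\\cdots\\le\\mu_{(B)}$.\n",
"4. A two-sided test would use the interval $(\\mu_{(k_{1})}, \\mu_{(k_{2})})$ with $k_{1}=[B \\frac{\\alpha}{2}], k_{2}=[B (1-\\frac{\\alpha}{2})]$, where $[\\cdot]$ denotes the integer part. For this one-sided test, take instead the upper confidence bound $b=\\mu_{(k)}$ with $k=[B(1-\\alpha)]$.\n",
"5. If $\\mu_{0}$ lies in the confidence interval $(-\\infty, b)$, accept $H_{0}$; otherwise reject $H_{0}$.\n"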
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Python code (solving the problem above)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Left-sided test (H0: mu >= mu0 vs H1: mu < mu0): the matching one-sided\n",
"# confidence interval is (-oo, upper); reject H0 when mu0 falls outside it.\n",
"\n",
"mu0 = 250\n",
"alpha = 0.05\n",
"B = 1000\n",
"x = [248.8, 249.2, 250.7, 251.2, 248.0, 253.0, 248.9, 250.2, 251.2, 249.2]\n",
"x_mean = np.mean(x)\n",
"params = []\n",
"for i in range(B):\n",
"    # Resample with replacement to form one bootstrap sample\n",
"    x_resample = np.random.choice(x, len(x), replace=True)\n",
"    params.append(np.mean(x_resample))\n",
"\n",
"params = np.sort(params)\n",
"\n",
"k = int(B * (1 - alpha))\n",
"upper = params[k - 1]  # bootstrap upper confidence bound for mu\n",
"print(\"Sample mean: {}\".format(x_mean))\n",
"print(\"One-sided confidence interval: (-oo, {})\".format(upper))\n",
"if mu0 < upper:\n",
"    print(\"Accept H0: the luminous products meet the requirement\")\n",
"else:\n",
"    print(\"Reject H0: the luminous products do not meet the requirement\")"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3.10.5 ('.venv': venv)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.10.5"
|
||
},
|
||
"orig_nbformat": 4,
|
||
"vscode": {
|
||
"interpreter": {
|
||
"hash": "a0798e59729acf8f16dd74a5eaff0380fbaae7e20dddaf23e3973bf13b41dad4"
|
||
}
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2
|
||
}
|