芝士 (Knowledge) · Learning Statistical Inference from -1

2023-01-31 22:20:49 # 芝士（Knowledge 🧀️） # Statistics

Statistic

Given a random sample $X_1,\cdots,X_n$ and a function $T$ of that sample, the random variable or random vector $Y = T(X_1,\cdots,X_n)$ is called a statistic. The probability distribution of the statistic $Y$ is called the sampling distribution of $Y$.

In general, we denote statistics with uppercase letters, and the observed values of statistics with lowercase letters.

Expectation, Variance, and Standard Deviation

For a random variable $X$,

The expectation of $X$: $E(X) = \mu$
The variance of $X$: $Var(X) = E[(X-E(X))^2] = \sigma^2$
The standard deviation of $X$: $\sqrt{Var(X)} = \sigma$

Properties

$E(X+Y) = E(X) + E(Y)$
$E(aX+b) = aE(X) + b$
$E(XY) = E(X)E(Y) + Cov(X, Y)$
$Var(X) = E(X^2) - (E(X))^2$

http://theanalysisofdata.com/probability/2_3.html
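
The properties above can be checked numerically. The sketch below is only an illustration, not a proof: it treats two simulated arrays as an empirical distribution, so that $E(\cdot)$ becomes `np.mean` and the variance and covariance use the $1/n$ normalisation. The distributions, the seed, and the constants `a` and `b` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = rng.normal(loc=2.0, scale=1.5, size=10_000)
y = 0.5 * x + rng.normal(size=10_000)  # deliberately correlated with x

a, b = 3.0, -1.0  # arbitrary constants for the affine property

# Empirical covariance with the 1/n normalisation, matching np.mean as E(.)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

lin_sum = np.isclose(np.mean(x + y), x.mean() + y.mean())
lin_affine = np.isclose(np.mean(a * x + b), a * x.mean() + b)
product = np.isclose(np.mean(x * y), x.mean() * y.mean() + cov_xy)
var_shortcut = np.isclose(np.var(x), np.mean(x**2) - x.mean() ** 2)

print(lin_sum, lin_affine, product, var_shortcut)
```

Each identity holds exactly for the empirical moments (up to floating-point error), which is why `np.isclose` is used rather than a statistical tolerance.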

Population vs. Sample

A population of size $N$ is denoted as $X_1, \cdots, X_N$.
A random sample of size $n$ is denoted as $X_1, \cdots, X_n$.

Population Parameters

Population mean: $\mu = \frac{1}{N} \sum_{i=1}^N{X_i}$
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}{(X_i - \mu)^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample Statistic

Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^n{X_i}$
Sample variance: $S^2 = \frac{1}{n - 1}\sum_{i=1}^n{(X_i - \bar{X})^2}$
Sample standard deviation: $S = \sqrt{S^2}$
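
As a small sketch of the difference between the two sets of formulas, NumPy's `ddof` argument switches between the $\frac{1}{N}$ and $\frac{1}{n-1}$ normalisations. The numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical finite population of size N = 8
population = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu = population.mean()
sigma2 = population.var(ddof=0)  # divides by N
sigma = population.std(ddof=0)

# Hypothetical observed sample of size n = 4
sample = np.array([2.0, 4.0, 4.0, 4.0])
x_bar = sample.mean()
s2 = sample.var(ddof=1)  # divides by n - 1
s = sample.std(ddof=1)

print(mu, sigma2, sigma)
print(x_bar, s2, s)
```

Note that `np.var` and `np.std` default to `ddof=0`, so the sample versions need `ddof=1` explicitly.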

Further, since the sample statistics themselves are random variables, we can compute their expectation and variance. For example,

The expectation of the sample mean: $E(\bar{X})$
The variance of the sample mean (the square of its standard error): $Var(\bar{X})$

Random Sampling vs. Simple Random Sampling

According to Casella and Berger (2002, p.207 & 210),

Random sampling: Draw a sample of size $n$ from a finite population with replacement. The draws $X_1, \cdots, X_n$ are then independent and identically distributed (i.i.d.).
Simple random sampling: Draw a sample of size $n$ from a finite population without replacement. The draws are then identically distributed but not independent.

A similar definition appears in Wikipedia:

In small populations and often in large ones, [simple random] sampling is typically done “without replacement”, i.e., one deliberately avoids choosing any member of the population more than once.

In Bhattacharya, Lin & Patrangenaru (2016), the two cases are called simple random sampling with replacement (SRSWR) and simple random sampling without replacement (SRSWOR), respectively. The authors use the word “simple” to distinguish them from stratified random sampling.
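
The two schemes can be sketched with NumPy's `Generator.choice`, where the `replace` flag selects between them. The population, sample size, and seed here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
population = np.arange(1, 11)    # hypothetical population: units labelled 1..10
n = 5

# SRSWR: a unit may be drawn more than once; draws are independent
srswr = rng.choice(population, size=n, replace=True)

# SRSWOR: every unit appears at most once; draws are dependent
srswor = rng.choice(population, size=n, replace=False)

print("with replacement:   ", srswr)
print("without replacement:", srswor)
```

Only the second sample is guaranteed to contain $n$ distinct units.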

For SRSWR, we can derive that

$E(\bar{X}) = \mu$
$Var(\bar{X}) = \frac{1}{n}\sigma^2$
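
A short sketch of why these hold, assuming each draw satisfies $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ and that the draws are independent (which is what sampling with replacement gives):

$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n} \cdot n\mu = \mu$$

$$Var(\bar{X}) = \frac{1}{n^2} Var\left(\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{1}{n}\sigma^2$$

The second equality in the variance line is where independence is used: all covariance terms vanish.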

Similarly, for SRSWOR,

$E(\bar{X}) = \mu$
$Var(\bar{X}) = \left(\frac{1}{n} - \frac{n-1}{n(N-1)}\right)\sigma^2 = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$
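
A sketch of the variance result for SRSWOR. Here the draws are no longer independent, so the covariance terms remain. Writing $c = Cov(X_i, X_j)$ for $i \neq j$ (by symmetry it is the same for every pair), consider drawing the whole population ($n = N$): the sum is then a constant, so

$$0 = Var\left(\sum_{i=1}^N X_i\right) = N\sigma^2 + N(N-1)c \quad\Rightarrow\quad c = -\frac{\sigma^2}{N-1}$$

Substituting into the variance of the sample mean for a general $n$:

$$Var(\bar{X}) = \frac{1}{n^2}\left(n\sigma^2 + n(n-1)c\right) = \frac{1}{n^2}\left(n\sigma^2 - \frac{n(n-1)}{N-1}\sigma^2\right) = \left(\frac{1}{n} - \frac{n-1}{n(N-1)}\right)\sigma^2$$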

The proofs can be found here.
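
As a sanity check rather than a proof, the formulas for both schemes can be verified by exhaustive enumeration on a tiny population. The sketch below assumes a made-up population of five values and a sample size of two, and uses exact rational arithmetic so that the comparison has no rounding error. The helper name `mean_and_var_of_sample_mean` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 2, 3, 4, 5]  # hypothetical population values
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N


def mean_and_var_of_sample_mean(samples):
    """Exact E and Var of the sample mean over equally likely ordered samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    e = sum(means) / len(means)
    v = sum((m - e) ** 2 for m in means) / len(means)
    return e, v


# SRSWR: all N**n ordered tuples are equally likely
e_wr, v_wr = mean_and_var_of_sample_mean(list(product(population, repeat=n)))

# SRSWOR: all ordered tuples of n distinct units are equally likely
e_wor, v_wor = mean_and_var_of_sample_mean(list(permutations(population, n)))

formula_wr = sigma2 / n
formula_wor = (Fraction(1, n) - Fraction(n - 1, n * (N - 1))) * sigma2

print(e_wr, v_wr, formula_wr)
print(e_wor, v_wor, formula_wor)
```

For this population $\mu = 3$ and $\sigma^2 = 2$, so the enumerated variances are $1$ with replacement and $\frac{3}{4}$ without, matching both formulas.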
