Parametric Bootstrap in R: Estimating MSE, Empirical Power, the Jackknife, and the Nonparametric Bootstrap

2022-07-30 21:06 Author: 拓端tecdat

Full text: http://tecdat.cn/?p=27695

Original source: the 拓端數據部落 (Tecdat) WeChat official account

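Parametric Bootstrap: Estimate MSE

Statistical question: What is the MSE of a level $k$ trimmed mean?

How we can answer it: Estimate the MSE of the level $k$ trimmed mean for random samples of size 20 generated from a standard Cauchy distribution (a t distribution with df = 1). The target parameter $\theta$ is the center, or median; the mean of a Cauchy distribution does not exist. For the standard Cauchy distribution that center is $\theta = 0$. Summarize the MSE estimates in a table for $k = 1, 2, \ldots, 9$.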
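For reference, the two quantities in this exercise can be written out as follows. The notation is an assumption made here (sorted sample $x_{(1)} \le \cdots \le x_{(n)}$, and $m$ simulated samples indexed by $j$); the source does not give the formulas explicitly.

```latex
\[
\bar{x}_{[-k]} = \frac{1}{n - 2k} \sum_{i=k+1}^{n-k} x_{(i)},
\qquad
\widehat{\mathrm{MSE}}(k) = \frac{1}{m} \sum_{j=1}^{m} \left( \bar{x}^{(j)}_{[-k]} - \theta \right)^{2}
\]
```

With $n = 20$ and $k = 9$ only $x_{(10)}$ and $x_{(11)}$ remain, so the level 9 trimmed mean is the sample median.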

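m <- 1000; n <- 20                         # m is an assumed replicate count; it is not given in the excerpt
result <- rep(0, 9)
for (j in 1:9) {
  tmean <- numeric(m)
  for (i in 1:m) {
    x <- sort(rcauchy(n))
    tmean[i] <- mean(x[(j + 1):(n - j)])   # drop the j smallest and j largest values
  }
  result[j] <- mean(tmean^2)               # MSE about the target theta = 0
}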

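Parametric Bootstrap: Empirical Power Calculations

Statistical question: How does power change as the difference between the null hypothesis and reality changes?

How we can answer it: Plot an empirical power curve for a t-test.

The null hypothesis of the t-test is $\mu = 500$. The alternative is $\mu \ne 500$.

You will draw samples of size 20 from a normally distributed population with $\sigma = 100$. You will use a significance level of 0.05.

Show how the power changes as the actual mean of the population changes from 350 to 650 (in increments of 10).

The y-axis is the empirical power (estimated via bootstrap) and the x-axis is the different values of $\mu$ (350, 360, 370, …, 650).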

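n <- 20; sigma <- 100; mu0 <- 500; muA <- 450   # values from the problem statement and the comments below
x <- rnorm(n, mean = muA, sd = sigma)   # draw a sample from a population with mean 450
ts <- t.test(x, mu = mu0)               # t-test of the null hypothesis mu = 500
ts$p.value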
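The single p-value above can be repeated over the whole grid of true means to trace the power curve. The following is a minimal sketch, not the author's full code: the replicate count `m = 1000`, the seed, and the names `pvalue`, `mu_values` and `power` are assumptions chosen for illustration.

```r
# Empirical power of the two-sided one-sample t-test of H0: mu = 500.
# Assumption: m = 1000 replicates per true mean; the source does not state m.
set.seed(1)
mu0 <- 500; sigma <- 100; n <- 20; alpha <- 0.05
m <- 1000
mu_values <- seq(350, 650, by = 10)

# One p-value from one simulated sample whose true mean is mu_true
pvalue <- function(mu_true) {
  x <- rnorm(n, mean = mu_true, sd = sigma)
  t.test(x, mu = mu0)$p.value
}

# Proportion of rejections at each true mean
power <- sapply(mu_values, function(mu_true) {
  mean(replicate(m, pvalue(mu_true)) <= alpha)
})

plot(mu_values, power, type = "b",
     xlab = expression(mu), ylab = "Empirical power")
abline(h = alpha, lty = 2)   # reference line at the significance level
```

At $\mu = 500$ the null hypothesis is true, so the rejection rate there estimates the type I error rate and should sit near the dashed line.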

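Parametric Bootstrap: Empirical Power Calculations

Statistical question: How does sample size affect power?

How we can answer it: Create more power curves as the actual mean varies from 350 to 650, but produce them using samples of size n = 10, n = 20, n = 30, n = 40, and n = 50. Put all 5 power curves on the same plot.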

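# pvalue() is not shown in this excerpt; it presumably returns the p-value of one simulated t-test
pvals <- replicate(m, pvalue())
power <- mean(pvals <= 0.05)   # empirical power: share of the m tests rejected at the 0.05 level
# sequence and final2 are not shown either; presumably the grid of means and one row of power values per sample size
points(sequence, final2[2, ], col = "red", pch = 1)
points(sequence, final2[3, ], col = "blue", pch = 2)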
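Since the excerpt above omits the surrounding loop, here is a self-contained sketch of one way to produce all five curves. It is an illustration under stated assumptions, not the original code: `m = 500` replicates per point (chosen only to keep the run short), the seed, and the names `emp_power`, `sizes` and `power_mat` are all hypothetical.

```r
# Assumption: m = 500 replicates per (n, mu) pair; the source does not state m.
set.seed(2)
mu0 <- 500; sigma <- 100; alpha <- 0.05
m <- 500
mu_values <- seq(350, 650, by = 10)
sizes <- c(10, 20, 30, 40, 50)

# Empirical power for one sample size n and one true mean mu_true
emp_power <- function(n, mu_true) {
  pvals <- replicate(m, t.test(rnorm(n, mu_true, sigma), mu = mu0)$p.value)
  mean(pvals <= alpha)
}

# Rows index the sample sizes, columns index the true means
power_mat <- t(sapply(sizes, function(n) {
  sapply(mu_values, function(mu_true) emp_power(n, mu_true))
}))
dimnames(power_mat) <- list(paste0("n=", sizes), mu_values)

# matplot draws one curve per column, so transpose back before plotting
matplot(mu_values, t(power_mat), type = "b", pch = 1:5, lty = 1, col = 1:5,
        xlab = expression(mu), ylab = "Empirical power")
legend("bottomright", legend = rownames(power_mat),
       pch = 1:5, col = 1:5, lty = 1)
```

Storing the results as a matrix with one row per sample size keeps the plotting step to a single `matplot` call; larger samples should give steeper curves around $\mu = 500$.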

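Parametric Bootstrap: Empirical Confidence Level

Statistical question: When making a 95% CI, are we still 95% confident if our samples are small and do not come from a normal distribution?

How we can answer it: Create a large number of 95% confidence intervals for the mean of a population, each based on one sample.

\[\bar{x} \pm t^{*} \times \frac{s}{\sqrt{n}}\]

Your samples should be of size 16, drawn from a chi-squared distribution with 2 degrees of freedom.

Find the proportion of confidence intervals that fail to capture the true mean of the population. (Reminder: a chi-squared distribution with $k$ degrees of freedom has a mean of $k$.)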

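m <- 1000; n <- 16; miss <- logical(m)         # m is an assumed replicate count; it is not given in the excerpt
for (i in 1:m) {
  samp <- rchisq(n, df = 2)
  xbar <- mean(samp)                           # renamed so that mean() and sd() are not masked
  s <- sd(samp)
  upper <- xbar + qt(0.975, df = 15) * s / 4   # s / sqrt(16) = s / 4
  lower <- xbar - qt(0.975, df = 15) * s / 4
  miss[i] <- lower > 2 || upper < 2            # the true mean of chi-squared(2) is 2
}
mean(miss)                                     # proportion of intervals that miss the true mean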
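To judge whether the miss rate differs from the nominal 5%, it helps to run the same procedure on a normal population as a baseline. The sketch below does that. The normal baseline (mean 2, sd 2, matching the mean and standard deviation of the chi-squared distribution with 2 degrees of freedom), the replicate count `m = 5000`, and the helper name `miss_rate` are additions for illustration and are not part of the source.

```r
# Assumptions: m = 5000 replicates; the normal baseline is added for comparison only.
set.seed(3)

# Share of t-intervals that fail to cover true_mean.
# rdist is a function that draws a sample of a given size.
miss_rate <- function(rdist, true_mean, n = 16, m = 5000, level = 0.95) {
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  missed <- replicate(m, {
    x <- rdist(n)
    half <- tcrit * sd(x) / sqrt(n)     # half-width of the interval
    abs(mean(x) - true_mean) > half     # TRUE when the interval misses the mean
  })
  mean(missed)
}

chisq_miss  <- miss_rate(function(n) rchisq(n, df = 2), true_mean = 2)
normal_miss <- miss_rate(function(n) rnorm(n, mean = 2, sd = 2), true_mean = 2)
c(chisq = chisq_miss, normal = normal_miss)
```

Because the chi-squared distribution with 2 degrees of freedom is strongly right-skewed, one would expect its miss rate to come out above the normal baseline, which should stay close to 0.05.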

Nonparametric Bootstrap Confidence Interval

Statistical question: Based on one sample, can we create a confidence interval for the correlation of the population?

How we can answer it: Create a bootstrap t confidence interval estimate for the correlation statistic.

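boot.t.ci <-
function(x, B = 500, R = 100, level = .95, statistic){
x <- as.matrix(x)
# ... (the rest of the function body is not shown in this excerpt)
}
library(boot)       # for boot and boot.ci
data(law, package = "bootstrap")
dat <- law
# stat is not defined in this excerpt; it presumably returns the correlation of the two columns
ci <- boot.t.ci(dat, statistic = stat, B=2000, R=200)
ci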
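Because the function body is elided above, the following is a self-contained sketch of a bootstrap t interval for the correlation. It is one common way to write the procedure, not necessarily the author's version. The names `boot_t_ci`, `boot_se` and `cor_stat` are hypothetical, and the law school data are transcribed inline (an assumption made so the sketch runs without the bootstrap package; `data(law, package = "bootstrap")` is preferable when it is installed).

```r
# Law school data (15 schools), transcribed by hand for this sketch.
law <- data.frame(
  LSAT = c(576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594),
  GPA  = c(339, 330, 281, 303, 344, 307, 300, 343, 336, 313, 312, 274, 276, 288, 296)
)

# Statistic of interest: sample correlation of the two columns
cor_stat <- function(d) cor(d[, 1], d[, 2])

# Bootstrap t interval: B outer resamples, R inner resamples for each SE
boot_t_ci <- function(x, statistic, B = 500, R = 100, level = 0.95) {
  x <- as.matrix(x); n <- nrow(x)
  # Bootstrap SE of statistic(y), estimated from R resamples of y
  boot_se <- function(y) {
    sd(replicate(R, statistic(y[sample(n, n, replace = TRUE), , drop = FALSE])))
  }
  stat <- se <- numeric(B)
  for (b in 1:B) {
    y <- x[sample(n, n, replace = TRUE), , drop = FALSE]
    stat[b] <- statistic(y)
    se[b] <- boot_se(y)
  }
  stat0 <- statistic(x)
  t_stats <- (stat - stat0) / se           # studentized bootstrap replicates
  alpha <- 1 - level
  q <- quantile(t_stats, c(1 - alpha / 2, alpha / 2), type = 1, na.rm = TRUE)
  ci <- stat0 - q * sd(stat, na.rm = TRUE) # upper t quantile gives the lower limit
  names(ci) <- c("lower", "upper")
  ci
}

set.seed(4)
ci <- boot_t_ci(law, cor_stat, B = 500, R = 100)
ci
```

The nested resampling costs B × R evaluations of the statistic, which is why the inner count R is kept much smaller than B. The interval is not constrained to [-1, 1], so an upper limit above 1 can occur for a correlation.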

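Jackknife after bootstrap

Statistical question: What is the standard error of the bootstrap estimate of the standard error of R (the sample correlation)?

How we can answer it: Use data(law) as in the previous problem. Perform jackknife after bootstrap to get a standard error estimate of the standard error estimate. (The bootstrap is used to get an estimate of the SE of R in the population. The jackknife is then used to get an SE of that SE estimate.)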

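data(law, package = "bootstrap")
n <- nrow(law); B <- 2000   # B is an assumed number of replicates; it is not given in the excerpt
theta.b <- numeric(B)
indices <- matrix(0, nrow = B, ncol = n)
# run the bootstrap
for(b in 1:B){
i <- sample(1:n, size = n, replace = TRUE)
LSAT <- law$LSAT[i]
GPA <- law$GPA[i]
theta.b[b] <- cor(LSAT, GPA)
indices[b, ] <- i   # save the sampled indices for the jackknife
}
# jackknife after bootstrap
se.jack <- numeric(n)
for(i in 1:n){
keepers <- function(k){
!any(k == i)   # TRUE for a bootstrap sample that does not contain observation i
}
se.jack[i] <- sd(theta.b[apply(indices, 1, keepers)])
}
sqrt((n - 1) * mean((se.jack - mean(se.jack))^2))   # jackknife SE of the bootstrap SE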
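The quantity being estimated can be written as the usual jackknife standard error formula applied to the bootstrap SE. The notation below is an assumption made here and is not taken from the source: $\widehat{se}_{B(i)}$ denotes the standard deviation of the bootstrap replicates computed only from those resamples that do not contain observation $i$.

```latex
\[
\widehat{se}_{\mathrm{jack}}\!\left(\widehat{se}_{B}\right)
= \sqrt{\frac{n-1}{n} \sum_{i=1}^{n}
  \left( \widehat{se}_{B(i)} - \overline{\widehat{se}}_{B(\cdot)} \right)^{2}},
\qquad
\overline{\widehat{se}}_{B(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \widehat{se}_{B(i)}
\]
```

No extra resampling is needed: a given observation is absent from a resample with probability $(1 - 1/n)^{n}$, about 36% for $n = 15$, so each $\widehat{se}_{B(i)}$ reuses a sizeable subset of the existing bootstrap replicates.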

Self-test questions

Submit the rendered HTML file. Make sure all requested output (tables, graphs, etc.) appears in your document when you submit.

Parametric Bootstrap: Estimate MSE

Statistical question: What is the MSE of a level $k$ trimmed mean?

How we can answer it: Estimate the MSE of the level $k$ trimmed mean for random samples of size 20 generated from a standard Cauchy distribution (t-distribution w/df = 1). The target parameter $\theta$ is the center or median. The mean does not exist for a Cauchy distribution. Summarize the estimates of MSE in a table for $k = 1, 2, \ldots, 9$.

Parametric Bootstrap: Empirical Power Calculations

Statistical question: How does power change as the difference between the null hypothesis and the reality changes?

How we can answer it: Plot an empirical power curve for a t-test.

The null hypothesis of the t-test is $\mu = 500$. The alternative is $\mu \ne 500$.

You will draw samples of size 20 from a normally distributed population with $\sigma = 100$. You will use a significance level of 0.05.

Show how the power changes as the actual mean of the population changes from 350 to 650 (increments of 10).

On the y-axis will be the empirical power (estimated via bootstrap) and the x-axis will be the different values of $\mu$ (350, 360, 370, …, 650).

Parametric Bootstrap: Empirical Power Calculations

Statistical question: How does sample size affect power?

How we can answer it: Create more power curves as the actual mean varies from 350 to 650, but produce them using samples of size n = 10, n = 20, n = 30, n = 40, and n = 50. Put all 5 power curves on the same plot.

Parametric Bootstrap: Empirical Confidence Level

Statistical question: When making a 95% CI, are we still 95% confident if our samples are small and do not come from a normal distribution?

How we can answer it: Create a bunch of Confidence Intervals (95%) for the mean of a population based on a sample.

\[\bar{x} \pm t^{*} \times \frac{s}{\sqrt{n}}\]

Your samples should be of size 16, drawn from a chi-squared distribution with 2 degrees of freedom.

Find the proportion of Confidence Intervals that fail to capture the true mean of the population. (Reminder: a chi-squared distribution with $k$ degrees of freedom has a mean of $k$.)

Nonparametric Bootstrap Confidence Interval

Statistical question: Based on one sample, can we create a confidence interval for the correlation of the population?

How we can answer it: Create a bootstrap t confidence interval estimate for the correlation statistic.

Jackknife after bootstrap

Statistical question: What is the standard error of the bootstrap estimate of the standard error of R?

How we can answer it: Use data(law) like the previous problem. Perform Jackknife after bootstrap to get a standard error estimate of the standard error estimate. (The bootstrap is used to get an estimate of the SE of R in the population. The jackknife is then used to get an SE of that SE estimate.)
