R programming help: STA305/1004 Homework 2 (with answers)
Full article link: https://tecdat.cn/?p=33341
The NHEFS survey was designed to investigate the relationships between clinical, nutritional, and behavioural factors assessed in the first National Health and Nutrition Examination Survey (NHANES I) and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. For more information, see http://www.cdc.gov/nchs/nhanes/nhefs/nhefs.htm. This question uses these data to estimate the average causal effect of smoking cessation on weight gain.
(a) Individuals were classified as treated if they reported being smokers at baseline (1971-75) and reported in the 1982 survey that they had quit smoking. The latter implies that the individuals included in our study did not die and were not otherwise lost to follow-up between baseline and 1982 (otherwise they could not have responded to the survey). That is, we selected individuals into our study conditional on an event (responding to the 1982 survey) that occurred after smoking cessation began. If smoking cessation affects the probability of selection into the study, we might have selection bias (Hernán and Robins, 2014, Chapter 12, page 11).
Would a randomized experiment of smoking cessation have this problem? How could a randomized experiment of smoking cessation be designed? What is the major difference between such a randomized experiment and this observational study (the NHEFS survey)?
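Randomization makes the treated and untreated groups comparable at baseline, but that comparability can be lost if the analysis keeps only subjects who remain in the study. The simulation sketch below illustrates this with entirely made-up variables (quit, u, wtgain, selected): quitting is randomized and has no effect on weight gain, yet a hypothetical unmeasured health factor u drives both follow-up and weight gain, and quitting also changes the chance of follow-up. It is an illustration of the selection-bias argument, not an analysis of the NHEFS data.

```r
# Simulation sketch (hypothetical variables): randomized treatment, but the
# analysis is restricted to subjects who respond at follow-up (selected == 1).
set.seed(305)
n <- 100000
quit <- rbinom(n, 1, 0.5)          # randomized "treatment"
u <- rnorm(n)                      # hypothetical unmeasured health factor
wtgain <- 0 * quit + u + rnorm(n)  # true causal effect of quitting is 0
# Follow-up depends on both treatment and u (quitters more likely to respond)
selected <- as.numeric(u + 1.5 * quit + rnorm(n) > 1)

# Difference in mean outcome among everyone randomized
diff_all <- mean(wtgain[quit == 1]) - mean(wtgain[quit == 0])
# Difference among responders only
diff_sel <- mean(wtgain[quit == 1 & selected == 1]) -
  mean(wtgain[quit == 0 & selected == 1])
round(c(all = diff_all, selected_only = diff_sel), 2)
```

Among everyone randomized the difference is close to the true value of 0, while among responders it is clearly negative, because responders who did not quit needed a higher u to be selected.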
(b) Should a statistician be concerned that using the NHEFS data to compare weight gain between subjects who quit smoking and those who did not is biased? If yes, state why you think the comparison might be biased; otherwise, state why the comparison is unbiased.
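One reason an unadjusted comparison can be biased is confounding: a baseline characteristic that affects both the decision to quit and weight change. The sketch below simulates this with a hypothetical standardized age variable (age_z) and a made-up true effect of 3; the coefficients are assumptions chosen for illustration, not estimates from NHEFS. Adjusting for the confounder (here by linear regression, since it is the only confounder in this toy model) recovers the true effect, while the raw difference in means does not.

```r
# Simulation sketch of confounding (hypothetical variables and coefficients).
set.seed(1004)
n <- 100000
age_z <- rnorm(n)                              # hypothetical standardized age
quit <- rbinom(n, 1, plogis(age_z))            # older subjects quit more often
wtgain <- 3 * quit - 2 * age_z + rnorm(n)      # true effect of quitting = 3

# Unadjusted difference in mean weight gain
naive <- mean(wtgain[quit == 1]) - mean(wtgain[quit == 0])
# Effect estimate after adjusting for the confounder
adjusted <- unname(coef(lm(wtgain ~ quit + age_z))["quit"])
round(c(naive = naive, adjusted = adjusted), 2)
```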
(c) Use R to estimate the propensity score for each subject in the study. Use the variables sex, race, age, education.code, smokeintensity, smokeyrs, exercise, active, and wt71 as covariates. After calculating the propensity score, use the Match function from the R package Matching to match subjects on the propensity score. Does the balance between the two groups improve after matching? Hand in your R code and output.
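The steps in (c) can be sketched in base R on simulated data: fit a logistic model for treatment, match each treated unit to the control with the closest estimated propensity score (one match, with replacement), and compare covariate balance before and after. This is an illustration under stated assumptions, not the Matching package's implementation: the covariates x1 and x2 are hypothetical stand-ins for the NHEFS covariates, and the standardized mean difference (SMD) is used as one common balance measure.

```r
# Sketch: propensity score, 1:1 nearest-neighbour matching with replacement,
# and balance check by standardized mean difference (simulated data).
set.seed(2023)
n <- 5000
x1 <- rnorm(n)                      # hypothetical continuous covariate
x2 <- rbinom(n, 1, 0.4)             # hypothetical binary covariate
tr <- rbinom(n, 1, plogis(-1 + 0.8 * x1 + 0.7 * x2))
dat <- data.frame(tr, x1, x2)

ps_fit <- glm(tr ~ x1 + x2, family = binomial(), data = dat)
ps <- fitted(ps_fit)                # estimated propensity scores

treated <- which(dat$tr == 1)
controls <- which(dat$tr == 0)
# For each treated unit, the control with the closest score (controls reusable)
match_ctrl <- controls[sapply(treated, function(i)
  which.min(abs(ps[controls] - ps[i])))]

# Standardized mean difference between two index sets
smd <- function(x, t_idx, c_idx) {
  (mean(x[t_idx]) - mean(x[c_idx])) /
    sqrt((var(x[t_idx]) + var(x[c_idx])) / 2)
}
before <- sapply(dat[c("x1", "x2")], smd, t_idx = treated, c_idx = controls)
after <- sapply(dat[c("x1", "x2")], smd, t_idx = treated, c_idx = match_ctrl)
round(rbind(before, after), 3)
```

Smaller absolute SMDs after matching indicate improved balance; in the actual assignment the Matching package's balance output would play this role.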
(d) Estimate the effect of smoking cessation on weight gain using propensity score matching. Did propensity score matching reduce the bias in estimating the treatment effect? What assumption must we make to conclude that smoking cessation causes weight gain? Do you think this assumption is valid? Briefly explain. Hand in your R code and output.
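The key assumption for a causal reading of a propensity-score-matched estimate is that the covariates in the score model capture all confounding (conditional exchangeability, or no unmeasured confounding), together with positivity. The simulation sketch below, with a measured covariate x, a hypothetical unmeasured confounder u, and a made-up true effect of 3, shows what happens when that assumption fails: matching on a score built from x alone reduces but does not remove the bias, while a score that also includes u recovers the effect. The helper att_match is a hypothetical base-R function written for this illustration.

```r
# Sketch: matching removes confounding only by variables in the score model.
set.seed(71)
n <- 4000
x <- rnorm(n)                                  # measured confounder
u <- rnorm(n)                                  # hypothetical unmeasured confounder
tr <- rbinom(n, 1, plogis(-0.5 + x + u))
y <- 3 * tr + 2 * x + 2 * u + rnorm(n)         # true effect = 3

# Hypothetical helper: ATT by 1:1 nearest-neighbour matching with replacement
att_match <- function(ps, tr, y) {
  t_idx <- which(tr == 1)
  c_idx <- which(tr == 0)
  m <- c_idx[sapply(t_idx, function(i) which.min(abs(ps[c_idx] - ps[i])))]
  mean(y[t_idx] - y[m])
}
ps_x <- fitted(glm(tr ~ x, family = binomial()))       # u omitted
ps_xu <- fitted(glm(tr ~ x + u, family = binomial()))  # u included
est <- c(naive = mean(y[tr == 1]) - mean(y[tr == 0]),
         match_x = att_match(ps_x, tr, y),
         match_xu = att_match(ps_xu, tr, y))
round(est, 2)
```

In real data u cannot be included, so whether the measured NHEFS covariates are sufficient is a judgment that the data alone cannot confirm.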
prop.model <- glm(qsmk ~ sex + race + age + education.code + smokeintensity +
                    smokeyrs + exercise + active + wt71,
                  family = binomial(), data = nhefshwdat)
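We fit a generalized linear model (logistic regression) to the full sample, with quitting smoking (qsmk) as the dependent variable and the nine covariates (sex, race, age, education and the others listed above) as predictors, and then estimate each subject's probability of quitting.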

The fitted model expresses the log-odds of quitting (the binary variable qsmk) as a linear function of the covariates.
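To make the "linear on the log-odds scale" point concrete, the sketch below uses simulated data with hypothetical variables (z1, z2, quit) rather than the NHEFS file. It checks that predict(type = "link") equals the linear predictor computed by hand from the model matrix and coefficients, and that predict(type = "response"), the estimated propensity score, is its inverse logit.

```r
# Sketch (simulated, hypothetical variables): logistic regression is linear
# in the log-odds; the predicted probability is the inverse logit of that.
set.seed(35)
n <- 500
z1 <- rnorm(n)
z2 <- rbinom(n, 1, 0.5)
quit <- rbinom(n, 1, plogis(-1 + 0.5 * z1 + 0.8 * z2))
fit <- glm(quit ~ z1 + z2, family = binomial())

eta <- predict(fit, type = "link")             # log-odds, linear in z1, z2
p <- predict(fit, type = "response")           # estimated P(quit = 1)
lin <- drop(model.matrix(fit) %*% coef(fit))   # same linear predictor by hand
c(link_matches = isTRUE(all.equal(unname(eta), unname(lin))),
  response_is_plogis = isTRUE(all.equal(unname(p), unname(plogis(eta)))))
# both TRUE
```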
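# Probability of the treatment each subject actually received (quit or not),
# predicted from the propensity model above
nhefshwdat$p.qsmk.obs <- ifelse(nhefshwdat$qsmk == 0,
                                1 - predict(prop.model, type = "response"),
                                predict(prop.model, type = "response"))
X <- fitted(prop.model)    # estimated propensity score (probability of quitting) per subject
Y <- nhefshwdat$wt82_71    # outcome: weight change from 1971 to 1982
Tr <- nhefshwdat$qsmk      # treatment: 1 = quit smoking, 0 = did not quit
library(Matching)          # load the Matching package
# For each quitter, find the non-quitter with the closest propensity score (M = 1).
# By default Match() estimates the ATT and matches with replacement, so one
# non-quitter can be matched to several quitters. The estimate is the average
# weight-change difference within matched sets.
rr <- Match(Y = Y, Tr = Tr, X = X, M = 1)
summary(rr)
# Covariate balance before and after matching
MatchBalance(qsmk ~ sex + race + age + education.code + smokeintensity +
               smokeyrs + exercise + active + wt71,
             data = nhefshwdat, match.out = rr)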
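Match() matches with replacement by default (replace = TRUE), so M = 1 means one match per quitter, not that every control is used only once. The toy example below, with made-up propensity scores, contrasts nearest-neighbour matching with replacement against greedy matching without replacement. It is a base-R illustration of the idea, not the Matching package's algorithm (which, for instance, also handles ties).

```r
# Toy example with made-up propensity scores.
ps_treated <- c(0.62, 0.60, 0.30)
ps_control <- c(0.61, 0.20, 0.35, 0.90)

# With replacement: each treated unit takes its closest control
with_repl <- sapply(ps_treated, function(p) which.min(abs(ps_control - p)))

# Without replacement (greedy, in the order given): drop controls once used
without_repl <- integer(0)
available <- seq_along(ps_control)
for (p in ps_treated) {
  j <- available[which.min(abs(ps_control[available] - p))]
  without_repl <- c(without_repl, j)
  available <- setdiff(available, j)
}
with_repl     # 1 1 3: control 1 is reused
without_repl  # 1 3 2: each control used at most once
```

Without replacement the second treated unit gets a worse match (0.35 instead of 0.61), which is the usual trade-off: reuse of controls improves match quality at the cost of higher variance.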
