
Generalized Linear Models in R for Claim Frequency Prediction: Overdispersion, Exposure, and Tree Visualization

2021-04-13 10:39 Author: 拓端tecdat

Original link: http://tecdat.cn/?p=13963

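In actuarial science and insurance ratemaking, taking exposure into account can be a nightmare. Somehow, simple results become more complicated to compute, just because we have to account for the fact that exposure is a heterogeneous variable.

Exposure in insurance ratemaking can be seen as a censoring problem (in my dataset, exposure is always smaller than 1, since observations are contracts, not policyholders). The variable of interest is unobserved, because we have to price an insurance contract for a coverage period of one (full) year. So we have to model the annual frequency of insurance claims.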

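In our dataset, we consider the ratio of the total number of claims to the total exposure. For example, if we consider a Poisson process with annual intensity $\lambda$, so that contract $i$ with exposure $E_i$ has $Y_i \sim \mathcal{P}(\lambda E_i)$, the likelihood is

$$\mathcal{L}(\lambda; Y, E) = \prod_{i=1}^{n} \frac{e^{-\lambda E_i} (\lambda E_i)^{Y_i}}{Y_i!}$$

that is,

$$\log \mathcal{L}(\lambda; Y, E) = -\lambda \sum_{i=1}^{n} E_i + \sum_{i=1}^{n} Y_i \log(\lambda E_i) - \sum_{i=1}^{n} \log(Y_i!)$$

So we have an estimator of the expected value, a natural one, obtained from the first-order condition: $\widehat{\lambda} = \sum_{i=1}^{n} Y_i \big/ \sum_{i=1}^{n} E_i$.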

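Now we need to estimate the variance, or more precisely the conditional variance.

This can be used to check whether the Poisson assumption is valid for modeling frequency. Consider the following dataset,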

> nombre=rbind(nombre1,nombre2)
> baseFREQ = merge(contrat,nombre)

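Here we have two variables of interest: the exposure of each contract,

> E <- baseFREQ$exposition

and the (observed) number of claims over that period,

> Y <- baseFREQ$nbre

Without covariates, we can compute the average (annualized) number of claims per contract and the associated variance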

> (mean=weighted.mean(Y/E,E))
[1] 0.07279295
> (variance=sum((Y-mean*E)^2)/sum(E))
[1] 0.08778567
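To see what these two estimators should look like when the Poisson assumption does hold, here is a minimal sketch on simulated data. The portfolio size, the exposure distribution, and the true frequency of 0.07 are all assumptions made for illustration; they are not taken from the dataset used in the article.

```r
# Simulated portfolio; every name and parameter value here is an illustrative assumption
set.seed(1)
n <- 50000
E <- runif(n, min = 0.05, max = 1)   # exposure in years, always below 1
lambda <- 0.07                       # assumed true annual claim frequency
Y <- rpois(n, lambda * E)            # claim counts observed over each exposure

# Exposure-weighted mean of the annualized counts;
# algebraically this is sum(Y) / sum(E)
m <- weighted.mean(Y / E, E)

# Exposure-weighted variance, centred on the expected count m * E
v <- sum((Y - m * E)^2) / sum(E)

print(c(mean = m, variance = v, ratio = v / m))
```

With data that really are Poisson, the ratio of variance to mean should sit close to 1, which gives a reference point for reading the ratio of about 1.2 obtained on the real portfolio.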

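It looks like the variance is (slightly) larger than the mean (we will see in a few weeks how to test this more formally). We can add a covariate, for example the population density of the area where the policyholder lives,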


Density, zone 11 average = 0.07962411  variance = 0.08711477
Density, zone 21 average = 0.05294927  variance = 0.07378567
Density, zone 22 average = 0.09330982  variance = 0.09582698
Density, zone 23 average = 0.06918033  variance = 0.07641805
Density, zone 24 average = 0.06004009  variance = 0.06293811
Density, zone 25 average = 0.06577788  variance = 0.06726093
Density, zone 26 average = 0.0688496   variance = 0.07126078
Density, zone 31 average = 0.07725273  variance = 0.09067
Density, zone 41 average = 0.03649222  variance = 0.03914317
Density, zone 42 average = 0.08333333  variance = 0.1004027
Density, zone 43 average = 0.07304602  variance = 0.07209618
Density, zone 52 average = 0.06893741  variance = 0.07178091
Density, zone 53 average = 0.07725661  variance = 0.07811935
Density, zone 54 average = 0.07816105  variance = 0.08947993
Density, zone 72 average = 0.08579731  variance = 0.09693305
Density, zone 73 average = 0.04943033  variance = 0.04835521
Density, zone 74 average = 0.1188611   variance = 0.1221675
Density, zone 82 average = 0.09345635  variance = 0.09917425
Density, zone 83 average = 0.04299708  variance = 0.05259835
Density, zone 91 average = 0.07468126  variance = 0.3045718
Density, zone 93 average = 0.08197912  variance = 0.09350102
Density, zone 94 average = 0.03140971  variance = 0.04672329
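The code that produced this listing is not shown. One way to obtain per-class averages and variances of this kind is sketched below, on simulated data. The helper `group_stats()`, the three zone labels, and the rates attached to them are hypothetical and only serve the illustration.

```r
# Simulated data; zone labels and claim rates are assumptions, not the article's data
set.seed(2)
n <- 20000
zone <- factor(sample(c("11", "21", "22"), n, replace = TRUE))
E <- runif(n, 0.05, 1)
rate <- c("11" = 0.08, "21" = 0.05, "22" = 0.09)[as.character(zone)]
Y <- rpois(n, rate * E)

# Hypothetical helper: exposure-weighted mean and variance within each level of X
group_stats <- function(Y, E, X) {
  X <- droplevels(as.factor(X))
  per_class <- lapply(split(data.frame(Y = Y, E = E), X), function(d) {
    m <- sum(d$Y) / sum(d$E)                      # annualized frequency in the class
    data.frame(exposure = sum(d$E),
               average  = m,
               variance = sum((d$Y - m * d$E)^2) / sum(d$E))
  })
  do.call(rbind, per_class)                       # one row per class, named by level
}

stats <- group_stats(Y, E, zone)
print(stats)
```

The columns `stats$average`, `stats$variance`, and `stats$exposure` then play the role of the vectors `meani`, `variancei`, and `Ei` used for plotting.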

This information can be visualized

> plot(meani,variancei,cex=sqrt(Ei),col="grey",pch=19,
+ xlab="Empirical average",ylab="Empirical variance")
> points(meani,variancei,cex=sqrt(Ei))

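The size of each circle reflects the size of the group (the area is proportional to the total exposure within the group). The first diagonal corresponds to the Poisson model, where the variance should equal the mean. Other covariates can be considered as well,

or the car brand,

The driver's age can also be treated as a categorical variable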

http://freakonometrics.hypotheses.org/files/2013/02/Capture-d%E2%80%99e%CC%81cran-2013-02-01-a%CC%80-10.51.40.png

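Let us look more carefully at the different age groups,

On the right, we can see the young (inexperienced) drivers. That was expected. But some of these classes are below the first diagonal: the expected frequency is large, but the variance is not. In other words, we can be sure that young drivers have more car accidents. It is not a heterogeneous class, quite the opposite: young drivers can be seen as a relatively homogeneous class, with a high accident frequency.

Using the original dataset (here, I only used a subset with 50,000 customers), we do get the following graph:

The circles move downward from age 18 to age 25, which shows a clear experience effect.

We can also treat exposure as an ordinary covariate and check whether its coefficient is actually equal to 1. Without any other covariate,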



Call:
glm(formula = Y ~ log(E), family = poisson("log"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.3988  -0.3388  -0.2786  -0.1981  12.9036

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.83045    0.02822 -100.31   <2e-16 ***
log(E)       0.53950    0.02905   18.57   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 12931  on 49999  degrees of freedom
Residual deviance: 12475  on 49998  degrees of freedom
AIC: 16150

Number of Fisher Scoring iterations: 6

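That is, the parameter is clearly strictly smaller than 1. And this is not a matter of statistical significance: the hypothesis that it equals 1 is rejected,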

Linear hypothesis test

Hypothesis:
log(E) = 1

Model 1: restricted model
Model 2: Y ~ log(E)

  Res.Df Df  Chisq Pr(>Chisq)
1  49999
2  49998  1 251.19  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
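The test output above has the layout printed by `linearHypothesis()` from the car package, although the call itself is not shown. As a sketch that needs only base R, the same hypothesis can be checked by reparametrising the model. The data below are simulated under the assumption that exposure acts proportionally, so here the coefficient should come out close to 1, unlike on the real portfolio.

```r
# Simulated data where the claim count is proportional to exposure (an assumption)
set.seed(3)
n <- 50000
E <- runif(n, 0.05, 1)
Y <- rpois(n, 0.07 * E)

# Exposure as an ordinary covariate: its coefficient beta is estimated freely
free <- glm(Y ~ log(E), family = poisson("log"))

# Same model with log(E) also entered as an offset: the fitted coefficient
# on log(E) is now (beta - 1), so its z-test is a Wald test of H0: beta = 1
shifted <- glm(Y ~ log(E) + offset(log(E)), family = poisson("log"))

beta <- unname(coef(free)["log(E)"])
wald <- summary(shifted)$coefficients["log(E)", ]

print(beta)
print(wald)   # estimate of beta - 1, its standard error, z value and p-value
```

The usual ratemaking model, `glm(Y ~ offset(log(E)), family = poisson("log"))`, corresponds to imposing a coefficient of exactly 1, which is the restriction being tested.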

The same can be done with covariates included,



Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7114  -0.3200  -0.2637  -0.1896  12.7104

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                 -14.07321  181.04892  -0.078 0.938042
log(exposition)               0.56781    0.03029  18.744  < 2e-16 ***
carburantE                   -0.17979    0.04630  -3.883 0.000103 ***
as.factor(ageconducteur)19   12.18354  181.04915   0.067 0.946348
as.factor(ageconducteur)20   12.48752  181.04902   0.069 0.945011

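So assuming that exposure is an exogenous variable here may be too strong an assumption.

Next, we turn to overdispersion when modeling claim frequency. Earlier, I discussed how to compute the empirical variance when exposures differ. But I only used one factor to define the classes. Of course, more factors can be used. For example, using the Cartesian product of factors,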


Class D A (17,24]  average = 0.06274415  variance = 0.06174966
Class D A (24,40]  average = 0.07271905  variance = 0.07675049
Class D A (40,65]  average = 0.05432262  variance = 0.06556844
Class D A (65,101] average = 0.03026999  variance = 0.02960885
Class D B (17,24]  average = 0.2383109   variance = 0.2442396
Class D B (24,40]  average = 0.06662015  variance = 0.07121064
Class D B (40,65]  average = 0.05551854  variance = 0.05543831
Class D B (65,101] average = 0.0556386   variance = 0.0540786
Class D C (17,24]  average = 0.1524552   variance = 0.1592623
Class D C (24,40]  average = 0.0795852   variance = 0.09091435
Class D C (40,65]  average = 0.07554481  variance = 0.08263404
Class D C (65,101] average = 0.06936605  variance = 0.06684982
Class D D (17,24]  average = 0.1584052   variance = 0.1552583
Class D D (24,40]  average = 0.1079038   variance = 0.121747
Class D D (40,65]  average = 0.06989518  variance = 0.07780811
Class D D (65,101] average = 0.0470501   variance = 0.04575461
Class D E (17,24]  average = 0.2007164   variance = 0.2647663
Class D E (24,40]  average = 0.1121569   variance = 0.1172205
Class D E (40,65]  average = 0.106563    variance = 0.1068348
Class D E (65,101] average = 0.1572701   variance = 0.2126338
Class D F (17,24]  average = 0.2314815   variance = 0.1616788
Class D F (24,40]  average = 0.1690485   variance = 0.1443094
Class D F (40,65]  average = 0.08496827  variance = 0.07914423
Class D F (65,101] average = 0.1547769   variance = 0.1442915
Class E A (17,24]  average = 0.1275345   variance = 0.1171678
Class E A (24,40]  average = 0.04523504  variance = 0.04741449
Class E A (40,65]  average = 0.05402834  variance = 0.05427582
Class E A (65,101] average = 0.04176129  variance = 0.04539265
Class E B (17,24]  average = 0.1114712   variance = 0.1059153
Class E B (24,40]  average = 0.04211314  variance = 0.04068724
Class E B (40,65]  average = 0.04987117  variance = 0.05096601
Class E B (65,101] average = 0.03123003  variance = 0.03041192
Class E C (17,24]  average = 0.1256302   variance = 0.1310862
Class E C (24,40]  average = 0.05118006  variance = 0.05122782
Class E C (40,65]  average = 0.05394576  variance = 0.05594004
Class E C (65,101] average = 0.04570239  variance = 0.04422991
Class E D (17,24]  average = 0.1777142   variance = 0.1917696
Class E D (24,40]  average = 0.06293331  variance = 0.06738658
Class E D (40,65]  average = 0.08532688  variance = 0.2378571
Class E D (65,101] average = 0.05442916  variance = 0.05724951
Class E E (17,24]  average = 0.1826558   variance = 0.2085505
Class E E (24,40]  average = 0.07804062  variance = 0.09637156
Class E E (40,65]  average = 0.08191469  variance = 0.08791804
Class E E (65,101] average = 0.1017367   variance = 0.1141004
Class E F (17,24]  average = 0           variance = 0
Class E F (24,40]  average = 0.07731177  variance = 0.07415932
Class E F (40,65]  average = 0.1081142   variance = 0.1074324
Class E F (65,101] average = 0.09071118  variance = 0.1170159
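The class labels above look like a fuel type, a zone letter, and an age band pasted together. A minimal sketch of how such a Cartesian product of factors can be built is given below. The variable names `fuel`, `zone`, and `age`, the simulated rates, and the reduced number of zones are assumptions; only the age breaks are read off the listing.

```r
# Simulated data; variable names and rates are illustrative assumptions
set.seed(4)
n <- 30000
fuel <- factor(sample(c("D", "E"), n, replace = TRUE))
zone <- factor(sample(LETTERS[1:3], n, replace = TRUE))
age  <- sample(18:100, n, replace = TRUE)
E <- runif(n, 0.05, 1)
Y <- rpois(n, ifelse(age <= 24, 0.15, 0.06) * E)

# Age bands with the same breaks as in the listing: (17,24], (24,40], (40,65], (65,101]
age_band <- cut(age, breaks = c(17, 24, 40, 65, 101))

# Cartesian product of the three factors; empty combinations are dropped
cls <- interaction(fuel, zone, age_band, sep = " ", drop = TRUE)

ve <- tapply(E, cls, sum)                          # total exposure per class
vm <- tapply(Y, cls, sum) / ve                     # empirical average per class
fitted_count <- as.vector(vm)[as.integer(cls)] * E # expected count for each contract
vv <- tapply((Y - fitted_count)^2, cls, sum) / ve  # empirical variance per class

print(head(cbind(average = vm, variance = vv)))
```

The names `vm`, `vv`, and `ve` are chosen to match the vectors used in the plotting call for this section.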

Again, the variance can be plotted against the mean,

> plot(vm,vv,cex=sqrt(ve),col="grey",pch=19,
+ xlab="Empirical average",ylab="Empirical variance")
> points(vm,vv,cex=sqrt(ve))
> abline(a=0,b=1,lty=2)

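An alternative is to use a tree. The tree could be obtained from another variable, but it should be fairly close to our ideal model. Here, I did use the whole database (more than 600,000 rows).

The tree is the following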

> plot(T)
> text(T)
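How the tree `T` was fitted is not shown. As one possible approach, and not necessarily the one used here, the sketch below grows a Poisson regression tree with `rpart` (a recommended package shipped with R) on simulated data, then uses the leaves as classes. The covariates, rates, and the `cp` value are assumptions.

```r
library(rpart)   # recommended package distributed with R

# Simulated data; covariates and rates are illustrative assumptions
set.seed(5)
n <- 20000
d <- data.frame(
  age  = sample(18:90, n, replace = TRUE),
  fuel = factor(sample(c("D", "E"), n, replace = TRUE)),
  E    = runif(n, 0.05, 1)
)
d$Y <- rpois(n, ifelse(d$age < 25, 0.30, 0.06) * d$E)

# Poisson tree: the response is a two-column matrix (exposure, claim count)
fit <- rpart(cbind(E, Y) ~ age + fuel, data = d,
             method = "poisson", control = rpart.control(cp = 0.005))

# fit$where gives, for every contract, the leaf it falls into: one class per leaf
leaf <- factor(fit$where)
ve <- tapply(d$E, leaf, sum)
m  <- tapply(d$Y, leaf, sum) / ve
v  <- tapply((d$Y - as.vector(m)[as.integer(leaf)] * d$E)^2, leaf, sum) / ve

print(cbind(exposure = ve, average = m, variance = v))
```

Calling `plot(fit)` followed by `text(fit)` draws the fitted tree in the same way as the two commands above.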

Now, each leaf of the tree defines a class, and each of these classes is supposed to be homogeneous.


Class 6   average = 0.04010406  variance = 0.04424163
Class 8   average = 0.05191127  variance = 0.05948133
Class 9   average = 0.07442635  variance = 0.08694552
Class 10  average = 0.4143646   variance = 0.4494002
Class 11  average = 0.1917445   variance = 0.1744355
Class 15  average = 0.04754595  variance = 0.05389675
Class 20  average = 0.08129577  variance = 0.0906322
Class 22  average = 0.05813419  variance = 0.07089811
Class 23  average = 0.06123807  variance = 0.07010473
Class 24  average = 0.06707301  variance = 0.07270995
Class 25  average = 0.3164557   variance = 0.2026906
Class 26  average = 0.08705041  variance = 0.108456
Class 27  average = 0.06705214  variance = 0.07174673
Class 30  average = 0.05292652  variance = 0.06127301
Class 31  average = 0.07195285  variance = 0.08620593
Class 32  average = 0.08133722  variance = 0.08960552
Class 34  average = 0.1831559   variance = 0.2010849
Class 39  average = 0.06173885  variance = 0.06573939
Class 41  average = 0.07089419  variance = 0.07102932
Class 44  average = 0.09426152  variance = 0.1032255
Class 47  average = 0.03641669  variance = 0.03869702
Class 49  average = 0.0506601   variance = 0.05089276
Class 50  average = 0.06373107  variance = 0.06536792
Class 51  average = 0.06762947  variance = 0.06926191
Class 56  average = 0.06771764  variance = 0.07122379
Class 57  average = 0.04949142  variance = 0.05086885
Class 58  average = 0.2459016   variance = 0.2451116
Class 59  average = 0.05996851  variance = 0.0615773
Class 61  average = 0.07458053  variance = 0.0818608
Class 63  average = 0.06203737  variance = 0.06249892
Class 64  average = 0.07321618  variance = 0.07603106
Class 66  average = 0.07332127  variance = 0.07262425
Class 68  average = 0.07478147  variance = 0.07884597
Class 70  average = 0.06566728  variance = 0.06749411
Class 71  average = 0.09159605  variance = 0.09434413
Class 75  average = 0.03228927  variance = 0.03403198
Class 76  average = 0.04630848  variance = 0.04861813
Class 78  average = 0.05342351  variance = 0.05626653
Class 79  average = 0.05778622  variance = 0.05987139
Class 80  average = 0.0374993   variance = 0.0385351
Class 83  average = 0.06721729  variance = 0.07295168
Class 86  average = 0.09888492  variance = 0.1131409
Class 87  average = 0.1019186   variance = 0.2051122
Class 88  average = 0.05281703  variance = 0.0635244
Class 91  average = 0.08332136  variance = 0.09067632
Class 96  average = 0.07682093  variance = 0.08144446
Class 97  average = 0.0792268   variance = 0.08092019
Class 99  average = 0.1019089   variance = 0.1072126
Class 100 average = 0.1018262   variance = 0.1081117
Class 101 average = 0.1106647   variance = 0.1151819
Class 103 average = 0.08147644  variance = 0.08411685
Class 104 average = 0.06456508  variance = 0.06801061
Class 107 average = 0.1197225   variance = 0.1250056
Class 108 average = 0.0924619   variance = 0.09845582
Class 109 average = 0.1198932   variance = 0.1209162

Here, when plotting the empirical variance against the empirical mean of claims, we get

Here, we can identify the classes with residual heterogeneity.

