Implementing a Credit Scorecard Model in Python (Part 2)

1 Project Background
A credit scorecard is usually built on logistic regression, which is a binary classification model. In Python, import LogisticRegression from sklearn.linear_model.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Variables selected for modeling (quantitative + qualitative)
model_data = data[np.append(quant_model_vars, qual_model_vars)]

# Assemble the WOE-encoded features (the *_WoE values are computed earlier)
model_data_WOE = pd.DataFrame()
model_data_WOE['duration'] = duration_WoE
model_data_WOE['amount'] = amount_WoE
model_data_WOE['age'] = age_WoE
model_data_WOE['installment_rate'] = installment_rate_WoE
model_data_WOE['status'] = status_WoE
model_data_WOE['credit_history'] = credit_history_WoE
model_data_WOE['savings'] = savings_WoE
model_data_WOE['property'] = property_WoE
model_data_WOE['employment_duration'] = employment_duration_WoE
model_data_WOE['purpose'] = purpose_WoE
# model_data_WOE['credit_risk'] = credit_risk

# Logistic regression
model = LogisticRegression()
model.fit(model_data_WOE, credit_risk)
coefficients = model.coef_.ravel()
intercept = model.intercept_[0]

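Note: models in Python are less convenient to inspect than models in R. To see the variables, coefficients, and test statistics you have to look up and print each item one by one, whereas in R a single call to summary displays the whole overview.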
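Because LogisticRegression exposes its fit only through attributes such as coef_ and intercept_, one workaround is to line the coefficients up with the column names yourself. The following is a minimal sketch, not the author's code: the helper name coef_table and the synthetic data are hypothetical and exist only to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def coef_table(model, feature_names):
    """Return one row per term (intercept first) with its fitted coefficient."""
    names = ["(intercept)"] + list(feature_names)
    values = np.concatenate([model.intercept_, model.coef_.ravel()])
    return pd.DataFrame({"term": names, "coefficient": values})


# Hypothetical WOE-like data, used only for illustration
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)),
                      columns=["duration", "amount", "age"])
y_demo = (X_demo["duration"] + rng.normal(size=200) > 0).astype(int)

demo_model = LogisticRegression().fit(X_demo, y_demo)
summary = coef_table(demo_model, X_demo.columns)
print(summary)
```

This only lists coefficients. It does not produce standard errors or p-values the way R's summary does.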
########### Custom KS functions #############
import matplotlib.pyplot as plt

def predict_df(model, data, label, feature=None):
    # Use the given feature columns, or every column except the label
    if feature:
        df_feature = data.loc[:, feature]
    else:
        all_feature = list(data.columns.values)
        all_feature.remove(label)
        df_feature = data.loc[:, all_feature]
    # predict() returns class labels, so take the probability of class 1 as the score
    df_prob = model.predict_proba(df_feature)[:, 1]
    df_pred = pd.Series(df_prob).map(lambda x: 1 if x > 0.5 else 0)
    df = pd.DataFrame()
    df['predict'] = df_pred
    df['label'] = data.loc[:, label].values
    df['score'] = df_prob
    return df

def ks(data, model, label):
    data_df = predict_df(model, data, label)
    KS_data = data_df.sort_values(by='score', ascending=True)
    # Cumulative share of bad (label == 1) and good (label == 0) samples
    KS_data['Bad'] = KS_data['label'].cumsum() / KS_data['label'].sum()
    KS_data['Count'] = np.arange(1, len(KS_data['label']) + 1)
    KS_data['Good'] = (KS_data['Count'] - KS_data['label'].cumsum()) / (len(KS_data['label']) - KS_data['label'].sum())
    KS_data.index = KS_data['Count']
    # Keep roughly 100 evenly spaced points; the step must be at least 1
    step = max(1, len(KS_data) // 100)
    ks_df = KS_data.iloc[::step, :].copy()
    ks_df.index = np.arange(len(ks_df))
    return ks_df

def ks_plot(ks_df):
    plt.figure(figsize=(6, 5))
    plt.subplot(111)
    plt.plot(ks_df['Bad'], lw=3.5, color='r', label='Bad')    # train_ks['Bad']
    plt.plot(ks_df['Good'], lw=3.5, color='g', label='Good')  # train_ks['Good']
    plt.legend(loc=4)
    plt.grid(True)
    plt.axis('tight')
    plt.title('The KS Curve of data')
    plt.show()

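KS (Kolmogorov-Smirnov): KS evaluates how well the model separates risk. The statistic measures the gap between the cumulative distributions of good and bad samples. The larger that gap, the larger the KS value and the stronger the model's ability to discriminate risk. As a rule of thumb, KS > 0.2 indicates reasonably good predictive accuracy. The KS value computed for this model is 0.35, so the model performs well, as shown below:

[Figure: KS curve of the model]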
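The KS value itself is the largest vertical gap between the two cumulative curves. The sketch below shows one way to compute it directly from labels and scores. It is an illustration under stated assumptions, not the author's code: the function name ks_statistic and the demo data are hypothetical, and label 1 is assumed to mean "bad".

```python
import numpy as np


def ks_statistic(labels, scores):
    """Largest gap between the cumulative bad and good score distributions."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    bad_scores = np.sort(scores[labels == 1])
    good_scores = np.sort(scores[labels == 0])
    # Evaluate both empirical CDFs at every distinct score (handles ties)
    thresholds = np.unique(scores)
    cdf_bad = np.searchsorted(bad_scores, thresholds, side="right") / len(bad_scores)
    cdf_good = np.searchsorted(good_scores, thresholds, side="right") / len(good_scores)
    return float(np.max(np.abs(cdf_bad - cdf_good)))


# Hypothetical labels (1 = bad) and model scores, for illustration only
demo_labels = [0, 0, 1, 0, 1, 1]
demo_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
demo_ks = ks_statistic(demo_labels, demo_scores)
print(round(demo_ks, 3))  # 0.667
```

A model that separates the two groups perfectly gives a KS of 1, and a model with no discriminating power gives a value near 0.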
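6. Scorecard
Scorecard calculation method from the cited literature:
General scorecard formula: Score = A - B * log(Odds)
Two assumptions are normally set:
(1) a specific expected score is assigned to a specific odds value;
(2) the number of points that doubles the odds (PDO) is fixed.
Following this, suppose the score at odds x is P. Because the score falls as the odds rise, the score at odds 2x should be P - PDO. Substituting into the formula gives two equations:
P = A - B * log(x)
P - PDO = A - B * log(2x)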
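Subtracting the second equation from the first shows how A and B follow from the two assumptions. The short derivation below is a sketch. It assumes the same logarithm base is used throughout (the code uses the natural logarithm via math.log):

```latex
\begin{aligned}
P - (P - \mathrm{PDO}) &= \bigl(A - B\log x\bigr) - \bigl(A - B\log 2x\bigr) \\
\mathrm{PDO} &= B\,(\log 2x - \log x) = B\log 2 \\
B &= \frac{\mathrm{PDO}}{\log 2}, \qquad A = P + B\log x
\end{aligned}
```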
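This article fixes the score (50) at a specific odds value (good-to-bad ratio) of 1/20 and sets the PDO to 10, then computes the scorecard coefficients alpha and beta from them: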
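import math

def alpha_beta(basepoints, baseodds, pdo):
    # beta: points per unit of log-odds; alpha: offset that yields basepoints at baseodds
    beta = pdo / math.log(2)
    alpha = basepoints + beta * math.log(baseodds)
    return alpha, beta

# Values stated in the text: 50 points at odds 1/20, PDO = 10
alpha, beta = alpha_beta(50, 1/20, 10)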
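As a quick check of the arithmetic, the sketch below plugs in the values stated in the text (50 points at odds 1/20, PDO of 10). The function is repeated so the block runs on its own; the helper name score_at is hypothetical.

```python
import math


def alpha_beta(basepoints, baseodds, pdo):
    beta = pdo / math.log(2)
    alpha = basepoints + beta * math.log(baseodds)
    return alpha, beta


def score_at(odds, alpha, beta):
    """Score = alpha - beta * log(odds)."""
    return alpha - beta * math.log(odds)


alpha, beta = alpha_beta(50, 1 / 20, 10)
print(round(alpha, 2), round(beta, 2))          # 6.78 14.43
print(round(score_at(1 / 20, alpha, beta), 2))  # 50.0
print(round(score_at(1 / 10, alpha, beta), 2))  # 40.0
```

Doubling the odds from 1/20 to 1/10 lowers the score by exactly the PDO, which is what the two assumptions require.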

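Scorecard formula: Score = 6.78 - 14.43 * log(Odds)
where log(Odds) = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + ... + \beta_{n} x_{n}. Substituting the WOE-transformed variables and rearranging gives the final scorecard formula:

Score = (A - B \beta_{0}) - \sum_{i} \sum_{j} (B \beta_{i} \omega_{ij}) \delta_{ij}

Here \omega_{ij} is the WOE of the j-th value (bin) of the i-th variable, a known quantity; \beta_{i} is the coefficient of the i-th variable in the logistic regression equation, also known; \delta_{ij} is a binary indicator of whether variable i takes its j-th value.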

From the table above, the score for each bin of every variable can be computed:
# Base score
basepoint = round(alpha - beta * intercept)
# Per-variable scores. Since Score = A - B * log(Odds), each term is -beta * coefficient * WOE.
# coefficients follow the column order of model_data_WOE.
duration_score = np.round(-beta * coefficients[0] * model_data_WOE['duration'])
amount_score = np.round(-beta * coefficients[1] * model_data_WOE['amount'])
age_score = np.round(-beta * coefficients[2] * model_data_WOE['age'])
installment_rate_score = np.round(-beta * coefficients[3] * model_data_WOE['installment_rate'])
status_score = np.round(-beta * coefficients[4] * model_data_WOE['status'])
credit_history_score = np.round(-beta * coefficients[5] * model_data_WOE['credit_history'])
savings_score = np.round(-beta * coefficients[6] * model_data_WOE['savings'])
property_score = np.round(-beta * coefficients[7] * model_data_WOE['property'])
employment_duration_score = np.round(-beta * coefficients[8] * model_data_WOE['employment_duration'])
purpose_score = np.round(-beta * coefficients[9] * model_data_WOE['purpose'])
# Score of each bin for every variable (index = score, value = bin)
duration_scoreCard = pd.DataFrame(duration_Cutpoint, duration_score).drop_duplicates()
amount_scoreCard = pd.DataFrame(amount_Cutpoint, amount_score).drop_duplicates()
age_scoreCard = pd.DataFrame(age_Cutpoint, age_score).drop_duplicates()
installment_rate_scoreCard = pd.DataFrame(installment_rate_Cutpoint, installment_rate_score).drop_duplicates()
status_scoreCard = pd.DataFrame(np.array(discrete_data['status']), status_score).drop_duplicates()
credit_history_scoreCard = pd.DataFrame(np.array(discrete_data['credit_history']), credit_history_score).drop_duplicates()
savings_scoreCard = pd.DataFrame(np.array(discrete_data['savings']), savings_score).drop_duplicates()
property_scoreCard = pd.DataFrame(np.array(discrete_data['property']), property_score).drop_duplicates()
employment_duration_scoreCard = pd.DataFrame(np.array(discrete_data['employment_duration']), employment_duration_score).drop_duplicates()
purpose_scoreCard = pd.DataFrame(np.array(discrete_data['purpose']), purpose_score).drop_duplicates()
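To see how the pieces combine into one applicant's score, the sketch below applies Score = A - B * log(Odds) directly and then splits the same score into a base score plus one term per variable. It is an illustration only: the function names, coefficients, intercept, and WOE values are all hypothetical, not taken from the fitted model.

```python
import numpy as np


def total_score(woe_row, coefficients, intercept, alpha, beta):
    """Score = alpha - beta * (intercept + sum_i coef_i * woe_i)."""
    log_odds = intercept + float(np.dot(coefficients, woe_row))
    return alpha - beta * log_odds


def score_parts(woe_row, coefficients, intercept, alpha, beta):
    """Split the same score into a base score and one term per variable."""
    base = alpha - beta * intercept
    parts = -beta * np.asarray(coefficients) * np.asarray(woe_row)
    return base, parts


# Hypothetical fitted values and one applicant's WOE vector
demo_coef = np.array([0.8, 0.5, 1.1])
demo_intercept = -0.9
demo_woe = np.array([0.3, -0.2, 0.6])
alpha, beta = 6.78, 14.43

base, parts = score_parts(demo_woe, demo_coef, demo_intercept, alpha, beta)
score = total_score(demo_woe, demo_coef, demo_intercept, alpha, beta)
print(round(score, 2), round(base + parts.sum(), 2))
```

The two numbers agree before rounding. A scorecard that rounds each term to an integer first, as the code above does, can differ from the exact score by a point or two.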


At this point, the credit scorecard [...]
Reposted from https://blog.csdn.net/kxiaozhuk/article/details/84613794
