Credit Scorecards (4-5)

Credit Scorecards – Advanced Analytics (part 4 of 7)
http://ucanalytics.com/blogs/credit-scorecards-advanced-analytics-part-4/
Modeling in Advanced Analytics

Advanced Analytics: Model Development – by Roopam
The room, full of analysts, erupts in a loud round of laughter when a young business analyst tells us about an incident from his recent trip back home. A distant aunt inquired about his new profession. His response: "I am into modeling." She got all excited and asked, "Is it just on the ramp, or will I see you on television?" Jokes apart, this left me wondering about the roots of the word modeling, or model. What is a model?
A model is defined as a simplified representation of reality. A representation of reality, hmmm. A photograph is a representation of reality, a moment of reality captured on the reel. Does that make it a model? I think yes. Similarly, a newspaper reporter's account of an incident that becomes breaking news is also a model: a descriptive model. Now, let us try to link models with analytics.
Data warehouse, Business Intelligence and Advanced Analytics
Analytics has received a massive boost from the emergence of information technology. We are living in the era of big data. The plethora of data collected at every stage of the business process has created a need to extract knowledge from the information. This overall process has three aspects to it:
1. Data warehouse or data marts: transactional data is extracted, transformed and loaded (ETL) into a data model / schema for the purpose of analysis
2. Business Intelligence or dashboards: “as is” business reports
3. Predictive Analytics or Advanced Analytics: high-end statistical and data mining exercises
As the quantum of data is exponentially increasing, Hadoop and big data technologies are replacing the data warehouses. However, the thought process for business intelligence and predictive analytics – the focus of this article – will not change much. Let me try to distinguish between business intelligence and predictive Analytics using something I learned at a professional theater.
5 Ws for business intelligence & predictive Analytics – Lessons from Theater

5 Ws for Data Warehouse, Business Intelligence, and Advanced Analytics – by Roopam
I joined a professional theater group a few years ago. To understand the nuances of acting we started with improv, or improvisational theater. This form of theater has no predefined script; the actors build the story while performing. Most people thought I was a good improv actor. However, remembering scripted dialogue while performing did not work very well for me, and that was the end of my theater gig. Still, I learned some good lessons from the whole experience. One of them was the five Ws for deciphering a character to build the drama.
1. What had happened?
2. When did it happen?
3. Where did it happen?
4. Who was part of this?
5. Why did it happen?
Clearly, the first four questions try to report an as-is version of reality: a descriptive model. This is exactly what business intelligence professionals try to achieve with their fancy reporting platforms and software. The fifth question is the trickiest of the lot, the question that keeps scientists and inquisitive minds awake late at night.
Newton’s Legacy
An apple falls from a tree. How difficult is it to answer the first four questions? Most of us can answer them with the help of a clock and a map. Isaac Newton, however, answered the fifth question, and his answer was gravity. If he had stopped there, nobody would remember him close to four hundred years after his birth. He gave a mathematical model to explain the phenomenon.
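For reference, Newton's law of universal gravitation in its usual textbook form is given below. The symbols are the standard ones and are not taken from the article: F is the attractive force, m_1 and m_2 are the masses of the two objects (here the apple and the earth), r is the distance between their centers, and G is the gravitational constant.

```latex
F = G \, \frac{m_1 m_2}{r^2}
```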

Replace the apple and the earth with any other objects and you have the general equation for the model. Albert Einstein did shatter the Newtonian notion of gravity. However, this model still holds good for all practical purposes and is used extensively in rocket science.
Advanced analytics tries to help answer the fifth question, why something happened, using predictive modeling. The combination of high-end statistical and data mining techniques with analysts' business acumen produces models that help organizations make informed decisions. Remember, this is just the beginning, and causality is still a fair distance away!
Credit Scoring Models
Credit scorecards are models that predict the probability of a borrower defaulting on his/her loan. The following is a simplified version of a credit score with three variables:
Credit Score = Age + Loan to Value Ratio (LTV) + Installment (EMI) to Income Ratio (IIR)
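The loan-to-value ratio (LTV) is the ratio of the loan amount to the value of the collateral. It is most common in secured lending such as property mortgages.
For example, suppose customer A takes out a mortgage on a property valued at RMB 1,000,000 and the bank's credit policy requires LTV to stay within 70%. The bank can then lend customer A at most RMB 700,000.
The LTV limit differs by type of collateral and by each bank's own policy. It reflects the bank's risk expectation for the collateral.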

A 28-year-old man with an LTV of 75 and an IIR of 60 will have a score of 10 + 50 + 5 = 65 and hence is a high credit risk.
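The bucket tables behind this worked example are not reproduced here, so the following is only a minimal sketch of how such a lookup might work. The bucket boundaries and most of the point values are hypothetical; only the three points used in the example (10, 50 and 5) come from the text, and the assumption that they belong to age, LTV and IIR in that order follows the order of the formula.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries and points. Only the three values used in the
# worked example (10, 50 and 5) are taken from the article.
SCORECARD = {
    "age": ([30, 45], [10, 25, 40]),  # <30, 30 to <45, 45 and above
    "ltv": ([60, 80], [80, 50, 20]),  # <60, 60 to <80, 80 and above
    "iir": ([30, 50], [40, 20, 5]),   # <30, 30 to <50, 50 and above
}


def bucket_points(variable, value):
    """Return the points of the bucket that `value` falls into."""
    bounds, points = SCORECARD[variable]
    return points[bisect_right(bounds, value)]


def credit_score(age, ltv, iir):
    """Add up the bucket points of the three variables."""
    return (bucket_points("age", age)
            + bucket_points("ltv", ltv)
            + bucket_points("iir", iir))


print(credit_score(age=28, ltv=75, iir=60))  # 10 + 50 + 5 = 65
```

In this sketch a higher total means a safer borrower, which is consistent with a score of 65 being read as high risk; the actual cut-off in the risk table is not known from the text.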

Classification of good & bad loans using two variables – LTV & IIR – by Roopam
Now the question is: how did we arrive at the bucket-wise score points and the associated risk tables? By now, having gone through the previous three articles of the series, you must have some idea of how we will go about it. We have a historical list of good / bad borrowers (article 2) that we want to distinguish using predictor variables (article 3). Several statistical and data mining techniques could help us achieve our objective, such as:
1. Decision tree
2. Neural Networks
3. Support Vector Machines
4. Probit Regression
5. Linear discriminant analysis
6. Logistic Regression
Logistic regression is the most commonly used technique for the purpose. We will explore more about logistic regression in the next article.
Sign-off Note
I must conclude this article by saying that good analysts find a good mathematical model as beautiful as a model walking the catwalk ramp.
Credit Scorecards – Logistic Regression (part 5 of 7)
http://ucanalytics.com/blogs/credit-scorecards-logistic-regression-part-5/
A Primer on Logistic Regression – Are you Happy?

Logistic regression for happiness – by Roopam
A few years ago, my wife and I took a couple of weeks' vacation to England and Scotland. Just before we boarded the British Airways plane, an air hostess informed us that we had been upgraded to business class. Jolly good! What a wonderful start to the vacation. Once we got onto the plane, we received another tempting offer: a further upgrade to first class. This time, however, there was a catch: just one seat was available. That was a shame, and of course we could not take the offer. The business class seats had been fabulous until the first class offer came along, and both were free upgrades, by the way. This is the situation behavioral economists describe as relativity and anchoring, or, in plain English, comparison. Anchoring, or comparison, is at the root of pricing strategies in business and also of all human sorrow. Eventually, however, the vacation mood took over and we enjoyed business class thoroughly. In the end, humans are phenomenally good at adjusting to a situation and enjoying it as well. You will find some of the happiest faces on people in the most difficult situations. Here is a quote by Henry Miller: “I have no money, no resources, no hopes. I am the happiest man alive.” Human behavior is full of anomalies, full of puzzles. The following is an example to strengthen this thesis.

Source: flicker.com
Lennon, McCartney, Harrison, and Best are the members of the most famous band ever on the planet, the Beatles. OK, I know you have spotted the error. By now you must have uttered the right names: John Lennon, Paul McCartney, George Harrison and Ringo Starr, not Pete Best. Actually, Ringo Starr was the replacement for Pete Best, the original regular drummer for the Beatles. Pete must have been devastated to see his partners rising to glory while he was left behind. Wrong. Search for him on Google: he is the happiest Beatle of all. Now that is counterintuitive; I guess we do not have a clue what makes us happy.
As promised in a previous article, in this article I will attempt to explore happiness using logistic regression – the technique extensively used in scorecard development.
Logistic Regression – An Experiment
I am a thorough empiricist – a proponent of fact-based management. Hence, let me design a quick and dirty experiment* to generate data to evaluate happiness. The idea is to identify the factors / variables that influence our overall happiness. Let me present a representative list of factors for a working adult living in a city:

Now, throw in some other factors to the above list, such as a random act of kindness or an unplanned visit to a friend. As you can see, the above list can easily be expanded (recall the article on variable selection, article 3). This is a representative list, and you will have to create your own to figure out the factors that influence your level of happiness.
The second part of the experiment is to collect data. This is like maintaining a diary, only this one will be in Microsoft Excel. Every night before sleeping, you could assess your day and fill in numbers in the spreadsheet along with your overall level of happiness for the day (as shown in the figure below).

*I am calling this a quick and dirty experiment for the following reasons (1) It’s not a well thought out experiment but is created more to illustrate how logistic regression works (2) the observer and the observed are same in this experiment which might create a challenge for objective measurement.
After a couple of years of data collection, you will have enough observations to create a model – a logistic regression model in this case. We are trying to model feeling of happiness (column B) with other columns (C to I) in the above data set. If we plot B on the Y-axis and the additive combination of C to I (we’ll call it Z) on the X-axis it will look something like the plot shown below.

The idea behind logistic regression is to optimize Z in such a way that we get the best possible distinction between happy and sad faces, as achieved in the plot above. This is a curve-fitting problem with sigmoid function (the curve in violet) as the choice of function.
I would recommend using dates of observations (column A) in our model; this might give an interesting influence of seasons on our mood.
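The curve-fitting idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's procedure: the diary is replaced by synthetic data, the three factors and their "true" weights are made up for the simulation, and the fit uses plain gradient descent on the log-loss rather than any particular statistics package.

```python
import math
import random

random.seed(0)


def sigmoid(z):
    """Map the additive score Z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic stand-in for the diary: 300 days, 3 made-up factors per day.
# TRUE_W and TRUE_B are assumptions used only to simulate happy/sad labels.
TRUE_W = [1.5, 1.0, -0.5]
TRUE_B = 0.2
X = [[random.gauss(0, 1) for _ in TRUE_W] for _ in range(300)]
y = [
    1 if random.random() < sigmoid(TRUE_B + sum(w * x for w, x in zip(TRUE_W, row))) else 0
    for row in X
]  # 1 = happy day, 0 = sad day


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Estimate weights and intercept by gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(X, y):
            # Prediction error for this day: predicted probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j in range(k):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


w, b = fit_logistic(X, y)
# Z is the additive combination of the factors; Z > 0 means "happy" is more likely.
Z = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
accuracy = sum((z > 0) == (label == 1) for z, label in zip(Z, y)) / len(y)
print("weights:", [round(wj, 2) for wj in w], "intercept:", round(b, 2))
print("training accuracy:", round(accuracy, 2))
```

Plotting the labels against Z would reproduce the kind of picture described in the text: sad days cluster at low Z, happy days at high Z, and the sigmoid curve runs between them.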
Applications in Banking and Finance
This is exactly what we do in case of analytical scorecards such as credit scorecards, behavioral scorecards, fraud scorecards or buying propensity models. Just replace happy and sad faces with …
• Good and bad borrowers
• Fraud and genuine cases
• Buyers and non-buyers
… for the respective cases and you have the model. If you remember, in the previous article (4) I showed a simple credit scorecard model: Credit Score = Age + Loan to Value Ratio (LTV) + Installment (EMI) to Income Ratio (IIR)
A straightforward transformation of the sigmoid function will help us arrive at the above equation of the line. This is the final link to arrive at the desired scorecard.
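The transformation in question is usually the log-odds (logit) transform. As a sketch, write p for the modeled probability (for instance, of a good borrower) and Z for the additive combination of predictors; the coefficient symbols \beta_0 to \beta_3 are notation introduced here and do not appear in the article.

```latex
p = \frac{1}{1 + e^{-Z}}
\quad\Longrightarrow\quad
\frac{p}{1 - p} = e^{Z}
\quad\Longrightarrow\quad
\ln\!\left(\frac{p}{1 - p}\right) = Z
= \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{LTV} + \beta_3\,\mathrm{IIR}
```

Because the log-odds are linear in the predictors, each variable's contribution can be tabulated separately, which is what makes an additive scorecard possible.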
Variable Transformation in Credit Scorecards

The Swordsmith – by Roopam
I loved the movie Kill Bill, both parts. In the first part, I enjoyed the scene in which Uma Thurman's character goes to Japan to get a sword from Hattori Hanzō, the legendary swordsmith. After learning about her motive, he agrees to make his finest sword for her. Then Quentin Tarantino, the director of the movie, briefly shows the process of making the sword. Hattori Hanzō transforms a regular piece of iron into a fabulous sword. What a craftsman! This is fairly similar to how analysts transform the sigmoid function into the linear equation. The difference is that analysts use mathematical tools rather than hammers and are not as legendary as Hattori Hanzō.
Reject Inference
Reject inference is an aspect of credit or application scorecards that distinguishes them from all other classification models. For application scorecards, the development sample is biased because there is no performance data for rejected loans. Reject inference is a way to rectify this shortcoming and remove the bias from the sample. We will discuss reject inference in detail in a later article on YOU CANalytics.
Sign-off Note
Now that we have our scorecard ready, the next task is to validate its predictive power. This is precisely what we will do in the next article. See you soon.
