当前位置:天才代写 > 商科代写,金融经济统计代写-100%原创拿高分 > 代写FIN之金融工程matlab定量代做题目:Quantitative Credit Scoring 85% Section I – Variable Mapping

代写FIN之金融工程matlab定量代做题目:Quantitative Credit Scoring 85% Section I – Variable Mapping

2018-04-01 08:00 星期日 所属: 商科代写,金融经济统计代写-100%原创拿高分 浏览:1002

PART 1 – Quantitative Credit Scoring 85% Section I – Variable Mapping


Go to the following link to download the dataset: https://www.dropbox.com/s/mkoxbevzyvj1j9q/Credit_data_RSM6305.txt?dl=0

PS: The first column is just index values, ignore it or don’t count it as one of the features!

 

You have been provided a credit dataset with 20 different borrower attributes (7 numerical and 13 categorical). Description of the attributes are as follows,

 

Attribute description:

Attribute 1: (qualitative)

Status of existing checking account Attribute 2: (numerical)

Duration in month

 

Attribute 3: (qualitative)

Credit history

 

Attribute 4: (qualitative)

Purpose

A40 : car (new)

A41 : car (used)

A42 : furniture/equipment

A43 : radio/television

A44 : domestic appliances

A45 : repairs

A46 : education

A47 : (vacation – does not exist?)

A48 : retraining

A49 : business

A410 : others

 

Attribute 5: (numerical)

Credit amount

 

Attibute 6: (qualitative)

Savings account/bonds

o A61 : … < 100 DM

o  A62 :  100 <= … <  500 DM o  A63 :  500 <= … < 1000 DM o A64 : .. >= 1000 DM

A65 : unknown/ no savings account

 

Attribute 7: (qualitative)

Present employment since

A71 : unemployed

o A72 : … < 1 year

o A73 : 1  <= … < 4 years o A74 : 4  <= … < 7 years o A75 : .. >= 7 years

 

Attribute 8: (numerical)

Installment rate in percentage of disposable income


 

Attribute 9: (qualitative)

Personal status and sex

A91 : male : divorced/separated

A92 : female : divorced/separated/married

A93 : male : single

A94 : male : married/widowed

A95 : female : single

 

Attribute 10: (qualitative)

Other debtors / guarantors

A101 : none

A102 : co-applicant

A103 : guarantor

 

Attribute 11: (numerical)

Present residence since

 

Attribute 12: (qualitative)

Property

A121 : real estate

A122 : if not A121 : building society savings agreement/life insurance

A123 : if not A121/A122 : car or other, not in attribute 6

A124 : unknown / no property

 

Attribute 13: (numerical)

Age in years

 

Attribute 14: (qualitative)

Other installment plans

A141 : bank

A142 : stores

A143 : none

 

Attribute 15: (qualitative)

Housing

A151 : rent

A152 : own

A153 : for free

 

Attribute 16: (numerical)

Number of existing credits at this bank Attribute 17: (qualitative)

Job

A171 : unemployed/ unskilled – non-resident

A172 : unskilled – resident

A173 : skilled employee / official

A174 : management/ self-employed/highly qualified employee/ officer Attribute 18: (numerical)

Number of people being liable to provide maintenance for Attribute 19: (qualitative)

Telephone

A191 : none

A192 : yes, registered under the customers name


Attribute 20: (qualitative)

foreign worker

A201 : yes

A202 : no

The 21st column denotes the credit status of the borrower, with 1 being good and 2 being bad.

Your first task will be to transform the provided the dataset to the respective categorical variables based on the attributes mentioned above.

Section II – Exploratory Data Analysis & Wrangling

Your next task will be to conduct exploratory data analysis.

 

1) Observe the variables by plotting histograms and box plots (for continuous variables) and frequency tables (bar plots) for categorical variables. Conduct proper outlier detection (if any) and use necessary tools taught in the class to treat outliers. Present your analysis.

2) You may notice that the categorical variables contain missing values in some instances. It will be advisable to conduct proper data imputation instead of dropping the rows entirely which will result in loss of information. Use your judgement to conduct proper imputation. Keep in mind that not all the variables are of the same type (some continuous and some categorical), so use proper judgement and methods to treat these two types differently to conduct imputation. Present your analysis.

3)  Conduct cross tabulation of all the categorical predictors with the credit status of the borrower. For this you need to create cross contingency table. For example, if we take a categorical variable with 3 categories

1, 2 and 3, the cross contingency table will take the following form,

Credit Status

1

2

3

Row Total

0

# of 1 in 0

(and % of 0)

# of in 0

(and % of 0)

# of 3in 0

(and % of 0)

Total 0

1

# of 1 in 1

(and % of 0)

# of in 1

(and % of 0)

# of 3 in 1

(and % of 0)

Total 1

Column Total

Total 1

Total 2

Total 3

Sub – total

 

For each table, present an outline of the analysis.

 

4) Using the cross contingency table, perform a chi-square test in order to test the dependence of the categorical variables with the credit status of the borrower. Take note of the variables which have a statistically significant dependence with the response variable.

5) For the continuous variables, present the necessary descriptive statistics. Make sure to standardize the continuous variables before moving to estimation. Also a correlation matrix of all the variables (categorical and continuous) will give us a good idea of the dependence structure of the dataset.

Section III – Estimation

Before moving to the estimation phase, it is important to not use the full data for estimation. Conduct a 70:30 cross validation, which means randomly sample 70% of the data as training set and keep the rest of 30% as test set.

1) Start by estimating a logistic regression using all the significant categorical predictors based on the chi square tests and all numeric variables. At every iteration, take out the insignificant variables and re–estimate until all the variables are significant at the 5% level. Briefly explain the final chosen variables, the signs of the coefficients.

a. For the logistic regression, you must build your own function, which includes constructing the functions for the logistic distribution, the log likelihood and the optimization process. Take help from Appendix A3 of the text book to build your own function for Newton’s method, since you need the Hessian matrix to calculate relevant regression statistics.


b. Plot the ROC curve for the in sample prediction. Use built in packages/function to extract the ROC curve. Perform a Kolmogorov Smirnov test (KS) on the possible true positive rates and false positive rates for each cut off value. Pick the appropriate cutoff value based on the KS test. Use this cut off value for out of sample prediction. (This will be important in Part IV for calculating Brier Score and conducting HL test)

2) Estimate a stepwise logistic regression model. Present your output results. Briefly explain the signs of the coefficients and their significance.

a. Take help from the slides to get an idea on how to approach the algorithm for the stepwise functions. Feel free to create only one method, either the backward or the forward.

b. Perform the KS test as usual to obtain the optimal cut off value. Use this cut off value for out of sample prediction.

3) Estimate a decision tree on the given dataset. Present your results.

a. For this question, feel free to leverage the libraries or built in function for estimation.

b. Apply cost complexity pruning to the large tree obtained in order to obtain a sequence of sub-trees. Conduct a K-fold cross validation.

c. Find the complexity parameter for which the cross validated error rate is minimum.

d. Prune the tree using this complexity parameter. This way, you will obtain the final tree.1

4) Estimate a random forest on the given dataset. Set the forest size to 1000 trees and evaluated variables per node to 5. For more information on the methodology and how to estimate it in R, go over this document: https://www.dropbox.com/s/higfw6xd1drchq1/TreeBasedMethods.pdf?dl=0

a. For this question, feel free to leverage the libraries or built in function for estimation.

b. Present the necessary results.

c. Plot the variables based on their importance, do the results make sense??

 

Section IV – Performance Validation

  • 1) For each model estimated in part 3, predict default probability for each method using the test set.

  • 2) Build a function for the Brier score2 and calculate the score for each prediction result.

  • 3) Build a function for Hosmer – Lemeshow Test3 and use it to test each prediction result.

  • 4) Plot the ROC curve for each method, also calculate the AUC for each out of sample prediction.

  • 5) Which model would you recommend as your final model?


1
For a brief on K fold cross validation, please see:
https://eight2late.wordpress.com/2016/02/16/a-gentle-introduction-to-decision- trees-using-r/

2 For a brief on brier score, https://en.wikipedia.org/wiki/Brier_score

3 A good primer, http://thestatsgeek.com/2014/02/16/the-hosmer-lemeshow-goodness-of-fit-test-for-logistic-regression/


PART 2 – Fundamental Credit Analysis 15%

You are an analyst for a bond hedge fund or a credit analyst or working in a Pension fund (private asset classes’ investment division or Investment Risk). The goal of this part of the project is to get you familiar with some concepts/approaches for credit analysis. Especially what is known under Fundamental Credit analysis or some places call it Internal rating approach.

— To begin, pick four ‘comparable’ companies within the same industry, and have the same rating and have bonds outstanding preferably with approximately same time to maturity.

— To make your life easier, all bonds should be denominated in USD/CAN, and issued by firms domiciled either in CA or the US.

— You can use the Bloomberg SRCH function to search for bonds and firms. See these links for details:

http://asklib.library.hbs.edu/faq/47426 http://www.investopedia.com/university/corporate-bonds-bloomberg-active-trader/ Further,

· Given the industry you choose, create internal ratings of the firms in your sample. I have shared several Moody’s reports by industry that talks about the important factors for your credit analysis. You can find them here https://www.dropbox.com/sh/i7ju97wsk72ctbu/AACH76AEOSkHWxD1JwyAtOxBa?dl=0. You can follow Moody’s methodology. You can also complement the analysis using the 5 steps of credit analysis that I mentioned in class to guide your thinking. Basically I am looking for Fundamental analysis.

The document of Moody’s attached, should help you understand what ratio is more important than other in each industry.

Augment your credit/internal rating of the firms by adding recovery analysis and have a more informed internal ratings that accounts for both likelihood of default and severity of default: ( if you think the security value at time of default is not represented in the rating methodology you are following )

You can also compute Z & Ohlson score. They are helpful to give some comparison across companies even though you will notice that your internal ratings may dominate those measures. Comment on that.

Overall consider factors that affect the potential Default probability (cash flow, capacity to repay and covenants strictness with respect to the other firms you chose or with respect to the industry)

Consider factors that affect the potential Recovery rate (collateral, tangible assets, priority, other sr. or jr. debt in the capital structure, etc.).

· What can you learn from the above information about the differences in bond yields of these firms?

· Based on your internal rating assigned, how do you assess these companies relative to the rating provided in Bloomberg. Explain your rationale of the deviation in you internal rating methodology and that of provided by the rating agencies.



代写CS&Finance|建模|代码|系统|报告|考试

编程类:C++,JAVA ,数据库,WEB,Linux,Nodejs,JSP,Html,Prolog,Python,Haskell,hadoop算法,系统 机器学习

金融类统计,计量,风险投资,金融工程,R语言,Python语言,Matlab,建立模型,数据分析,数据处理

服务类:Lab/Assignment/Project/Course/Qzui/Midterm/Final/Exam/Test帮助代写代考辅导

E-mail:850190831@qq.com   微信:BadGeniuscs  工作时间:无休息工作日-早上8点到凌晨3点


如果您用的手机请先保存二维码到手机里面,识别图中二维码。如果用电脑,直接掏出手机果断扫描。

qr.png

 

    关键字:

天才代写-代写联系方式