

RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS

College of Business & Economics, The Australian National University

REGRESSION MODELLING

(STAT2008/STAT4038/STAT6038)

Assignment 2 for 2018


• INSTRUCTIONS:

– This assignment is worth 20% of your overall marks for this course (for all students, enrolled in STAT2008, STAT4038 or STAT6038).

– If you wish, you may work together with one other student in doing the analyses and present a single (joint) report. If you choose to do this, both of you will be awarded the same total mark. Students enrolled under different course codes may work together. You may NOT work in groups of more than two students, and the usual ANU examination rules on plagiarism still apply with respect to people not in your group. This means you should not discuss the assignment (questions, solutions, code, etc.) with your classmates or any other individuals if they are not in your group. You can discuss the assignment with me (Anton Westveld) or your tutors.

– Please submit your assignment on Wattle. As a group you should submit only one assignment. Make sure to place the names and IDs of the individuals in your group on the front page of your assignment. When uploading to Wattle you will submit:

1. Your assignment/report.

2. An ‘.R’ file containing the R code you used for the assignment.

– Assignments should be typed. Your assignment may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of those results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.

– Unless otherwise advised, use a significance level of 5%.

– Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix that is in addition to the above page limits; however, the appendix will generally not be marked, only checked if there is some question about what you have actually done.

– Assignments will be marked by your tutor (or one of your two tutors, for joint assignments). You may ask any of the tutors or me (Anton Westveld) questions about this assignment up to 4 pm on Thursday 17 May 2018.

– Late assignments will NOT be accepted after the deadline without an extension. Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission by no later than 12 noon on Thursday 17 May 2018. Even with an extension, all assignments must be submitted reasonably close to the original deadline to allow time for the marking to be completed.


1.(100 points) You will explore the techniques for the course by examining data on the number of visits to a health care professional in Australia in 1977-78. The data have been placed on Wattle. The variables are:

– sex : 1 if female, 0 if male

– age : age in years divided by 100 (measured as the mid-point of 10 age groups from 15-19 years to 65-69 years, with 70 or more treated as 72)

– income : annual income in Australian dollars divided by 1000 (measured as the mid-point of the coded ranges Nil, less than 200, 200-1000, 1001-, 2001-, 3001-, 4001-, 5001-, 6001-, 7001-, 8001-10000, 10001-12000, 12001-14000, with 14001- treated as 15000)

– insurance : insurance contract (medlevy : Medibank levy, levyplus : private health insurance, freepoor : government insurance due to low income, freerepa : government insurance due to old age, disability or veteran status)

– illness : number of illnesses in the past 2 weeks

– actdays : number of days of reduced activity in the past 2 weeks due to illness or injury

– hscore : general health score using Goldberg’s method (from 0 to 12). A high score indicates bad health

– chcond : chronic condition (np : no problem, la : limiting activity, nla : not limiting activity)

– doctorco : number of consultations with a doctor or specialist in the past 2 weeks

– nondocco : number of consultations with non-doctor health professionals (chemist, optician, physiotherapist, social worker, district community nurse, chiropodist or chiropractor) in the past 2 weeks

– hospadmi : number of admissions to a hospital, psychiatric hospital, nursing or convalescent home in the past 12 months (up to 5 or more admissions, which is coded as 5)

– hospdays : number of nights in a hospital, etc. during the most recent admission: taken, where appropriate, as the mid-point of the intervals 1, 2, 3, 4, 5, 6, 7, 8-14, 15-30, 31-60, 61-79, with 80 or more admissions coded as 80. If there was no admission in the past 12 months, this equals zero.

– prescrib : total number of prescribed medications used in the past 2 days

– nonpresc : total number of non-prescribed medications used in the past 2 days


(a)(15 points) Conduct an exploratory data analysis, where the response y = doctorco + nondocco (i.e. the total number of visits to a health care professional in the past two weeks) is examined in relation to the other variables, which should be considered explanatory variables (covariates). In doing your analysis make sure to identify any unusual points and discuss why they are unusual. For this assignment do not remove any unusual points, only comment on them (if they exist).

 

(b)(8.5 points) Fit a multiple linear regression model with the response variable and with the other variables in the data as explanatory variables. Do not consider any transformations of the covariates or interactions. Present the main residual plot of the residuals against the fitted values for this model, along with a lowess smoother. Are there any obvious problems with the underlying assumptions?
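
As a rough illustration of the plot requested in (b), the following R sketch fits a linear model and overlays a lowess smoother on the residuals. It uses simulated stand-in data: the names x1, x2 and the Poisson data-generating process are assumptions of this sketch, not the Wattle data set.

```r
# Simulated stand-in data; x1, x2 and y are hypothetical, not the course data.
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- rnorm(n)
y  <- rpois(n, lambda = exp(0.5 + x1))   # a count response, like the visits total

fit <- lm(y ~ x1 + x2)

# Residuals against fitted values, with a lowess smoother overlaid
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
lines(lowess(fitted(fit), resid(fit)), col = "red")
```

A smoother that drifts away from the zero line, or a spread that widens with the fitted values, would point to problems with linearity or constant variance.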

(c)(8.5 points) Consider a few transformations of y, such as log(y + 1), sqrt(y) and y^(1/4). Fit a multiple linear regression model with the transformed response variable and with the other variables in the data as explanatory variables. Do not consider any transformations of the covariates or interactions. Again present the main residual plot of the residuals against the fitted values for this new model, along with a lowess smoother. Do any of the transformations applied to the response variable appear to have corrected any problems you identified in part (b)?

 

(d)(8.5 points) Try using the Box-Cox approach to find a transformation. Again present the main residual plot of the residuals against the fitted values for this new model, along with a lowess smoother. Does the transformation applied to the response variable appear to have corrected any problems you identified in parts (b) and (c)? Based on your analysis, decide whether a transformation should be considered and, if so, clearly state which one. Use this transformation through the rest of the assignment.
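
One way the Box-Cox step in (d) might look in R is sketched below with MASS::boxcox() on simulated data. Box-Cox requires a strictly positive response, so the sketch shifts the counts by 1; that shift, the variable names and the lambda grid are all assumptions made here, not requirements of the assignment.

```r
library(MASS)  # provides boxcox(); MASS ships with standard R installations

# Simulated stand-in data; names are hypothetical.
set.seed(2)
n <- 200
d <- data.frame(x1 = runif(n))
d$y  <- rpois(n, lambda = exp(0.5 + d$x1))
d$y1 <- d$y + 1          # assumed shift so that the response is strictly positive

# y = TRUE stores the response in the fit, which boxcox() needs
fit_shift <- lm(y1 ~ x1, data = d, y = TRUE)

# Profile log-likelihood over a grid of lambda values (no plot drawn here)
bc <- boxcox(fit_shift, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]   # grid value maximising the log-likelihood
lambda_hat
```

Setting plotit = TRUE draws the log-likelihood curve with an approximate 95% interval for lambda, which helps in judging whether a convenient value such as 0 (log) or 0.5 (square root) is plausible.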

 

(e)(8.5 points) Construct two added variable plots: one for income and one for age. Comment on the plots.
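
An added variable plot can be built by hand from two sets of residuals, as in the sketch below. The data are simulated, and the reduced set of covariates (age, income, illness) is an assumption chosen to keep the example short.

```r
# Simulated stand-in data; only the column names echo the assignment.
set.seed(3)
n <- 200
d <- data.frame(age     = runif(n, 0.19, 0.72),
                income  = runif(n, 0, 1.5),
                illness = rpois(n, 1))
d$y <- 0.3 + 0.8 * d$age + 0.5 * d$illness + rnorm(n, sd = 0.5)

full <- lm(y ~ age + income + illness, data = d)

# Added variable plot for age:
#   vertical axis:   residuals of y regressed on all covariates except age
#   horizontal axis: residuals of age regressed on the other covariates
e_y <- resid(lm(y   ~ income + illness, data = d))
e_x <- resid(lm(age ~ income + illness, data = d))

plot(e_x, e_y,
     xlab = "age | others", ylab = "y | others",
     main = "Added variable plot for age")
av_fit <- lm(e_y ~ e_x)
abline(av_fit)

# The slope of this line equals the coefficient of age in the full model
all.equal(unname(coef(av_fit)["e_x"]), unname(coef(full)["age"]))
```

The same construction with income in place of age gives the second plot. A clear linear trend suggests the variable adds information beyond the other covariates, while curvature hints that a transformation of that covariate may help.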

 

(f)(8.5 points) Construct confidence intervals for all pairwise differences for the factor insurance with a family level α = 0.05. Which differences are statistically significant, if any?

 

(g)(8.5 points) Construct confidence intervals for all pairwise differences for the factor chcond with a family level α = 0.05. Which differences are statistically significant, if any?
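
For family-level intervals such as those in (f) and (g), one possible approach is a Bonferroni adjustment computed from the fitted coefficients and their covariance matrix. The sketch below uses a simulated three-level factor named grp and one covariate x, both hypothetical; Bonferroni is only one choice of multiple-comparison method and is an assumption here.

```r
# Simulated stand-in data with a hypothetical three-level factor 'grp'.
set.seed(4)
n <- 300
d <- data.frame(grp = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                x   = rnorm(n))
d$y <- 1 + 0.5 * (d$grp == "b") + 1.0 * (d$grp == "c") + 0.3 * d$x + rnorm(n)

fit <- lm(y ~ grp + x, data = d)
b   <- coef(fit)
V   <- vcov(fit)

pairs <- combn(levels(d$grp), 2)   # every pair of levels, one per column
g     <- ncol(pairs)               # number of intervals in the family
tcrit <- qt(1 - 0.05 / (2 * g), df = df.residual(fit))   # Bonferroni critical value

ci <- t(apply(pairs, 2, function(p) {
  cvec <- setNames(numeric(length(b)), names(b))
  # Under treatment contrasts the baseline level has no coefficient (effect 0)
  for (k in 1:2) {
    nm <- paste0("grp", p[k])
    if (nm %in% names(b)) cvec[nm] <- cvec[nm] + c(-1, 1)[k]
  }
  est <- sum(cvec * b)
  se  <- sqrt(drop(t(cvec) %*% V %*% cvec))
  c(estimate = est, lower = est - tcrit * se, upper = est + tcrit * se)
}))
rownames(ci) <- apply(pairs, 2, function(p) paste(p[2], "-", p[1]))
ci
```

A difference is significant at the family level when its interval excludes zero.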

 

(h)(8.5 points) Examine (but do not present) the ANOVA (Analysis of Variance) table and summary output for the model which you chose in (d). Now adjust the order of the explanatory variables so that you can test the following nested hypotheses.

 

H0 : β_insurance = β_sex = β_age = β_income = β_nonpresc = 0

H0 : β_insurance = β_sex = β_age = 0

H0 : β_insurance = 0

Present the ANOVA table for the re-ordered model and discuss the result of the partial (nested) F-tests for the above hypotheses. Fully write out the tests. Do your results suggest some possible modification(s) you could make to the model? If so then make those modifications.
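
The mechanics of a partial (nested) F-test can be sketched as follows, using simulated data with four hypothetical covariates x1 to x4. Placing the terms under test last in the formula means their sequential sums of squares are adjusted for everything before them, so the F statistic can be read off the sequential ANOVA table or obtained by comparing the two fits directly.

```r
# Simulated stand-in data; x1..x4 are hypothetical covariates.
set.seed(5)
n <- 250
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 0.7 * d$x1 + 0.4 * d$x2 + rnorm(n)   # x3 and x4 have no true effect

# Terms to be tested (x3, x4) are placed last
full <- lm(y ~ x1 + x2 + x3 + x4, data = d)
tab  <- anova(full)                 # sequential (Type I) ANOVA table
tab

# Partial F-test of H0: beta_x3 = beta_x4 = 0 via model comparison
reduced <- lm(y ~ x1 + x2, data = d)
cmp <- anova(reduced, full)
cmp

# The same F statistic assembled from the sequential table:
# (sum of the last two sequential SS / 2) divided by the residual mean square
F_seq <- (sum(tab[c("x3", "x4"), "Sum Sq"]) / 2) / tab["Residuals", "Mean Sq"]
all.equal(F_seq, cmp$F[2])
```

The p-value for the test is in the column Pr(>F) of cmp; a large value is consistent with dropping the tested terms.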

 

(i)(8.5 points) Investigate whether the variable sex has an interaction effect with any of the other variables.


(j)(8.5 points) For your model, construct a plot of the internally Studentized residuals against the fitted values, a normal Q-Q plot of the residuals, and a bar plot of Cook’s distances for each observation. Use these plots (and other means) to comment on the model assumptions and on any unusual data points.
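
The three diagnostic displays named in (j) are available from base R, as the sketch below shows on simulated data (the simple one-covariate model is an assumption of the example). rstandard() returns internally Studentized residuals and cooks.distance() returns Cook's distances.

```r
# Simulated stand-in data; a single hypothetical covariate x.
set.seed(6)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

r_int <- rstandard(fit)        # internally Studentized residuals
cd    <- cooks.distance(fit)   # Cook's distance for each observation

op <- par(mfrow = c(1, 3))
plot(fitted(fit), r_int,
     xlab = "Fitted values", ylab = "Internally Studentized residuals")
abline(h = c(-2, 0, 2), lty = 2)
qqnorm(r_int)
qqline(r_int)
barplot(cd, xlab = "Observation", ylab = "Cook's distance")
par(op)

# Observations with the largest Cook's distances, for closer inspection
head(sort(cd, decreasing = TRUE), 3)
```

Points with Studentized residuals well beyond about plus or minus 2, or with Cook's distances that stand out from the rest, are candidates for comment.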

 

(k)(8.5 points) Fully interpret the results of your final model. Provide plots of (y against age), (y against income), and (y against hscore), with regression lines for the different levels of the factor insurance. Additionally, add 95% point-wise confidence intervals for the regression lines (each confidence interval can have an α = 0.05). Finally, use different plotting symbols for male and female.
