




CHAPTER 6
LINEAR REGRESSION AND CORRELATION

Contents
6.1 Introduction
6.2 Curve Fitting
6.3 Fitting a Simple Linear Regression Line
6.4 Linear Correlation Analysis
6.5 Spearman's Rank Correlation
Exercise

Objectives:
In business and economic applications, interest frequently lies in the relationships between two or more random variables, and the association between variables is often approximated by postulating a linear functional form for their relationship.

After working through this chapter, you should be able to:
(i) understand the basic concepts of regression and correlation analyses;
(ii) determine both the nature and the strength of the linear relationship between two variables.

6.1 Introduction
This chapter presents some statistical techniques for analyzing the association between two variables and for developing the relationship for prediction.

6.2 Curve Fitting
Very often in practice a relation is found to exist between two (or more) variables. It is frequently desirable to express this relationship in mathematical form by determining an equation connecting the variables. To aid in determining such an equation, a first step is the collection of data showing corresponding values of the variables under consideration.

Figure 1. Scatter diagram (panels: linear, non-linear)

6.3 Fitting a Simple Linear Regression Line
The aim is to determine, from a set of data, a line of best fit from which to infer the relationship between two variables.

6.3.1 The Method of Least Squares

Figure 2. 
Sample observations and the sample regression line

Determining the line of "best fit":
\[ \hat{y} = a + bx \]
by minimizing \( \sum (y - \hat{y})^2 \).

To minimize \( \sum (y - a - bx)^2 \), we apply calculus and find the following "normal equations":
\[ \sum y = na + b\sum x \qquad (1) \]
\[ \sum xy = a\sum x + b\sum x^2 \qquad (2) \]
Solving (1) and (2) simultaneously, we have:
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \frac{\sum y - b\sum x}{n}. \]

Notes:
1. The formula for calculating the slope b is commonly written as
\[ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \]
in which the numerator and denominator reduce to \( \sum xy - \frac{\sum x \sum y}{n} \) and \( \sum x^2 - \frac{\left(\sum x\right)^2}{n} \) respectively; and \( a = \bar{y} - b\bar{x} \) is the y-intercept of the regression line.
2. When the equation is calculated from a sample of observations rather than from a population, it is referred to as a sample regression line.

Example 1
Suppose an appliance store conducts a 5-month experiment to determine the effect of advertising on sales revenue and obtains the following results:

Month   Advertising Expenditure (in $1,000)   Sales Revenue (in $10,000)
1       1                                     1
2       2                                     1
3       3                                     2
4       4                                     2
5       5                                     4

Find the sample regression line and predict the sales revenue if the appliance store spends 4.5 thousand dollars on advertising in a month.

From the data, we find that n = 5, \( \sum x = 15 \), \( \sum y = 10 \), \( \sum xy = 37 \), \( \sum x^2 = 55 \).
Hence \( \bar{x} = 15/5 = 3 \) and \( \bar{y} = 10/5 = 2 \).
Then the slope of the sample regression line is
\[ b = \frac{37 - (15)(10)/5}{55 - 15^2/5} = \frac{7}{10} = 0.7 \]
and the y-intercept is
\[ a = 2 - 0.7(3) = -0.1. \]
The sample regression line is thus
\[ \hat{y} = -0.1 + 0.7x. \]
So if the appliance store spends 4.5 thousand dollars on advertising in a month, it can expect to obtain \( \hat{y} = -0.1 + 0.7(4.5) = 3.05 \) ten-thousand dollars as sales revenue during that month.

Example 2
Obtain the least squares prediction line for the data below:

x      y
101    1.2
92     0.8
110    1.0
120    1.3
90     0.7
82     0.8
93     1.0
75     0.6
91     0.9
105    1.1
Sum

Example 3
Find a regression curve in the form [equation missing] for the following data:

x    1    2    3    4    5    6    7    8
y    9    13   14   17   18   19   19   20

6.4 Linear Correlation Analysis
Correlation analysis is the statistical tool that we can use to determine the degree to which variables are related.

6.4.1 Coefficient of Determination, r^2
Problem: how well does a least squares regression line fit a given set of paired data?

Figure 3. 
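Relationships between total, explained and unexplained variations

Variation of the y values around their own mean (total variation) = \( \sum (y - \bar{y})^2 \)
Variation of the y values around the regression line (unexplained variation) = \( \sum (y - \hat{y})^2 \)
Regression sum of squares (explained variation) = \( \sum (\hat{y} - \bar{y})^2 \)

We have:
\[ \sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2. \]
Denoting \( \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} \) by \( r^2 \), then
\[ r^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}. \]
r^2, the coefficient of determination, is the proportion of variation in y explained by a sample regression line.
For example, r^2 = 0.9797 means that 97.97% of the variation in y is due to its linear relationship with x.

6.4.2 Correlation Coefficient
\[ r = \pm\sqrt{r^2} \quad \text{and} \quad -1 \le r \le 1. \]

Notes:
The formulas for calculating r^2 (sample coefficient of determination) and r (sample coefficient of correlation) can be simplified to the following more common versions:
\[ r^2 = \frac{\left(n\sum xy - \sum x \sum y\right)^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]} \]
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]
Since the numerators used in calculating r and b are the same and both denominators are always positive, r and b will always be of the same sign. Moreover, if r = 0 then b = 0, and vice versa.

Example 4
Calculate the sample coefficient of determination and the sample coefficient of correlation for Example 1. Interpret the results.

From the data we get n = 5, \( \sum x = 15 \), \( \sum y = 10 \), \( \sum xy = 37 \), \( \sum x^2 = 55 \), \( \sum y^2 = 26 \).
Then, the coefficient of determination is given by
\[ r^2 = \frac{\left[5(37) - (15)(10)\right]^2}{\left[5(55) - 15^2\right]\left[5(26) - 10^2\right]} = \frac{35^2}{(50)(30)} \approx 0.8167 \]
and
\[ r = \sqrt{0.8167} \approx 0.9037. \]
r^2 = 0.8167 implies that 81.67% of the sample variability in sales revenue is explained by its linear dependence on the advertising expenditure. r = 0.9037 indicates a very strong positive linear relationship between sales revenue and advertising expenditure.

Example 5
Interest rates (x) provide an excellent leading indicator for predicting housing starts (y). As interest rates decline, housing starts increase, and vice versa.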
Suppose the data given in the accompanying table represent the prevailing interest rates on first mortgages and the recorded building permits in a certain region over a 12-year span.

Year                 1985   1986   1987   1988   1989   1990
Interest rates (%)    6.5    6.0    6.5    7.5    8.5    9.5
Building permits     2165   2984   2780   1940   1750   1535

Year                 1991   1992   1993   1994   1995   1996
Interest rates (%)   10.0    9.0    7.5    9.0   11.5   15.0
Building permits      962   1310   2050   1695    856    510

(a) Find the least squares line to allow for the estimation of building permits from interest rates.
(b) Calculate the correlation coefficient r for these data.
(c) By what percentage is the sum of squares of deviations of building permits reduced by using interest rates as a predictor rather than using the average annual building permits as a predictor of y for these data?

6.5 Spearman's Rank Correlation
Occasionally we may need to determine the correlation between two variables where suitable measures of one or both variables do not exist. However, the variables can be ranked, and the association between the two variables can be measured by \( r_s \):
\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \]
where d is the difference between the ranks of x and y and n is the number of pairs.

if \( r_s \) is close to 1: strong positive association
if \( r_s \) is close to -1: strong negative association
if \( r_s \) is close to 0: no association

Notes:
1. The two variables must be ranked in the same order, giving rank 1 either to the largest (or smallest) value, rank 2 to the second largest (or smallest) value, and so forth.
2. If there are ties, we assign to each of the tied observations the mean of the ranks which they jointly occupy; thus, if the third and fourth ordered values are identical we assign each the rank of (3 + 4)/2 = 3.5, and if the fifth, sixth and seventh ordered values are identical we assign each the rank of (5 + 6 + 7)/3 = 6.
3. The ordinary sample correlation coefficient r can also be used to calculate the rank correlation coefficient, where x and y represent the ranks of the observations instead of their actual numerical values.

Example 6
Calculate the rank correlation coefficient for Example 1.

Month   Value x   rank(x)   Value y   rank(y)   d               d^2
(1)     (2)       (3)       (4)       (5)       (6)=(3)-(5)     (7)
1       1         1         1         1.5       -0.5            0.25
2       2         2         1         1.5        0.5            0.25
3       3         3         2         3.5       -0.5            0.25
4       4         4         2         3.5        0.5            0.25
5       5         5         4         5          0              0

By the formula,
\[ r_s = 1 - \frac{6(1)}{5(5^2 - 1)} = 0.95. \]
\( r_s = 0.95 \) indicates a strong positive correlation between the rankings of advertising expenditure and sales revenue.

Note that if we apply the ordinary formula for the correlation coefficient r to the rankings of the variables in Example 6, the result is slightly different. Since n = 5, \( \sum x = 15 \), \( \sum y = 15 \), \( \sum xy = 54 \), \( \sum x^2 = 55 \), \( \sum y^2 = 54 \), then
\[ r = \frac{5(54) - (15)(15)}{\sqrt{\left[5(55) - 15^2\right]\left[5(54) - 15^2\right]}} = \frac{45}{\sqrt{(50)(45)}} \approx 0.9487, \]
which is very close to the result of \( r_s \).

Example 7
Calculate the Spearman's rank correlation, \( r_s \), between x and y for the following data:

x     rank   y     rank
52           10
54           14
47           6
42           8
49           6
38           4
50           8
49           8

Example 8
The data in the table represent the monthly sales and the promotional expenses for a store that specializes in sportswear for young women.

Month   Sales (in $1,000)   Promotional expenses (in $1,000)
1        62.4                3.9
2        68.5                4.8
3        70.2                5.5
4        79.6                6.0
5        80.1                6.8
6        88.7                7.7
7        98.6                7.9
8       104.3                9.0
9       106.5                9.2
10      107.3                9.7
11      115.8               10.9
12      120.1               11.0

(a) Calculate the coefficient of correlation between monthly sales and promotional expenses.
(b) Calculate the Spearman's rank correlation between monthly sales and promotional expenses.
(c) Compare your results from part a and part b.
What do these results suggest about the linearity and association between the two variables?

EXERCISE: LINEAR REGRESSION AND CORRELATION

1. The grades of a class of 9 students on a midterm report (x) and on the final examination (y) are as follows:

x    77   50   71   72   81   94   96   99   67
y    82   66   78   34   47   85   99   99   68

(a) Find the equation of the regression line.
(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report but was ill at the time of the final examination.

2. (a) From the following information draw a scatter diagram and, by the method of least squares, draw the regression line of best fit.

Volume of sales (thousand units)   5    6    7    8    9    10
Total expenses (thousand $)        74   77   82   86   92   95

(b) What will be the total expenses when the volume of sales is 7,500 units?
(c) If the selling price per unit is $11, at what volume of sales will the total income from sales equal the total expenses?

3. The following data show the unit cost of producing certain electronic components and the number of units produced:

Lot size, x    50     100    250    500    1000
Unit cost, y   $108   $53    $24    $9     $5

It is believed that the regression equation is of the form [equation missing]. Using simple linear regression or otherwise, estimate the unit cost for a lot of 400 components.

4. Two variables x and y are related by the law: [equation missing]. State how a and b can be estimated by
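
The worked examples in this chapter can be checked numerically. The following is a minimal Python sketch, not part of the original notes, that applies the least squares formulas of Section 6.3, the correlation formula of Section 6.4 and the rank correlation formula of Section 6.5 to the Example 1 data. The function names (least_squares, correlation, average_ranks, spearman) are illustrative choices, and rank 1 is assumed to go to the smallest value.

```python
from math import sqrt


def least_squares(x, y):
    """Return (a, b) for the fitted line y-hat = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = (sy - b * sx) / n                          # y-intercept
    return a, b


def correlation(x, y):
    """Sample correlation coefficient r from the raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # lowest rank occupied by v
        count = ordered.count(v)       # number of observations tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman(x, y):
    """Rank correlation r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((p - q) ** 2 for p, q in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Example 1 data: advertising (in $1,000) and sales revenue (in $10,000)
advertising = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]

a, b = least_squares(advertising, sales)
r = correlation(advertising, sales)
r_s = spearman(advertising, sales)

print(f"regression line: y-hat = {a:.2f} + {b:.2f} x")
print(f"prediction at x = 4.5: {a + b * 4.5:.2f}")
print(f"r^2 = {r ** 2:.4f}, r = {r:.4f}, r_s = {r_s:.2f}")
```

For the Example 1 data this gives a slope of 0.7, an intercept of -0.1, a coefficient of determination of about 0.8167 and a rank correlation of 0.95. The same functions can be applied to the data of Examples 2, 5 and 8 or to Exercises 1 and 2 by replacing the two lists.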