Time Series Analysis and Regression Analysis of the Unemployment Rate.doc

Uploader: 清*** | IP location: Henan | Upload date: 2020-03-08 | Format: DOC | Pages: 20 | Size: 413.50KB | Points: 12


The Unemployment Rate

Summary

The unemployment rate (UR) reflects the employment situation of a country or region, and some countries may have similar UR time series. We want to divide these countries into several classes, predict the unemployment rate of particular classes, and study the factors that influence the unemployment rate.

We compute the mean and variance of the unemployment rate for each region and rank the regions by each statistic. Spain, Poland and Bulgaria have larger values of both statistics than the other countries, which indicates that these governments performed poorly on employment. Using the System Cluster (hierarchical clustering) method, we divide 35 countries into four classes according to the pseudo F statistic. Countries in the same class have similar unemployment rates; notably, Spain forms a class on its own.

Taking the unemployment data of China as an example, we first make the original series stationary and then fit a time series model. We select the appropriate model with the AIC and SBC statistics and obtain the trend equation. Testing the goodness of fit gives an MPEA of 3.51%, so we consider that the equation performs well. We then predict the UR for 2011 and 2012; compared with the measured values, the predictions are remarkably close. Finally, we repeat the process with the data of Japan and Australia.

Because multicollinearity and nonlinearity are common in economic data, we use random forests regression (RFR). Comparing R-squared, MSE and MPEA, we find that RFR is more accurate than ordinary least squares regression (OLSR). To assess the importance of the independent variables, we define a statistic as a criterion.

In short, we use cluster analysis, time series analysis and random forests regression to analyze the unemployment rate across regions.

Keywords: System Cluster; Time Series; RFR; OLSR

1. Introduction

In two years we will be looking for jobs, and the unemployment rate will then concern us directly, so let us explore it now. The unemployment rate is an important indicator for the capital market and belongs to the category of lagging indicators. A rising unemployment rate signals a weak economy and may prompt measures to stimulate economic growth; conversely, a falling unemployment rate may lead to inflation. We analyze the situation of all the countries in our data, determine the level of each country's unemployment rate, and then predict its trend.

2. Notations

Table 1 Indicators

| Indicator | Meaning |
|---|---|
| GDP | Gross domestic product |
| FS | Public finance expenditure |
| M2 | Money supply |
| EP | Economic activity |
| PC | Household final consumption (labelled FC in Tables 11 and 12 and Figure 13) |
| CPI | Consumer price index |
| EG | Energy consumption |
| UR | Unemployment rate |

3. Mean and Variance of Unemployment Rate

The mean of the unemployment rate reflects the level of economic development. Computing the mean UR of the 35 countries, we find that it ranges from less than 2% to more than 14%, which shows clearly that unemployment levels differ among these countries (Figure 1). Spain stands out, with a mean unemployment rate of 15.39%.

Figure 1 Mean UR of each country

Because countries are at different levels of development, the UR is not constant over time. We therefore also compute the variance of the UR, which reflects economic stability.
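As a minimal sketch of the ranking step in this section (assuming each country's UR is stored as a list of annual percentages; the country labels and numbers below are made-up placeholders, not the paper's data), the mean and variance can be computed and ranked with Python's standard `statistics` module. The text does not say whether a sample or population variance was used; the sketch assumes the sample variance.

```python
from statistics import mean, variance

def rank_by(stat, series_by_country, top=3):
    """Return the `top` countries with the largest value of `stat` as (country, value) pairs."""
    scores = {country: stat(values) for country, values in series_by_country.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical annual UR series in percent (placeholders, not the paper's data).
ur = {
    "A": [10.0, 12.0, 14.0, 16.0],
    "B": [5.0, 5.5, 6.0, 5.5],
    "C": [8.0, 9.0, 7.0, 8.0],
    "D": [3.0, 3.0, 3.0, 3.0],
}

print("Top 3 by mean:", rank_by(mean, ur))
print("Top 3 by variance:", rank_by(variance, ur))  # sample variance, n - 1 denominator
```

Ranking the two statistics separately, as Table 2 does, lets the same country appear at different positions in each list.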
Figure 2 shows how the variance of the UR differs across countries.

Figure 2 Variance of UR

Finally, we list the top three countries by UR mean and by UR variance.

Table 2 Top three countries for the two indicators

| Rank | Country (by mean) | Mean (%) | Country (by variance) | Variance |
|---|---|---|---|---|
| 1 | Spain | 15.39 | Spain | 29.66 |
| 2 | Poland | 14.22 | Bulgaria | 18.76 |
| 3 | Bulgaria | 13.76 | Poland | 13.04 |

Table 2 shows that Spain, Poland and Bulgaria are the top three for both indicators; that is, these governments performed poorly on employment and their economic environments are unstable.

4. System Cluster Analysis

To determine whether the countries' unemployment rates are at similar levels, we classify them with cluster analysis [1].

4.1 Distance between countries

We measure the distance between countries $i$ and $j$ with the Euclidean distance between their UR observations:

$$ d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{ik} - x_{jk}\right)^2} \qquad (1) $$

where $x_{ik}$ is the $k$-th UR observation of country $i$ and $m$ is the number of observations. An advantage of the Euclidean distance is that it is preserved under orthogonal rotation of the coordinate axes.

4.2 Distance between classes

We use the average linkage method to define the distance between two classes, because it uses the information of all pairs of samples across the two classes:

$$ D_{pq} = \frac{1}{n_p n_q} \sum_{x_i \in G_p} \sum_{x_j \in G_q} d_{ij} \qquad (2) $$

where $n_p$ and $n_q$ are the numbers of samples in classes $G_p$ and $G_q$, and $d_{ij}$ is the distance between sample $x_i$ in $G_p$ and sample $x_j$ in $G_q$.

4.3 The classification statistic

Let $n$ be the total number of samples, divided into $k$ classes $G_1, \dots, G_k$, where class $G_t$ has $n_t$ samples. The pseudo F statistic is

$$ F = \frac{(T - P_k)/(k-1)}{P_k/(n-k)} \qquad (3) $$

where $T = \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2$ is the total sum of squares, $P_k = \sum_{t=1}^{k} \sum_{x_i \in G_t} \lVert x_i - \bar{x}_t \rVert^2$ is the within-class sum of squares, $\bar{x}$ is the overall mean and $\bar{x}_t$ is the mean of class $G_t$. A larger pseudo F value, together with a smaller value of the second criterion, indicates a better classification.

Cluster analysis gives the following test statistics for different numbers of clusters.

Table 3 Pseudo F statistic by number of clusters

| Number of clusters | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Pseudo F | 0 | 19.3 | 12.6 | 23.8 | 19.3 | 17.3 | 15.1 | 19.5 |

Table 3 shows that with four classes the pseudo F statistic reaches its largest value (23.8) while the second criterion is not large, so we choose four classes.
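Equations (1) to (3) and the choice of the number of classes can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the software the authors used: each country is represented by a vector of UR observations, clusters are merged greedily by average linkage, and the pseudo F statistic is computed as between-class over within-class sums of squares, each divided by its degrees of freedom. The helper names and the six-country data set are hypothetical.

```python
import math
from itertools import combinations

def euclidean(x, y):
    # Equation (1): Euclidean distance between two UR vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(cluster_p, cluster_q, data):
    # Equation (2): mean of all pairwise distances between members of the two clusters
    total = sum(euclidean(data[i], data[j]) for i in cluster_p for j in cluster_q)
    return total / (len(cluster_p) * len(cluster_q))

def agglomerate(data, k):
    """Start from singletons and merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: average_linkage(clusters[pq[0]], clusters[pq[1]], data))
        clusters[p] = clusters[p] + clusters[q]  # p < q, so index p is unaffected by the delete
        del clusters[q]
    return clusters

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def pseudo_f(data, clusters):
    # Equation (3): ((T - P_k) / (k - 1)) / (P_k / (n - k))
    n, k = len(data), len(clusters)
    grand = centroid(data)
    total_ss = sum(sq_dist(x, grand) for x in data)
    within_ss = 0.0
    for members in clusters:
        rows = [data[i] for i in members]
        c = centroid(rows)
        within_ss += sum(sq_dist(x, c) for x in rows)
    return ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))

# Hypothetical countries, each described by two UR observations (percent).
data = [[1.0, 1.2], [1.1, 1.0], [5.0, 5.2], [5.1, 4.9], [9.0, 9.1], [15.0, 15.4]]
for k in range(2, 5):
    groups = agglomerate(data, k)
    print(k, groups, round(pseudo_f(data, groups), 1))
```

Following the rule used with Table 3, one would then keep the number of clusters whose pseudo F is largest. Average linkage is only one choice; single or complete linkage would change `average_linkage` alone.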
Finally, we give the clustering dendrogram.

Figure 3 The clustering result

Classifying the original sample into four classes gives the following result:

- The first class: China, Japan, Austria, South Korea, Hong Kong (China), Macao, Iceland, Holland, Norway, Thailand, Czech Republic.
- The second class: Australia, Britain, Canada, New Zealand, Denmark, Hungary, Portugal, Sweden, Romania, Finland, America, France, Italy, Greece, Germany, Israel, Philippines, Turkey, Russia, Ireland.
- The third class: Bulgaria, Poland, Venezuela.
- The fourth class: Spain.

The unemployment rate reflects the overall state of the economy, and it is the first economic figure published each month, which is why it is called the crown jewel of all economic indicators. Among the monthly economic indicators, it is one to which the market is especially sensitive.

5. Time Series Model

A time series [2] is a set of observations $x_t$, each recorded at a specific time $t$. A time series model for the observed data $\{x_t\}$ is a specification of the joint distributions (or possibly only the means and covariances) of a sequence of random variables $\{X_t\}$ of which $\{x_t\}$ is postulated to be a realization.

5.1 Stationarity Test

To build a good time series model, we must first identify the properties of the series.
A key role in time series analysis is played by processes whose properties, or some of them, do not vary with time. We take the data of China as an example and plot them in Figure 4.

Figure 4 Scatter plot of China's UR

Figure 4 shows that China's unemployment rate has an increasing trend over time, so we must make the series stationary before building a model.

5.2 Making the series stationary

We make the series stationary by taking the first difference, $\nabla X_t = X_t - X_{t-1}$, and plot the result.

Figure 5 Scatter plot after differencing

Figure 5 shows that the differenced data are approximately stationary, so we can build a time series model.

5.3 The ARMA Model

We fit an ARMA model to China's UR data:

$$ X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \qquad (4) $$

where $\phi_1, \dots, \phi_p$ are the autoregressive parameters, $\theta_1, \dots, \theta_q$ are the moving-average parameters, and $\{\varepsilon_t\}$ is Gaussian white noise with zero mean, that is, the $\varepsilon_t$ are independent with $\varepsilon_t \sim N(0, \sigma^2)$. In particular, when $q = 0$ the model reduces to the AR(p) model:

$$ X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t \qquad (5) $$

To determine the orders of the ARMA model, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) in Figure 6.

Figure 6 The PACF and ACF

Figure 6 shows that the ACF cuts off after lag 1, and the lag-1 partial autocorrelation exceeds twice its standard error.
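The differencing and order-identification steps can be sketched as follows, assuming the annual UR series is a plain Python list. The ACF uses the usual biased estimator (denominator n), the PACF is obtained from it with the Durbin-Levinson recursion, and ±2/√n is only the rough white-noise band commonly used to judge where the correlations cut off. The series in the usage lines is hypothetical.

```python
import math

def difference(x):
    """First difference: y_t = x_t - x_{t-1}."""
    return [b - a for a, b in zip(x, x[1:])]

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (biased estimator, denominator n)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return [sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n / c0
            for h in range(1, max_lag + 1)]

def pacf(r):
    """Partial autocorrelations from r = [r_1, r_2, ...] by the Durbin-Levinson recursion."""
    phi = []  # phi[j] holds phi_{k-1, j+1} from the previous order
    out = []
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            new = [phi_kk]
        else:
            num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            new = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi = new
    return out

ur = [2.5, 2.8, 3.1, 3.6, 4.0, 4.2, 4.3, 4.1, 4.3, 4.2]  # hypothetical levels, percent
dx = difference(ur)
r = acf(dx, 4)
band = 2 / math.sqrt(len(dx))  # rough +/- 2 standard-error band for white noise
for lag, (a, p) in enumerate(zip(r, pacf(r)), start=1):
    print(f"lag {lag}: ACF {a:+.3f}  PACF {p:+.3f}  outside band: {abs(a) > band}, {abs(p) > band}")
```

Reading the printout the way the text reads Figure 6, an ACF that drops inside the band after lag 1 points to MA(1), while a PACF that does so points to AR(1).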
We therefore tentatively identify the model as AR(1) or MA(1). To decide which model is better, we compute statistics for both models.

The information of model AR(1):

Table 4 AIC and BIC information for the AR model (conditional least squares estimation)

| Parameter | Estimate | Standard error | t value | Approx. Pr > \|t\| | Lag |
|---|---|---|---|---|---|
| MU | 2.47236 | 0.21209 | 11.66 | <.0001 | 0 |
| AR(1,1) | 1.00000 | 0.04125 | 24.25 | | |

The information of model MA(1):

| Parameter | Estimate | Standard error | t value | Approx. Pr > \|t\| | Lag |
|---|---|---|---|---|---|
| MU | 3.34510 | 0.18211 | 18.37 | <.0001 | 0 |
| MA(1,1) | -0.85777 | 0.12761 | -6.72 | | |

Autocorrelation check of residuals:

| To lag | Chi-square | DF | Pr > ChiSq | Autocorrelations |
|---|---|---|---|---|
| 6 | 6.96 | 5 | 0.2239 | 0.469, 0.072, -0.055, -0.009, 0.040, 0.226 |
| 12 | 13.90 | 11 | 0.2386 | 0.296, 0.272, 0.074, -0.045, -0.122, -0.064 |
| 18 | 16.85 | 17 | 0.4643 | -0.028, 0.000, 0.095, 0.009, -0.117, -0.021 |

5.4 Prediction of China's Unemployment Rate

At lags 6, 12 and 18 the p-values all exceed 0.05, so we regard the residuals as white noise, meaning the model has extracted the information in the data sufficiently. We now predict China's unemployment rate.

Table 8 Prediction information

| Year | Forecast | Std. error | 95% CI lower | 95% CI upper |
|---|---|---|---|---|
| 2011 | 4.0825 | 0.1791 | 3.7315 | 4.4335 |
| 2012 | 4.1298 | 0.3015 | 3.5388 | 4.7208 |
| 2013 | 4.2000 | 0.4015 | 3.4131 | 4.9870 |

Using the same method, we next analyze the UR of Japan and Australia.
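A minimal sketch of the fitting, residual-check and forecasting steps, assuming the series is a plain Python list. It is not the statistical package the tables appear to come from: conditional least squares for an AR(1) with intercept is implemented as an ordinary regression of $x_t$ on $x_{t-1}$, the Ljung-Box Q statistic is returned without a p-value (with one fitted ARMA parameter it is compared with a chi-square on h − 1 degrees of freedom), and the 95% intervals use forecast ± 1.96 × standard error, which reproduces the 2011 row of Table 8. All function names and the demo series are hypothetical.

```python
import math

def fit_ar1_cls(x):
    """AR(1) with intercept by conditional least squares: regress x_t on x_{t-1}.
    Returns intercept c, coefficient phi, residual variance and residuals."""
    z, y = x[:-1], x[1:]
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    phi = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    c = my - phi * mz
    resid = [b - (c + phi * a) for a, b in zip(z, y)]
    sigma2 = sum(e * e for e in resid) / (n - 2)  # two estimated parameters
    return c, phi, sigma2, resid

def ljung_box(resid, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(resid)
    m = sum(resid) / n
    c0 = sum((e - m) ** 2 for e in resid)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum((resid[t] - m) * (resid[t + k] - m) for t in range(n - k)) / c0
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

def interval(forecast, se, z=1.96):
    """Approximate 95% interval: forecast minus / plus 1.96 standard errors."""
    return forecast - z * se, forecast + z * se

def forecast_ar1(last, c, phi, sigma2, steps):
    """Recursive forecasts; the h-step error variance is sigma2 * sum_{i<h} phi^(2i)."""
    out, xhat, var = [], last, 0.0
    for h in range(steps):
        xhat = c + phi * xhat
        var += sigma2 * phi ** (2 * h)
        se = math.sqrt(var)
        out.append((xhat, se) + interval(xhat, se))
    return out

# Check against the 2011 row of Table 8: 4.0825 +/- 1.96 * 0.1791.
print(interval(4.0825, 0.1791))

x = [2.5, 2.8, 3.1, 3.6, 4.0, 4.2, 4.3, 4.1, 4.3, 4.2, 4.1, 4.1, 4.0]  # hypothetical
c, phi, s2, res = fit_ar1_cls(x)
print(round(phi, 3), round(ljung_box(res, 6), 2))
print(forecast_ar1(x[-1], c, phi, s2, 3))
```

The growing standard errors in Tables 8 to 10 follow the same pattern as `forecast_ar1`: each extra step adds another term to the forecast-error variance.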
5.5 Predictions for Japan and Australia

The scatter plot of Australia's UR is shown in Figure 8.

Figure 8 Scatter plot of Australia's UR

Following the steps above, we obtain the fitted model for Australia (7).

Figure 9 compares the fitted values with the measured values.

Figure 9 Fitting result for Australia

The predictions are given in Table 9.

Table 9 Prediction information for Australia

| Year | Forecast | Std. error | 95% CI lower | 95% CI upper |
|---|---|---|---|---|
| 2011 | 4.9575 | 0.6478 | 3.6879 | 6.2272 |
| 2012 | 4.7304 | 0.9617 | 2.8454 | 6.6153 |
| 2013 | 4.5047 | 1.1996 | 2.1536 | 6.8559 |

The scatter plot of Japan's UR is shown in Figure 10.

Figure 10 Scatter plot of Japan's UR

Repeating the steps above, we obtain the fitted model for Japan (7). Figure 11 compares the fitted values with the measured values.

Figure 11 Fitting result for Japan's UR

The predictions are given in Table 10.

Table 10 Prediction information for Japan

| Year | Forecast | Std. error | 95% CI lower | 95% CI upper |
|---|---|---|---|---|
| 2011 | 4.8683 | 0.3721 | 4.1389 | 5.5977 |
| 2012 | 4.8683 | 0.6882 | 3.5195 | 6.2170 |
| 2013 | 4.8683 | 0.8992 | 3.1058 | 6.6308 |

6. Random Forests Regression Model

To examine directly the relationship between the UR and the factors that influence it, we choose GDP, FS, M2, EP, PC, CPI and EG as indicators, following the paper by ZHAI Lun [3].

6.1 Reasons for choosing RFR

- A regression model can reveal the relationship between the dependent and independent variables clearly, but in economic systems multicollinearity often undermines ordinary least squares regression.
- We cannot be sure that the relationship between the response variable and the independent variables is linear.

6.2 Random Forests Theory

Random forests [4] were proposed by Leo Breiman. A random forest is a combined model $\{h(x, \Theta_k), k = 1, 2, \dots\}$ in which each component $h(x, \Theta_k)$ is a regression tree grown from a bootstrap sample. When the predictors are numeric variables, the random forests regression (RFR) model is a multivariate nonlinear model.
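The combined-model idea of Section 6.2 can be illustrated with a deliberately simplified sketch. It keeps the ingredients the text names (bootstrap samples drawn with replacement, a random subset of m candidate variables, averaging the trees' predictions, and an out-of-bag error estimate) but replaces fully grown regression trees with depth-one trees (stumps) to stay short, so it is not Breiman's full algorithm. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(X, y, rows, features):
    """Best single split (feature, threshold) minimising squared error over `rows`."""
    best = None
    for f in features:
        values = sorted(set(X[i][f] for i in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:  # every candidate feature is constant: predict the mean
        m = sum(y[i] for i in rows) / len(rows)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    f, t, ml, mr = stump
    return ml if x[f] <= t else mr

def random_forest(X, y, b=50, m=1, seed=0):
    """Grow b stumps; returns a list of (stump, out-of-bag row set)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(b):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        oob = set(range(n)) - set(boot)              # rows not drawn: out-of-bag set
        feats = rng.sample(range(p), m)              # m random candidate variables
        # A stump has a single node, so per-tree and per-node feature sampling coincide.
        trees.append((fit_stump(X, y, boot, feats), oob))
    return trees

def predict(trees, x):
    """Forest prediction: the mean of the trees' predictions."""
    return sum(predict_stump(s, x) for s, _ in trees) / len(trees)

def oob_mse(trees, X, y):
    """Mean squared error of each row's average prediction over trees where it is OOB."""
    errs = []
    for i in range(len(X)):
        preds = [predict_stump(s, X[i]) for s, oob in trees if i in oob]
        if preds:
            errs.append((y[i] - sum(preds) / len(preds)) ** 2)
    return sum(errs) / len(errs)

X = [[float(i), float((i * 7) % 5)] for i in range(20)]  # feature 0 matters, feature 1 is noise
y = [1.0 if i < 10 else 3.0 for i in range(20)]
trees = random_forest(X, y, b=50, m=1, seed=0)
print(predict(trees, [2.0, 0.0]), predict(trees, [17.0, 0.0]), oob_mse(trees, X, y))
```

Swapping `fit_stump` for a recursive tree builder with a minimum leaf size would bring the sketch closer to the algorithm described in this section.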
The RFR prediction is the mean of the predictions of the $k$ trees $\{h(x, \Theta_k)\}$. The training set is drawn independently from the distribution of the random vector $(X, Y)$, and for a numeric response the mean squared error of the forest is

$$ \mathrm{MSE} = E_{X,Y}\left[\left(Y - \frac{1}{k}\sum_{j=1}^{k} h(X, \Theta_j)\right)^2\right] \qquad (8) $$

The RFR algorithm proceeds as follows.

(1) Let $n$ be the number of original samples. Using the bootstrap, draw $b$ samples at random with replacement and grow one regression tree on each. The data not drawn into a given bootstrap sample form its out-of-bag (OOB) set, which serves as the test set.
(2) Let $p$ be the number of variables in the original data. At each node of each regression tree, randomly select $m$ of the $p$ variables as candidate splitting variables, and choose the best split among them according to the splitting criterion. The parameter $m$ is a tuning parameter of the random forest regression.
(3) Each tree is grown recursively from the top down. We set a minimum leaf size, and growth of a tree stops when this minimum is reached.
(4) The random forest consists of the $b$ regression trees. The regression is evaluated by the OOB mean squared error:

$$ \mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{\mathrm{OOB}}\right)^2 \qquad (10) $$

where $\hat{y}_i^{\mathrm{OOB}}$ is the average prediction for observation $i$ over the trees for which it is out of bag.

6.3 Multicollinearity and Nonlinearity

Before using random forests regression, we show that the simpler method, ordinary least squares regression, performs poorly on these data: we fit an OLS regression model and then show that it is not appropriate. To remove the influence of units, we first transform the original data.

Table 11 OLSR results

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Intercept | -0.9628 | 1.6338 | -0.589 | 0.5666 |
| GDP | -6.9553 | 4.1914 | -1.659 | 0.1229 |
| FS | 2.4410 | 1.8815 | 1.297 | 0.2189 |
| M2 | -1.0729 | 1.4100 | -0.761 | 0.4614 |
| EP | 0.9647 | 0.6550 | 1.473 | 0.1665 |
| FC | 3.9401 | 4.3912 | 0.897 | 0.3872 |
| CPI | 0.1619 | 0.1205 | 1.343 | 0.2041 |
| EG | 1.6347 | 0.5929 | 2.757 | 0.0174 |

Multiple R-squared: 0.941, adjusted R-squared: 0.9066; F-statistic: 27.35 on 7 and 12 DF, p-value: 1.846e-06.

Table 11 shows that the regression equation passes the F-test, but every coefficient except that of EG (p = 0.0174) fails the t-test at the 5% level, so we suspect multicollinearity among the variables. We therefore compute the variance inflation factors (VIF).

Table 12 VIF of the variables

| Indicator | GDP | FS | M2 | EP | FC | CPI | EG |
|---|---|---|---|---|---|---|---|
| VIF | 2454 | 501 | 258 | 70 | 2626 | 2 | 59 |

Every VIF except that of CPI exceeds 10, so multicollinearity may lead to incorrect conclusions if we insist on simple linear regression.

Next we examine whether the relationship between the response variable and the independent variables is nonlinear.

Figure 12

Figure 12 suggests a nonlinear relationship between the response variable and the independent variables.

6.4 Results of RFR

We now solve the problem with RFR and compare it with ordinary least squares regression.

Table 13 Fitting performance

| Method | R-squared | MSE | MPEA |
|---|---|---|---|
| OLSR | 0.941 | 0.1165 | 2.78% |
| RFR | 0.954 | 0.022 | 1.47% |

Table 13 shows that RFR outperforms OLSR on all three measures of fit (higher R-squared, lower MSE and lower MPEA), so we consider RFR the better method. The fitting result of RFR is shown in the figure below.

Figure 12

With OLSR we can judge the importance of each variable from its coefficient: the larger the coefficient in absolute value, the more important the variable is to the dependent variable. RFR, however, gives no explicit formula, so we use the following method.

6.5 Variable importance measure

The variable importance measure (VIM) of RFR is based on the residual mean square (RMS) of the OOB data under random permutation of each variable.
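The permutation-based importance score can be sketched as below, assuming each fitted tree is available as a prediction function paired with its OOB row indices. For every variable, the column is shuffled within each tree's OOB rows, the increase in residual mean square over the unpermuted OOB error is recorded per tree, and the mean increase is divided by its standard error. The ten "trees" in the demo are hypothetical exact predictors chosen only to make the mechanics visible, and all function names are illustrative.

```python
import math
import random
import statistics

def rms(predict, X, y, rows):
    """Residual mean square of `predict` over the given rows."""
    return sum((y[i] - predict(X[i])) ** 2 for i in rows) / len(rows)

def shuffle_column(X, rows, j, rng):
    """Copy of X with column j randomly permuted among `rows` only."""
    Xp = [list(r) for r in X]
    values = [X[i][j] for i in rows]
    rng.shuffle(values)
    for i, v in zip(rows, values):
        Xp[i][j] = v
    return Xp

def vim_scores(trees, X, y, seed=0):
    """trees: list of (predict_fn, oob_rows). Returns one importance score per variable."""
    rng = random.Random(seed)
    base = [rms(f, X, y, oob) for f, oob in trees]  # OOB error of each tree, unpermuted
    scores = []
    for j in range(len(X[0])):
        perm = [rms(f, shuffle_column(X, oob, j, rng), y, oob) for f, oob in trees]
        diffs = [a - b for a, b in zip(perm, base)]  # increase in RMS for each tree
        mean_d = statistics.mean(diffs)
        se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean
        if se == 0:
            scores.append(0.0 if mean_d == 0 else math.copysign(math.inf, mean_d))
        else:
            scores.append(mean_d / se)
    return scores

# Toy data: y depends only on column 0; column 1 is irrelevant.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * i for i in range(20)]
# Hypothetical "trees": exact predictors, each with five OOB rows.
trees = [(lambda x: 3.0 * x[0], [k, (k + 3) % 20, (k + 7) % 20, (k + 11) % 20, (k + 15) % 20])
         for k in range(10)]
print(vim_scores(trees, X, y))
```

Because the predictor ignores column 1, shuffling it leaves every OOB error unchanged and its score is zero, while shuffling column 0 raises the error and gives a positive score.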
The specific process is as follows.

(1) For each bootstrap sample $t = 1, \dots, b$, build a regression tree $T_t$, use the same model to predict the corresponding OOB sample, and record the OOB residual mean square, denoted $\mathrm{RMS}_t$.
(2) Randomly permute variable $X_j$ in the $b$ OOB samples to form new OOB test samples, predict the new OOB samples with the fitted random forest, and compute the residual mean squares as in step (1). This gives the matrix

$$ \begin{pmatrix} \mathrm{RMS}_1 & \mathrm{RMS}_2 & \cdots & \mathrm{RMS}_b \\ \mathrm{RMS}_{11} & \mathrm{RMS}_{12} & \cdots & \mathrm{RMS}_{1b} \\ \vdots & \vdots & & \vdots \\ \mathrm{RMS}_{p1} & \mathrm{RMS}_{p2} & \cdots & \mathrm{RMS}_{pb} \end{pmatrix} \qquad (11) $$

(3) Subtract the first row of matrix (11) from the row corresponding to $X_j$, average the differences over the $b$ trees, and divide by their standard error $S_E$; the result is the importance score of $X_j$:

$$ \mathrm{VI}(X_j) = \frac{1}{b \, S_E}\sum_{t=1}^{b}\left(\mathrm{RMS}_{jt} - \mathrm{RMS}_t\right) \qquad (12) $$

Following this process, we draw the bar chart of variable importance.

Figure 13 Variable importance

Figure 13 shows that M2 is the most important variable, followed by FC, EG, FS, GDP, EP and CPI. This result is consistent with reality: energy consumption reflects the extent of economic prosperity, and the more final consumption there is, the more jobs are created. With the development of financial markets, the money supply is playing an important role in econo
