c52815virginia tech va usa

上传人：环*** IP属地：四川上传时间：2020-10-14 格式：DOCX 页数：18 大小：489.76KB 积分：6 举报 版权申诉

已阅读5页，还剩13页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

1、更多数学建模资料请关注微店店铺“数学建模学习交流”/RHO6PSpATeam #52815Page 17 of 17Mr. Alpha Chiang :We are writing to you regarding the investment strategy of the Goodgrant Foundation. We have created a sufficient model for dispersing the donations to colleges in a way that would enhance the Return on I

2、nvestment (RoI) for Goodgrant and thus the benefits for the impacted students. This model was developed due in large part to the analysis of data provided by the U.S. National Center on Education Statistics. Considering the lack of an official RoI measurement regarding the funding of schools, we opt

3、ed to create our own before model creation. This measurement was a function of what we called Goodness obtained by and resources available to a given school. Goodness was assessed as a schools contribution to the U.S. Gross Domestic Product (GDP). Resources were defined as a schools annual income fr

4、om tuition and external funds. This RoI was optimized by the models simulations.Jobs added by a given school were calculated by considering the unemployment rates for each major and the number of undergraduate degrees awarded by a school. This was compared to the unemployment rate of those with bach

5、elors degrees in the respective majors across the United States. A positive relationship was then found between the amount of jobs added and the change in GDP. Therefore, schools were assigned high goodness if they awarded a large amount of degrees in fields that were undersaturated in the U.S. econ

6、omy.The $500,000,000 that Goodgrant plans to donate is disbursed on a yearly basis in the analysis. The simulations determine which schools to invest in and how much to invest in each school. The time duration over which the money should be invested is also included. These durations were a result of

7、 forecasted GDP and population data over the next five years.We first computed an overall model by looking at the relationship between resources and goodness for all the schools on which we had data. Then, using this first model as a starting point, we developed individual models for each school. Th

8、at is to say, we have an individual relationship between how much funding each school receives and how much its able to contribute to the economy based on that income, as well as a custom estimation for the rate at which decreasing marginal returns occur for each school.Based on these data, we estim

9、ated RoI over time by comparing the change in resources to the change in Goodness as a result of each respective investment. Then, we ran a number of simulations to allocate the individual investments that maximized the RoI. These investments are allocated until the overall donation amount is exhaus

10、ted, leaving us with a plan to best disburse the grants every year.After optimizing RoI, 73 schools were given at least some sort of investment. Most were small, possibly underdeveloped schools that would benefit greatly from a private contribution. There are also a few highly developed schools that

11、 were recommended for investment such as Princeton. The yearly investments in these schools are provided by the model.The recommended investments output from the model optimize an acceptable metric for gauging the added value to the invested schools. Multiple models were compared to one another to a

12、rrive at a concluded model for the recommended investments. We are confident that this model will realize its full potential.Sincerely,MCM Team MembersA Multilevel Bayesian Model based on Economic Contribution of University AlumniFebruary 1, 2016Contents1 Introduction32 Data Cleaning and Imputation

13、Schema32.1 Data Gathering, Cleaning and Filtering32.2 Imputation Schema43 Model Components43.1 Model Overview53.2 Contribution to GDP53.3 Final Cross Sectional Model Specification83.4 Funding Over Time83.5 Analysis of Linear Model Assumptions84 Solving the Optimization Problem104.1 Minimal Equivalen

14、t Example104.2 Full Optimization Scheme104.3 Algorithm115 Problem Statement and Special Considerations116 Model Analysis126.1 Why Bayes?126.2 Model Limitations126.3 Model Assumptions126.4 Model Strengths136.5 Potential Improvements137 Results from Model138 Conclusions158.1 Unconstrained Model158.2 C

15、onstrained Model168.3 Evaluation of Assumptions161 IntroductionStarting in July 2016, the Goodgrant Foundation will provide an overall donation amount of $500,000,000 over the course of 5 years ($100,000,000 per year) to various schools. This donation amount will be allotted to certain schools based

16、 on the schools return on investment (RoI). A schools projected RoI will determine whether or not the school will receive any investment whatsoever, how much money the school will receive, and the time duration over which the school will receive the money. Solid investments will have a positive effe

17、ct on overall student performance.While it is acknowledged that, in general, investment in tertiary education has lower RoI than invest- ment in primary education, there are still important returns to expanding higher education, especially in developed countries such as the United States 24. In part

18、icular, we will show the impact that investment in universities can have on GDP. Investing money in the development of skilled workers is not only beneficial to the economy, it is necessary for the US to compete globally. We hope that this document will help to show how best that can be achieved.Ret

19、urn on Investment was calculated as a function of the schools ”goodness” and the schools resources. Goodness and resources were determined using the provided data sets and external data sets. Before review, the data required various preparation techniques.2 Data Cleaning and Imputation SchemaAny ser

20、ious statistical analysis begins with substantial work on the data. The data provided had a panoply of issues ranging from unnatural variables, structural and seemingly random missingness, and impossible observations. Their rectification was prerequisite to model fitting and analysis.2.1 Data Gather

21、ing, Cleaning and FilteringFor our approach to evaluating RoI, the gathering of unemployment measurements for each major that was referred to in the given data sets was required. Multiple sources were referred to in collecting these measurements 6, 12, 19, 25, 26. It was also necessary to obtain ave

22、rage starting salary for each major 8,9, 13, 14, 15, 16, 17, 20.Initial data cleaning was performed in Python, where we re-organized variables into a more natural structure (for example, there were two tuition variables, one for public and one for private schools, as opposed to one tuition variable

23、and one private/public flag). Certain schools were found to have negative average tuition. We considered this to be measurement error, and removed from consideration for funding any school reporting negative average tuition.Next, we analyzed the missingness structure in R. The MI package (short for

24、Multiple Imputation) 23 has many useful tools for dealing with missing data problems. Among these is a heatmap of missing cases and observations clustered by missingness structure. Figure 1 shows the initial missingness structure.In Figure 1, the x axis represents cases, the y variables (the names o

25、f which are not intended to be legible). There are a number of schools which are missing the majority of variables, shown at the top of the missingness image. These schools were not considered for funding due to the lack of information, and were removed from the analysis. There is a large block on t

26、he left, representing the lack of SAT or ACT scores for a majority of schools. These variables were not included in our analysis because of this lack of data for most schools. On the right, a strange missingness pattern is observed, but this is due to the factors applyingFigure 1: Initial Missingnes

27、s Structure of Dataeither to public or private schools and is rectified in the python script. Any further missingness was handled by our imputation schema.2.2 Imputation SchemaWe identified two distinct kinds of missingness in the data, and modeled it with two different mecha- nisms.Firstly, data si

28、mply missing from the model were assumed to be Missing Completely at Random (MCAR). BUGS treats missing values as parameters and conducts inference on them automatically. As we are acting as Bayesians, we must specify priors for them, and choose to use a very disperse normal distribution centered at

29、 zero.We were not comfortable making this same assumption with the ”Privacy Suppressed” Data. In reporting the data, aggregators are required to censor data if they believe it may be possible to make reasonable inferences on personal data of any individual. Oftentimes, this is the case if the data r

30、epresents an aggregation from very few people. We believe that schools with few people may have different characterisitcs than schools in general, and so these were not assumed to be MCAR, but Missing at Random (MAR), that is, whether or not they are missing depends on other variables in our model.

31、These data were estimated using a multiple regression model depending on the complete data 21.3 Model ComponentsThere were two means of assessing how contributions to a school would increase its goodness. The first was a measurement of how well the school is doing at satisfying the nations needs wit

32、h respect to the skills of its graduated students. The second was simply a function of the median salary for that school.The resources of a given school were calculated by estimating the annual tuition income and the annual funds added externally to the school.3.1 Model OverviewThe data did not come

33、 with any kind of response; there was no measurement of how deserving a school was of money. As such, we had to create our own, which we will refer to as Goodness. Return on Investment was defined as a change in Goodness. We hypothesized that there would be a positive, concave down relationship betw

34、een the amount of resources available to a school and its Goodness. The relationship was expected to be positive because, in general, a school with access to more money is expected to be able to produce better results, and concave down due to the law of diminishing marginal returns.Specifically, res

35、ources are defined as the tuition per student, as well as the total enrollment for the school, and Goodness is a linear combination of a schools contribution to GDP (defined below) and the median salary of its alumni. We first computed an overall model:Goodness = ln(Tuition) + Enrollment + EE N (0,

36、I)Next, we computed an individual model for each school. The posterior for the model described above was given extra uncertainty (variance), and was used as a prior for each school 1.P (Ms|X) L(Ms|Xs)P (Ms|X)This gave us an estimated Goodness or RoI Curve for each school. Given these curves, the ide

37、a was to maximize RoI through a change in per capita dollar amount, theoretically added to each schools resources to change its effective tuition. A more precise overview of how schools were chose based on this model may be found in the optimization section.3.2 Contribution to GDPThe model related t

38、he proportion of students in each course of study to GDP in a several step process. The first step was to identify how many jobs each school is expected to add to the economy. This was calculated based on the unemployment in the fields which a university was preparing students to work in. For exampl

39、e, if there is higher than usual unemployment in Psychology, we would rate colleges producing many Psychology majors as producing negative jobs (which would lead to an increase in unemployment). On the other hand, if there is lower than average unemployment in Computer Science, schools with many com

40、puter scientists would be rated as producing many jobs. The calculations were conducted as follows (DegreesAwarded is the proportion of degrees awarded, Unemployment is the mean unemployment of all majors, Unemployment is the standard deviation of employment for all majors):Students = DegreesAwarded

41、(Enrollment)Unemployment = Unemployment UnemploymentUnemploymentJobsAdded = Students(DegreesAwarded)(Unemployment)If a school was found to have a net decrease in jobs, it was removed from consideration for fund-ing.We next map jobs added to the economy to a change in the unemployment rate for each m

42、ajor. This is simply defined as follows:f inalCurrentJobs = LaborForce(1 Unemploymentinitial) U nemployment= 1CurrentJobs + JobsAddedLaborForceU nemployment = U nemploymentfinal U nemploymentinitialIt is well known that there is a relationship between real gross domestic product (GDP) and unemploy-

43、ment 4, and there exist complicated economic models relating the two 22. However, we chose to fit our own simple model to unemployment and GDP data spanning from 1948 to 2015 obtained from the Bureau of Economic Analysis 11 and the Bureau of Labor Statistics 2, respectively, for use in our greater m

44、odel. We are not directly concerned with the relationship between GDP and unemployment, but with the relationship between a change in GDP and a change in unemployment. As such our model is as such:ln(GDP ) = Unemployment + EE N (0, I)We use the natural log of a change in GDP because while reasonable

45、 values for GDP have greatly increased since 1948, reasonable values for unemployment have remained about the same. So our model predicts not on an absolute change in GDP, but a percent change in GDP (when back-transformed from log space).ln(GDP ) = ln(GDPfinal) ln(GDPinitial) ln(GDP) = ln(GDPfinal/

46、GDPinitial)eln(GDP ) = GDPfinal/GDPinitialFigure 2: Regression of logGDP vs UnemploymentBut in our model, we want an absolute change in GDP. We obtain this using the following for- mula:GDP = eMLEGDPinitial GDPinitialGDP = eln(GDPfinal /GDPinitial )GDPinitial GDPinitialGDP = GDPfinal GDPGDPinitialin

47、itial GDPinitialGDP = GDPfinal GDPinitialThe model fit looks surprisingly good. See in Figure 2 a scatterplot of the chart with regression line plotted.This negative relationship was manipulated to evaluate the goodness that would be added by a possible Goodgrant contribution. The jobs that are adde

48、d to a particular field decrease the overall unemployment rate and thus increase the GDP. While keeping resources constant, an increase in GDP leads to a positive ROI.3.3 Final Cross Sectional Model SpecificationOnce the data were manipulated as desired in Python and R, BUGS was called from within R

49、 to sample from the posterior distribution. The model as specified to BUGS was as such:Goodness = 0 + 1ln(Price) + 2Enrollment + 3ln(Price)Enrollment + EE N (0, I) N (0, 0.00001) (0.00001, 0.00001)For Missing Completely at Random Data:For Missing at Random Data:Salary N (0, 0.00001)Salary = XWhere X

50、 is a matrix of other covariates and is a vector of coefficients relating the two.3.4 Funding Over TimeIn order to answer the question of when these funds should be dispersed, we simply recomputed our model posteriors for different values of economic data. We used forecasted econometric 10 and popul

51、ation data7 in our calculations. We assumed that the percent of the US population making up the Labor Force would stay constant over the next five years for our Job Added calculations. We did not inject any other different information into the model when computing the coefficients at different times

52、.3.5 Analysis of Linear Model AssumptionsWe will in this section perform an analysis of the residuals of the regression model in order to determine whether any of the assumptions of linear models have been validated.We first consider a normal Quantile-Quantile Plot (Figure 3). It does not look perfe

53、ctly normal, contrary to the assumptions which lead to our posterior estimates, but it is close enough.This predicted versus residual plot is more worrisome (Figure 4). We actually see a decrease in variance as the predicted value increases. This plot shows that our model is not perfect.The error of

54、 the prediction seems to increase as the predicted value increases (Figure 5). This plot again serves as evidence against this model the most correct.Figure 3: Q-Q Plot for Main ModelFigure 4: Residual vs Predicted Plot for Main ModelFigure 5: Actual vs Predicted Plot for Main Model4 Solving the Opt

55、imization ProblemOur model as described above is only part of our solution. Given it, we are able to calculate the most beneficial distribution of available funds. How this is done will be described here.4.1 Minimal Equivalent ExampleConsider an example (see Figure 6) in which we have only two schoo

56、ls competing for Goodgrants funds. School A has a coefficient of 1 to relate log of funding with Goodness, while School B has a coefficient of0.2. Both graphs above show what the change in Goodness is for a disbursement of 1.3 thousand dollars to each school would be.Obviously, the disbursement has

57、a larger effect on the school on the left (School A), so we would choose to fund it over the school on the right (School B). This is essentially how our optimization problem works; we iteratively check which school would most benefit from a small disbursement until we have exhausted our funds.The above example is just a simple case of our investments. When considering our real model, the problembecomeshowmuchoveralldisbursementeachschoolshouldbegivenfromthe Goodgrant Fo

人人文库> 全部分类> 应用文书

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

c52815virginia tech va usa

文档简介

温馨提示

最新文档

评论

c52815virginia tech va usa

文档简介

温馨提示

最新文档

评论

相关文档