已阅读5页,还剩13页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
Team#52815Page1of 17 Mr. Alpha Chiang : We are writing to you regarding the investment strategy of the Goodgrant Foundation. We have created a sufficient model for dispersing the donations to colleges in a waythat wouldenhance the Return on Investment (RoI) for Goodgrant and thus the benefits for the impacted students. This model was developed due in large part to the analysis of data provided by the U.S. National Center on Education Statistics. Considering the lack of an official RoI measurement regarding the funding of schools, we opted to create our own before model creation. This measurement was a function of what we called Goodness obtained by and resources available to a given school. Goodness was assessed as a schools contribution to the U.S. Gross Domestic Product (GDP). Resources were defined as a schools annual income from tuition and external funds. This RoI was optimized by the models simulations. Jobs added by a given school were calculated by considering the unemployment rates for each major and the number of undergraduate degrees awarded by a school. This was compared to the unemployment rate of those with bachelors degrees in the respective majors across the United States. A positive relationship was then found between the amount of jobs added and the change in GDP. Therefore, schools were assigned high goodness if they awarded a large amount of degrees in fields that were undersaturated in the U.S. economy. The $500,000,000 that Goodgrant plans to donate is disbursed on a yearly basis in the analysis. The simulations determine which schools to invest in and how much to invest in each school. The time duration over which the money should be invested is also included. These durations were a result of forecasted GDP and population data over the next five years. We first computed an overall model by looking at the relationship between resources and goodness for all the schools on which we had data. Then, using this first model as a starting point, we developed individual models for each school. That is to say, we have an individual relationship between how much funding each school receives and how much its able to contribute to the economy based on that income, as well as a custom estimation for the rate at which decreasing marginal returns occur for each school. Based on these data, we estimated RoI over time by comparing the change in resources to the change in Goodness as a result of each respective investment. Then, we ran a number of simulations to allocate the individual investments that maximized the RoI. These investments are allocated until the overall donation amount is exhausted, leaving us with a plan to best disburse the grants every year. After optimizing RoI, 73 schools were given at least some sort of investment. Most were small, possibly underdeveloped schools that would benefit greatly from a private contribution. There are also a few highly developed schools that were recommended for investment such as Princeton. The yearly investments in these schools are provided by the model. The recommended investments output from the model optimize an acceptable metric for gauging the added valueto the invested schools. Multiple models were compared to one another to arrive at a concluded modelfortherecommendedinvestments. Weareconfidentthatthismodel willrealizeitsfullpotential. Sincerely, MCM Team Members Team#52815Page2of 17 A Multilevel Bayesian Model based on Economic Contribution of UniversityAlumni February 1, 2016 Contents 1Introduction3 2Data Cleaning and Imputation Schema3 2.1Data Gathering, Cleaning and Filtering. 3 2.2Imputation Schema.4 3Model Components4 3.1Model Overview.5 3.2Contribution to GDP.5 3.3Final Cross Sectional Model Specification.8 3.4Funding Over Time.8 3.5Analysis of Linear Model Assumptions.8 4Solving the Optimization Problem10 4.1Minimal Equivalent Example . 10 4.2Full Optimization Scheme . 10 4.3Algorithm .11 5Problem Statement and Special Considerations11 6Model Analysis12 6.1Why Bayes?.12 6.2Model Limitations .12 6.3Model Assumptions.12 6.4Model Strengths.13 6.5Potential Improvements.13 7Results from Model13 8Conclusions15 8.1Unconstrained Model.15 8.2Constrained Model.16 8.3Evaluation of Assumptions.16 Team#52815Page3of 17 1Introduction Starting in July 2016, the Goodgrant Foundation will provide an overall donation amount of $500,000,000 over the course of 5 years ($100,000,000 per year) to various schools. This donation amount will be allotted to certain schools based on the schools return on investment (RoI). A schools projected RoI will determine whether or not the school will receive any investment whatsoever, how much money the school will receive, and the time duration over which the school will receive the money. Solid investments will have a positive effect on overall student performance. Whileitis acknowledged that, in general, investment in tertiary education has lower RoI than invest- ment in primary education, there are still important returns to expanding higher education, especially in developed countries such as the United States 24. In particular, we will show the impact that investment in universities can have on GDP. Investing money in the development of skilled workers is not only beneficial to the economy,itis necessary for the US to compete globally. We hope that this document will help to show howbest that can be achieved. Return on Investment was calculated as a function of the schools ”goodness” and the schools resources. Goodness and resources were determined using the provided data sets and external data sets. Before review, the data required various preparation techniques. 2Data Cleaning and Imputation Schema Any serious statistical analysis begins with substantial work on the data. The data provided had a panoply of issues ranging from unnatural variables, structural and seemingly random missingness, and impossible observations. Their rectification was prerequisite to model fitting and analysis. 2.1Data Gathering, Cleaning and Filtering For our approach to evaluating RoI, the gathering of unemployment measurements for each major that was referred to in the given data sets was required. Multiple sources were referred to in collecting these measurements 6, 12, 19, 25, 26.Itwas also necessary to obtain average starting salary for each major 8,9,13,14,15, 16,17, 20. Initial data cleaning was performed in Python, where we re-organized variables into a more natural structure (forexample, thereweretwotuition variables,one for publicand one forprivateschools,asopposed to one tuition variable and one private/public flag). Certain schools were found to have negative average tuition. Weconsidered this to be measurement error, and removed from consideration for funding any school reporting negative average tuition. Next, we analyzed the missingness structure in R. The MI package (short for Multiple Imputation) 23 has many useful tools for dealing with missing data problems. Among these is a heatmap of missing cases and observations clustered by missingness structure. Figure 1 shows the initial missingness structure. In Figure 1, the x axis represents cases, the y variables (the names of which are not intended to be legible). There are a number of schools which are missing the majority of variables, shown at the top of the missingness image. These schools were not considered for funding due to the lack of information, and were removed from the analysis. There is a large block on the left, representing the lack of SAT or ACT scores for a majority of schools. These variables were not included in our analysis because of this lack of data for most schools. On the right, a strange missingness pattern is observed, but this is due to the factors applying Team#52815Page4of 17 Figure 1: Initial Missingness Structure of Data either to public or private schools and is rectified in the python script. Any further missingness was handled by our imputation schema. 2.2ImputationSchema We identified two distinct kinds of missingness in the data, and modeleditwith two different mecha- nisms. Firstly, data simply missing from the model were assumed to be Missing Completely at Random (MCAR). BUGS treats missing values as parameters and conducts inference on them automatically. As we are acting as Bayesians, we must specify priors for them, and choose to use a very disperse normal distribution centered at zero. We were not comfortable making this same assumption with the ”PrivacySuppressed”Data.In reporting the data, aggregators are required to censor data if they believeitmay be possible to make reasonable inferences on personal data of any individual. Oftentimes, this is the case if the data represents an aggregation from very few people. We believe that schools with few people may have different characterisitcs than schools in general, and so these were not assumed to be MCAR, but Missing at Random (MAR), that is, whether or not they are missing depends on other variables in our model. These data were estimated using a multiple regression model depending on the complete data 21. 3Model Components There were twomeans of assessing howcontributionsto a school would increase its goodness. The first was a measurement of how well the school is doing at satisfying the nations needs with respect to the skills of its graduated students. The second was simply a function of the median salary for that school. Team#52815Page5of 17 The resources of a given school were calculated by estimating the annual tuition income and the annual funds added externally to the school. 3.1Model Overview The data did not come with any kind of response; there was no measurement of how deserving a school was of money. As such, we had to create our own, which we will refer to as Goodness. Return on Investment was defined as a change in Goodness. We hypothesized that there would be a positive, concave down relationship between the amount of resources available to a school and its Goodness. The relationship was expected to be positive because, in general, a school with access to more money is expected to be able to produce better results, and concave down due to the law of diminishing marginal returns. Specifically, resources are defined as the tuition per student, as well as the total enrollment for the school, and Goodness is a linear combination of a schools contribution to GDP (defined below) and the median salary of its alumni. We first computed an overall model: Goodness=ln(Tuition)+Enrollment+ N(0,I) Next, we computed an individual model for each school. The posterior for the model described above was given extra uncertainty (variance), and was used as a prior for each school 1. P (Ms|X)L(Ms|Xs)P (Ms|X) Thisgave us an estimatedGoodness or RoI Curvefor each school. Giventhese curves,the idea wasto maximize RoI through a change in per capita dollar amount, theoretically added to each schools resources to change its effective tuition. A more precise overview of how schools were chose based on this model may be found in the optimization section. 3.2Contribution to GDP The model related the proportion of students in each course of study to GDP in a several step process. The first step was to identify how many jobs each school is expected to add to the economy. This was calculated based on the unemployment in the fields which a university was preparing students to work in. For example, if there is higher than usual unemployment in Psychology, we would rate colleges producing many Psychology majors as producing negative jobs (which would lead to an increase in unemployment). On the other hand, if there is lower than average unemployment in Computer Science, schools with many computer scientists would be rated as producing many jobs. The calculations were conducted as follows (DegreesAwardedis the proportion of degrees awarded, Unemployment is the mean unemployment of all majors, Unemploymentis the standard deviation of employment for all majors): Students = DegreesAwarded(Enrollment) Team#52815Page6of 17 Unemployment= UnemploymentUnemployment U nemployment JobsAdded= Students(DegreesAwarded)(Unemployment) ing. Ifa school was found to have a net decrease in jobs,itwas removed from consideration for fund- We next map jobs added to the economy to a change in the unemployment rate for each major. This issimplydefinedasfollows: CurrentJobs = LaborForce(1Unemploymentinitial) Unemploymentfinal=1 CurrentJobs+JobsAdded LaborF orce Unemployment= UnemploymentfinalUnemploymentinitial Itis well known that there is a relationship between real gross domestic product (GDP) and unemploy- ment 4, and there exist complicated economic models relating the two 22. However, we chose tofitour own simple model to unemployment and GDP data spanning from 1948 to 2015 obtained from the Bureau of Economic Analysis 11 and the Bureau of Labor Statistics 2, respectively, for use in our greater model. We are not directly concerned with the relationship between GDP and unemployment, but with the relationship betweena change in GDP and a changein unemployment. As such our model is as such: ln(GDP) = Unemployment+ N(0,I) We use the natural log of a change in GDP because while reasonable values for GDP have greatly increased since 1948, reasonable values for unemployment have remained about the same. So our model predicts not on an absolute change in GDP, but a percent change in GDP (when back-transformed from log space). ln(GDP)=ln(GDPfinal)ln(GDPinitial) ln(GDP) =ln(GDPfinal/GDPinitial) eln(GDP)=GDPfinal/GDPinitial Team#52815Page7of 17 Figure2: RegressionoflogGDP vsUnemployment But in our model, we want an absolute change in GDP. We obtain this using the following for- mula: GDP =eMLEGDPinitialGDPinitial GDP =eln(GDPfinal/GDPinitial)GDPinitialGDPinitial GDP = GDPfinal GDPinitial GDPinitialGDPinitial GDP =GDPfinalGDPinitial The model fit looks surprisingly good. See in Figure 2 a scatterplot of the chart with regression line plotted. Thisnegativerelationship wasmanipulated toevaluatethegoodness thatwouldbe added bya possible Goodgrant contribution. The jobs that are added to a particular field decrease the overall unemployment rate and thus increase the GDP. While keeping resources constant, an increase in GDP leads to a positive ROI. Team#52815Page8of 17 3.3Final Cross Sectional Model Specification Once the data were manipulated as desired in Python and R, BUGS was called from within R to sample from the posterior distribution. The model as specified to BUGS was as such: Goodness=0+1ln(Price)+2Enrollment+3ln(Price)Enrollment+ N(0,I) N(0,0.00001) (0.00001,0.00001) For Missing Completely at Random Data: SalaryN(0,0.00001) For Missing at Random Data: Salary=- X Where X is a matrix of other covariates and is a vector of coefficients relating the two. 3.4Funding Over Time In order to answer the question of when these funds should be dispersed, we simply recomputed our model posteriors for different values of economic data. We used forecasted econometric 10 and population data 7 in our calculations. We assumed that the percent of the US population making up the Labor Force would stay constant over the next five years for our Job Added calculations. We did not inject any other different information into the model when computing the coefficients at different times. 3.5Analysis of Linear Model Assumptions We will in this section perform an analysis of the residuals of the regression model in order to determine whether any of the assumptions of linear models have been validated. We first consider a normal Quantile-Quantile Plot (Figure 3). It does not look perfectly normal, contrarytotheassumptionswhichleadtoourposteriorestimates,butitiscloseenough. This predicted versus residual plot is more worrisome (Figure 4). We actually see a decrease in variance as the predicted value increases. This plot shows that our model is not perfect. The error o
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 8 《手术室护理风险管理在预防手术患者术后并发症中的实践研究》教学研究课题报告
- 2026年塔城职业技术学院单招职业技能考试必刷测试卷及答案解析(夺冠系列)
- 2026年山东力明科技职业学院单招职业倾向性测试题库带答案解析
- 2026年广东省湛江市单招职业适应性考试必刷测试卷及答案解析(夺冠系列)
- 2026年山东工程职业技术大学单招职业技能考试题库及答案解析(夺冠系列)
- 2026年扬州市职业大学单招职业技能测试题库附答案解析
- 2026年云南商务职业学院单招职业倾向性考试题库及答案解析(夺冠系列)
- 2026年洛阳商业职业学院单招职业适应性测试题库及答案解析(名师系列)
- 2026年安徽省滁州市单招职业倾向性考试必刷测试卷附答案解析
- 2026年上海海事大学单招职业倾向性测试必刷测试卷及答案解析(名师系列)
- 八年级上名著《红岩》第6章(讲练测)
- 劳动力调查业务知识培训课件
- TDBC2025可信数据库发展大会:中国信通院2025上半年“可信数据库”系列标准发布及解读
- 汽车技术发展与创新应用
- 2025年党纪党规知识测试题(含答案)
- 体检中心护理改进
- 回转窑石灰煅烧工安全技术操作规程
- 消防车辆安全事故课件
- 太空课件教学课件
- 设计院风电项目-初步设计手册说明书
- 挂篮施工安全培训课件
评论
0/150
提交评论