Big Data Economics

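• Introduction
• What is big data?
• Uses of big data
• Economic and policy analysis with big data
• Challenges of big data
• What a data scientist needs

• Since U.S. President Obama made big data part of the U.S. science and technology development strategy, big data has drawn intense attention from all sectors of society and from the media.
• “Data scientist”, a title almost nobody had heard of a few years ago, has suddenly become extremely popular; demand for data scientists has surged, and their pay has risen with it.
• But the supply of data scientists is very limited, so some North American universities have begun to set up bachelor's and master's degree programs in data science.
• Big data's popularity also owes something to Obama's presidential campaign, which gained an edge in fundraising and advertising through data scientists' analysis of large volumes of data.
• Data scientists successfully predicted Obama's re-election.
• Microsoft's data scientists successfully predicted World Cup match results, beating all other forecasts, including those of IBM's data scientists.
• Data scientists are also in great demand in China.
• Economic theory is usually highly simplified and assumes “other things being equal”. In practice, the “other things” change. If they change, do the theoretical results still hold? This is the so-called comparative analysis, which economic theory rarely does and finds hard to do.
• In empirical analysis we know full well that the “other factors” are all changing, but for lack of data we have had to ignore them.
• In the past, little data on economic activity was recorded, and what existed was mostly aggregate data. Today, computer technology and the Internet have changed everything.
• When you search on Baidu, your search keywords and the sites you visit are recorded. When you shop on Taobao, every item you browse and every purchase is recorded. When you read, watch videos, chat or check your personal finances online, your actions are recorded.
• Text messages, WeChat, Twitter, mobile phones, supermarket cameras and payment terminals, bank ATMs, and cameras at intersections and other locations: all kinds of electronic and communication devices leave data footprints.
• Big data is data recorded by all of these means; it may be real-time, unstructured, complex and very large in volume.
• Example 1. Consider the data collected by retail stores. A few decades ago, stores might have collected data on daily sales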
, and it would have been considered high quality if the data were split by product or product category. Nowadays, scanner data makes it possible to track individual purchases and item sales, capture the exact time at which they occur and the purchase histories of individuals, and use electronic inventory data to link purchases to specific shelf locations.
• Example 1 (continued). Internet retailers observe not just this information but can also trace an individual's behavior around the sale, including his or her initial search queries, items viewed and discarded, recommendations and promotions that were shown and
subsequent product or seller reviews.
• In principle, these data could be linked to demographics, advertising exposure, social media activity, offline spending or credit history.
• Example 2. There has been a parallel evolution in business activity. As firms have moved their day-to-day operations to computers and then online, it has become possible to compile rich datasets of sales contacts, hiring practices, and physical shipments of goods. Increasingly, there are also electronic records of collaborative work efforts, personnel evaluations and productivity measures.
• The same story can be told about the
public sector.
• This is a lot of data. What exactly is new about it?
• Data is now available faster, has greater coverage and scope, and includes new types of observations and measurements that previously were not available.
• A key aspect of such modern datasets is that they have much less structure, or more complex structure, than traditional datasets.
• Data is available in real time. The ability to capture and process large amounts of data in real time is crucial for many business applications, but has not been used much in economic research and policy analysis. Perhaps this is because many economic questions are retrospective, so that it is more important for data to be detailed and accurate than to be available immediately. This may change in the future.
• Data is available at large scale. A major change for economists is the scale of modern datasets. Before, we worked with data with hundreds or thousands of observations and few variables. With small samples, statistical power was an important issue; omitted variable bias was also a concern. Now datasets with tens of millions of observations and a huge number of variables are common. Statistical power is largely no longer an issue.
• Data comes with less structure. The information available about a consumer may include her entire shopping history. With this information, it is possible to create an almost unlimited set of individual characteristics. While this is very powerful, it is also challenging. We are familiar with the “rectangular” form of data, with N observations and K variables, where K is a lot smaller than N.
• When data arrives in its raw form, as a digital recording of a sequence of events with no further structure, there are a huge number of ways to move from that recording to the standard “rectangular” format. Figuring out how to organize unstructured data and reduce its dimensionality, and assessing whether the way we do this matters, is not something we are yet well equipped to do.
• Data is available on novel types of variables. Much of the data now being recorded is on activities that previously were very difficult to observe. Examples include email, geo-location data that records where people have been, and social network data that captures personal connections. Most economists believe that social connections play an important role in job search, in shaping consumer preferences and in the transmission of information. The challenge is in figuring out how to make effective use of these data, which may have novel structures. Traditional econometrics assumes that observations are independent in the cross section, or grouped as in panel data, or linked by time. But individuals in a social network may be connected in highly complex ways. Indeed, the point of econometric modeling may be to uncover exactly what the key features of this dependence structure are. Developing methods that are suited to these settings is an interesting challenge for econometric research.
• The most common uses of big data are tracking business processes and outcomes and building a wide array of predictive models. While business analytics is a big deal and surely has improved the efficiency of many organizations, predictive modeling lies behind many of the information products and services introduced in recent years.
• Amazon and Netflix recommendations rely on predictive models of what book or movie an individual might want to purchase.
• Google (Baidu) search results and news feeds rely on algorithms that predict the relevance of particular web pages or articles.
• Apple's auto-complete tries to predict the rest of one's text or email.
• Online advertising and marketing rely on automated predictive models that attempt to target the individuals who are most likely to respond to offers.
• In health care, it is common for insurers to adjust payments and quality measures based on “risk scores”, which are derived from predictive models of individual health costs and outcomes. An individual's risk score is a weighted sum of
health indicators that identify whether the individual has different chronic conditions.
• Credit card companies use predictive models of default and repayment to guide their underwriting, pricing and marketing activities.
• Banks use predictive models of deposits and withdrawals to manage their cash holdings.
• Companies use predictive models of demand to schedule production and manage inventory and the supply chain.
• Predictive models can also be used to detect fraudulent activity and to manage risk.
• All these applications rely on converting large amounts of unstructured data into “vertical” or predictive scores, often in a fully automated and scalable way, and sometimes in real time. The scores can be used in various ways. First, they can speed up or automate existing processes (Amazon recommends items that it predicts to be relevant for a given consumer, replacing a recommendation one could have obtained from a librarian).
• Second, they can be used to offer new services (Apple's auto-complete takes the word or sentence with the top score and proposes it as the auto-completion).
• Finally, the scores can be used to support decision making (in credit card fraud detection, the transaction score is reported to the issuing bank, and most banks implement some policy that dictates which transaction scores are approved, which are rejected and which need further investigation).
• Data that tracks business processes and outcomes can be used to improve efficiency.
• Targeted pricing.
• Using stock transaction data for arbitrage.
• There has been a remarkable amount of work on the statistical and machine learning techniques that underlie these applications: classification models, lasso and ridge regressions, data mining, text mining, etc.
• A conceptual overview of building predictive models: N observations and K variables; K is very large, often larger than N. With these types of data, we often get a perfect fit within sample but poor prediction out of sample. One solution is regularization, for example the lasso, which penalizes the size of the coefficients and sets many of them to exactly zero.
• Machine learning models assume a stable environment, but this assumption may not hold if individuals respond to the
change (the Lucas critique).
• Governments collect, or could collect, a large amount of detailed micro-level data. These data can be used for tracking economic activity, evaluating policies, fighting fraud, controlling risk, supporting decision making and developing new information services and products (fraud alerts, and informing consumers about the consequences of decisions such as taking out loans, purchasing houses and planning retirement).
• Big data provides a detailed snapshot of economic activity (almost) in real time. Big data therefore allows better measurement of economic effects and outcomes, helps to pose new sorts of research questions and enables novel research designs that can inform us about the consequences of different economic policies and events.
• Big data may change the way economists approach empirical questions and the tools they use to answer them.
• For example, economists
have not embraced some of the data mining tools. One reason is that economists are reluctant to shift away from the single-covariate causal-effects framework. In the minds of economists, there is a sharp distinction between predictive modeling and causal inference, and as a result statistical learning
approaches are seen as having little to contribute.
• Big data may change that.
• Big data enables novel research designs.
• Example 1. Chetty et al. (2012) study the long-term effects of better teaching. The study combines data on 2.5 million New York schoolchildren with their earnings as adults 20 years later. The main question is whether the students of teachers who have higher “value-added” in the short run subsequently have higher earnings as adults, where a teacher's value-added is measured by the amount by which test scores improve. The results are striking. The authors find that replacing a teacher in the bottom 5% with an average teacher raises the lifetime earnings of students by a quarter of a million dollars in present value terms.
• Example 2. Internet commerce. Use detailed browsing and purchase data on the universe of eBay customers (100 million in the United States alone) to study the effect of sales taxes on internet commerce.
Aggregated data on state-to-state trade flows provide relatively standard estimates of tax elasticities, but we also use the detailed browsing data to obtain more micro-level evidence on tax responsiveness. Specifically, we find groups of individuals who clicked to view the same item, some of whom were located in the same state as the seller, and hence taxed, and some of whom were not, and hence went untaxed. We compare the purchasing propensities of the two groups, doing this for many thousands of items and millions of browsing sessions. We find significant tax responsiveness, and evidence of sub
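
The within-item comparison described in Example 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the session records, their field layout and the function name `within_item_gaps` are hypothetical, and the unweighted average of per-item gaps is just one possible summary, not necessarily the estimator used in the study.

```python
from collections import defaultdict

# Hypothetical browsing sessions: (item_id, buyer_in_seller_state, purchased).
# These rows are made up for illustration; they are not eBay data.
sessions = [
    ("item_a", True, False),
    ("item_a", True, True),
    ("item_a", False, True),
    ("item_a", False, True),
    ("item_b", True, False),
    ("item_b", False, True),
    ("item_b", False, False),
    ("item_c", False, True),  # no taxed viewer, so item_c is dropped
]


def within_item_gaps(rows):
    """Return {item_id: untaxed purchase rate - taxed purchase rate}."""
    # counts[item][taxed] = [number of views, number of purchases]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for item_id, taxed, purchased in rows:
        cell = counts[item_id][taxed]
        cell[0] += 1
        cell[1] += int(purchased)

    gaps = {}
    for item_id, by_group in counts.items():
        taxed_views, taxed_buys = by_group[True]
        untaxed_views, untaxed_buys = by_group[False]
        # Only items viewed by both groups allow a within-item comparison.
        if taxed_views and untaxed_views:
            gaps[item_id] = untaxed_buys / untaxed_views - taxed_buys / taxed_views
    return gaps


gaps = within_item_gaps(sessions)
# Simple unweighted average across items (an assumption of this sketch).
average_gap = sum(gaps.values()) / len(gaps)
print(gaps)
print(average_gap)
```

A positive average gap means that untaxed viewers bought more often than taxed viewers of the same item. Comparing within items holds the item and the seller fixed, so differences in what the two groups look at do not drive the result.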
