2019年数据科学从业者薪酬报告ppt课件_第1页
2019年数据科学从业者薪酬报告ppt课件_第2页
2019年数据科学从业者薪酬报告ppt课件_第3页
2019年数据科学从业者薪酬报告ppt课件_第4页
2019年数据科学从业者薪酬报告ppt课件_第5页
已阅读5页,还剩43页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、2019 Data Science Salary SurveyTools, Trends, What Pays (and What Doesnt) for Data ProfessionalsJohn King & Roger MagoulasMake Data WorkstrataconfPresented by OReilly and Cloudera, Strata + Hadoop World helps you put big data, cutting-edge data science, and new business fundamentals to work.Lear

2、n new business applications of data technologiesDevelop new skills through trainings and in-depth tutorialsConnect with an international community of thousands who work with dataJob # D2044BeijingSan JoseLondonNew YorkSingapore2019 Data Science Salary SurveyTools, Trends, What Pays (and What Doesnt)

3、for Data ProfessionalsJohn King & Roger Magoulas2019 Data Science Salary Survey . . . . . . . . . . . . . . . . . . . . . 1Executive Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Factors

4、that Influence Salary: The Regression Model . . . . . . . . . . . 5How You Spend Your Time . . . . . . . . . . . . . . . . . . . . . . . . . 16The Impact of Tool Choice . . . . . . . . . . . . . . . . . . . . . . . . . . 22The Relationship Between Tools and Tasks: Clustering Respondents . 31 Wrappin

5、g Up: What to Consider Next . . . . . . . . . . . . . . . . . . . 37Appendix A: Full Cluster Profiles . . . . . . . . . . . . . . . . . . . . . . 38Appendix B: The Regression Model . . . . . . . . . . . . . . . . . . . . 422019 DATA SCIENCE SAL ARY SURVEY2019 DATA SCIENCE SAL ARY SURVEYTable of Cont

6、entsVOVER 900 OVER 900 RESPONDENTSRESPONDENTSFROM A FROM A VARIETY OF VARIETY OF INDUSTRIES INDUSTRIES COMPLETED COMPLETED THE SURVEYTHE SURVEY2019 DATA SCIENCE SAL ARY SURVEY2019 DATA SCIENCE SAL ARY SURVEYTHE RESEARCH IS BASED ON DATA collected THE RESEARCH IS BASED ON DATA collected through an on

7、line 64-question survey, including through an online 64-question survey, including demographic information, time spent on specific demographic information, time spent on specific data-related tasks, and the use/non-use of a data-related tasks, and the use/non-use of a broad range of software tools.b

8、road range of software tools.2019 DATA SCIENCE SAL ARY SURVEYIN THIS FOURTH EDITION of the OReilly Data Science IN THIS FOURTH EDITION of the OReilly Data Science Salary Survey, weve analyzed input from 983 Salary Survey, weve analyzed input from 983 respondents working in the data space, across a r

9、espondents working in the data space, across a variety of industriesvariety of industries representing 45 countries and representing 45 countries and 45 US states. Through the results of our 64-question 45 US states. Through the results of our 64-question survey, weve explored which tools data scien

10、tists, survey, weve explored which tools data scientists, analysts, and engineers use, which tasks they engage analysts, and engineers use, which tasks they engage in, and of coursein, and of coursehow much they make.how much they make.Key findings include:Key findings include:Python and Spark are a

11、mong the tools that contribute Python and Spark are among the tools that contribute most to salary.most to salary.Among those who code, the highest earners are the Among those who code, the highest earners are the ones who code the most.ones who code the most.SQL, Excel, R and Python are the most co

12、mmonly SQL, Excel, R and Python are the most commonly used tools.used tools.Those who attend more meetings, earn more.Those who attend more meetings, earn more.Women make less than men, for doing the same Women make less than men, for doing the same thing.thing. Country and US state GDP serves as a

13、decent proxy for geographic salary variation (not as a direct estimate, but as an additional input for a model). The most salient division between tool and tasks usage is between those who mostly use Excel, SQL, and a small number of closed source toolsand those who use more open source tools and sp

14、end more time coding. R is used across this division: even people who dont code much or use many open source tools, use R. A secondary division emerges among the coding half separating a younger, Python-heavy data scientist/analyst group, from a more experienced data scientist/engineer cohort that t

15、ends to use a high number of tools and earns the highest salaries. To see our complete model and input your own metrics to predict salary, see Appendix B (but bewaretheres a trans- formation involved: dont forget to square the result!). 1Executive Summary2019 DATA SCIENCE SAL ARY SURVEYnon-US respon

16、dents and respondents aged 30 or younger. Three-fifths of the sample came from the US, and these respondents had a median salary of $106K.Understanding Interquartile RangeFor a number of survey questions, we show graphs of answer shares and the median salaries of respondents who gave particular answ

17、ers. While median salary is probably the best number to compare how much two groups of people make, it doesnt say anything about the spread or variation of salaries. In addition to median, we also show the interquartile range (IQR) two numbers that delineate salaries of the middle 50%. This range is

18、 not a confidence interval, nor is it based on standard deviations.As an example, the IQR for US respondents was $80K to$138K, meaning one quarter of US respondents had salaries lower than $80K and one quarter had salaries higher than$138K. Perhaps more illustrative of the value of the IQR is compar

19、ing the US Northeast and Midwest: the Northeast has a higher median salary ($105K vs. $98K) but the third quartileFOR THE FOURTH YEAR RUNNING, we at OReilly FOR THE FOURTH YEAR RUNNING, we at OReilly Media have collected survey data from data scientists, Media have collected survey data from data sc

20、ientists, engineers, and others in the data space, about their engineers, and others in the data space, about their skills, tools, and salary.skills, tools, and salary.Across our four years of data, many key trends are Across our four years of data, many key trends are more or less constant: median

21、salaries, top tools, and more or less constant: median salaries, top tools, and correlations among tool usage. For this years correlations among tool usage. For this years analysis, we collected responses from September analysis, we collected responses from September 2019 to June 2019, from 983 data

22、 professionals.2019 to June 2019, from 983 data professionals.In this report, we provide some different approaches In this report, we provide some different approaches to the analysis, in particular conducting clustering on to the analysis, in particular conducting clustering on the respon- dents (n

23、ot just tools). We have also the respon- dents (not just tools). We have also adjusted the linear model for improved accuracy, adjusted the linear model for improved accuracy, using a square root transform and publicly available using a square root transform and publicly available data on geographic

24、al variation in economies. The data on geographical variation in economies. The survey itself also included new questions, most survey itself also included new questions, most notably about specific data-related tasks and any notably about specific data-related tasks and any change in salary.change

25、in salary.Salary: The Big PictureSalary: The Big PictureThe median base salary of the entire sample was The median base salary of the entire sample was $87K. This figure is slightly lower than in previous $87K. This figure is slightly lower than in previous years (last year it was $91K), but this di

26、screpancy is years (last year it was $91K), but this discrepancy is fully attributable to shifts in demographics: this years fully attributable to shifts in demographics: this years sample had a higher share ofsample had a higher share of2 2Introduction05%10%15%0K20K40K60K80K100K120K140K160K180K200K

27、200KBase Salary (US DOLLARS)BASE SALARYBASE SALARYShare of Share of RespondentsRespondentsShare of respondents2019 DATA SCIENCE SAL ARY SURVEYin places with stronger economies, wages are less likely to stagnate.Assessing Your SalaryTo use the model for you own salary, refer to the full model in Appe

28、ndix B, and add up the coefficients that apply to you. Once all of the constants are added, square the result for a final salary estimate (note: the coefficients are not in dollars). The contribution of a particular coefficient to the eventual salary estimate depends on the other coefficients: the h

29、igher the salary, the higher the contribution of each coefficient.For example, the salary difference between a junior data sci- entist and a senior architect will be greater in a country with high salaries than somewhere with lower salaries.cutoffs are $133K for the Northeast and $138K for the Mid-

30、west. This indicates that there is generally more variation in Midwest salaries, and that among top earnerssalaries might be even higher in the Midwest than in the Northeast.How Salaries ChangeWe also collected data on salary change over the last three years. About half of the sample reported a 20%

31、change, and the salary of 12% of the sample doubled. We attempted to model salary change with other variables from the survey, but the model performed much more poorly, with an R2of just 0.221. Many of the same significant features in the salary regression model also appeared as factors in predicted

32、 salary change: Spark/Unix, high meeting hours, high coding hours, and buildingprototype models, all predict higher salary growth, while using Excel, gender dispar- ity, and working at an older company predict lower salary growth. Geogra- phy also correlated positively with salarychange, meaning tha

33、tPERCENTAGE CHANGE IN SALARY OVER LAST THREE YEARSSHARE OF RESPONDENTS5%5%NA NA (SALARY (SALARY WAS WAS ZERO)ZERO)5%5%NEGATINEGATIVE VE CHANGCHANGE E14%14%NO NO CHACHANGENGE11%11%+0% +0% TO TO +10%+10%13%13%+10% +10% TO TO +20%+20%8%8%+20% +20% TO TO +30%+30%8%8%+30% +30% TO TO +40%+40%8%8%+40% +40%

34、 TO TO +50%+50%+50% TO +75%9%9%+75% TO +100% (DOUBLE)7%7%6%6%+100% +100% TO TO +200% +200% (TRIPL(TRIPLE)E)6%6%OVEOVER R TRIPTRIPLELE42019 DATA SCIENCE SAL ARY SURVEYsented by only one or two respondents, this isnt enough to jus- tify giving the country its own coefficient. For this reason, we use b

35、road regional coefficients (e.g., “Asia” or “Eastern Europe”), keeping in mind however that economic differences within a region are huge, and thus the accuracy of the model suffers.To get around this problem, weve used publicly available records of per capita GDP of countries and US states. While G

36、DP itself doesnt translate to salary, it can serve a proxy function for geographic salary variation. Note that we use per capita GDP on the state and country level; therefore the model is likely to produce an inaccurate estimate with GDP figures for smaller geographic units.Two exceptions were made

37、to the GDP data before incorporat- ing it into the model. The per capita GDP of Washington DCis $181Kmuch greater than in neighboring Virginia ($57K) and Maryland ($60K). Many (if not most) data science jobs in Maryland and Virginia are actually in the greater DC metropoli- tan area, and the survey

38、data suggest that average data science salaries in these three places are not radically different from each other. Using the true $181K figure would produce gross5WE HAVE INCLUDED OUR FULL regression model in WE HAVE INCLUDED OUR FULL regression model in Appendix B. For this years report, we have ma

39、de two Appendix B. For this years report, we have made two important changes to the basic, parsimonious linear important changes to the basic, parsimonious linear model we presented in the 2019 report. We have model we presented in the 2019 report. We have included: 1) external geographic data (GDP

40、by US included: 1) external geographic data (GDP by US state and country), and 2) a square root state and country), and 2) a square root transformation. The transformation adds one step to transformation. The transformation adds one step to the linear model: we add up model coefficients, and the lin

41、ear model: we add up model coefficients, and then square the result. Both of these changes then square the result. Both of these changes significantly improve the accuracy in salary estimates.significantly improve the accuracy in salary estimates.Our model explains about three-quarters of the Our mo

42、del explains about three-quarters of the variance in the sample salaries (with an R2 of 0.747). variance in the sample salaries (with an R2 of 0.747). Roughly half of the salary variance is due to Roughly half of the salary variance is due to geography and experience. Given the important geography a

43、nd experience. Given the important factors that can not be captured in the surveyfactors that can not be captured in the survey for for example, we dont measure competence or evaluate example, we dont measure competence or evaluate thethequality of respondents work outputquality of respondents work

44、outputits not its not surprising that a large amount of variance is left surprising that a large amount of variance is left unexplained.unexplained.Impact of GeographyImpact of GeographyGeography has a huge impact on salary, but is not Geography has a huge impact on salary, but is not adequately cap

45、tured due to sample size. For example, adequately captured due to sample size. For example, if a country is repre-if a country is repre-Factors that Influence Salary: The Regression Model*The interquartile range (IQR ) is the middle 50% of respondents salaries. One quarter of respondents have a sala

46、ry below this range, one quarter have a salary above this range.0K50K100KRange/Median150KUnited States Europe (except UK/I)AsiaUK/IrelandCanada Australia/NZLatin AmericaAfricaSALARY MEDIAN AND IQRC (US SALARY MEDIAN AND IQRC (US DOLLARS)DOLLARS)61%61%UNITUNITED ED STATSTATESES2%2%LATIN LATIN AMERAME

47、RICAICA8 8%UUK/IK/IRERELALANDND15%15%EUROPE EUROPE (EXCEPT (EXCEPT UK/I)UK/I)8 8%ASASIAIA2%2%AUSTAUSTRALIRALIA/NZA/NZ1 1%AFAFRIRICACA3%3%CACANANADADAWORLD WORLD REGIONREGIONRegionSHARE OF SHARE OF RESPONDENTSRESPONDENTSUS REGIONSHARE OF RESPONDENTS6 6%TETEXASXASSASALARY MEDIAN AND IQR LARY MEDIAN AN

48、D IQR (US DOLLARS)(US DOLLARS)CaCalifornia Northeast lifornia Northeast Midwest Mid-AtlanticMidwest Mid-AtlanticSoSouth Pacific NWuth Pacific NWTeTexasxasS SW/MountainW/Mountain050K100KRange/Median150K200K22%22%CALICALIFORFORNIANIA8%8%PACIPACIFIC FIC NWNW1010%SOSOUTUTHH5%5%SW/MSW/MOUNTOUNTAINAIN16%1

49、6%MIDMIDWESWEST T13%13%MID-MID-ATLANATLANTICTIC20%20%NORTNORTHEASHEAST TRegion2019 DATA SCIENCE SAL ARY SURVEYWe also asked respondents to rate their bargaining skills on a scale of 1 to 5, and those who gave higher self-evalua- tions tended to have higher salaries. The difference in salary between

50、two data scientists, one with a bargaining skill “1”and the other with “5”, with otherwise identical demograph- ics and skills, is expected to be $10K$15K.Finally, in terms of work-life balance, our results show that once you are working beyond 60 hours, salary estimates actually go down.overestimat

51、es for DC salaries, and so the per capita GDP figure for DC was replaced with that of Maryland, $60K.The other exception is California. In all of the salary surveys we have conducted, California has had the highest median salary of any state or country, even though its per capita GDP ($62K) is not r

52、anked so high (nine states have higher per capita GDPs, as do two countries that were represented in the sample, Switzerland and Norway). The anomaly is likely due to the San Francisco Bay Area, where, depending on how the region is defined, per capita GDP is $80K$90K. As a major tech center, the Ba

53、y Area is likely overrepresented in the sample, meaning that the geographic factor attributable to California should be pushed upward; an appropriate compromise was $70K.Considering GenderThere is a difference of $10K between the median salaries of men and women. Keeping all other variables constant

54、same roles, same skillswomen make less than men.Age, Experience, and IndustryExperience and age are two important variables that influence salary. The coefficient for experience (+3.8) translates to an increase of $2K$2.5K on average, per year of experience. As for age, the biggest jump is between p

55、eople in their early and late 20s, but the difference between those aged 3165 and those over 65 is also significant.30K60K90KRange/Median120K150KFemaleMaleGENDERGENDERSHARE OF SHARE OF RESPONDENTSRESPONDENTS0204060SALARY MEDIAN AND IQR (US DOLLARS)GenderGender80FemaleMale8AGEUNDER 3131 - 40OVER 6051

56、 - 6041 - 50AgeRange/MedianSALARY MEDIAN AND IQR (US SALARY MEDIAN AND IQR (US DOLLARS)DOLLARS)YEARS OF EXPERIENCE (in your YEARS OF EXPERIENCE (in your field)field)SHARE OF RESPONDENTSSHARE OF RESPONDENTSYears100KRange/Median42%42% 5 YEARS 20 YEARS 20 YEARS050K150K200K 20SALARY MEDIAN AND IQR (US S

57、ALARY MEDIAN AND IQR (US DOLLARS)DOLLARS)SELF-ASSESSED BARGAINING SKILLS (1 Being Poor, 5 Being SELF-ASSESSED BARGAINING SKILLS (1 Being Poor, 5 Being Excellent)Excellent)SHARE OF RESPONDENTSSHARE OF RESPONDENTSSkill Level100KRange/Median31%31%9%9%6%6%18%18%3535%Poor - Poor - 1 1Excellent - Excellen

58、t - 5 52 23 34 4050K150K200K(Poor) 1234(Excellent) 5SALARY MEDIAN AND IQR (US SALARY MEDIAN AND IQR (US DOLLARS)DOLLARS)EASE OF FINDING A NEW EASE OF FINDING A NEW ROLEROLESHARE OF RESPONDENTSSHARE OF RESPONDENTSEase of Finding Work90KRange/MedianVery difficult - Very difficult - 1 1Very easy - Very

59、 easy - 5 52828%2 2%9 9%3 323%23%36%36%2 24 430K60K120K150K234(Very easy) 5(Very di阳cult) 1SALARY MEDIAN AND IQR (US SALARY MEDIAN AND IQR (US DOLLARS)DOLLARS)COMPANY AGECOMPANY AGESHARE OF SHARE OF RESPONDENTSRESPONDENTSCampany Age60K90KRange/Median030K120K150K 20 years4%4% 2 YEARS 20 YEARS 20 YEAR

60、S2 - 25 EMPLOYEES1 EMPLOYEE26 - 100EMPLOYEES101 - 500EMPLOYEES501 - 1,000EMPLOYEES1,001 - 2,500EMPLOYEES15%15%2,501 2,501 - - 10,0010,000 0EMPLEMPLOYEEOYEES S28%28%10,010,000+00+EMPEMPLOYLOYEESEESRange/MedianCompany Size 30 HOURS/WEEKRange/Median30 - 35 HOURS/WEEK36 - 39 HOURS/WEEK30%30%40 40 HOURHOURS/WEE

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论