Training Material on Statistical Techniques in Marketing Research

Uploaded by: 扣*** · IP location: Ningxia · Upload date: 2021-11-03 · Format: PPT · Pages: 67 · Size: 197KB

Copyright 2008 CIIC & COMR

Background

Statistics is a diverse body of theory and application which ranges from simple averages to complex statistical modeling and multivariate analysis. It plays a key role in marketing research, from project design through analysis and interpretation of data.

Though frequently seen as an esoteric and intimidating discipline, the basic concepts in statistics require no more than high school algebra to master. And, like many other skills, hands-on experience is the best teacher (!).

Our focus in this seminar will be on fundamental concepts and methods which have the widest application in marketing research. We will avoid complex theory and aim to provide the background necessary for you to begin applying these concepts on the job.

Objectives:

To provide a practical working knowledge of fundamental statistical concepts and methods in order to:
1) help you to analyze and interpret quantitative marketing research data; and
2) broaden client service skills.

Syllabus:

I. Basic Definitions
II. Sampling
III. Types of Data
IV. Summarizing Data
V. Inferential Statistics

I. Basic Definitions

Variable
A quantity which is free to vary, e.g., purchase interest rating.

Univariate analysis
The investigation of one variable at a time, e.g., mean purchase interest rating in a product test.

Bivariate analysis
The investigation of the relationship between two variables, e.g., the correlation between an attribute rating and purchase interest.

Multivariate analysis
The investigation of the interrelationships among several variables, e.g., the joint relationship between 15 attribute ratings and purchase interest.

Population (universe)
All objects (e.g., consumers) in the group of interest.
Ex:
- all male beer drinkers in their 30s living in Japan
- all Japanese housewives aged 25-49 who have purchased canned condensed soup in the past 3 months

Sample
A selected subset of the population.
Ex:
- 200 male beer drinkers in their 30s
- 500 housewives aged 25-49 who have purchased canned condensed soup in the past 3 months

Census and sample survey
A census is the gathering of information about all members of a population (e.g., a survey of all ACNielsen employees). A sample survey is the gathering of information about a selected subset of the population (e.g., a random sample of 400 ACNielsen employees).

Sampling error
The deviation of a figure obtained from a sample from the true (i.e., population) value.

Inferential statistics
Used to generalize or make inferences about a population from a sample.
Ex: A market research survey of 500 housewives aged 25-49 finds that they are more frequent buyers of dry soup than of condensed soup; how likely is this to be true for all housewives aged 25-49 in Japan?

Parameters and statistics
Parameters are numbers used to describe a population. Numbers used to describe a sample are called statistics.

Objects/cases
In marketing research, these usually refer to respondents, sometimes brands.

Raw data and aggregate data
Raw data are case or object level data, e.g., data for each respondent. Aggregate data are data which have been grouped in some way, and often consist of percentages and means.

Independent and dependent variables
An independent variable may sometimes be viewed as a cause and a dependent variable as an effect. More generally, independent variables (e.g., age, income) are used to better understand or predict dependent variables (e.g., purchase likelihood).
Ex: Purchase interest in a new product concept differs by respondent age; in this example, age is the independent variable and purchase interest the dependent variable, since age could affect purchase interest but not the other way around.

Statistical notation
Some of the notational conventions used in statistics are:
N: number of objects/cases (e.g., respondents) in the population
n: number of objects in the sample (i.e., sample size)
$\pi_i$: the Greek letter pi; percentage or proportion corresponding to group i
$\mu$: the Greek letter mu; the population arithmetic mean
$\bar{x}$: the sample arithmetic mean ("x bar")
$\sigma$: the Greek letter sigma; population standard deviation
s: sample standard deviation
$\Sigma$: the Greek capital letter sigma; the sum of a series of numbers
*: computer symbol for multiplication: a*b = a × b

III. Types of Data

There are two main types of data, each of which can be further subdivided into two categories. Different types of statistical procedures are appropriate for each type of data.

Non-metric
- nominal
- ordinal

Metric
- interval
- ratio

A basic understanding of the concepts which follow is important for good questionnaire design.

Nominal
The lowest form of data in terms of the information it provides. Examples are male/female, Tokyo/Osaka, user/non-user. No ranking or order of the data (e.g., by frequency of use) is presumed. Usually expressed in percentages or frequencies.

Ordinal
An ordered category. Ordinal data indicate whether an object has more or less of a characteristic than another object, but not how much more. Examples are age groups and heavy/medium/light usage of a product category. Medians, ranks and percentiles can be computed on ordinal data.

Interval
Data are measured in constant units. An example is a numeric rating scale, where 5-4 = 4-3 = 3-2 = 2-1. The unit of measurement is 1.
There is no true zero, however. Fahrenheit and Celsius temperature scales are interval: one cannot say that 50° is twice as hot as 25°, because the zero on either scale is arbitrary.
Ex:
30° Celsius = 86° Fahrenheit
15° Celsius = 59° Fahrenheit, not 43°

In actuality, most rating scales used in marketing research lie somewhere between ordinal and interval. For example, can we really say that the difference between "very much want to buy" and "want to buy" is the same as the difference between "want to buy" and "can't say either way"?

If the data are judged reasonably close to being interval, it is acceptable to compute means and to treat them as interval for analysis. Statistical procedures designed for ordinal data (non-parametric) and those designed for interval data (parametric) frequently yield similar results.

Ratio
This is the highest form of data. It possesses all the properties of interval data and has a true zero. The Kelvin scale is ratio; so are age and income when not categorized.

Some guidelines

1) Many significance tests, such as the t-test, assume that the data are interval or ratio. While it is not uncommon in practice to employ these methods when the data are non-metric, strictly speaking, non-parametric tests such as the Kruskal-Wallis test are more appropriate.

2) Weights are often assigned to frequency of usage/purchase data in order to compute means, as in the example below:

Frequency consume coffee     Weight
Twice a day or more          (3.0)
Once a day                   (1.0)
3-5 times a week             (0.3)
1-2 times a week             (0.2)
Less than once a week        (0.1)

Weights such as these are often quite arbitrary and the resulting means only rough approximations. In such cases, it may be preferable to treat the data as ordinal rather than interval.
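As a rough illustration of guideline 2, the sketch below applies the coffee-frequency weights listed above to a set of respondent counts. The counts are hypothetical (the slides give none), and the helper name `weighted_mean_frequency` is made up for this example.

```python
# Weights taken from the coffee-frequency example above.
WEIGHTS = {
    "twice a day or more": 3.0,
    "once a day": 1.0,
    "3-5 times a week": 0.3,
    "1-2 times a week": 0.2,
    "less than once a week": 0.1,
}

# Hypothetical respondent counts per category (not from the slides).
hypothetical_counts = {
    "twice a day or more": 50,
    "once a day": 150,
    "3-5 times a week": 120,
    "1-2 times a week": 100,
    "less than once a week": 80,
}


def weighted_mean_frequency(counts, weights=WEIGHTS):
    """Mean weight across respondents: sum(weight * count) / sum(count)."""
    total_respondents = sum(counts.values())
    weighted_sum = sum(weights[category] * n for category, n in counts.items())
    return weighted_sum / total_respondents


mean_frequency = weighted_mean_frequency(hypothetical_counts)
print(f"Weighted mean consumption frequency: {mean_frequency:.3f}")
```

Because the result depends directly on the chosen weights, changing (say) 0.3 to 0.5 shifts the mean even though no respondent answered differently, which is why such means are only rough approximations.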

IV. Summarizing Data

Frequency distribution
One of the most useful means of summarizing data. It can be represented in tabular form, e.g.,

Respondent age      n      %
Teens              75     15
20s               100     20
30s               125     25
40s               100     20
50s                75     15
60s                25      5
Total             500    100

or in graphic form.

[Figure: histogram of respondent age, teens to 60s; vertical axis from 0 to 30]

Shape of frequency distribution

Normal distribution
This plays a key role in many statistics. Many parametric inferential statistics assume the population is at least approximately normally distributed. Severe departures from normality can invalidate descriptive statistics such as means and standard deviations.

[Figures: examples of a normal distribution 1, 2 and 3]
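The respondent-age frequency distribution shown earlier in this section can be reproduced with a short sketch. The counts come from the table; the function name `frequency_table` and the text-based histogram are illustrative choices only.

```python
# Respondent counts by age group, as given in the frequency table.
age_counts = {"Teens": 75, "20s": 100, "30s": 125, "40s": 100, "50s": 75, "60s": 25}


def frequency_table(counts):
    """Convert raw counts into percentages of the total."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


percentages = frequency_table(age_counts)

# Tabular form plus a crude text histogram (one '#' per percentage point).
for group, pct in percentages.items():
    bar = "#" * round(pct)
    print(f"{group:<6}{age_counts[group]:>5}{pct:>6.0f}  {bar}")
print(f"{'Total':<6}{sum(age_counts.values()):>5}{sum(percentages.values()):>6.0f}")
```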

Departures from normality

Skewness
When a distribution is asymmetrical, it is skewed. If the distribution leans to the left and the longer tail points to the right, it is positively skewed. On the other hand, if it leans to the right and the longer tail points to the left, it is negatively skewed.
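The direction of skew can be checked numerically. The slides do not give a formula, so the sketch below assumes the common moment coefficient of skewness (third central moment divided by the 1.5 power of the second); the data sets are made up for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed, common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data: the long tail points right, left, or nowhere.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-v for v in right_tailed]
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right-tailed", right_tailed),
                   ("left-tailed", left_tailed),
                   ("symmetric", symmetric)]:
    print(f"{name:<13} skewness = {skewness(data):+.2f}")
```

A positive result corresponds to a positively skewed distribution, a negative result to a negatively skewed one, and a value near zero to a symmetrical one.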

[Figure: positively skewed distribution]
[Figure: negatively skewed distribution]

Kurtosis
When the tails are unusually fat or unusually thin, the distribution is said to be kurtotic.

[Figure: example of platykurtic distribution]
[Figure: example of leptokurtic distribution]

Measures of central location (center of data)

Averages
Three kinds of averages are typically used in marketing research:
- arithmetic mean
- median
- mode

If a distribution is symmetrical with a single peak, the mean, median and mode are all the same.

[Figure: example of symmetrical distribution; mean, median and mode coincide]
[Figure: example of asymmetrical, positively skewed distribution; labels in order: mode, median, mean]

Arithmetic mean
The most commonly used average for metric data; it is calculated as follows:

$\bar{x} = \frac{\sum x}{n}$

Ex: The mean of 5, 2, 1, 3 is (5 + 2 + 1 + 3)/4 = 2.75

Means of grouped data may also be estimated by using the following formula:

$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$

where x is the interval midpoint and w (weight) is the frequency (or percentage).

Ex:

Respondent age      x       n      %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

$\bar{x}$ = [(75*14.5) + (100*24.5) + (125*34.5) + (100*44.5) + (75*54.5) + (25*64.5)]/500 = 36 years
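The grouped-mean formula can be sketched as follows, using the midpoints, frequencies and percentages from the respondent-age table. The function name `grouped_mean` is an illustrative choice.

```python
# Interval midpoints and weights from the respondent-age table.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
counts = [75, 100, 125, 100, 75, 25]    # frequencies (n)
percents = [15, 20, 25, 20, 15, 5]      # the same distribution in percent


def grouped_mean(x, w):
    """Weighted mean of interval midpoints: sum(x * w) / sum(w)."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


mean_by_count = grouped_mean(midpoints, counts)
mean_by_percent = grouped_mean(midpoints, percents)
print(mean_by_count, mean_by_percent)
```

Frequencies and percentages give the same result because the percentages are just the frequencies rescaled by a constant, which cancels between numerator and denominator.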

Or, if percentages are used as the weights:

$\bar{x}$ = [(15*14.5) + (20*24.5) + (25*34.5) + (20*44.5) + (15*54.5) + (5*64.5)]/100 = 36 years

Two drawbacks of the arithmetic mean, however, are:

- It is sensitive to extreme values (outliers), especially when the number of data points is small. In the earlier hypothetical series of numbers (5, 2, 1, 3), 5 appears to be an outlier. If 3 is substituted for 5, the mean of these numbers decreases from 2.75 to 2.25: (3 + 2 + 1 + 3)/4 = 2.25

- A second disadvantage is more general: the mean may be misleading when the data distribution is non-normal.

[Figure: example of bi-modal distribution; labels: mode, mean, mode, median]
[Figure: example of uniform distribution; mean, median and mode identical]
[Figure: example of U-shaped distribution; labels: mode, mean, mode, median]
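The outlier sensitivity described for the arithmetic mean can be verified with the two small series from the text, here using Python's standard `statistics` module.

```python
from statistics import mean

original = [5, 2, 1, 3]   # 5 appears to be an outlier
adjusted = [3, 2, 1, 3]   # the same series with 3 substituted for 5

mean_original = mean(original)
mean_adjusted = mean(adjusted)

# Changing a single value out of four moves the mean by half a point.
print(mean_original, mean_adjusted, mean_original - mean_adjusted)
```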

Median
The middle value of ordered data. Appropriate for ordinal, interval, and ratio data.

Computational procedure:
First, rank the data from smallest value to largest value. Then, find the position of the middle value with the following formula:

$\text{position} = \frac{n + 1}{2}$

where n is the number of data points.

Ex: There are 11 data points (numbers) in the following ranked data set:

6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19

The median is the sixth largest (or smallest) number ((11 + 1)/2 = 6):

6, 9, 9, 10, 11, [11], 12, 12, 13, 16, 19

Note that the above data set has an odd number of values (n = 11). When there is an even number of data points, there will be two middle values. In these instances, the median is the arithmetic mean of the two middle values.

Ex: Consider the median of the following data set with 12 data points:

6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19, 30

The median lies between the 6th and 7th largest (or smallest) values ((12 + 1)/2 = 6.5):

6, 9, 9, 10, 11, [11, 12], 12, 13, 16, 19, 30, or 11.5

Notice that the addition of an outlier (30) had little impact on the median. By contrast, the mean of the first data set is 11.64 and the mean of the second data set is 13.2. This is a major reason for using the median rather than the arithmetic mean.

However, the median uses less information about the data than does the mean and is often less informative. Statistical procedures (non-parametric methods) developed for the median are generally less flexible and informative than those which analyze means.

Medians can also be estimated from grouped data.

Ex:

Respondent age      x     n (w)    %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

There is an even number of interval midpoints (6), thus the median must lie somewhere between the 3rd and 4th values: (6 + 1)/2 = 3.5; 34.5 and 44.5 are the two middle categories.

The (weighted) mean of 34.5 and 44.5 is [(125*34.5) + (100*44.5)]/225, or 38.9.
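The median procedure for ungrouped data, and its resistance to the outlier in the 12-point example, can be checked with a short sketch. It uses Python's standard `statistics` module; `median_position` is an illustrative helper implementing the (n + 1)/2 rule.

```python
from statistics import mean, median


def median_position(n):
    """Position of the middle value in ranked data: (n + 1) / 2."""
    return (n + 1) / 2


odd_set = [6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19]   # n = 11
even_set = odd_set + [30]                              # n = 12, outlier added

for data in (odd_set, even_set):
    ranked = sorted(data)
    print(
        f"n={len(ranked):>2}  position={median_position(len(ranked)):<4}"
        f"  median={median(ranked):<5}  mean={mean(ranked):.2f}"
    )
```

A fractional position such as 6.5 signals that the median is the average of the 6th and 7th ranked values, which is what `statistics.median` computes for an even number of data points.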

Mode
The mode is closest in meaning to the layman's term "average", that is, typical. It is very commonly used in marketing research but rarely referred to by name. It is simply the most frequent value.

Ex: Brand X (33%) leads in terms of P3M (past 3 months) purchase, followed by Brand Y (21%) and Brand Z (17%). The mode of these data is Brand X, at 33%.

Ex: The mode of the following data set,

6, 9, 10, 11, 15, 16, 16, 16, 20

is 16.

Modes can also be obtained from grouped data.

Ex:

Respondent age      x     n (w)    %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

Here, the modal age group is 30-39 and the modal age is the midpoint, 34.5, of this age range.

The major disadvantages of the mode are:
- It does not lend itself well to inferential statistical methods.
- Sometimes, there is no distinct mode. Consider the following data:

Brand     P1M purchase
A         29%
B         28%
C         27%

There is no meaningful difference in P1M (past 1 month) purchase among the three brands.

Measures of dispersion (spread of data)

The most commonly used measures of dispersion are:
- variance
- standard deviation
- range
- percentiles, quartiles, quintiles, terciles (ntiles)
- inter-quartile range

Variance and standard deviation
These are two of the most widely used statistics and play a role in most parametric statistical procedures. They are related to one another in the following way: the standard deviation is the square root of the variance, and the variance is the square of the standard deviation.

Formula:

                      For sample                                        For population
Variance:             $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$        $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Standard deviation:   $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$   $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
                      $s = \sqrt{s^2}$                                  $\sigma = \sqrt{\sigma^2}$

Note that the term n-1 for the sample statistics is known as the degrees of freedom.

Ex: The standard deviation of the hypothetical sample data below is computed as follows:

9, 8, 5, 11, 7, 5

Compute the mean:

$\bar{x}$ = (9 + 8 + 5 + 11 + 7 + 5)/6 = 7.5

Calculate squared deviations from the mean:

x       $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
9       7.5           1.5              2.25
8       7.5           0.5              0.25
5       7.5          -2.5              6.25
11      7.5           3.5             12.25
7       7.5          -0.5              0.25
5       7.5          -2.5              6.25
Total   45            0               27.50

Substitute $\sum (x - \bar{x})^2$ and n into the formula:

$s^2 = \frac{27.5}{6 - 1} = 5.5$

$s = \sqrt{5.5} \approx 2.3$

The term $\sum (x - \bar{x})^2$ is known as the sum of squares.
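The worked example for the sample variance and standard deviation can be reproduced step by step. The sketch below computes the sum of squares by hand and then cross-checks against Python's standard `statistics` module, whose `variance` and `stdev` functions use the n-1 (sample) denominator. Working with unrounded values, the variance is 27.5/5 = 5.5 and the standard deviation is about 2.35.

```python
from math import sqrt
from statistics import stdev, variance

data = [9, 8, 5, 11, 7, 5]

n = len(data)
x_bar = sum(data) / n                              # mean
deviations = [x - x_bar for x in data]             # x - x_bar
sum_of_squares = sum(d ** 2 for d in deviations)   # sum of (x - x_bar)^2

sample_variance = sum_of_squares / (n - 1)         # n - 1 degrees of freedom
sample_sd = sqrt(sample_variance)

print(f"mean            = {x_bar}")
print(f"sum of squares  = {sum_of_squares}")
print(f"variance (s^2)  = {sample_variance}")
print(f"std. dev. (s)   = {sample_sd:.3f}")

# Cross-check against the standard library implementations.
print(variance(data), round(stdev(data), 3))
```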

Uses of standard deviation and variance

Averages tell us about the center of a distribution, but nothing about its spread. In a peaked distribution, most observations will fall close to the average. In a flat distribution, on the other hand, the average may have little meaning.

If the distribution of the data is approximately normal, about 68% of the observations lie within ±1 standard deviation of the mean and about 95% within ±1.96 standard deviations of the mean.

Z scores ("standard scores") can be computed so that different types of scales can be compared. For example, ratings collected from a 5-point scale and those collected from a 7-point scale can be analyzed by expressing each respondent's ratings in terms of standard deviation units from the mean. This is typically done in factor and cluster analysis, for example.

Formula for z scores:

$z = \frac{x - \bar{x}}{s}$

Areas under the normal curve (centered on the mean):

-1.65 SD to +1.65 SD     90%
-1.96 SD to +1.96 SD     95%
-2.58 SD to +2.58 SD     99%

Other measures of dispersion

Range
The largest value minus the smallest value, e.g., the range of a 5-point purchase interest scale is 4. The range only has meaning with metric data.

Ntiles: percentiles (100ths), quintiles (5ths), quartiles (4ths), terciles (3rds)
Data are ranked according to magnitude and partitioned into equal-sized ranges (e.g., 4ths). These are appropriate if the data are at least ordinal.
Ex: A commercial ranking higher in terms of overall liking than 55% of the products in the BASES II norm bank and lower than 45% in the norm bank is in the 55th percentile, the third quintile, the second quartile (counting from the top) and the second tercile.

Inter-quartile range
The distance between the 25th percentile and the 75th percentile; i.e., the range spanned by the middle 50% of the distribution.
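The z-score formula, the areas under the normal curve, the range and the inter-quartile range can all be sketched with Python's standard `statistics` module. The two sets of ratings are hypothetical; the ranked data set is the 11-point example from the median section. Note that `statistics.quantiles` uses its default "exclusive" method here, which is only one of several quartile conventions.

```python
from statistics import NormalDist, mean, quantiles, stdev


def z_scores(values):
    """Express each value in standard deviation units from the sample mean."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - x_bar) / s for x in values]


def central_area(z):
    """Share of a normal distribution lying within +/- z standard deviations."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Hypothetical ratings from a 5-point and a 7-point scale.
five_point = [5, 4, 4, 3, 2]
seven_point = [7, 6, 5, 5, 3]
z_five = z_scores(five_point)
z_seven = z_scores(seven_point)
print([round(z, 2) for z in z_five])
print([round(z, 2) for z in z_seven])

# Areas under the normal curve for the cut-offs quoted in the text.
for z in (1.0, 1.65, 1.96, 2.58):
    print(f"within +/-{z} SD: {central_area(z):.1%}")

# Range and inter-quartile range of the ranked data from the median example.
ranked = [6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19]
data_range = max(ranked) - min(ranked)
q1, q2, q3 = quantiles(ranked, n=4)  # 25th, 50th and 75th percentiles
iqr = q3 - q1
print(f"range = {data_range}, quartiles = {q1}, {q2}, {q3}, IQR = {iqr}")
```

After standardization both sets of ratings have mean 0 and standard deviation 1, so they can be compared or pooled regardless of the original scale length.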
