Copyright 2008 CIIC & COMR

Background

Statistics is a diverse body of theory and application which ranges from simple averages to complex statistical modeling and multivariate analysis. It plays a key role in marketing research, from project design through analysis and interpretation of data.

Though frequently seen as an esoteric and intimidating discipline, the basic concepts in statistics require no more than high school algebra to master. And, like many other skills, hands-on experience is the best teacher (!).

Our focus in this seminar will be on fundamental concepts and methods which have the widest application in marketing research. We will avoid complex theory and aim to provide the background necessary for you to begin applying these concepts on the job.

Objectives:

To provide a practical working knowledge of fundamental statistical concepts and methods in order to:
1) help you to analyze and interpret quantitative marketing research data; and
2) broaden client service skills.

Syllabus:

I. Basic Definitions
II. Sampling
III. Types of Data
IV. Summarizing Data
V. Inferential Statistics

I. Basic Definitions

Variable
A quantity which is free to vary, e.g., purchase interest rating.

Univariate analysis
The investigation of one variable at a time, e.g., mean purchase interest rating in a product test.

Bivariate analysis
The investigation of the relationship between two variables, e.g., the correlation between an attribute rating and purchase interest.

Multivariate analysis
The investigation of the interrelationships among several variables, e.g., the joint relationship between 15 attribute ratings and purchase interest.

Population (universe)
All objects (e.g., consumers) in the group of interest.
Ex:
- All male beer drinkers in their 30s living in Japan
- All Japanese housewives aged 25-49 who have purchased canned condensed soup in the past 3 months

Sample
A selected subset of the population.
Ex:
- 200 male beer drinkers in their 30s
- 500 housewives aged 25-49 who have purchased canned condensed soup in the past 3 months

Census and sample survey
A census is the gathering of information about all members of a population (e.g., a survey of all ACNielsen employees). A sample survey is the gathering of information about a selected subset of the population (e.g., a random sample of 400 ACNielsen employees).

Sampling error
The deviation of a figure obtained from a sample from the true (i.e., population) value.

Inferential statistics
Used to generalize or make inferences about a population from a sample.
Ex: A market research survey of 500 housewives aged 25-49 finds that they are more frequent buyers of dry soup than of condensed soup. How likely is this to be true for all housewives aged 25-49 in Japan?

Parameters and statistics
Parameters are numbers used to describe a population. Numbers used to describe a sample are called statistics.

Objects/cases
In marketing research, these usually refer to respondents, sometimes brands.

Raw data and aggregate data
Raw data are case- or object-level data, e.g., data for each respondent. Aggregate data are data which have been grouped in some way, and often consist of percentages and means.

Independent and dependent variables
An independent variable may sometimes be viewed as a cause and a dependent variable as an effect. More generally, independent variables (e.g., age, income) are used to better understand or predict dependent variables (e.g., purchase likelihood).
Ex: Purchase interest in a new product concept differs by respondent age. In this example, age is the independent variable and purchase interest the dependent variable, since age could affect
purchase interest but not the other way around.

Statistical notation
Some of the notational conventions used in statistics are:
$N$: number of objects/cases (e.g., respondents) in the population
$n$: number of objects in the sample (i.e., sample size)
$\pi_i$: the Greek letter pi; percentage or proportion corresponding to group $i$
$\mu$: the Greek letter mu; the population arithmetic mean
$\bar{x}$: the sample arithmetic mean (x bar)
$\sigma$: the Greek letter sigma; population standard deviation
$s$: sample standard deviation
$\Sigma$: the Greek capital letter sigma; the sum of a series of numbers
*: computer symbol for multiplication: a*b = a x b

III. Types of Data

There are two main types of data, each of which can be further subdivided into two categories. Different types of statistical procedures are appropriate for each type of data.

Non-metric
- nominal
- ordinal
Metric
- interval
- ratio

A basic understanding of the concepts which follow is important for good questionnaire design.

Nominal
The lowest form of data in terms of the information it provides. Examples are male/female, Tokyo/Osaka, user/non-user. No ranking or order of the data (such as frequency of use) is presumed. Usually expressed in percentages or frequencies.

Ordinal
An ordered category. Ordinal data indicate whether an object has more or less of a characteristic than another object, but not how much more. Examples are age groups and heavy/medium/light usage of a product category. Medians,
ranks and percentiles can be computed on ordinal data.

Interval
Data are measured in constant units. An example is a numeric rating scale, where 5-4 = 4-3 = 3-2 = 2-1. The unit of measurement is 1.
There is no true zero, however. Fahrenheit and Celsius temperature scales are interval. One cannot say that 50° is twice as hot as 25° because the zero on either scale is arbitrary.
Ex:
30° Celsius = 86° Fahrenheit
15° Celsius = 59° Fahrenheit, not 43°

In actuality, most rating scales used in marketing research lie somewhere between ordinal and interval. For example, can we really say that the difference between "very much want to buy" and "want to buy" is the same as the difference between "want to buy" and "can't say either way"?
If the data are judged reasonably close to being interval, it is acceptable to compute means and to treat them as interval for analysis. Statistical procedures designed for ordinal data (non-parametric) and those designed for interval data (parametric) frequently yield similar results.

Ratio
This is the highest form of data. It possesses all the properties of interval data and has a true zero.
The Kelvin scale is ratio; so are age and income when not categorized.

Some guidelines

1) Many significance tests, such as the t-test, assume that the data are interval or ratio. While it is not uncommon in practice to employ these methods when the data are non-metric,
strictly speaking, non-parametric tests such as the Kruskal-Wallis test are more appropriate.

2) Weights are often assigned to frequency of usage/purchase data in order to compute means, as in the example below:

Frequency consume coffee    Weight
Twice a day or more         3.0
Once a day                  1.0
3-5 times a week            0.3
1-2 times a week            0.2
Less than once a week       0.1

Weights such as these are often quite arbitrary and the resulting means only rough approximations. In such cases, it may be preferable to treat the data as ordinal rather than interval.
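As a rough illustration of how such weights turn frequency categories into a mean, the sketch below multiplies each weight from the coffee table by a respondent count. The counts are hypothetical (the original gives none), and the function name `weighted_mean` is simply a label chosen for this example.

```python
# Weights are taken from the coffee-consumption table above.
usage_weights = {
    "twice a day or more": 3.0,
    "once a day": 1.0,
    "3-5 times a week": 0.3,
    "1-2 times a week": 0.2,
    "less than once a week": 0.1,
}

# HYPOTHETICAL respondent counts, invented only to make the sketch runnable.
hypothetical_counts = {
    "twice a day or more": 20,
    "once a day": 30,
    "3-5 times a week": 25,
    "1-2 times a week": 15,
    "less than once a week": 10,
}


def weighted_mean(weights, counts):
    """Return sum(weight * count) / sum(count) over all categories."""
    total_respondents = sum(counts.values())
    weighted_sum = sum(weights[category] * counts[category] for category in counts)
    return weighted_sum / total_respondents


mean_weight = weighted_mean(usage_weights, hypothetical_counts)
print(round(mean_weight, 3))
```

Changing any single weight shifts the result directly, which is one way to see why means built on arbitrary weights are only rough approximations.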
IV. Summarizing Data

Frequency distribution
One of the most useful means of summarizing data. It can be represented in tabular form, e.g.,

Respondent age      n      %
Teens              75     15
20s               100     20
30s               125     25
40s               100     20
50s                75     15
60s                25      5
Total             500    100

or in graphic form.

[Figure: Histogram of respondent age by percentage, teens through 60s]

Shape of frequency distribution

Normal distribution
This plays a key role in many statistical procedures.
Many parametric inferential statistics assume the population is at least approximately normally distributed.
Severe departures from normality can invalidate descriptive statistics such as means and standard deviations.

[Figure: Example of a normal distribution 1]
[Figure: Example of a normal distribution 2]
[Figure: Example of a normal distribution 3]
Departures from normality

Skewness
When a distribution is asymmetrical, it is skewed.
If the distribution leans to the left and the longer tail points to the right, it is positively skewed. On the other hand, if it leans to the right and the longer tail points to the left, it is negatively skewed.

[Figure: Positively skewed distribution]
[Figure: Negatively skewed distribution]

Kurtosis
When the tails are unusually fat or unusually thin, the distribution is said to be kurtotic.

[Figure: Example of platykurtic distribution]
[Figure: Example of leptokurtic distribution]

Measures of central location (center of data)

Averages
Three kinds of averages are typically used in marketing research:
- arithmetic mean
- median
- mode
If a distribution is symmetrical, the mean, median and mode are all the same.

[Figure: Example of symmetrical distribution; mean, median and mode coincide]
[Figure: Example of asymmetrical, positively skewed distribution; from left to right: mode, median, mean]

Arithmetic mean
The most commonly used average for metric data; it is calculated as follows:

$\bar{x} = \frac{\sum x}{n}$

Ex: The mean of 5, 2, 1, 3 is (5 + 2 + 1 + 3)/4 = 2.75

Means of grouped data may also be estimated by using the following formula:

$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$,

where $x$ is the interval midpoint and $w$ (weight) is the frequency (or percent).

Ex:
Respondent age      x       n      %
Teens            14.5      75     15
20s              24.5     100     20
30s              34.5     125     25
40s              44.5     100     20
50s              54.5      75     15
60s              64.5      25      5
Total                     500    100

$\bar{x}$ = [(75*14.5) + (100*24.5) + (125*34.5) + (100*44.5) + (75*54.5) + (25*64.5)]/500 = 36 years
or, if percentages are used as the weights:

$\bar{x}$ = [(15*14.5) + (20*24.5) + (25*34.5) + (20*44.5) + (15*54.5) + (5*64.5)]/100 = 36 years

Two drawbacks of the arithmetic mean, however, are:

It is sensitive to extreme values (outliers), especially when the number of data points is small. In the earlier hypothetical series of numbers (5, 2, 1, 3), 5 appears to be an outlier. If 3 is substituted for 5, the mean of these numbers decreases from 2.75 to 2.25:
(3 + 2 + 1 + 3)/4 = 2.25

A second disadvantage is more general: the mean may be misleading when the data distribution is non-normal.

[Figure: Example of bi-modal distribution, marking the two modes, the mean and the median]
[Figure: Example of uniform distribution; mean, median, mode identical]
[Figure: Example of U-shaped distribution, marking the two modes, the mean and the median]
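The mean calculations above can be checked with a few lines of Python. This is a minimal sketch using the figures from the text; the function names `arithmetic_mean` and `grouped_mean` are labels chosen for this example, not part of the original material.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def grouped_mean(midpoints, weights):
    """Sum of midpoint * weight divided by the sum of the weights."""
    weighted_sum = sum(x * w for x, w in zip(midpoints, weights))
    return weighted_sum / sum(weights)


# Ungrouped example, with and without the apparent outlier (5).
mean_with_outlier = arithmetic_mean([5, 2, 1, 3])
mean_without_outlier = arithmetic_mean([3, 2, 1, 3])

# Grouped example: respondent age, weighted by counts and by percentages.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
counts = [75, 100, 125, 100, 75, 25]
percents = [15, 20, 25, 20, 15, 5]
mean_age_by_count = grouped_mean(midpoints, counts)
mean_age_by_percent = grouped_mean(midpoints, percents)

print(mean_with_outlier, mean_without_outlier)
print(mean_age_by_count, mean_age_by_percent)
```

Counts and percentages give the same grouped mean because the percentages are just the counts rescaled to a total of 100.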
Median
The middle value of ordered data.
Appropriate for ordinal, interval, and ratio data.

Computational procedure:
First, rank the data from smallest value to largest value.
Then, find the position of the middle value with the following formula:

$\text{position} = \frac{n + 1}{2}$

where $n$ is the number of data points.

Ex: There are 11 data points (numbers) in the following ranked data set:
6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19
The median is the sixth largest (or smallest) number ((11 + 1)/2 = 6):
6, 9, 9, 10, 11, [11], 12, 12, 13, 16, 19

Note that the above data set has an odd number of values (n = 11). When there is an even number of data points, there will be two middle values. In these instances, the median is the arithmetic mean of the two middle values.

Ex: Consider the median of the following data set with 12 data points:
6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19, 30
The median lies between the 6th and 7th largest (or smallest) values ((12 + 1)/2 = 6.5):
6, 9, 9, 10, 11, [11, 12], 12, 13, 16, 19, 30
or 11.5.

Notice that the addition of an outlier (30) had little impact on the median. By contrast, the mean of the first data set is 11.64 and the mean of the second data set is 13.2. This is a major reason for using the median rather than the arithmetic mean.

However, the median uses less information about the data than does the mean and is often less informative. Statistical procedures (non-parametric methods)
developed for the median are generally less flexible and informative than those which analyze means.

Medians can also be calculated from grouped data.

Ex:
Respondent age      x    n (w)     %
Teens            14.5      75     15
20s              24.5     100     20
30s              34.5     125     25
40s              44.5     100     20
50s              54.5      75     15
60s              64.5      25      5
Total                     500    100

There is an even number of interval midpoints (6), thus the median must lie somewhere between the 3rd and 4th values: (6 + 1)/2 = 3.5; 34.5 and 44.5 are the two middle categories.

The (weighted) mean of 34.5 and 44.5 is
[(125*34.5) + (100*44.5)]/225
or 38.9.

Note that this is only a rough approximation: counting cumulative frequencies, the 250th and 251st of the 500 respondents both fall in the 30s group (cumulative n is 175 through the 20s and 300 through the 30s), so the median respondent is in that group.
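The ungrouped median procedure described above (rank the data, then take the value at position (n + 1)/2, averaging the two middle values when n is even) can be sketched as follows. The data sets are the ones from the text; the function name `median` is a label chosen for this example.

```python
def median(values):
    """Middle value of the ranked data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2  # zero-based index of position (n + 1) / 2 when n is odd
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_set = [6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19]
even_set = odd_set + [30]  # same data plus the outlier

median_odd = median(odd_set)
median_even = median(even_set)
mean_odd = sum(odd_set) / len(odd_set)
mean_even = sum(even_set) / len(even_set)

print(median_odd, median_even)
print(round(mean_odd, 2), round(mean_even, 1))
```

Adding the outlier moves the median by half a point, while the mean moves by about a point and a half, which is the contrast the text draws.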
Mode
The mode is closest in meaning to the layman's term "average", that is, "typical". It is very commonly used in marketing research but rarely referred to by name.
It is simply the most frequent value.

Ex: Brand X (33%) leads in terms of P3M (past 3 months) purchase, followed by Brand Y (21%) and Brand Z (17%). The mode of these data is Brand X, at 33%.

Ex: The mode of the following data set,
6, 9, 10, 11, 15, 16, 16, 16, 20
is 16.

Modes can also be obtained from grouped data.

Ex:
Respondent age      x    n (w)     %
Teens            14.5      75     15
20s              24.5     100     20
30s              34.5     125     25
40s              44.5     100     20
50s              54.5      75     15
60s              64.5      25      5
Total                     500    100

Here, the modal age group is 30-39 and the modal age is the midpoint, 34.5, of this age range.

The major disadvantages of the mode are:
- It does not lend itself well to inferential statistical methods.
- Sometimes, there is no distinct mode. Consider
the following data:

Brand    P1M purchase
A        29%
B        28%
C        27%

There is no meaningful difference in P1M (past 1 month) purchase among the three brands.

Measures of dispersion (spread of data)

The most commonly used measures of dispersion are:
- variance
- standard deviation
- range
- percentiles, quartiles, quintiles, terciles (ntiles)
- inter-quartile range

Variance and standard deviation
These are two of the most widely used statistics and play a role in most parametric statistical procedures.
They are related to one another in the following way: the standard deviation is the square root of the variance and the variance is the square of the standard deviation.

Formula:

                       For sample                                         For population
Variance:              $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$          $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Standard deviation:    $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$     $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

$s = \sqrt{s^2}$, $\sigma = \sqrt{\sigma^2}$

Note that the term $n - 1$ for the sample statistics is known as the degrees of freedom.

Ex: The standard deviation of the hypothetical sample data below is computed as follows:
9, 8, 5, 11, 7, 5

Compute the mean:
$\bar{x}$ = (9 + 8 + 5 + 11 + 7 + 5)/6 = 7.5

Calculate squared deviations from the mean:

x        x̄       x - x̄     (x - x̄)²
9       7.5       1.5        2.25
8       7.5       0.5        0.25
5       7.5      -2.5        6.25
11      7.5       3.5       12.25
7       7.5      -0.5        0.25
5       7.5      -2.5        6.25
Total   45        0         27.50

Substitute $\sum (x - \bar{x})^2$ and $n$ into the formula:

$s = \sqrt{\frac{27.5}{6 - 1}} = \sqrt{5.5} \approx 2.3$

$s^2 = \frac{27.5}{6 - 1} = 5.5$

The term $\sum (x - \bar{x})^2$ is known as the sum of squares.

Some uses of standard deviation and variance
Averages tell us about the center of a distribution, but nothing about its spread. In a peaked distribution, most observations will fall close to the average. In a flat distribution, on the other hand, the average may have little meaning.
If the distribution of the data is approximately normal, about 68% of the observations lie within ±1 standard deviation of the mean and about 95% within ±1.96 standard deviations of the mean.

Z scores ("standard scores") can be computed so that different types of scales can be compared. For example, ratings collected from a 5-point scale and those collected from a 7-point scale can be analyzed by expressing each respondent's ratings in terms of standard deviation units from the mean. This is typically done in factor and cluster analysis, for example.

Formula for z scores:

$z = \frac{x - \bar{x}}{s}$

Areas under normal curve

Interval around the mean    Area
-1.65 SD to +1.65 SD        90%
-1.96 SD to +1.96 SD        95%
-2.58 SD to +2.58 SD        99%

Other measures of dispersion

Range
The largest value minus the smallest value, e.g., the range of a 5-pt. purchase interest scale is 4.
The range only has meaning with metric data.

Ntiles: percentiles (100ths), quintiles (5ths), quartiles (4ths), terciles (3rds).
Data are ranked according to magnitude and partitioned into equal-sized groups (e.g., 4ths). These are appropriate if the data are at least ordinal.
Ex: A commercial ranking higher in terms of overall liking than 55% of the products in the BASES II norm bank and lower than 45% in the norm bank is in the 55th percentile, the third quintile, the third quartile and the second tercile.

Inter-quartile range
The distance between the 25th percentile and the 75th percentile, i.e., the range spanned by the middle 50% of the distribution.
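The dispersion measures in this section can be sketched in Python using the sample data from the standard deviation example (9, 8, 5, 11, 7, 5). The function names are labels chosen for this example. The quartiles come from the standard library's `statistics.quantiles` with its default interpolation method, which is an assumption: the original text does not say how percentiles should be interpolated, and other conventions give slightly different cut points.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def z_scores(values):
    """Express each value in standard deviation units from the mean."""
    mean = sum(values) / len(values)
    s = sample_std(values)
    return [(x - mean) / s for x in values]


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


sample = [9, 8, 5, 11, 7, 5]
variance = sample_variance(sample)
std_dev = sample_std(sample)
standardized = z_scores(sample)
spread = value_range(sample)

# Quartile cut points; the interpolation rule is the library default (assumption).
q1, q2, q3 = statistics.quantiles(sample, n=4)
inter_quartile_range = q3 - q1

print(variance, round(std_dev, 2), spread)
print([round(z, 2) for z in standardized])
print(inter_quartile_range)
```

Because z scores are centered on the mean, they always sum to zero (up to floating-point rounding), which makes them a convenient common scale for ratings collected on different scales.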