
Capture-recapture analysis
Keith Sabin, PhD, MPH
DHHS/CDC/GAP

What is it for?
- Capture-recapture analysis is used for counting the total number of people in a population using two or more incomplete lists of those people.

Why should I be interested?
- Evaluating surveillance systems
- Magnitude of issues

Overview
- Origin of method
- Application to epidemiology: why is it useful for us?
- Principles
- Conditions for using capture-recapture methods
- Methods
  - Two sources
  - Multiple sources
- Limitations

Origins of capture-recapture analysis
- Origins in demography
  - 1662: used to estimate the population of London
  - 1783: used by Laplace to estimate the population of France
  - 1949: used by Sekar and Deming to estimate birth rate and mortality in India
- Subsequently used most often for estimating wildlife populations
- More recently applied to epidemiology (Wittes 1968)

Application of capture-recapture analysis to human epidemiology
- Evaluating completeness of a surveillance source
  - Passive surveillance
  - Registers
- Refining incidence and prevalence estimates from surveillance systems or population surveys
- Used for cancers, stroke, homelessness, mental illness, drug use, congenital disorders, infections

Principles
- Two or more sources (lists) of cases of a given disease
- Sources are considered random capture samples from the population
- Cases can be matched by unique identifiers
- Estimate the total number of cases not captured by any source from the matched and unmatched cases
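For two sources, the last principle is commonly made concrete with the Lincoln-Petersen estimator, N = (n_A x n_B) / m, where n_A and n_B are the sizes of the two lists and m is the number of cases matched on both. The sketch below is an illustration rather than something taken from the slides: it assumes a closed population and independent sources, and the function names `lincoln_petersen`, `chapman` and `unseen_cases` are hypothetical. The counts in the usage block are the police and RDS figures quoted later in this deck (86 and 388, with an overlap of 32).

```python
def lincoln_petersen(n_a: int, n_b: int, matched: int) -> float:
    """Two-source estimate of the total population size: n_a * n_b / matched."""
    if matched <= 0:
        raise ValueError("no matched cases: the estimate is undefined")
    return n_a * n_b / matched


def chapman(n_a: int, n_b: int, matched: int) -> float:
    """Chapman's variant: less biased in small samples, defined even if matched == 0."""
    return (n_a + 1) * (n_b + 1) / (matched + 1) - 1


def unseen_cases(n_a: int, n_b: int, matched: int) -> float:
    """Estimated number of cases on neither list (the unknown x22 cell)."""
    observed = n_a + n_b - matched  # cases seen on at least one list
    return lincoln_petersen(n_a, n_b, matched) - observed


if __name__ == "__main__":
    n_police, n_rds, overlap = 86, 388, 32
    print("Lincoln-Petersen:", lincoln_petersen(n_police, n_rds, overlap))
    print("Chapman:", round(chapman(n_police, n_rds, overlap), 1))
    print("Not on either list:", unseen_cases(n_police, n_rds, overlap))
```

With these counts the Lincoln-Petersen estimate is 86 x 388 / 32 = 1042.75, so roughly 1,043 people, of whom about 601 appear on neither list. Whether that figure is trustworthy depends entirely on the independence and homogeneity conditions holding.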

Critical assumptions/conditions
1. Population is closed (methods exist for open populations)
2. Individuals captured on both occasions can be matched
3. Capture in the second sample is independent of capture in the first
4. Probability of capture is homogeneous across individuals
   - Homogeneity of individuals
   - Homogeneity of lists

Application to humans
- "Capture" = appearing on a "list"
- "Re-capture" = linking, by identifying individuals who appear on both lists using criteria such as name, date of birth, etc.
- "Trap fascination"
  - If you feed the animal, it is more likely to be caught again
  - Laboratory-confirmed cases are more likely to be reported in other systems
- "Trap avoidance"
  - If you scare the animal, it will avoid the trap
  - A person can't appear on a community injecting drug user (IDU) registry if they are in prison

[Figure: two overlapping capture samples; regions: captured only the first time, captured only the second time, captured both times, not captured]

Two sources

                      Source B
                      1         2
  Source A    1       x11       x12
              2       x21       x22 = ?

  1 = included in source, 2 = not included in source
  Capture (Source A) and recapture (Source B)

Estimation
- If sources are independent: P(A+ if B+) = P(A+ if B-)

… a sampling method for hard-to-reach populations
- Description of RDS (respondent-driven sampling)
- Lessons learned from Vietnam

Probability Sampling (Simple, Systematic, Cluster)
- Gold standard: the best methods for sampling
- But they do not reach hidden populations:
  - No sampling frame
  - Stigmatized
  - Would need huge sample sizes in order to capture a hidden population
  - Expensive

Sampling Methods to Reach Hidden Populations
- Time-Location Sampling (TLS), venue-based
  - Major bias: only captures those who are visible
- Snowball
  - Major bias: not representative of the population (tendency for in-group affiliation, volunteerism and masking)

Background on RDS
- Developed by D. Heckathorn and R. Broadhead with IDUs in Connecticut and in Yaroslavl, Russia
- Sampling vs. recruitment strategy
- Different from other chain referral methods because it can give us point estimates with standard errors

How RDS Works
- Use of a dual system of recruitment through the use of (small) incentives
- Use of recruitment quotas
- Use of peers to recruit peers
- Use of links between recruiters and recruits

The Theory Behind RDS
- Uses principles of first-order Markov theory
- Long referral chains
- Final sample will be independent of those selected as "seeds"
- Final sample will be similar to the population of the network from which you are recruiting

[Figure: recruitment chains, Wave 1 to Wave 5]
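The claim that the final sample becomes independent of the seeds can be illustrated with a small first-order Markov chain. This is only a toy sketch under stated assumptions: a population split into two groups, and a made-up matrix of recruitment probabilities that does not come from the slides or from any study. The helper names `next_wave` and `composition_after` are hypothetical.

```python
from typing import List


def next_wave(composition: List[float], transition: List[List[float]]) -> List[float]:
    """Advance one recruitment wave.

    composition[i] is the share of current recruiters in group i;
    transition[i][j] is the probability that a recruiter in group i
    recruits someone from group j (each row sums to 1).
    """
    size = len(composition)
    return [
        sum(composition[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]


def composition_after(seeds: List[float], transition: List[List[float]], waves: int) -> List[float]:
    """Group composition of the sample after the given number of waves."""
    current = list(seeds)
    for _ in range(waves):
        current = next_wave(current, transition)
    return current


if __name__ == "__main__":
    # Hypothetical recruitment probabilities between two groups.
    transition = [[0.7, 0.3],
                  [0.4, 0.6]]
    # Two extreme seed choices: all seeds from group 0, or all from group 1.
    for seeds in ([1.0, 0.0], [0.0, 1.0]):
        final = composition_after(seeds, transition, waves=5)
        print(seeds, "->", [round(share, 3) for share in final])
```

After five waves both starting points sit within about 0.002 of the chain's equilibrium share of 4/7 for the first group, which is why long referral chains matter: a short chain would still carry the imprint of the seeds.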

A Long Referral Chain: Jazz Musicians in New York City

Selection of Seeds

Example in Hai Phong, Vietnam
- Final sample size: 420 IDUs in Hai Phong and Saigon; 418 commercial sex workers (CSWs) in Saigon and 220 in Hai Phong
- Recruitment process
  - 20 seeds selected by peer educators
  - Three coupons to each participant
  - Participants asked to recruit their peers
- Time: March to June 2004
- Three sites (Hai Phong); four sites (Saigon)

Eligibility Criteria
- CSWs:
  - Women, 18 years or more, living or working in Hai Phong or Saigon
  - Has sold sex for money in the last 30 days
  - Has a green coupon (except seeds)
  - Has provided consent
- IDUs:
  - Women (Saigon only) or men, 18 years or more, living in Hai Phong or Saigon
  - Has injected drugs during the last 30 days
  - Has a yellow coupon (except seeds)
  - Has provided consent

Coupon: Front Side
  LIFE-GAP project: For Your Health and Safety
  Payment coupon
  Address: _  Telephone: _
  (You can call to make an appointment in advance)
  You will receive 15,000 VND for each person whom you recruit and who enrolls in the study (you may recruit up to 3 persons)
  ID number:
  Please call us in advance. You must present this coupon for payment.

Coupon: Back Side

Networks of CSWs in Hai Phong
[Figure: a network in Hai Phong, with the seed marked]

Initial Lessons from Vietnam
- Seeds should have high degree; an initial focus group may be important
- No slow-down mechanism to end RDS
- Need for security: interviewers have no choice of whom they interview
- Managing multiple sites can be difficult
- Managing coupon numbers
- No way to control for those who recruit faster

Initial Lessons from Vietnam (cont.)
- Difficult to discourage recruiters from selling coupons or giving them out in a non-random way
- Non-response information is difficult to obtain (incentives picked up by friends; recruiters do not return for the secondary incentive)

Philosophical objection?
- Capture-recapture is fun, so it must be epidemiology!
- But as epidemiologists we are interested in time, place and person
- Capture-recapture does not capture time: it is a static tool that relies on lists corresponding to the prevalence of a chronic disease (e.g. diabetes) or to long time periods for acute diseases (e.g. legionella)
- Can be used for measuring broad trends by repeat analysis (Nardone et al., Epidemiol Infect 2003)

Practical limitations
- Unique identifier has to match in all data sources
  - This may contravene confidentiality laws
- Clever statistics can't correct bad data
  - Rubbish in, rubbish out
- For chronic and expensive diseases (e.g. diabetes) it may be better to carry out an expensive detailed survey than to use quick and dirty methods

  - It may be even more expensive to get it wrong

Extrapolation is based on assumptions
  We are assuming that the model which describes the observed data also describes the count of the unobserved individuals. We have no way of checking this assumption. This is analogous to, and has the same dangers as, fitting an arbitrary curve to a series of points (x, y), where x > 0, with the intention of estimating y at x = 0. … This is analogous to the position of those who automatically assume that the k samples in our problem are independent.
  Fienberg, Biometrika 1972;59:591-603

Conclusion
- If conditions are met:
  - Potential to use multiple incomplete registers and to estimate population size by capture-recapture
  - Cheaper than exhaustive registers
- Two sources
  - Impossible to quantify extent of dependence
  - Requires third source
- Multiple sources
  - Log-linear modelling is the method of choice
  - Can adjust for dependence and variable catchability

Caveats
- Use the technique, but be careful!
- Don't treat this as a black box method
- All prior knowledge should be used to formulate the model
- Know your data!
- Not the solution to all problems
- Conditions often not met when applied to epidemiology
- There may still be heterogeneity you don't understand
- Complementary technique

References

- Wittes JT, Colton T, Sidel VW. Capture-recapture models for assessing the completeness of case ascertainment using multiple information sources. J Chronic Dis 1974;27:25-36.
- Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995;17(2):243-264.
- International Working Group for Disease Monitoring and Forecasting. Capture-recapture and multiple-record systems estimation I: History and theoretical development. Am J Epidemiol 1995;142:1047-58.
- International Working Group for Disease Monitoring and Forecasting. Capture-recapture and multiple-record systems estimation II: Applications in human diseases. Am J Epidemiol 1995;142:1059-68.
- LaPorte RE, Dearwater SR, Yue-Fang C, et al. Efficiency and accuracy of disease monitoring systems: Application of capture-recapture methods to injury monitoring. Am J Epidemiol 1995;142:1069-77.

Recent examples of application to field epidemiology
- Legionnaires' disease. Infuso et al., Eurosurveillance 1998;3:48-50; Nardone et al., Epidemiol Infect 2003;131:647-54
- Malaria. Van Hest et al., Epidemiol Infect 2002;129:371-7
- Measles. Van den Hof et al., Pediatr Inf Dis J 2002;21:1146-50
- Acute flaccid paralysis. Whitfield, Bull WHO 2002;80:846-851
- Pertussis deaths. Crowcroft et al., Arch Dis Child 2002;86:336-8
- Intussusception after rotavirus vaccination. Verstraeten et al., Am J Epidemiol 2001;154:1006-1012
- Tuberculosis. Tocque et al., Commun Dis Public Health 2001;4:141-3
- Salmonella outbreaks. Gallay et al., Am J Epidemiol 2000;152:171-7
- AIDS. Bernillon et al., Int J Epidemiol 2000;29:168-174
- Meningitis. Faustini et al., Eur J Epidemiol 2000;16:843-8

Special thanks to Nancy Crowcroft, Health Protection Agency, London. Many of the capture-recapture analysis slides come directly from her class at Epi-Et.

THANK YOU!

RDS: Advantages
- Ease of field operations
- Little need for formative research/mapping
- Target members recruit for you
- Reach less visible segment of population
- Good external validity (found in other studies; still waiting to see in Vietnam)
- Minimal number of additional questions needed
- Computer software available
- Lower cost (still waiting to see)

RDS: Limitations
- Population must be a network
- Must be able to verify group membership
- Must track links between recruiters and recruits (coupon management)
- Incentives
- Very difficult to deal with selective non-response bias

Option 1: Use RDS with Institutional Data
- Capture-recapture requires two samples of the population, only one of which need be representative.
- If an institutional database is available, only a single number is required to "recapture" the population.
- Example: number of registered NEP members

Example of Capture-Recapture
- Capture: During the study period, police recorded contacts with 86 injectors. The detective who provided this information said he was "confident that this is almost all the shooters in town."
- Recapture: During the study period, 388 were interviewed using RDS.
- Overlap: 32 respondents were in both the police and the RDS samples.
- Estimated population size:

Estimating the Number of Jazz Musicians in NYC using the Logic of Capture/Recapture
- Capture: The proportion of NYC musician union members who identified themselves as jazz musicians (in response to a union member survey) = 70% (415/592). The number of musician union members in the New York metropolitan area, according to union records, is 10,499. Therefore, the estimated number of union jazz musicians is 7,360 (10,499 x 415/592).
- Recapture: The proportion of all NYC jazz musicians who are union members, according to an RDS study, is 22% (0.223).
- Using the estimated number of NYC union jazz musicians and the estimated proportion of all NYC jazz musicians who are union members, the size of the NYC jazz musician universe is 7,360 / 0.223 ≈ 33,003.

Multiple sources

Wittes Method
- Evaluate dependence among sources
  - Compare two-source estimates of N
  - If the estimates differ: test of independence
    - Calculate odds ratios between cell counts of two sources within a third source
    - If OR ≠ 1: dependence
- Merge dependent sources
- Repeat calculation of estimates with merged source

Test of independence
[Figure: Venn diagram of three sources A, B, C with regions a to g]
- OR = cg/de
- OR = 1: independence
- OR > 1: positive dependence, underestimation of N
- OR < 1: negative dependence, overestimation of N
- To solve, we have to assume that the highest-order interaction = 0, i.e. the chance of being in all the lists (in c) is a simple function of the chance of being on any single list or lesser combination of lists
- Or: there is nothing special about "c"

Log-linear modeling - General
- Analyze the relationship between categorical variables in a contingency table
- The logarithm of the expected frequency of a cell is expressed as a linear function of effects for each cell and interaction terms
- For 3 variables (A with i levels, B with j levels, C with k levels), the logarithm of the expected frequency F_ijk of cell ijk is a sum of a main effect, first-order effects (e.g. A) and second-order effects (e.g. AB, interaction)

Log-linear modeling - CRM
- Estimates the value of a missing cell in a 2^k contingency table
- k = number of sources
- Missing cell = number of cases not listed by any source (m222)

Log-linear modeling
- No interaction: sources are independent (1 model)
- Interaction between 2 sources only (3 models)
- Interactions between pairs of sources (3 models)
- Interactions between all sources 2 by 2 (1 model)

How to choose the best model
- Aim: Best f
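The log-linear slides above describe estimating the missing cell m222, the cases not listed by any of three sources, under the assumption that the highest-order interaction is zero. For the model that keeps every pairwise interaction, a closed form is known from the log-linear literature: m222 = (m111 x m122 x m212 x m221) / (m112 x m121 x m211). This formula is not given on the slides, so treat the sketch below as an illustration under that assumption. The function names are hypothetical, and the counts are made-up expected values for a population of 1,000 with independent capture probabilities 0.5, 0.4 and 0.2, chosen so that the true missing cell (240) is known in advance.

```python
from typing import Dict, Tuple

# Cell index (A, B, C): 1 = included in the source, 2 = not included.
Cell = Tuple[int, int, int]


def missing_cell_saturated(m: Dict[Cell, float]) -> float:
    """Estimate m222 with all pairwise interactions and no three-way interaction."""
    numerator = m[(1, 1, 1)] * m[(1, 2, 2)] * m[(2, 1, 2)] * m[(2, 2, 1)]
    denominator = m[(1, 1, 2)] * m[(1, 2, 1)] * m[(2, 1, 1)]
    if denominator == 0:
        raise ValueError("a two-source overlap cell is zero: the estimate is undefined")
    return numerator / denominator


def total_population(m: Dict[Cell, float]) -> float:
    """Observed cases plus the estimated number missed by every source."""
    return sum(m.values()) + missing_cell_saturated(m)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the slides).
    counts: Dict[Cell, float] = {
        (1, 1, 1): 40, (1, 1, 2): 160, (1, 2, 1): 60, (2, 1, 1): 40,
        (1, 2, 2): 240, (2, 1, 2): 160, (2, 2, 1): 60,
    }
    print("Estimated missing cell:", missing_cell_saturated(counts))
    print("Estimated total:", total_population(counts))
```

Because these counts were generated under exact independence, the estimator recovers the missing 240 cases and the total of 1,000. With real data the simpler models on the list need iterative fitting, and the choice among them is where prior knowledge of the sources comes in.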
