Package for capture-recapture study to estimate the sensitivity of a surveillance system

vuthutrang.hmu · November 10, 2022, 5:49am

Hi all,

I aim to do a capture-recapture study to estimate the sensitivity of a COVID-19 surveillance system.

The dataset I have contains weekly data on cases captured in a population-specific surveillance system (schools) and weekly data from the nationwide surveillance system during the same period.

I’ve found the following packages: “Rcapture”, “multimark”, “openCR”, and “RMark”. I would like to hear which one is most suitable and what their pros and cons are.

Any help is appreciated. Thank you!

neale · November 15, 2022, 11:57am

Thanks for your post! I am afraid I do not have expertise in this, but will tag a few others who might know.

If you can describe in more detail the exact comparisons you want to do, that may help. I can imagine that joining functions, particularly “fuzzy” or probabilistic matching, may be useful for comparing cases detected by different surveillance systems.

@chris @cmaronga @isaacflorence @aspina @sophiemeakin @temuulen

aspina · November 15, 2022, 5:14pm

Hey - I've never actually had to do a capture-recapture study with R.
But I agree that joining will be a big part of what you need to deal with. A new package that extends the regular tidyverse joins is the powerjoin package.
Otherwise, the epiR package is quite outdated, but it has some useful helper functions with appropriate methodologies that others don't necessarily have.
The yardstick package (part of tidymodels) also has functions for specificity and sensitivity.

aspina · November 19, 2022, 1:47pm

Also this case study is outdated but might be helpful to you

vuthutrang.hmu · November 25, 2022, 5:00am

Thank you all for your comments!

@aspina: I think this case study is very helpful! This might be a silly question, but do you know where I could download the example dataset (e.g. salmonella.xlsx)? I have trouble imagining the dataset structure.

aspina · November 26, 2022, 5:08pm

good question - I wonder if @amy.mikhail knows if this is available?

amy.mikhail · November 28, 2022, 6:01pm

Hi @vuthutrang.hmu ,

This GitHub repository is a bit out of date, as Alex said, and so was missing some of the required materials. I have now added a folder called data to the repository, which contains the data sets that you need to run the capture-recapture code in exercise 9.

The link to download the data is here:

There are three data sets (contained in two files) that you need:

salmonella.xlsx contains two data sets on separate worksheets (NSSS and NRLS);
threesources.dta is the third data set.

It is a while since I have looked at the code, but hopefully it should still work - it uses the Rcapture package for the last section.

vuthutrang.hmu · January 5, 2023, 6:29am

@amy.mikhail Thank you so much! I downloaded the data successfully and had a chance to practice with the case study.

However, I saw that the case study only gives guidance on calculating the sensitivity, and I need the 95% CI of the sensitivity too. Do you know which command is suitable for calculating the 95% CI for sensitivity?

aspina · January 31, 2023, 8:58am

Hi @vuthutrang.hmu - I'm not sure those functions have a way of computing a 95% CI; however, you can get CIs for any proportion using the binom.wilson() function from the binom package.
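
To make the arithmetic behind a Wilson interval for a sensitivity concrete, here is a minimal sketch in plain Python (not the case study's R code, and not the binom package itself). It assumes a simple two-source setting and uses Chapman's estimator for the total number of cases, which is one common choice and may not be what the case study uses. The function names and all counts are hypothetical and chosen only for illustration. Note that the interval treats the estimated total as if it were a known denominator, so it ignores the uncertainty in that estimate and is likely too narrow.

```python
import math


def chapman_estimate(n1, n2, m):
    """Chapman's two-source estimate of the total number of cases.

    n1: cases found by source 1, n2: cases found by source 2,
    m: cases found by both sources (matched records).
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def wilson_interval(x, n, z=1.959964):
    """Wilson score interval for the proportion x / n (z = 1.96 gives ~95%)."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts, for illustration only.
school_cases = 60      # cases captured by the school system
national_cases = 90    # cases captured by the national system
matched_cases = 40     # cases captured by both

total_hat = chapman_estimate(school_cases, national_cases, matched_cases)
sensitivity = school_cases / total_hat

# Assumption: the estimated total is treated as a fixed denominator.
lower, upper = wilson_interval(school_cases, total_hat)

print(f"Estimated total cases: {total_hat:.1f}")
print(f"Sensitivity of school system: {sensitivity:.3f}")
print(f"Approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

In R, the equivalent step would pass the same two numbers (cases captured and estimated total) to binom.wilson(); if the uncertainty in the estimated total matters, an interval based on the capture-recapture model itself would be more appropriate.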