I aim to do a capture-recapture study to estimate the sensitivity of a COVID-19 surveillance system.

The dataset I have contains weekly data on cases captured in a population-specific surveillance system (schools) and weekly data from the nation-wide surveillance system during the same period.

I’ve found the following packages: "Rcapture", "multimark", "openCR", and "RMark". I would like to hear which one is most suitable and what the pros and cons of each are.

Thanks for your post! I am afraid I do not have expertise in this, but will tag a few others who might know.

If you can describe in more detail the exact comparisons you want to do, that may help. I can imagine that joining functions, particularly “fuzzy” or probabilistic matching, may be useful for comparing cases detected by the different surveillance systems.
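To make the joining idea concrete, here is a minimal sketch in base R. It assumes both systems can be reduced to line lists sharing an exact identifier; the data frames, `case_id` values and column names are all made up for illustration, and real data would probably need fuzzy or probabilistic matching instead of an exact key.

```r
# Hypothetical line lists: IDs and column names are invented for illustration.
school <- data.frame(case_id = c("A1", "A2", "A3", "A4"), in_school = TRUE)
national <- data.frame(case_id = c("A2", "A3", "A5", "A6", "A7"), in_national = TRUE)

# Full outer join on the shared identifier: keep cases seen by either system.
linked <- merge(school, national, by = "case_id", all = TRUE)

# Cases absent from a system come back as NA; recode them to FALSE.
linked$in_school[is.na(linked$in_school)] <- FALSE
linked$in_national[is.na(linked$in_national)] <- FALSE

# Cross-tabulate capture status: this 2x2 overlap table is the usual
# starting point for a two-source capture-recapture estimate.
overlap <- table(school = linked$in_school, national = linked$in_national)
print(overlap)
```

The cell for cases missed by both systems is always zero here, since those cases are exactly what capture-recapture tries to estimate.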

Hey - I've never actually had to do a capture-recapture analysis with R.
But I agree that joining will be a big part of what you need to deal with. A new package which extends the regular tidyverse joins is powerjoin.
Otherwise, the epiR package is quite outdated, but has some useful helper functions with appropriate methodologies that others don't necessarily have.
The tidymodels package yardstick also has functions for specificity and sensitivity.


Also this case study is outdated but might be helpful to you


@aspina: I think this case study is very helpful! This might be silly, but do you know where I could download the example dataset (e.g. salmonella.xlsx)? I have trouble imagining the dataset structure :frowning:


good question - I wonder if @amy.mikhail knows if this is available?


Hi @vuthutrang.hmu ,

This github repository is a bit out of date, as Alex said, and so was missing some of the required materials. I have now added a folder called data to the repository, which contains the data sets that you need to run the capture-recapture code in exercise 9.

The link to download the data is here:

There are three data sets (contained in two files) that you need:

  1. salmonella.xlsx contains two data sets on separate worksheets (NSSS and NRLS);
  2. threesources.dta is the third data set.

It is a while since I have looked at the code, but hopefully it should still work - it uses the Rcapture package for the last section.
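For anyone who wants to see the basic two-source arithmetic before running the exercise, here is a rough sketch in base R. It is not the case study's code and does not use Rcapture: it applies Chapman's version of the Lincoln-Petersen estimator, and all counts and variable names are made up. It also assumes the two sources are independent, the population is closed, and matching between sources is perfect.

```r
# Made-up counts for illustration only (not taken from the case study data).
n_source_a <- 120  # cases reported to source A
n_source_b <- 200  # cases reported to source B
n_both <- 80       # cases matched in both sources

# Chapman's nearly unbiased variant of the Lincoln-Petersen estimator
# of the total number of cases, including those missed by both sources.
n_hat <- (n_source_a + 1) * (n_source_b + 1) / (n_both + 1) - 1

# Sensitivity of each source: cases it captured / estimated total cases.
sens_a <- n_source_a / n_hat
sens_b <- n_source_b / n_hat

print(round(c(n_hat = n_hat, sens_a = sens_a, sens_b = sens_b), 3))
```

With three or more sources, log-linear models such as those in Rcapture are the usual choice, because they can allow for dependence between sources.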


@amy.mikhail Thank you so much! I downloaded the data successfully and had a chance to practice with the case study :grinning:

However, I saw that the case study only gives guidance on calculating the sensitivity. I need to calculate the 95% CI of the sensitivity too. Do you know which command is suitable for calculating the 95% CI for sensitivity?

Hi @vuthutrang.hmu - I'm not sure those functions have a way of producing a 95% CI. However, you can get the CI for any proportion using the binom.wilson() function from the binom package.
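If you'd rather not install an extra package, a Wilson score interval can also be obtained in base R with `prop.test()` by switching off the continuity correction. The sketch below uses made-up numbers and treats the capture-recapture estimate of total cases as a known denominator, which is an assumption: it ignores the uncertainty in that estimate, so the interval is probably too narrow.

```r
# Made-up numbers for illustration only.
n_captured <- 120   # cases detected by the surveillance system
n_estimated <- 299  # estimated total cases from capture-recapture (rounded)

# prop.test() without continuity correction returns the Wilson score interval.
wilson <- prop.test(x = n_captured, n = n_estimated,
                    conf.level = 0.95, correct = FALSE)

sens <- unname(wilson$estimate)    # point estimate of sensitivity
ci <- as.numeric(wilson$conf.int)  # lower and upper 95% limits

print(round(c(sensitivity = sens, lower = ci[1], upper = ci[2]), 3))
```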