Applied Epi Community

Considerations for sample size calculation

Hello everyone,

I am trying to determine the prevalence of Chlamydia in females aged 15 to 29 years in my country through a prevalence survey. What sample size should I aim for?

Thank you very much for your help,

Hi @epi_dude, I will try to help you.

There is a simple formula that you can use to calculate the sample size for a prevalence survey. In its basic form, the formula does not take the survey's sampling technique into account (e.g. cluster or stratified sampling); it assumes simple random sampling. The calculation is based only on the expected prevalence (proportion of diseased individuals) in your target population, the level of confidence and the precision you want. The sample size formula is: n = Z^2 * P(1-P) / d^2, where:

  • n is the sample size
  • Z is the statistic corresponding to the level of confidence
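  • P is the expected prevalence of the outcome indicator, expressed as a proportion (in your example, the proportion of women aged 15 to 29 years with a chlamydia infection)
  • d is the precision you want around your estimate, i.e. the absolute margin of error (half the width of the confidence interval), expressed as a proportion on the same scale as P.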

You would usually aim for a 95% level of confidence in your estimate, so Z would be 1.96. The precision around your estimate (d) really depends on how precise you want your estimate to be. The more precise the estimate (i.e. the smaller d), the larger the sample size. You might have to play around a bit with this number to see its impact on your sample size, and discuss within your study team how much precision you are willing to give up in exchange for the benefits of a smaller sample (for example, fewer resources required to implement the survey).
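To make the formula above concrete, here is a minimal Python sketch that applies it and shows how the sample size changes with the precision. The function name `prevalence_sample_size` and the expected prevalence of 5% are assumptions chosen purely for illustration; replace the prevalence with a value from the literature for your own setting.

```python
import math

def prevalence_sample_size(p, d, z=1.96):
    """Basic sample size for a prevalence survey: n = Z^2 * P * (1 - P) / d^2.

    p: expected prevalence as a proportion (0 < p < 1)
    d: desired absolute precision as a proportion (d > 0)
    z: statistic for the confidence level (1.96 for 95%)
    """
    if not 0 < p < 1:
        raise ValueError("p must be a proportion between 0 and 1")
    if d <= 0:
        raise ValueError("d must be greater than 0")
    # Round up, because you cannot sample a fraction of a person.
    return math.ceil(z**2 * p * (1 - p) / d**2)

# ASSUMPTION: 5% expected prevalence, used only as an illustrative value.
expected_prevalence = 0.05

# Compare a few candidate precisions (absolute margins of error).
for d in (0.005, 0.01, 0.02):
    n = prevalence_sample_size(expected_prevalence, d)
    print(f"precision +/- {d:.1%}: n = {n}")
```

Under these assumed inputs, tightening the precision from 2 to 1 percentage points raises the required sample from 457 to 1825, roughly a fourfold increase, because d enters the formula squared.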

Calculating the sample size for such a simple prevalence survey can be done using online sample size calculators like OpenEpi.

If you use a more complicated design, you will have to take this into consideration. For example, if you wanted a reliable estimate of chlamydia prevalence in the 15-19 years age group and another in the 20-29 years age group, you would calculate a simple sample size as explained above. But this would be sufficient for the prevalence estimate in only ONE age group. For your full study you would then need to double that sample size (assuming the same expected prevalence and precision in both groups), so that you have enough observations in each age group to obtain a reliable estimate of prevalence.
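As a rough sketch of the age-group example, the calculation can be run once per group and the results added up. The prevalence values, the precision and the helper name `prevalence_sample_size` are assumptions for illustration only; in practice each group could have its own expected prevalence.

```python
import math

def prevalence_sample_size(p, d, z=1.96):
    """n = Z^2 * P * (1 - P) / d^2, rounded up to a whole person."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# ASSUMPTION: illustrative expected prevalence per age group (both 5% here).
expected_prevalence_by_group = {"15-19": 0.05, "20-29": 0.05}
precision = 0.01  # ASSUMPTION: +/- 1 percentage point in each group

# One full sample size is needed for EACH group that gets its own estimate.
n_by_group = {
    group: prevalence_sample_size(p, precision)
    for group, p in expected_prevalence_by_group.items()
}
total_n = sum(n_by_group.values())

for group, n in n_by_group.items():
    print(f"age group {group}: n = {n}")
print(f"total sample size: {total_n}")
```

With identical inputs for both groups, the total is simply twice the single-group sample size.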

If you were to conduct a cluster-based survey to measure the prevalence of chlamydia infection in women between 15 and 29 years of age, you would also need to account for a design effect in your study design. The design effect is a factor by which you multiply the simple sample size to correct for the fact that individuals within the same cluster tend to be more similar to each other, for your outcome of interest, than to individuals in other clusters. If you expect the prevalence of chlamydia infection to vary greatly between different geographic areas, you would increase your design effect (and therefore also the sample size) to account for this. To get an idea of the design effect to expect in a cluster-based survey, you can check studies of similar design, which generally report the design effect they calculated after analysing their data.
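A minimal sketch of how a design effect would enter the calculation: the simple sample size is multiplied by the design effect before rounding up. The design effect of 2.0, the prevalence, the precision and the function name `cluster_sample_size` are all hypothetical; take the design effect from comparable published surveys.

```python
import math

def cluster_sample_size(p, d, design_effect, z=1.96):
    """Sample size for a cluster survey.

    Computes the simple-random-sampling size n = Z^2 * P * (1 - P) / d^2
    and inflates it by the design effect before rounding up.
    """
    if design_effect < 1:
        raise ValueError("design effect is normally 1 or greater")
    n_simple = z**2 * p * (1 - p) / d**2
    return math.ceil(n_simple * design_effect)

# ASSUMPTIONS: 5% expected prevalence, +/- 1 percentage point precision,
# and a design effect of 2.0 chosen only for illustration.
n_cluster = cluster_sample_size(p=0.05, d=0.01, design_effect=2.0)
print(f"sample size with design effect 2.0: {n_cluster}")
```

A design effect of 1 reproduces the simple random sampling result, so it is easy to see how much the clustering adds.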

I hope this is helpful!
