Why do we do Descriptive Epidemiology?

As epidemiologists or data analysts in public health, we do a lot of descriptive analysis.

I am interested in hearing people’s thoughts on the reasons why we do descriptive epidemiology.

  • Why do you think it’s important?
  • How do you find it useful?
  • Why do you think we calculate the specific measures that we do?

I’m not interested in the exact processes or approaches, such as calculating a specific case fatality ratio, R number, percentage, or proportion.

Instead, I’m more interested in understanding how descriptive epidemiology helps you and why you think it’s important.

Where to start? It is so easy to jump to conclusions, to come up with an answer based on personal insights and experiences. Descriptive epidemiology, when it is done right, keeps you from ending up with false facts. It helps break any question into parts (using the Time, Place and Person approach, for example), creating a clear path to other steps (like analytical studies) for closing the knowledge gap in any public health question.

Why do you think it’s important?

  • Because descriptive measures are simple enough for people without technical statistical knowledge to understand, while still summarizing information at a high level to facilitate decision making
  • It is the foundation for advanced analysis. For instance, age pyramids by gender help decide whether or not to do a bivariate analysis by gender, etc.

How do you find it useful?

  • It helps conceptualize the information provided, e.g. by understanding the bare minimums of time, person and place
  • I use it as a basis for deciding cutoffs for continuous variables that do not have preset categories. For example, depending on the weight distribution across a population, we can use the mean/median to decide on a meaningful cutoff, or we could use the mean distance to health facilities to determine what further categorization would be most impactful

Why do you think we calculate the specific measures that we do?

  • To keep analysis focused by allowing epis to know what’s expected
  • To standardize processes to ensure comparability or generalization across different, unrelated projects
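
The point above about using the mean/median to pick a cutoff for a continuous variable could look something like this. It is only a minimal sketch: the weights are made-up values, and `split_at_median` is a hypothetical helper name, not something from a particular package.

```python
from statistics import median

# Hypothetical weights in kg; illustrative values only, not real data.
weights_kg = [48.5, 52.0, 55.3, 60.1, 61.7, 64.0, 70.2, 75.8, 82.4, 90.0]


def split_at_median(values):
    """Return the median and a below / at_or_above label for each value."""
    cutoff = median(values)
    labels = ["below" if v < cutoff else "at_or_above" for v in values]
    return cutoff, labels


cutoff, labels = split_at_median(weights_kg)
print(f"median cutoff: {cutoff:.2f} kg")
for group in ("below", "at_or_above"):
    print(group, labels.count(group))
```

Whether the median, the mean, or an established clinical threshold is the right cutoff depends on the question; the sketch only shows that the descriptive step is what makes the choice visible.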

More broadly I think the question you are getting at is, is it useful?

I think the answer is yes - speaking from an applied public health context, we need it to define the affected population in outbreaks and routine surveillance. This in turn identifies who needs to be targeted for public health interventions, or for more in-depth investigations to understand what is driving transmission in a given context.

Also - as collection of large volumes of routine data for public health action becomes more and more the norm, there is no other way to collate the information - there is far too much to just have a look at it manually, or go on gut feelings from the last bunch of patient interviews you did. Collating and summarising is needed in order to inform and decide next steps, put in place relevant interventions and prevent more cases from occurring.

That said - there is one ‘traditional’ perspective on descriptive epidemiology that I disagree with - the idea that you can’t use the same cases from your descriptive analysis in further analysis (logistic regression to identify the vehicle of infection in a foodborne outbreak, for example). I don’t think this is true, because the two analyses are for different purposes - they ask different questions.

I also think hypothesis generating analysis needs to have its profile raised as an intermediate step between descriptive and hypothesis testing analysis. Investigative epidemiology cannot be planned out from start to finish in advance - the types of studies to do or what to focus on in a given outbreak are identified as the investigation progresses and part of it is about identifying leads (and then ways of testing them).

A well-done descriptive analysis based on reasonable-quality data can tell us what is going on and what to do next.

Descriptive analyses can be super powerful and a huge bang for your buck, although they are hard to get right. The skill is definitely undervalued in my opinion.

As for why we use the measures we do, I guess because we’ve agreed as a community on key indicators that are helpful and feasible to calculate. The devil is in the details though!

Descriptive epidemiology helps us gather common facts and build a collective understanding of what’s going on.

When we face a new problem, the first step is to establish a clear picture of the situation: What’s going on? Descriptive epidemiology facilitates this process by providing a common base of information that everyone involved can understand and discuss. This shared perspective is essential because we don’t work or think in isolation; it aligns the team’s understanding and sets a common ground for collaborative action.

When the team working on the problem shares similar perspectives, it becomes easier to communicate with those affected and involve other potential actors to help address the issue.

Once we’re on the same page about the “What,” we can try to dive into the “Why.”

Why do you think it’s important?
First, descriptive analyses can be very closely linked with data collection, cleaning, and validation efforts. If there are any issues with your data (impossible/unlikely values, over/undersampling of certain groups, potential misclassification) they will probably only become apparent in well-conducted descriptive analyses.

Second, descriptive statistics are easier for non-epidemiologists and non-statisticians to understand. They usually show the key pattern needed for decision makers (e.g. outbreaks were larger in areas with lower vaccine coverage) without the nuance of more advanced analyses (this effect remained after controlling for x, y, z).

How do you find it useful?
I often use them myself to identify any potential issues when I’m collecting data for epidemiological studies, to show progress to data-collection teams, and to explain the data to people not familiar with it. They can often be automated, if needed.

Why do you think we calculate the specific measures that we do?
Mainly because epidemiologists are familiar with certain indicators, I think. Also, certain data often have a natural, easiest way of being presented (e.g. an epicurve as a bar chart, a change over time as a line graph, clustering of cases as a map, etc.).

Hello,

I consider descriptive epidemiology the most important phase of all epidemiologic analysis. It allows you to visualize patterns and discover things in the data.
Especially for me, in my public health practice, it allows me to prioritize among all the interventions we have for the population across different diseases. I have always thought that descriptive epidemiologic analysis is the most important activity that an epidemiologist should do.

Interesting stuff, I love the range of ideas.

@leoneler - Yeah, it totally can be about jumping to conclusions, and descriptive Epi can allow you to be surprised. I like the quote by Isaac Asimov, who once said, “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny…’” “That’s funny…” is the sound of something catching in your brain. A contradiction, an anomaly, just plain weird: something doesn’t fit what you expected.

@aokhisa I like your last point about standardized processes. I’ll talk about that a bit further on.

@amy.mikhail I wasn’t quite asking whether it’s useful; I think it is useful. I’m asking why you think that and why we choose to do it. Your “traditional” perspective is interesting; I’ve never heard of that before. It seems somewhat similar to the idea that you cannot generate a hypothesis and confirm a hypothesis on the same source of information. Is that more what you mean? I think you can use the same cases for both descriptive and inferential analyses. The descriptive is just about getting summary estimates; the inferential is about quantifying uncertainty around those estimates. However, if you were to generate an idea that two groups are different in your descriptive, then you can totally quantify the uncertainty of that idea in “your” data. You just can’t say that those two groups are really different.

I think this relates to the movement in Statistics to stop talking about p-values as evidence and start talking about compatibility. We start to use phrases such as “the data are compatible with this pattern.” Then you’re highlighting that it is just your data, and conclusions should really only happen at a meta level. So scientific changes only start to happen in meta-analyses. But that’s getting a bit off-topic and into the Philosophy of Science. Here’s a paper on compatibility instead of evidence.

@paulablomquist - The devil is indeed in the detail. I like the idea that we’ve agreed, as a community, on key indicators. But how do you think they were arrived at? And did we agree? Or were they somewhat discovered by probability theory?

For instance, why do we calculate R? You could say we calculate R because it tells us how many people, on average, each case will infect. Or you could say we calculate R because it is a summary of a distribution, and the most efficient and effective estimate to summarize a distribution in general is a mean, and R is just a mean. So did the Epis decide, or was it kind of already determined? And then, are R, or whatever we have (attack rates, proportions, means), the best way?

Now you could also ask: Is R descriptive Epi or inferential? Well, if you have case and contact data, then you can calculate R just by doing a summary of how many people got infected from each case, and that’s just descriptive. But we often don’t have that, so that’s why it sits in inferential usually.
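
To make the "R is just a mean" point concrete, here is a minimal sketch of the purely descriptive version, assuming you really do have case and contact data. The transmission pairs and case IDs are made up, and `secondary_case_counts` is a hypothetical helper name.

```python
from collections import Counter
from statistics import mean

# Hypothetical (infector, infectee) pairs from contact tracing; made-up data.
pairs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("C", "F"), ("C", "G")]
# Every known case, including those who infected nobody.
cases = ["A", "B", "C", "D", "E", "F", "G"]


def secondary_case_counts(cases, pairs):
    """Count how many people each case infected (0 if none recorded)."""
    counts = Counter(infector for infector, _ in pairs)
    return {case: counts.get(case, 0) for case in cases}


counts = secondary_case_counts(cases, pairs)
# The descriptive "R" here is simply the mean of the secondary-case counts.
r_estimate = mean(list(counts.values()))
print(counts)
print(f"mean secondary cases per case: {r_estimate:.2f}")
```

One caveat on the sketch: recent cases may not have finished infecting others when you count, so a simple mean like this can understate R for the latest generation.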

@lnielsen I like how you reference a collective understanding. I think descriptive statistics can be thought of as a common language, and that provides shortcuts for communication. Interesting that you consider it the “what” and then perhaps the inferential is the “why”? I tend to think of the descriptive as trying to figure out what is going on and how to describe it, whilst exploring stuff based on possible whys.

@kevin.vanzandvoort Very practical. Yeah, descriptive stuff is often how we spot patterns for stuff that is wrong. I wonder if you consider data cleaning and descriptive to be different things? In clinical trials, cleaning would be separated out as, I guess, descriptive analyses that help you identify issues, and then you sort them. Then descriptive would be the presentation of tables. But I get your point that unless you look at your data, you won’t know if stuff is rubbish, and the way to look at your data is usually descriptive stuff.

@jestrada I’m glad to hear how important you find it. So for you, it helps you really balance when you’ve got loads of information. I think this is a key aspect.

Why do you think it’s important?

“See first, think later, then test. But always see first. Otherwise you will only see what you were expecting. Most scientists forget that.” Douglas Adams
“Garbage in, garbage out” — George Fuechsel
“The first principle is that you must not fool yourself—and you are the easiest person to fool.” — Richard Feynman
“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” — George Box
“If you don’t ask the right questions, you don’t get the right answers. A question asked in the right way often points to its own answer.” Amelia Earhart
“If a name is next to a quote then they probably didn’t say it, especially on the internet” Julius Caesar

I think it’s important because understanding large amounts of information can be challenging. Summarizing that information helps us create shortcuts for understanding and for communicating with others. For example, saying we have 1,000 cases is helpful, but showing someone a list of 1,000 rows of ages is overwhelming. Telling them the minimum and maximum age reduces those 1,000 rows to just two pieces of information.

This is why I think descriptive epidemiology is important—it provides shortcuts and summaries to help us comprehend often complex and overwhelming information.
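
As a rough illustration of that shortcut, the sketch below reduces 1,000 ages to a handful of numbers. The ages are randomly generated for the example (not real data), and `summarise` is a hypothetical helper name.

```python
import random
from statistics import median

# Simulate 1,000 ages purely for illustration; the seed makes the run repeatable.
random.seed(1)
ages = [random.randint(0, 95) for _ in range(1000)]


def summarise(values):
    """Reduce a long list of values to a few descriptive numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }


summary = summarise(ages)
print(summary)
```

Four numbers are much easier to say out loud in a meeting than 1,000 rows, which is the whole point.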

How do you find it useful?

First, for understanding. Second, I find it useful as a common language. If descriptive statistics are like formal, “stuffy” language—mean, median, mode, percentage—then measures like R, attack rate, case fatality rate, or proportion of cases in certain age groups are like slang. These terms, while really being means or ratios, give quick and precise meaning to those general terms. They serve as a shorthand that makes communication way easier.

Why do you think we calculate the specific measures that we do?

One reason is Maximum Likelihood Estimation (MLE) Theory, a statistical framework for summarizing large amounts of information. Under standard regularity conditions, and mostly in large samples, MLE provides a summary with some really nice features:

  • Consistency: As the sample size increases, the estimates converge to the true parameter value.
  • Asymptotic unbiasedness: In large samples, the expected value of the estimate approaches the true parameter value. In small samples an MLE can be biased (e.g. the MLE of a normal variance divides by n rather than n - 1).
  • Asymptotic efficiency: In large samples, its variance approaches the lowest possible for an unbiased estimator (the Cramér-Rao bound).
  • Sufficiency: The MLE depends on the data only through a sufficient statistic, so it uses all the information in the data relevant to estimating the parameter.

These properties make MLE a theoretically well-grounded approach to summarizing information, which is why many of the measures we calculate stem from this framework. Basically, if you are trying to summarise a bunch of information, a mathematician has already thought about it and said, “I reckon this is a pretty good approach in general.”
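
One way to see the link between MLE and everyday measures: for a simple binomial model, the maximum likelihood estimate of the probability of illness is just the attack rate (cases divided by people exposed). The sketch below checks this numerically with a crude grid search. The counts are made up, and both function names are hypothetical.

```python
import math


def binomial_log_likelihood(p, cases, total):
    """Log-likelihood of proportion p for `cases` ill out of `total` exposed.

    The binomial coefficient is omitted because it does not depend on p.
    """
    return cases * math.log(p) + (total - cases) * math.log(1 - p)


def grid_mle(cases, total, steps=1000):
    """Return the candidate p on a regular grid with the highest log-likelihood."""
    # Exclude 0 and 1, where the log is undefined.
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda p: binomial_log_likelihood(p, cases, total))


cases, total = 30, 120  # hypothetical outbreak: 30 ill out of 120 exposed
mle = grid_mle(cases, total)
attack_rate = cases / total
print(f"grid-search MLE: {mle:.3f}, attack rate: {attack_rate:.3f}")
```

The grid search is only there to show the idea; in practice nobody maximises this numerically, because the answer is the proportion itself.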

Another reason is human nature, particularly our social structures and conventions. For instance, we use the term “R” to denote the basic reproduction number, but it was initially called “Z₀” by George Macdonald in malaria studies. It only became “R” after his death, when it was linked to 19th-century work on fertility. If that link hadn’t been made, we might have been referring to “Z” during COVID-19 instead!

The R0 journey: from 1950s malaria to COVID-19 - the paper

The magazine which is much nicer to read

The picture because who reads anything
