I’ve been working with data from several countries across the European region and struggle with cross-country analyses due to differences in data collection and reporting. I’m wondering if open source software could offer a solution to this issue by standardizing, validating, and summarizing the completeness of data.
I’m not sure of the best way to set this up, and I suspect there are already many tools out there that could be used. Does anyone have any insight on how this could become a reality? Are there organizations out there attempting to do this? What are the major challenges involved?
Hope this stimulates some fruitful discussion, thanks everyone!
Interesting question and discussion to start.
In my case it has also been hard to standardize data collection from different projects as different contexts end up having different needs.
Maybe one way is to define the minimum data elements needed for reporting (with consistent naming) and leave anything specific to each context for them to decide.
Once that’s defined, there are tools that can assist in cleaning, validating, summarizing and reporting data. I am using R (open source) and it can help do that.
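To make the idea concrete, here is a minimal sketch (in Python, though the same thing is easy in R) of checking a dataset against an agreed set of minimum data elements and reporting completeness. The variable names are hypothetical placeholders, not an actual agreed standard:

```python
import csv
import io

# Hypothetical minimum data elements agreed across contexts
CORE_VARIABLES = ["case_id", "report_date", "age", "sex", "outcome"]

def check_dataset(rows, core=CORE_VARIABLES):
    """For each core variable, report whether the column is present
    at all and what fraction of rows have a non-empty value."""
    report = {}
    for var in core:
        values = [r.get(var, "") for r in rows]
        present = any(var in r for r in rows)
        non_empty = sum(1 for v in values if v.strip() != "")
        report[var] = {
            "present": present,
            "completeness": non_empty / len(rows) if rows else 0.0,
        }
    return report

# Example: a tiny CSV as it might arrive from one country
raw = """case_id,report_date,age,sex
1,2023-01-04,34,F
2,2023-01-05,,M
"""
rows = list(csv.DictReader(io.StringIO(raw)))
report = check_dataset(rows)
print(report["age"]["completeness"])  # 0.5 — one of two ages missing
print(report["outcome"]["present"])   # False — column absent entirely
```

A summary like this, run the same way over every country’s submission, gives you a comparable picture of which minimum elements each dataset actually delivers.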
Happy to engage further
hey @agimm - super relevant discussion point!
I think probably the most effective way forward is to promote the use of data dictionaries and then have standard tags for common epidemiological variables.
UN OCHA has the HXL format - I don’t love the addition of a second row, nor the hashes, but captured in a data dictionary it could work. I think if as a community we push for some additions to this then it could become really functional - because analysis scripts could then be recycled for loads of different datasets that carry the appropriate tags.
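A quick sketch of why the tag row enables script reuse: the human-readable headers differ between datasets, but the hashtag row (second row, HXL-style) is shared, so one script works on both. The specific tags and data here are illustrative, not real submissions:

```python
import csv
import io

def columns_by_tag(hxl_csv_text):
    """Map each hashtag in the HXL tag row (second row) to its column
    values, ignoring the human-readable header row above it."""
    rows = list(csv.reader(io.StringIO(hxl_csv_text)))
    header, tags, data = rows[0], rows[1], rows[2:]
    return {tag: [r[i] for r in data]
            for i, tag in enumerate(tags) if tag.startswith("#")}

# Two countries, different headers, same tags
country_a = """District,Cases reported
#adm1+name,#affected
North,120
South,85
"""
country_b = """Region,Confirmed case count
#adm1+name,#affected
Est,40
Ouest,60
"""
for dataset in (country_a, country_b):
    cols = columns_by_tag(dataset)
    total = sum(int(v) for v in cols["#affected"])
    print(total)  # 205, then 100
```

The analysis code never needs to know what each country called its columns - it only looks up tags, which is exactly the recycling argument above.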
See this brief discussion here for more thoughts.
A standard but flexible platform would be useful as well. We’re currently exploring DHIS2 for this purpose, but we lack local experience and expertise in this platform specifically.
I believe both WHO and ECDC have minimum data standards for infectious disease reporting, though the requirements do vary across different pathogens/diseases.
Having a common set of core variables (pathogen/disease agnostic) would help - I can’t remember the name, but there is a WHO working group that was aiming to define core ontologies and requirements for outbreak data - they were active before the start of the pandemic, but I don’t know what has happened since then.
I do like the idea of open source software for surveillance - I find paid approaches are often limiting in the most surprising ways (e.g. not being able to access a variable that contains free text without paying extra because it takes up too many bytes). The paid options often end up being barriers to people enhancing existing surveillance systems (again, literally having to pay to add new variables) or creating new ones.

I do not have much personal experience with DHIS2, but it seems to be growing in popularity and is quite flexible with data entry options for different devices, telehealth etc. The setup is probably where people do need some support, as it does require the appropriate IT architecture, and different stakeholders need to know what the system could be used for in their department - especially where departments feed into each other’s databases and interoperability is key.