I’ve always found working with the COVID-19 PANGO lineages a bit confusing, so I made an R package to make it easier…
Want to know the expanded name of BL.1? Want to collapse B.1.1.529.2.75.1.2 to a shorter form? Both are possible with the {pangoRo} package. You can read more about it and install it from GitHub: https://github.com/al-obrien/pangoRo. The package was inspired by pango_aliasor, a similar package for Python.
A few notes on functionality:
- Caches the alias table once per R session to avoid unnecessary calls back to the PANGO source
- Helper functions to access various PANGO tables
- Expand or collapse vectors of COVID-19 lineages
- Sort lineages
- List lineage parents and children
- Check whether one lineage is related to another, as either a parent or a child lineage
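For anyone curious what expanding and collapsing actually involve, here is a minimal Python sketch of the idea. It is not the {pangoRo} API and not how the package is implemented. The function names (`expand`, `collapse`, `is_descendant`) and the two-entry `ALIASES` table are made up for illustration, and the alias values were copied by hand, so check them against the official Pango alias key before relying on them. Recombinant (X*) lineages are ignored entirely.

```python
# Toy alias table (hypothetical subset, hand-copied for illustration only).
# The real Pango alias key is far larger and changes over time.
ALIASES = {
    "BA": "B.1.1.529",
    "BL": "B.1.1.529.2.75.1",
}


def expand(lineage, aliases=ALIASES):
    """Replace a leading alias (e.g. 'BL') with its full ancestry."""
    prefix, _, rest = lineage.partition(".")
    full_prefix = aliases.get(prefix, prefix)  # unknown prefixes pass through
    return f"{full_prefix}.{rest}" if rest else full_prefix


def collapse(lineage, aliases=ALIASES):
    """Shorten a lineage using the longest alias that is a strict prefix."""
    full = expand(lineage, aliases)
    best_alias, best_full = None, ""
    for alias, alias_full in aliases.items():
        # Require a trailing dot so the match falls on a level boundary
        # and the alias never swallows the whole name.
        if full.startswith(alias_full + ".") and len(alias_full) > len(best_full):
            best_alias, best_full = alias, alias_full
    if best_alias is None:
        return full
    return best_alias + full[len(best_full):]


def is_descendant(child, parent, aliases=ALIASES):
    """True if `child` sits below `parent` once both are fully expanded."""
    return expand(child, aliases).startswith(expand(parent, aliases) + ".")


print(expand("BL.1"))
print(collapse("B.1.1.529.2.75.1.2"))
print(is_descendant("BL.1", "BA.2"))
```

The parent/child check falls out of expansion almost for free: once both names are written out in full, a child is simply a name that starts with its parent's name plus a further level.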
Thanks @al-obrien. This is of great value!! Several weeks ago I reached out to Cornelius Roemer, the developer of the Python package pango_aliasor, about considering an R-friendly version, and I think he was working on it. I didn’t have time to build one myself. This would be extremely valuable to the genomic epidemiology community.

Thanks @neale. This is great. Some of these nested Pango aliases are tough to keep straight. The big challenge will be keeping track of the evolving Pango designations.

I have tested this out on a dataset, and it did pretty well. It missed one or two Pango lineages, but that looks like a lag on the Pango side. I will let Cornelius and the team know about this development. Thanks again @al-obrien for developing this. I will certainly reach out with suggestions to improve the package. Great work!!
Thanks @BryanTegomoh for doing some tests; I’m glad you like it and that it has worked outside the ivory tower. Happy to hear any thoughts on making the package better for the community.