Big Data Tamps Down HIV Outbreaks


One of the best ways to prevent the spread of HIV is to treat those at high risk with a daily prophylactic pill. Unfortunately, this week Stanford University health researchers concluded that it’s simply too expensive to pre-treat even a fraction of people at increased risk for HIV.

But what if healthcare providers could track a brewing outbreak in real time and quickly help those at highest risk of infection? Thanks to big data and crackerjack new software, Canada’s westernmost province is doing just that.

In June 2014, a monitoring system operated by the British Columbia Centre for Excellence in HIV/AIDS (BC-CfE) detected a cluster of 11 new HIV cases in a town just outside Vancouver. The system, designed by bioinformatician Art Poon, analyzes massive amounts of HIV genetic data to detect outbreaks.

Such data is surprisingly easy to come by. In many developed countries, it is now routine for a doctor to sequence viral DNA from the blood of an HIV-positive patient. By doing so, the physician can identify which drugs, if any, the virus is resistant to and prescribe an optimal treatment.

In Canada, that DNA sequence data is regularly uploaded to BC-CfE’s secure Oracle database, home to more than 30,000 anonymized HIV genotypes. Every time new sequences are added, which happens almost every day, the entire database is downloaded to a secure workstation, where Poon’s software works its magic. During the download, all patient information is de-identified. “The system is designed to maintain patient privacy,” says Poon.
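The article does not say how the de-identification step works. As a rough illustration only, one common approach is to drop direct identifiers and replace the patient ID with a keyed hash, so that records from the same patient can still be linked without exposing who the patient is. The field names (`patient_id`, `name`, `date_of_birth`) and the function `deidentify` below are hypothetical and are not taken from the BC-CfE system.

```python
import hashlib
import hmac


def deidentify(record, secret_key, id_field="patient_id",
               drop_fields=("name", "date_of_birth")):
    """Return a copy of `record` with direct identifiers removed.

    The patient ID is replaced by a keyed hash (HMAC-SHA256), so the same
    patient always maps to the same pseudonym, but the pseudonym cannot be
    reversed without the secret key. All field names here are assumptions
    made for illustration.
    """
    digest = hmac.new(
        secret_key,
        str(record[id_field]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Keep every field except the ID and the listed direct identifiers.
    cleaned = {
        key: value
        for key, value in record.items()
        if key != id_field and key not in drop_fields
    }
    cleaned["pseudonym"] = digest[:16]  # shortened for readability
    return cleaned
```

Because the hash is keyed, whoever holds the key could in principle re-link a pseudonym to a patient when follow-up is needed, which is loosely analogous to the step described later in the article where public health officials access patient information after a cluster is flagged.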

Once the download is complete, the software analyzes the de-identified DNA and demographic information to determine where new infections have popped up, whether they carry drug-resistance mutations, and how they are related. HIV evolves very quickly, so if sequences from different infections are genetically similar, those infections are almost surely related by one or more recent transmissions.
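To make the idea of "genetically similar" concrete, here is a minimal sketch that links any two aligned sequences whose fraction of differing positions falls at or below a cutoff, then groups linked cases into clusters. This is a simplified pairwise-distance approach, not the tree-based analysis the article goes on to describe, and the function names, the cutoff value, and the toy sequences are all assumptions for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find_clusters(seqs, threshold):
    """Group case IDs whose sequences are linked by small pairwise distances.

    `seqs` maps a case ID to its aligned sequence. Two cases are linked when
    their p-distance is <= threshold; clusters are the connected groups of
    linked cases (found with a small union-find). Unlinked cases are omitted.
    """
    parent = {case: case for case in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for case in seqs:
        groups.setdefault(find(case), set()).add(case)
    return [members for members in groups.values() if len(members) > 1]


# Toy data: A, B, and C are near-identical; D is unrelated.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTACGTTT",
    "D": "TTTTGGGGCC",
}
clusters = find_clusters(toy, threshold=0.1)
```

In the toy data, A and C differ at two of ten positions and are not linked directly, but both are linked to B, so all three end up in one cluster. Real HIV genotypes are far longer, and the cutoff would have to be chosen by the analysts.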



Image: Art Poon
A network diagram of HIV cases, represented by circles. Related cases are linked by lines. Red circles indicate drug-resistant infections.


The software, which combines numerous open-source packages including Graphviz, LaTeX, and Jinja2, then produces a report with a diagram of an evolutionary tree: Each HIV infection is a branch, and clusters of related infections form bushy extensions. These clusters are visualized as network diagrams, from which health officials in British Columbia can identify hotspots of HIV transmission, especially areas where the amount of virus in patients’ blood is very high (which increases risk of transmission) and where drug-resistant strains are spreading. “We look for clusters of short branches in this tree that represent groups in the population where the virus is moving rapidly,” says Poon. “That becomes a significant clinical and public health problem.”
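A network diagram like the one described in the image caption can be expressed in Graphviz's DOT language. The sketch below builds DOT text by hand in plain Python, with cases as circles, links as lines, and drug-resistant cases filled red. It is an illustration of the general idea rather than Poon's actual code; the function name `to_dot` and the case labels are hypothetical.

```python
def to_dot(edges, resistant):
    """Build Graphviz DOT text for an undirected network of cases.

    `edges` is a list of (case_a, case_b) pairs for related infections;
    `resistant` is a set of case IDs carrying drug-resistance mutations.
    Resistant cases are drawn as red circles, all others as white circles.
    """
    nodes = sorted({case for edge in edges for case in edge})
    lines = [
        "graph hiv_clusters {",
        "  node [shape=circle, style=filled, fillcolor=white];",
    ]
    for case in nodes:
        if case in resistant:
            lines.append(f'  "{case}" [fillcolor=red];')
        else:
            lines.append(f'  "{case}";')
    for a, b in edges:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    edges=[("case1", "case2"), ("case2", "case3")],
    resistant={"case2"},
)
print(dot_text)
```

Saving the printed text to a file and passing it to a Graphviz layout program such as `dot` or `neato` would render the picture. A template engine like Jinja2 could then place the image in a LaTeX report, though the article does not spell out how those pieces are wired together.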

In June 2014, 8 of the 11 newly detected infections carried a mutation that made them resistant to a common class of HIV drugs. Public health officials accessed the patients’ information at that point and notified the clinic where they were being treated. Then, local personnel conducted outreach to ensure the affected individuals had access to treatment to reduce their viral load and to offer partner notification and referral services.

It worked. The outbreak slowed, with only one new case by the end of 2014.

The system has been in operation for two years. In addition to daily reports, it produces a formal monthly report that is distributed to public health agencies across British Columbia, which meet via teleconference each month to discuss new clusters and make HIV prevention decisions.

“The concept is providing treatment as early as possible to individuals to reduce the chance of onward transmission,” says Poon. “Our objective is to get to zero incidence of HIV.”

Health agencies in the U.S. have expressed interest in the system, and Poon has already shared his code with the Centers for Disease Control and Prevention in Atlanta. As routine genotyping for hepatitis C infection becomes more common, he is planning to use big data to track outbreaks of that virus too.


[Spectrum IEEE]

April 28, 2016
