Genetic hacking: genealogical databases can be exploited
Over 6 million people in the USA have given their genetic data to private companies offering ancestry tests. Many of these consumers received their raw results and uploaded them to public genealogical databases. The most popular databases, GEDmatch and MyHeritage, have been used by 4.2 million individuals. Such widespread sharing of sensitive genetic data increases the risk of privacy breaches.
In a new preprint (a publication that has not yet been peer-reviewed), scientists from UC Davis demonstrated vulnerabilities in how these databases currently work. They proposed and tested three major types of attack:
- Tiling – gathering data on many people by uploading publicly available genotypes,
- Probing – looking for people who carry specific genetic variants (e.g. variants linked to disease risk),
- Baiting – extracting other users' data by uploading an artificially prepared genotype that lacks homozygous sites.
All three attacks rely on the attacker uploading real or artificial data to a genealogical database (input) and receiving data about other people that should not be visible to the attacker (output). Such hacking exploits scientific practice (openly accessible genetic data), the design of specific websites (which readily reveal private data), and the algorithms used to match DNA (phase-unaware methods).
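To see why a phase-unaware matcher can be fooled by a genotype without homozygous sites, consider the toy sketch below. It is not the algorithm of any particular database. It assumes genotypes are coded as 0, 1 or 2 copies of an alternate allele, and that two samples "match" when they share a long enough run of sites with no opposite homozygotes (0 against 2). The function names and the threshold are hypothetical, and real services work with genetic distances and error tolerances rather than raw site counts.

```python
def longest_compatible_run(a, b):
    """Longest run of consecutive sites with no opposite homozygotes.

    Genotypes are coded as alternate-allele counts: 0, 1 or 2 (assumed coding).
    A site breaks the run only when one sample is 0 and the other is 2.
    """
    best = run = 0
    for x, y in zip(a, b):
        if {x, y} == {0, 2}:
            run = 0          # opposite homozygotes: no shared allele possible
        else:
            run += 1
            best = max(best, run)
    return best


def phase_unaware_match(a, b, min_run):
    """Hypothetical matcher: report a match if the compatible run is long enough."""
    return longest_compatible_run(a, b) >= min_run


# Toy data: two unrelated samples and an all-heterozygous bait.
target = [0, 2, 0, 2, 1, 0, 2, 2]
unrelated = [2, 0, 2, 0, 1, 2, 0, 0]
bait = [1] * len(target)

unrelated_matches = phase_unaware_match(unrelated, target, min_run=5)
bait_matches = phase_unaware_match(bait, target, min_run=5)
```

In this toy setting the unrelated sample is rejected, while the bait matches the target (and would match anyone), because a heterozygous site can never be an opposite homozygote.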
The authors proposed a set of strategies to prevent these attacks. Among them are modifications of DNA matching algorithms, the introduction of cryptographic signatures, rejection of genotypes with highly unlikely features, and filtering out uploads of publicly available data.
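Two of these countermeasures can be sketched in a few lines. The example below is only an illustration under stated assumptions. The testing company is assumed to sign each raw data file, and the database is assumed to verify that signature before accepting an upload. A shared-key HMAC from the Python standard library stands in here for whatever signature scheme would be used in practice (a public-key scheme would be the more natural choice). The heterozygosity check uses an arbitrary illustrative cutoff, `max_het`, not a value taken from the preprint, and all function names are hypothetical.

```python
import hashlib
import hmac


def sign_genotype_file(data: bytes, key: bytes) -> str:
    """Signature the testing company would attach to a raw data file (HMAC stand-in)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_genotype_file(data: bytes, signature: str, key: bytes) -> bool:
    """Database-side check that the upload is an unmodified, company-issued file."""
    expected = sign_genotype_file(data, key)
    return hmac.compare_digest(expected, signature)


def heterozygosity(genotypes) -> float:
    """Fraction of heterozygous sites; genotypes coded 0, 1 or 2 (assumed coding)."""
    if not genotypes:
        raise ValueError("empty genotype list")
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


def looks_implausible(genotypes, max_het=0.6) -> bool:
    """Flag uploads with an unusually high share of heterozygous sites.

    max_het is an illustrative cutoff, not an empirically calibrated one.
    """
    return heterozygosity(genotypes) > max_het


# Usage with toy values.
key = b"demo-key-shared-for-illustration"
raw_file = b"rs0001\tAA\nrs0002\tAG\n"
signature = sign_genotype_file(raw_file, key)

accepted = verify_genotype_file(raw_file, signature, key)
tampered = verify_genotype_file(raw_file + b"rs0003\tAG\n", signature, key)
bait_flagged = looks_implausible([1] * 10)
```

A signature check of this kind blocks edited or fabricated files outright, while the plausibility filter is a weaker fallback for databases that still accept unsigned uploads.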
Graham Coop, one of the authors, concludes:
"The good news is that these attacks are easily preventable. Some of the databases likely already have measures in place, that block these approaches. We gave the companies 90 days notice, and we hope they will publicly clarify how they’re responding to these vulnerabilities." (Source: Twitter)
Preprint: Michael D. Edge, Graham Coop (2019). Attacks on genetic privacy via uploads to genealogical databases. doi:10.1101/798272.