Carp in the Soil

Ridiculous sequencing results revealed how errors propagated from one research study to a global database

Garbage in, garbage out. But first you need to know what garbage looks like.

Figure 1. Carp in the soil. https://en.wikipedia.org/wiki/File:Cyprinus_carpio.jpeg

1. Collect the carp reads

List 1. Some spurious sequences from one of our sequencing projects at DSMZ. The first few base pairs have been removed to show the poly-G tails.
Figure 2. BLAST results for one of the spurious "carp" sequences. Image by author.
Figure 3. fastp report on one of Selma’s samples. Image by author.
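Poly-G tails like those in List 1 are a known artefact of Illumina's two-colour chemistry (for example NextSeq and NovaSeq), where a missing signal is called as G. As a rough illustration only, the Python sketch below measures and trims a trailing run of Gs. The 10-base minimum is a hypothetical cutoff, not a value used in this study, and dedicated trimmers such as fastp may tolerate sequencing errors inside the tail, which this exact-run version does not.

```python
# Minimal sketch: detect and trim an exact poly-G run at the 3' end of a read.
# Assumption: min_len=10 is an illustrative cutoff, not taken from the article.


def poly_g_tail_length(read: str, min_len: int = 10) -> int:
    """Return the length of the trailing G run, or 0 if it is shorter than min_len."""
    seq = read.upper()  # accept lowercase bases as well
    run = len(seq) - len(seq.rstrip("G"))
    return run if run >= min_len else 0


def trim_poly_g(read: str, min_len: int = 10) -> str:
    """Remove a trailing poly-G run of at least min_len bases; otherwise return the read unchanged."""
    run = poly_g_tail_length(read, min_len)
    return read[: len(read) - run] if run else read


if __name__ == "__main__":
    reads = ["ACGTACGT" + "G" * 12, "ACGTACGTGG"]
    for r in reads:
        print(r, poly_g_tail_length(r), trim_poly_g(r))
```

The first example read loses its 12-base tail, while the second keeps its two trailing Gs because they fall below the cutoff.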

2. The “carp” were primer-adapter sequences

Figure 4. Sequences with poly-G tails contain primer and adapter sequences. Image by author. The sequences were designed by Bartram et al.
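One way to check whether reads carry primer or adapter sequence is to search for each known sequence in both orientations. The sketch below assumes exact matches and uses a made-up placeholder sequence (`adapter_placeholder`); in practice you would substitute the actual primer and adapter sequences from Bartram et al., and an error-tolerant aligner or trimmer would be more robust.

```python
# Minimal sketch: report where known primer/adapter sequences occur in a read.
# Assumption: exact string matching only; the sequences passed in are placeholders.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_known_sequences(read: str, known: dict[str, str]) -> dict[str, tuple[str, int]]:
    """Map each known sequence found in the read to (strand, 0-based position)."""
    seq = read.upper()
    hits: dict[str, tuple[str, int]] = {}
    for name, motif in known.items():
        motif = motif.upper()
        pos = seq.find(motif)  # forward orientation
        if pos != -1:
            hits[name] = ("+", pos)
            continue
        pos = seq.find(reverse_complement(motif))  # reverse-complement orientation
        if pos != -1:
            hits[name] = ("-", pos)
    return hits


if __name__ == "__main__":
    known = {"adapter_placeholder": "GATTACAGATTACA"}  # hypothetical sequence
    read = "CCC" + "GATTACAGATTACA" + "G" * 10
    print(find_known_sequences(read, known))
```

A read with a hit followed by a poly-G tail would match the pattern described in Figure 4.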

3. The scope of the contamination

4. How the sequences masqueraded as the carp into NCBI

5. How to get rid of the artefacts

Conclusion

A Neo4j Ninja, German bioinformatician. I like to try things: Cloud, ML, satellite imagery, Japanese, plants, and travel the world.

Sixing Huang