The world is moving fast towards cloud computing. This is because the cloud is easy to use, cheap, accessible and secure. Cloud providers such as Amazon Web Services (AWS) take over many repetitive, tedious IT maintenance tasks for their customers. As a result, cloud users can focus on their own…

A fun project to learn graph database

Tired of all those “JOIN”s in SQL? Do you get a headache every time you need to modify a schema in a relational database? If either answer is “Yes”, then you should give a graph database such as Neo4j a try.

A graph database stores information as nodes and edges. Nodes…

An alternative view of the CARD database

The introduction of antibiotics was a milestone in our public health history. They are medicines used to prevent and treat bacterial infections such as pneumonia and tuberculosis. Antibiotics have saved literally millions of lives.

However, their overuse and misuse have led to the emergence of antibiotic-resistant bacteria. These bacteria…

How to build a metagenomic binning pipeline on AWS (Part 2)

A metagenome is the sum of all single genomes in a habitat, be they viral, bacterial, or eukaryotic (see my introduction here). Currently, due to technical limitations, biologists have to shred all these single genomes and sequence the fragments. DNA fragments from the same organism should have similar DNA compositions…

How to build a metagenomic binning pipeline on AWS (Part 1)

Bioinformatics is leaping into the cloud

In the webinar “Scaling genomics workloads using HPC on AWS” on July 14, 2021, I learned that heavyweights such as AstraZeneca and Illumina have already moved their genome analyses into the AWS cloud and have been reaping great benefits ever since. The cloud reduced both the runtime and…

Three museums in Yokohama, Stavanger and Berlin taught me something unexpected

Everyone loves a good museum visit. It is an intensive learning session in our free time. Although it is the objects themselves that do all the talking, let’s not overlook the contributions of the museum curators. They carefully select exhibits to educate and entertain the visitors. It is a…

Ridiculous sequencing results revealed how errors propagated from one research study to a global database

Garbage in, garbage out. But first you need to know what garbage looks like.

Figure 1. Carp in the soil.

Last year, when we were working on a publication about three Cyanobacteria, my colleague Pia Marter told me that our three metagenome-assembled genomes (MAGs) contained some DNA fragments from Cyprinus carpio (common carp). My first…

Build an analytic pipeline with ElasticBLAST, SNS, and DataBrew on AWS

Photo by tian kuan on Unsplash

Bioinformatic programs come and go, but BLAST stays.

BLAST, short for Basic Local Alignment Search Tool, is the search engine for bioinformaticians. While Google takes text strings as queries and returns relevant web pages, BLAST accepts DNA or protein sequences as queries and returns similar sequences from the databases…

Combine Maps, OR-Tools, SendGrid and Cloud Functions to commandeer a delivery fleet

This article shows how to:

1. Set up a Cloud Storage in GCP that triggers a Cloud Function when a file is uploaded;

2. Set up a Cloud Function that calculates the optimal routing strategies with Google Maps and Google OR-Tools;

3. Send instruction emails to the carriers via SendGrid;

Gene cluster finding, annotation curation and sequence management all in one

If the 21st century is the Age of Biology (1), then genome sequencing is the harbinger. Genome sequencing basically turns DNA molecules into text in computers. DNA sequences are stored in simple ASCII text files such as FASTA and FASTQ. Biologists then run programs over them to discover proteins (open…

Sixing Huang

Certified Neo4j Professional, German bioinformatician at BGI Shenzhen. I want to learn more about the cloud, machine learning and Japanese, and to travel the world.

