Knowledge Graphs — A Book Review

A primer on actioning KG, decisioning KG, digital twin, and more by Barrasa et al.

Sixing Huang
5 min readJun 19, 2023

Knowledge graphs (KG) have become one of the buzzwords in recent years. At GraphSummit Singapore 2023, I met both Jesús Barrasa and Jim Webber. They are prominent advocates of KGs and have co-authored a book called Knowledge Graphs. Then Hubert Ng from Neo4j gave me a copy as a gift. It was such a page-turner that I finished it in one day.

Figure 1. Knowledge Graphs by Jesús Barrasa, Amy E. Hodler, Jim Webber

Within merely 79 pages, the authors have covered the key aspects of KG, including the definition, construction, types, and its roles in contextual AI and business digital twins. Both technical and non-technical readers can enjoy the book because it focuses on the core concepts and spares us the programming details. In this article, I would like to share some of my learnings.

1. Taxonomy and ontology

Some of us may have already got some hands-on experience with KGs. Many times I simply converted the siloed data “as-is” from third-party databases into property graphs. For example, in my article Neo4j for Antibiotic Resistance, I downloaded and imported the organisms, their antibiotic genes, and the resisted antibiotics into Neo4j. This kind of graphication alone can reveal many relations in the data, and it lets the users discover new insights visually (1).

In Knowledge Graphs, the authors called these simple property graph models “richer graph models”. As the authors pointed out, we can make them even smarter by adding taxonomy to the mix. That is, we can organize some of the nodes in a broader-narrower hierarchy. The Linnaean taxonomy in biology is a good example. There are taxonomies in commerce, too. For example, we say that the Apple EarPods belong to “wireless earphones”, which in turn is a subcategory of “earphones”, which is under the category of “audio”. Taxonomies are often organized visually as trees.

More often than not, the raw data from the public databases do not contain any taxonomy. As a result, we need to add the taxonomy during the post-processing. We can either use a standard taxonomy, like the Linnaean taxonomy for organisms. Or we can develop our own taxonomy that meets our needs.

Figure 2. Adding a Linnaean taxonomy into a genome graph. From “Analyzing Genomes in a Graph Database”. Image by author.

Once the taxonomy is in place, we can do some amazing things with the graph. For example, if our Apple EarPods are out of stock, we can suggest the next best choice from the same category to an eager customer. In my line of work in bioinformatics, taxonomy allows us to perform core- and pan-genome analyses. It reveals the similarities and differences between the target genome and its close relatives. The results can tell us how an organism has adapted to its environment and how it has evolved from its early ancestor (1).

In fact, not only taxonomy but also other forms of ontologies can add context and value to a graph (Page 20). And we can even superimpose multiple layers of ontologies onto the same dataset. For example, a bacterium can be classified based on its phylogeny or on its habitat. But developing and maintaining custom ontologies are tedious. And because of their custom nature, it is hard to share and merge them among developers. We can address these issues by applying a standard ontology. In biomedicine, we can use BioCypher to generate Biolink ontologies for our KGs (1).

2. Actioning and decisioning KGs

In Knowledge Graphs, the authors have stated two types of KGs: actioning and decisioning. A visual comparison of the two can be found in this article by Maya Natarajan from Neo4j. In short, the actioning ones are for data management and the decisioning ones are for analytics.

Figure 3. Actioning vs decisioning knowledge graphs. Image by Maya Natarajan from https://neo4j.com/blog/from-graph-to-knowledge-graph-actioning-vs-decisioning-knowledge-graphs/

Data-centric institutions can use actioning KGs to achieve their data lineage, data catalog, and data governance goals (1). They are important in the era of ESG (Environmental, social, and corporate governance), especially for banks, drug companies, and research institutes. For this purpose, institutions are building solutions with graphs and graph visualizations. These solutions can help non-technical users to better manage data. For example, UBS has built a platform called Group Data Dictionary (GDD) with Neo4j. It is a data lineage tool as well as a data governance tool.

On the other front, decisioning KGs are used in analytics, machine learning, or data science. For instance, I used a decisioning KG to identify cellulose-degrading bacteria. And in this era of LLM, decisioning KGs have served as integral information sources in the retrieval-augmented generation. Examples are Doctor.ai (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13) and Tomaz Bratanic’s excellent NaLLM project (1, 2, 3). It will be interesting to build a project with both actioning and decisioning KGs. It is called the data fabric. The actioning KG tracks the data that go into the decisioning KG. This architecture will without doubt increase the AI explainability of the decisioning KG.

3. Digital twin

In its second to last chapter, the book mentioned the role of KG in digital twins. In essence, a computer model is an abstract and simplified version of the reality. And a digital twin is a particular computer model that simulates a real-world artifact or process. Just like an object in a video game, we can perform many kinds of actions on it and observe its reaction. In addition, we can update the digital twin with new data as they become available.

Figure 4. Hakodate Line between Hakodate and Oshamambe. Memgraph superimposed the station nodes on top of a map. It shows how the stations are connected in March 2023. Image by author.

KGs can act as digital twins. A simple example is a train station KG (1). Augmented with Google Earth Engine data, this KG allows us to plan maintenance, position new stations, and formalize marketing strategies. In bioinformatics, it is possible to combine flux balance analysis (FBA) and KG to simulate the metabolic network in a cell. We can use this digital twin to perform many different kinds of virtual experiments, such as gene knockout and medium optimization.

Conclusion

Readers’ time is precious. And the authors know that. So Knowledge Graphs is a short book. A read-through can clarify many of the new buzzwords and give us a good overview of the tech. But it deserves to be re-read again and again. This book is like a knowledge graph, connecting many dots around this exciting technology together. Perhaps the authors can indeed later publish an actual knowledge graph to summarize the content of this book.

But reading is no substitution for doing. The true mastery comes with repeated practice. Luckily, it is now quite easy to build your own KGs either for work or for leisure. There are many practical articles on Medium which can guide you through the building and evaluating of KGs. So please try some today!

--

--

Sixing Huang
Sixing Huang

Written by Sixing Huang

A Neo4j Ninja, German bioinformatician in Gemini Data. I like to try things: Cloud, ML, satellite imagery, Japanese, plants, and travel the world.