JavaScript in Plain English

Can Doctor.ai understand German, Chinese and Japanese? GPT-3 Answers: Ja, 一点点 and できます!

Sixing Huang
7 min read · Feb 23, 2022

Photo by Jan Antonin Kolar on Unsplash

In my previous article, GPT-3 for Doctor.ai, I showed that GPT-3 enables us to navigate a Neo4j knowledge graph using plain English. Under the hood, it converted English questions into Neo4j’s Cypher queries, and we used those queries to get answers from the database. But GPT-3 can do much more than that. Let’s try to turn Doctor.ai into a German, a Chinese, and a Japanese chatbot!

Figure 1. GPT-3 makes Doctor.ai multilingual. Image by the author.

You need a GPT-3 account, the Doctor.ai Neo4j backend, and the React frontend for this demo. You can set them up by following the instructions in Sections 1 and 2 of my previous article. Here I will demonstrate the process for German, the language of Goethe and Wagner. You can change Doctor.ai’s main language by setting the REACT_APP_LANGUAGE environment variable in Amplify. My code currently supports German, Chinese, and Japanese, mainly because those are the languages whose output quality I can judge. You can find the source code in my GitHub repository here:

OpenAI offers an initial credit of $18, which should be more than enough for this project. Still, keep an eye on your “Free trial usage” progress bar, because generating large numbers of tokens can drive up your usage quickly.

1. The architecture

To generate Cypher queries from German text, I can think of two possible routes. The first is to give GPT-3 enough “German question:Cypher” pairs to teach it to convert German directly into Cypher. In the second, we use GPT-3 to translate the German questions into English, and then use GPT-3 a second time to generate the Cypher queries from these English translations together with our existing “English question:Cypher” training pairs (defined in the previous article).

The first route has both advantages and disadvantages. On the one hand, direct German-to-Cypher conversion needs no explicit English translation step, and less code means fewer bugs. On the other hand, we would need to prepare a new training data set for every new language, which can become a problem for rare languages, and the developers would need strong skills in each of those languages to write it. The reward would be higher language accuracy for the chatbot.

I chose the second route. Yes, it adds an extra translation layer, but thanks to GPT-3 the additional code is minimal, and the approach makes it easy to add new languages to the mix. This project builds on the previous one and simply adds a GPT-3 translation function to the frontend. That function first translates the user’s question into English; the English text is then used to generate a Cypher query as described in Section 3 of my previous article. Finally, the same translation function translates the answer from the knowledge graph back into the target language (Figure 2).

Figure 2. The architecture of the multilingual Doctor.ai. Image by the author.
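
To make Figure 2 concrete, the flow can be sketched as one async function whose three steps are passed in as dependencies. This is a minimal sketch, not the author’s code: the names askDoctorAi, translate, generateCypher, and runCypher are hypothetical placeholders for the translation helper, the Cypher-generating GPT-3 call, and the Neo4j query. Only the order of the steps comes from the text.

```javascript
// Hypothetical orchestration of the multilingual flow in Figure 2.
// Each step is injected, so the pipeline does not depend on a particular API client.
async function askDoctorAi(question, language, { translate, generateCypher, runCypher }) {
  // 1. Translate the user's question into English (skipped for English users).
  const englishQuestion = language === "English"
    ? question
    : await translate(`Translate this ${language} to English`, question);

  // 2. Let GPT-3 turn the English question into a Cypher query.
  const cypher = await generateCypher(englishQuestion);

  // 3. Fetch the English answer from the knowledge graph.
  const englishAnswer = await runCypher(cypher);

  // 4. Translate the answer back into the user's language.
  return language === "English"
    ? englishAnswer
    : translate(`Translate this English to ${language}`, englishAnswer);
}
```

Injecting the steps keeps the translation layer separate from the Cypher layer, which is the point of the second route: supporting a new language only changes the instruction strings, not the English-to-Cypher part.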

2. Translation with GPT-3

The JavaScript code for the translation function can be taken directly from the GPT-3 Playground. The instruction variable holds values such as “Translate this German to English”, which tell GPT-3 to translate raw_text into the target language. I wrapped the code in a React function component so that it can be reused.

Code 1. The callTranslate function.
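
The listing itself is not reproduced here. As a rough stand-in, this is a minimal sketch of a Playground-style translation call, assuming the legacy OpenAI Completions endpoint (https://api.openai.com/v1/completions) and the text-davinci-002 model. The prompt layout, the parameter values, and the injectable fetchFn argument are my assumptions, not the author’s original code.

```javascript
// Hypothetical sketch of a callTranslate helper (not the author's original code).
// Assumes the legacy OpenAI Completions endpoint and a text-davinci model.
const COMPLETIONS_URL = "https://api.openai.com/v1/completions"; // assumed endpoint

// Build a Playground-style prompt: the instruction, a blank line, then the text.
function buildTranslationPrompt(instruction, rawText) {
  return `${instruction}:\n\n${rawText.trim()}\n\n`;
}

// Send the prompt to GPT-3 and return the trimmed completion text.
// fetchFn is injectable so the helper can be exercised without network access.
async function callTranslate(instruction, rawText, apiKey, fetchFn = globalThis.fetch) {
  const response = await fetchFn(COMPLETIONS_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "text-davinci-002", // assumed model name
      prompt: buildTranslationPrompt(instruction, rawText),
      temperature: 0.3, // a low temperature favors literal translations
      max_tokens: 256,  // assumed cap; long answers may need more
    }),
  });
  if (!response.ok) {
    throw new Error(`OpenAI request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].text.trim();
}
```

A call such as callTranslate("Translate this German to English", "Was ist das Christianson-Syndrom?", apiKey) should resolve to the English question, which the frontend then hands to the Cypher-generation step.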

3. The frontend

The frontend code is quite similar to that of the previous project. Here I call the callTranslate function from the previous section to translate the user’s question into English. Then I append the English question to the training data and send it to GPT-3, which should return a Cypher query. That query is used to fetch answers from Doctor.ai’s knowledge graph. Finally, callTranslate translates the English answer back into the target language.

Figure 3. Excerpts from Doctor.ai’s frontend. Image by the author.
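
Appending the English question to the training data is few-shot prompting: the example pairs and the new question form one prompt, and GPT-3 continues it with a query. This is a minimal sketch under stated assumptions: the “#question” line followed by a Cypher line is a hypothetical layout (the previous article’s training data may be formatted differently), and the Disease label and name/description properties in the example query are a hypothetical schema.

```javascript
// Hypothetical few-shot prompt for English-to-Cypher generation.
// The "#question" line followed by a Cypher line is an assumed format,
// and the Disease/name/description schema is illustrative only.
const TRAINING_DATA = [
  "#What is Christianson syndrome?",
  "MATCH (d:Disease) WHERE d.name = 'Christianson syndrome' RETURN d.description;",
].join("\n");

// Append the translated English question to the training examples.
function buildCypherPrompt(trainingData, englishQuestion) {
  return `${trainingData}\n\n#${englishQuestion.trim()}\n`;
}

// Keep only the first query: the model may go on to invent further
// "#question" / query pairs after the one we asked for.
function extractCypher(completionText) {
  const lines = [];
  for (const line of completionText.trim().split("\n")) {
    if (line.trim().startsWith("#")) break; // a new example has started
    lines.push(line);
  }
  return lines.join("\n").trim();
}
```

Passing a stop sequence such as "#" in the Completions request is another way to get the same cut-off on the API side; trimming on the client as well is a cheap safeguard.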

4. Test the German Doctor.ai

Set up the Amplify frontend by following the steps described in my previous article. During the Amplify setup, add one more environment variable, REACT_APP_LANGUAGE, with a value such as “German”, “Chinese”, or “Japanese”.
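
Assuming the frontend is a Create React App build (the REACT_APP_ prefix suggests so), variables with that prefix are embedded into the bundle at build time and read through process.env. A minimal sketch of how the frontend might turn REACT_APP_LANGUAGE into the two translation instructions; the helper name getTranslationInstructions and the choice to treat a missing value as English are my assumptions.

```javascript
// Languages the author reports supporting.
const SUPPORTED_LANGUAGES = ["German", "Chinese", "Japanese"];

// Hypothetical helper: derive both translation instructions from the
// REACT_APP_LANGUAGE value. Returns null when no translation is needed.
function getTranslationInstructions(language) {
  if (!language || language === "English") return null;
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`Unsupported REACT_APP_LANGUAGE: ${language}`);
  }
  return {
    toEnglish: `Translate this ${language} to English`,
    fromEnglish: `Translate this English to ${language}`,
  };
}

// In a Create React App build, REACT_APP_* variables are inlined at build time.
const instructions = getTranslationInstructions(process.env.REACT_APP_LANGUAGE);
```

Failing loudly on an unknown value makes a typo in the Amplify console show up at startup rather than as odd translations later.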

Now let’s ask Doctor.ai some German questions. I asked Doctor.ai what Christianson syndrome is and what the pathogen of COVID-19 is. Notice that I made a typo: “Krankkeitserreger” should have been “Krankheitserreger”. But GPT-3 still understood it. Doctor.ai fetched the results from KEGG and translated the texts back into German for me.

Figure 4. German conversation in Doctor.ai. Image by the author.

The German answer looks quite good, although not perfect. For example, “Syndrom” is neuter, so the translation should have been “ist ein seltenes, X-chromosomales … Syndrom”, while “Phänotyp” is masculine, so the sentence should have read “Der klinische Phänotyp”. Admittedly, German noun genders are hard even for experienced speakers, and fortunately, mistakes in noun genders rarely cause misunderstandings.

Now let’s speak Chinese:

Figure 5. Chinese conversation in Doctor.ai. Image by the author.

Here I asked what the pathogen of cowpox is and what the function of a gene called PCBD2 is. The answers from Doctor.ai/GPT-3 were poor. First, it wrongly asserted that a nonexistent virus called 小儿麻疹病毒 (“infantile measles virus”) causes cowpox. The correct answer is 牛痘病毒 (cowpox virus). Under the hood, it had also mistranslated 牛痘 (cowpox) as “smallpox”. The functional description of PCBD2 was barely comprehensible.

Finally, I asked Doctor.ai two easy Japanese questions.

Figure 6. Japanese conversation in Doctor.ai. Image by the author.

I first asked what Christianson syndrome is (Christianson syndromeは何ですか?). GPT-3’s answer was quite short. After some debugging, it became clear to me that the English answer was fine but that GPT-3 had difficulty translating it back into Japanese: for some unknown reason, it hit the max_tokens limit prematurely. The second question was about the side effects of Indinavir (Indinavirの副作用は何ですか?). The answer was a list of nouns, and the translation was acceptable, though there is room for improvement. For example, creatinine could have been translated as クレアチニン.
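
One way to confirm this kind of truncation is to inspect the finish_reason field that the Completions API returns with each choice: it is "length" when generation stopped at max_tokens. One plausible factor is that GPT-3’s tokenizer usually needs more tokens for Japanese text than for the equivalent English, so a limit that suffices for the English answer may not suffice for its translation. A minimal sketch, assuming the response JSON has already been parsed; the helper names and the retry-with-a-larger-limit policy are hypothetical.

```javascript
// Hypothetical check for a completion cut off by max_tokens.
// The Completions API sets finish_reason to "length" in that case.
function isTruncated(completion) {
  return completion.choices[0].finish_reason === "length";
}

// Hypothetical retry policy: double max_tokens until the completion finishes
// or a hard ceiling is reached. requestFn(maxTokens) resolves to parsed JSON.
async function completeWithoutTruncation(requestFn, maxTokens = 256, ceiling = 2048) {
  let completion = await requestFn(maxTokens);
  while (isTruncated(completion) && maxTokens < ceiling) {
    maxTokens = Math.min(maxTokens * 2, ceiling);
    completion = await requestFn(maxTokens);
  }
  // May still be truncated if the ceiling was reached.
  return completion.choices[0].text.trim();
}
```

The ceiling keeps a misbehaving prompt from eating through the free-trial credit.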

Conclusion

During my experiments, I noticed that Doctor.ai often presented me with empty answers. Doctor.ai’s knowledge graph is built around a controlled English vocabulary, so if medical terms from other languages were translated incorrectly and did not match their English counterparts, the Cypher query found nothing. This is why I used English medical terms in the Japanese and Chinese conversations above to avoid translation errors. I also found that GPT-3 sometimes improvised. For example, it gave me disease descriptions from outside my database when I instructed it to translate “SARS-CoV-2” or “Christianson syndrome” into German (Figure 7).

Figure 7. GPT-3 improvises. Image by the author.
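
On the empty answers: because the graph matches on English names, a slightly different translation yields zero rows rather than an error. A minimal sketch of two mitigations, assuming a hypothetical Disease label with name and description properties and an injected runQuery(cypher, params) function (for example, a thin wrapper around a Neo4j driver session that returns plain row objects): match names case-insensitively through a query parameter, and tell the user explicitly when nothing matched.

```javascript
// Hypothetical lookup against an English-vocabulary knowledge graph.
// toLower() on both sides tolerates capitalization differences in the
// translated term; a genuinely different word will still not match.
const DESCRIBE_DISEASE = `
MATCH (d:Disease)
WHERE toLower(d.name) = toLower($name)
RETURN d.description AS description`;

// Hypothetical message shown instead of an empty answer.
const NO_MATCH = "No entry found. Try the English medical term.";

async function describeDisease(englishName, runQuery) {
  const rows = await runQuery(DESCRIBE_DISEASE, { name: englishName.trim() });
  if (rows.length === 0) {
    // An empty result often means the translated term is not in the graph.
    return NO_MATCH;
  }
  return rows.map((row) => row.description).join("\n");
}
```

Case-insensitive matching only catches the mildest mismatches; a mistranslation like “smallpox” for cowpox would still return the wrong entry or none at all.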

GPT-3 makes it quite simple to add new languages, but be aware that its skill varies from language to language. The German conversation went well: GPT-3 tolerated my typo, and the noun-gender errors in the generated text are minor, so users should still be able to understand the answers. The Chinese conversation, by contrast, was a disaster: the answers were either factually incorrect or nonsensical. Finally, as the Japanese conversation shows, GPT-3 sometimes behaves strangely. We need to test Doctor.ai further to see how we can improve it. I also encourage you to take my code, experiment with other languages, and come back to tell me your results.

Licenses

Hetionet is released as CC0. STRING is freely available under a ‘Creative Commons BY 4.0’ license, while academic users may freely use the KEGG website.

Written by Sixing Huang

A Neo4j Ninja, German bioinformatician in Gemini Data. I like to try things: Cloud, ML, satellite imagery, Japanese, plants, and travel the world.