Kanjigen, a comparative “character etymology” tool

Kanjigen is a convenience tool for comparing and contrasting several historical analyses of Chinese characters (hànzì or kanji, as used in Chinese and Japanese writing). It’s currently biased toward English speakers learning foreign languages; the specialist will surely prefer their own carefully selected pile of native dictionaries and technical references. A secret goal of the author is that, by setting different theories side by side, Kanjigen will promote a bit of skepticism and caution in students.

Instructions: Short version

Type or paste a Chinese character. Press the “Search character” button (or the Enter key, in some browsers). Extra windows or tabs will open, showing analyses from several external resources. Compare them.

This tool needs support for HTML5 and JavaScript. It has only been tested in Mozilla Firefox. If you use Chrome or Opera, you’ll need to disable the popup blocker.

Instructions: Longer version

You can input several characters, but only the leftmost will be queried. The extra space is useful for two reasons: it makes character input (IME) more convenient, and the text box can act as a small storage area where you paste or type a bunch of characters and later look them up one at a time.
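As a rough illustration of the "only the leftmost character" rule, here is a minimal sketch in Python. It is not Kanjigen's actual code (the tool runs as JavaScript in the browser). The function name `leftmost_character` is hypothetical, and skipping leading whitespace is an assumption made here for convenience.

```python
from typing import Optional


def leftmost_character(text: str) -> Optional[str]:
    """Return the first non-whitespace character in the text, or None.

    Skipping whitespace is an assumption of this sketch; it also skips
    the full-width ideographic space (U+3000) that IMEs often insert.
    """
    for ch in text:
        if not ch.isspace():
            return ch
    return None


# The text box used as a small storage area: only the first
# character would be looked up, and the rest wait their turn.
box = "東西南北"
print(leftmost_character(box))
```

Python strings iterate by code point, so rare characters outside the Basic Multilingual Plane come out whole. An equivalent in JavaScript would need to iterate by code point (for example with `for...of`) rather than index by UTF-16 unit.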

Currently supported resources

Kanjigen doesn’t store or reproduce data from these sites; it only sends them a user-made query, to save the reader the trouble of typing the same character several times. If you’re the owner of one of these sites and want Kanjigen to stop using it, just talk to the author.

For reviews and discussion of each of the resources, see this post. The reader should be aware that Howell/Morimoto’s method is considered non-mainstream, and that the Japanese Wiktionary sometimes quotes Shizuka Shirakawa, whose unorthodox method is controversial.


There are buttons to convert characters to and from different forms.

Chinese character conversion is not always trivial. For example, there are cases where a simplified character stands for several traditional ones (e.g. Japanese 弁 was substituted for all of 辨、瓣、辯). Kanjigen doesn’t try to decide which is the correct conversion; when there are multiple alternatives, it just leaves the character alone. It also doesn’t touch non-official Japanese simplifications (拡張新字体 kakuchō shinjitai, 朝日文字 Asahi moji and the like). Other than simplification, it’s ignorant of the plethora of relationships between characters (stylistic and semantic variants, handwritten abbreviations, kana as cursive kanji, &c.).
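The "leave ambiguous characters alone" policy can be sketched as follows. This is a minimal illustration, not Kanjigen's implementation. The table `TRADITIONAL_CANDIDATES` and the function `to_traditional` are hypothetical names, and the table holds only two hand-picked entries (the 弁 case is the one mentioned above).

```python
# Hypothetical, hand-picked data for illustration only; the real tool
# derives its tables from external sources.
TRADITIONAL_CANDIDATES = {
    "国": ["國"],              # exactly one candidate: safe to convert
    "弁": ["辨", "瓣", "辯"],  # several candidates: ambiguous
}


def to_traditional(ch: str) -> str:
    """Convert only when there is exactly one candidate.

    Unknown characters and ambiguous ones are returned unchanged.
    """
    candidates = TRADITIONAL_CANDIDATES.get(ch, [])
    if len(candidates) == 1:
        return candidates[0]
    return ch


for ch in "国弁木":
    print(ch, "->", to_traditional(ch))
```

Returning the input unchanged in the ambiguous case means a wrong guess is never shown as if it were the correct conversion. The cost is that some characters are queried in a form a given resource may not index.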

By default, Kanjigen will attempt to automatically convert the query to the form most likely to have entries in each of the resources it supports. To query for the character as-is, uncheck the relevant option.

The Chinese conversions are based on data from the Unicode Unihan database, while the Japanese data was extracted directly from the Jôyô Kanji PDF table (and is available for reuse here).
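For readers who want to build a similar table themselves, here is a hedged sketch of reading variant data in the Unihan text layout: one tab-separated record per line, with a code point, a field name, and a space-separated list of code points. The text above does not say which Unihan fields Kanjigen uses, so the choice of `kTraditionalVariant` and `kSimplifiedVariant` is an assumption. The sample lines are generated for illustration rather than copied from the database, and all function names are hypothetical.

```python
def unihan_line(source: str, field: str, targets: str) -> str:
    """Format characters in the Unihan record layout (illustrative sample data)."""
    codes = " ".join(f"U+{ord(c):04X}" for c in targets)
    return f"U+{ord(source):04X}\t{field}\t{codes}"


# Illustrative sample in the layout of the Unihan variants file.
SAMPLE = "\n".join([
    "# comment lines start with '#'",
    unihan_line("国", "kTraditionalVariant", "國"),
    unihan_line("发", "kTraditionalVariant", "發髮"),
    unihan_line("國", "kSimplifiedVariant", "国"),
])


def to_char(code: str) -> str:
    """Turn a 'U+XXXX' token into the character it names."""
    return chr(int(code[2:], 16))


def parse_variants(text: str, field: str) -> dict:
    """Map each source character to its list of variants for one field."""
    table = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, name, value = line.split("\t", 2)
        if name == field:
            table[to_char(code)] = [to_char(v) for v in value.split()]
    return table


traditional = parse_variants(SAMPLE, "kTraditionalVariant")
print(traditional)
```

An entry with more than one variant in the resulting table is exactly the ambiguous case that a cautious converter would leave untouched.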

Indexes and convenience links

After a character is queried, Kanjigen will show indexes for some paper dictionaries that include “etymological” information. These indexes come from Jim Breen’s KANJIDIC database and from Unihan, but they can be incomplete.

Kanjigen does not aim to be a complete character dictionary. For convenience, links are provided to external databases (currently Unihan and Jisho.org) in which the reader can find the usual data such as stroke order, meanings, example words, &c. Where applicable, a link is also provided to the relevant page in the online edition of the 18th-century Kāngxī dictionary.

Linking to Kanjigen

Add to the Kanjigen URL (https://namakajiri.net/kanjigen) a hash character (#) followed by some Chinese characters, and the latter will come pre-loaded in the search box. What’s more, the indexes and links will be pre-filled with data for the first character. Extra windows won’t open automatically (popup blockers hate them), but the reader just needs to click the search button to see the external dictionaries.
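The linking scheme can be sketched in a few lines of Python, for instance to generate links from a word list. The function names are hypothetical. Percent-encoding the characters is an assumption of this sketch, since the description above only says that the characters follow the `#`, and raw characters pasted into a browser's address bar normally work as well.

```python
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "https://namakajiri.net/kanjigen"


def kanjigen_link(characters: str) -> str:
    """Build a link whose fragment carries the characters to pre-load."""
    return f"{BASE_URL}#{quote(characters)}"


def preloaded_characters(url: str) -> str:
    """Recover the characters from the fragment (empty string if none)."""
    return unquote(urlsplit(url).fragment)


link = kanjigen_link("東西")
print(link)
print(preloaded_characters(link))
```

Everything after the `#` is a URL fragment, which browsers keep on the client side and do not send to the server, so the page's own script has to read it.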

On “kanji etymology”, “hànzì etymology”, “Chinese character etymology”

Chinese characters have structure; most of them are built from simpler characters, or from a few recurring building blocks. Characters usually stand for morphemes, and often (in about 85–90% of characters) one part is a “phonetic” component that suggests the pronunciation, while the other is a “semantic” component that suggests the meaning. To borrow a metaphor from Eve Kushner, Chinese characters are “multimedia tracks”.

Such structure is often ignored in modern foreign-language teaching, for various reasons. For one thing, historical language change has shifted both sound and meaning, making the hints ever more imprecise. In the case of Japanese, the characters predate the assignment of “translation readings” (kun-yomi), so the phonetic hints are entirely unrelated to them (they’re still useful to predict some “sound readings”, on-yomi, though less so than they were for the original Old Chinese readings). And there’s another reason: being rich in symbolism, the characters lend themselves to fanciful theories. It’s easy to look at “east” 東 and imagine a sun 日 behind a tree 木 (and even natives often think of the character in this way); only a historical analysis of ancient glyphs and forgotten readings will show that it was originally a picture of a wrapped package, borrowed to mean “east” because the words sounded alike. For the learner, a made-up story can be legitimately more useful as a mnemonic device.

Because sourced, scholarly explanations of the structure often need to refer to diachronic data (i.e. to the languages and character shapes as they were in the past, not necessarily as they’re perceived now), they’re often called “character etymology”, “kanji etymology” and so on. But notice that etymology proper deals with words, not with characters, and characters do not explain words or ideas. The history of a character does not necessarily reflect the history of a word, and the composition of characters is not in itself an explanation of the meaning of a word. I use the word “etymology” (in scare quotes) because it has become common, and in order to help interested people find this tool in search engines; but I agree with Prof. Victor Mair that the reader should make sure to distinguish the history of graphical characters from actual linguistic etymology, the history of words.