Multiplicity in writing systems: terminological troubles

Consider the alphabetic principle: each letter represents one sound—or, to get technical, each grapheme (a symbol of writing) represents one phoneme (a linguistic sound).
So there’s a nice 1-to-1 mapping. But sometimes the mapping can get multiple, both in the direction of writing (1 sound is represented by N letters)…
…and in the direction of reading (1 letter represents N sounds):
Cases like ‹th›→/θ/ or ‹ng›→/ŋ/ are called digraphs. This is a widespread term, so I’d like to use it in my thesis. We can generalize, when N>2, to plurigraphs. And cases like ‹x›→/ks/ (where, from the rule of the system, we’d expect a grapheme to represent one sound, but this particular one represents two) could, by the same token, be called pluriphones. (Continue reading for why I decided against “polygraphs” and “polyphones”.)
I believe that Japanese kanji are basically morphographic; that is, I believe that, as a general, structural principle of the system, they represent morphemes, just like English letters stand for phonemes:
半面 han-men “half-face” han “half” men “face”
反面 han-men “opposite-face” han “opposite” men “face”
半生 han-sei “half-life” han “half” sei “life”
反省 han-sei “reflection, regret” han “opposite” sei “examination”
But there’s a lot of deviation from this “morphographic principle”, just like English has a lot of deviation from the alphabetic principle. There’s a kind of plurigraphy, where it takes many characters to represent a single, unitary morpheme:
大人 otona “adult”
煙草 tabako “tobacco”
五月雨 samidare “early summer rain”
In Japanese this is called jukujikun “multiple-character kun reading”. (There are a number of complications here. First, the words written in kanji could be read regularly within the morphographic principle, resulting in daijin “big-person”, kemurigusa “smoke-herb”, and gogatsu-ame or satsuki-ame “May-rain”. It’s as if we reached the represented word through another, hidden word, semantically related to the target word: 煙草 → [kemurigusa] → tabako (though I don’t think this full route is followed by literate readers, who probably acquire a direct mapping). The second complication is that these words I’m presenting as unitary morphemes may actually be parsed as several morphemes, if the reader knows her etymology; samidare, for example, is sa-mi-dare “May-water-dripping”; but I posit that these words are opaque to most people, and, at any rate, the etymological morphemes of the word, sa-mi-dare = “May-water-dripping”, don’t map to the morphograms, ‹五月雨› = “five-month-rain”.)
There’s also an analogue to pluriphony, but with morphemes:
雷 kami-nari “god-cry” = “thunder”
唇 kuchi-biru “mouth-border” = “lips”
I’m not sure what to call these; I’m reluctantly resorting to plurimorphemy—having many morphemes. Summing up the forms of multiplicity we’ve seen so far:
When reading:
N graphemes → 1 object
When writing:
N objects → 1 grapheme
So far so good. However, writing systems also have multiplicity of another kind entirely. Consider again the word “exit”, and compare the sound represented by the letter ‹e› in the following:
‹exit› /ɛksɪt/
‹brother› /bɹʌðəɹ/
We see that, depending on the word, the letter ‹e› may be pronounced either as /ɛ/ (“short E”), /iː/ (“long E”), or /ə/ (“schwa”). (There are also other cases, like “pale”, where the ‹e› seems to be used in a non-phonographical way – it doesn’t represent a phoneme, but rather acts as a hint to select the “long A” pronunciation of ‹a› (cf. “pal”).) This is different from pluriphony in the sense above. The ‹e› doesn’t represent several sounds sequentially, as ‹x› did with /ks/; rather, it may represent one of several sounds in potential – but, in each actual use, only one of the possible values is selected (linguistics buffs: we’re talking about paradigmatic multiplicity vs. syntagmatic multiplicity).
What do we call this? Boltz (1994) calls it “polyphony”, as do many others, on analogy with the common word “polysemy”. However, Rogers (2004) uses the word “polyphony” to describe our “pluriphony”, on analogy with “polygraphy” derived from “digraph”. I think I’m sidestepping the whole brouhaha, and going with polyvalent: literally, having multiple values. The word in English has confusing associations with the concept of chemical valency, but it has an older sense of “having multiple meanings”, that is, multiple possible (paradigmatic) meanings, so that it applies nicely. I don’t think I see it being used a lot in English (where it seems to be in competition with ‘multivalent’); but my thesis is in Portuguese, and polivalente isn’t all that rare for us, in the sense I want. Boltz has used “multivalent” in this sense, and M.O. Connor used “polyvalent” like this, too, when discussing cuneiform writing.
I can use the same word to describe the analogous property of kanji:
雨 ame “rain”
雨 u “rain”
This is of course not an irregularity but a systematic property of Japanese writing, standing in stark contrast with its Chinese source: in Japanese, most morphographs are polyvalent. Boltz says polyvalence of this kind was quite common in ancient Chinese, and he calls it “polysemy” – the word usually means “a word with many possible meanings”, but this extends quite naturally to cover morphographical polyvalence.
All we’ve seen of polyvalence so far was in the direction of reading: we want to read a letter or kanji, and it has several possible values, and we have to pick one. But there’s polyvalence when writing, too: if you want to write an [iː] sound in English, you’ll have to choose between ‹e› (“me”), ‹ee› (“bee”), ‹ea› (“each”), ‹ie› (“field”) etc. And, in Japanese, the morpheme za “to sit” can be written morphographically as 坐 or 座, ichi “one” as any of ‹1, 一, 弌, 壱›. By now I’ve run out of creativity, so I think I’ll just name the two cases as “reading polyvalence” – or, more informally, “multiple readings” – and “writing polyvalence”, or “multiple orthography”.
Polyvalence is why I didn’t call plurimorphemic graphs simply “plurimorphic”; “plurimorphism” sounds like a synonym for “polymorphism”, which in biological usage means “having many forms in potentia”. “Plurimorphemy” is ugly as hell, but at least it suggests “morphemes”, rather than “forms”. This leaves my current scheme like this:
Sequentially
When reading:
N graphemes → 1 object
When writing:
N objects → 1 grapheme
plurimorphemy (雷→kami-nari)
In potentia
When reading:
reading polyvalence, phonological; multiple readings
(‹e› → {[iː] | [ɛ] | [ə] …})
reading polyvalence, morphological; multiple readings
When writing:
writing polyvalence, phonological; multiple orthographies
([iː] → {‹e› | ‹ee› | ‹ea› | ‹ie› …})
writing polyvalence, morphological; multiple orthographies
(ichi “one” → {1 | 一 | 弌 | 壱 …})
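One way to make the scheme concrete is a toy lexicon in Python. This is only a sketch under stated assumptions: the names LEXICON, describe and spellings are hypothetical, the label “pluriphony/plurimorphemy” lumps the two 1-grapheme-to-N-objects cases together, and the entries are simplified from the examples in this post (‹ea›, for instance, is listed with a single value). Sequential multiplicity shows up as tuples longer than one; multiplicity in potentia shows up as a choice between several alternatives.

```python
from typing import Dict, List, Set, Tuple

Graph = Tuple[str, ...]   # one or more written symbols, in sequence
Value = Tuple[str, ...]   # one or more represented objects, in sequence

# Hypothetical toy lexicon: each graph lists the values it may represent.
# Entries are simplified from the examples in the post, not real coverage.
LEXICON: Dict[Graph, List[Value]] = {
    ("t", "h"): [("θ",)],
    ("x",): [("k", "s")],
    ("e",): [("iː",), ("ɛ",), ("ə",)],
    ("e", "e"): [("iː",)],
    ("e", "a"): [("iː",)],
    ("i", "e"): [("iː",)],
    ("大", "人"): [("otona",)],
    ("雷",): [("kami", "nari")],
    ("雨",): [("ame",), ("u",)],
}


def describe(graph: Graph) -> Set[str]:
    """Label the kinds of multiplicity a graph shows when it is read."""
    values = LEXICON[graph]
    labels: Set[str] = set()
    if len(graph) > 1:
        # Sequential: several graphemes stand for one object.
        labels.add("plurigraphy")
    if any(len(value) > 1 for value in values):
        # Sequential: one graph stands for several objects in a row.
        labels.add("pluriphony/plurimorphemy")
    if len(values) > 1:
        # In potentia: several candidate values, one chosen per use.
        labels.add("reading polyvalence")
    return labels


def spellings(value: Value) -> List[Graph]:
    """List every graph in the lexicon that can write the given value."""
    return [graph for graph, values in LEXICON.items() if value in values]


def has_writing_polyvalence(value: Value) -> bool:
    """True when the writer must choose between several graphs."""
    return len(spellings(value)) > 1
```

In this model ‹x› and ‹e› come out with different labels, which is the distinction the scheme is after: describe(("x",)) only reports the sequential case, while describe(("e",)) only reports reading polyvalence. Writing polyvalence is just the same table read backwards, via spellings.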
I’m not at all satisfied with how clumsy all of this sounds, but I have to distinguish them, and I have to call them something. And we’ve not even begun scratching the more interesting complications of Japanese writing: the way writing polyvalence tends to specialize in distinguishing nuances, for example (the tsukai-wake), which is a kind of sub-morphography, a fine-tuning of the morphographical principle at the semantic level; or the way its reading polyvalence, historically derived from translators’ glosses, ended up tying morphemes together in mental clusters, ame/ama-/-same | u all connected through ‹雨›, and this generic mental object RAIN partaking mutely in tsuyu, samidare, shigure, so that the writing even resembles a kind of… dare I say it?…
[Edit history: v. 2: Changed the prefix for in sequentia multiplicity from poly- to pluri-, following flow’s suggestion in the comments below.]

  1. Thanks to my friend Henrique Moraes for corrections. He also tells me that the usual pronunciation of ‹x› in “exit” is [gz~kz]. I got /ɛksɪt/ from en.wiktionary; since this doesn’t change the argument, I think I’ll pull a Chomsky and just consider the voicing to be a phonetic-level realization of an underlying /ks/.

  2. Thanks for this thoughtful and clarifying post!

    “or the way its reading polyvalence, historically derived from translator’s glosses, ended up tying morphemes together in mental clusters, ame/ama-/-same | u all connected through ‹雨›, and this generic mental object RAIN partaking mutely in tsuyu, samidare, shigure, so that the writing sometimes even resemble a kind of…”


    This very thing, that there may be a mental object RAIN that connects all those various dots and may surface in writing as a singular 雨 (or indeed, in other words in a number of different kanji), is what I feel makes Japanese writing closest to being ‘ideographic’ in the modern world (even when it turns out there’s no ‘true’ ideographic writing system possible).

    BTW the terminology is indeed vexing. May I say that while e.g. ‘polygraphy’ totally makes sense in the way you discuss it, to me it stops being a very good term when I look at the table and see 大人 ‘otona’ described as ‘polygraphic’. It just doesn’t work very well… maybe something using ‘pluri-’? ‘plurigraphic’?

    Come to think of it, the table lacks all the ‘mono-’ aspects. And maybe even, if you take the 形音義 schema seriously, an entire dimension.

    Also, Japanese kana are a sort of phonographic writing, even if not a maximally analytic one; as such, they also display some polyvalency, polygraphy and so on when you just think of は, を, しゃ, づ, ず.

    As of today, there’s a closely related post on Language Log: “Kanji of the year 2015”

  3. Yes, that’s what I was joking about: the entire mess sometimes ends up resembling a limited form of that taboo word, “ideography”. But I’m not advocating that Japanese writing is ideographic, far from it; it does encode sound, it does represent sentences of a language; it’s just that, in this limited sense, the polyvalent way they use kanji exhibits some of the properties that, in the past, non-linguists had called “ideography”.

    The idea of calling 大人 a polygraph (or, in this case, a digraph) is like this. One letter usually represents a phoneme, but with ‹th›, you need two letters to represent a phoneme: that’s a digraph. One kanji usually represents a morpheme, but with ‹大人›, you need two kanji to represent a morpheme: that’s a morphographical digraph. With phonographical digraphs, the multiplicity is often a relic of a historical relation: in this case, the fact that ancient [tʰ] lenited to [θ]. Morphographical digraphs are also a trace of history: in this case, the fact that the Chinese bi-morphemic word 大人 was glossed as Japanese mono-morphemic otona. So I see lots of parallels, and to call them by the same word is one way of asserting the parallels.

    But I hadn’t thought of the prefix pluri-; perhaps it’s a good idea, to avoid the already existing confusion with “polyphony” and such… hm…

    And yeah, this entire thing is so far ignoring the fact that kanji, unlike Latin letters, are not unitary graphemes: those would be the elusive kanji components, which sometimes work as approximate phonograms, sometimes as general semantic hints, sometimes as both, and it’s quite hard to tell which component is playing which role in which case. This is an important part of how DeFrancis can claim that Chinese hànzì is, “more than anything else, phonographic”: he treats the phonographic sub-hànzì system as a syllabary. I tried to extend his methods to Japanese kanji and ran some statistical analyses; when my thesis is out (ETA: Aug/2016) I’ll report it here :)

    I agree with your evaluation of kana as a phonographic sub-system (or “script”) with the features you pointed out. This is one reason why I think Japanese kanji ended up being less phonographic than hànzì: they specialized the roles. Of course, Japanese kanji also have some phonography at the component level, just less so than in Chinese. (I envy the English distinction between “writing” and “script”, by the way. In Portuguese we only have escrita, so that I had to make do with “sub-system”).

    Thanks for the comment!

  4. “the entire mess sometimes ends up resembling a limited form of that taboo word, “ideography””: indeed. I do not especially endorse or promote the (continued) use of ‘ideography’ and related terms; and yes, Japanese writing does fundamentally encode sound (even if there are words that can be legitimately read out in one of several ways without damaging the text, e.g. 紅葉 as こうよう or もみじ). We want to have a term that matches and sticks; ‘ideograph’ has stuck and matches somewhat; others, like ‘character’, have stuck but seem to match a lot; yet others, like ‘tetragraph’, have so far hardly stuck, and seem to be unnecessarily superficial (‘writing symbol that looks square’, wat?).

    In the same vein, I do understand why you introduced ‘polygraph’, and think it does match; it’s only when I had made my way to your overview table that I realized that word hadn’t quite stuck.

    I’m still mulling over your treatment of the ‘e’ in words like ‘fate’ v ‘fat’; to me it was really helpful when I read about the ‘open’ and ‘closed’ syllables in English phonology many years ago. According to this, we may understand the ‘e’ in ‘fate’ as a visual indicator of an open syllable. As such, isn’t this ‘a(…)e’ just a digraph that happens to be written discontinuously? Akin to some of the weird pre-, post-, super-, sub- or even circum-vowels used in Indic abugidas? (Plus, I had first to sort out whether your example was English ‘(for the) sake (of)’ or Japanese ‘sake’…)

  5. That sounds like a good way of describing the English ‘e’ (and that discontinuous way of writing a digraph is just another piece of history encoded as spelling…). Though, if we say that it’s a “marker of a preceding open syllable/long vowel”, we can generalize the fact that it also occurs for e…e, i…e, o…e, u…e (met/mete; tin/tine; rot/rote; rub/rube). Or we can just call all of ’em digraphs.

    And I thought it was only we non-natives who misread ‘sake’ as Japanese :) I guess that, in this particular context, I should use an example that doesn’t coincide with a Japanese romanized word…

  6. Great post! I’ll be thinking about this some more, but I just wanted to add this:

    “it’s just that, in this limited sense, the polyvalent way they use kanji exhibits some of the properties that, in the past, non-linguists had called “ideography”.”

    In my opinion, this is precisely because the old Japanese understanding (and to a large extent the current understanding worldwide) of the Chinese writing system was based on the “ideographic misapprehension” or whatever you want to call it.

    In Chinese, it makes sense to insist that Chinese characters encode words or morphemes, not ideas. In Japanese, cracks start to appear in this idea once you have 犬 pronounced both “ken” and “inu”, both meaning “dog”. The argument that 犬 is better understood as encoding the words/morphemes ken and inu, both of which happen to mean “dog”, than the meaning “dog”, usually corresponding to the words/morphemes ken/inu, is a fairly technical one and I think there are reasonable vantage points from which one can reject it.

    But it doesn’t actually matter which direction the writing system is “really” pointing in, because medieval/early modern Japanese people obviously believed the “meaning” thing, and this belief allowed the creation of forms like 煙草 for “tabako” or 五月蝿い for “urusai”.

    I wouldn’t argue that forms like this are central to the Japanese writing system or that it should be understood as fundamentally ideographic. But I think that the fact that it has been and is used that way sometimes is much more interesting even from a linguistic standpoint than is often acknowledged. (It sounds like your thesis will be arguing along these lines except with actual rigor and meaningful results! I’m very excited to read it!)

  7. @flow: “pluri-” has grown on me as an unambiguous and suggestive label for sequential multiplicity, and I’m using it in the current version of the text. If you want your real name credited in a footnote, just mail it to me at !

    @Matt: You flatter me :) My thesis is mostly a bibliographic review, and my arguments are mostly small collections of counter-examples. The only meaningful result I can take credit for is the statistical analysis of kanji component usage, and all I (think I can) show is that they’re a lot less phonographic than hanzi (unsurprisingly, since Japanese has kana to specialize the phonographic role, and systematic multiple readings for kanji).

    Your point that this is how the Japanese conceived kanji is a good one. I think that’s what Chad Hansen tried to argue for (in a sinological context) in his debate with Unger; but, with the rhetoric he used, they talked past each other. Again I cannot help but see more of the disciplinary biases that Lurie pointed out: as a philosopher, it must have been quite natural for Hansen to be more interested in how people conceived of characters, than in what they “really are” in a scientific sense. Then Unger reacted as if what Hansen described as “Chinese thought” were Hansen’s own proposal about what kanji really are.

    In related lines, I’m also interested in argumentum ad sinogramma: when people claim that something is such, because the Chinese character is made from the components such and such (for example, that martial arts are really paths to peace, because 武 is made from 止 “stop” and 戈 “spear”). This is the hanzi equivalent of the etymological fallacy, but culturally speaking it’s so interesting! It’s associated with the worst excesses of Orientalism, but I believe that the Orientalists learned this from the Orientals in the first place, because the latter just love doing it. A Japanese lit teacher of mine, for example, once argued that “greed” is a cold emotion, because, unlike most emotions, 欲望 is written without a “heart” 心 classifier; and therefore, when a character had changed from hate 憎悪 to greed, he had grown, well, “heartless”. This doesn’t even have to follow the actual historical analysis of the character: one tea ceremony teacher claims that tea 茶 < 荼 is the plant 艹 which connects 丨 people 人 to Heaven 天, as clearly represented in the kanji. The problem with this kind of fanciful analysis is that, being easily dismissed, it obscures the fact that many of the components that DeFrancis calls “phonographic” actually work as both phonetic and semantic hints (think of 包 in 抱泡胞飽苞蚫 et cetera). DeFrancis himself recognizes this point in “Chinese Language: Fact and Fantasy”, but then leaves it out of his “Chinese as phonographic” proposal, when he treats all phonetic components as a hidden syllabary. Tōdō Akiyasu, on the other hand, considers this kind of semantic-cum-rebus kanji, which he calls 会意形声, as essential to the writing system, and relates it to ancient Chinese word families.

    Since we have no evidence of what the rationale was of the people who built the characters, and since it’s so easy to come up with arbitrary semantic associations, we can’t know for sure when the components were originally intended as phonological, semantic, or both; but I think there are enough clear cases to establish all three categories. And there are cases, like 東, where a semantic association has clearly superseded an original phonetic one; so again we have to distinguish scientific models from cultural concepts, and diachrony from synchrony, historical fact from cognitive processing of the current system.

  8. When it comes to Hansen and Unger talking past each other, their disciplinary stances are certainly a factor, as you and Lurie point out. But if Hansen really was interested primarily in how the Chinese themselves conceptualized characters rather than how they actually work, as you surmise, then his argument is even weaker. The historical and textual evidence clearly demonstrates that the early Chinese never conceptualized Chinese characters as ideographic. As is clear from Han dynasty lexicographic sources, the Chinese always believed that three features inhered in Chinese characters: sound, form, and meaning. There has never been, so far as I know, any time in history in which a Chinese person has imagined that the character 好 could be used to write shàn ‘good’ or měi ‘fine’ or jiā ‘excellent’ or xíng ‘okay’ or indeed any words other than hǎo ‘good’ and hào ‘to love’. (I use modern Mandarin pronunciations here for convenience.) Take any set of synonyms or near-synonyms from any period in Chinese history (e.g. shuō ‘speak’, tán ‘talk’, jiǎng ‘talk’, tǎolùn ‘discuss’, liáo ‘chat’, etc.) and you will see that each of these distinct lexical manifestations of a single ‘idea’ must be written with different graphs, just as different English words must be. In Shuōwén Jiězì 說文解字, nearly every entry takes care to specify a pronunciation.

    In developing literacy (in Classical Chinese), the ancient Japanese, Koreans, and Vietnamese shared this recognition that characters had fixed meanings and pronunciations. So I have to disagree with Matt’s comment above about an old Japanese conceptualization of Chinese writing as ideographic. In learning to read Classical Chinese, young students were taught to associate with each character an on (sound) and kun (meaning). (This is how modern foreign learners of Chinese memorize characters as well.) It is only because characters were believed to have a fixed on and kun that one could then make use of that on or kun to repurpose them to write a native word. After all, it would be pointless for me to make use of a character’s pronunciation to represent a native word if I thought my potential readers believed those characters were ideographic and had no fixed pronunciation.

    So I think even if one gives Hansen the benefit of the doubt (that he was trying to explain how Chinese people conceptualized the way their writing worked, not trying to explain how a linguist would analyze its actual working), he’s still wrong. He’s actually presenting his faulty Western view of how characters work under the guise of an authentic Chinese view. It’s a double misrepresentation.

    I would argue, completely to the contrary, that the actual use of Chinese characters within the Chinese script has always been somewhat more flexible than the Chinese themselves imagined.

  9. I was reading Matsuo Basho’s haiku…one of them is this:

    日の道や 葵傾く 五月雨

    At first, I thought 五月 = gogatsu, which is half-true, but in this case, it’s poetic and Matsuo Basho lived a long time ago, it should be “satsuki”…

    Later, I found another one:

    五月雨に 鶴の足 短くなれり

    Now, I thought it was “satsuki ni”…it turns out to be “samidare ni”…so confusing…

  10. I would argue that English and Japanese writing systems fall on different points on the continuum between phonemic and morphemic writing. Both systems are fundamentally “morphophonemic.”
    Japanese kanji and their varied pronunciations have always reminded me of the Latin roots of English words. For example, “nat” spells a morpheme that may be pronounced several different ways: innate, nature, natural, nation, national.
    The letter “a” itself may be polyvalent at the phonotactic level (due to native historical sound changes), but “nat” is the plurigraph for a morpheme (originally borrowed from continental scholars in written form).
    Isn’t this parallel to different historical go-on and kan-on readings of kanji, if not their kun and ad-hoc readings?

  11. Interesting post and discussion, to which I know far too little to contribute.

    Does anyone know how readings like もみじ for 紅葉 started and spread? A prescriptive origin or a multi-source natural emergence? That might shed more light on the extent to which people perceived kanji as ideographs.

    @leoboiko, would you call readings like おやしらず for 親不知 pluri-grapho-morphemy, or is it just a matter of a non-linear script? Or do we give up and realize we’re in rebus-kanbun territory? :)

