(This is about modern transcriptions using the Latin alphabet; if you’re looking for historical Old Japanese transcription techniques, might I interest you in this other post?)

Because I’m quoting material from different works in this blog, I can end up citing various transcriptions and romanizations, which can be confusing. In fact I am confused. This post is an attempt to set things straight about how people represent Old Japanese (OJ) words in modern texts.

First of all, I find it’s desirable to keep in mind a distinction between transcription and reconstruction; transcription of text is much more certain and stable than phonetic reconstructions. Old Japanese was written in a number of ways, but for linguistic purposes the most interesting technique is the use of Chinese characters for their phonetic values, or “phonograms”. Contrary to popular opinion, this wasn’t invented in Man’yôshû times as man’yôgana; when Chinese writing came to Japan, phonogram writing was already common in China (for transcription of foreign words etc.) and in Korea (for native words), and the set of characters chosen as phonograms shows that Japanese usage was a continuation of that tradition.

Now the puzzle (in Kuhn’s sense) that has long fascinated OJ scholars is the fact that OJ phonogram writing preserves some distinctions that were lost in later Japanese texts. For example, at the time of the 17th-century priest Keichû, the phonograms 乎 and 於 were both pronounced [wo] (in modern Japanese, they’re both [o]). But he noticed that, in past texts, the accusative particle wo could be written as 乎, but never 於; and precisely the opposite held for the first syllable of the verb “to think” (which for him was womou). In fact there was a group of phonograms interchangeable with 乎, and another with 於, but these groups never mixed; “to think” was always written with a phonogram from the second group, and the direct-object particle with one from the first group.
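The logic of Keichû’s observation is essentially grouping by interchangeability: two phonograms belong together if some word writes the same syllable with either one. A minimal Python sketch of that idea is a union-find over phonograms, assuming the attestations have already been reduced to (word, syllable slot, phonogram) records. The data is a toy: only 乎 and 於 come from the discussion above, while “P1” and “P2” are hypothetical stand-ins for the other members of each group, and real corpus work would also have to cope with scribal errors, which this ignores.

```python
# Toy attestations: (word, syllable slot, phonogram).
# 乎 and 於 come from the post; "P1" and "P2" are hypothetical placeholders.
ATTESTATIONS = [
    ("particle wo", 0, "乎"),
    ("particle wo", 0, "P1"),
    ("to think", 0, "於"),
    ("to think", 0, "P2"),
]


def group_phonograms(attestations):
    """Union-find: phonograms that alternate in the same slot of the same
    word end up in the same class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (word, slot) -> first phonogram seen in that slot
    for word, slot, phonogram in attestations:
        find(phonogram)  # register every phonogram, even if never merged
        key = (word, slot)
        if key in first_seen:
            union(phonogram, first_seen[key])
        else:
            first_seen[key] = phonogram

    classes = {}
    for p in list(parent):
        classes.setdefault(find(p), set()).add(p)
    return sorted(classes.values(), key=sorted)
```

On this toy data the function returns two classes, {乎, P1} and {於, P2}. Adding a single record in which the particle is written with 於 collapses everything into one class, which is why the observation that the groups never mix carries so much weight.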

Early kokugaku scholars like Keichû, Norinaga and Ishizuka thought of these distinctions mostly in terms of orthography: the Ancients had a “proper way” of writing that was now lost. But Ishizuka did start to notice that these distinctions could be explained by phonetics (he discovered there was a “muddy/clear” distinction, i.e. voiced/unvoiced), which paved the way for the modern linguist Hashimoto Shinkichi to attempt the first OJ reconstructions. Hashimoto found that the distinct phonogram sets could be predicted from certain morphological environments: 賣 and 米 were both characters for me, but the first occurred only in imperative verbal forms (meireikei), while the second occurred only in realis/conditional forms (izenkei). Hashimoto called these groups kou 甲 and otsu 乙 (“A” and “B”) types, respectively. There were, though, some merged syllables that didn’t occur in verbal inflections; Hashimoto classified those into A or B using more complex criteria, based on alternative readings of the characters. He didn’t give a name to the remaining majority of syllables, which show no A/B distinction; following Miyake, we can call those the C-type (hei 丙). A- and B-types occur only in syllables with the rhymes i, e and o, in pairs that later merged; syllables with the rhymes a and u have no such distinction (i.e. they’re like C-types). Therefore, the following OJ rhymes are attested:

a (=aC)
i (=iC) iA iB
u (=uC)
e (=eC) eA eB
o (=oC) oA oB
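To make the bookkeeping explicit, here is a small Python sketch that derives the 11 rhyme classes and applies the one morphological diagnostic mentioned above (賣 in imperatives, 米 in realis forms). It rests on a big simplification: the ENV_TO_TYPE table is my hypothetical encoding of that single test, not Hashimoto’s full procedure, so anything it cannot decide (such as the syllables he sorted by other criteria) comes back as None.

```python
VOWELS = ["a", "i", "u", "e", "o"]
SPLIT = {"i", "e", "o"}  # the only rhymes that show an A/B distinction


def rhyme_classes():
    """C-type for every vowel, plus A- and B-types for i, e and o."""
    classes = []
    for v in VOWELS:
        classes.append(v + "C")
        if v in SPLIT:
            classes.extend([v + "A", v + "B"])
    return classes


# Hypothetical encoding of the one diagnostic described in the post:
# confined to imperative forms (meireikei) -> A-type (kou 甲);
# confined to realis/conditional forms (izenkei) -> B-type (otsu 乙).
ENV_TO_TYPE = {"meireikei": "A", "izenkei": "B"}


def classify(environments):
    """Map each phonogram to "A", "B", or None when this test alone cannot decide."""
    result = {}
    for phonogram, envs in environments.items():
        types = {ENV_TO_TYPE[e] for e in envs if e in ENV_TO_TYPE}
        result[phonogram] = types.pop() if len(types) == 1 else None
    return result
```

For example, classify({"賣": {"meireikei"}, "米": {"izenkei"}}) gives {"賣": "A", "米": "B"}, and rhyme_classes() lists the same 11 classes as the list above.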

Based on comparative data from nearby languages, Hashimoto believed these distinctions were not just orthographical but reflected ancient phonological distinctions. OJ had 13 possible consonantal onsets plus the zero onset; multiplied by these 11 rhyme classes, that gives 14 × 11 = 154 possible syllables. But, as usual, only a fraction of those actually occur: for example, iA and iB never appear after t, while iC never appears after p. The full set of syllables Hashimoto found in the texts amounts to 88:

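Rhyme   | ∅ | p   | b   | m   | w  | t   | n   | d   | r   | s   | z   | y   | k   | g
a (=aC) | a | pa  | ba  | ma  | wa | ta  | na  | da  | ra  | sa  | za  | ya  | ka  | ga
iA      | - | piA | biA | miA | -  | -   | -   | -   | -   | -   | -   | -   | kiA | giA
iB      | - | piB | biB | miB | -  | -   | -   | -   | -   | -   | -   | -   | kiB | giB
i (=iC) | i | -   | -   | -   | wi | ti  | ni  | di  | ri  | si  | zi  | -   | -   | -
u (=uC) | u | pu  | bu  | mu  | -  | tu  | nu  | du  | ru  | su  | zu  | yu  | ku  | gu
eA      | - | peA | beA | meA | -  | -   | -   | -   | -   | -   | -   | -   | keA | geA
eB      | - | peB | beB | meB | -  | -   | -   | -   | -   | -   | -   | -   | keB | geB
e (=eC) | e | -   | -   | -   | we | te  | ne  | de  | re  | se  | ze  | ye  | -   | -
oA      | - | -   | -   | moA | -  | toA | noA | doA | roA | soA | zoA | yoA | koA | goA
oB      | - | -   | -   | moB | -  | toB | noB | doB | roB | soB | zoB | yoB | koB | goB
o (=oC) | o | po  | bo  | -   | wo | -   | -   | -   | -   | -   | -   | -   | -   | -

(∅ = zero onset; - = not found in the texts.)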
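The table is easy to treat as data. The following Python sketch is a plain transcription of it (the names ONSETS, INVENTORY and spell are mine), used to check the arithmetic above: 14 onset slots (13 consonants plus the zero onset) times 11 rhyme classes, against the 88 syllables actually found, plus the gaps mentioned for iA/iB after t and iC after p.

```python
# Onset slots as in the table; "" stands for the zero onset (∅).
ONSETS = ["", "p", "b", "m", "w", "t", "n", "d", "r", "s", "z", "y", "k", "g"]

# Onsets found before each rhyme class, transcribed row by row from the table.
INVENTORY = {
    "aC": ["", "p", "b", "m", "w", "t", "n", "d", "r", "s", "z", "y", "k", "g"],
    "iA": ["p", "b", "m", "k", "g"],
    "iB": ["p", "b", "m", "k", "g"],
    "iC": ["", "w", "t", "n", "d", "r", "s", "z"],
    "uC": ["", "p", "b", "m", "t", "n", "d", "r", "s", "z", "y", "k", "g"],
    "eA": ["p", "b", "m", "k", "g"],
    "eB": ["p", "b", "m", "k", "g"],
    "eC": ["", "w", "t", "n", "d", "r", "s", "z", "y"],
    "oA": ["m", "t", "n", "d", "r", "s", "z", "y", "k", "g"],
    "oB": ["m", "t", "n", "d", "r", "s", "z", "y", "k", "g"],
    "oC": ["", "p", "b", "w"],
}


def spell(onset, rhyme):
    """Spell a syllable the way the table does: C-type rhymes get no suffix."""
    vowel, kind = rhyme[0], rhyme[1]
    return onset + vowel + ("" if kind == "C" else kind)


attested = {spell(o, r) for r, onsets in INVENTORY.items() for o in onsets}
possible = len(ONSETS) * len(INVENTORY)  # every onset slot times every rhyme class

# The gaps mentioned in the text.
assert "t" not in INVENTORY["iA"] and "t" not in INVENTORY["iB"]
assert "p" not in INVENTORY["iC"]

print(len(attested), possible)  # 88 154
```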

Our discussion so far depends on four claims by Hashimoto: 1) that there are extra syllabic distinctions in the writing of Old Japanese; 2) that they reflect lost phonological distinctions; 3) further, that these distinctions happen in the rhyme (i.e. in the syllable final, not in the initial consonant); and 4) that the attested phonograms can be sorted into the above 88 syllables. This is what Miyake calls “the consensus view” (p. 50). As far as I know (which is not much), today almost everyone agrees that 1 and 2 must be correct, and probably 3 too; a few scholars have minor disagreements about 4, but the general scheme is accepted (Yoshitake Saburô posited 86 syllables, Lange 82, and Nagata/Mabuchi 91; Tôru, Igarashi, Inukai and Bentley accept 89 by distinguishing poA from poB).

The problem is that scholars have used notations that embody assumptions not only about attested phonogram usage, which is more or less certain, but also about reconstructions, which come and go. For a long time, it was assumed without proof that type C would be phonetically equivalent to type A, reducing the 11 attested classes to 8; what’s more, many scholars assumed that the syllables were all of the CV form, meaning the A/B distinction had to be purely vocalic. These assumptions framed the famous “eight-vowel system” of OJ, and were embodied in a number of notations. While an 8-vowel reconstruction is in principle possible, there is no strong evidence for it, and many recent scholars dispute it. Kiyose has a list of “pro-eight vowels” and “anti-eight vowels” reconstructions; most of the anti-8 camp posit that some of the distinctions were realized as glides ([y] or [w]). C-type rhymes could in principle have an entirely distinct phonetic realization, though Miyake’s analysis shows that the textual evidence makes this unlikely (they employ phonograms that fall into the same classes as A and B; see pp. 263–264); still, one should not simply assume this without argument.
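A quick way to see what the C = A assumption does to the count is to treat it as a mapping on the 11 rhyme classes. This Python sketch (the function name merge and the mapping C_EQUALS_A are mine) collapses each C-type into the corresponding A-type and counts what is left. Note that it counts classes, not vowels: getting from 8 classes to 8 vowels needs the further assumption that every syllable is CV.

```python
RHYME_CLASSES = ["aC", "iC", "iA", "iB", "uC", "eC", "eA", "eB", "oC", "oA", "oB"]


def merge(classes, mapping):
    """Distinct classes left once every class in `mapping` is identified with its target."""
    return sorted({mapping.get(c, c) for c in classes})


# The long-standing, unproven assumption: C-type sounds the same as A-type.
C_EQUALS_A = {"iC": "iA", "eC": "eA", "oC": "oA"}

merged = merge(RHYME_CLASSES, C_EQUALS_A)
print(len(merged), merged)  # 8 ['aC', 'eA', 'eB', 'iA', 'iB', 'oA', 'oB', 'uC']
```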

With these caveats in mind, I can finally try to sort out the various OJ romanizations I’ve stumbled upon:

Superscript (Latin) | Superscript (Japanese) | Yale (Martin) | Japanese-style | Miller; Ohno | Modified Mathias-Miller | Frellesvig & Whitman
a, aC | a, a丙 | a | a | a | a | a
iA | i甲 | yi | i | i | î | i
iB | i乙 | iy | ï | ï | ï | wi
i, iC | i, i丙 | i | i | i | i | i
u, uC | u, u丙 | u | u | u | u | u
eA | e甲 | ye | e | e | ê | ye
eB | e乙 | ey | ɜ | ë | ë | e
e, eC | e, e丙 | e | e | e | e | e
oA | o甲 | wo | o | o | ô | wo
oB | o乙 | o | ö | ö | ö | o
o, oC | o, o丙 | o | o | o | o | o

Miller (The Japanese Language, p. 180) presents his notation as a minor revision of the Japanese-style one, changing ɜ to ë; both assume C-type = A-type. The umlaut originally denoted vowel centrality, but central/peripheral reconstructions are more or less discredited by now (I think?). Miyake prefers Yale romanization for its neutrality, but personally I dislike it for being misleading: one has to keep in mind at all times that ey, ye, iy, yi, wo and o don’t indicate a phonetic realization like [ey], [ye], etc., but are purely abstract, algebraic symbols. I also find it typographically hideous: what’s with the underlined o and all the extra ys? Modified Mathias-Miller, which adds a circumflex to the A-type, seems a more reasonable neutral transcription (as long as you know that the circumflex doesn’t denote a long vowel). Frellesvig & Whitman’s notation is listed for convenience, but it’s not a pure transcription; by my criteria, it steps into reconstruction territory (the authors call it a “phonemic interpretation”). F&W propose iC=iA, eC=eB and oC=oB; their “y” and “w” represent actual glides in pronunciation, not just abstract notation as in Yale.
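Because every scheme above is just a different spelling of the same 11 rhyme classes, the table can be turned into a small converter. This Python sketch transcribes four of the columns (Yale is left out, since its underlined o can’t be represented here) and assumes each scheme writes a syllable as onset plus rhyme spelling, which is my simplification. The helper names romanize, collapsed and differences are hypothetical: collapsed() lists the classes a scheme spells identically, and differences() compares two schemes.

```python
# Rhyme-class spellings transcribed from the comparison table above.
# Yale is omitted because its underlined o does not survive in plain text.
SCHEMES = {
    "japanese_style": {
        "aC": "a", "iA": "i", "iB": "ï", "iC": "i", "uC": "u",
        "eA": "e", "eB": "ɜ", "eC": "e", "oA": "o", "oB": "ö", "oC": "o",
    },
    "miller_ohno": {
        "aC": "a", "iA": "i", "iB": "ï", "iC": "i", "uC": "u",
        "eA": "e", "eB": "ë", "eC": "e", "oA": "o", "oB": "ö", "oC": "o",
    },
    "modified_mathias_miller": {
        "aC": "a", "iA": "î", "iB": "ï", "iC": "i", "uC": "u",
        "eA": "ê", "eB": "ë", "eC": "e", "oA": "ô", "oB": "ö", "oC": "o",
    },
    "frellesvig_whitman": {
        "aC": "a", "iA": "i", "iB": "wi", "iC": "i", "uC": "u",
        "eA": "ye", "eB": "e", "eC": "e", "oA": "wo", "oB": "o", "oC": "o",
    },
}


def romanize(onset, rhyme, scheme):
    """Spell one syllable, assuming the scheme simply writes onset + rhyme spelling."""
    return onset + SCHEMES[scheme][rhyme]


def collapsed(scheme):
    """Spellings shared by several rhyme classes: distinctions the scheme does not show."""
    by_spelling = {}
    for rhyme, spelling in SCHEMES[scheme].items():
        by_spelling.setdefault(spelling, []).append(rhyme)
    return {s: rs for s, rs in by_spelling.items() if len(rs) > 1}


def differences(a, b):
    """Rhyme classes that schemes a and b spell differently."""
    return [r for r in SCHEMES[a] if SCHEMES[a][r] != SCHEMES[b][r]]


# The 賣/米 pair (meA vs. meB) in each scheme.
for name in SCHEMES:
    print(name, romanize("m", "eA", name), romanize("m", "eB", name))
```

On this data, collapsed("modified_mathias_miller") is empty, which is what makes it usable as a neutral transcription; collapsed("japanese_style") pairs each C-type with its A-type, and collapsed("frellesvig_whitman") shows exactly the iC=iA, eC=eB and oC=oB mergers. differences("japanese_style", "miller_ohno") returns ["eB"], the ɜ/ë change.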


  • Miyake, Marc Hideo. Old Japanese: A Phonetic Reconstruction.
  • Miller, Roy Andrew. The Japanese Language.
  • Bentley, John R. The Origin of Man’yôgana.
  • Kiyose, Gisaburô. Japanese Linguistics and Altaic Linguistics. Tokyo: Meiji Shoin. (apud Miyake)
  • Frellesvig, Bjarke, and John Whitman. The Oxford Corpus of Old Japanese.

