Kōji Kawamoto’s theory of Japanese poetical metre (written for sweetheart)

Dear sweetheart,

This is a summary of Kōji Kawamoto’s theory of Japanese poetic metre. I am writing it from memory (quite distant memory, in fact), so it probably contains errors or half-remembered things. As such, it’s not really a reliable source for your thesis. But you’ve seen just how hard The Poetics of Japanese Verse is to get hold of these days: it’s not on the usual online sources, it was nowhere in the libraries of Dublin or Bochum, & in bookstores it has reached the triple digits, in the manner of academic books under late-stage capitalism. By now you can finally access a copy, like some fabled treasure, at the same place I originally found it: in the charming little library of the Japan Foundation, São Paulo chapter. But I fear that, at this point, it might be too late. Kawamoto is dense reading; I had to make two or three separate attempts, each one exhaustingly intensive, until I felt I had digested the gist of it. And of course, at this point you can’t afford to muck around in the tarpits of one-more-source. That’s why I’m writing this summary, however lacking my memory may be. You can’t, of course, cite something like this; but maybe you can use it to guide a brief look through Kawamoto’s book, pick out some choice quotations, and use this theory in an academically responsible way.

(I will drag on a bit before getting to the main point; well, you know me. Skip to near the bottom for the meat of the thing, i.e. how to find rhythms in Japanese poetry.)


Jōyō kanji variants: The curious case of 叱 and [censored]

I’m working on a reliable, machine-readable edition of the Jōyō kanji data, and this came up. Can you spot the difference between 𠮟 and 叱? Me neither. Let’s look at the reference image:

Comparison between 𠮟 and 叱 (Jōyō Kanji-hyō reference image)

…Welp. The left one is a left-to-right stroke stopping at the end, in the model of 七 “seven”; the right one is right-to-left, sweeping at the end, as in 匕 “spoon / sitting person”. But, still. These government people are very thorough, to list these minor variant glyphs of the same character.

Except these are supposed to be different characters altogether.
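For a machine-readable edition, the first practical question is how the two glyphs are encoded. The minimal Python sketch below uses only the standard `unicodedata` module; the constant names and the `describe` helper are hypothetical, my own for illustration. It shows that Unicode gives the two glyphs separate code points and that 𠮟 sits outside the Basic Multilingual Plane. That is a plausible reason (a guess, not something the reference data states) why it tends to go missing in fonts, databases and web pages.

```python
import unicodedata

# Written as escapes so the file survives fonts and editors that
# cannot display the supplementary-plane character.
SHITSU_SEVEN = "\U00020B9F"  # 𠮟: 口 plus 七, stroke stops (left glyph in the image)
SHITSU_SPOON = "\u53F1"      # 叱: 口 plus 匕, stroke sweeps (right glyph in the image)


def describe(ch: str) -> dict:
    """Return basic Unicode and encoding facts about a single character."""
    cp = ord(ch)
    return {
        "char": ch,
        "codepoint": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "in_bmp": cp <= 0xFFFF,  # Basic Multilingual Plane?
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_units": len(ch.encode("utf-16-le")) // 2,  # 2 means a surrogate pair
    }


for ch in (SHITSU_SEVEN, SHITSU_SPOON):
    print(describe(ch))
```

The UTF-16 count matters in practice: any tool that treats strings as sequences of 16-bit units will see 𠮟 as two "characters".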


« Letras, fortnightly »: an email newsletter

I’m still on hiatus (too busy for proper blogging) but I wanted to keep in touch with the world somehow. At the same time, I’ve grown ever more dissatisfied with commercial walled gardens – Facebook, Twitter, Tumblr etc.; but a casual blog is a bit too much of “throwing petals in the Grand Canyon” (no one even uses RSS readers anymore, and republishing stuff on walled-garden feeds causes all sorts of headaches). Considering this, and taking inspiration from the pleasure I derive from Warren Ellis’ Orbital Operations, I’ve decided to set up a bi-weekly email newsletter: Letras, fortnightly.

The content is about linguistics & literature in general, not just Japanese studies (for which this blog remains the output). However, due to my interests, Japanese stuff still creeps in. I’m setting up archives, so you can browse the first issue and see what I’m aiming for.

Stay tuned for future developments on this blog.


Here’s one Japanese grammatical form that I rarely find discussed. The following scene is from Ueshiba Reach’s Discommunication, v.2:


Reading right to left:

“What are you doing?!”
“Each and every one of these drawers has a precious treasure of mine! Don’t just go about messing with them!”


“Ah! Now’s not the time for this!!” “Busy, busy!”
Akechao… (Let’s just open it…)

A while later:


“Now how about this other drawer? Akechao”

This is the abbreviated inflection of -chau, itself the abbreviation of -te shimau. If you listen to the spoken language at all, or if you ever read any manga, you already know -chau. I recall how much trouble this form gave me when I first started learning Japanese; it wasn’t found in my textbooks, nor in grammar books, nor in dictionaries. (Similarly, a narrator in Helen DeWitt’s wonderful The Last Samurai rejoices at finding a then-unusual little book on colloquial Japanese abbreviations, which finally solves the problems that had plagued him since forever.) Those were the dark ages of Japanese education, kids. They taught us starting from -masu forms, they told us to use oru and de gozaimasu, they told us cursive kanji were “necessary for daily life in modern Japan”, and they thought it improper to teach “wrong” Japanese. Luckily, these days people have learned the value of teaching a language as she is spoke, and today no one would have any trouble finding -chau in a dictionary or textbook.

Of course, if I had known how to search adult grammars, I’d have found it even back in the last century. Martin’s godlike Reference Grammar (1975) thoroughly details all possible combinations of -te shimau with -wa, -nai, causatives/passives, etc., all possible contractions including dialectal differences, and illustrates no less than five meanings:

  1. Finishing an action (perfective aspect): Tsui ni Taiyō ga shizunde shimatta “Finally the sun finished sinking”.
  2. Doing something all the way through, completely (completive aspect): O-Kane wo otoshite shimatta “I lost all the money”.
  3. Ends up doing, gets around to doing: Tabe-sugite o-Naka wo kowashite shimatta “Being such a glutton, I ended up with a ruined stomach”.
  4. Just a strong or emphatic past: Itchatta “They’re gone!”
  5. As a mood indicator, it marks annoyance, displeasure at how things ended up, frustration of expectations: Nan de mo nonjau “He’ll drink any damn thing!”.

For didactic purposes we could perhaps classify those, grosso modo, in two main strands of meaning: to finish doing (completive aspect) and to end up doing despite one’s will (non-volitional mood). As the English translations (“finish”, “end up”) suggest, both have to do with the meaning of shimau as an independent verb: “to finish, to store away”.

This is why I like -chao (-te shimaō) so much. The -ō ending denotes volition, that is, something she wants to do. -chau denotes ending up doing something despite oneself, that is, involuntarily (kowashichatta! “I accidentally broke it!”). The combined volition-nonvolition effect resembles “I baked you a cookie, but I ate it”: “let’s end up opening the drawer!”; “let’s put ourselves in the state of ‘whoops, I’ve opened it!’”; “let’s just open it [and not worry about the consequences]!”
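The contraction itself is mechanical enough to sketch as a string rewrite. Below is a naive Python version. It assumes lowercase Hepburn romanization as input (real text would be in kana), and `contract` is a hypothetical helper of my own, not from any library. It maps -te shima- to -cha- and -de shima- to -ja-. An optional `casual` flag shortens the volitional -chaō to the -chao seen in the manga. It ignores the dialectal contractions and the other combinations Martin lists.

```python
import re

# Naive sketch on romanized (Hepburn) Japanese; hypothetical helper.
# Only the standard contractions are covered:
#   -te shima- -> -cha-   (kowashite shimatta -> kowashichatta)
#   -de shima- -> -ja-    (nonde shimau -> nonjau)
# The lookbehind requires t/d to be glued to a preceding letter, so a
# free-standing particle "de" followed by a word starting "shima" is left alone.
_TE_SHIMAU = re.compile(r"(?<=[^\W\d_])([td])e shima")


def contract(phrase: str, casual: bool = False) -> str:
    """Contract -te/-de shimau forms to -chau/-jau in romanized Japanese."""
    out = _TE_SHIMAU.sub(lambda m: "cha" if m.group(1) == "t" else "ja", phrase)
    if casual:
        # Colloquial shortening of the volitional: -chaō -> -chao, -jaō -> -jao
        out = out.replace("chaō", "chao").replace("jaō", "jao")
    return out


for p in ("kowashite shimatta", "nan de mo nonde shimau", "akete shimaō"):
    print(p, "->", contract(p), "/", contract(p, casual=True))
```

Because the rule only rewrites the shared shima- stem, the same substitution covers -shimau, -shimatta and -shimaō without listing each inflection separately.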


Acceptability judgments in Japanese

The “armchair debate” in linguistics is about the extent to which we can trust a researcher’s own intuition when they deem a sentence acceptable or not. Linzen & Oseki (2015) make the point that what works for English might not work for lesser-studied languages:

The vast majority of published English judgments can be replicated with naive participants (Sprouse & Almeida, 2012; Sprouse et al., 2013). We argued that this is due to two reasons. First, a large proportion of the acceptability judgments illustrate obvious and uncontroversial contrasts (Type I/II judgments). Second, more subtle contrasts (Type III judgments) are informally vetted by a large community of linguists who are native English speakers. While not foolproof, this informal peer review process weeds out most questionable judgments (Phillips, 2010).

To examine the efficacy of the peer review process in languages other than English, we selected acceptability judgments in Hebrew and Japanese that we deemed to be questionable. A half (in Hebrew) or a third (in Japanese) of the Type III judgments failed to replicate, while all Type II judgments were robustly replicated. These results suggest that (1) formal acceptability rating experiments are not necessary for each and every judgment, (2) linguists can effectively identify questionable contrasts, and (3) informal peer review mechanisms are less effective for languages spoken by a smaller number of linguists.

For illustration, an “uncontroversial” contrast would be like:






‘It rained this morning.’





‘It thundered last night.’

Whereas a “questionable” one would be:








‘Mary’s criticism that she did not hear’







‘Mary’s criticism that she did not hear’

Linzen & Oseki’s data are test sentences from real studies. The latter pair is from Sakai’s Complex NP Constraint and case conversion in Japanese (1994), and the group study rated it in the opposite direction from Sakai’s intuition.

Translation and professional pretense

[…] One of the greatest offerings that such programs provide students is a sense of what it means to be a professional. Unfortunately, this is not always taught in class, and has to be picked up by osmosis—by paying attention to how the teachers talk about the profession, how they present themselves as professionals. Some programs offer internships that smooth the transition into the profession. Even then, however, the individual translator-novice has to make the transition in his or her own head, own speech, own life. Even with guidance from teachers and/or working professionals in the field, at some point the student/intern must begin to present himself or herself as a professional—and that always involves a certain amount of pretense:

“Can you modem it to our BBS by Friday?”
“Yes, sure, no problem. Maybe even by Thursday.”

You’ve never used a modem before, you don’t know what BBS stands for or how one works, but you’ve got until Friday to find out. Today, Tuesday, you don’t say “I don’t have a modem” or “What’s a BBS?” You promise to modem the translation to their BBS, and immediately rush out to find someone to teach you how to do it.

“What’s your rate?”
“It depends on the difficulty of the text. Could you fax it to me first, so I can look it over? I’ll call you right back.”

It’s your first real job and you suddenly realize you have no idea how much people charge for this work. You’ve got a half hour or so before the agency or client begins growing impatient, waiting for your phone call; you wait for the fax to arrive and then get on the phone and call a translator you know to ask about rates. When you call back, you sound professional.

[…] So you pretend to be an experienced translator. To put it somewhat simplistically, you become a translator by pretending to be one. As we saw Paul Kussmaul (1995:33) noting in Chapter 7, “Expert behaviour is acquired role playing.” It should be obvious that the more knowledge you have about how the profession works, the easier it will be to pretend successfully […] note, however, that the need to “pretend” never really goes away.

(Douglas Robinson, Becoming a translator: an accelerated course.)

I’d argue further that this kind of presentational role-playing (which is necessary for any profession and even for other social roles) cannot be explicitly taught in class; osmosis is its natural path of acquisition. For an in-depth, exceedingly interesting and criminally overlooked discussion, see Goffman, The Presentation of Self in Everyday Life.

For delightful tales of bravado and sheer cold-bloodedness in the context of interpreting, see Kató Lomb, Polyglot: How I learn languages:

I was hired to interpret into Japanese for the first time in my life. The Hungarian hosts and I were waiting at the Ferihegy airport. Our leader was a widely popular, old politician known for his flowery style, but my knowledge of Japanese didn’t permit me to say much more than “Japanese is good, Hungarian is good, long live!” However, the first sentence I was supposed to translate into Japanese (and with which I was supposed to launch my career) went like this: “The black army of weed-scatterers will in vain try to obscure the unclouded sky of the friendship between Japanese and Hungarian peoples!”

Xu Shen the historical reconstructionist

Timothy O’Neill:

This article puts forward a new interpretation of the lexicographic method of the Shuowen Jiezi 說文解字 by rereading the original text and traditional commentaries through the lens of authorial intention. Within the paradigm of traditional Chinese hermeneutics, intentionality serves as the linchpin of philological methodology. The central argument of the article is that the lexicographic macrostructure and microstructures of the Shuowen are designed to prove that the changes in the writing systems are historically and graphemically observable, and consequently that the original intentions of the sages who used guwen [古文 “ancient writing”; here, the original Chinese script as designed by four-eyed Cangjie] to write the classics are literally recoverable by working backwards through the reforms and changes in writing to a proper understanding of how they classified and used their words in the guwen writing system. An annotated translation of the “Shuowen Postface” in light of this new interpretation concludes the discussion.

A quote:

Because they began with the inherently flawed assumption that writing has never changed, Xu argues that jinwen scholars [who thought guwen to be inauthentic and preferred to work with Qin-era Small Seal script] were therefore blindly working with what they wrongly perceived to be the genuine intentions of the sages as encoded graphemically in the writing system, that is, interpretations of words based on the structure of the characters that write them. It is almost a slap in the face of the jinwen scholars that Xu Shen presents them with the historicist argument that they have been basing at least a select portion of their exegetical and governmental policy work on the drastically harmful if not immoral alterations specifically made by Qin officials to the writing system. Hence a portion of the intentions the jinwen scholars were finding in their graphemic analysis of the classics (shuozi jiejing 說字解經 “explaining characters in order to explicate the classics”) were the intentions not of sages, but of vile Qin criminals, one of whom was a eunuch regicide, no less.⁶⁷


67. This parallelism in the Postface to the title of the work provides a strong argument for understanding the original title of the Shuowen Jiezi to mean something like “analyzing the three distinct writing systems [i.e. Cangjie’s original, Zhou Great Seal and Qin Small Seal] in order to explicate their offspring characters”.

An interesting point for me is how much importance they gave to the now-discredited practice of explaining words, and their etymology (in the older sense of the term), in terms of character analysis. This was considered to be the route to proper hermeneutics of the authorial intention of the sages, and therefore a basis for policy and law. Xu Shen specifically claims that his 540-classifier system had been deliberately set up by Cangjie and consciously employed by Confucius and Zuo Qiuming in their classics (and therefore that it’s needed to understand the sages properly). The Shuowen lists 9353 characters with 1163 “repeats” (chongwen 重文), which are actually entries tracking changes in graphical structure. Under this paradigm, such changes mean corruption, not only in structure but also in sound and meaning; this implies that 12.43% of the (then) modern small-seal graphs were “wrong”.
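The 12.43% figure appears to be simply the ratio of “repeats” to head entries, assuming each repeat is counted as one altered graph:

```latex
\frac{1163}{9353} \approx 0.1243 = 12.43\%
```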

And here’s a piece of the translation of Xu Shen’s scholarly trashing, just for fun:

They consider Qin lishu to be the writing of the time of Cangjie, saying that from father to son it was transmitted one to the other: how could it receive revisions and changes? Then they rashly say the head of ma and ren together makes chang 長, that ren holding shi makes dou 斗, and that as for hui 虫, it is a bent zhong 中.

Multiplicity in writing systems: terminological troubles

Consider the alphabetic principle: each letter represents one sound—or, to get technical, each grapheme (a symbol of writing) represents one phoneme (a linguistic sound).

/tɪn/: /t/ /ɪ/ /n/
‹tin›: ‹t› ‹i› ‹n›

So there’s a nice 1-to-1 mapping. But sometimes the mapping can be one-to-many, both in the direction of writing (1 sound is represented by N letters)…

/θɪŋ/: /θ/ /ɪ/ /ŋ/
‹thing›: ‹th› ‹i› ‹ng›

…and in the direction of reading (1 letter represents N sounds):

/ɛksɪt/: /ɛ/ /ks/ /ɪ/ /t/
‹exit›: ‹e› ‹x› ‹i› ‹t›

Cases like ‹th›→/θ/ or ‹ng›→/ŋ/ are called digraphs. This is a widespread term, so I’d like to use it in my thesis. We can generalize, when N>2, to plurigraphs.

And cases like ‹x›→/ks/ (where the rules of the system would lead us to expect a grapheme to represent one sound, but this particular one represents two) could, by the same token, be called pluriphones. (Continue reading for why I decided against “polygraphs” and “polyphones”.)
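The terms can be made concrete with a toy classifier. This Python sketch assumes each word has already been aligned by hand into (grapheme, phonemes) units, exactly as in the examples above. The `classify` function and the labels `one-to-one` and `many-to-many` are hypothetical additions, not part of the proposed terminology. It counts symbols mechanically, so it cannot tell whether the rules of the system would lead us to expect one sound; that judgement still has to go into the alignment.

```python
from typing import List, Tuple

# One alignment unit: the letters of a grapheme and the phonemes it spells.
Unit = Tuple[str, Tuple[str, ...]]


def classify(unit: Unit) -> str:
    """Label a hand-aligned grapheme/phoneme unit by its letter:sound ratio."""
    letters, phonemes = unit
    if not letters or not phonemes:
        raise ValueError("silent letters and empty units are out of scope here")
    if len(letters) == 1 and len(phonemes) == 1:
        return "one-to-one"
    if len(phonemes) == 1:
        # N letters for 1 sound: digraph when N == 2, plurigraph when N > 2
        return "digraph" if len(letters) == 2 else "plurigraph"
    if len(letters) == 1:
        return "pluriphone"  # 1 letter for N sounds
    return "many-to-many"


TIN: List[Unit] = [("t", ("t",)), ("i", ("ɪ",)), ("n", ("n",))]
THING: List[Unit] = [("th", ("θ",)), ("i", ("ɪ",)), ("ng", ("ŋ",))]
EXIT: List[Unit] = [("e", ("ɛ",)), ("x", ("k", "s")), ("i", ("ɪ",)), ("t", ("t",))]

for word in (TIN, THING, EXIT):
    spelling = "".join(g for g, _ in word)
    print(spelling, [(g, classify((g, p))) for g, p in word])
```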

Annals of the Great Hanzi Debate: Handel’s response to Unger

Here’s one more contribution on the DeFrancis/Ungerian proposal that all writing systems are fundamentally phonographic: Zev Handel, Logography and the classification of writing systems: a response to Unger (2015).

I think we can all agree that the DeFrancis research programme has successfully proved that hànzì and kanji are not ideographic, that they represent language, and that they (also) encode phonological information and are decoded (also) into sounds. The remaining question is largely a matter of emphasis: do we think that this phonographical component is so important and fundamental that hànzì, or even kanji, should be understood as, ultimately, a kind of phonography (and a poor one at that)? Or, is it productive to consider its non-phonographical components important enough so as to classify them as “another kind” of writing (and perhaps not so poor at all)?

In that paper, Handel reviews psycho- and neurolinguistic studies, and argues for the latter position.

Greg Pingle on Mongolian Sinitic; and John Phan on Sino-Vietnamese

Greg Pingle has a cool Quora answer to the question: what if Ainu had borrowed from Chinese? He compares it to Mongolian Sinitic loans, which are unlike Sino-Xenic in being a) whole-word, not morpheme-based, and b) orally transmitted.

In the discussion thread for the old post on Sino-Xenic, commenter 番 brought John Phan’s work on Sino-Vietnamese to my attention. Phan has kindly uploaded a preview of his dissertation, Lacquered Words, where he argues, contra Miyake, that

unlike Sino-Korean or Sino-Japanese, Late Sino-Vietic resulted from a bilingualism in Sinitic and Vietic languages that flourished in the area of northern Vietnam throughout the Tang dynasty.

[…] Contrary to current analyses of Sino-Vietic lexica (which assume reading-based transfusions similar to the origins of Sino-Korean or Sino-Japanese), I claim that the bulk of Sinitic loanwords in Vietnamese resulted from bilingual contact, between a form of Sinitic native to the region of modern day northern Vietnam and contemporary forms of Vietic language. For reasons discussed below, I have termed this variety of Sinitic “Annamese Middle Chinese” (AMC). Unlike in the Korean peninsula or the Japanese archipelago, I claim that the river plains of northern Vietnam were home to a rooted and thriving community of AMC speakers for most of the first millennium, and it is the presence of this community and the bilingual effects of their coexistence with Vietic speakers that fundamentally defines the nature of Sino-Vietic contact throughout history.

[…] However, when AMC obsolesced as a spoken language in the region, it left a form of Literary Sinitic behind which entered into a hyperglossic relationship with the new dominant form of speech, i.e. pVM. This hyperglossic relationship was in turn analogous to contemporary hyperglossic arrangements in Korea and Japan of the 2nd millennium.

In this way, even though Vietnamese is not a Sinitic language, ancient Vietnamese speakers were bilingual in some form of spoken Middle Chinese. This makes the Vietnamese case an interesting kind of hybrid:

  • Like Mongolian, there was oral transmission and bilingualism;
  • But, like Japanese and Korean, there was diglossia (or “hyperglossia”) with Literary Sinitic (wényán/kanbun), including a “system of sinographic reading” based on the Qieyun rime tables.

I eagerly await the full dissertation.

Writing kanji in the air is even better practice than writing on paper

Margaret Thomas, Air Writing as a Technique for the Acquisition of Sino-Japanese Characters by Second Language Learners.

Summary: When studying a kanji, native Japanese speakers often trace its strokes with their fingers in the air, or on the palm or thigh, while keeping their eyes fixed on the source model. (They do it when recalling, too, often closing their eyes or averting their gaze.) This is called “air writing” (kūsho 空書 or karagaki 空書き). Thomas experimented with 75 non-native learners, with 22 different native languages, and found that air writing helped retention significantly more (p < 0.01) than pen writing or visual memorization, though the effect size was modest, and only noticeable when memorizing harder kanji (some 15.43% more hits for the hardest kanji set).

Interestingly, six participants who were told not to use kūsho still did it spontaneously during recall tasks, either with their hands or by mimicking kūsho patterns with subtle head or torso movements.

Thomas tested only kanji recall, not recognition (which is likely the most important task in the modern age). However, she does mention a couple of studies suggesting that native speakers can recognize kanji more easily when allowed to air-write (Matsuo et al., Dissociation of writing processes: functional magnetic resonance imaging during writing of Japanese ideographic characters, 2000; and Matsuo et al., Finger movements lighten neural loads in the recognition of ideographic characters, 2003).

I think it’s reasonable to suppose that, for non-native learners, too, air-writing helps with both recall & recognition. This is good news because you can practice anywhere with your own body.

A 2015 count of Japanese word frequency

*Slowly emerges from underground cave*

I’ve been doing computer things to large samples of Japanese text. To be more specific, I’ve been feeding the full contents of the Japanese Wikipedia to MeCab, R, Python, and several small shell scripts.

It occurred to me that, while these things are at hand, it would be simple to make a new count of frequent Japanese words. So I did. You can see what it is like at this Wiktionary page. Full TSV tables are available for download: the count of lemmas (uninflected words), and of inflected word forms.
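The counting step itself is simple. Here is a minimal Python sketch of it; it is not the pipeline used for the tables above, which also went through R and shell scripts. It assumes MeCab's default text output with the IPADIC feature layout, where the seventh comma-separated field is the lemma (other dictionaries, such as UniDic, use a different layout). It tallies lemmas and inflected surface forms and renders each tally as TSV. The sample input is hand-written in that format purely for illustration, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Tuple

# Hand-written sample in the shape of MeCab's default output with IPADIC:
# surface<TAB>POS,sub1,sub2,sub3,conj-type,conj-form,lemma,reading,pronunciation
SAMPLE = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
    "食べる\t動詞,自立,*,*,一段,基本形,食べる,タベル,タベル\n"
    "EOS\n"
)


def count_mecab(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count lemmas and inflected surface forms in MeCab text output.

    Assumes the IPADIC layout, where field index 6 is the lemma; when it is
    missing or '*' (e.g. unknown words), the surface form is used instead.
    """
    lemmas: Counter = Counter()
    forms: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line == "EOS":
            continue  # sentence boundary or blank line
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        lemma = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        lemmas[lemma] += 1
        forms[surface] += 1
    return lemmas, forms


def to_tsv(counter: Counter) -> str:
    """Render a Counter as 'item<TAB>count' rows, most frequent first."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for item, n in counter.most_common():
        writer.writerow([item, n])
    return buf.getvalue()


lemma_counts, form_counts = count_mecab(SAMPLE.splitlines())
print(to_tsv(lemma_counts))
```

Keeping two separate counters is what yields the two tables: 食べ and 食べる are distinct word forms but fold into a single lemma.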

New stuff about kanji is forthcoming.

*Slowly submerges to cave*