The Nanbanjin Nikki

ザ南蛮人日記

Announcing myougiden, a command-line Japanese/English dictionary

Where have I been, you ask? I’ve been missing for the last two weeks! I didn’t write anything, didn’t talk to anyone, and was nowhere to be seen!

As it happens, the Muse of Programming possessed me forcefully, and after a few intense days in the grip of the mood, I ended up with this:

[myougiden screenshot]

myougiden is a new JMdict-based dictionary for the command line. If you’re on a POSIX-style system (OS X should work too, I think, probably, perhaps) and would like to try it out, see the README. Here’s a copy of the current feature list, for hype:

It’s still rough and barely tested, so it might not work on your system (I’ve only tested it on Debian GNU/Linux wheezy with Python 3.2). Please report any bugs!

Now I should probably go back to my thesis…

Comments

I don’t know what to say! The software looks great, but… you used word processor romanization in its name!

By Matt on .

What, you expected Myōgiden? But you can’t use non-ASCII characters in Linux command names! I mean, you can, but… you can’t. It’s just Not Done. It would be unnatural.

Besides, according to the command-not-found database, no command in all of Debian’s hoards currently starts with myo-, which means myougiden can be invoked with myo[TAB].
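The command-not-found database is Debian-specific. As a rough local stand-in, a sketch like the following lists executables on PATH whose names share a prefix. It only sees what is installed on this machine, counts any file with an execute bit (it does not check whether the current user may run it), and ignores aliases, shell functions, and builtins, which bash also offers on TAB. The helper commands_with_prefix is hypothetical, not part of myougiden or command-not-found.

```python
import os
import stat


def commands_with_prefix(prefix, path=None):
    """List executable files on PATH whose names start with prefix.

    A rough local stand-in for a command-not-found lookup: it sees only
    what is installed here and ignores aliases, functions, and builtins.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = set()
    for directory in path.split(os.pathsep):
        if not directory:  # empty entries (meaning ".") are skipped
            continue
        try:
            names = os.listdir(directory)
        except OSError:  # missing or unreadable PATH entry
            continue
        for name in names:
            if not name.startswith(prefix):
                continue
            try:
                mode = os.stat(os.path.join(directory, name)).st_mode
            except OSError:  # e.g. a dangling symlink
                continue
            # Regular file with any execute bit set (not a per-user check).
            if stat.S_ISREG(mode) and mode & 0o111:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print(commands_with_prefix("myo"))
```

If commands_with_prefix("myo") returns just ["myougiden"], then myo[TAB] completes unambiguously, at least as far as executables on PATH go.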

By leoboiko on .

What, you expected Myōgiden?

Well, ideally…

I just think that degrading to “no indication of long vowel” is superior to degrading to word processor style. I don’t know what the official Kunrei-shiki standard says, but at least in the Hepburn world that’s the done thing (e.g. passports, train station names).
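To make the two fallbacks concrete, here is a minimal sketch, assuming romanized input whose long vowels carry a macron or a circumflex. The word-processor branch is a simplification: a marked ō may come from おう (ou) or おお (oo), and ē from えい or ええ, which the mark alone cannot tell apart, so this always picks ou and ei. The function names degrade_plain and degrade_wapuro are hypothetical.

```python
import unicodedata

# Combining marks used for long vowels: macron (Hepburn) and circumflex
# (common in Kunrei-shiki and Nihon-shiki).
LONG_MARKS = {"\u0304", "\u0302"}

# Word-processor (IME input) spellings. A simplification: ō can come from
# おう ("ou") or おお ("oo"), ē from えい or ええ; the mark can't tell.
WAPURO = {"a": "aa", "i": "ii", "u": "uu", "e": "ei", "o": "ou"}


def degrade_plain(text):
    """Drop the long-vowel mark entirely: Myōgiden -> Myogiden."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in LONG_MARKS)
    return unicodedata.normalize("NFC", kept)


def degrade_wapuro(text):
    """Spell long vowels as typed into an IME: Myōgiden -> Myougiden."""
    out = []
    # After NFD, "ō" becomes "o" followed by a combining macron.
    for c in unicodedata.normalize("NFD", text):
        if c in LONG_MARKS and out and out[-1].lower() in WAPURO:
            # Replace the vowel just emitted with its doubled spelling.
            vowel = out.pop()
            spelled = WAPURO[vowel.lower()]
            out.append(spelled.capitalize() if vowel.isupper() else spelled)
        else:
            out.append(c)
    return unicodedata.normalize("NFC", "".join(out))


for word in ["Myōgiden", "Tōkyō", "Kyôto"]:
    print(word, degrade_plain(word), degrade_wapuro(word))
```

The plain fallback is lossy but never wrong about spelling; the word-processor fallback keeps the length information at the cost of guessing which kana produced it.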

By Matt on .

Losing phonemic information hurts my computolinguistic sensibilities (even non-phonemic information: I’m very bothered when I have to write, e.g., yokozuna or kanazukai and can’t distinguish underlying /du/ from /zu/. And speaking of that, it should be いなづま, not いなずま. It’s the “wife of the rice”!)
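The lost distinction can be shown with a toy transliterator. Hepburn writes both ず and づ as zu and both じ and ぢ as ji (Kunrei-shiki likewise merges them, as zu and zi), while Nihon-shiki keeps du and di apart. This minimal sketch uses a deliberately tiny, hypothetical syllable table covering only the kana in the examples; a real converter would need the full syllabary, digraphs such as きょ, っ, and long vowels.

```python
# Tiny syllable table: kana -> (Hepburn, Nihon-shiki). Covers only the
# kana used in the examples below; not a complete romanization table.
TABLE = {
    "い": ("i", "i"),
    "か": ("ka", "ka"),
    "こ": ("ko", "ko"),
    "ま": ("ma", "ma"),
    "な": ("na", "na"),
    "よ": ("yo", "yo"),
    "じ": ("ji", "zi"),
    "ぢ": ("ji", "di"),
    "ず": ("zu", "zu"),
    "づ": ("zu", "du"),
}


def romanize(kana, system):
    """Transliterate kana one character at a time.

    system is "hepburn" or "nihon"; raises KeyError for kana not in TABLE.
    """
    index = {"hepburn": 0, "nihon": 1}[system]
    return "".join(TABLE[ch][index] for ch in kana)


for word in ["いなづま", "いなずま", "よこづな", "かなづかい"]:
    print(word, romanize(word, "hepburn"), romanize(word, "nihon"))
```

In Hepburn, いなづま and いなずま both come out as inazuma, so the romanized form alone no longer records which kana was meant; Nihon-shiki’s inaduma keeps it.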

By leoboiko on .

Little fix: the URL is “….japaneseenglish…”

By gobr on .

This reminds me that I should someday write a script to convert Edict to the OS X Dictionary.app format. It would be a simple XML→XML conversion, but it takes time and gumption, so…
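The core of such a conversion can be sketched in a few lines. Carl’s XML→XML route would start from JMdict; this minimal sketch instead assumes the simpler plain-text EDICT line format (KANJI [KANA] /gloss/gloss/, or KANA /gloss/ for kana-only words). The output element names (entry, headword, reading, sense) are placeholders: Apple’s Dictionary Development Kit expects its own entry schema, which this does not reproduce. EDICT2’s ;-separated alternates and trailing EntL ids are not handled, and the sample lines are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# EDICT line: "KANJI [KANA] /gloss/gloss/" or, for kana-only words,
# "KANA /gloss/gloss/". EDICT2 alternates and EntL ids are not handled.
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)(?: \[(?P<reading>[^\]]+)\])? /(?P<glosses>.*)/$"
)


def edict_line_to_xml(line):
    """Convert one EDICT line into an <entry> element (placeholder schema)."""
    m = EDICT_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not an EDICT line: {line!r}")
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = m.group("head")
    if m.group("reading"):
        ET.SubElement(entry, "reading").text = m.group("reading")
    for gloss in m.group("glosses").split("/"):
        if gloss:  # skip empty fields from doubled slashes
            ET.SubElement(entry, "sense").text = gloss
    return entry


line = "漢字 [かんじ] /(n) Chinese characters/kanji/"
print(ET.tostring(edict_line_to_xml(line), encoding="unicode"))
```

A full converter would loop over the file, skip the header line at the top of EDICT, and wrap the entries in whatever root element the target format requires.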

By Carl on .

After I was 80% done, the thought popped into my head that I should have looked into the DICT protocol… It’s true that I do a lot of firulas* like color and “intelligent” guessing, but perhaps it would be possible to write it as a custom server/client pair with protocol extensions while remaining compatible with existing software. Oh well.

*firula: Unreasonably indulgent design; like, say, a backpack with almost too many kinds of inner divisions (“almost” because it’s never too many).
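For reference, DICT (RFC 2229) is a line-based text protocol over TCP port 2628: the client sends commands such as DEFINE database word, and the server answers with numbered status lines (150 with the definition count, 151 before each definition, 250 when done, 552 for no match), each definition’s text ending with a line holding a single period. Below is a minimal client sketch, assuming a standard server such as dictd on localhost; the function names are hypothetical, and the reply parser takes a list of lines so it can be tried offline.

```python
import shlex
import socket

DICT_PORT = 2628  # IANA-assigned port for DICT (RFC 2229)


def quote_word(word):
    """Double-quote a DEFINE argument if it contains spaces or quotes."""
    if any(c.isspace() or c in "\"'\\" for c in word):
        return '"' + word.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return word


def parse_define_reply(lines):
    """Turn the lines of a DEFINE reply into a list of (database, text) pairs.

    RFC 2229 framing: "150 n definitions retrieved", then per definition a
    151 line, the text, and a lone "." terminator, then "250 ok".
    "552 no match" yields an empty list; other 4xx/5xx codes raise.
    """
    definitions = []
    it = iter(lines)
    for line in it:
        code = line[:3]
        if code == "552":
            return []
        if code[:1] in ("4", "5"):
            raise RuntimeError(f"DICT error: {line}")
        if code == "151":
            # 151 "word" database "database description"
            database = shlex.split(line)[2]
            body = []
            for text in it:  # consumes the same iterator up to the "." line
                if text == ".":
                    break
                # Undo dot-stuffing: the server doubles a leading ".".
                body.append(text[1:] if text.startswith("..") else text)
            definitions.append((database, "\n".join(body)))
        elif code == "250":
            break
    return definitions


def define(word, host="localhost", database="*", timeout=10):
    """Send one DEFINE to a DICT server and return the parsed definitions."""
    with socket.create_connection((host, DICT_PORT), timeout=timeout) as sock:
        f = sock.makefile("rw", encoding="utf-8", newline="")
        f.readline()  # "220 ..." greeting banner
        f.write(f"DEFINE {database} {quote_word(word)}\r\n")
        f.flush()
        # Lazily strip CRLF; the parser stops reading at the final status.
        result = parse_define_reply(line.rstrip("\r\n") for line in f)
        f.write("QUIT\r\n")
        f.flush()
        return result
```

Protocol extensions of the kind mentioned above would fit in as extra commands or extra database names, while plain DEFINE and MATCH stay usable by existing DICT clients.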

By leoboiko on .

This is pretty slick! I’ll be using its regex support for searches instead of Nihongo Resources from now on.

Thanks! Honesty compels me to confess that myougiden is quite a bit slower than I’d hoped, partly because it tries to “do what you mean” by running many types of queries until one matches. And regexes, unfortunately, carry a significant performance hit. If the latency gets too uncomfortable, try passing more parameters to cut down on query guessing. Also, depending on what you need, consider simply grepping edict.utf8 or edict2.utf8 (this has been my primary “dictionary” for many years, and myougiden grew out of that workflow).

(And if anyone has suggestions on how to make this thing faster, I’m all ears! Profiling shows that most of the time is spent in the SQL queries, not in the fluff.)
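The “do what you mean” behavior described above can be sketched as a cascade of progressively looser queries that stops at the first one returning rows. This is not myougiden’s actual code: the entries(kanji, kana, gloss) table, the order of the cascade, and every function name are assumptions. It does illustrate where the regex cost comes from: SQLite has no built-in regular expressions, so X REGEXP Y calls a user-registered regexp(Y, X) function, a Python callback on every candidate row that no index can help with. The query_plan helper wraps EXPLAIN QUERY PLAN, a cheap first check when profiling SQL.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's REGEXP operator: "X REGEXP Y" calls regexp(Y, X)."""
    return value is not None and re.search(pattern, value) is not None


def connect(path=":memory:"):
    """Open a database with REGEXP enabled (SQLite has no built-in one)."""
    conn = sqlite3.connect(path)
    conn.create_function("regexp", 2, regexp)
    return conn


# Progressively looser query types: (name, WHERE clause, params builder).
CASCADE = [
    ("exact", "kanji = ? OR kana = ?", lambda q: (q, q)),
    ("prefix", "kanji LIKE ? OR kana LIKE ?", lambda q: (q + "%", q + "%")),
    ("gloss word", "gloss REGEXP ?", lambda q: (r"\b" + re.escape(q) + r"\b",)),
    ("regex", "kanji REGEXP ? OR kana REGEXP ? OR gloss REGEXP ?",
     lambda q: (q, q, q)),
]


def guess_search(conn, query):
    """Try each query type in order; return (name, rows) for the first hit."""
    for name, where, params in CASCADE:
        sql = f"SELECT kanji, kana, gloss FROM entries WHERE {where}"
        rows = conn.execute(sql, params(query)).fetchall()
        if rows:
            return name, rows
    return None, []


def query_plan(conn, where, params):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one WHERE clause."""
    sql = f"EXPLAIN QUERY PLAN SELECT kanji, kana, gloss FROM entries WHERE {where}"
    # The human-readable detail is the last column in every SQLite version.
    return " | ".join(row[-1] for row in conn.execute(sql, params).fetchall())


if __name__ == "__main__":
    conn = connect()
    conn.execute("CREATE TABLE entries (kanji TEXT, kana TEXT, gloss TEXT)")
    conn.execute("CREATE INDEX idx_kana ON entries (kana)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?)",
        [("稲妻", "いなずま", "lightning"), ("猫", "ねこ", "cat")],
    )
    print(guess_search(conn, "lightning"))
    print(query_plan(conn, "kana = ?", ("ねこ",)))
    print(query_plan(conn, "kana REGEXP ?", ("^ね",)))
```

With an index on kana, the plan for kana = ? should mention that index, while the plan for kana REGEXP ? should be a full table scan; the exact wording of the plan varies between SQLite versions. Each miss in the cascade adds another query, so a lookup that only matches at the regex stage pays for every stage before it.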

By leoboiko on .

I would love it if you could add support for reading the EPWING dictionary format as well (http://ja.wikipedia.org/wiki/EPWING). I’ve got some dictionaries in this format, and I’m currently stuck using some Windows-based readers in a VM.

Since myougiden is in Python, I might try to add EPWING support myself, if I get the time.

Yeah, multi-dictionary support has been requested before. There’s a ton of neat little stuff to add, but I kinda grew tired of coding for now & am concentrating on nethack (I mean, my thesis). I’ll try my hand at it when I’m coding again, & of course patches are welcome.