Trouble using utf-8 with ncurses?
Disclaimer: these are experimental results and I’m not too sure about what’s going on.
First of all, link with -lncursesw instead of -lncurses (from packages libncursesw5{,-dev} on Debian/Ubuntu).
I assume your source files are in utf-8. This will not work:
#include <curses.h>
int main(void)
{
initscr();
mvaddstr(0, 0, "こんにちは世界!");
refresh();
endwin();
}
Neither will this:
+ #include <ncursesw/curses.h>
int main(void)
{
initscr();
+ mvaddwstr(0, 0, "こんにちは世界!");
refresh();
endwin();
}
This does work, as long as it is run under a utf-8 terminal:
#include <ncursesw/ncurses.h>
+ #include <locale.h>
int main(void)
{
+ setlocale(LC_CTYPE, "en_US.UTF-8");
initscr();
+ mvaddstr(0, 0, "こんにちは世界!"); /* NOT mvaddwstr() */
refresh();
endwin();
}
This too works in utf-8 terminals, as long as the user has set LC_ALL (or LC_CTYPE) to a *.UTF-8 locale:
#include <ncursesw/ncurses.h>
#include <locale.h>
int main(void)
{
+ setlocale(LC_CTYPE, "");
initscr();
mvaddstr(0, 0, "こんにちは世界!");
refresh();
endwin();
}
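Since this last variant depends on the user's environment, it may help to check at start-up which encoding the locale actually selected. Here is a minimal sketch, assuming a POSIX system that provides nl_langinfo(3). The helper names codeset_is_utf8() and env_locale_is_utf8() are my own and not part of ncurses:

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a codeset name means UTF-8.
 * Dashes are ignored and letter case does not matter, so both
 * "UTF-8" and "utf8" are accepted. */
int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "UTF8";
    size_t i = 0;

    for (; *codeset != '\0'; codeset++) {
        char c = *codeset;
        if (c == '-')
            continue;                      /* skip the optional dash */
        if (c >= 'a' && c <= 'z')
            c = (char)(c - 'a' + 'A');     /* ASCII upper-casing, locale independent */
        if (i >= sizeof want - 1 || c != want[i])
            return 0;
        i++;
    }
    return i == sizeof want - 1;
}

/* Hypothetical helper: adopt the user's LC_CTYPE and report whether
 * it uses UTF-8. Meant to be called before initscr(). */
int env_locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;                          /* requested locale is not available */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program could call env_locale_is_utf8() in place of the bare setlocale() call and print a warning (or fall back to ASCII-only output) when it returns 0.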
I think the last example might also work for other encodings, as long as 1) the user has set their environment to the same encoding as the terminal, 2) all characters you use are available in the legacy encoding, and 3) you convert your utf-8 strings to wchar_t with mbstowcs(3) and use mvaddwstr(), etc. I’m not interested enough in legacy encodings to try.
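For what it's worth, the mbstowcs(3) conversion mentioned above could look roughly like the sketch below. The helper name to_wide() is hypothetical. Note that mbstowcs() decodes its input using the current LC_CTYPE encoding, so for utf-8 string literals this assumes a UTF-8 locale is active when the conversion runs. I have not tried it against a legacy locale.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string (in the encoding of the
 * current LC_CTYPE locale) to a freshly allocated wchar_t string.
 * Returns NULL on an invalid byte sequence or on allocation failure.
 * The caller is responsible for free()ing the result. */
wchar_t *to_wide(const char *s)
{
    /* With a NULL destination, POSIX mbstowcs() only counts wide characters. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    mbstowcs(w, s, n + 1);   /* n + 1 leaves room for the terminating L'\0' */
    return w;
}
```

After setlocale() and initscr(), the result could be handed to mvaddwstr(0, 0, w) and then freed.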
The setlocale(3) call must be before initscr(). I managed to reproduce the same results in gauche scheme (using the amazing c-wrapper), but I had no luck with ruby-ncurses, even after installing the locale extension and calling Locale::setlocale.
Finally, be careful about Unicode characters which take two columns of text, such as the “full-width” Han symbols and Japanese kana above. This is a visual property; it has nothing to do with the number of bytes the character has in a given representation. (Confusingly, pre-Unicode people used the term “wide char” to mean characters represented by more than one byte in some binary encoding.) libncursesw can handle these “wide” characters just fine, but if you write a full-width glyph to line/column (0, 0), it actually occupies (0, 0) and (0, 1). If you then output anything else to (0, 1), garbling may ensue. This example screenshot may help to clarify:

Notice how the full-width string beginning at column 0 aligns with even columns? If you want to replace one of the Japanese characters with half-width glyphs (that is, almost everything else), it’s safest to write two characters over it, beginning at its starting column; pad with a space if you only want to write one.
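The padding rule above can be sketched with wcswidth(3), which reports how many terminal columns a wide string occupies in the current locale. This is only an illustration: pad_to_cols() is a hypothetical helper of mine, and the _XOPEN_SOURCE define is an assumption about what glibc needs in order to declare wcswidth().

```c
#define _XOPEN_SOURCE 700   /* assumption: asks glibc to declare wcswidth() */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy `repl` into `out` and pad it with spaces until it
 * covers `old_cols` terminal columns (2 for a full-width glyph).
 * Returns the number of columns covered, or -1 if `repl` contains a
 * non-printable character, is wider than `old_cols`, or `out` is too small. */
int pad_to_cols(wchar_t *out, size_t out_len, const wchar_t *repl, int old_cols)
{
    size_t len = wcslen(repl);
    int cols = wcswidth(repl, len);   /* columns, not bytes or wchar_t count */

    if (cols < 0 || cols > old_cols)
        return -1;

    size_t pad = (size_t)(old_cols - cols);
    if (len + pad + 1 > out_len)
        return -1;

    wmemcpy(out, repl, len);          /* the replacement text itself */
    wmemset(out + len, L' ', pad);    /* blank out the leftover cells */
    out[len + pad] = L'\0';
    return old_cols;
}
```

With ncursesw, writing the padded buffer with mvaddwstr() at the starting column of the full-width glyph would then overwrite both of its cells. The widths wcswidth() reports depend on LC_CTYPE, so setlocale() has to run first.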