[vox-tech] vim and utf-8 support (newbie alert)

Mark K. Kim vox-tech@lists.lugod.org
Mon, 9 Jun 2003 13:27:14 -0700 (PDT)


On Mon, 9 Jun 2003, Peter Jay Salzman wrote:

> when you use vim, i assume:
>
> * that what you type is encoded with ascii or latin-1.
> * the encoding is related to the characters you see on the xterm
>    via the font the xterm is using.
> * the "stuff inside the encoding" (what gets encoded) is related to the
>    keys that you press with your fingers via a vim keymap.

It depends on the foreign language and how it's encoded.

> and when you want to use a foreign language with vim, the best way to do
> that is:
>
> * start an xterm with a suitable font: "xterm -fn <fontname> -e vim"
> * use utf-8 encoding, which encodes unicode and ISO10646 text.
> * load a suitable keymap to help make entering text easier.
>
>
> is all this correct so far?  even in a "touchy-feely" way?   i'm a
> complete newbie in this topic.

It depends on the foreign language and how it's encoded.

XTerm has its own encoding and fonts (mostly designed for Latin-based
languages).  Vim also has its own encoding settings (and gvim has its own
fonts).  It gets really tricky because so many systems depend on each
other, and you may have to trick one or more of them to make the foreign
language work, but which ones you can trick depends on the language you
want to work with.

What language are you working with?  Latin-based languages only need a
font change, and you can probably just change the font in XTerm.
Multibyte languages (i.e., CJK) generally need a special XTerm that
understands that language (generally using its own, non-UTF-8, encoding).
I won't even touch right-to-left or top-to-bottom languages (those
require both terminal and Vim support).

> if this is about correct, how does one tell vim to encode the text using
> utf-8?

   :set encoding=utf-8

That tells Vim to use UTF-8 internally, and by default it will then also
try to read files as UTF-8.  But Vim has no idea what your terminal can
display, so I think it attempts to send the text to the terminal as UTF-8
by default ('termencoding' follows 'encoding' unless you set it).  So your
terminal should also be capable of Unicode and have all the necessary
fonts.

Works great under Windows XP (everything's in Unicode; just make sure
you've got the fonts installed).
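
To make the encoding-versus-display issue concrete, here is a minimal
Python sketch (not anything Vim itself runs; the sample word and variable
names are just my own illustration).  It shows that the same text becomes
different bytes in Latin-1 and UTF-8, and what you see when UTF-8 bytes
are read by something that assumes Latin-1:

```python
# Illustrative sample only: "cafe" with an acute accent on the e (U+00E9).
text = "caf\u00e9"

# The same characters become different byte sequences per encoding.
latin1_bytes = text.encode("latin-1")   # one byte for U+00E9
utf8_bytes = text.encode("utf-8")       # two bytes for U+00E9

# A reader that wrongly assumes Latin-1 turns the two UTF-8 bytes
# into two separate (wrong) characters.
misread = utf8_bytes.decode("latin-1")

print(latin1_bytes)   # b'caf\xe9'
print(utf8_bytes)     # b'caf\xc3\xa9'
print(len(misread))   # 5 characters instead of 4
```

That kind of mismatch is roughly what happens when Vim writes UTF-8 to a
terminal that is still expecting Latin-1.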

> and how do you tell vim "i want to use language X whose characters are
> unicode number UT-Y through UT-Z"?   or doesn't it work quite that way?

I don't think Unicode characters are marked by language.  Some are
obvious (CJK, though a subset of the Chinese characters is also used in
Japanese and Korean), but others are less so (punctuation, alphabets
shared by several languages, etc.)  Many characters are also not in one
contiguous range (I think the Chinese characters are split across two or
more blocks -- Unicode is constantly evolving and they need to maintain
backwards compatibility).
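
If you want to poke at this yourself, here is a small Python sketch using
the standard unicodedata module.  The sample characters and the describe()
helper are my own picks for illustration, nothing official.  It prints each
character's code point and Unicode name in code point order:

```python
import unicodedata

# Sample characters chosen only for illustration: three CJK ideographs
# from different ranges, one hiragana, one hangul syllable, and a comma.
samples = ["\u4e2d", "\u3400", "\U00020000", "\u3042", "\uac00", ","]

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Sorting strings of length one orders them by code point.
for ch in sorted(samples):
    print(describe(ch))
```

The names describe scripts and shapes ("CJK UNIFIED IDEOGRAPH", "COMMA"),
not languages, and the ideographs do not sit in one run: the hangul
syllable at U+AC00 falls between the ideographs at U+4E2D and U+20000.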

-Mark

-- 
Mark K. Kim
http://www.cbreak.org/
PGP key available upon request.