[vox-tech] vim and utf-8 support (newbie alert)

Peter Jay Salzman vox-tech@lists.lugod.org
Mon, 9 Jun 2003 13:59:58 -0700


thanks mark...

On Mon 09 Jun 03,  1:27 PM, Mark K. Kim <markslist@cbreak.org> said:
> On Mon, 9 Jun 2003, Peter Jay Salzman wrote:
> 
> > * start an xterm with a suitable font: "xterm -fn <fontname> -e vim"
> > * use utf-8 encoding, which encodes unicode / ISO 10646 text.
> > * load a suitable keymap to help make entering text easier.
> >
> >
> > is all this correct so far?  even in a "touchy-feely" way?   i'm a
> > complete newbie in this topic.
> 
> It depends on the foreign language and how it's encoded.
> 
> The XTerm has its own encoding and fonts (mostly designed for latin-based
> languages).  VIM also has its own encoding and fonts.  It gets really
> tricky because there are so many systems depending on each other, and you
> may have to trick one or more of the systems to make the foreign language
> work, but which systems you can trick depends on the foreign language
> you wanna work with.
> 
> What language are you working with?  Latin-based languages only need a
> font change, and you can probably just change the fonts on XTerm.
> Multibyte languages (e.g., CJK) generally need a special XTerm that
> understands that language (generally using its own, non-utf-8,
> encoding).  I won't even touch right-to-left or up-and-down languages
> (those require both terminal and Vim support).

right-to-left languages are really, really, really well supported in
vim.  at least, they seem to be.  check out:

   :set rl

all the vim commands i can think of work well.
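
as far as i can tell, "rl" flips the whole window rather than sorting out
direction per character, so for mixed text it helps to see how unicode
itself tags direction.  a small python sketch (the sample characters and
the bidi_classes helper are my own picks for illustration, nothing vim
uses):

```python
import unicodedata

# sample characters: a hebrew letter, a hebrew vowel point,
# a latin letter and a digit
samples = {
    "alef": "\u05d0",     # HEBREW LETTER ALEF
    "qamats": "\u05b8",   # HEBREW POINT QAMATS (a vowel mark)
    "latin a": "a",
    "digit 1": "1",
}

def bidi_classes(chars):
    """Map each label to the Unicode bidirectional class of its character."""
    return {label: unicodedata.bidirectional(ch) for label, ch in chars.items()}

classes = bidi_classes(samples)
for label, cls in classes.items():
    print(f"{label:8} {cls}")
```

"R" is right-to-left, "L" is left-to-right, "NSM" is a non-spacing mark
(the vowel point rides on its base letter) and "EN" is a european number.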


the language i'm thinking of is hebrew, but with some important issues.

1. i need vowel support.
2. i really want to have mixed hebrew/english

taken together, i believe that means i want ISO 10646 (unicode), which
can represent all languages at the same time.
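
to convince myself that vowels and mixed text really fit in one encoding,
here's a rough python sketch.  the sample word ("shalom" with its points,
written as escapes so the mail survives) is my own choice:

```python
import unicodedata

# "shalom" with vowel points, then the same word in latin letters
text = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd shalom"

# one utf-8 byte string holds both scripts
encoded = text.encode("utf-8")

# the vowel points are combining marks: separate code points that
# attach to the letter before them
marks = [ch for ch in text if unicodedata.combining(ch)]

for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(text), "code points,", len(encoded), "bytes,", len(marks), "marks")
```

the hebrew letters and points take two bytes each in utf-8 and the ascii
part takes one byte each, so the english half of a mixed file costs
nothing extra.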

> > if this is about correct, how does one tell vim to encode the text using
> > utf-8?
> 
>    :set encoding=utf-8
 
> That tells VIM to interpret the file as though it's encoded in UTF-8.
> But VIM's got no idea how the data should be displayed so I think it
> attempts to display them in unicode by default.  So your terminal should
> also be capable of unicode and have all the necessary fonts.
 
as a first stab at getting utf-8 capable xterms, i set:

   LC_CTYPE=en_US.UTF-8

but weird things started to happen, like mutt's threading lines turned
into really strange characters.  i guess the applications themselves
need to be utf-8 aware too.
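
i don't know that this is what actually bit mutt, but here's a generic
python sketch of how a mismatch between what a program writes and what
the terminal expects produces garbage.  the sample strings are made up:

```python
# a box-drawing arrow, similar in spirit to a thread-tree line
line = "\u2514\u2500>"

utf8_bytes = line.encode("utf-8")

# a reader that assumes latin-1 turns every byte into its own character
as_latin1 = utf8_bytes.decode("latin-1")

# the other direction: latin-1 bytes handed to a utf-8 decoder
latin1_bytes = "caf\u00e9".encode("latin-1")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(len(line), "characters became", len(as_latin1))
print(repr(as_utf8))
```

either way the bytes are fine; it's the two ends disagreeing about the
encoding that makes the mess.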

> Works great under WindowsXP (everything's in unicode; just make sure you
> got the fonts installed.)
 
that makes me very sad...   :(

> > and how do you tell vim "i want to use language X whose characters are
> > unicode number UT-Y through UT-Z"?   or doesn't it work quite that way?
> 
> I don't think the unicode characters are marked by languages.  Some are
> obvious (CJK, though subset of C is also used by JK), but others are less
> so (punctuation, alphabets, etc.)  Many characters are also not in
> sequence (I think Chinese is broken up in two or more sets -- unicode is
> constantly evolving and they need to maintain backwards compatibility.)

okay.  it never is that easy, eh?   :-)
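
a quick python check of mark's point, using the unicode database that
ships with python.  the code points are hand-picked by me, and taking the
first word of the character name is just a crude stand-in for "script":

```python
import unicodedata

# hand-picked code points: hebrew from two separate ranges,
# CJK ideographs from two separate ranges, plus kana and hangul
samples = [0x05D0, 0x05B8, 0xFB2A, 0x3400, 0x4E00, 0x3042, 0xAC00]

def first_word(cp):
    """Return the first word of the character's Unicode name."""
    return unicodedata.name(chr(cp)).split()[0]

names = {cp: unicodedata.name(chr(cp)) for cp in samples}
for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```

so hebrew shows up both around U+05D0 and way up at U+FB2A, the
ideographs sit in more than one range, and the names talk about scripts,
not languages.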

it totally sucks that mixed hebrew-with-vowels/english turned out to be
such a hard thing to do.  :( sucks even worse that it's easy on windows
xp.   :(

pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D