[vox-tech] vim and utf-8 support (newbie alert)

Ken Bloom vox-tech@lists.lugod.org
Mon, 9 Jun 2003 17:05:34 -0700


On 2003.06.09 13:59, Peter Jay Salzman wrote:
> thanks mark...
> 
> On Mon 09 Jun 03,  1:27 PM, Mark K. Kim <markslist@cbreak.org> said:
> > On Mon, 9 Jun 2003, Peter Jay Salzman wrote:
> >
> > > * start an xterm with a suitable font: "xterm -fn <fontname> -e vim"
> > > * use utf-8 encoding, which encodes unicode and ISO10646 text.
> > > * load a suitable keymap to help make entering text easier.
> > >
> > >
> > > is all this correct so far?  even in a "touchy-feely" way?   i'm a
> > > complete newbie in this topic.
> >
> > It depends on the foreign language and how it's encoded.
> >
> > The XTerm has its own encoding and fonts (mostly designed for
> > latin-based languages).  VIM also has its own encoding and fonts.
> > It gets really tricky because there are so many systems depending
> > on each other, and you may have to trick one or more of the systems
> > to make the foreign language work, but which systems you can trick
> > depends on the foreign language you wanna work with.
> >
> > What language are you working with?  Latin-based languages only
> > need font change, and you can probably just change the fonts on
> > XTerm.  Multibyte languages (ie, CJK) generally need special XTerm
> > that understands that language (generally using its own,
> > non-utf-8, encoding).  I won't even touch right-to-left or
> > up-and-down languages (that requires both terminal and Vim
> > support.)
> 
> right-to-left languages are really, really, really well supported in
> vim.  at least, they seem to be.  check out:
> 
>    :set rl
> 
> all the vim commands i can think of work well.
> 
> 
> the language i'm thinking of is hebrew, but with some important
> issues.
> 
> 1. i need vowel support.
> 2. i really want to have mixed hebrew/english
> 
> i believe taken together, i want to use ISO 10646 which can represent
> all languages at the same time.
> 
> > > if this is about correct, how does one tell vim to encode the text
> > > using utf-8?
> >
> >    :set encoding=utf-8
> 
> > That tells VIM to interpret the file as though it's encoded in
> > UTF-8.  But VIM's got no idea how the data should be displayed so I
> > think it attempts to display them in unicode by default.  So your
> > terminal should also be capable of unicode and got all the
> > necessary fonts.
> 
> as a first stab at getting utf-8 capable xterms, i set:
> 
>    LC_CTYPE=en_US.UTF-8
> 
> but weird things started to happen, like mutt's threading lines turned
> into really strange characters.  i guess the applications themselves
> need to be utf-8 aware too.
> 

Here is what I have found from a bit of research. Running uxterm, or
running xterm -u8, makes the xterm handle data in both directions
(input and output) encoded in UTF-8, which covers Hebrew. With that,
select a font for the xterm that is encoded in iso10646-1. There should
be lots of good choices; you can check them out using xfontsel.
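
If you have a list of font names in XLFD form (for example, copied out
of xfontsel or a font listing), here is a rough Python sketch for
picking out the iso10646-1 ones. The function name and the two sample
font names are only illustrations, not anything from the xterm docs;
substitute names from your own system.

```python
def is_unicode_xlfd(name: str) -> bool:
    """Return True if an XLFD font name ends in the iso10646-1 charset."""
    fields = name.strip().lower().split("-")
    # A full XLFD name starts with "-" and has 14 fields, so splitting
    # on "-" gives 15 parts; the last two are registry and encoding.
    return len(fields) == 15 and fields[-2:] == ["iso10646", "1"]


# Example names in XLFD form (assumed for illustration only).
candidates = [
    "-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1",
    "-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso8859-8",
]
usable = [n for n in candidates if is_unicode_xlfd(n)]
print(usable)
```

Only the first sample name passes the check. A name that passes is what
you would hand to xterm with -fn.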

I observed that for some reason the Culmus fonts can't work as iso10646
in xfontsel or xterms, but they can handle it just fine in pango
applications. Perhaps that's a bug in the Debian packaging of these
fonts (perhaps not having fonts.dir entries for iso10646-1). I haven't
filed a bug on this though, so someone else can if they think it's a
bug.

Inside the xterm, run vim -H (or just run vim and :set rl; this won't
change vim's keyboard layout, so it will let you type English too,
although it will be backwards) and set its encoding to utf-8 as per the
directions that we have already discussed:

   :set encoding=utf-8

Set up your X keyboard according to instructions included at 
http://imagic.weizmann.ac.il/~dov/Hebrew/pango-hebrew.html
A diagram of the keyboard layout is available at that site.

Hit right-alt once to toggle to Hebrew and start typing. When you save
your work, the result will be a utf-8 encoded file.
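
To double-check that what you saved really is UTF-8, and that the vowel
points made it into the file, something like the following Python
sketch should do. The function name and the sample string are my own
choices for illustration; with a real file you would pass it the bytes
from open(path, "rb").read().

```python
import unicodedata


def describe_hebrew(data: bytes) -> dict:
    """Decode bytes as UTF-8 and count Hebrew letters and points."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    counts = {"letters": 0, "points": 0, "other": 0}
    for ch in text:
        in_hebrew_block = "\u0590" <= ch <= "\u05ff"
        category = unicodedata.category(ch)
        if in_hebrew_block and category == "Lo":
            counts["letters"] += 1   # consonants such as shin, lamed
        elif in_hebrew_block and category == "Mn":
            counts["points"] += 1    # combining vowel points and dots
        else:
            counts["other"] += 1     # English text, spaces, punctuation
    return counts


# "shalom" with vowel points, followed by English text.
sample = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd hello".encode("utf-8")
print(describe_hebrew(sample))
```

The vowel points are separate combining characters that follow their
letters, which is why mixed pointed Hebrew and English can sit in one
UTF-8 file. Bytes in an older 8-bit Hebrew encoding will generally fail
the decode step instead of being miscounted.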

As an added bonus, applications that support pango (and this includes
the standard text boxes in GTK+ 2.x, gaim conversations, AbiWord, and
more) will also allow Hebrew input using the same right-alt language
switch.

From the looks of things, Vim's keyboard layouts also appear to
support vowels when you're working in UTF-8, but I don't know what keys
you have to press to get them. They have the advantage of having some
kind of phonetic layout (Pete, can you confirm this for me?), but the
disadvantage of only working in vim.

שלום

-- 
I usually have a GPG digital signature included as an attachment.
If you don't know what it is, either ignore it or visit www.gnupg.org
Fingerprint: D5E2 8839 6ED3 3305 805C  941F 9476 A9BD E2B2 CAD1
The key is keyID E2B2CAD1 on pgp.mit.edu