[vox-tech] vim and utf-8 support (newbie alert)

Peter Jay Salzman vox-tech@lists.lugod.org
Mon, 9 Jun 2003 15:15:58 -0700


note: in what follows, i'm a bit schizophrenic about "iso 10646" and
"unicode".  since the tables and encodings are compatible in the most
recent versions of the standard, i'm using them interchangeably.

On Mon 09 Jun 03,  2:35 PM, Micah J. Cowan <micah@cowan.name> said:
> On Mon, Jun 09, 2003 at 04:06:01PM -0500, Jay Strauss wrote:
> 
> OOC, Pete, are you planning on doing Hebrew homework or something like
> that with vim?
 
i have some notes on vocabulary and grammar in dead tree format that i'd
like to convert into magnetic format.   ;-)

>   2. I don't believe you can get the Hebrew vowels; but I haven't
>      tried.
 
i only learned what ISO 10646 and utf are a few hours ago, but i thought
that was the whole point of the ISO standard and unicode.

i read that some of the characters in the 31 bit character set are
designated "combining characters", which provide accents for other
characters.

the thing i read mentioned that you can think of combining characters
as being like the accent keys on typewriters.  they don't take up space
by themselves, but instead are combined with the character right before
them.  it also said there are unicode precomposed characters, which are
pre-accented characters, but that

a) these are included in unicode for backwards compatibility
b) you can always use two characters (a base character followed by a
combining character) to represent a pre-composed character.
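
to make the precomposed vs. base-plus-accent idea concrete, here's a
minimal sketch.  it assumes python 3 and its standard unicodedata module
(my choice for illustration, not something from the page i read), and
uses "e with acute accent" as the example character:

```python
import unicodedata

# precomposed: one code point, LATIN SMALL LETTER E WITH ACUTE
precomposed = "\u00e9"
# decomposed: plain 'e' followed by COMBINING ACUTE ACCENT
decomposed = "e\u0301"

# the two strings are different sequences of code points ...
same_code_points = (precomposed == decomposed)

# ... but normalization converts one form into the other
nfd = unicodedata.normalize("NFD", precomposed)   # split into base + accent
nfc = unicodedata.normalize("NFC", decomposed)    # join into one code point

# a nonzero combining class marks the accent as a combining character
accent_class = unicodedata.combining("\u0301")

print(len(precomposed), len(decomposed))
print(nfd == decomposed, nfc == precomposed)
print(unicodedata.name("\u0301"), accent_class)
```

whether both forms *display* the same is up to the terminal and font,
which this sketch says nothing about.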

that's the reading i got from:

   http://www.cl.cam.ac.uk/~mgk25

of course, i didn't have time to read the whole thing (i read it over a
lunch break and got about a quarter of the way through), so there might
be some kind of caveat like "i know we told you this was in the
standard, but the implementations do this...".

from reading that document, it at least sounds like what i want to do is
possible...
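
as a sanity check on the vowel question, here's a small sketch, again
assuming python 3 and its unicodedata module.  the letter (bet) and the
vowel point (qamats) are just examples i picked; the sketch only shows
that the vowel point exists as its own code point and survives a utf-8
round trip.  it doesn't show that vim or a terminal will render it:

```python
import unicodedata

bet = "\u05d1"       # HEBREW LETTER BET
qamats = "\u05b8"    # HEBREW POINT QAMATS (a vowel point)

# the vowel point simply follows the letter it belongs to
pointed = bet + qamats

# look up the official character names
names = [unicodedata.name(ch) for ch in pointed]

# category "Mn" means nonspacing mark: it takes no space of its own
is_mark = unicodedata.category(qamats) == "Mn"

# encode to utf-8 and decode again to check nothing is lost
utf8 = pointed.encode("utf-8")
roundtrip = utf8.decode("utf-8")

print(names)
print(is_mark, len(utf8), roundtrip == pointed)
```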

>   3. Not all languages automatically support conversion to
>      Unicode. For example, I can type in Japanese, and then attempt to
>      export the text file, but unicode will not be one of the
>      available encodings.
 
would this depend on the application you're using?   if i understand
this correctly, c99 has support for a 31 bit character type for unicode.

i could be stating that wrong -- i think it mentioned that the
characters for the majority of languages fit in the first 16 bits, so
perhaps c99 has a 16 bit character type, but i do remember that there's
some kind of c99 support for ISO 10646.
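
the "first 16 bits" point can at least be checked against code point
values.  this is a rough sketch in python 3 (an assumption on my part;
it says nothing about what c99 actually provides), with four sample
characters i picked arbitrarily:

```python
# sample characters, chosen only for illustration
samples = {
    "A": "latin letter",
    "\u05d0": "hebrew letter alef",
    "\u20ac": "euro sign",
    "\U0001d11e": "musical g clef",
}

rows = []
for ch, label in samples.items():
    cp = ord(ch)                    # the character's code point number
    fits16 = cp <= 0xFFFF           # does it fit in 16 bits?
    n8 = len(ch.encode("utf-8"))    # bytes needed in utf-8
    rows.append((label, cp, fits16, n8))
    print(f"U+{cp:04X} {label}: fits in 16 bits={fits16}, utf-8 bytes={n8}")
```

the hebrew letter fits in 16 bits; the g clef is an example of a
character that doesn't.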
 
> Doesn't help you much, though, does it? ;)
 
heh.  well, before all this, i had zip, zero, nada knowledge of unicode,
iso 10646, encodings, character tables, ucs-2, ucs-4, utf-8 and all
sorts of non-english non-sense.

the difference is, at least i *know* what i don't know.  that's a step
up from not knowing what i already didn't know!   ;-)

pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D