[vox-tech] Need X Windows performance monitoring help

Bill Broadley vox-tech@lists.lugod.org
Tue, 27 Aug 2002 00:17:18 -0700


On Thu, Aug 15, 2002 at 07:16:52PM -0700, Eric Nelson wrote:
> We are developing an embedded system which uses Java and X Windows.  
> It doesn't have a VGA.
> 
> When we run some code on a desktop with lots of ram and vga, we get 
> pretty good performance, but on our embedded system, performance is 
> poor.

First of all, I'd like to advocate the IBM JVM for x86 if it fits your
license requirements.  I've been playing europa (a Java xbattle-like game),
and of all the JVMs people play it with, IBM's is the only one that seems
entirely stable.  It's quite fast as well.

> There are several factors which may contribute to this problem: 
> swapping of the libraries or X Server itself, latency in the X 

vmstat is a good place to start, and I'd read the man page fully.  Try
plain vmstat, or vmstat 1 (one data point per second).
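
If you can't run vmstat on the target, here is a rough sketch of reading
the swap counters directly.  It assumes a kernel that exposes
/proc/vmstat with pswpin/pswpout lines (older kernels report these
elsewhere, so check yours); find_counter and read_vmstat_counter are
made-up helper names, not part of any tool.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find a "name value" line in text and return the value, or -1 if the
   name is absent.  Hypothetical helper, not part of vmstat itself. */
long long
find_counter (const char *text, const char *name)
{
  size_t len = strlen (name);
  const char *p = text;

  while (p != NULL && *p != '\0')
    {
      if (strncmp (p, name, len) == 0 && p[len] == ' ')
        return strtoll (p + len + 1, NULL, 10);
      p = strchr (p, '\n');     /* skip to the start of the next line */
      if (p != NULL)
        p++;
    }
  return -1;
}

/* Read one counter from /proc/vmstat (assumed to exist); -1 if the file
   or the counter is unavailable. */
long long
read_vmstat_counter (const char *name)
{
  static char buf[16384];
  FILE *f = fopen ("/proc/vmstat", "r");
  size_t n;

  if (f == NULL)
    return -1;
  n = fread (buf, 1, sizeof buf - 1, f);
  fclose (f);
  buf[n] = '\0';
  return find_counter (buf, name);
}
```

Sample read_vmstat_counter ("pswpin") and read_vmstat_counter ("pswpout")
once a second and print the differences; if either keeps climbing while
the application runs, you are swapping.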

Keep in mind that Linux has a unified cache, and that read-only
binaries running without enough memory generate ONLY page-ins for code.
Since the binaries are read-only when locked, their pages are just
invalidated rather than written out.  So with a 32 MB binary and 24 MB
of RAM you will continuously re-read from storage trying to get all
32 MB into memory (assuming the right memory access pattern).  This is
as opposed to traditional swapping, where each page miss results in a
read and a write (which will happen if you try to keep too much data in
memory).
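
To see whether a piece of C code is paying for those page-ins, one
option is to count major page faults (the ones that needed I/O) around
it with getrusage.  This is only a sketch: major_faults and
faults_during are made-up names, work stands in for whatever you want
to measure, and it covers the calling process only, not the X server or
the JVM.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Major faults (those that needed I/O) charged to this process so far,
   or -1 if getrusage fails. */
long
major_faults (void)
{
  struct rusage ru;

  if (getrusage (RUSAGE_SELF, &ru) != 0)
    return -1;
  return ru.ru_majflt;
}

/* Run work() and return how many major faults happened meanwhile.
   work is a hypothetical stand-in for the code under test. */
long
faults_during (void (*work) (void))
{
  long before = major_faults ();

  work ();
  return major_faults () - before;
}
```

A count that keeps growing on every pass through the same code suggests
the working set doesn't fit and pages are being re-read from storage.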

Of course, embedded systems often have much slower I/O systems than
your average IDE disk, which exacerbates the problem.

Flash can be VERY slow to write, so you should definitely be VERY careful
to ensure all writes are non-blocking if you have any kind of
performance/realtime requirements.


> Server, processor speed, Java, the video driver, etc.  So maybe we 
> need more ram, or a faster video driver, or to lock X into ram, or 
> simply go to a faster clock.
> 
> My question is, how can we time stamp events, or get a trace of memory 
> and swap usage, or whatever necessary, to pin down where the problem 
> is?

It's fairly easy to write a cycle counter to get very accurate timing
that will not affect the system much.  You don't mention the CPU
you're using, though, and AFAIK only Pentiums and higher have the
traditional cycle counter.  I use something like:
static inline unsigned long long
getcycle (void)                 /* return the cycle counter in an unsigned
                                   long long.  If you have similar code for
                                   any other architecture/os please email me */
{
  unsigned int high, low;       /* rdtsc fills EDX:EAX, 32 bits each */

  __asm__ __volatile__ ("rdtsc" /* opcode bytes 0x0f,0x31 */
                        :"=a" (low), "=d" (high));
  return (((unsigned long long) high << 32) | low);
}


If you're not swapping (and you shouldn't be), I'd sprinkle getcycle calls
throughout your code and track where the time goes.  I'm not that familiar
with Java profiling, but it's not too hard to call a C function from
Java (unless it's an applet).
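
As a rough idea of what "sprinkling" might look like on the C side, the
sketch below keeps one accumulator per region of interest.  The struct
and the region_* functions are made-up names, and the clock_gettime
fallback for non-x86 CPUs is an assumption, not something I've used on
your hardware.

```c
#include <stdio.h>
#include <time.h>

/* Tick source: the x86 cycle counter where available, otherwise
   nanoseconds from clock_gettime (an assumed fallback). */
static inline unsigned long long
ticks (void)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned int high, low;

  __asm__ __volatile__ ("rdtsc" : "=a" (low), "=d" (high));
  return ((unsigned long long) high << 32) | low;
#else
  struct timespec ts;

  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (unsigned long long) ts.tv_sec * 1000000000ULL + ts.tv_nsec;
#endif
}

/* One accumulator per region of interest (names are hypothetical). */
struct region
{
  const char *name;
  unsigned long long total;     /* ticks spent inside the region */
  unsigned long calls;          /* number of completed enter/leave pairs */
  unsigned long long start;     /* tick value at the last region_enter */
};

static inline void
region_enter (struct region *r)
{
  r->start = ticks ();
}

static inline void
region_leave (struct region *r)
{
  r->total += ticks () - r->start;
  r->calls++;
}

/* Print one line per region, e.g. at exit. */
void
region_report (const struct region *r)
{
  printf ("%-12s calls=%lu ticks=%llu\n", r->name, r->calls, r->total);
}
```

Declare something like struct region redraw = { "redraw", 0, 0, 0 },
wrap the suspect code in region_enter (&redraw) / region_leave (&redraw),
and call region_report at the end.  Comparing totals between the desktop
and the embedded box should show which regions got disproportionately
slower.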

So what CPU are you targeting (Intel x86? clone?)  Application?  RAM?
Storage? (flash?)  VRAM?  How much?  Video chipset? Any?  Supporting
bitblt?  Even the most basic bitblt support can make a radical difference
for anything involving scrolling.

Send us more detail and we might be able to offer other suggestions.

-- 
Bill Broadley
Mathematics
UC Davis