[vox-tech] Assistance With C Program

Daniel Hurt vox-tech@lists.lugod.org
Wed, 19 Nov 2003 12:59:43 -0800


> -----Original Message-----
> From: Tim Riley [mailto:timriley@timriley.net]
> Sent: Tuesday, November 18, 2003 5:39 PM
> To: dwhurt@ucdavis.edu
> 
>
> Subject: Re: [vox-tech] Assistance With C Program
> I've been a C programmer for over a decade, and C's been my only
> language ever since starting. Go ahead and email what you'd like,
> and I'll provide a peer review.

Sent to you and to Micah.  Thanks for looking at it.  The wang-landau.tgz is
the data generation program.  You should be able to type make and it should
compile.  The second file is the data analysis program.  If you would like
to meet and talk about it sometime, let me know when is good for you.

> I don't worry about memory leaks because UNIX/Linux has such robust
> "garbage collection." If the code were Windows, then memory leaks
> remain until reboot.

The problem is that the simulation should take up around 5MB of memory
roughly (depends on allot of factors).  The executable is small, around 10K.
But every so often the simulations memory requirements become huge.  It
takes up around 500MB which forces the computer to start using swap memory
which is death for the simulation.  It would take on the order of a year for
the simulation to finish. 

> Are you using arrays and accessing them like 'a[x]'? This can be
> rewritten using a pointer which will run much faster. I'm not
> sure what you mean about the size of the program. I've never had
> any issues with the number of bytes the size of the executable is.
> How many lines of code is your project?

The project is probably around 800-1000 lines for the data generation
program and then I have written around a 1,500 line program for the analysis
of that data.  I hope the above comment answered the question about the
program size.  It is the accessing of swap that I need to avoid which I
think is due to a memory leak.

A brief synopsis of the computer program:
I have defined two data types.  The first is essentially a pointer to the
next data type which I use to house all the information.  Why do I do this?
Basically, I am not sure how many of these points of information I will have
or where they will be.  So I have to allocate a pointer for every possible
energy level.

I then start the simulation.  When I come across a level, I allocate a
information data structure and set the pointer of that level to point to
that. Hope that makes some sense.

Now onto accessing the array.  I am accessing this 1D array of pointers as
an array.  It the fastest way to do it within the context of the simulation
because I have set it up so the simulation generates the index automatically
and so I do not have to search for the right pointer.  So I have to make one
calculation and I have the exact location.  I tried using a binary tree and
searching for the level instead of indexing and this way is considerably
faster in this application because I already have a natural index.  Also
every access to the binary tree was on the order of 15-30 searches to find
the right level because of the size of the tree which is slow when you are
repeating over 10^10 iterations.  Finally, you have to worry about balancing
out the tree.  A flat array was just easier to program and I think faster
for this application.

> Are you willing to explore the next level in computer science --
> abstract datatypes, cohesion, and coupling? Only with this level
> of commitment can you advance beyond arrays and functions.

I would love to learn.

> Could you explain this a little more?

The data types that I am using are as follows:
################################
typedef struct{
   char visited;
   wlhist_t *node;
}histptr_t;

typedef struct wlnode{
   double energylevel;
   long double edhist;
   unsigned long long mag;
   unsigned long long vacancy;
   unsigned long visits;
   unsigned long vishist;
}wlhist_t;
################################

For instance, the energy of a 10x10 array (small) will vary from 0.00 to
400.00 with some standard set of constants.  However, to create an index and
assure that I am not getting round off error, I multiply by 100 and convert
to an int.  So now I have levels from 0 to 40,000.  My pointer array takes
up about 750kb, since the simulation about 1% of available levels, total
size of the program is about 1.5MB for a small lattice which should be more
than manageable.  But as I was saying earlier, every so often it will take
up all of the available memory (has happened on multiple computers running
different compilers) and started using swap which just kills the program.  

Sorry for the long response and thanks,
   
Dan