[vox-tech] Memory addressing?

Bill Broadley bill at broadley.org
Wed Jun 23 18:56:46 PDT 2010


On 06/23/2010 11:36 AM, Ken Bloom wrote:
> On a 32-bit machine, this will eat up most of the computer's address
> space, including *all* application space, and some kernel space (so you
> can expect things to segfault).

Assuming no PAE.  With PAE the kernel and other user processes would 
often be in different memory ranges, so the kernel and other apps could 
be perfectly happy while you fill up your particular segment.


> On a 64-bit machine, it should work
> though.

Nope, at least with the normal defaults the stack is far smaller than 
4GB (ulimit -s on most Linux boxes reports 8192KB, i.e. an 8MB stack).

On a 64-bit machine with 32GB of RAM:
$ ./a.out
Segmentation fault
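
For reference, the failing version looks roughly like this -- not the 
original program, just an illustration of putting the array on the 
stack, the failure mode Ken describes below:

#include <stdio.h>

#define big 1000000000

int main(void)
{
        int table[big];  /* ~4GB of automatic storage, far past the usual 8MB stack limit */
        int i;

        for (i = 0; i < big; i++)
                table[i] = i / 2048;
        printf("%d\n", table[4096]);
        return 0;
}

Exactly where it dies depends on the compiler and on what gets touched 
first, but with ~4GB of automatic storage it's going to blow the 
default stack limit one way or another.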

> Stuff gets pushed on the stack after the end of the array, approximately
> 1GB into memory. This means you have a stack that's 1GB long when you
> get inside printf, which is seriously much more than the OS is prepared
> to let you have for your stack. (This will overflow the stack whether
> your machine is 32-bits or 64-bits.) Try mallocing your array instead.

Yeah, that should definitely work, since the array won't be on the stack:

#include <stdio.h>
#include <stdlib.h>

#define big 1000000000

int *table;

int main(void)
{
        int i;

        /* still ~4GB, but on the heap, so the stack limit doesn't matter */
        table = malloc(sizeof(int) * big);
        if (table == NULL) {
                fprintf(stderr, "malloc failed\n");
                return 1;
        }
        for (i = 0; i < big; i++)
                table[i] = i / 2048;
        printf("%d\n", table[4096]);
        free(table);
        return 0;
}

$ ./a.out
2
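
A quick check along the lines Ken suggests below is to print the 
pointer malloc hands back; NULL means the allocation failed outright.  
(One hedge: with Linux's default overcommit settings a non-NULL pointer 
still isn't a guarantee until the pages are actually touched.)

#include <stdio.h>
#include <stdlib.h>

#define big 1000000000

int main(void)
{
        int *table = malloc(sizeof(int) * big);

        printf("table = %p\n", (void *) table);
        free(table);
        return 0;
}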

> (Oh and change that printf to be printf("%px Hello world\n",table); so
> you can see whether the allocation is working.)
>
> If you're looking at an IDS that big, you're going to need to find an
> appropriate caching data structure that can write the infrequently used
> parts out to disk. Or you're going to have to find some other way of
> minimizing the number of packets you're keeping track of at a time.

Exactly.

I'd suggest:
1) Don't require contiguous memory -- you can get much more that way 
(see the sketch below)
2) Plan on parallel access, not a single thread/core
3) Don't ignore pages, cache lines, L1/L2/L3 cache, etc.
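
To make (1) and the multi-level idea concrete, here's a rough sketch -- 
the names and sizes are mine, purely illustrative -- of a two-level 
table where no single allocation is anywhere near 4GB and chunks only 
get allocated the first time a bucket in them is touched:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define CHUNK_ENTRIES (1 << 20)   /* 4MB of int per chunk */
#define NUM_CHUNKS    (1 << 12)   /* 2^32 buckets total, allocated on demand */

static int *chunks[NUM_CHUNKS];

/* map a hash to its bucket, allocating the chunk lazily on first use */
static int *bucket(uint64_t hash)
{
        uint32_t idx   = (uint32_t) hash;     /* fold down to 2^32 buckets */
        uint32_t chunk = idx / CHUNK_ENTRIES;
        uint32_t off   = idx % CHUNK_ENTRIES;

        if (chunks[chunk] == NULL)
                chunks[chunk] = calloc(CHUNK_ENTRIES, sizeof(int));
        return chunks[chunk] ? &chunks[chunk][off] : NULL;
}

int main(void)
{
        int *b = bucket(0xdeadbeefcafeULL);
        int i, used = 0;

        if (b != NULL)
                *b = 42;
        for (i = 0; i < NUM_CHUNKS; i++)
                if (chunks[i] != NULL)
                        used++;
        printf("%d of %d chunks actually allocated\n", used, NUM_CHUNKS);
        return 0;
}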

So, ignoring the cost of the hash calculation, a main-memory lookup 
takes approximately 100ns.  If you keep 8 or 16 independent lookups in 
flight you can get an effective latency (or throughput, if you prefer) 
closer to 10ns per lookup.  If you can stay in L3 cache, latency and 
bandwidth improve by roughly an order of magnitude, and fitting in 
L1/L2 buys you another order of magnitude on top of that.
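
One way to sketch the "8 or 16 in flight" idea in code is gcc's 
__builtin_prefetch; the hash() and table size below are made up, the 
point is just that independent misses overlap instead of serializing at 
~100ns each:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define TABLE_SIZE (1u << 26)   /* 256MB of int: big enough to blow out the caches */
#define BATCH      16

static uint32_t hash(uint32_t key)
{
        key *= 2654435761u;      /* multiplicative hash, purely for the demo */
        return key & (TABLE_SIZE - 1);
}

int main(void)
{
        int *table = calloc(TABLE_SIZE, sizeof(int));
        uint32_t keys[BATCH];
        int i, sum = 0;

        if (table == NULL)
                return 1;
        for (i = 0; i < BATCH; i++)
                keys[i] = (uint32_t) rand();

        /* issue all the loads up front so the ~100ns misses overlap ... */
        for (i = 0; i < BATCH; i++)
                __builtin_prefetch(&table[hash(keys[i])]);

        /* ... then consume them; each access should now be mostly resolved */
        for (i = 0; i < BATCH; i++)
                sum += table[hash(keys[i])];

        printf("%d\n", sum);
        free(table);
        return 0;
}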

So basically, for optimal performance you'll end up with a multi-level 
data structure that allocates chunks of memory much less than 4GB each.  
After all, for any IDS the number of active sessions in any short 
window is going to be MUCH MUCH MUCH less than 2^96.  To keep that in 
perspective: if your IDS handled a billion machines each sending 1000 
packets a second for 2 billion years, that's about 10^9 * 10^3 * 
(2*10^9 * 3.15*10^7 s) ~= 6.3*10^28 packets, still less than 
2^96 ~= 7.9*10^28.

So sure, a flat hash over the 2^96 space is easy to write and plenty 
for smaller installations, but as you scale you're going to have to get 
significantly more efficient.
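
For what it's worth, the "easy to write" flat hash might look something 
like this, assuming the 2^96 is the usual (src IP, dst IP, src port, 
dst port) tuple at 32+32+16+16 bits; the mixing constants are just 
FNV-style examples, not a recommendation:

#include <stdio.h>
#include <stdint.h>

#define TABLE_BITS 24   /* 16M buckets */

static uint32_t flow_hash(uint32_t src_ip, uint32_t dst_ip,
                          uint16_t src_port, uint16_t dst_port)
{
        uint64_t h = src_ip;

        h = h * 0x100000001b3ULL ^ dst_ip;    /* FNV-style mixing, just an example */
        h = h * 0x100000001b3ULL ^ src_port;
        h = h * 0x100000001b3ULL ^ dst_port;
        return (uint32_t) (h >> 16) & ((1u << TABLE_BITS) - 1);
}

int main(void)
{
        /* 10.0.0.1:12345 -> 10.0.0.2:80, made-up values */
        printf("bucket = %u\n",
               flow_hash(0x0a000001, 0x0a000002, 12345, 80));
        return 0;
}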

