[vox-tech] perl db question

Ted Deppner vox-tech@lists.lugod.org
Mon, 11 Mar 2002 22:44:57 -0800


On Wed, Mar 06, 2002 at 04:45:23PM -0800, Peter Jay Salzman wrote:
> i have to admit, having a human readable flat file is kind of sexy.

Yes, it's nice sometimes.  It makes sense for data that needs to stay in a
particular form, e.g. HTTP server logs, where the data will never be
iterated over repeatedly and individual lines are never changed or deleted.

> my application is to store dates that certain work is done on a vehicle.
> like a list of all dates/odometer readings of oil changes, brake pad
> inspections, battery electrolyte checks, coolant flushes and stuff
> like that.

Have you studied relational databases yet?  If there are well-defined
relationships between the data you listed, then an RDBMS is probably a
better choice than so many flat files -- even if you index them.
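
For instance, the service history maps naturally onto a couple of related
tables.  A rough sketch (the database, table, and column names here are
made up for illustration, not taken from your app):

    use DBI;

    # Any SQL-speaking DBD driver would do; "garage" and the credentials
    # are placeholders.
    my $dbh = DBI->connect('dbi:mysql:garage', 'user', 'password',
                           { RaiseError => 1 });

    # One row per vehicle, one row per service event, related by vehicle_id.
    $dbh->do(q{
        CREATE TABLE vehicles (
            vehicle_id  INT PRIMARY KEY,
            description VARCHAR(80)
        )
    });
    $dbh->do(q{
        CREATE TABLE service_events (
            vehicle_id   INT,
            service_date DATE,
            odometer     INT,
            work         VARCHAR(80)  -- 'oil change', 'coolant flush', ...
        )
    });

A new oil change is then one INSERT, and "when was the last coolant flush
on vehicle 1" is one SELECT.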

> i don't think this can compare with what commerce sites use databases
> for, but after the years roll by, i'm sure the databases can reach a
> respectable size.  conceivably, someone could keep records on the same
> vehicle for a decade.

It won't be years.  If you do *any* sort of deleting, changing or aging of
data, you'll be reparsing and reindexing those flat files.  That will
start to noticeably slow you down with just thousands of records.  Note
that as your program develops you'll probably reference the dataset
repeatedly, meaning that a 0.1 second operation done 20 times to display a
single report page is now taking 2 seconds.  If that's the "next page"
option, 2 seconds is a long time to wait.

> what do you think?  i'd rather stay away from a server based DB system.
> would DBD::CSV be ok for this sort of thing?

DBI with its DBD:: drivers is nice.  I've used it against MS-SQL (think
Sybase) and MySQL servers quite a bit.

I don't know about DBD::CSV.  If you can't run SQL against it I'd say no.
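
For reference, here's roughly what the DBI style costs you in code,
whichever driver ends up underneath (a sketch reusing the hypothetical
tables from above):

    use DBI;

    my $dbh = DBI->connect('dbi:mysql:garage', 'user', 'password',
                           { RaiseError => 1 });

    # Every oil change for one vehicle, newest first.
    my $sth = $dbh->prepare(q{
        SELECT service_date, odometer
        FROM service_events
        WHERE vehicle_id = ? AND work = 'oil change'
        ORDER BY service_date DESC
    });
    $sth->execute(1);

    while (my ($date, $odometer) = $sth->fetchrow_array) {
        print "$date at $odometer miles\n";
    }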

Real-world example: I worked on a home-grown accounting system for several
years.  It started as un-indexed flat files.  It became slow as we
hit thousands of records (totalled across the whole system).  It became
unbearable as we hit 100,000 records.  Billing was taking nearly 24 hours
to run on just 1,000 customer accounts.  We'd already cached the files in
memory arrays at that time too...

Indexing the flat files would have helped.  However, since they were flat
files, the data still had to be cycled through to find the right line.  An
indexing scheme using Berkeley DB can hold only one index, and usually the
keys must be unique (neither is always true, but traditionally they are).
This means if the key is the first name, you can't have two Joes.  If the
key is "FirstnameLastname", you can't have two John Smiths.  If the key is
the phone number, you can't search on last name.  And so on.
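
A sketch of that limitation with DB_File (the file name and record layout
are invented):

    use DB_File;
    use Fcntl;

    # One Berkeley DB hash, keyed on exactly one field -- the phone number.
    my %by_phone;
    tie %by_phone, 'DB_File', 'customers.db', O_CREAT|O_RDWR, 0644, $DB_HASH
        or die "tie: $!";

    $by_phone{'555-1234'} = 'John Smith|Auburn|120.00';

    # A second record with the same key silently replaces the first...
    $by_phone{'555-1234'} = 'Jane Smith|Auburn|45.00';

    # ...and a search on anything but the phone number still means walking
    # every record.
    while (my ($phone, $rec) = each %by_phone) {
        print "$phone => $rec\n" if $rec =~ /Smith/;
    }

    untie %by_phone;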

Additionally, if you roll your own flat file and/or indexed file, all the
"logic" that glues your program space to your data is on you.  Data
management is still your job.

Enter SQL.  Yes, some servers can be a problem to build and maintain
(Oracle, MSSQL, Sybase), but others (MySQL) are quite easy, still
featureful, and problem-free.

With SQL the data storage and retrieval are abstracted from your program.
You say "give me all the Smiths in Auburn who owe us money for more than
60 days" and you get EXACTLY that data back.  You are free to improve your
code, UI, etc... and not worry about "how am I going to get that data from
my files quickly and efficiently".
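
In DBI terms that sentence is more or less one statement (a sketch; the
customers table and its columns are invented here):

    use DBI;

    my $dbh = DBI->connect('dbi:mysql:accounting', 'user', 'password',
                           { RaiseError => 1 });

    my $sth = $dbh->prepare(q{
        SELECT name, city, balance
        FROM customers
        WHERE name LIKE '%Smith%'
          AND city = 'Auburn'
          AND balance > 0
          AND days_overdue > 60
    });
    $sth->execute;

    while (my ($name, $city, $balance) = $sth->fetchrow_array) {
        print "$name ($city) owes $balance\n";
    }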

I migrated that accounting system from a flatfile data layer to MySQL.  24
hours run time turned into 15 minutes run time.  All I did was replace all
the data access module function calls with appropriate simple SQL.  I
could have gotten another order of magnitude improvement by modifying the
application code to ask more specific questions of the data layer.


Use SQL.  Use DBI so you can change SQL servers later if you find the
need (your app gets bought by a dotcom).  You won't regret it.
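
Switching servers later is then mostly a matter of the connect string
(hypothetical DSNs; SQL dialect quirks are the main thing left to chase):

    use DBI;

    # MySQL today...
    my $dbh = DBI->connect('dbi:mysql:accounting', 'user', 'password');

    # ...something else tomorrow; the prepare/execute/fetch code stays put.
    # my $dbh = DBI->connect('dbi:Pg:dbname=accounting', 'user', 'password');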

I'm off the soapbox.  Hope it helps.

-- 
Ted Deppner
http://www.psyber.com/~ted/