[vox-tech] perl db question

Bill Broadley vox-tech@lists.lugod.org
Tue, 12 Mar 2002 22:50:10 -0800


I won't go into details, but in summary I'd strongly recommend
a SQL database for anything much more complicated then a small table
or two.  I'll list some of the things you should consider if your
going to do it yourself:

1.  What do you do if you add a column?  Write a program to migrate
    your data from the old format?  How many lines of code?
2.  What happens if *gasp* you want to allow more then one user/process
    access the data?  Locking is tricky, painful to get right, painful
    to track down when you get it wrong, and can result in lost data.  
3.  What happens if a table doesn't fit in memory?  What if it does?
    Getting a nice balance of speed, performance and robustness is
    tricky.  Ideally buffer sizes would change dynamically as needed,
    and handle paging to/from disk as needed, complete with safechecking
    in case of a powerfailure.
4.  What happens if your index isn't fast enough?  What data structure
    will you use?  Hashtables?  How big are the bins?  Which hash
    function?  What if part doesn't fit in memory?  Ideally these would
    selfadjust.
5.  New kinds of datatypes, timestamps, strings, blobs, arrays,
    datetime, date, timezones etc are all a pain to handle, not to
    mention date calculations, searches, and similar.  How much code will
    you use?  Will you have y2k problems? Y2010?  Y2038 (unix epoch).
6.  New reports, ways to find records, new ways to view data, etc.
7.  New functionality, say emailing customers who haven't bought something
    in 90 days.  Maybe a most reliable cars report based on repairs per
    car.
8.  Recovery tools, indexs can and do get out of sync, files corrupt,
    even backups are tricky for live databases, are you going to write
    tools to check the validity of a row/column?  Rebuild indexes?  Check
    for illegal values?
9.  Things like holding strings can be tricky, bounds checking, insuring
    strings don't leak into the next column, insuring that control/escape
    characters don't cause confusion, unicode support, string sorting by
    partial columns etc.

Each of the above can take a substantial amount of developer time,
substantial number of lines of code, bugs, and time to track them down,
losing data can be very annoying to users.  Lost confidence, etc.
Rebuilding indexes because a rule, data, whatever changed unexpectedly
isn't fun.

In my experience small projects often waste much more time reimplementing
a database, then it would to just support a real relational database
up front.  Spend your time on the unique stuff, the next level of
complexity, whatever it maybe will likely be easier with a database.
SQL syntax can be quite handy... use it...  Hell I even write my apache
logs to a sql database and I find it quite handy.

It gives me higher density (more logs per MB), greater flexibility when
I want to find something out quick.  Anything from which bots searched
my webserver today, what are the top-10 broken url's, top 10 url's,
top 10 refers, tracking the progress of the various web-borne virus's etc.

It populates 6 relational databases, storing each agent, host, refer, url
etc only if it's unique.  Not bad for 45 lines of perl or so.

-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis