[vox-tech] Unnecessary RDBMS features [was Need your help in recovering database.]

Matt Roper vox-tech@lists.lugod.org
Fri, 25 Apr 2003 14:09:32 -0700


On Fri, Apr 25, 2003 at 09:03:07AM -0700, Tim Riley wrote:
...
> > Regardless of how stable MySQL may be, your database server is still
> > susceptible to power outages, hardware failures, etc.
> 
> My observations have shown that hardware is very reliable, the
> energy crisis is over, and battery power supplies cover gaps well.
...
> Hardware, and more importantly the operating system,
> are much more reliable now than they used to be.
> I can remember when something would fail and
> either the database or operating system would lock up, and we'd have to
> reboot the computer. But Linux has never locked up on me, and neither
> has Mysql. I believe that Mysql has never failed because it doesn't have
> all the gook (transactions, locks, tablespaces, segments, etc.) that causes
> DBMSs to get confused or overheated.

You've been very fortunate.  But you can't count on perfect hardware
because eventually you will run into problems.  I know I've been very
unlucky, but in the past year I've had to replace RAM, two power
supplies, a motherboard, and a hard drive -- clearly hardware defects do
still pop up from time to time.  It's certainly worth your while to take
precautions.

Your software may never have locked up, but are you absolutely positive
that there are no hidden bugs that might pop up on some obscure corner
case that you just haven't encountered yet?  The argument "it hasn't
happened to me yet" just doesn't fly in the real world when lots of
money (or possibly even lives) are at stake.

> But what if a crash does occur? Recovery from this might not require
> programming intervention, however, if the application is robust and
> users are trained to query their last transaction and then complete them again.
> But even if the DBMS supports commit/rollback
> transactions, checking the last transaction after a crash is prudent anyway.

But if you don't use database transactions, it might not be possible to
determine whether the last transaction completed or not.  Maybe I run a
store and my database contains both my current inventory and customer
orders.  A customer tries to place an order and something crashes after
the items have been entered into the order table, but before their quantities
are updated in the inventory table.  Now when the system comes back up,
the customer sees that his order made it into the database and figures
everything is fine (which it is from his point of view).  However my
inventory numbers are too high and there's no good way for me to detect
this.  Similar problems arise if I update the inventory first and the
order table second.  The problem is that sometimes you can't easily tell
whether a transaction completed by simply requerying the database.
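
To make that concrete, here's roughly what those two statements look
like and how a transaction ties them together.  The table and column
names are invented for illustration; the syntax is what you'd use with
a transaction-capable backend (Postgres, or MySQL with InnoDB tables):

    -- hypothetical schema: orders(customer_id, item_id, qty)
    --                      inventory(item_id, on_hand)
    BEGIN;
    INSERT INTO orders (customer_id, item_id, qty) VALUES (42, 7, 3);
    UPDATE inventory SET on_hand = on_hand - 3 WHERE item_id = 7;
    COMMIT;

If anything dies before the COMMIT, neither change survives.  Without
the BEGIN/COMMIT, a crash between the two statements leaves the order
recorded but the inventory count wrong -- exactly the situation above.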

> If a web site crashes, the customer won't receive an order reference number,
> and would probably (hopefully) try again, and generate a new order.

And if you had just updated their balance before the crash?  Now you've
billed them twice for a single order -- you're in big trouble.

> >  Without transactions, your PHP script sends the SQL statements one
> > at a time and the results are immediate.  If the web server crashes
> > (either software crash or hardware crash) between two consecutive SQL
> > statements, only the first one will actually get shipped off to the
> > database server and take effect.  Without transactions, the DB server
> > has no way of knowing that the web server was planning to send more
> > updates.  Your data is left in an inconsistent state since the database
> > doesn't realize that more commands should have been coming and when you
> > resurrect the web server, it won't know that it stopped halfway through
> > the PHP script.
> 
> Well, this would result in an order record with no corresponding
> line items. But I contend that this situation is better than missing the
> entire order. (A rollback would delete the order.) At least this way
> a phone call could be made to the customer to fix the situation.

Are you serious?  How long do you think places like Amazon would be in
business if a few incidents like this happened?  It's much better to
lose the entire transaction and maintain consistency (e.g. no order, but
no charge either) than to get your data into some inconsistent state
that could cause problems.  Even if you are willing to alienate your
customers and force them to correct you every time you have a glitch,
it's going to get very costly to go back and hand-correct all the
possibly inconsistent accounts.
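
For what it's worth, with a transaction you never get the
order-with-no-line-items case in the first place -- the order header
and its line items go in as a unit (hypothetical table names again):

    BEGIN;
    INSERT INTO orders (order_id, customer_id) VALUES (1001, 42);
    INSERT INTO line_items (order_id, item_id, qty) VALUES (1001, 7, 2);
    INSERT INTO line_items (order_id, item_id, qty) VALUES (1001, 9, 1);
    COMMIT;

A crash before the COMMIT rolls the whole thing back on recovery; the
customer never gets a confirmation number and simply resubmits, which
is the clean failure mode.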

> > Yes, but if you have a sequence of SQL statements that should be
> > performed as a package (all or nothing), there's no way to ensure that
> > the changes from the first statements only take effect if _every single_
> > statement got through.  If something (web server, DB server, network
> > connection, etc.) dies halfway through, your data is inconsistent.
> >
> 
> Yes, this is the main selling point for transaction processing DBMS's. However,
> I believe that its cost, in CPU and disk cycles, is too expensive, and the likelihood
> of having a database failure is remote on a transaction-less DBMS. I wouldn't
> have thought so myself had I not used Mysql on Linux and Solaris.

There are certainly applications where transactions are unnecessary --
any application where you never have to perform a string of operations
as a package (and a lot of applications fall under this category).  I
would imagine a site like Slashdot wouldn't need transactions since all
you ever do is a "select" (to build a story page) or a single "insert"
(to add a comment or story).  But there are a lot of other applications
(e-commerce only being one) where transactions are absolutely crucial.

> > Who says the database has to be copied to another file on disk for every
> > insert/update/delete?  I think usually the disk would just have the
> > "global version" of the database and all working copies would be stored
> > in RAM.
> 
> I'm making the assertion that commit/rollback copies the status quo to disk
> because that's how Oracle implements it. It uses what they call a rollback
> segment which is stored in one or more files. Changes are made
> to the live database after a copy of the affected rows is applied to the
> rollback segment. If the DBMS encounters a "commit", then the rollback
> segment space is relinquished. If it encounters a "rollback" or if an error occurs,
> the contents of the rollback segment space is copied back to the live database.
> 
> Unfortunately, the rollback segment will fill up, especially during a
> large process like printing checks which does a lot of updates.
> Then you've got a royal mess. Your users become idle and
> you have to find more disk space
> to enlarge the rollback segment. (I won't go into the anxiety of
> deciding which files to delete to make room.)

I would imagine that most transactions are relatively short and your
system would have a fairly small number of these massive update
processes.  The short transactions aren't going to slow you down since
they fit in RAM (in fact they'll probably speed you up as Henry pointed
out -- a single write to disk instead of multiple writes).

But I'm not sure exactly what types of actions you're referring to when
you talk about large processes.  Is it something that could be broken up
into multiple smaller transactions?  If not, you still wouldn't want to
run it without a transaction because customers using your system in the
meantime would see your database in an inconsistent state -- i.e. some
of the rows updated and some not updated.  Their queries should either
show the state of the database either before the operation starts or after it
finishes, not some intermediate state (granted, there are applications
where this wouldn't matter, but there are also a lot where it would).

> > One way to implement transactions would be to store a copy of
> > modifications to the database in memory (not a copy of the whole
> > database, just a description of the changes) until the transaction is
> > committed.  When a commit comes along you'd perform the following steps:
> >
> >     * grab a "validation lock"
> >     * see if the updates in the transaction conflict with the updates
> >       from any other transaction that has committed since the current
> >       transaction started; if they do, restart the transaction
> >     * if there are no conflicts, write all the updated data to disk
> >       atomically (this would be journaled somehow to ensure that a crash
> >       during this phase would be easy to recover from)
> >     * release the "validation lock"
> >
> > On the other hand, if the transaction is rolled back, you simply dump
> > your memory copy and there's no disk activity at all.  This is a
> > well-known scheme called "optimistic concurrency control," but there are
> > certainly other transaction implementations that work just as well.
> > Note that there's no unnecessary disk writing -- all the working copies
> > are stored in memory.
> 
> OK. Storing the rollback segment in memory would be a lot faster than
> storing it on disk. In Oracle's case, one could put the rollback
> segment on a RAM disk.  But I can imagine a big update statement
> that'll probably eat it up fast and start the page faults. Then what
> if the swap space gets depleted? Not only is the DBMS down, but all
> the other applications are also.  But I'm being hypothetical. 

I imagine the database would monitor the size of its rollback segments
and write them to disk if/when it became necessary.  But this would
still be very rare (you don't run huge updates too often) and it
certainly wouldn't be every single update/insert/delete.

> Have you used Optimistic Concurrency Control? Strange, but its
> name implies process management.

Another important use of transactions is to make sure that concurrent
transactions do not conflict.  Suppose I run a really large update
statement that takes 10 minutes to complete.  If you do a select five
minutes into my process (maybe you want to sum up some value that I
happen to be updating), you need to either see the database as it
existed before I started or you need to wait until I'm done.  If this
doesn't happen, you're going to get bad data.
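
Here's a sketch of what I mean, with invented tables.  Connection A
reprices the whole catalog while connection B totals it up:

    -- connection A: long-running update, wrapped in a transaction
    BEGIN;
    UPDATE products SET price = price * 1.10;   -- runs for minutes
    COMMIT;

    -- connection B, issued while A is still in progress
    SELECT SUM(price) FROM products;

With proper isolation, B's sum reflects either all old prices or all
new prices.  Without it, B can see some rows already repriced and some
not, and get a total that was never true at any point in time.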

> >  I doubt that transactions are the source of this difference.
> 
> Well, if consideration is given to all the work that needs to take place for the
> commit/rollback feature to undo a potentially large amount of work, it's
> conceivable that a DBMS applies a major time chunk to this.

I fail to see what the "large amount of work" is.  Every time you do
some kind of an update, that value needs to be stored somewhere, either
to global storage on disk if you're not using transactions, or to a
working copy in RAM if you do have transactions.  When you roll back,
nothing gets written to disk, so a rollback is much more efficient than
writing some data, realizing you screwed up, and writing the old data
back (assuming that's even possible by this point).  If you commit, you
write the entire transaction block to disk in one shot; as Henry pointed
out, this is more efficient than performing a lot of separate writes.
In the worst-case scenario, every single transaction is just a single
update, so the additional overhead is an extra memory store of each
changed value.  This is trivial compared to the time it takes to write
the data to disk and will not be noticeable.
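
In terms of statements (hypothetical ones, just to show where the disk
I/O happens under the in-memory scheme described above):

    BEGIN;
    UPDATE accounts SET balance = balance - 50 WHERE acct_id = 1;
    UPDATE accounts SET balance = balance + 50 WHERE acct_id = 2;
    ROLLBACK;  -- working copy discarded, nothing written to disk

    BEGIN;
    UPDATE accounts SET balance = balance - 50 WHERE acct_id = 1;
    UPDATE accounts SET balance = balance + 50 WHERE acct_id = 2;
    COMMIT;    -- both changes flushed to disk in one batch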

> >  I've seen lots of benchmarks that
> > indicate that Postgres (which uses transactions) performs about the same
> > or maybe even a little faster than MySQL (without transactions).
> > If transactions were such a performance hit, the performance of MySQL
> > and Postgres wouldn't be so similar.
> 
> Every feature added to a DBMS has to take CPU cycles, and
> Mysql has no features (well, few, but just enough). I would want
> to dissect this comparison further.

It is a well-known fact that I/O is the primary factor in the
performance of any centralized database system (in distributed databases
the communication costs can outweigh I/O, but that isn't important
here).  A few extra CPU cycles are unnoticeable compared to the I/O
cost.  Databases are always willing to spend extra CPU cycles if they might
reduce I/O overhead.  Transactions can reduce I/O overhead in a lot of
cases and only add to it on the very large (rare) updates you mentioned
earlier.

> > Probably slightly, but I doubt that the slowdown is as great as you
> > imply.
> 
> Again, I believe the ten-fold number. I think Jeff got it right when he made the
> analogy of a Corvette vs. a Cadillac. I would even extend it by comparing
> a Corvette to an Abrams tank. Which would you rather drive?

As I said before, some applications have no need for transactions, but
claiming that transactions are bad across the board (as you initially
did) is not a valid argument.  I actually use MySQL without transactions
for most of my database stuff these days, but I know that transactions
are critical in a lot of applications.

<snip>

> > Transactions should not span "user interactive" time.... they should only
> > last while a (hopefully short) batch of selects, updates, or network
> > communication events take place.  That is, after the user enters
> > everything into temporary storage,
> 
> Having a temporary set of tables with which the user interacts could
> be problematic. My vote is to design the database such that it maps
> one-to-one to the customer's real world, and to have users interact with the
> live set of tables.

What if multiple users try to interact with the same live tables
concurrently?  Now you've got some concurrency issues to deal with.
While transactions aren't necessary for dealing with these, they can
definitely be more efficient than just locking all the tables,
performing the updates, and unlocking, since there's a good chance that
the updates don't conflict and can safely run concurrently.
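
In MySQL terms, the heavy-handed way to share live tables looks like
the first block below; a transaction gives the same safety without
serializing users who aren't actually touching the same rows (table
names invented):

    -- coarse-grained: everyone touching these tables waits in line
    LOCK TABLES orders WRITE, inventory WRITE;
    INSERT INTO orders (customer_id, item_id, qty) VALUES (42, 7, 3);
    UPDATE inventory SET on_hand = on_hand - 3 WHERE item_id = 7;
    UNLOCK TABLES;

    -- transactional: conflicts are caught, but non-conflicting
    -- updates from different users can proceed concurrently
    BEGIN;
    INSERT INTO orders (customer_id, item_id, qty) VALUES (42, 7, 3);
    UPDATE inventory SET on_hand = on_hand - 3 WHERE item_id = 7;
    COMMIT;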



Matt

-- 

*************************************************
* Matt Roper <matt@mattrope.com>                *
* http://www.mattrope.com                       *
* PGP Key: http://www.mattrope.com/mattrope.asc *
*************************************************