[vox-tech] Unnecessary RDBMS features [was Need your help in recovering database.]

Tim Riley vox-tech@lists.lugod.org
Fri, 25 Apr 2003 09:03:07 -0700


Matt Roper wrote:

> On Thu, Apr 24, 2003 at 07:23:33PM -0700, Tim Riley wrote:
> > > If either the web server or database server goes down while you're
> > > adding Order_Items, you'll have an incomplete order, with no way of
> > > knowing that it is incomplete. This is Bad(TM).
> >
> > Mysql 3.22 has never failed me. (Now running out of disk space would
> > present a problem, of course, but a frequent grepping of df should
> > forestall this. But even if it's not prevented, there's no solution anyway.)
> > But it doesn't take a crash for this event to occur--the customer could
> > abandon the transaction in the middle. Having commit/rollback doesn't
> > affect this.
>
> Regardless of how stable MySQL may be, your database server is still
> susceptible to power outages, hardware failures, etc.

My observations have shown that hardware is very reliable, the
energy crisis is over, and battery backup supplies cover power gaps well.

>  The idea behind
> transactions is that they're "all or nothing."  The idea behind
> transactions is not to make sure that a single query succeeds, but that
> a sequence of queries all succeed or fail together.

> > > To make matters worse, there's the following chicken-or-egg problem:
> > > You want to charge the customer's credit card for the order, but not
> > > unless you're sure that the order will be in the database
> >
> > How can the credit card number be selected from the customer's order if
> > it's not in the database?
>
> The credit card number might be in the database already, but the actual
> list of items ordered are not.  But instead of using the credit card
> example, let's just suppose you're maintaining the user's account
> balance in the database.  When the user makes an order, you need to
> perform two operations:  perform an "insert" on the order table and
> perform an "update" on the "account" table.  If you perform these
> queries without transactions, there's the possibility that something
> will happen between the two SQL statements (power outage, etc) and only
> the first one will actually be recorded on the hard drive.

This is the main selling point for commit/rollback transaction processing. But is
it necessary?
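For concreteness, the two-statement scenario Matt describes can be sketched in a few lines of Python with SQLite (the `orders`/`accounts` tables here are hypothetical, just to show what commit/rollback buys):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 100);
""")

def place_order(conn, crash):
    # "with conn" opens a transaction: commit on success, rollback on exception.
    with conn:
        conn.execute("INSERT INTO orders (account_id, total) VALUES (1, 40)")
        if crash:
            raise RuntimeError("simulated crash between the two statements")
        conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")

try:
    place_order(conn, crash=True)
except RuntimeError:
    pass

# Both statements were inside one transaction, so the half-finished
# order was rolled back; without a transaction it would linger.
orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(orders, balance)  # 0 100
```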

Hardware, and more importantly the operating system,
are much more reliable now than they used to be.
I can remember when something would fail and
either the database or operating system would lock up, and we'd have to
reboot the computer. But Linux has never locked up on me, and neither
has Mysql. I believe that Mysql has never failed because it doesn't have
all the gook (transactions, locks, tablespaces, segments, etc.) that causes
DBMSs to get confused or overheated.

But what if a crash does occur? Recovery might not require
programming intervention if the application is robust and
users are trained to query their last transaction and then complete it again.
But even if the DBMS supports commit/rollback
transactions, checking the last transaction after a crash is prudent anyway.

If a web site crashes, the customer won't receive an order reference number,
and would probably (hopefully) try again, and generate a new order.

>  If you
> update the account balance first, the user will have been billed for
> items, but since you never got around to placing those items in the
> order table, they'll never be shipped.  On the other hand, if you place
> the items in the order table first and then the crash occurs, the user
> will wind up getting the items for free (i.e. when you resurrect the
> server, you have no way of knowing that the account balance was never
> updated).

>
>
> > > , and
> > > then be unable to delete the order because, right at that very moment,
> > > the web server crashed
> >
> > A web server is a stateless machine--no connection is maintained between
> > mysql and the browser like say ftp. If apache crashed in the middle of an order,
> > the customer's order will be incomplete with or without commit/rollback.
> > Moreover, this is the same situation if the customer abandoned the order
> > in the middle--a very natural business situation.
>
> You're missing Micah's point.  Suppose the web application is written in
> PHP.

OK, I see it: PHP is compiled into Apache as a module, not run as a CGI script.

>  Without transactions, your PHP script sends the SQL statements one
> at a time and the results are immediate.  If the web server crashes
> (either software crash or hardware crash) between two consecutive SQL
> statements, only the first one will actually get shipped off to the
> database server and take effect.  Without transactions, the DB server
> has no way of knowing that the webserver was planning to send more
> updates.  Your data is left in an inconsistent state since the database
> doesn't realize that more commands should have been coming and when you
> resurrect the web server, it won't know that it stopped halfway through
> the PHP script.

Well, this would result in an order record with no corresponding
line items. But I contend that this situation is better than missing the
entire order. (A rollback would delete the order.) At least this way
a phone call could be made to the customer to fix the situation.
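A minimal sketch (Python/SQLite again, with hypothetical table names) of how those incomplete orders could be found for a follow-up phone call:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, phone TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'complete', '555-0001'), (2, 'orphan', '555-0002');
    INSERT INTO order_items VALUES (1, 'WIDGET');
""")

# Orders with no line items: the ones created just before a crash.
orphans = conn.execute("""
    SELECT o.id, o.customer, o.phone
    FROM orders o
    LEFT JOIN order_items i ON i.order_id = o.id
    WHERE i.order_id IS NULL
""").fetchall()
print(orphans)  # [(2, 'orphan', '555-0002')]
```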

>
>
> > >
> >
> > > , and you have no contextual information to tell
> > > you later that that order shouldn't be in there because you haven't
> > > been paid for it yet.
> >
> > I think the meaning here is that because apache crashed, there's
> > no paper trail to collect payment.
> >
> ...
> > > You must be server-psychic. Can you explain to me how you would catch
> > > the errors above, in a reliable fashion?
> >
> > Did you see NBC last night exposing psychic frauds? Anyway, the issues
> > you raised were either resource failures or natural application events.
> > If disk resources are monitored and expanded as needed, mysql will
> > successfully execute every valid SQL statement.
>
> Yes, but if you have a sequence of SQL statements that should be
> performed as a package (all or nothing), there's no way to ensure that
> the changes from the first statements only take effect if _every single_
> statement got through.  If something (web server, DB server, network
> connection, etc.) dies halfway through, your data is inconsistent.
>

Yes, this is the main selling point for transaction-processing DBMSs. However,
I believe its cost in CPU and disk cycles is too high, and the likelihood
of a database failure on a transaction-less DBMS is remote. I wouldn't
have thought so myself had I not used Mysql on Linux and Solaris.

<snip>

Matt Roper wrote:

> On Thu, Apr 24, 2003 at 07:23:52PM -0700, Tim Riley wrote:
> ...
> > My observations have shown a significant performance hit. Consider what happens:
> > for every insert, update, or delete, the database's status-quo is copied to
> > another file on the disk.
>
> Who says the database has to be copied to another file on disk for every
> insert/update/delete?  I think usually the disk would just have the
> "global version" of the database and all working copies would be stored
> in RAM.
>

I'm making the assertion that commit/rollback copies the status quo to disk
because that's how Oracle implements it. It uses what it calls a rollback
segment, which is stored in one or more files. Changes are made
to the live database after a copy of the affected rows is written to the
rollback segment. If the DBMS encounters a "commit", the rollback
segment space is relinquished. If it encounters a "rollback", or if an error occurs,
the contents of the rollback segment are copied back to the live database.

Unfortunately, the rollback segment will fill up, especially during a
large job like printing checks, which does a lot of updates.
Then you've got a royal mess. Your users become idle and
you have to find more disk space
to enlarge the rollback segment. (I won't go into the anxiety of
deciding which files to delete to make room.)
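A toy sketch of the before-image scheme just described (a few lines of in-memory Python, not Oracle's actual implementation):

```python
# Toy model of a rollback segment: before-images of changed rows are
# saved so a rollback can restore the status quo.
class Table:
    def __init__(self, rows):
        self.rows = dict(rows)  # live data: row_id -> value
        self.undo = {}          # "rollback segment": row_id -> before-image

    def update(self, row_id, value):
        self.undo.setdefault(row_id, self.rows[row_id])  # copy old value first
        self.rows[row_id] = value                        # then change live data

    def commit(self):
        self.undo.clear()            # relinquish the rollback-segment space

    def rollback(self):
        self.rows.update(self.undo)  # copy before-images back to live data
        self.undo.clear()

t = Table({1: "old"})
t.update(1, "new")
t.rollback()
print(t.rows[1])  # old
```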

>
> One way to implement transactions would be to store a copy of
> modifications to the database in memory (not a copy of the whole
> database, just a description of the changes) until the transaction is
> committed.  When a commit comes along you'd perform the following steps:
>
>     * grab a "validation lock"
>     * see if the updates in the transaction conflict with the updates
>       from any other transaction that has committed since the current
>       transaction started; if they do, restart the transaction
>     * if there are no conflicts, write all the updated data to disk
>       atomically (this would be journaled somehow to ensure that a crash
>       during this phase would be easy to recover from)
>     * release the "validation lock"
>
> On the other hand, if the transaction is rolled back, you simply dump
> your memory copy and there's no disk activity at all.  This is a
> well-known scheme called "optimistic concurrency control," but there are
> certainly other transaction implementations that work just as well.
> Note that there's no unnecessary disk writing -- all the working copies
> are stored in memory.

OK. Storing the rollback segment in memory would be a lot faster than storing
it on disk. In Oracle's case, one could put the rollback segment on a RAM disk.
But I can imagine a big update statement that would eat it up fast
and start the page faults. Then what if the swap space gets depleted? Not only is
the DBMS down, but so are all the other applications. But I'm being
hypothetical. Have you used optimistic concurrency control? Strange name--it
sounds like process management.
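For reference, Matt's four steps can be sketched as a toy in-memory model (assumed names throughout, not any real DBMS's implementation):

```python
import threading

validation_lock = threading.Lock()
committed = []   # history of (commit_time, keys_written)
db = {}          # the "global version" of the database
clock = 0        # logical commit counter

class Transaction:
    """Optimistic concurrency control: buffer writes in memory, validate at commit."""

    def __init__(self):
        self.start_time = clock
        self.writes = {}  # working copy lives in RAM, not on disk

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        global clock
        with validation_lock:          # grab the "validation lock"
            # Did anyone commit an overlapping write after we started?
            for t, keys in committed:
                if t > self.start_time and keys & set(self.writes):
                    return False       # conflict: caller restarts the transaction
            db.update(self.writes)     # apply all updates together
            clock += 1
            committed.append((clock, set(self.writes)))
            return True                # lock released on exiting the with-block

a = Transaction()
b = Transaction()      # both start at logical time 0
a.write("x", 1)
ok_a = a.commit()      # no earlier commits, so this succeeds
b.write("x", 2)
ok_b = b.commit()      # conflicts with a's commit, so b must restart
print(ok_a, ok_b, db)  # True False {'x': 1}
```

A rolled-back transaction is just a `Transaction` object that gets dropped: its working copy never touches `db`, which is the point Matt makes about there being no disk activity on rollback.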

>
>
> > I think this is why benchmarks have shown Mysql to be ten times faster
> > than Oracle.
>
> Do you have a link to these benchmarks?

No. I just remember this number from a review about 4 years ago.

>  I doubt that transactions are
> the source of this difference.

Well, if consideration is given to all the work that needs to take place for the
commit/rollback feature to undo a potentially large amount of work, it's
conceivable that a DBMS spends a major chunk of its time on this.

>  I've seen lots of benchmarks that
> indicate that Postgres (which uses transactions) performs about the same
> or maybe even a little faster than MySQL (without transactions).
> If transactions were such a performance hit, the performance of MySQL
> and Postgres wouldn't be so similar.

Every feature added to a DBMS costs CPU cycles, and
Mysql has no features (well, few, but just enough). I would want
to dissect this comparison further.

>
>
> > >  The database system that we use in our shop
> > > (intersystems Cache) has no problem with gigabytes of database and thousands of
> > > updates daily, with full rollback functionality.
> >
> > Oh, they function--they don't break the database--they just slow things down.
>
> Probably slightly, but I doubt that the slowdown is as great as you
> imply.

Again, I believe the ten-fold number. I think Jeff got it right when he made the
analogy of a Corvette vs. a Cadillac. I would even extend it by comparing
a Corvette to an Abrams tank. Which would you rather drive?

But maybe your commit/rollback DBMS is not too slow. At the time I worked
on an Oracle DBMS, I didn't think so either.

>
>
> > >  There's no good reason why
> > > the SIZE of a database has anything to do with its ability to audit db
> > > transactions (commit/rollback).
> > >
> > > We have been doing testing of DB systems, when we encounter strange errors.
> > > So, we take a current snapshot of the db, rollback to just before the time that
> > > the error occurred, and step one by one through transactions until the error
> > > occurs.
> >
> > I don't think the software technology of committing/rolling back transactions
> > provides this functionality. Once a transaction is committed, it can't be rolled
> > back, and how can you take a snapshot of uncommitted transactions in a database?
> > RDBMS' have a snapshot feature, but its purpose is to synchronize multiple
> > databases,
> > which sounds like what you're doing here.
>
> I agree here; transactions aren't the mechanism to use for taking DB
> snapshots.

I think I misunderstood the feature that Jan was promoting. My response
was based upon my understanding of how Oracle's snapshot function behaved
5 years ago. Maybe snapshots can be taken of uncommitted transactions
now. But once again, there's a major cost for this feature. And if
transactions were suddenly unavailable, I'm sure another clever form of
debugging would get created using a snapshot.

<snip>

Jeff Newmiller wrote:

> On Thu, 24 Apr 2003, Tim Riley wrote:
>
> >
> >
> > "Micah J. Cowan" wrote:
> >
> > > On Thu, Apr 24, 2003 at 01:48:04PM -0700, Tim Riley wrote:
> > > > Commit/rollback requires that a copy be made of each datum involved
> > > > in an insert, update, or delete statement. This is very expensive. But
> > > > why make the copy? I know my insert, update, or delete is the correct
> > > > thing to do at the time.
> > >
> > > But what about the quite frequent case where what you're doing is the
> > > correct thing to do *only* if the next transaction you're about to do
> > > succeeds?
> > >
> > > There are many cases where it would be a bad thing to let the
> > > transaction you just completed remain effective if the next one
> > > fails. If the server shuts down right between the two, you are left
> > > with quirky data.
> > >
> > > Example: suppose you need to insert a customer's order for a set of items in
> > > your e-commerce solution. Your solution is to add a row to the Order
> > > table, plus one row in the Ordered_Items table for each item ordered
> > > (so Shipping can determine what to ship). You link the two by ID in
> > > another table, so you can find the Ordered_Items by the Order, and
> > > vice-versa.
> > >
> > > If either the web server or database server goes down while you're
> > > adding Order_Items, you'll have an incomplete order, with no way of
> > > knowing that it is incomplete. This is Bad(TM).
> >
> > Mysql 3.22 has never failed me. (Now running out of disk space would
> > present a problem, of course, but a frequent grepping of df should
> > forestall this. But even if it's not prevented, there's no solution anyway.)
>
> Anything that interferes can cause problems.... network disconnection,
>
> > But it doesn't take a crash for this event to occur--the customer could
> > abandon the transaction in the middle. Having commit/rollback doesn't
> > affect this.
>
> Transactions should not span "user interactive" time.... they should only
> last while a (hopefully short) batch of selects, updates, or network
> communication events take place.  That is, after the user enters
> everything into temporary storage,

Having a temporary set of tables with which the user interacts could
be problematic. My vote is to design the database such that it maps
one-to-one to the customer's real world, and to have users interact with the
live set of tables.

> the final group of related operations
> should all be invoked and results committed as a result of one user input
> event (proceed button).  This is no different than locking.

>
> Nevertheless, elements involved during this sequence of operations may
> include multiple computers separated by an undefined number of network
> links, and supported by an unspecified range of power sources, and subject
> to unspecified product reliability factors, as well as multitasking shared
> access to a common data store.

OK. I'm limiting my assertion to a single database instance. For example,
Google has thousands of machines in a networked configuration that requires a
Ph.D. to understand. (I've checked their job listings.) But if these multiple
computers you're referencing act as clients to a single database server,
then I still contend that commit/rollback is unnecessary and slows things down.

>
>
> Locking can require sophisticated algorithms to cover all these cases...
> producing the net effect of transactions. I prefer to let someone else
> figure out how to implement the transactions, and simply specify that they
> are required.  You apparently don't... well, I don't like to argue
> religion either.

I don't understand.

>

>
>
> > > To make matters worse, there's the following chicken-or-egg problem:
> > > You want to charge the customer's credit card for the order, but not
> > > unless you're sure that the order will be in the database
> >
> > How can the credit card number be selected from the customer's order if
> > it's not in the database?
> > > - customers
> > > get very irate when charged for a service they have not
> > > received. OTOH, it'd be just as bad to put the order in there, then
> > > charge the customer, only to find out the credit card info is bad
> >
> > A credit card rejection is a very natural event in the business world.
> > The application should be able to handle this with an update statement,
> > and committing/rollbacking would not affect this.
>
> A rollback is a lot simpler to use at the application level... so it is
> more likely to be used.
>
> >
> > > , and
> > > then be unable to delete the order because, right at that very moment,
> > > the web server crashed
> >
> > A web server is a stateless machine--no connection is maintained between
> > mysql and the browser like say ftp. If apache crashed in the middle of an order,
> > the customer's order will be incomplete with or without commit/rollback.
> > Moreover, this is the same situation if the customer abandoned the order
> > in the middle--a very natural business situation.
>
> See above.  For a web server handling a lot of transactions, _any_ time
> the server fails it will be likely to interfere with a transaction.
> And, again, transactions should not be applied to trans-connection
> recovery.

I don't understand.

>
>
> > > , and you have no contextual information to tell
> > > you later that that order shouldn't be in there because you haven't
> > > been paid for it yet.
> >
> > I think the meaning here is that because apache crashed, there's
> > no paper trail to collect payment.
>
> Yes, in the real world screwing the supplier is frowned on at least as
> much as screwing the customer.
>

I don't understand.

>
> > > The only way I know of to resolve the situation
> > > is:
> > >
> > >   1. Create the order in the DB.
> > >   2. Attempt to charge the card.
> > >   3. If the charge was successful, commit the Order and all the
> > >      Order_Items.
> > >   4. If the charge was unsuccessful, rollback.
> >
> > A credit card authorization rejection should be an update to the
> > customer or order table as such. The record of the attempted order should
> > remain in the database to provide customer service to the unfulfilled customer.
>
> True, but the issue of how to update the "current state" of the order
> remains.  Even if you parse the entire transaction history every time you
> want to figure that information out, you will need to lock multiple tables
> to do so, and then you are dealing with the Dining Philosopher's problem.

The Dining Philosophers problem is about allocating scarce resources
to many consumers (philosophers eating rice with
not enough chopsticks) while avoiding deadlock. However, commit/rollback
solves a different problem. The subject seems to be deviating.
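For what it's worth, the classic way to dodge that deadlock when locking multiple tables is to acquire the locks in one fixed global order (a sketch with hypothetical table names, not any particular DBMS's scheme):

```python
import threading

locks = {"customers": threading.Lock(), "orders": threading.Lock()}

def lock_tables(*names):
    # Acquire locks in a fixed (alphabetical) order so two sessions
    # wanting the same pair of tables can never deadlock each other.
    acquired = []
    for name in sorted(names):
        locks[name].acquire()
        acquired.append(name)
    return acquired

order = lock_tables("orders", "customers")
print(order)  # ['customers', 'orders']
for name in order:
    locks[name].release()
```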

>
> You might go a long time without encountering access conflict, but that
> just makes any bugs in your code harder to find.

>
> Transactions address these issues with tested algorithm implementations.
>
> > > > If I'm making a mistake to the database,
> > > > I'll fix it when I catch it.
> > >
> > > You must be server-psychic. Can you explain to me how you would catch
> > > the errors above, in a reliable fashion?
> >
> > Did you see NBC last night exposing psychic frauds?
>
> Is that relevant, or just a snide comment? Please stay out of the gutter.
>

It was a reaction. Lighten up.

>
> > Anyway, the issues
> > you raised were either resource failures or natural application events.
> > If disk resources are monitored and expanded as needed, mysql will
> > successfully execute every valid SQL statement. However, if an application
> > needs to respond to naturally occurring but abnormal business events,
> > commit/rollback will not help.
>
> Commit/rollback won't help user-level issues, but it is a lot easier to
> update multiple tables with it than with locking, and both resource
> failures and shared access contention are covered.
>

Locking is a separate feature and independent of commit/rollback. I don't
think DBMS locking is necessary, but that's another subject.

>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
> ---------------------------------------------------------------------------
>
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech