[vox-tech] Unnecessary RDBMS features [was Need your help in recovering database.]

Tim Riley vox-tech@lists.lugod.org
Fri, 25 Apr 2003 18:16:48 -0700


Matt Roper wrote:

> On Fri, Apr 25, 2003 at 09:03:07AM -0700, Tim Riley wrote:
> ...

> <snip>

>
> Your software may never have locked up, but are you absolutely positive
> that there are no hidden bugs that might pop up on some obscure corner
> case that you just haven't encountered yet?

I haven't inspected the Mysql code, but its beauty is in its simplicity. Also,
it has had eight years of testing in the field.

>  The argument "it hasn't
> happened to me yet" just doesn't fly in the real world when lots of
> money (or possibly even lives) are at stake.

My experience is all I have to contribute. I haven't performed any benchmarks
because I have nothing to sell. If you think that something bad will happen
if Mysql is used, well...

>
> <snip>
>
> But if you don't use database transactions, it might not be possible to
> determine whether the last transaction completed or not.

Well, there is a way to determine order completion.
Usually an order reference number is assigned at the beginning
of the order entry process and the payment information is the last to
be collected. If there's an order reference number but no payment
information, then the order is incomplete. Whether the incomplete order
was caused by customer abandonment or by a system failure doesn't
matter as far as the database is concerned. All of the fulfillment
processes can still function correctly.
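
For example, a query along these lines would find the incomplete
orders (the table and column names here are made up for illustration):

    -- an order row with a reference number but no matching payment
    -- row is an incomplete order
    SELECT o.order_ref
      FROM order_header o
      LEFT JOIN payment p ON p.order_ref = o.order_ref
     WHERE p.order_ref IS NULL;

Fulfillment can skip those rows, or someone can follow up on them by
phone.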


>  <snip>

> And if you had just updated their balance before the crash?  Now you've
> billed them twice for a single order -- you're in big trouble.

I think you can stay out of trouble at the application level. For example,
have all the updates take place as the second-to-last step in the process,
right before the all-complete event is set. Note: this is one of the two
critical regions created by using a transaction-less DBMS. Therefore, the
judgment call of whether or not to pay the overhead of commit/rollback
can be weighed against the likelihood of a crash happening right there.
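
In other words, something like this sketch, with made-up table and
column names:

    -- the balance update happens immediately before the statement
    -- that marks the order complete, so a crash before this point
    -- leaves the customer's balance untouched
    UPDATE account SET balance = balance - 59.95 WHERE account_id = 1001;
    UPDATE order_header SET status = 'COMPLETE' WHERE order_id = 5432;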

Moreover, I'm making the assertion that a transaction-less DBMS
is less (much less) likely to crash, causing this problem. It is true
that hardware failure could cause a crash. But how many times
has that happened in the past year?

The cost of the overhead of commit/rollback is all the unnecessary
copying that goes on everywhere else outside this region whenever an
insert, update, or delete statement is executed.

I guess the ideal situation would be to have transaction processing
available, but turned off everywhere except this critical region.
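
As far as I know, Mysql lets you pick the table handler per table,
which comes close to that; a sketch, with made-up table names:

    -- everyday tables: no transaction overhead
    CREATE TABLE order_header (
        order_id    INT NOT NULL PRIMARY KEY,
        customer_id INT NOT NULL
    ) TYPE=MyISAM;

    -- the critical-region table: commit/rollback available
    CREATE TABLE account (
        account_id INT NOT NULL PRIMARY KEY,
        balance    DECIMAL(10,2) NOT NULL
    ) TYPE=InnoDB;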

> <snip>

>
> > Well, this would result in an order record with no corresponding
> > line items. But I contend that this situation is better than missing the
> > entire order. (A rollback would delete the order.) At least this way
> > a phone call could be made to the customer to fix the situation.
>
> Are you serious?  How long do you think places like Amazon would be in
> business if a few incidents like this happened?  It's much better to
> lose the entire transaction and maintain consistency (e.g. no order, but
> no charge either) than to get your data into some inconsistent state
> that could cause problems.

The application could be designed so that nothing bad occurs if
there's a partial order. Whether or not Amazon takes the time to
call customers who've suffered due to technical problems is their
decision.

>  Even if you are willing to alienate your
> customers and force them to correct you every time you have a glitch,
> it's going to get very costly to go back and hand correct all the
> possibly inconsistent accounts.
>

One who questions the need for rollback/commit is not advocating
customer alienation. Also, after a crash, it's prudent for every
internal computer user to check their last transaction for completeness,
whether commit/rollback transaction processing is used or not.
It's not costly.

>
> <snip>

>  But there are a lot of other applications
> (e-commerce only being one) where transactions are absolutely crucial.

Because of the stateless nature of HTTP and the browser, an e-commerce
application has to take the concept of a partial transaction into
account. The controls built into the application to handle this
statelessness make commit/rollback unnecessary, except for that
critical region at the very end.

>
> <snip>

>
> > Unfortunately, the rollback segment will fill up, especially during a
> > large process like printing checks which does a lot of updates.
> > Then you've got a royal mess. Your users become idle and
> > you have to find more disk space
> > to enlarge the rollback segment. (I won't go into the anxiety of
> > deciding which files to delete to make room.)
>
> I would imagine that most transactions are relatively short and your
> system would have a fairly small number of these massive update
> processes.  The short transactions aren't going to slow you down since
> they fit in RAM (in fact they'll probably speed you up as Henry pointed
> out -- a single write to disk instead of multiple writes).

A transaction is a transaction. I'm not making any assumptions about
its size. No matter the size, we are at the mercy of the DBMS as to
how it gets from the SQL statement to the disk.

How about addressing my point that the rollback segment fills up and
causes a big hassle? Believe me when I say that I tweaked the check
printing process as best I could to keep from filling it up.

But maybe disk space is now so plentiful that rollback segments don't fill up.
If that's your experience, I'd like to hear about it.

>
>
> But I'm not sure exactly what types of actions you're referring to when
> you talk about large processes.  Is it something that could be broken up
> into multiple smaller transactions?

Yes, breaking up the update statements into smaller chunks is how to
counter the hassles of rollback segments filling up. It has caused a lot
of work and many days of wishing rollback segments would go away.
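
Something along these lines (a sketch; the table, column names, and
ranges are made up):

    -- commit after each small batch so no single statement has to
    -- hold the whole check run in the rollback segment at once
    UPDATE payroll_check
       SET status = 'PRINTED'
     WHERE status = 'PENDING'
       AND check_no BETWEEN 1000 AND 1999;
    COMMIT;
    -- ...then repeat for the next range of check numbers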

>  <snip>

>
> Another important use of transactions is to make sure that concurrent
> transactions do not conflict.  Suppose I run a really large update
> statement that takes 10 minutes to complete.  If you do a select five
> minutes into my process (maybe you want to sum up some value that I
> happen to be updating), you need to either see the database as it
> existed before I started or you need to wait until I'm done.  If this
> doesn't happen, you're going to get bad data.

Yes, not having commit/rollback exposes the risk of selecting a
summation column that holds bad data before an update is complete.
This is the second critical region. If this process *had* to be run
during business hours, I would put a lot of effort into its efficiency.
As an alternative, instead of having the user query the stored
summation field, have the user execute a "select sum()". With properly
set indexes, you'd be amazed at how fast Mysql performs this. (Personal
experiences available upon request.)
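
Roughly like this (a sketch; the schema is made up):

    -- recompute the total on demand rather than reading a stored
    -- summation column that might be mid-update
    SELECT SUM(amount) FROM line_item WHERE order_ref = 5432;

    -- an index on the lookup column keeps the sum fast
    CREATE INDEX idx_line_item_order ON line_item (order_ref);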

>
>
> <snip>
>
> I fail to see what the "large amount of work" is.  Every time you do
> some kind of an update, that value needs to be stored somewhere, either
> to global storage on disk if you're not using transactions, or to a
> working copy in RAM if you do have transactions.

My readings have taught me that the rollback segment is stored on disk.
The "large amount of work" is all the copying of the status quo to the
rollback segment that takes place when an insert, update, or delete
statement is executed.

>  When you rollback,
> nothing gets written to disk, so a rollback is much more efficient than
> writing some data, realizing you screwed up, and writing the old data
> back (assuming that's even possible by this point).

Yes, I used to type "rollback" to undo changes when I made a mistake
with a where clause in an update statement at the command line.
I used it as a crutch. If I hadn't had it available, I'm sure I would have
selected the rows more carefully to build the where clause before
executing the update.
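
That is, something like this, with hypothetical tables:

    -- verify the where clause with a select first
    SELECT order_id, status
      FROM order_header
     WHERE customer_id = 1001 AND status = 'OPEN';

    -- only after eyeballing the result, run the update with the
    -- same where clause
    UPDATE order_header
       SET status = 'CANCELLED'
     WHERE customer_id = 1001 AND status = 'OPEN';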

> <snip>
> >
> > Every feature added to a DBMS has to take CPU cycles, and
> > Mysql has no features (well, few, but just enough). I would want
> > to dissect this comparison further.
>
> It is a well known fact that I/O is the primary factor in the
> performance of any centralized database system (in distributed databases
> the communication costs can outweigh I/O, but that isn't important
> here).

Yes, I could have worded my assertion about features taking CPU cycles
better. Actually, features take a lot of disk cycles too, especially
commit/rollback. But my point is to question the assertion that
Postgres with transactions is as fast as Mysql without them.

>  <snip>
>
> As I said before, some applications have no need for transactions, but
> claiming that transactions are bad across the board (as you initially
> did) is not a valid argument.

My claim is that commit/rollback transaction processing is unnecessary
and that it slows things down.

Bad across the board? I have since identified one critical region
where database consistency is at risk, and another critical region
where rollback/commit might be useful. But in either case, not having
rollback/commit is manageable. Also, it is my claim that commit/rollback
transactions are I/O expensive, and that not having the feature makes
database crashes less likely.

>  <snip>
> >
> > Having a temporary set of tables with which the user interacts could
> > be problematic. My vote is to design the database such that it maps
> > one-to-one to the customer's real world, and to have users interact with the
> > live set of tables.
>
> What if multiple users try to interact with the same live tables
> concurrently?  Now you've got some concurrency issues to deal with.
> While transactions aren't necessary for dealing with these, they can
> definitely be more efficient than just locking all the tables,
> performing the updates, and unlocking since there's the possibility that
> the updates do not conflict and that concurrency is safe.

It's OK for multiple users to interact with the same live table
concurrently, even the same row, even to make changes. The argument
against locks is that these multiple users are unlikely to mess up the
same column in a single row, because at that moment in time there is
only one right value for that column, and if two people update that
column at the same time with the right value, what's the harm?

>

<snip>