[Gllug] A programming question. long.

Bruce Richardson brichardson at lineone.net
Thu Jul 5 21:37:56 UTC 2001


On Thu, Jul 05, 2001 at 04:59:23AM -0700, David Irvine wrote:
> I was talking to Bruce last night about the small
> predicament I am in - I have a database with about 3
> million rows in the main table.  There are about 100-150
> users accessing this data, and they want to copy the top
> row off the table and into the application and mark it
> as dealt with.
> 
> open connection
>  select the top 1 * from the table where the date they
> should be called by is in the past.
> copy data
> delete row
> close connection
> 
> then if for example we need to call them again, we
> just insert the row back in with a suitable date, for
> example in a week's time.
>
> What I need to do is make sure that nobody gets the
> same row, which can only happen if somebody accesses
> the table with a select statement before the previous
> user deletes the row from the table.
> 
> so we need some sort of locking. Enter the spin lock.

No, you don't.  You need server-side rules.  Whenever your data *has* to
behave in a certain way, you use server-side rules.  That way the
database itself guarantees the integrity of the data *before* passing it
on to the client app, and all the application has to do is present it.

One way to do it is with triggers.  If you attach a trigger to a table
then it is run for each row that is selected/deleted/updated (whichever
event you tied the trigger to).  A select trigger affects what data (if
any) a row returns to the select query - so a row can decide, for
example, to hide itself and not be seen.

That offers one solution, but a wasteful one: we only ever care about
one row at a time, yet a trigger on the main table runs once for every
record affected by every query against it.

I think the simplest way to do this would be with a view.  Don't show
them the real table; let them query a view instead.  A view looks like a
table but actually runs a select query (which could pull data from many
different tables, of course).  So when they look at the view, you can
run a select query which shows them only the one relevant record.

Here's a way to do it based on the outline you have given:

Add a "locked" column to the underlying table and have the view's select
query specify "WHERE locked = false".
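
To sketch that out in SQL (PostgreSQL-flavoured here, and the table and
column names - callbacks, call_by and so on - are just invented for the
example; LIMIT 1 is the "top 1" you mentioned):

    -- The real table: one row per customer waiting to be called.
    CREATE TABLE callbacks (
        id        SERIAL PRIMARY KEY,
        customer  TEXT NOT NULL,
        call_by   DATE NOT NULL,                  -- call them by this date
        locked    BOOLEAN NOT NULL DEFAULT FALSE  -- claimed by an operator?
    );

    -- What the client apps actually query: at most one due, unlocked row.
    CREATE VIEW next_callback AS
        SELECT id, customer, call_by
          FROM callbacks
         WHERE locked = false
           AND call_by <= CURRENT_DATE
         ORDER BY call_by
         LIMIT 1;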

Attach a select trigger to the view.  It is important to realise that
the trigger will apply to the data the view *returns* to a select query
done on the view, and is totally separate from the select query run to
"create" the view contents.  So long as the underlying select query is
guaranteed to return a maximum of one record - the top 1, as you
specified - then the trigger will only run once (only one record, and
the right record at that).

Now, this trigger could examine the data returned, go to the original
table and mark the corresponding record as "locked" - so that record
will *not* be returned when anybody else looks at the view.

To *guarantee* that no two people see the same record, you need to run
this inside a transaction and perform an exclusive table lock - this is
the only way to ensure that concurrent accesses don't return the same
record (by blocking concurrent access).
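
Spelled out as plain SQL, the claim-a-row step then looks something like
this.  (Trigger syntax varies a lot between engines, and plenty of them
don't let you hang a trigger off a select at all, so take this as the
shape of the thing rather than the exact plumbing; it also locks the
underlying table, since whether a view itself can be locked depends on
the engine.)

    BEGIN;

    -- Claimers queue up behind one another here; nobody else gets past
    -- this point until we commit.
    LOCK TABLE callbacks IN EXCLUSIVE MODE;

    -- Read the one due, unlocked row (if any) through the view.
    SELECT id, customer, call_by FROM next_callback;

    -- Mark it as taken so the view stops showing it to everyone else.
    -- The 42 stands in for the id the select just returned; a real
    -- client (or trigger) would plug that value in.
    UPDATE callbacks SET locked = true WHERE id = 42;

    COMMIT;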

That's it, basically.  You *could* do the above in the underlying table
- here's why not:

1.  Separating the functions makes the logic easier.  For example,
because the view's select query (if properly constructed) either returns
the one correct record or nothing, the trigger is only run once and
doesn't have to do any validation.

2.  If you put the locking trigger into the underlying table then it
runs every time *any* process selects from the table.  By placing it in
the client view, you guarantee that it only runs at the appropriate
point.

3.  It means you can hide the underlying table from the client - or at
least not give them update permissions - and only give them access to
the view.

4.  Very importantly - depending on the engine, you may be able to take
the exclusive lock on the *view* rather than on the whole underlying
table.

The method shown above isn't complete, though.  Consider: the selected
record has to remain locked until a record is successfully inserted or
the insert fails - and the table lock has to be held all that time as
well.  So you should drive this *from* the insert operation.  Put an
insert trigger on the target table.  In most engines a trigger runs as
part of the insert's own transaction, so it can carry out the whole of
the method above; as soon as valid data comes back, the insert commits
and the lock is released.
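
As a sketch of that wiring, again PostgreSQL-flavoured and with invented
names (calls_made stands in for whatever "dealt with" table the
operators insert into):

    -- Hypothetical table the operators insert into once a call is done.
    CREATE TABLE calls_made (
        callback_id  INTEGER,
        operator     TEXT NOT NULL,
        outcome      TEXT
    );

    CREATE FUNCTION claim_callback() RETURNS trigger AS '
    BEGIN
        -- Serialise claimers for the duration of this insert.
        LOCK TABLE callbacks IN EXCLUSIVE MODE;

        -- Grab the one row the view is currently showing.
        SELECT id INTO NEW.callback_id FROM next_callback;
        IF NEW.callback_id IS NULL THEN
            RAISE EXCEPTION ''nothing due to be called'';
        END IF;

        -- Hide it from everyone else and let the insert go ahead.
        UPDATE callbacks SET locked = true WHERE id = NEW.callback_id;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER claim_callback
        BEFORE INSERT ON calls_made
        FOR EACH ROW EXECUTE PROCEDURE claim_callback();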

Let me give you an example.  Suppose I were writing an application with
a similar mechanism - an application which let people book appointments
with a dentist.  I would allow them to insert records into the
appointments table, putting in their details etc, either a blank or a
suggested time, and a boolean "I can/can't accept a different time"
field.  An insert trigger would then check through the assigned
appointments and either put a date into the date field or return an
error (no valid available dates), cancelling the insert.
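
In the same sketchy, PostgreSQL-flavoured style (everything here - the
table, the columns, the "an hour after the last booking" fallback - is
made up purely to show the shape of it):

    CREATE TABLE appointments (
        id           SERIAL PRIMARY KEY,
        patient      TEXT NOT NULL,
        wanted_time  TIMESTAMP,          -- blank or a suggested time
        flexible     BOOLEAN NOT NULL,   -- can they accept another time?
        booked_time  TIMESTAMP           -- filled in by the trigger
    );

    CREATE FUNCTION book_appointment() RETURNS trigger AS '
    BEGIN
        -- Give them the slot they asked for if it is still free.
        IF NEW.wanted_time IS NOT NULL AND NOT EXISTS
               (SELECT 1 FROM appointments
                 WHERE booked_time = NEW.wanted_time) THEN
            NEW.booked_time := NEW.wanted_time;
            RETURN NEW;
        END IF;

        -- Their slot is taken (or they gave none) and they will not
        -- accept another time: refuse, which cancels the insert.
        IF NOT NEW.flexible THEN
            RAISE EXCEPTION ''no valid available dates'';
        END IF;

        -- Stand-in policy: an hour after the latest existing booking.
        SELECT coalesce(max(booked_time), localtimestamp)
                   + interval ''1 hour''
          INTO NEW.booked_time
          FROM appointments;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER book_appointment
        BEFORE INSERT ON appointments
        FOR EACH ROW EXECUTE PROCEDURE book_appointment();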

The advantage of doing it that way, inside the database itself, is that
no errors in any client application can mess up the appointments - only
my database code can mess things up ;)  It means the application doesn't
need to worry about how valid appointment times are decided or anything
else about how bookings are organised.  All the client app has to do is
insert a record into a table.  It then examines the inserted record to
see which date/time was allocated.  If it doesn't like it, it can always
delete the record.

It also means that you can have different client apps - a web interface,
a C++-coded GUI app, whatever - but only write the "business" logic
once.  Otherwise you have to duplicate the business logic in each
different application, dramatically multiplying the chances of a fsck
up.

In my opinion, databases take care of data.  Applications just present
it.  As much of the business logic as possible should go into the
database (only break that rule if performance absolutely depends on it).

For example, the insert trigger in my dentist appointment table could
also add an entry to a log table and update relevant details in that
patient's record.  A delete trigger on the same table could take care of
most of the bookkeeping required when an appointment is cancelled -
including finding the patients at the head of the waiting list who could
be offered the free space.

This is not just the stingy POV of a suspicious database coder.  I'm an
application coder as well - but when I have my database hat on I don't
trust me-the-application-coder any more than I trust anyone else.  My
databases don't let my applications make complex alterations - my
applications perform simple operations and then check error codes and/or
relevant tables to see what happened.  Then they present the result.

Applications *certainly* shouldn't be trying to lock data (in any
serious sense).  If you do it that way and the connection fails or the
application crashes then you have a potentially disastrous mess left
hanging in the data.  It's as dangerous an approach as allowing airline
pilots to veto the actions of air traffic controllers.

It's also silly.  The database-engine developers spend all their time
designing robust locking mechanisms etc.  There's no need to waste time
duplicating their efforts.

-- 
Bruce

Bitterly it mathinketh me, that I spent mine wholle lyf in the lists
against the ignorant.  -- Roger Bacon, "Doctor Mirabilis"