[Gllug] A programming question. long.

Bruce Richardson brichardson at lineone.net
Fri Jul 6 17:48:58 UTC 2001


On Fri, Jul 06, 2001 at 03:53:34PM +0100, home at alexhudson.com wrote:
> On Fri, Jul 06, 2001 at 12:19:01PM +0100, Bruce Richardson wrote:
> > > I don't necessarily agree with 'exclusive table lock' (bit of a waste for
> > > one row!)
> > 
> > Which is why I said the lock should be placed on the view, not the
> > underlying table. 
> 
> But then you're locking the view, not the data - and the data is still
> exposed (which is fine, so long as everything is using the view..).

That is the *whole point*.  You have one or more admin uids and give
them access rights to the tables but client processes connect using
other uids which *only* have access to the views.  They can't see the
tables and so the tables are *not* exposed.  I really don't see your
problem here.  Sure, someone who has access to all the database
passwords could write their own app which ignored the proper procedures
and updated anything they felt like at any time.  So make sure that the
people who do have access to the passwords do their jobs properly.
That's a management issue.
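To sketch what I mean (all the table, view and user names here are invented, and this is modern PostgreSQL syntax rather than anything version-specific):

```sql
-- Admin uids get the tables; client uids get only the view.
REVOKE ALL ON appointments FROM PUBLIC;
GRANT ALL ON appointments TO admin_user;

CREATE VIEW appointments_view AS
    SELECT * FROM appointments;

-- The client uid can read and write the view but cannot even
-- select from the underlying table.
GRANT SELECT, INSERT ON appointments_view TO client_user;
```

This works because a view runs with the privileges of its owner, so the client uid never needs rights on the table itself.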

> be honest, you don't need a view & set of triggers, because the standard SQL
> row locks are all you need.

No, row locks are *not* enough.  Go look at David's description of his
problem.  The way he is doing it he needs to know that *only* one client
is *looking* at the row.  The only way to do that is to lock the *table*
before running the select query.  It's the only way to ensure that
another select query is not also doing the same thing you are.

That's why my example uses a view (apart from all the other perfectly
good reasons which I listed).  It means that the database can apply one
procedure to *client* access and data entry but allow administrative
processes to do their own thing.

I go into greater detail on this further down.

> 
> > > successfully, which is silent data loss. Ick. It's up to the application,
> > > not the database, to remove items from the store.
> > 
> > No, it's the database's job IMO.
> 
> Ah :) Age old database disagreement. IMO, business logic has no place in the
> database, for one simple reason - the database can't maintain consistency.

Code is code is code.  If the server-side programming language can
implement a function, why is that invalid?  Perhaps you could actually
say what is *wrong* with my appointments example?  Where, precisely,
does it fuck up the data or fail to ensure its consistency?

> The database can tell you for sure that transactions proceeded okay, and
> that the data fulfils a set of rules, but can't make the data obey those
> rules by itself. 

It can validate everything that comes in and reject or modify anything
that is inappropriate.  It can also implement a whole set of procedures
by using views and cascading triggers - step 1 causes step 2 causes step
3 etc.  An application can then do one simple insert/update/select and
know that either steps 1,2,3 etc will follow on or an error will be
returned.
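Here is one way that chain might look (names invented; I'm using modern PostgreSQL's INSTEAD OF triggers on views - in the PostgreSQL of the day you would express the same thing with rewrite rules):

```sql
-- Hypothetical chain: one insert on the view fires steps 1-3,
-- and the client sees either complete success or an error.
CREATE FUNCTION book_appointment() RETURNS trigger AS $$
BEGIN
    -- Step 1: validate the incoming row.
    IF NEW.slot_time < now() THEN
        RAISE EXCEPTION 'cannot book a slot in the past';
    END IF;
    -- Step 2: write the real record.
    INSERT INTO appointments (client_id, slot_time)
        VALUES (NEW.client_id, NEW.slot_time);
    -- Step 3: log the operation.
    INSERT INTO booking_log (client_id, slot_time, logged_at)
        VALUES (NEW.client_id, NEW.slot_time, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER booking_chain
    INSTEAD OF INSERT ON appointments_view
    FOR EACH ROW EXECUTE FUNCTION book_appointment();
```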

> 
> > In my example the select trigger does something essential - marking
> > the selected record "don't touch".
> 
> Which is what I don't understand - you seem to be using table-locking to
> implement row-locking, which might be fair enough for MySQL (which doesn't
> support row-level locking), but not great for those dbs which do.

FWIW the examples I am giving are based on PostgreSQL.  Look, David has
asked for something *beyond* normal row locking.  He wants to
*guarantee* that *only* one client process can *see* that row (I would
implement it a different way but this is what he asked for).  Row-level
locking can prevent two processes updating a row simultaneously but
can't stop two select queries reading the same row.  Consider how it
would work without a table lock: one client process starts a transaction
and runs a select query with no table lock.  Just as that query starts,
a second client process kicks in.  *Neither* process can do anything to
affect what the other process can see.

David's outline specifies that a client process grabs the top row from a
query and then runs with it.  For this to work without errors, no other
client process may grab that row.  So we need to stop other client
processes - that is, those which are also trying to perform the same
operation - from seeing this row.  So:

1.  I use the views.  Any client application wanting to carry out this
operation should insert a record into the appointments view.   That way
the triggers and locks used to implement this operation *only* apply to
connections/processes which are explicitly trying to perform this
operation.  Administrative processes or client processes which simply
want to read the tables (if you want to allow them read access) are
*completely* unaffected.

2.  An exclusive lock is applied to the select view - just the view,
mind you, not the tables underneath.  This only blocks other client
processes which are trying to perform the same operation and only until
a record has been found and marked "in use".  As soon as the row has
been updated the lock is released - so multiple processes *can* perform
this operation concurrently.  But now the row is marked "in use" and all
other client processes performing this operation will ignore that row.
So long as the table is properly indexed, the view will not be locked
for very long.
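The sequence a claiming process runs would look roughly like this (names invented; and note that in current PostgreSQL a LOCK on a view actually takes the lock on the view's underlying tables, so read this as a sketch of the intent rather than a guarantee of any particular version's behaviour):

```sql
BEGIN;
-- Serialise only the processes performing this operation.
LOCK TABLE pending_view IN EXCLUSIVE MODE;

-- Grab the top unclaimed row.
SELECT id FROM pending_view
 WHERE in_use = false
 ORDER BY priority DESC
 LIMIT 1;

-- Mark it so other claimers skip it.
UPDATE pending_view SET in_use = true
 WHERE id = :claimed_id;  -- the id returned by the select

COMMIT;  -- lock released; other claimers now ignore this row
```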


There is another way to do it which doesn't use table locking.  Run the
select with no table lock and rely on row-level locking when updating
the "in use" field of the desired row.  You then add an update trigger
to the underlying table which raises an exception if a process tries to
set the "in use" field to "in use" when it already has that value.  It
would then be OK for multiple processes to see the same row, because
only one would be allowed to claim it - the others would have to repeat
the select/update cycle until no error is returned.
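A sketch of that trigger (again, the table and field names are invented):

```sql
-- Hypothetical: reject any attempt to claim an already-claimed row.
CREATE FUNCTION forbid_double_claim() RETURNS trigger AS $$
BEGIN
    IF NEW.in_use AND OLD.in_use THEN
        RAISE EXCEPTION 'row % is already in use', OLD.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Note: this fires on every update to the table, whichever
-- field was changed - the overhead mentioned below.
CREATE TRIGGER no_double_claim
    BEFORE UPDATE ON pending_jobs
    FOR EACH ROW EXECUTE FUNCTION forbid_double_claim();
```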

There are two problems with this:

1.  There is no way to assign a pecking order here.  Unfortunate timing
could mean one process stuck in a loop because each time it fails, looks
for a new row and tries to claim it, it is beaten to the row by another
process.  The best you can do in that situation is return an error after
so many attempts.

David's database has enough users for this to be a real threat.

2.  It means adding a trigger to the underlying table - a trigger which
will also be run when *any* process updates *any* field in the row.
That's an unnecessary overhead in this case.

Of course, the details of David's application may mean that this
approach has other benefits which offset these disadvantages.  But my
initial method does exactly what he requested with minimal impact on any
other database processes.

> 
> > This way the database takes care of all data validation tasks. 
> 
> The database has no idea about data validation; business logic should be in
> the software. For example, let's say that a banking application needs to
> update a number of accounts at once (a transfer, perhaps). It's up to the
> application to specify that the account debited and the account credited
> must be done so within one transaction.

Yes, but if this is a common operation then the database can provide a
ready-made set of views and triggers to perform it.  If I were designing
that database and I knew this to be a common operation then I could
provide a credit-transfer view.  To perform a credit transfer you insert
a row with the source account ID in one field, the destination account
ID in another field and the amount in a third.  An insert trigger on the
view can then update the relevant accounts - or return an error.
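As a sketch (account and log table names invented, modern PostgreSQL syntax, and no pretence that a real bank's schema is this simple):

```sql
-- Hypothetical transfer view over a log table; inserting one row
-- performs the whole transfer or raises an error.
CREATE VIEW transfers AS
    SELECT from_account, to_account, amount FROM transfer_log;

CREATE FUNCTION do_transfer() RETURNS trigger AS $$
BEGIN
    UPDATE accounts SET balance = balance - NEW.amount
     WHERE id = NEW.from_account;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'no such account %', NEW.from_account;
    END IF;
    UPDATE accounts SET balance = balance + NEW.amount
     WHERE id = NEW.to_account;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'no such account %', NEW.to_account;
    END IF;
    -- The trigger logs every transfer; the teller application
    -- can neither prevent this nor see the log.
    INSERT INTO transfer_log (from_account, to_account, amount, logged_at)
        VALUES (NEW.from_account, NEW.to_account, NEW.amount, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transfer_chain
    INSTEAD OF INSERT ON transfers
    FOR EACH ROW EXECUTE FUNCTION do_transfer();
```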

The security offered by this method is *particularly* appropriate to
banking. ;)  I can give the "teller" account write access to the view but
not to any of the underlying data.  Then some other person - I don't
care who they are or whom they work for - can write an application for
the bank tellers' terminals and I know they can only perform the
operations I have given them access to.  Their application can be buggy
and clumsy but it can't access data it shouldn't in any way.  My trigger
can also log all transfers and attempted transfers without them knowing
or being able to do anything about it.

> It's not for the database to attempt
> to work that out - the data within the database has meaning, and it's the
> application that determines that meaning. The database can't hold the debit
> transaction open expecting a credit transaction for the same amount - that
> requires a priori knowledge of the type of transaction. 

Which you do have, for known, familiar operations.  Any new operations
have to be explicitly added in - but that's the way it should be.  In a
bank you absolutely cannot have third parties arbitrarily adding
procedures to your accounts database.

As I said before, when I'm designing a database - even when I will be
the person coding the app that uses the database - I regard the
application coder with the same suspicion I would the cowboy consultant
writing the bank tellers' application.  I don't cut myself any more slack
I would him.  He can do his own data validation on the application side
- he's positively welcome to reject incorrectly filled-out data forms,
it eases the load on my database.  But my database will check anything
he passes to it and only let him see/do what he's supposed to.  Not one
thing more.

> The database may very well be happy that the data has been correctly stored
> (as it may have been), but if it crashed at the wrong time (between two
> account transactions, for example) even though the transaction log would
> roll-back part of the data, the data itself might not be self-consistent.
> The database guarantees only that it stores the data as you ask; the
> application needs to make sense of the data itself.

Not in my bank.  My bank is a big bank and there may be many
applications, written by a series of different people/companies,
accessing my data.  My database provides the procedures - the
applications present choices to the user and then request the operation
corresponding to the user choice.  My database checks that the
user/application has authority to perform the operation, checks that the
given parameters are valid, performs the operation, logs it and returns
the results - or that subset of the results which the user/application
is authorised to see.

(When I design a database, you will have gathered, I am both paranoid
and megalomanic - not to mention schizophrenic, if I am also the
application designer.)

In banking, most business logic is completely independent of the
application.  If a customer of a certain credit rating is only allowed a
particular overdraft limit then that applies no matter which application
is trying to debit the account.  In my bank there is only one way to
debit the account and that is for your application to ask my database to
do it.  Nicely.

Yours from the pillbox in front of the rack,

-- 
Bruce

A problem shared gives the consolation that someone else is now
feeling as miserable as you.

