[Nottingham] Merging databases problem

Sat Nov 19 21:44:08 GMT 2005

On Saturday 19 Nov 2005 00:29, Martin wrote:
> David Aldred wrote:
> > On Friday 18 Nov 2005 17:43, Martin wrote:
> >>"The two databases are physically identical, just contain different
> >>data. The issue is that several fields are of user-defined type, which
> >>apparently don't just get copied with a SQL "insert into db1:table
> >>(x,y,z) select (x,y,z) from db2:table". Some really annoyingly long SQL
> >>may do the trick, except that these types contains lists of indefinite
> >>length, and from what we've experienced, if you screw up the transfer,
> >>you hose the *enter* table."
> >
> > Immediate thought: what's the field type going to be in the merged table?
> >   If you know what field type you're aiming at, can't you programatically
> > cast the existing data into the new format?
>
> They have C++ wrappers for doing various tricks for inserting new
> data... but:
>
> "We actually already have the C++ tools to insert/update, etc. (since
> that's how the data got in there in the first place). The hard part is
> getting the join to work (preserving relational constraints as we move
> joined tables from one db to another, losing original IDs in the
> process), and we're running into memory issues (keeping maps of old
> ids/new ids in memory). As well, we need to work in some parallelism to
> shrink the amount of time this all takes."
>
> This is all for a few billion entries and on old Sun Solaris kit.
>
> Any ideas to avoid a month or so of C++ hackings?

I was wondering if they could:

1.  Create a copy of each of the databases, using casts to get rid of the 
vagaries of the user-defined types;
2.  Run the merge using SQL on the copies - so that the data types being 
merged are standard;
3.  Do whatever is then needed to reconvert the merged data field back to 
whatever was user defined about them.

Step 2 should be able to preserve relational aspects, provided Step 1 respects 
existing links and IDs; if the relational bits are there at the end of stage 
2 and nothing silly is done to IDs in stage 3, then the result should be what 
they want.  

Um, I think! :-)
-- 
David Aldred