Data History & Attribution Subsystem - Table Design

Updated: 15 Jul 2002

This is an example of a physical table structure for recording Data History, or Lineage, one of the components of Data Quality. It also allows for Data Attribution, i.e. acknowledgement of who supplied the data (person and/or organisation). See also the further explanatory notes below. Comments and suggested improvements are invited.

Tips for using this page:

[ Notes ] [ Legend ] [ Updates ] [ Other Tables ] [ Proposal ] [ Fields ] [ Field definitions ] [ UISIC ] [ UIS ] [ ASF ]

                                                +------
                                                ¦
CA0000                                          ¦
Cave (for example) Master Table                 ¦
+---+-------------------------+----+ 1   »      ¦
¦227¦Cave ID                  ¦A10*¦¦»----------+
¦...¦etc etc other fields     ¦... ¦            ¦  
+---+-------------------------+----+            ¦  
Similar master table for other entities         ¦ 
which are recording data history.               ¦ 
                                                ¦
                                                ¦
CA0421                                          ¦  
Cave Updates (m-m link)                         ¦  
+---+-------------------------+----+ 0:n «      ¦  
¦227¦Cave ID                  ¦A10*¦¦«----------+  
¦421¦Update ID                ¦A10*¦¦»-----+    ¦  
+---+-------------------------+----+ 1 »   ¦    ¦  
This table enables links to all            ¦    ¦  
Update Batches for a Cave (shown), and     ¦    ¦ 
all Caves for an Update Batch (not shown). ¦    ¦  
It would be automatically added to by the  ¦    ¦  
program whenever a cave is added or        ¦    ¦  
updated during an Update Batch.            ¦    ¦  
                                           ¦    ¦
                                           ¦    ¦
XU0000                                     ¦    ¦  
Update batches                             ¦    ¦  
+---+-------------------------+----+ 1:1 « ¦    ¦  
¦421¦Update ID                ¦A10*¦¦«-----+    ¦
¦422¦Update - person ID       ¦A10 ¦L 2         ¦  
¦423¦Update - organisation ID ¦A9  ¦3           ¦  
¦424¦Update - date            ¦D   ¦2           ¦  
¦425¦Update - person          ¦A25 ¦1 3         ¦  
¦426¦Update - org code        ¦A3  ¦L 2         ¦  
+---+-------------------------+----+            ¦  
This table shows who (person, org, date) has    ¦
supplied this batch of primary data. It may not ¦
be the same org as operating the database site. ¦
The Update ID is auto-generated. One Update ID  ¦
(batch) could cover several entities.           ¦
(1) If not available via ID link 422.           ¦  
(2) Mandatory entry by user.                    ¦
(3) Auto-filled from Person or Org table.       ¦
                                                ¦  
                                                ¦
                                                ¦
CA0427                                          ¦  
Cave Key-in Sessions (m-m link)                 ¦  
+---+-------------------------+----+ 0:n      « ¦  
¦227¦Cave ID                  ¦A10*¦¦«----------+  
¦427¦Key-in ID                ¦A10*¦¦»-----+    ¦
+---+-------------------------+----+ 1  »  ¦    ¦
This table enables links to all            ¦    ¦  
Key-in Batches for a Cave (shown), and     ¦    ¦ 
all Caves for a Key-in Batch (not shown).  ¦    ¦  
It would be automatically added to by the  ¦    ¦  
program whenever a cave is added or        ¦    ¦  
updated during a Key-in Session.           ¦    ¦  
                                           ¦    ¦
                                           ¦    ¦
XK0000                                     ¦    ¦  
Data key-in sessions                       ¦    ¦
+---+-------------------------+----+ 1:1 « ¦    ¦
¦427¦Key-in ID                ¦A10*¦¦«-----+    ¦
¦428¦Key-in - person ID       ¦A10 ¦L 2         ¦
¦545¦Key-in - organisation ID ¦A9  ¦4           ¦
¦429¦Key-in - start date      ¦D   ¦4           ¦
¦449¦Key-in - end date        ¦D   ¦4           ¦
¦430¦Key-in - person          ¦A25 ¦1 3         ¦
¦546¦Key-in - org code        ¦A3  ¦5 6         ¦
+---+-------------------------+----+            ¦
This table shows who originally keyed in the    ¦
primary data. It may not be the same org as is  ¦
operating the current database site. Where      ¦
keyed data is imported from another             ¦
site, the original Key-in ID should be          ¦
retained and a new one not generated.           ¦
The Key-in ID is auto-generated.                ¦
One Key-in ID (batch) could cover several       ¦
entities and several Update batches.            ¦
(1) If not available for display via            ¦
    ID link 428.                                ¦
(2) Mandatory entry by user.                    ¦
(3) Auto-filled from Person table.              ¦
(4) Auto-filled by system.                      ¦
(5) If not available for display via            ¦
    ID link 545.                                ¦
(6) Auto-filled from Org table.                 ¦
                                                ¦
                                                ¦
CA0450                                          ¦
Cave Uploads(m-m link)                          ¦
+---+-------------------------+----+ 0:n      « ¦
¦227¦Cave ID                  ¦A10*¦¦«----------+
¦450¦Upload ID                ¦A10*¦¦»-----+  
+---+-------------------------+----+ 1  »  ¦  
This table enables links to all            ¦  
Upload Batches for a Cave (shown), and     ¦ 
all Caves for an Upload Batch (not shown). ¦  
It would be automatically added to by the  ¦ 
program whenever a cave is included in     ¦ 
an Upload Batch.                           ¦ 
                                           ¦  
                                           ¦  
XL0000                                     ¦  
Uploads                                    ¦  
+---+-------------------------+----+ 1:1 « ¦  
¦450¦Upload ID                ¦A10*¦¦«-----+  
¦451¦Upload - person ID       ¦A10 ¦L 2
¦452¦Upload - source org ID   ¦A9  ¦3
¦543¦Upload - target org ID   ¦A9  ¦3
¦453¦Upload - date            ¦D   ¦3
¦454¦Upload - person          ¦A25 ¦1
¦455¦Upload - source org code ¦A3  ¦4 3
¦544¦Upload - target org code ¦A3  ¦L 2
+---+-------------------------+----+
This table shows the who, when and where
of a data transfer between two systems. It
may be an upload to a more central system
or it may be between peer databases.
The Upload ID is auto-generated by the
source system. One Upload ID (batch) can 
include several Update IDs and entities.
(1) If not available via ID link 451.
(2) Mandatory entry by user.
(3) Auto-filled by system.
(4) If not available via ID link 452.

»-----------------------------------------------+
                                                ¦
CA0431                                          ¦
Cave (for example) latest data history          ¦
+---+-------------------------+----+ 1:1 «      ¦  
¦227¦Cave ID                  ¦A10*¦¦«----------+ 
¦447¦Data awaits uploading    ¦A1  ¦4           ¦  
¦ 76¦Latest update by         ¦A40 ¦3 4         ¦  
¦431¦Latest update ID         ¦A10 ¦» 2         ¦  
¦432¦Latest update - person   ¦A25 ¦1           ¦  
¦433¦Latest update - org code ¦A3  ¦1           ¦  
¦434¦Latest update - date     ¦D   ¦1           ¦  
¦435¦Latest key-in ID         ¦A10 ¦» 2         ¦  
¦436¦Latest key-in - person   ¦A25 ¦1           ¦  
¦547¦Latest key-in - org code ¦A3  ¦1           ¦  
¦437¦Latest key-in start date ¦D   ¦1           ¦  
¦448¦Latest key-in end date   ¦D   ¦1           ¦
¦456¦Latest upload ID         ¦A10 ¦» 2         ¦  
¦457¦Latest upload - person   ¦A25 ¦1           ¦  
¦458¦Latest upload source code¦A3  ¦1           ¦  
¦548¦Latest upload target code¦A3  ¦1           ¦  
¦459¦Latest upload - date     ¦D   ¦1           ¦  
+---+-------------------------+----+            ¦  
This optional cave table shows only the latest  ¦
updates for a cave. It is read-only, all values ¦
having been auto-filled by the system.          ¦
If data history is being kept via the other     ¦
tables then this table is not necessary, and    ¦
field 447 would be held instead in the Cave     ¦
master table. If data on only the latest updates¦
is being kept, then omit the other tables and   ¦
keep it here. If minimal latest-only data is    ¦
being kept, then store only fields 447 and 76.  ¦
If desired, all candidate fields for this table ¦
can be stored instead in the Cave master table  ¦
and this table omitted altogether. Other        ¦
entities would be similar to all the above.     ¦
(1) If not available via associated ID link.    ¦  
(2) If the many-to-many link tables are not     ¦
    being used by the Cave's master table,      ¦
    then this field could 1:1 link to its resp  ¦  
    history entity master table to avoid the    ¦
    need to store the associated following      ¦
    fields here.                                ¦
(3) Old ASF field - to be replaced by 431 etc.  ¦  
(4) Put in master table if this table is not    ¦
    present. This flag is set as soon as any    ¦
    addition or update is done to this Cave     ¦
    since the last Upload, and reset after the  ¦
    Upload.                                     ¦  
                                                ¦
                                                ¦
CA0460                                          ¦
Field value qtys updated per upload per org.    ¦
+---+-------------------------+----+ 0:n      « ¦  
¦227¦Cave ID                  ¦A10*¦¦«----------+
¦450¦Upload ID                ¦A10*¦
¦460¦Field ID updated         ¦S*  ¦
¦519¦Updating organisation ID ¦A9* ¦
¦461¦Values qty before upload ¦S   ¦
¦462¦Values qty after upload  ¦S   ¦
¦463¦Qty of new/changed values¦S   ¦
+---+-------------------------+----+
This table keeps the history of the 
quantity of updates for each field. 
It is also used in calculating the 
total update count for an organisation 
at any given time. It could be the 
result of several Update Batches by 
that organisation in the Upload Batch. 
This View shows the fields updated and 
the organisation for a given cave and
Upload. It may be necessary to limit 
an Upload to Updates from only one 
organisation (see main Notes). This 
read-only table exists only in the 
central db. All its fields are 
auto-filled by the system.



The tables below enable linking between
Updates, Key-ins and Uploads. For example
they would enable you to find all the
Update Batches included in a particular
Upload. These tables would be 
automatically added to by the data-entry
program during normal operations.

XL0421
Updates per Upload.
+---+-------------------------+----+
¦450¦Upload ID                ¦A10*¦
¦421¦Update ID                ¦A10*¦
+---+-------------------------+----+


XL0427
Key-ins per Upload.
+---+-------------------------+----+
¦450¦Upload ID                ¦A10*¦
¦427¦Key-in ID                ¦A10*¦
+---+-------------------------+----+


XK0421
Updates per Key-in
+---+-------------------------+----+
¦427¦Key-in ID                ¦A10*¦
¦421¦Update ID                ¦A10*¦
+---+-------------------------+----+

The table below is an alternative 
location for 447[Data awaits uploading] 
when updated status is being monitored
on a per field basis rather than on the
per cave basis shown above.

CA0447                                
Cave (for example) field update status
+---+-------------------------+----+ 
¦227¦Cave ID                  ¦A10*¦
¦ 82¦Field ID                 ¦S*  ¦
¦447¦Data awaits uploading    ¦A1  ¦
+---+-------------------------+----+

Notes:

General: Data History
Data Attribution Quantification

While it is quite easy to record who supplied data for a cave and when, it is more difficult to numerically quantify just how much updating can be attributed to various organisations or individuals. This quantification may be needed if some kind of updating "score" is being kept to encourage competition in updating among clubs, or if there is a need to distribute any revenue from data licensing or publication in an equitable way. Below are some points concerning this subject. Where "organisation" is mentioned, an individual who supplies data independently is also assumed:


Legend:

Columns shown:
1. Standard UIS Field ID (FID).
2. Field name.
3. Data type.

Codes:
* = A primary key. 
L = A lookup table is used to set this field's values.
R = Right-justified.
» = This field provides a link to another entity, i.e. is a "foreign key".
x = This field is not yet included in the Aust Karst Index.
- The names above the table descriptions are example filenames for the 
  tables using max 8 chars.
- Any numbers to the right of a field refer to the notes below that table.

Table linkages (respective instances in detail table per instance in master table):
0:1 = from 0 to 1 instances
0:n = from 0 to many instances
1:1 = always 1 instance
1:n = at least 1 instance
The vertical bar(s) at each end of a link show the actual fields linked.

Data types:
S    = Short integer, up to 32,768.
N    = Numeric, decimals allowed.
Annn = Alphanumeric up to 255 chars long (A1-A255).
D    = Date.
M    = Memo (variable length free text).
B    = BLOB (Binary Large Object, e.g. a photo image).
[
Top ] [ Tables ]

Updates:

15-Jul-2002 7-Jun-2002. Initial release.
Copyright © 2002 Peter Matthews. May be reproduced for non-commercial use provided this copyright paragraph is retained. This ongoing development of field definitions etc is being conducted by the Informatics Commission (UISIC) of the International Union of Speleology (UIS). The Australian Speleological Federation's (ASF) national Karst Index Database has been used as a pilot.
Page address: http://www.uisic.uis-speleo.org/exchange/histables.html
P. Matthews. Email: matthews@melbpc.org.au