Because the same database software or structure is not needed at each end, an exchange standard will facilitate:
Below is UISIC's opening proposal. Each aspect in turn will be presented for discussion via the CaveData Internet mailing list, prior to emailing (or posting) the results to delegates for comment and later for voting.
If you do not have an Internet connection you will not
be able to take part fully in this discussion phase.
However, if you have views on this matter, you can still
supply input by sending it to me (if possible in plain text
on a diskette) for incorporation into the discussion.
Official UISIC delegates who do not have Internet will
still receive the final drafts by post for comment, and
later will receive the final material for voting.
PROPOSAL
Requirements to allow data exchange
The following three requirements must be met to allow the
valid transfer, comparison and/or consolidation of cave/karst
data between independent databases:
1. Globally unique record identifiers
2. Standard field definitions
3. A standard transfer format
It is not required that the same software or database
structure be used at each end of the transfer.
We now look at these three requirements in more detail:
1. Record Identifiers
The record identifiers (database keys) should be constructed as
follows to conveniently achieve uniqueness:
aabbbnnnnn
For example: AUVSA00035
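where:
aa = two-letter country code (AU in the example)
bbb = three-letter organisation code (VSA in the example)
nnnnn = fixed-length serial number, left-filled with zeros
(00035 in the example)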
Once created for a record, the identifier should never be
changed, regardless of where the record travels, or what
has happened to the original organisation, or which
organisation is currently looking after the master copy
of the record.
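As a minimal sketch of how such an identifier might be split into
its parts, the following Python assumes that "aa" is a two-letter
country code, "bbb" a three-letter organisation code and "nnnnn" a
zero-filled serial number, as the later discussion of record
identifiers suggests. The names RecordId and parse_record_id are
illustrative only and are not part of the proposal.

```python
import re
from typing import NamedTuple


class RecordId(NamedTuple):
    country: str       # assumed: two-letter country code ("aa")
    organisation: str  # assumed: three-letter organisation code ("bbb")
    serial: int        # serial number ("nnnnn"), stored without padding


# Assumption: upper-case letters for the codes, digits for the serial number.
_PATTERN = re.compile(r"([A-Z]{2})([A-Z]{3})([0-9]+)")


def parse_record_id(text: str, serial_length: int = 5) -> RecordId:
    """Split an identifier such as AUVSA00035 into its components."""
    match = _PATTERN.fullmatch(text)
    if match is None or len(match.group(3)) != serial_length:
        raise ValueError(f"not a valid record identifier: {text!r}")
    country, organisation, digits = match.groups()
    return RecordId(country, organisation, int(digits))


print(parse_record_id("AUVSA00035"))
```

The serial_length parameter is there because the serial number
length differs between entity types.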
2. Field Definitions
When the field names and field values of international
definitions are actually being used, they will need to be
expressed in various human languages. Language-independent
numeric codes are therefore used wherever
possible as a common reference to the field name or
field value regardless of the language currently being
used.
Field names: Each field name is represented by a
simple numeric integer such that a given field with a
particular meaning has the same numeric code
regardless of the language in which its name and
definition are expressed. For example, a Field ID of "7"
could have the name "Rock type" when expressed in
English.
The field names themselves are recorded in two fields -
one for normal usage and having a length of 25
characters, and another to suit some early database
systems and having a length of 10 characters.
Field values: Each field value is, wherever
possible, represented by a simple numeric integer code such that
a given field value with a particular meaning has the
same numeric code regardless of the language being
used. For example, a Field Value of "26" in Field 7
(Rock Type) could translate to "sandstone" when
expressed in English, or "Sandstein" when expressed in
German.
Where commonly accepted local field values or codes
already exist for a field which has only local
significance, for example, "Geological Bed Names" or
"Parish", then these local codes should be used, but the
meanings will then need to be transferred, along with the
data, in any data exchange.
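The idea of language-independent codes can be sketched in Python as
follows. This is an illustration only: the table layout, the
language tags "en" and "de", and the function names are assumptions,
and only the labels quoted in the text above are filled in.

```python
from typing import Dict, Tuple

# Hypothetical label tables; only the entries quoted in the text are present.
FIELD_NAMES: Dict[int, Dict[str, str]] = {
    7: {"en": "Rock type"},
}
FIELD_VALUES: Dict[Tuple[int, int], Dict[str, str]] = {
    (7, 26): {"en": "sandstone", "de": "Sandstein"},
}


def field_name(field_id: int, language: str) -> str:
    """Return the field name in the given language, or the bare code."""
    return FIELD_NAMES.get(field_id, {}).get(language, str(field_id))


def value_label(field_id: int, value_code: int, language: str) -> str:
    """Return the label of a coded field value, or the bare code."""
    labels = FIELD_VALUES.get((field_id, value_code), {})
    return labels.get(language, str(value_code))


# The stored record holds only codes; labels are looked up for display.
record = {7: 26}
for language in ("en", "de"):
    for field_id, value_code in record.items():
        print(language, field_name(field_id, language),
              value_label(field_id, value_code, language))
```

Because the record itself stores only the codes 7 and 26, it can be
exchanged unchanged between databases working in different
languages.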
3. Transfer format
When transferring data between different databases,
UIS's standard transfer format should be used (Name:
Karstcom? InterKarst? ...). This format will use only
standard ISO text characters, and will be independent of any
database software or structure. Therefore any database
system needs only to be able to translate to or from this
one common intermediate format in order to exchange
data with any other co-operating database system.
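The benefit of one common intermediate format can be shown with a
little arithmetic. The sketch below counts one-way translators under
the assumption that, without a common format, every database would
need a separate translator to each of the others; the function name
is illustrative.

```python
def translators_needed(databases: int) -> tuple:
    """Return (pairwise, via_common_format) counts of one-way translators."""
    pairwise = databases * (databases - 1)   # every system to every other
    via_common_format = 2 * databases        # one export and one import each
    return pairwise, via_common_format


for n in (2, 5, 10):
    print(n, translators_needed(n))
```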
Entity List
The lengths given in the entity list (the table of entity
codes in this document) should be used for the
fixed-length serial number component in the record IDs
of the respective entities. Note that the serial number
need only be large enough to allow for the maximum
number of records for that entity generated by the one
organisation, not for the quantity of records stored at any
one site; this is because any duplicate serial numbers will
be distinguished by the originating country+organisation
code.
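The relation between serial number length and capacity can be
sketched as below, assuming decimal digits and counting every value
from zero upwards. The function names are illustrative.

```python
def max_records(serial_length: int) -> int:
    """Number of distinct serial numbers of the given decimal length."""
    return 10 ** serial_length


def serial_length_for(expected_records: int) -> int:
    """Smallest serial number length that can hold the expected records."""
    length = 1
    while 10 ** length < expected_records:
        length += 1
    return length


for length in (3, 4, 5, 6):
    print(length, max_records(length))
```

The length is chosen from the number of records that one
organisation is expected to create, not from the size of any one
database.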
The list is a draft initial list only. Further entities can be
added as needed. The two-letter codes have been chosen
to reflect the entity in more than one language where
possible.
The first two requirements above (identifier and
definitions) should be used from the beginning if
possible. It does not matter which database software you
use, nor the structure of your database, nor which subset
of the available fields you have chosen, provided that
you have adhered to the field and field value definitions.
For example, multi-valued fields have to stay as multi-valued fields.
The fact that many of us already have cave databases in
existence, and are already using various independent
field definitions, should not be a reason to prevent us
from establishing a standard which can be used by new
systems, or by later evolution of our existing systems
if/when we feel that the time is right. Further, as we go
through the field definitions, it is expected that we can
come up with definitions which will allow many of our
existing fields to comply with them anyway. In fact, one
of the fields in the proposed list allows classifying the level
of compliance of each existing field. Any existing fields
which are found to already comply with the standard
definitions could then be validly transferred to other
databases.
Record identifiers
The use of an internal identifier (key) is normally
routine for identifying and linking database records.
However, it needs to be globally unique so that there is
no risk of it duplicating an existing key when loaded
into someone else's database. We do not want to have to
change the incoming key in such a situation, because
then any linkages between entities in the original
incoming tables would be lost.
Public "cave numbers", while needed for normal public
usage, are not ideal for this identifier because they do
sometimes get changed, they vary in their structure, and
they can be unnecessarily long.
The record identifier does not need a component to
identify the entity type, because this can readily be
handled externally.
The scheme described above is currently in use as a test
in the Australian national database.
The method described (a country code + an org code)
allows each organisation which produces data records to
issue internationally unique keys without needing to
refer to any central authority. The 3-letter org codes
would be set at the national level by the speleo
community in that country.
The serial number part is fixed-length, left-zero-filled, so
that the alphanumeric record IDs will sort correctly
when required. The serial number component for the ID
of a particular entity needs only to be large enough to
cover the maximum quantity of records for the entity
which could be generated by the one organisation, as
opposed to the maximum quantity of entity records
stored at any site. The proposed entity codes and key
lengths are shown in the Entity List table.
Regardless of ID design, organisational arrangements
need to be made to allow separate clubs to contribute
their data to the total database information for a given
cave, i.e. merging of records. In the Australian pilot, this
is done by allowing only one club to update the national
database for any given cave area, but of course with a
mechanism to allow other clubs to contribute, and to get
proper attribution for the data they have provided.
Where a database already exists, and it proves to be not
feasible to convert its keys to the above scheme, then a
mechanism needs to be added so that the international
keys are produced whenever data is exported. The
mechanism must ensure that the same internal record
always produces the same external key. For example, if
the existing internal record keys were a simple integer,
then the external key could be produced by left-padding
the number with zeros and adding the appropriate five
letters to the front.
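A minimal sketch of such a mechanism, assuming the internal keys are
non-negative integers and using the prefix AUVSA from the earlier
example, might look like this. The function name external_key is
illustrative.

```python
def external_key(internal_key: int, prefix: str, serial_length: int) -> str:
    """Build an external key from an integer internal key.

    The result depends only on the arguments, so the same internal
    record always produces the same external key.
    """
    if len(prefix) != 5 or not prefix.isalpha():
        raise ValueError("prefix must be five letters")
    if not 0 <= internal_key < 10 ** serial_length:
        raise ValueError("internal key does not fit the serial number length")
    # Left-pad the number with zeros and put the five letters in front.
    return f"{prefix.upper()}{internal_key:0{serial_length}d}"


keys = [external_key(k, "AUVSA", 5) for k in (100, 35, 7)]
print(sorted(keys))
```

Because the serial number is zero-filled to a fixed length, a plain
alphanumeric sort of the resulting keys also puts them in numeric
order.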
Note however that if the key is changed in this way,
any instances of its use as a "foreign key" in linked
tables of other entities must also be changed. (A
"foreign key" is a field, usually a non-key field, in one
table whose value matches the key field(s) of a different
table.)
For example, a map entity record describing a map
might have a field containing the ID of a cave entity
which was shown on that map; when the map and cave
records are exported, the cave ID value in the foreign key
field of the map record must be altered in the same way
as the cave records were. Obviously it's much simpler if
a once-only change can be made to the whole database to
align its keys and foreign keys to the international
standard; from then on, no more key conversions need to
be made.
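The map and cave example can be sketched as follows. The record
layout (dictionaries with "id", "name" and "cave_id" entries), the
prefix AUVSA and the function names are assumptions made for the
illustration.

```python
def to_external(internal_key: int, prefix: str = "AUVSA",
                serial_length: int = 5) -> str:
    """Convert an integer internal key into an external key."""
    return f"{prefix}{internal_key:0{serial_length}d}"


def export_records(caves: list, maps: list) -> tuple:
    """Return copies of the records with keys and foreign keys converted."""
    exported_caves = [{**cave, "id": to_external(cave["id"])} for cave in caves]
    exported_maps = [
        {
            **map_record,
            "id": to_external(map_record["id"]),
            # The foreign key is converted in the same way as the cave key,
            # so the link between map and cave survives the export.
            "cave_id": to_external(map_record["cave_id"]),
        }
        for map_record in maps
    ]
    return exported_caves, exported_maps


caves = [{"id": 35, "name": "Example Cave"}]
maps = [{"id": 12, "cave_id": 35}]
exported_caves, exported_maps = export_records(caves, maps)
print(exported_caves)
print(exported_maps)
```

The internal records are left untouched; only the exported copies
carry the international keys.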
Field definitions
Field definitions will be systematically discussed in
English via the Internet before being circulated to UISIC
delegates for further comment and eventual voting. The
fields in this initial batch are first-pass general caving fields;
after some of these are out of the way we can also start to
look at fields which are more scientific or specialised.
The suggested procedure is (improvements invited!):
Transfer format
Background: An early version of a transfer format was
successfully used by the Australian database in 1985
when ASF used it to produce their national cave, map and
reference list, the 500-page Australian Karst Index 1985 book.
UISIC subsequently issued a draft standard to delegates
for comment at the UISIC meeting during the 1989 UIS
Congress in Budapest. In 1991 ASF produced a standard
formalising their Karst Data Interchange (KDI) format
as used in 1985. Since then, a programme has been produced by
Glenn Baddeley which translates from this early KDI
transfer format into a series of plaintext tables for
importing into a multi-table database. This was
demonstrated at the 1993 UIS Congress in Beijing, and was used as recently as 1999
by ASF to convert all its old 1985 data into its new PC-based relational database.
Based on the foregoing successful experience, UISIC had planned to issue an updated
version of the 1989 UISIC draft for further discussion and comment. However, XML and
its associated standards are now available, so the whole exchange format is being
revisited by a special Working Group. This also implies expressing the previously
discussed field definitions using XML and associated formats.
Nevertheless, regardless of the actual exchange format, the basic technique of
exchange would be as follows:
29-Nov-00
26-Oct-97
20-Jul-97 Original version
ID    Entity              Length of   Max Records created
                          Serial No.  by one Org.
----  ------------------  ----------  -------------------
AR    article, paper      6           1M
AT    attribute, field    n/a
AV    attribute value     n/a
CA    cave/karst feature  5           100K
EN    entity              n/a
JN    journal             4           10K
OR    organisation        4           10K
PA    land parcel         5           100K
PE    person              5           100K
PH    photograph          5           100K
PL    plan, map           5           100K
PM    permanent mark      5           100K
PS    map series          3           1K
RE    region, area        4           10K
RP    report              5           100K
SM    specimen            5           100K
SP    species             5           100K
ST    site, place         3           1K
SV    survey              5           100K
SU    subject             n/a
SY    system              n/a
XK    key-in batch        5           100K
XL    upload batch        5           100K
XU    update batch        5           100K
A technique for producing paper data-entry forms containing the standard fields has been
devised using HTML for platform independence. Custom data-entry forms containing the desired
subset of standard fields can therefore be produced where wanted, facilitating the
off-line consolidation of data from disparate sources prior to data entry into a
database. These forms will be developed in parallel with the definitions and added to a
fields Form Library.
Voilà!
Updates
9-Dec-02
Copyright © 1997-2000 UIS Informatics Commission.
May be freely reproduced provided this copyright notice is retained.
Page address:
http://www.uisic.uis-speleo.org/exchange/exchprop.html
Site:
P. Matthews