Notes on DATABASEs
March 31, 2014
From Wikipedia – A database is an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
Harvey and Press provide a definition – “A database is a collection of inter-related data organised in a pre-determined manner according to a set of logical rules, and is structured to reflect the natural relationships of the data and the uses to which they will be put, rather than reflecting the demands of the hardware and software.”
From Databases for Historians, Designing Databases for Historical Research by Mark Merry
The historian is faced with particular kinds of problem when it comes to converting sources into a useful database resource, problems which are not shared by most other database users. This (as we shall see) boils down to two separate inescapable realities of historical research:
- The historian often does not know precisely what kinds of analyses they want to conduct when starting out on their research
- The extent and scope of the information contained within the historian’s particular sources cannot usually be anticipated fully
- and I will add: the historian does not quite know how to fit traditional historical documents into digitized files that will be comprehensibly put together into a cohesive thematic online structure. (wow that was pretty good!)
Databases have very strict rules about what type of information goes where, how it is represented and what can be done with it and if the information from our sources can be made to obey these rules then it has become data.
One of the database rules alluded to above is that all data in the database sit in tables, regardless of what kind of data they are.
There are four main elements to any table in a database, and each of these have (somewhat inevitably) a number of names:
§ Table (also known as Entities)
§ Field (also variously known as Column, Variable, Attribute)
§ Record (also known as Row)
§ Field name (also known as the Attribute name)
Database Rules referred to by Harvey and Press
- The ‘information rule’: all information in a relational database is represented in one way – as a value in a field in a table.
- Each table in a relational database will have a name which uniquely identifies it and which will distinguish it from the other tables; the table should contain information about only one entity.
- Each field within a table is given a unique name within the table.
- The values in each field must have the same logical properties, that is, must be of the same datatype: numerical or text.
- Records will contain closely related information about a single instance of the entity (that is, the subject of the table – for example, the forename and surname of a single individual in a table about individuals).
- The order of records in a table must not be significant.
- The order of fields in a table must not be significant.
- Each complete record must be unique.
- Each field should contain ‘atomic values’: this is, indivisible data (e.g. first and last names of an individual should always be held in separate fields).
Datatypes
Each field within a database must be of a certain datatype. In essence what datatypes do is to control what type of data is entered into a field. Each field in each table is assigned a datatype, usually ‘text’ or ‘numeric’, and this in turn dictates what kind of data can be entered into that field. Every field in every table will need to have one of these datatypes assigned.
The basic datatypes: text
- This is the default datatype for your fields which will be assigned by the database whenever you add a new field to a table
- This datatype will allow the entry of textual and numerical values, although it will treat that latter differently from numerical values entered into a ‘number’ datatype field
The basic datatypes: memo
- Fields with this datatype are used for lengthy texts and combinations of text and numbers
- Data in these types of field cannot be sorted and difficult, although not impossible, to query
The basic datatypes: number
- This datatype allows the entry of numerical data that can be used in arithmetical calculations
- There are a number of variations of this datatype
The basic datatypes: date/time
- This datatype can be customised in order to control the format of dates that are entered
- This datatype will allow the sorting of records chronologically
The basic datatypes: currency
- This datatype allows the entry of numerical values data used in mathematical calculations involving data with one to four decimal places, with the inclusion of currency symbols
The basic datatypes: autonumber
- This datatype automatically generates a unique sequential or random value whenever a new record is added to a table.
- AutoNumber fields cannot be updated, that is, you cannot enter data into them manually
The basic datatypes: yes/no
- A field with this type of datatype will only contain one of two values (Yes/No, True/False, or On/Off)
- Quite often database software will represent this type of field as a checkbox in the table
The basic datatypes: OLE
- A field with this datatype is one in which another file is embedded, as a Microsoft Excel spreadsheet, a Microsoft Word document, an image, a sound or video file, an html link, or indeed any other type of file