Published on 2003-11-27 by John Collins. Please follow me on Twitter for more:
The relational database model uses a two-dimensional structure of rows and columns to store data, in tables of records corresponding to real-world entities. Tables can be linked by common key values. E.F. Codd first designed this model in 1970, while working for IBM, and it's simplicity revolutionised database usage at the time. Codd's work was in many ways ahead of it's time, as computing power could not support the overheads of his database system (Hasan 1999).
In the 1980s the power of computers had grown to the point where these overheads were no longer a problem, and today relational database management systems (DBMS) are available on local desktops, as well as large organisational database management servers.
The techniques of entity-relationship (ER) modelling and the structuring of data in normalised tables have become popular with trained database administrators and designers, who routinely use relational DBMS to store huge volumes of organisational data with very high transaction rates.
Although deceptively simple to design and operate, relational database simplicity for the end-user does fall down when it comes to running queries. Accessing data from relational databases may require complex joins of many tables and is distinctly non-trivial for untrained end-users, who may be forced to hire IT professionals to structure such queries in a query language, such as SQL. When queries of a writing nature are run, such as INSERT, DELETE and ALTER TABLE, the consequences of getting it wrong are greatly increased when they are employed on a live system environment.
Figure 1 A typical 3D hypercube
In a multi-dimension database system, the data is presented to the user in such a way as to represent a hypercube, or multi-dimensional array, where each individual data value is contained within a cell accessible by multiple indexes. A simple example is given in the previous diagram, Figure 1, where a fictional student exam result database is presented. This database contains three dimensions, namely Result, Student Name and Exam. In this example, an individual student (represented by Student Name) may have their exam results for several exams compared over a period of time, for example a four-year undergraduate course. This ability to present data in such a top level view is unique to multi-dimensional systems, and shows just how powerful these systems can be. Of course a multi-dimensional system is not limited to three dimensions as in the previous example, but when we go beyond that amount, it becomes more difficult to present such structures in a pictorial view. If we stick with the previous example presented in Figure 1; let us now add a fourth dimension called Subject. Let us assume our students study computer science, with subjects in Databases, Programming and Software Engineering. If we imagine this new dimension as being a box containing our previous three dimensions, then we would have three such boxes, namely one for each subject that our students were tested on, as shown in Figure 2.
Figure 2 A four-dimensional database structure
Of course this model can be extended to a fifth dimension, and a sixth and so on, until all requirements are met. But what are the advantages of such a system over a traditional relationship system? We will look at some of these in the next section.
If we look again at our example of a student exam result dataset, there are many reasons why it is more efficient to represent our dataset with a multi-dimensional array rather than a relational table. For example, all similar information is lined up in a single dimension, like Results, so that they can be very quickly summed up to a total or quickly compared to get an immediate idea of how student results are fairing this semester.
The multi-dimensional array structure represents a higher level of organisation than the relational table. The structure itself represents a more intelligent view of the data it contains, because our perspectives of this data are embedded directly into the structure as dimensions, as opposed to being placed into fields. For example, if we were to design a fictional relational table for our student results, it might look something like the following diagram:
|John Collins||Operating Systems||60|
|Larry Wall||Operating Systems||70|
|Linus Torvalds||Operating Systems||99|
The structure of this relationship table can tell us nothing of the nature of the contents of these fields, only that there are three fields Student Name, Exam and Result, and there are nine records. If we were to present a three-dimensional view of this data, while adding a third dimension called Semester, it might look something like the following:
As you can see from Figure 3, there is no need to have Result as a dimension, because the exam results are going to be contained within the cells of the database structure. Another obvious advantage is the removal of the duplication in the relational table, where each student name was repeated three times for each exam that they participated in. In the multi-dimensional view, the Student Name and the Exam become dimensions, or in effect indexes into that data, so having duplicates does not make any sense.
Notice how all related information neatly lines up in the multi-dimensional view, for example all programming results for John Collins over all three semesters lines up along the z-axis (i.e. from the diagram view perspective, the Semester dimension), while all exam results for John Collins in all subjects line up on the x-axis (the Exam dimension). Programming results for all students line up on the y-axis (the Student Name dimension).
From this example, it is clear to see the inherent intelligence in this database structure; in the ER table such views of specific data would not be possible without writing complex SQL queries.
Apart from the inherent advantages of using a multi-dimensional array structure, multi-dimensional databases also contain the following advantages (Kenan):