What is a Datanbase

Strengths and weaknesses of relational DBMSs

Relational database systems are by far the most important type of database around today. There has been a widespread acceptance of this model for the traditional business applications, including payroll, order processing and airline reservations systems. There has also been a widespread acceptance for applications which access databases over the Internet. Well-designed relational databases can provide appropriate data storage and retrieval facilities over a long timescale.

Because of the dominance of relational databases, this book is largely concerned with the relational model, although the object-oriented and object-relational models are discussed in chapter 7. In this section we look at a number of advantages of relational database systems over other methods (including file-based systems and other types of DBMSs) and also several weaknesses that are cited by the proponents of object-oriented database systems:

Strengths

· Relational databases support a simple data structure, namely tables

· Relational databases limit redundancy or duplication of data. As all the data pertaining to a particular object is stored together, and then linked to related objects, there is no need to store data about the original object in more than one place. In practice, not all redundancy is eliminated - as we shall see (in chapter 2), the process of normalisation will lead to some redundancy, although controlled

· Data inconsistencies are avoided. By storing the data relating to an object in one place, it only needs to be kept up to date in that one place. This saves time at the data entry stage and reduces the likelihood of inconsistencies arising

· Relational databases provide physical data independence (to a large extent). Database users do not have to be aware of the underlying structure of the objects in the database. The specification of the structure (the tables, attributes and relationships), the constraints, the access methods etc are stored separately and are independent of the application programs that use the data. This makes programming and program maintenance easier

· Relational databases offer logical database independence, in that data can be viewed in different ways by different users. With relational databases this is achieved with the definition of the appropriate SQL view

· Expandability is relatively easy to achieve - by adding new views of the data as they are required

· Relational databases support ad hoc queries (one-off or tailor-made) using the SQL query language.

Weaknesses

· Poor representation of 'real world' entities and their relationships. When relational databases are designed, entities are fragmented into smaller relations through the process of normalisation (chapter 2). This fragmentation into many relations leads to relations which do not exist or correspond to entities in the real world. Such a design is inefficient, as many joins may be required to recover data about that entity.

The model is said to suffer from semantic overloading in that one construct (the relation) is used to represent two different things (entities and relationships): there is no mechanism for differentiation between entities and relationships. The normalised component, the relation, is not sufficient to represent both the data and data relationships. With relational databases, an entity is broken up into several relations, thus querying becomes cumbersome since project, select and join operations have to be used frequently to reconstruct the entities.

Also it is difficult to represent hierarchies of data. For example, both students and lectures share some of the same data (name, address etc). In a relational database we would have to define three relations: one for the 'super class' person as well as for student and lecturer, and to retrieve the information may well require a join

· Difficulty in representing complex data types. As a simple example, consider an attribute 'address'. In a relational database, either we define an address attribute as one atomic value of type string or it could be defined as a number of attributes (one each for street, city, country and postcode). In the latter case, writing queries would be more complicated, as each field would have to be mentioned. A better solution to either case is to allow structured data types - such as the type address with subparts street, city, country and postcode. Now an instance of type address can either be viewed as a whole or as individual subparts.

The limited data types in the relational model cannot represent 'real word' objects that have a complex structure, and this leads to unnatural joins, which are inefficient. Any item in the tuple, i.e. intersection of a row and column, must be an atomic data type since it assumes both horizontal and vertical homogeneity. Horizontal homogeneity means that each tuple of a relation must be composed of the same attributes. This can be a disadvantage in that it is not possible to store closely related objects in the same category if they differ slightly in the attributes they possess. For example, there may be adult and child aeroplane tickets, with children having the additional attribute 'guardian'. Vertical homogeneity means that the values in a particular column of a relation must all come from the same domain.

Relational databases support a small, fixed collection of data types and do not permit users to define new types. In many applications the attributes' domains require far more complex data types. Many relational DBMSs allow the storage of binary large objects (BLOBs) - a data value that contains binary information representing images, digitised videos or audios, or any large and unstructured object. Typically, it is held as an attribute in a relational database which references to a file. Storing the data in external files is not a good way of manipulating this data: the DBMS has no knowledge of the structure of this data and cannot perform queries or operations on it. In addition, BLOBs cannot contain other BLOBs. As an example, a picture can be stored as a BLOB by an RDBMS. The complete picture can be displayed but not part of it, and the internal structure is not known to the RDBMS.

We will see in a chapter 7 that relational database systems are not well suited to support certain complex applications such as 'advanced database applications' including computer-aided design (CAD). A clear illustration of the drawbacks of a homogeneous data structure is the so-called parts explosion. Here, some object (such as an aircraft) is composed of parts and composite parts; these latter items in turn are composed of other parts and composite parts, and so on. Data types exhibiting this arrangement cannot be stored in relational databases

· Difficulty in implementing recursive queries. For example in the entity-relationship model example in figure 1.6 an example of a recursive query would be to find books which have the same book title as a book entitled Gardening for Beginners. This query involves searching the same table twice. We shall see in chapter 4 that recursive queries can be quite difficult to specify and implement

· SQL is not computationally complete: it supports only a limited number of operations and does not allow new operations to be defined.

Because SQL is computationally incomplete and cannot provide all of^-the operations provided by most programming languages, the SQL standard provides embedded SQL to help develop more complex database applications. However, this leads to the so-called impedance mismatch problem because we are mixing different programming paradigms (or models).

SQL is a language handling data relations either in rows or columns. The main difference of such a declarative language from high-level languages of procedural or functional programming is that the latter can handle single data relations each time. In other words, SQL handles more than one row of data at the same time. Furthermore, SQL and highlevel programming languages represent data in different ways. Especially when attributes of date and time are concerned, SQL has the corresponding data types that automatically 'translate' the value to a meaningful format. High-level programming languages need to convert integer values to the date and time formats. Such type conversion is responsible for as much as a 30% increase in programming effort and use of resources

There is no support for domain-specific organisation constraints in the relational model. Organisation constraints are additional rules specified by the owners of a database that the database must satisfy. For example, an upper limit of 50 may be allowed for the number of employees working in each department of an organisation. In a relational database system there is no support for such a constraint and thus they have to be programmed into the applications using the database - leading to duplication of effort and inconsistent data.