IPT home IPT A Virtual Approach by Peter Whitehouse
Quick Links:
 
 
SQL home
Information and Intelligent Systems Social and Ethical Implications Human Computer Interaction Software and Systems Engineering eXercise Files Course Outline and Assessment A-Z of Geeky Acronyms Terrace Work Program 2004 Sillybus FAQ = Frequently Asked Questions Help
 
 

Introduction to SQL
Structured Query Language

MySQL and AccessSQL

General

General purpose languages (Pascal, FORTRAN, COBOL, C...) allow users to solve many types of problems but are rarely used to build commercial database applications due to questionable efficiency. Whilst it is sometimes the case that interfaces are built with lower level languages, the database engine is almost always SQL in some form or another.

As a contemporary example, most large websites these days are driven by databases underneath - the language used to store and manipulate the data in those databases is usually some flavour of SQL. The languages to get that data to your browser window, however, are hypertext pre-processing scripting languages like PHP, ASP, JSP, Cold Fusion). They are usually server-side meaning the executable code is processed and the answers are sent to you from the server - these languages are generally embedded in webpages and executed live. This is how a chat forum can appear to be updating itself as you type.

Algorithmic and scripting languages are inadequate (or at least verbose) when dealing with database type data, creating files and accessing/modifying them. To provide even the most rudimentary database functionality and ease of maintenance is complex and prone to error in a low level language. Delphi and other modern database applications can use an external SQL engine and report generator to 'simplify' the creation of database applications.

SQL (Structured Query Language) is a FOURTH GENERATION DECLARITIVE LANGUAGE. It is considered thus because (with declaritive systems) you specify WHAT you want it to do, but are not required to tell it HOW to go about it. It is written specifically to deal with information that appears to be stored in tables. It is set-based, harnesses an internationally accepted syntax and can deal with huge quantities of information very efficiently.

If you take a REALITY CHECK for a moment, there are low-level storage structures that conform to any notion of a table - primitive data types must still be at the heart of the storage (integers, characters etc) and at a much lower level, 1's and 0's on disk are interpreted at a much higher level in table form.

SQL interfaces insulate the user from the PHYSICAL LAYER, and presents (through much efficient pre-processing) data in 'human friendly' tabular form. Humans are comfortable with tabulated data, and indeed can deal with complex patterns of numbers and characters very efficiently and intuitively in this form. Additionally, a large body of set-based theory allows us to manipulate tables powerfully and SQLs harness this conceptual strength to allow enormously complicated things to be done simply.

SQL is not, however, the 'bees knees' of languages. It will allow solutions to certain types of problems, and is very powerful in those situations. Take SQL out of a declaritive problem set and place it in a procedural realm and you will soon realise it's limitations in solving those sorts of problems suited to algorithmic languages. It is hopeless, for example, as a gaming platform - that is not what it was designed for.

A RELATIONAL DATABASE comprises (usually) a collection of tables. Each table in a relational database has something to do with at least one other table in that database system.

A view of the MUSIC database

This simple relational database comprises 3 simple tables (Albums, tracks and Performers) with SerialNumber being the common link between all 3 tables.

This is the MUSIC database and you will be dealing with it and variations of it in depth shortly.

Tables contain RECORDS (or rows) of objects organised into columns called FIELDS. In the above example, the Performers table has 3 fields (SerNum, Artist and Instrument). Each field has a type also (common field types include, character, integer, real, ctring, date, currency, BLOB..).

MySQL (the 'flavour of SQL we will start with) offers many field types, however we shall initially concentrate on two: TEXT =char(n) and NUMBERS =decimal(n).

Eventually we will choose from a wider range of data types available - the field type is determined by the sort of information you are wanting to store there - numbers or not-numbers is usually the first distinction. Within numbers, there are lots of different styles of numbers (real, integer, date). Non-numbers also encompass various storage classes including text, file and so on. You can see a more complete set of data types in the MySQL Dictionary associated with this section.


THE MUSIC DATABASE

the Music Database - in design

The MUSIC database comprises 3 related tables....

ALBUMS Table
this table contains most general information
about each recording
7columns, each row is a single recording
the albums table

TRACKS Table
this table details the songs and their order to be
found on each disk of a particular recording
4 columns, one song per row

the tracks table

PERFORMERS Table
this table summarises who played what instrument
on a particular recording
3 columns, one instrument per row.

the performers table

The above database will, through the course of instruction, become familiar to you as you investigate it's contents. It may appear strange to you that the data is stored in tables of this shape - DON'T PANIC, there is a GOOD reason for the shape of the table containers - this is covered elsewhere in the course.

SQL is DECLARITIVE in nature - you tell it what to do, not how to go about doing it. It is termed a RELATIONAL LANGUAGE (as it conforms to the Relational Model of Data).

Information stored using this type of language is called a RELATIONAL DATABASE, the controlling software is termed the RELATIONAL DATABASE MANAGEMENT SYSTEM (RDMS)


The Relational Model Of Data - Some Important Starting Definitions

Throughout this section, aspects of the relational model of data will be explained. Presented here are a selection of characteristic properties that typify data in a relational database.

It is understood that:

  • It is nearly always possible to express information in table form
  • A table, however, may not be the most efficient storage form, nor may it be easily read. Because we are using computers to assist us in manipulating and accessing the information, the arguement of 'efficiency' of storage becomes less and less relevant.
  • There are often many ways to convert data to tables but different representations make different assumptions about the information.

example 1: a fragment of bit-mapped graphics could be reduced to a series of pixel co-ordinates and their corresponding indexed colour:

              x     y     color
              ---------------------------
              53    287   1
              99    115   3
              4     16    1
              :     :     :

This 3 column table (a ternary relation) is one way of representing the information stored in a bitmap's file. We could hand this data (in some form) to a program that could recreate the image exactly, by placing the coloured pixels in the correct screen coordinates.

NOTICE that the row order is irrelevant to the overall image

example 2: A Simplified Family Tree

a family tree

It should be noted that no assumptions have been made regarding the gender of the offspring or their parents due to the likelihood that some names are used equally for each gender (purists would also argue that you have not been given this information and so cannot include it).

In both the above cases, the relationship between the entries in each of the rows are the SAME for each row, but NO ROW IS THE SAME AS ANY OTHER ROW

Entries in each row RELATE to each other (eg. coordinates together define a pixel, or x is_parent_of_y)

TABLES are collections of INSTANCES (tuples/rows/examples stored as records) of relationships. Each instance in a table has the same relationship type as other rows in that table, but a different one to rows in other tables in the system.

Good table headings usually suggest what the relationships are. For example: in the Performers Table

ser-num...features...artist...playing...instrument

The table's instances (ie. rows/tuples in the table) form the tables' POPULATION. Providing the table with instances is called POPULATING the table. An un-populated table is termed a null table.

It is not possible to populate the table with unrelated instances. As an example, it makes no sense to provide meteorological data in a table about births, deaths and marriages - regardless what the astrologers say :)

TABLE ARITY - provides the degree of the relationship, or the number of objects involved in the relationship, or the number of columns on the table.

tracks has ARITY 4(a quatanary relationship)
performers has arity 3 (a ternary)
albums has arity 7

it is possible to have arity 1, with no instances - an empty (or null) UNARY table.


Normalised Tables

The relational model of data requires that each FIELD (or column entry) is a SINGLE ENTITY (ie. no repeating groups). In other words, a column entry should be only a single piece of information.

         eg:  ser_num     songs
              -------------------------------
              NCC1701     song1, song2, song3

The above example is incorrect, as both the column heading (a plural) and inspection of the data reveal. This formation is not supported by the relational model of data, and therefore should we accept it as part of our table system, we would need to accept that some types of data retrieval requests (queries) are either impossible or very inefficient to complete.

To re-work the same information into a relationally acceptable form, we could have:

         eg:  ser_num     song
              -----------------
              NCC1701     song1
              NCC1701     song3

It should be noted that the two tables above 'say' essentially the same thing, the latter table, however, is easier or more straight-forward to access.

Tables without repeating groups are termed FLAT(or normalised).

Column entries should not be able to be divided (that is they should be SINGLE VALUED FACTS). IF this is not the case, search and retrieval problems arise through these repeating groups

         eg   mum   dad      children
              ------------------------------------
              bob   carol    ted,alice,fred,olivia

This is really a repeating field also, as we have listed child objects together in a single field.

"Who is the father of 'fred'?" requires a messy search using sub-strings and partial pattern matching (such a search becomes performance prohibitive on large databases)

Row headings NOT ALLOWED - a relationally acceptable table has rows in which order is not important ! Repetition of rows is also not allowed

         eg   Mon   Tue   Wed  Thur
           1  IPT   ENG   MI   CHEM
           2  MII   GR    GEO  IPT
           3  FR    RE    ART  YAK PLATTING

is a 'trivial' example where row headings (1,2,3...) allow the table to make sense - remove them and we lose information.

The above information could be transformed into a number of relationally acceptable forms as detailed below:

         either  day   period  class
                 mon   3       FR
                 wed   1       MI

this formation allows simple searching

         or   period   mon   tue  wed   thur
              3        FR    RE   ART   YAK
              1        IPT   ENG  MI    CHEM

Q: "When do you have french?" involves a messy search

Relational table design should never be dictated by the types of questions that are foreseen. The purpose to which the database is to be put is important, but if it is designed well, it can be used for many different things (with an increased liklihodd that it can cope with unforseen queries as well).

Although these issues are important to performance of the database system, using them as the sole basis for design increases the liklihood that unplanned questions may not have answers in the resulting tables.

Another common mis-conception is the difference between how the data is presented and how it is stored - these two aspects of the data may have little to do with each other. Inded, in a well designed system, the data may be displayed with great flexibility.

It is (unfortunately) possible in poorly designed systems that certain questions have no easily attained answer.

DON'T PANIC! ----> correctness of table design is covered elsewhere in the course.

Tables that have been stored relationally in simple forms can be VIEWED differently - a VIEW is an alternative presentation of stored information, displayed prettily.


Table Keys

Keys are essential in a relational database. They are statements of UNIQUENESS. Keys are what is used by the RDMS to tell the rows of a table apart from each other, and how rows in one table relate to corresponding rows in another table.

Unfortunately, terminology like Primary Key would tend to intimate that that type of key is more important than any other type of key - this is plain nonsense. All keys, be they 'primary', secondary, foreign or merely indexed fields are important.

  1. PRIMARY KEYS

    This is a KEY that uniquely identifies each row of a table. Each table should have at least a primary key, else we lose the power to manipulate the information fully, and maintaining the tables' data may not be possible. It makes sense that rows of a table are all different - why should we store the same record twice?

    In the Music Database, the primary keys are as follows:

    • Albums = ser_num
    • Tracks = ser_num + song
    • Performers = ser_num + artist + instrument

    A key can be a SIMPLE KEY (one column) or a COMPOSITE KEY (involving 2 or more columns in combination)

    A key column should NEVER BE NULL - that is, there should always be a unique value supplied in a key column

  2. The larger the arity of the key, the more work the RDMS has to do to maintain the integrity checking as data is added/modified. Large keys can seriously impede system performance. In certain circumstances, an artificial primary key (single column, like an ID) is created, to simplify the indexing/updating of table rows.

  3. FOREIGN KEYS

    A foreign key in a table is a corresponding column that is a PRIMARY KEY in another table.

    The Music database has two foreign key definitions -

    In the tracks table, a particular SerNum must already exist in the albums table: tracks.SerNum is a foreign key referencing albums.

    Similarly, performers.SerNum is a foreign key referencing albums. This means that in order to enter performer or track listing for any album, that album must already exist in albums.

   eg. Suppose we create a new music table 'Origins':

        Artist          Home
        char(20)        char(30)
        key

the Artist column is the primary key of this table, in both the Albums and Performers tables, that column now becomes a foreign key.

3. SECONDARY KEYS

  • A secondary key is a column (or combination of columns) in a table that also has no repeats contained in it. In Access, secondary keys are implemented as Unique Indexes. A secondary key is put in place to reflect a constraint (or rule) that exists in the universe the database is made to work in. They are designed to provide extra data checking, ensuring inconsistent data is not entered.

Key columns can be used to ensure that the database retains its integrity - that is the information is not allowed to be stored in an inconsistent form.

Foreign key definitions enforce Referential integrity of the database. Be warned, however that they can cause CASCADE effects. Consider the effect of deleting an album - what effects would it have on the database???


Significant Populations

A population is SIGNIFICANT if there are sufficient instances to deduce the PRIMARY KEY, and the underlying relationship binding the table together.

 

wonko@wonko.info
©Copyright t 1992..2018+. Edition 26.150117
wonkosite
Creative Commons License
This work is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 2.1 Australia License
.