IPT home IPT A Virtual Approach by Peter Whitehouse
Quick Links:
 
 
SQL home
Information and Intelligent Systems Social and Ethical Implications Human Computer Interaction Software and Systems Engineering eXercise Files Course Outline and Assessment A-Z of Geeky Acronyms Terrace Work Program 2004 Sillybus FAQ = Frequently Asked Questions Help
 
 

Multi-Table Queries

Set Oriented Concepts

SQL returns sets of results - the result table is a set of tuples.

We have used in and union as set oriented operations.

Also available are the following:

any - meaning some member of

all - meaning every member of



any

1965 in (1962, 1965, 1971) is true

1965 = any (1962, 1965, 1971) is true
(is birthyear some member of the set(...))

Problem:
1962 ^=any (1962, 1965, 1971) is true
this is not the same as NOT IN

1962 ^=any (1962) this is false
this is the only case where it is

all

1962 =all (1962, 1965, 1971) is false
1962 <> to every member

1967 ^=all (1962, 1965, 1971) is true

1962 ^=all (1962, 1965, 1971) is false

1962 >=all (1962, 1965, 1971) is false

1971 >=all (1962,1965,1971) is true
it is also the LARGEST

In general, we use not in rather than ^=all


Sub-Queries

Sub-queries are, in general terms, selects that form sets of values for use inside other selects

select col
from tab
where col OPERATOR

(select col
from ....)

where the OPERATOR is a set operation (in, ^=all, ... =)

in and not in

In the above diagram, set intersection (underneath) and set difference (on the top) is illustrated. The corresponding query is:

select albname
from Albums
where PDate between 1990 and 1995
and Recco = 'EMI'
and Media = 'LP'
and sernum not in

(select sernum
from performers
where instrument = 'Piano Accordian')

and sernum in

(select sernum
from tracks
where song like '%buffalo%')

NOTES ON SUBQUERIES

  • The equals sign ('=') is only useful for subqueries that are single value returns
  • In order of precedence, sub queries are calculated FIRST, and normally ONLY ONCE
  • The returning result from the sub-query must be compatible with the condition linked to it.
  • A sub-query must return a SINGLE SET (ie. one column) of results. Should a multi-table return prove necessary, one solution is to create a view or temporary table that contains the desirec field collection.
  • It is INVALID to ORDER BY in a sub-query ==> set theory tells us that the returning set has no inherent order
  • It is INVALID to use DISTINCT in a sub-query ==> again, set theory tells us that the returning set is automatically distinct

UNION, INTERSECTION and DIFFERENCE

union

Unioning two (or more) type compatible fields allows us to 'merge' the two populations to form a single, distinct list.

select artist
from performers

union

select artist
from albums
order by 1 asc

The above query results in a distinct list of artists (as set operators exclude all repeats) - SINGLE copies of all the artist names in the performers table, with any new ones added from the albums table.

To use ORDER BY for unioned queries, you MUST use column numbers and not names (as the result table column contents come from a composite source)

select sernum, artist
from albums

union

select sernum, artist
from performers

order by 2

The above query will be aranged by the ARTIST column


Query Formation

In order to identify what the query you must write is, you first must identify where the information resides (ie. which columns, tables..) and what has to be done to that information before you want to see it (column functions, grouping, ordering...)

eg:In the CLASS database, list the full names of people that share hobbies in common with Fred Smith

select name, fname <-- people
from student
where sid in

(select sid <--sids that like one or more of Fred's hobbies
from enjoys
where hobby in

(select hobby <--Fred's hobbies
from enjoys
where sid in

(select sid <--Fred's sid
from student
where sname = 'Smith'
and fname = 'Fred')))


JOINS

subquery vs join

There are MANY 'flavours' of join available in mSQL, and some correspondingly powerful operators.

A detailed examination of Joins is available in a paper written by Mr Peter Whitehouse for the QSITE State Conference in 1999


The CROSS Join

When you take every row in one table and join it to every row from another table, you are said to be forming the Cartesial Join or a CROSS JOIN.

Such a join is 'unconstrained', and in many situations, provides useless information as rows of data are connected to otherwise un-related information.

For Example:

gpsSchools

school
GT
BSHS
NC
ACGS

The following is an example of a CROSS JOIN, that partially re-works the GPS Playlist problem:

select P.school, Q.school
from gpsSchools P CROSS JOIN gpsSchools Q

The result table would contain 2 copies of every game, plus games where teams play with themselves - a less than useful result.

The cross join above is functionally identical to the following code:

select P.school, Q.school
from gpsSchools P, gpsSchools Q

In the above code, the COMMA (,) is a JOIN OPERATOR when used in the FROM clause. By default, mSQL performs a CROSS JOIN when a comma is used.


The NATURAL INNER Join

When you join two tables, ensuring that ONLY related rows are placed together, you are usually performing an INNER join.

To connect, in the Music Database say, album details to Track information, we could use the following query:

select albname, artist, song
from albums A, tracks T
where A.serNum = B.serNum

In this query, we explicitly state the join condition (that is what must be correct if two tuples are to be connected). Another formation of the same query is as follows:

select albname, artist, song
from albums NATURAL INNER JOIN tracks

or indeed, just:

select albname, artist, song
from albums JOIN tracks

In both of the above queries, only rows in the Albums table that have corresponding Tracks entries are displayed.

This becomes complicated when tables have columns with the same names but where you DON'T want those columns to be part of the join (remembering mSQL will associate them BECAUSE they have the same name)

Should a MULTI-column natural inner join be necessary, then the USING clause can be used, using the following syntax guide:

select col {,col}
from table1 JOIN table2 using (col,col {,col})

As an example, a query that would list Album Names of those that feature the artist also playing an instrument on that album:

select distinct albName
from albums join perforers using (artist, sernum)

Since, in the music database different conventions have been used for storing names (Surname, Firstname VS Firstname Surname), the above query only finds those artists that go under a single name (eg. Bjork and Vangelis)

All of the above examples rely on the joined columns having the same names in both of the joined tables. Often, however, this is not the case, and we are faced with using different named columns as the basis of joins. The ON clause can be used to overcome the difficulty, as follows:

select col {,col}
from table1 JOIN table2 ON (colFromTable1 = colFromTable2)


The LEFT OUTER Join

The first table mentioned in a FROM clause is termed the LEFT table. A LEFT OUTER JOIN is used when you require all of the population of the LEFT table's nominated column(s) and either RIGHT table data that corresponds or NULLS if no data corresponds on the RIGHT.

To illustrate, consider a query that delivers AlbNames and corresponding tracks on that album. We could perform a natural inner join thus:

select albName, song
from albums JOIN tracks

In the query above, NOT ALL of the AlbNames are present in the answer table - only those that actually have tracks recorded.

select albName, song
from albums LEFT OUTER JOIN tracks

The above query will contain ALL of the albNames - the song column will contain NULLS in those situations where track listings are missing for an album.


The RIGHT OUTER Join

The RIGHT OUTER JOIN is similar to the LEFT OUTER JOIN, only the 'dominant' table is the RIGHT table (all of it's tuples will be present, accompanied by either corresponding data or NULLS from the left table).

These join types are VERY useful when it comes to creating datasets for database applications.


JOIN or SUB-QUERY?

Consider the table: Data

Person char(15)
Sex char(1)
Birth_yr dec(6)

problem:list persons born in the same year as someone else, together with that year

MySQL: (using sub-q)


select person,birth_yr
from data
where birth_yr in

(select birth_yr
from data
group by birth_yr
having count(person) >1 )

(using join)


select a.person, b.person, a.birth_yr
from data a, data b
where a.birth_yr = b.birth_yr
and a.person < b.person

MySQL is OPTIMISED to allow JOINS to be evaluated faster than the recently added sub-queries. This is, however, far from standardised between 'flavours' of SQL. Microsoft's Access SQL, for example, is optimised to evaluate Joins, and sub-queries in Access SQL can take a long time to deliver answers.


Some diagram symbology for common multi-table formations

an intersection (IN)

select a
from X
where a in

(select b
from Y)

this is a conjunction
NOT IN (alternative version

select a
from X
where a NOT IN

(select b
from Y)

this is a disjunction
Cross Join (unconstrained join) select X.a
from X,Y
select X.a
from X CROSS JOIN Y
Natural Inner Join

select X.a
from X,Y
where X.a = Y.b

 

select X.a
from X NATURAL INNER JOIN Y
select X.a
from X JOIN Y
leftOuterJoin select X.a
from X LEFT OUTER JOIN Y
rightouterjoin select X.a
from X RIGHT OUTER JOIN Y

 

wonko@wonko.info
©Copyright t 1992..2018+. Edition 26.150117
wonkosite
Creative Commons License
This work is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 2.1 Australia License
.