Databases: The Relational Model and Relational Algebra

Basic Terms

Relation Schemas

Relation schemas are finite sets of attributes.

Notation for some relation $R$ with attributes $A, B, C, D$ and primary key $B, D$ :

R(A, \underline{B}, C, \underline{D})

The domain of a schema is the set of all possible tuples that can be associated with the schema.

For relation $R$ , this is denoted:

\dom{(R)}

Keys

A superkey of a relation schema is any subset of that schema that can uniquely identify a tuple.

In other words, all tuples have unique values in the superkey.

Or more formally, for $r(R)$ and any two distinct tuples $t_1, t_2 \in r$ , if $t_1[K] \neq t_2[K]$ for some $K \subseteq R$ , then $K$ is a superkey.

Note that for $r(R)$ , $R$ is always a superkey.

A candidate key is a “minimal superkey”.

More formally, superkey $K$ is a candidate key if no subset $K' \subset K$ exists that is also a superkey.

A primary key is the specific candidate key designated to a relation schema.

All relations must have exactly one primary key.

A foreign key of a relation schema is a subset of attributes that reference a particular primary key.

The referenced primary key is often a different relation, but may also be of the same relation.

Attributes

Attributes are labels for parts of a schema.

The domain of an attribute is the set of values that can be associated with the attribute.

For attribute $A$ , this is denoted:

\dom{(A)}

Relation Instances

Relation instances are finite sets of tuples.

To show that $r$ is an instance of schema $R$ , we write:

r(R)

Example

A venn diagram representing the inner/theta join.

Tuples

Tuples are mappings from schema to schema domain.

Concretely, a tuple can be thought of as a set of attribute-value pairs.

We will use the following notation in this cheatsheet:

$t[X]$ extracts attribute set $X$ .
$t[X]$ is itself a tuple.
$t[A]$ extracts a single attribute $A$ .
$t[A]$ is a value. (Yes, I know this is confusing.)
$(t_r : t_s)$ combines tuples $t_r$ and $t_s$ .
$(t_r : t_s)$ is itself a tuple.

The Null Value (Extended RA)

The null value is a special value that may be part of an attribute domain.

We will use the symbol $\relnullvalue$ to represent null in this cheatsheet.

While the null value is an important part of the relational model, it is not a core part of relational algebra.

Operators

Selection

Produces the subset of tuples that satisfy a condition.

For $r(R)$ and selection condition $c$ :

\relselect_{c}{(r)} = \setdef{t \in r}{c(t)}

Example

$r$
$A$	$B$	$C$
$a_1$	$2$	$x$
$a_2$	$8$	$x$
$a_3$	$7$	$x$
$a_4$	$9$	$y$

$\relselect_{(B>5) \wedge (C=x)}{(r)}$
$A$	$B$	$C$
$a_2$	$8$	$x$
$a_3$	$7$	$x$

Projection

Produces a relation with a subset of attributes.

For $r(R)$ and attribute set $X$ :

\relproject_{X}{(r)} = \setdef{t[X]}{t \in r}

Example

$r$
$A$	$B$	$C$	$D$
$a_1$	$b_1$	$c_1$	$d_1$
$a_2$	$b_2$	$c_2$	$d_2$
$a_3$	$b_3$	$c_3$	$d_3$

$\relproject_{B, D}{(r)}$
$B$	$D$
$b_1$	$d_1$
$b_2$	$d_2$
$b_3$	$d_3$

Rename

Casts a relation to a different schema.

For $r(R)$ and a compatible schema $S$ :

\relrename_{S}{(r)}

Example

$r$
$A$	$B$	$C$
$a_1$	$b_1$	$c_1$
$a_2$	$b_2$	$c_2$
$a_3$	$b_3$	$c_3$

$\relrename_{S(D, E, F)}{(r)}$
$D$	$E$	$F$
$a_1$	$b_1$	$c_1$
$a_2$	$b_2$	$c_2$
$a_3$	$b_3$	$c_3$

Union, Intersection, and Difference

Set theoretic operations between union-compatible relations (i.e. same schema).

For $r_1(R)$ and $r_2(R)$ :

\begin{align*} r_1 \cup r_2 &= \setdef{t}{(t \in r_1) \vee (t \in r_2)} \\ r_1 \cap r_2 &= \setdef{t}{(t \in r_1) \wedge (t \in r_2)} \\ r_1 - r_2 &= \setdef{t}{(t \in r_1) \wedge (t \notin r_2)} \end{align*}

Example

$r_1$
$A$	$B$	$C$
$\bm{a_1}$	$\bm{b_1}$	$\bm{c_1}$
$\bm{a_2}$	$\bm{b_2}$	$\bm{c_2}$
$\bm{a_3}$	$\bm{b_3}$	$\bm{c_3}$
$\bm{a_4}$	$\bm{b_4}$	$\bm{c_4}$
$\bm{a_5}$	$\bm{b_5}$	$\bm{c_5}$

$r_2$
$A$	$B$	$C$
$\bm{a_4}$	$\bm{b_4}$	$\bm{c_4}$
$\bm{a_5}$	$\bm{b_5}$	$\bm{c_5}$
$\bm{a_6}$	$\bm{b_6}$	$\bm{c_6}$
$\bm{a_7}$	$\bm{b_7}$	$\bm{c_7}$

$r_1 \cup r_2$
$A$	$B$	$C$
$\bm{a_1}$	$\bm{b_1}$	$\bm{c_1}$
$\bm{a_2}$	$\bm{b_2}$	$\bm{c_2}$
$\bm{a_3}$	$\bm{b_3}$	$\bm{c_3}$
$\bm{a_4}$	$\bm{b_4}$	$\bm{c_4}$
$\bm{a_5}$	$\bm{b_5}$	$\bm{c_5}$
$\bm{a_6}$	$\bm{b_6}$	$\bm{c_6}$
$\bm{a_7}$	$\bm{b_7}$	$\bm{c_7}$

$r_1 - r_2$
$A$	$B$	$C$
$\bm{a_1}$	$\bm{b_1}$	$\bm{c_1}$
$\bm{a_2}$	$\bm{b_2}$	$\bm{c_2}$
$\bm{a_3}$	$\bm{b_3}$	$\bm{c_3}$

$r_1 \cap r_2$
$A$	$B$	$C$
$\bm{a_4}$	$\bm{b_4}$	$\bm{c_4}$
$\bm{a_5}$	$\bm{b_5}$	$\bm{c_5}$

$r_2 - r_1$
$A$	$B$	$C$
$\bm{a_6}$	$\bm{b_6}$	$\bm{c_6}$
$\bm{a_7}$	$\bm{b_7}$	$\bm{c_7}$

Cartesian Product

Produces every possible combination of tuples from two relations.

For $r(R)$ and $s(S)$ :

r \times s = \setdef{(t_r:t_s)}{(t_r \in r) \wedge (t_s \in s)}

Example

$r$
$A$	$B$	$C$
$a_1$	$b_1$	$c_1$
$a_2$	$b_2$	$c_2$

$s$
$M$	$N$
$m_1$	$n_1$
$m_2$	$n_2$
$m_3$	$n_3$

$r \times s$
$A$	$B$	$C$	$M$	$N$
$a_1$	$b_1$	$c_1$	$m_1$	$n_1$
$a_1$	$b_1$	$c_1$	$m_2$	$n_2$
$a_1$	$b_1$	$c_1$	$m_3$	$n_3$
$a_2$	$b_2$	$c_2$	$m_1$	$n_1$
$a_2$	$b_2$	$c_2$	$m_2$	$n_2$
$a_2$	$b_2$	$c_2$	$m_3$	$n_3$

Theta/Inner Join and Equijoin

Theta join combines the tuples of two relations using a matching criterion.

For $r(R)$ and $s(S)$ , and some arbitrary matching criterion $c$ :

r \bowtie_c s = \setdef{ (t_r : t_s) }{ (t_r \in r) \wedge (t_s \in s) \wedge c(t_r : t_s) }

Theta join can also be defined as:

r \bowtie_c s = \relselect_{c(r, s)}{\parens{r \times s}}

Duplicate attributes are not removed.

Equijoin is a special case of theta join that only tests for equality between attributes.

Example

$r$
$A$	$B$	$C$
$a_1$	$b_1$	$0$
$a_2$	$b_2$	$3$
$a_3$	$b_3$	$2$

$s$
$M$	$N$
$m_1$	$3$
$m_2$	$4$
$m_3$	$1$
$m_4$	$2$

$r \bowtie_{N \le C} s$
$A$	$B$	$C$	$M$	$N$
$a_2$	$b_2$	$3$	$m_1$	$3$
$a_2$	$b_2$	$3$	$m_3$	$1$
$a_2$	$b_2$	$3$	$m_4$	$2$
$a_3$	$b_3$	$2$	$m_3$	$1$
$a_3$	$b_3$	$2$	$m_4$	$2$

Natural Join

A special case of equijoin that matches tuples on all their common attributes.

For $r(R)$ and $s(S)$ :

r \bowtie s = \setdef{ \parens{t_r : t_s[S - R]} }{ (t_r \in r) \wedge (t_s \in s) \wedge m(t_r, t_s) }

where $m$ is “all common attributes must match”:

m(t_r, t_s) = \parens{\vphantom{\frac{x}{x}} t_r[R \cap S] = t_s[R \cap S]}

Natural join can also be defined as:

r \bowtie s = \relproject_{R \cup S}{\parens{\relselect_{m(r, s)}{\parens{r \times s}}}}

where $m$ is:

m(r, s) = \parens{\vphantom{\frac{x}{x}} r[R \cap S] = s[R \cap S]}

and $\relproject_{R \cup S}$ assumes removal of duplicate attributes.

Natural join is dangerous in real applications. Use equijoin to explicitly match tuples instead.

Example

$r$
$A$	$B$	$C$	$D$
$a_1$	$b_1$	$x$	$x$
$a_2$	$b_2$	$\bm{x}$	$\bm{y}$
$a_3$	$b_3$	$\bm{x}$	$\bm{y}$
$a_4$	$b_4$	$\bm{x}$	$\bm{y}$

$s$
$C$	$D$	$E$
$y$	$y$	$e_1$
$\bm{x}$	$\bm{y}$	$e_2$
$\bm{y}$	$\bm{x}$	$e_3$

$r \bowtie s$
$A$	$B$	$C$	$D$	$E$
$a_2$	$b_2$	$\bm{x}$	$\bm{y}$	$e_2$
$a_3$	$b_3$	$\bm{x}$	$\bm{y}$	$e_2$
$a_4$	$b_4$	$\bm{y}$	$\bm{x}$	$e_3$

Division (Extended RA)

For $r(R)$ and $s(S)$ , with $S \subseteq R$ :

r \div s = \setdef{t \in \relproject_{R-S}{(r)}}{\Phi(t)}

where:

\newcommand{\X}{\rule[-2ex]{0mm}{2ex}} \Phi(t) = \underbrace{\X \forall t_s \in s}_{(1)} \, \underbrace{\X \exists t_r \in r}_{(2)} \, ( \underbrace{\X t_r[S] = t_s}_{(3)} \underbrace{\X \wedge t = t_r[R-S]}_{(4)} )

In plain English: (1) For all tuples in $s$ , (2) there is at least one relation in $r$ (3) whose common attributes match, (4) and whose non-common attributes are the same.

Division can also be defined as:

r \div s = r' - \underbrace{ \rule[-2ex]{0mm}{2ex} \relproject_{(R-S)}{\parens{\vphantom{\frac{x}{x}} [r' \times s] - r}} }_{\text{``disqualifier'' term}}

where $r'$ is “all possible result tuples”:

r' = \relproject_{(R-S)}{(r)}

Example

$r$
$A$	$B$	$C$	$D$
$\alpha$	$\beta$	$x$	$x$
$\bm{\alpha}$	$\bm{\beta}$	$\bm{x}$	$\bm{z}$
$\bm{\alpha}$	$\bm{\beta}$	$\bm{w}$	$\bm{z}$
$\gamma$	$\delta$	$x$	$x$
$\bm{\gamma}$	$\bm{\delta}$	$\bm{x}$	$\bm{y}$
$\bm{\gamma}$	$\bm{\delta}$	$\bm{x}$	$\bm{z}$
$\gamma$	$\delta$	$w$	$x$
$\gamma$	$\delta$	$w$	$y$
$\bm{\gamma}$	$\bm{\delta}$	$\bm{w}$	$\bm{z}$
$\gamma$	$\delta$	$w$	$w$
$\bm{\epsilon}$	$\bm{\zeta}$	$\bm{x}$	$\bm{y}$
$\bm{\epsilon}$	$\bm{\zeta}$	$\bm{x}$	$\bm{z}$
$\bm{\epsilon}$	$\bm{\zeta}$	$\bm{w}$	$\bm{z}$
$\bm{\eta}$	$\bm{\theta}$	$\bm{x}$	$\bm{y}$

$s$
$C$	$D$
$x$	$y$
$x$	$z$
$w$	$z$

$r \div s$
$A$	$B$
$\gamma$	$\delta$
$\epsilon$	$\zeta$

In $r$ , values $(\gamma, \delta)$ and $(\epsilon, \zeta)$ are the only values that occur with all values $(x, y)$ , $(x, z)$ , and $(w, z)$ .

Outer Join (Extended RA)

Similar to theta join, except in the result relation, we also include tuples that don’t have matches in the join.

These unmatched tuples get padded with null values in the result relation.

Full outer join includes all tuples of both operands:

\begin{align*} r \relfullouterjoinSubscripted_c\, s &= (r \bowtie_c s) \cup \parens{\vphantom{\frac{x}{x}} \parens{r - \relproject_R{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ & \qquad \cup \parens{\vphantom{\frac{x}{x}} \parens{r - \relproject_R{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ &= (r \relleftouterjoinSubscripted_c\, s) \cup (r \relrightouterjoinSubscripted_c\, s) \end{align*}

Left outer join includes all tuples of the left:

\begin{align*} r \relleftouterjoinSubscripted_c\, s &= (r \bowtie_c s) \cup \parens{\vphantom{\frac{x}{x}} \parens{r - \relproject_R{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ &= s \relrightouterjoinSubscripted_c\, r \end{align*}

Right outer join includes all tuples of the right:

\begin{align*} r \relrightouterjoinSubscripted_c\, s &= (r \bowtie_c s) \cup \parens{\vphantom{\frac{x}{x}} \parens{s - \relproject_S{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ &= s \relleftouterjoinSubscripted_c\, r \end{align*}

To summarize the differences between the joins:

Inner/Theta

Left Outer

Right Outer

Full Outer

Example

$r$
$A$	$B$	$C$	$D$
$a_1$	$b_1$	$x$	$x$
$a_2$	$b_2$	$\bm{x}$	$\bm{y}$
$a_3$	$b_3$	$\bm{y}$	$\bm{x}$
$a_4$	$b_4$	$z$	$w$

$s$
$C$	$D$	$E$
$y$	$y$	$e_1$
$\bm{x}$	$\bm{y}$	$e_2$
$\bm{y}$	$\bm{x}$	$e_3$

$r \relfullouterjoin s$
$A$	$B$	$C$	$D$	$E$
$a_1$	$b_1$	$x$	$x$	$\relnullvalue$
$a_2$	$b_2$	$\bm{x}$	$\bm{y}$	$e_2$
$a_3$	$b_3$	$\bm{y}$	$\bm{x}$	$e_3$
$a_4$	$b_4$	$z$	$w$	$\relnullvalue$
$\relnullvalue$	$\relnullvalue$	$y$	$y$	$e_1$

$r \relleftouterjoin s$
$A$	$B$	$C$	$D$	$E$
$a_1$	$b_1$	$x$	$x$	$\relnullvalue$
$a_2$	$b_2$	$\bm{x}$	$\bm{y}$	$e_2$
$a_3$	$b_3$	$\bm{y}$	$\bm{x}$	$e_3$
$a_4$	$b_4$	$z$	$w$	$\relnullvalue$

Grouping Operator (Extended RA)

Performs calculations over groups of tuples within a relation. Produces a relation containing the results.

For $r(R)$ and operator subscript $G$ :

\relgroup_G{(R)}

where $G$ is a list containing:

one or more attributes from $R$ to be taken as grouping attributes, and
one or more aggregate functions, written in the form $\theta{(A, \dots)}$ , where $\theta$ is an aggregate function to be applied to attributes $A, \dots$ .

Formally, aggregate functions can be considered to take a multiset (i.e. a set with duplicates) of values.

Aggregate functions defined in SQL-92:

AVG, MAX, MIN, SUM, COUNT

Most major DBMS implementations offer many more aggregate functions and allow user-defined functions.

Example

$r$
$A$	$B$	$C$	$D$
$x$	$\alpha$	$5$	$7$
$x$	$\alpha$	$7$	$3$
$x$	$\beta$	$1$	$9$
$x$	$\beta$	$5$	$6$
$y$	$\alpha$	$3$	$1$
$y$	$\alpha$	$2$	$1$
$y$	$\beta$	$3$	$5$
$z$	$\alpha$	$9$	$4$

$\relgroup_{A, \text{COUNT}(A), \text{MAX}(C), \text{MAX}(D)}{(r)}$
$A$	$\text{COUNT}(A)$	$\text{MAX}(C)$	$\text{MAX}(D)$
$x$	$4$	$7$	$9$
$y$	$3$	$3$	$5$
$z$	$1$	$9$	$4$

$\relgroup_{A, B, \text{COUNT}(A), \text{SUM}(C)}{(r)}$
$A$	$B$	$\text{COUNT}(A)$	$\text{SUM}(C)$
$x$	$\alpha$	$2$	$12$
$x$	$\beta$	$2$	$6$
$y$	$\alpha$	$2$	$5$
$y$	$\beta$	$1$	$3$
$z$	$\alpha$	$1$	$9$

The $\text{COUNT}()$ function is interesting in that it doesn’t really matter which column you use.

Generalized Projection (Extended RA)

We can extend the projection operator $\pi$ to also contain expressions for computation.

Example

$r$
$A$	$B$	$C$	$D$
$a_1$	$2$	$6$	$d_1$
$a_2$	$8$	$9$	$d_2$
$a_3$	$3$	$2$	$d_3$

$\relproject_{A, B + C}{(r)}$
$\phantom{A}A\phantom{A}$	$B + C$
$a_1$	$8$
$a_2$	$17$
$a_3$	$5$

Further Concepts

Functional Dependencies

TODO: Write about this!