simshadows

Databases: The Relational Model and Relational Algebra

Basic Terms

Relation Schemas

Relation schemas are finite sets of attributes.

Notation for some relation RR with attributes A,B,C,DA, B, C, D and primary key B,DB, D:

R(A,B,C,D) R(A, \underline{B}, C, \underline{D})

The domain of a schema is the set of all possible tuples that can be associated with the schema.

For relation RR, this is denoted:

Dom(R) \dom{(R)}

Keys

A superkey of a relation schema is any subset of that schema that can uniquely identify a tuple.

In other words, all tuples have unique values in the superkey.

Or more formally, for r(R)r(R) and any two distinct tuples t1,t2rt_1, t_2 \in r, if t1[K]t2[K]t_1[K] \neq t_2[K] for some KRK \subseteq R, then KK is a superkey.

Note that for r(R)r(R), RR is always a superkey.


A candidate key is a “minimal superkey”.

More formally, superkey KK is a candidate key if no subset KKK' \subset K exists that is also a superkey.


A primary key is the specific candidate key designated to a relation schema.

All relations must have exactly one primary key.


A foreign key of a relation schema is a subset of attributes that reference a particular primary key.

The referenced primary key is often a different relation, but may also be of the same relation.

Attributes

Attributes are labels for parts of a schema.

The domain of an attribute is the set of values that can be associated with the attribute.

For attribute AA, this is denoted:

Dom(A) \dom{(A)}

Relation Instances

Relation instances are finite sets of tuples.

To show that rr is an instance of schema RR, we write:

r(R) r(R)

Example
A venn diagram representing the inner/theta join.

Tuples

Tuples are mappings from schema to schema domain.

Concretely, a tuple can be thought of as a set of attribute-value pairs.

We will use the following notation in this cheatsheet:

The Null Value (Extended RA)

The null value is a special value that may be part of an attribute domain.

We will use the symbol \relnullvalue to represent null in this cheatsheet.

While the null value is an important part of the relational model, it is not a core part of relational algebra.

Operators

Selection

Produces the subset of tuples that satisfy a condition.

For r(R)r(R) and selection condition cc:

σc(r)={trc(t)} \relselect_{c}{(r)} = \setdef{t \in r}{c(t)}

Example
rr
AABBCC
a1a_122xx
a2a_288xx
a3a_377xx
a4a_499yy
σ(B>5)(C=x)(r)\relselect_{(B>5) \wedge (C=x)}{(r)}
AABBCC
a2a_288xx
a3a_377xx

Projection

Produces a relation with a subset of attributes.

For r(R)r(R) and attribute set XX:

πX(r)={t[X]tr} \relproject_{X}{(r)} = \setdef{t[X]}{t \in r}

Example
rr
AABBCCDD
a1a_1b1b_1c1c_1d1d_1
a2a_2b2b_2c2c_2d2d_2
a3a_3b3b_3c3c_3d3d_3
πB,D(r)\relproject_{B, D}{(r)}
BBDD
b1b_1d1d_1
b2b_2d2d_2
b3b_3d3d_3

Rename

Casts a relation to a different schema.

For r(R)r(R) and a compatible schema SS:

ρS(r) \relrename_{S}{(r)}

Example
rr
AABBCC
a1a_1b1b_1c1c_1
a2a_2b2b_2c2c_2
a3a_3b3b_3c3c_3
ρS(D,E,F)(r)\relrename_{S(D, E, F)}{(r)}
DDEEFF
a1a_1b1b_1c1c_1
a2a_2b2b_2c2c_2
a3a_3b3b_3c3c_3

Union, Intersection, and Difference

Set theoretic operations between union-compatible relations (i.e. same schema).

For r1(R)r_1(R) and r2(R)r_2(R):

r1r2={t(tr1)(tr2)}r1r2={t(tr1)(tr2)}r1r2={t(tr1)(tr2)}\begin{align*} r_1 \cup r_2 &= \setdef{t}{(t \in r_1) \vee (t \in r_2)} \\ r_1 \cap r_2 &= \setdef{t}{(t \in r_1) \wedge (t \in r_2)} \\ r_1 - r_2 &= \setdef{t}{(t \in r_1) \wedge (t \notin r_2)} \end{align*}

Example
r1r_1
AABBCC
a1\bm{a_1}b1\bm{b_1}c1\bm{c_1}
a2\bm{a_2}b2\bm{b_2}c2\bm{c_2}
a3\bm{a_3}b3\bm{b_3}c3\bm{c_3}
a4\bm{a_4}b4\bm{b_4}c4\bm{c_4}
a5\bm{a_5}b5\bm{b_5}c5\bm{c_5}
r2r_2
AABBCC
a4\bm{a_4}b4\bm{b_4}c4\bm{c_4}
a5\bm{a_5}b5\bm{b_5}c5\bm{c_5}
a6\bm{a_6}b6\bm{b_6}c6\bm{c_6}
a7\bm{a_7}b7\bm{b_7}c7\bm{c_7}
r1r2r_1 \cup r_2
AABBCC
a1\bm{a_1}b1\bm{b_1}c1\bm{c_1}
a2\bm{a_2}b2\bm{b_2}c2\bm{c_2}
a3\bm{a_3}b3\bm{b_3}c3\bm{c_3}
a4\bm{a_4}b4\bm{b_4}c4\bm{c_4}
a5\bm{a_5}b5\bm{b_5}c5\bm{c_5}
a6\bm{a_6}b6\bm{b_6}c6\bm{c_6}
a7\bm{a_7}b7\bm{b_7}c7\bm{c_7}
r1r2r_1 - r_2
AABBCC
a1\bm{a_1}b1\bm{b_1}c1\bm{c_1}
a2\bm{a_2}b2\bm{b_2}c2\bm{c_2}
a3\bm{a_3}b3\bm{b_3}c3\bm{c_3}
r1r2r_1 \cap r_2
AABBCC
a4\bm{a_4}b4\bm{b_4}c4\bm{c_4}
a5\bm{a_5}b5\bm{b_5}c5\bm{c_5}
r2r1r_2 - r_1
AABBCC
a6\bm{a_6}b6\bm{b_6}c6\bm{c_6}
a7\bm{a_7}b7\bm{b_7}c7\bm{c_7}

Cartesian Product

Produces every possible combination of tuples from two relations.

For r(R)r(R) and s(S)s(S):

r×s={(tr:ts)(trr)(tss)} r \times s = \setdef{(t_r:t_s)}{(t_r \in r) \wedge (t_s \in s)}

Example
rr
AABBCC
a1a_1b1b_1c1c_1
a2a_2b2b_2c2c_2
ss
MMNN
m1m_1n1n_1
m2m_2n2n_2
m3m_3n3n_3
r×sr \times s
AABBCCMMNN
a1a_1b1b_1c1c_1m1m_1n1n_1
a1a_1b1b_1c1c_1m2m_2n2n_2
a1a_1b1b_1c1c_1m3m_3n3n_3
a2a_2b2b_2c2c_2m1m_1n1n_1
a2a_2b2b_2c2c_2m2m_2n2n_2
a2a_2b2b_2c2c_2m3m_3n3n_3

Theta/Inner Join and Equijoin

Theta join combines the tuples of two relations using a matching criterion.

For r(R)r(R) and s(S)s(S), and some arbitrary matching criterion cc:

rcs={(tr:ts)(trr)(tss)c(tr:ts)} r \bowtie_c s = \setdef{ (t_r : t_s) }{ (t_r \in r) \wedge (t_s \in s) \wedge c(t_r : t_s) }

Theta join can also be defined as:

rcs=σc(r,s)(r×s) r \bowtie_c s = \relselect_{c(r, s)}{\parens{r \times s}}

Duplicate attributes are not removed.

Equijoin is a special case of theta join that only tests for equality between attributes.

Example
rr
AABBCC
a1a_1b1b_100
a2a_2b2b_233
a3a_3b3b_322
ss
MMNN
m1m_133
m2m_244
m3m_311
m4m_422
rNCsr \bowtie_{N \le C} s
AABBCCMMNN
a2a_2b2b_233m1m_133
a2a_2b2b_233m3m_311
a2a_2b2b_233m4m_422
a3a_3b3b_322m3m_311
a3a_3b3b_322m4m_422

Natural Join

A special case of equijoin that matches tuples on all their common attributes.

For r(R)r(R) and s(S)s(S):

rs={(tr:ts[SR])(trr)(tss)m(tr,ts)} r \bowtie s = \setdef{ \parens{t_r : t_s[S - R]} }{ (t_r \in r) \wedge (t_s \in s) \wedge m(t_r, t_s) }

where mm is “all common attributes must match”:

m(tr,ts)=(xxtr[RS]=ts[RS]) m(t_r, t_s) = \parens{\vphantom{\frac{x}{x}} t_r[R \cap S] = t_s[R \cap S]}

Natural join can also be defined as:

rs=πRS(σm(r,s)(r×s)) r \bowtie s = \relproject_{R \cup S}{\parens{\relselect_{m(r, s)}{\parens{r \times s}}}}

where mm is:

m(r,s)=(xxr[RS]=s[RS]) m(r, s) = \parens{\vphantom{\frac{x}{x}} r[R \cap S] = s[R \cap S]}

and πRS\relproject_{R \cup S} assumes removal of duplicate attributes.

Natural join is dangerous in real applications. Use equijoin to explicitly match tuples instead.

Example
rr
AABBCCDD
a1a_1b1b_1xxxx
a2a_2b2b_2x\bm{x}y\bm{y}
a3a_3b3b_3x\bm{x}y\bm{y}
a4a_4b4b_4x\bm{x}y\bm{y}
ss
CCDDEE
yyyye1e_1
x\bm{x}y\bm{y}e2e_2
y\bm{y}x\bm{x}e3e_3
rsr \bowtie s
AABBCCDDEE
a2a_2b2b_2x\bm{x}y\bm{y}e2e_2
a3a_3b3b_3x\bm{x}y\bm{y}e2e_2
a4a_4b4b_4y\bm{y}x\bm{x}e3e_3

Division (Extended RA)

For r(R)r(R) and s(S)s(S), with SRS \subseteq R:

r÷s={tπRS(r)Φ(t)} r \div s = \setdef{t \in \relproject_{R-S}{(r)}}{\Phi(t)}

where:

Φ(t)=tss(1)trr(2)(tr[S]=ts(3)t=tr[RS](4)) \newcommand{\X}{\rule[-2ex]{0mm}{2ex}} \Phi(t) = \underbrace{\X \forall t_s \in s}_{(1)} \, \underbrace{\X \exists t_r \in r}_{(2)} \, ( \underbrace{\X t_r[S] = t_s}_{(3)} \underbrace{\X \wedge t = t_r[R-S]}_{(4)} )

In plain English: (1) For all tuples in ss, (2) there is at least one relation in rr (3) whose common attributes match, (4) and whose non-common attributes are the same.

Division can also be defined as:

r÷s=rπ(RS)(xx[r×s]r)“disqualifier” term r \div s = r' - \underbrace{ \rule[-2ex]{0mm}{2ex} \relproject_{(R-S)}{\parens{\vphantom{\frac{x}{x}} [r' \times s] - r}} }_{\text{``disqualifier'' term}}

where rr' is “all possible result tuples”:

r=π(RS)(r) r' = \relproject_{(R-S)}{(r)}

Example
rr
AABBCCDD
α\alphaβ\betaxxxx
α\bm{\alpha}β\bm{\beta}x\bm{x}z\bm{z}
α\bm{\alpha}β\bm{\beta}w\bm{w}z\bm{z}
γ\gammaδ\deltaxxxx
γ\bm{\gamma}δ\bm{\delta}x\bm{x}y\bm{y}
γ\bm{\gamma}δ\bm{\delta}x\bm{x}z\bm{z}
γ\gammaδ\deltawwxx
γ\gammaδ\deltawwyy
γ\bm{\gamma}δ\bm{\delta}w\bm{w}z\bm{z}
γ\gammaδ\deltawwww
ϵ\bm{\epsilon}ζ\bm{\zeta}x\bm{x}y\bm{y}
ϵ\bm{\epsilon}ζ\bm{\zeta}x\bm{x}z\bm{z}
ϵ\bm{\epsilon}ζ\bm{\zeta}w\bm{w}z\bm{z}
η\bm{\eta}θ\bm{\theta}x\bm{x}y\bm{y}
ss
CCDD
xxyy
xxzz
wwzz
r÷sr \div s
AABB
γ\gammaδ\delta
ϵ\epsilonζ\zeta

In rr, values (γ,δ)(\gamma, \delta) and (ϵ,ζ)(\epsilon, \zeta) are the only values that occur with all values (x,y)(x, y), (x,z)(x, z), and (w,z)(w, z).

Outer Join (Extended RA)

Similar to theta join, except in the result relation, we also include tuples that don’t have matches in the join.

These unmatched tuples get padded with null values in the result relation.

Full outer join includes all tuples of both operands:

rcs=(rcs)(xx(rπR(rcs))×{(,)})(xx(rπR(rcs))×{(,)})=(rcs)(rcs)\begin{align*} r \relfullouterjoinSubscripted_c\, s &= (r \bowtie_c s) \cup \parens{\vphantom{\frac{x}{x}} \parens{r - \relproject_R{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ & \qquad \cup \parens{\vphantom{\frac{x}{x}} \parens{r - \relproject_R{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ &= (r \relleftouterjoinSubscripted_c\, s) \cup (r \relrightouterjoinSubscripted_c\, s) \end{align*}

Left outer join includes all tuples of the left:

rcs=(rcs)(xx(rπR(rcs))×{(,)})=scr\begin{align*} r \relleftouterjoinSubscripted_c\, s &= (r \bowtie_c s) \cup \parens{\vphantom{\frac{x}{x}} \parens{r - \relproject_R{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ &= s \relrightouterjoinSubscripted_c\, r \end{align*}

Right outer join includes all tuples of the right:

rcs=(rcs)(xx(sπS(rcs))×{(,)})=scr\begin{align*} r \relrightouterjoinSubscripted_c\, s &= (r \bowtie_c s) \cup \parens{\vphantom{\frac{x}{x}} \parens{s - \relproject_S{(r \bowtie_c s)}} \times \braces{(\relnullvalue, \dots)} } \\ &= s \relleftouterjoinSubscripted_c\, r \end{align*}

To summarize the differences between the joins:

Inner/Theta

A venn diagram representing the inner/theta join.

Left Outer

A venn diagram representing the left outer join.

Right Outer

A venn diagram representing the right outer join.

Full Outer

A venn diagram representing the full outer join.

Example
rr
AABBCCDD
a1a_1b1b_1xxxx
a2a_2b2b_2x\bm{x}y\bm{y}
a3a_3b3b_3y\bm{y}x\bm{x}
a4a_4b4b_4zzww
ss
CCDDEE
yyyye1e_1
x\bm{x}y\bm{y}e2e_2
y\bm{y}x\bm{x}e3e_3
rsr \relfullouterjoin s
AABBCCDDEE
a1a_1b1b_1xxxx\relnullvalue
a2a_2b2b_2x\bm{x}y\bm{y}e2e_2
a3a_3b3b_3y\bm{y}x\bm{x}e3e_3
a4a_4b4b_4zzww\relnullvalue
\relnullvalue\relnullvalueyyyye1e_1
rsr \relleftouterjoin s
AABBCCDDEE
a1a_1b1b_1xxxx\relnullvalue
a2a_2b2b_2x\bm{x}y\bm{y}e2e_2
a3a_3b3b_3y\bm{y}x\bm{x}e3e_3
a4a_4b4b_4zzww\relnullvalue

Grouping Operator (Extended RA)

Performs calculations over groups of tuples within a relation. Produces a relation containing the results.

For r(R)r(R) and operator subscript GG:

γG(R) \relgroup_G{(R)}

where GG is a list containing:

Formally, aggregate functions can be considered to take a multiset (i.e. a set with duplicates) of values.

Aggregate functions defined in SQL-92:

AVG, MAX, MIN, SUM, COUNT

Most major DBMS implementations offer many more aggregate functions and allow user-defined functions.

Example
rr
AABBCCDD
xxα\alpha5577
xxα\alpha7733
xxβ\beta 1199
xxβ\beta 5566
yyα\alpha3311
yyα\alpha2211
yyβ\beta 3355
zzα\alpha9944
γA,COUNT(A),MAX(C),MAX(D)(r)\relgroup_{A, \text{COUNT}(A), \text{MAX}(C), \text{MAX}(D)}{(r)}
AACOUNT(A)\text{COUNT}(A)MAX(C)\text{MAX}(C)MAX(D)\text{MAX}(D)
xx447799
yy333355
zz119944
γA,B,COUNT(A),SUM(C)(r)\relgroup_{A, B, \text{COUNT}(A), \text{SUM}(C)}{(r)}
AABBCOUNT(A)\text{COUNT}(A)SUM(C)\text{SUM}(C)
xxα\alpha221212
xxβ\beta 226 6
yyα\alpha225 5
yyβ\beta 113 3
zzα\alpha119 9

The COUNT()\text{COUNT}() function is interesting in that it doesn’t really matter which column you use.

Generalized Projection (Extended RA)

We can extend the projection operator π\pi to also contain expressions for computation.

Example
rr
AABBCCDD
a1a_12266d1d_1
a2a_28899d2d_2
a3a_33322d3d_3
πA,B+C(r)\relproject_{A, B + C}{(r)}
AAA\phantom{A}A\phantom{A}B+CB + C
a1a_18 8
a2a_21717
a3a_35 5

Further Concepts

Functional Dependencies

TODO: Write about this!