Categories and Subcategories

The adjacency model

Added 2009-02-14


The fundamental structure of the adjacency model is a one-to-many relationship between a parent entry and its child entries. As with any one-to-many relationship, the child entries carry a foreign key to their parent. What makes the adjacency model different is that the parent and child entries are both stored in the same table.

create table categories
( id       integer     not null  primary key 
, name     varchar(37) not null
, parentid integer     null
, constraint parentid_fk foreign key (parentid)
      references categories (id)
);

Here's some sample data that might populate this table. We should be able to get an idea of the parent-child relationships (if not grasp the entire hierarchy) just by looking at the data:

id  name       parentid
--  ---------  --------
 1  animal     NULL
 2  vegetable  NULL
 3  mineral    NULL
 4  doggie     1
 5  kittie     1
 6  horsie     1
 7  gerbil     1
 8  birdie     1
 9  carrot     2
10  tomato     2
11  potato     2
12  celery     2
13  rutabaga   2
14  quartz     3
15  feldspar   3
16  silica     3
17  gypsum     3
18  hunting    4
19  companion  4
20  herding    4
21  setter     18
22  pointer    18
23  terrier    18
24  poodle     19
25  chihuahua  19
26  shepherd   20
27  collie     20

Terms commonly used with the adjacency model include tree, root, node, subtree, leaf, path, depth and level. There can be one or more trees in the table, and the parent foreign key is NULL for each tree's root node. A root node is therefore at the "top" of its tree. A node is any entry, while a leaf is any node that has no children, i.e. for which there exists no other node having that node as its parent. A subtree is the portion of the tree "under" any node. The depth of a subtree is the maximum number of levels beneath the node at its top. These may not be official terminology definitions, but they work for me.
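The definitions of root and leaf translate directly into queries. Here's a minimal sketch, assuming Python's sqlite3 module and an in-memory SQLite database (neither is part of the original article) loaded with just a handful of the sample rows:

```python
import sqlite3

# Assumption for illustration: an in-memory SQLite database holding
# only a subset of the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer     -- nullable: NULL marks a root node
    , foreign key (parentid) references categories (id)
    )
""")
conn.executemany(
    "insert into categories (id, name, parentid) values (?, ?, ?)",
    [
        (1, "animal", None), (2, "vegetable", None),
        (4, "doggie", 1), (5, "kittie", 1), (9, "carrot", 2),
        (19, "companion", 4), (24, "poodle", 19),
    ],
)

# Root nodes: the parent foreign key is NULL.
roots = [name for (name,) in conn.execute(
    "select name from categories where parentid is null order by name"
)]

# Leaf nodes: no other row has this row as its parent.
leaves = [name for (name,) in conn.execute("""
    select node.name
      from categories as node
     where not exists
           ( select 1
               from categories as child
              where child.parentid = node.id )
     order by node.name
""")]

print(roots)   # ['animal', 'vegetable']
print(leaves)  # ['carrot', 'kittie', 'poodle']
```

Note that "doggie" and "companion" are neither roots nor leaves in this subset: each has a parent and at least one child.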

Why is it called a tree when it grows down from the "root" which is at the top? Mere convention.

Now let's see how a tree or hierarchy can be used to implement a category/subcategory structure.

Working with categories and subcategories

Using the adjacency model to implement categories and subcategories can be reduced to two simple steps:

  1. manage the hierarchical data
  2. display the hierarchical data

Managing the hierarchy is nothing special. Just look again at the table layout. There's a primary key column (id) and a foreign key referencing it (parentid). Other than that, it's a dead simple table. Use INSERT, UPDATE, and DELETE as with any other table. Whether we actually declare the foreign key on parentid, which is necessary for referential integrity, is secondary to the basic design. (Referential integrity means that the parent row should exist before the child row referencing it is inserted, and so on. See the article Relational Integrity in the Resources below.)
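To see what declaring the foreign key buys us, here's a small sketch, again assuming Python's sqlite3 module and an in-memory SQLite database rather than anything from the original article. SQLite only enforces foreign keys once the pragma shown is switched on; other engines behave differently. The helper name `rejected` is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign key enforcement is off unless enabled.
conn.execute("pragma foreign_keys = on")
conn.execute("""
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer     -- nullable: NULL marks a root node
    , foreign key (parentid) references categories (id)
    )
""")

# Plain INSERTs: the parent row goes in before the child that references it.
conn.execute("insert into categories values (1, 'animal', null)")
conn.execute("insert into categories values (4, 'doggie', 1)")

def rejected(sql):
    """Hypothetical helper: True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A child pointing at a parent that does not exist is refused...
orphan_rejected = rejected("insert into categories values (5, 'kittie', 99)")
# ...and so is deleting a parent that still has children.
delete_rejected = rejected("delete from categories where id = 1")

remaining = conn.execute("select count(*) from categories").fetchone()[0]
print(orphan_rejected, delete_rejected, remaining)  # True True 2
```

Without the declared foreign key, both statements would succeed and leave orphaned rows behind.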

Displaying the hierarchy takes more work, but is not difficult. Categories and subcategories can be handled in HTML in many ways. Current best practice is to use nested unordered lists. For further information, see Listamatic: one list, many options in the Resources below.

Displaying all categories and subcategories: site maps and navigation bars

To display the hierarchy, we must first retrieve it. The following method uses as many LEFT OUTER JOINs as necessary to cover the depth of the deepest tree. For our sample data, the deepest tree has four levels, so the query requires three self-joins (four references to the categories table in all). Each join goes "down" a level from the node above it. The query begins at the root nodes.

select root.name  as root_name
     , down1.name as down1_name
     , down2.name as down2_name
     , down3.name as down3_name
  from categories as root
left outer
  join categories as down1
    on down1.parentid = root.id
left outer
  join categories as down2
    on down2.parentid = down1.id
left outer
  join categories as down3
    on down3.parentid = down2.id
 where root.parentid is null
order 
    by root_name 
     , down1_name 
     , down2_name 
     , down3_name

Notice how the WHERE clause ensures that only paths from the root nodes are followed. This query produces the following result set:

root_name  down1_name  down2_name  down3_name
---------  ----------  ----------  ----------
animal     birdie      NULL        NULL
animal     doggie      companion   chihuahua
animal     doggie      companion   poodle
animal     doggie      herding     collie
animal     doggie      herding     shepherd
animal     doggie      hunting     pointer
animal     doggie      hunting     setter
animal     doggie      hunting     terrier
animal     gerbil      NULL        NULL
animal     horsie      NULL        NULL
animal     kittie      NULL        NULL
mineral    feldspar    NULL        NULL
mineral    gypsum      NULL        NULL
mineral    quartz      NULL        NULL
mineral    silica      NULL        NULL
vegetable  carrot      NULL        NULL
vegetable  celery      NULL        NULL
vegetable  potato      NULL        NULL
vegetable  rutabaga    NULL        NULL
vegetable  tomato      NULL        NULL

Each row in the result set represents a distinct path from a root node to a leaf node. Notice how the LEFT OUTER JOIN, when extended "below" the leaf node in any given path, returns NULL (representing the fact that there was no node below that node, i.e. satisfying that join condition).

As we can see, this result set contains all our original categories and subcategories. If the categories and subcategories are being displayed on a web site, this query can therefore be used to generate the complete site map. An abbreviated query that goes down only a certain number of levels from the roots, regardless of whether there are nodes at deeper levels, can be used for the site's navigation bar.
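An abbreviated query of that kind might stop after a single join. The sketch below assumes Python's sqlite3 module, an in-memory SQLite database, and a subset of the sample rows, none of which come from the original article. It goes down only one level from the roots:

```python
import sqlite3

# Assumption for illustration: in-memory SQLite, subset of the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer     -- nullable: NULL marks a root node
    , foreign key (parentid) references categories (id)
    )
""")
conn.executemany(
    "insert into categories (id, name, parentid) values (?, ?, ?)",
    [
        (1, "animal", None), (2, "vegetable", None),
        (4, "doggie", 1), (5, "kittie", 1), (9, "carrot", 2),
        (19, "companion", 4), (24, "poodle", 19),
    ],
)

# Same shape as the site map query, but with only one self-join,
# so nothing below the first level under each root is retrieved.
nav_rows = conn.execute("""
    select root.name  as root_name
         , down1.name as down1_name
      from categories as root
    left outer
      join categories as down1
        on down1.parentid = root.id
     where root.parentid is null
    order
        by root_name
         , down1_name
""").fetchall()

for row in nav_rows:
    print(row)
# ('animal', 'doggie')
# ('animal', 'kittie')
# ('vegetable', 'carrot')
```

"companion" and "poodle" exist in the table but never appear, because the query does not look that far down.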

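We can display this sample data using nested unordered lists like this:

  - animal
      - birdie
      - doggie
          - companion
              - chihuahua
              - poodle
          - herding
              - collie
              - shepherd
          - hunting
              - pointer
              - setter
              - terrier
      - gerbil
      - horsie
      - kittie
  - mineral
      - feldspar
      - gypsum
      - quartz
      - silica
  - vegetable
      - carrot
      - celery
      - potato
      - rutabaga
      - tomato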

What's the easiest way to transform the result set into the nested ULs? In ColdFusion, we use nested CFOUTPUT tags, with the GROUP= parameter on all but the innermost list. Very straightforward indeed. In other scripting languages, as the saying goes, your mileage may vary. Take comfort in the fact that once you've coded it, you will never have to change your site map page again.
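For readers without ColdFusion, here is one possible approach in Python, offered as a sketch rather than the article's method. It folds the rows into nested dictionaries and then renders them recursively. The function names `build_tree` and `render` are invented for this example, and the rows are a few copied from the site map result set, with None standing in for NULL.

```python
from html import escape

# A few rows copied from the site map result set (None stands for NULL).
rows = [
    ("animal", "birdie", None, None),
    ("animal", "doggie", "companion", "chihuahua"),
    ("animal", "doggie", "companion", "poodle"),
    ("animal", "doggie", "herding", "collie"),
    ("mineral", "quartz", None, None),
]

def build_tree(rows):
    """Fold root-to-leaf paths into nested dicts, keeping row order."""
    tree = {}
    for row in rows:
        branch = tree
        for name in row:
            if name is None:  # NULL: the path ended above this level
                break
            branch = branch.setdefault(name, {})
    return tree

def render(tree, indent=0):
    """Render nested dicts as nested <ul> lists, one tag per line."""
    pad = "  " * indent
    lines = [f"{pad}<ul>"]
    for name, children in tree.items():
        if children:
            lines.append(f"{pad}  <li>{escape(name)}")
            lines.extend(render(children, indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f"{pad}  <li>{escape(name)}</li>")
    lines.append(f"{pad}</ul>")
    return lines

html_lines = render(build_tree(rows))
print("\n".join(html_lines))
```

Because the query already sorts the rows, the dictionaries (which keep insertion order in Python 3.7 and later) come out in display order, and the code works unchanged for result sets with more columns.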

What if the hierarchy is more than, say, three or four levels deep? What if it's fifteen levels deep? My response to this question is threefold.

First, a query with fifteen self-joins may be a little more tedious to code but most assuredly will not present any difficulty to your database engine.

Second, in certain databases such as Oracle and DB2, recursion is built in, so you can go as many levels deep as you wish. Don't fool yourself, though: the coding required to display an arbitrary number of levels is no picnic either. Do not make the mistake of simulating recursion by coding a script module that calls itself, because from the database perspective, this is a series of calls (a query in a loop) and the performance will reflect this.
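One form of built-in recursion is the recursive common table expression (WITH RECURSIVE) from standard SQL. The sketch below runs one in SQLite through Python's sqlite3 module, which is an assumption made for illustration; the exact syntax varies between engines, so check your own database's documentation. The CTE name `tree` and its `depth` and `path` columns are invented for this example.

```python
import sqlite3

# Assumption for illustration: in-memory SQLite, subset of the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer     -- nullable: NULL marks a root node
    , foreign key (parentid) references categories (id)
    )
""")
conn.executemany(
    "insert into categories (id, name, parentid) values (?, ?, ?)",
    [
        (1, "animal", None), (2, "vegetable", None),
        (4, "doggie", 1), (5, "kittie", 1), (9, "carrot", 2),
        (19, "companion", 4), (24, "poodle", 19),
    ],
)

# One query, one round trip: start at the roots, then repeatedly
# join each level's children onto the rows found so far.
paths = conn.execute("""
    with recursive tree (id, name, depth, path) as
    ( select id, name, 0, name
        from categories
       where parentid is null
      union all
      select child.id, child.name, tree.depth + 1
           , tree.path || ' > ' || child.name
        from categories as child
      inner
        join tree
          on child.parentid = tree.id
    )
    select depth, path
      from tree
    order
        by path
""").fetchall()

for depth, path in paths:
    print(depth, path)
# 0 animal
# 1 animal > doggie
# 2 animal > doggie > companion
# 3 animal > doggie > companion > poodle
# 1 animal > kittie
# 0 vegetable
# 1 vegetable > carrot
```

Unlike the fixed self-join query, this returns one row per node rather than one per root-to-leaf path, and it needs no changes when the tree grows deeper.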

Third, if you have a tree that goes more than three or four levels deep, you may have difficulty conveying this structure satisfactorily in a visual way. You may want to go back and re-think how you expect your users to actually navigate through the hierarchy. Sometimes the best solution is simply to show no more than three levels, with some sort of visual cue that there are further levels below the nodes shown.

The path to the root: the breadcrumb trail

Retrieving the path from any given node, whether it is a leaf node or not, to the root at the top of its path, is very similar to the site map query. Again, we use LEFT OUTER JOINs, but this time we go "up" the tree from the node, rather than "down."

select node.name as node_name 
     , up1.name as up1_name 
     , up2.name as up2_name 
     , up3.name as up3_name 
  from categories as node
left outer 
  join categories as up1 
    on up1.id = node.parentid  
left outer 
  join categories as up2
    on up2.id = up1.parentid  
left outer 
  join categories as up3
    on up3.id = up2.parentid
order
    by node_name    

Here's the result set from this query:

node_name  up1_name   up2_name  up3_name
---------  ---------  --------  --------
animal     NULL       NULL      NULL
birdie     animal     NULL      NULL
carrot     vegetable  NULL      NULL
celery     vegetable  NULL      NULL
chihuahua  companion  doggie    animal
collie     herding    doggie    animal
companion  doggie     animal    NULL
doggie     animal     NULL      NULL
feldspar   mineral    NULL      NULL
gerbil     animal     NULL      NULL
gypsum     mineral    NULL      NULL
herding    doggie     animal    NULL
horsie     animal     NULL      NULL
hunting    doggie     animal    NULL
kittie     animal     NULL      NULL
mineral    NULL       NULL      NULL
pointer    hunting    doggie    animal
poodle     companion  doggie    animal
potato     vegetable  NULL      NULL
quartz     mineral    NULL      NULL
rutabaga   vegetable  NULL      NULL
setter     hunting    doggie    animal
shepherd   herding    doggie    animal
silica     mineral    NULL      NULL
terrier    hunting    doggie    animal
tomato     vegetable  NULL      NULL
vegetable  NULL       NULL      NULL

Here each row in the result set is a single path, one for every node in the table. On a web site, such a path is often called a breadcrumb trail. (This name is somewhat misleading, because it suggests that it might represent how the visitor arrived at the page, which is not always the case. The accepted meaning of breadcrumb is simply the path from the root.)

In practice, we'd have a WHERE clause that would specify a single node, so in effect, the results above are all of the breadcrumbs in the table.

To display a breadcrumb trail in the normal fashion, from root to node, just display the result set columns in reverse order, and ignore the nulls. For example, let's say we run the above query for the category "companion" and get this:

node_name  up1_name  up2_name  up3_name
---------  --------  --------  --------
companion  doggie    animal    NULL

The breadcrumb would look like this:

animal » doggie » companion

Simple, eh?
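Putting the two steps together (a WHERE clause for one node, then the columns in reverse order with the nulls dropped) might look like the sketch below. It assumes Python's sqlite3 module and an in-memory SQLite database holding one branch of the sample data. The function name `breadcrumb` is invented for this example, and looking the node up by name rather than by id is also just a choice made here.

```python
import sqlite3

# Assumption for illustration: in-memory SQLite, one branch of the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table categories
    ( id       integer     not null primary key
    , name     varchar(37) not null
    , parentid integer     -- nullable: NULL marks a root node
    , foreign key (parentid) references categories (id)
    )
""")
conn.executemany(
    "insert into categories (id, name, parentid) values (?, ?, ?)",
    [(1, "animal", None), (4, "doggie", 1),
     (19, "companion", 4), (24, "poodle", 19)],
)

def breadcrumb(conn, name):
    """Return the root-to-node trail for one category, or '' if not found."""
    row = conn.execute("""
        select node.name as node_name
             , up1.name  as up1_name
             , up2.name  as up2_name
             , up3.name  as up3_name
          from categories as node
        left outer
          join categories as up1
            on up1.id = node.parentid
        left outer
          join categories as up2
            on up2.id = up1.parentid
        left outer
          join categories as up3
            on up3.id = up2.parentid
         where node.name = ?
    """, (name,)).fetchone()
    if row is None:
        return ""
    # Reverse the columns (root first) and skip the NULLs.
    return " » ".join(n for n in reversed(row) if n is not None)

print(breadcrumb(conn, "companion"))  # animal » doggie » companion
print(breadcrumb(conn, "poodle"))     # animal » doggie » companion » poodle
```

As with the site map query, three joins cover a four-level tree; a deeper hierarchy would need more joins here too.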

Resources

Listamatic: one list, many options
    The power of CSS when applied to the lowly UL.

Trees in SQL by Joe Celko
    The nested set model, alternative to the adjacency list model.

Storing Hierarchical Data in a Database
    Modified Preorder Tree Traversal method.

Relational Integrity
    Primary and foreign keys and stuff like that.