Friday, December 30, 2011

Oracle Hierarchical queries


It was really confusing when I was going through the documentation of Oracle Hierarchical queries for the first time. I red it many times to understand it well,  practiced many times to save it in my memory and put it down in my own words so that it stays there for a long time in the way I understood it. And I hope you too can understand it well.

When we talk about hierarchical queries, then it is very well clear that we are dealing with hierarchical data. Hope you know what a hierarchical data refers to. It can be a parent child relation, boss to employee relation, manager to reportee etc. So whatever such data we have in a table we need to select the data suitable to our convenience.

Understanding Hierarchical query clause:
When we are trying to fetch data from our hierarchical table, it is very obvious that we need to know two things.
  1. From which level of hierarchy we need to start? That is we need a starting point.
  2. What kind of relationship are we looking in this hierarchy? That is we need a connection between the data.
Hope you understood above two parts, if then, you wont forget your hierarchical query clause hereafter. For our hierarchical query clause we need both the above "start with" and "connect" keywords.

START WITH condition CONNECT BY condition

Hence START WITH specifies the root row of the hierarchy and CONNECT BY specifies the relationship between the parent rows and child rows of the hierarchy.

Initially while I was going through the examples of hierarchical query the operation PRIOR was so confusing for me and I was not able to understand this code well enough. Then I learned in a different method and now it become easy for me. Let me try to explain it in the below example.

--Create one table to hold our hierarchical data

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Dec 30 17:52:02 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> create table hierarchy_test
  2  (
  3  emp_id number(10),
  4  l_name varchar2(10),
  5  m_name varchar2(10),
  6  f_name varchar2(10),
  7  manager_id number(10),
  8  constraint hierarchy_test_pk primary key (emp_id)
  9  );
Table created.
SQL>

--Now insert the hierarchical data into this table

 SQL> INSERT INTO hierarchy_test VALUES (100,'King','','','');
1 row created.
SQL> INSERT INTO hierarchy_test VALUES (101,'Scott','','','100');
1 row created.
SQL> INSERT INTO hierarchy_test VALUES (102,'King','','',101);
1 row created.
SQL> INSERT INTO hierarchy_test VALUES (103,'King','','',101);
1 row created.
SQL> INSERT INTO hierarchy_test VALUES (104,'King','','',101);
1 row created.
SQL> INSERT INTO hierarchy_test VALUES (105,'King','','',103);
1 row created.
SQL> INSERT INTO hierarchy_test VALUES (106,'King','','',103);

--Example for hierarchical query
SQL> select emp_id, manager_id, level
  2  from hierarchy_test
  3  start with emp_id=100
  4  connect by  PRIOR emp_id =  manager_id
  5  order by level;


    EMP_ID MANAGER_ID      LEVEL
---------- ---------- ----------
       100                     1
       101        100          2
       104        101          3
       102        101          3
       103        101          3
       106        103          4
       105        103          4
7 rows selected.

You can use PRIOR keyword on either side of the operator '='. But I got confused why? What difference does it make. Also I was not able to understand why this keyword is used here.

Let me read this clause as below so that I make sure that I understood it well.

connect by  PRIOR emp_id =  manager_id
as
connect by previous emp_id = manager_id of the current row.

So, here in the example, first row emp_id=100, manager_id=null.
Now to get the current row get all the records for which the manager_id is equal to the employee id of the previous record. Now you go back and look into the result, you will be clear on why we use PRIOR in our code.

You can rewrite the above code as below with same result, as per my above sentence.

 SQL> select emp_id, manager_id, level
  2  from hierarchy_test
  3  start with emp_id=100
  4  connect by   manager_id =PRIOR emp_id
  5  order by level;
    EMP_ID MANAGER_ID      LEVEL
---------- ---------- ----------
       100                     1
       101        100          2
       104        101          3
       102        101          3
       103        101          3
       106        103          4
       105        103          4
7 rows selected.
SQL>

What if you give the PRIOR keyword just opposite as below for our data?

SQL> select emp_id, manager_id, level
  2  from hierarchy_test
  3  start with emp_id=100
  4  connect by   emp_id =  PRIOR manager_id
  5  order by level;
    EMP_ID MANAGER_ID      LEVEL
---------- ---------- ----------
       100                     1
SQL>

We get only one result set right?
Here what happens is we started with emp_id 100 and for the next row we said that, get me the row for which the employee id is the manager id of the previous record!!

Ooops!!
Kind don't have a manager and hence we have that column value as null. Hence oracle doesn't find any way to proceed further. Hence we got only one record.

Hope you got my point.

Now, what if I am not using the start by keyword. Your code will work but result is bad and not as expected.

SQL> select emp_id, manager_id, level
  2  from hierarchy_test
  3  connect by  PRIOR emp_id =  manager_id

    EMP_ID MANAGER_ID      LEVEL
---------- ---------- ----------
       104        101          1
       106        103          1
       101        100          1
       103        101          1
       105        103          1
       100                     1
       102        101          1
       106        103          2
       105        103          2
       104        101          2
       103        101          2
    EMP_ID MANAGER_ID      LEVEL
---------- ---------- ----------
       102        101          2
       101        100          2
       104        101          3
       102        101          3
       103        101          3
       105        103          3
       106        103          3
       106        103          4
       105        103          4
20 rows selected.
SQL>


Level 4 considers only 105 and 106 whereas level 3 consider 105,106, 102, 103 and 104 and it goes like that till level 1. Hence give the start with keyword with your hierarchical query.

Related Articles:
How do Oracle process Hierarchical queries.


****
Hope this is useful. Thanks Phoenix.


2 comments:

  1. I think this is an great blogs. Such a very informative and creative contents. These concept is good for these knowledge. I like it and help me to development very well. Thank you for this brief explanations... Oracle Training in Chennai | Oracle Training institutes in Chennai

    ReplyDelete
  2. Explained well!! I was also confused initially. Thanks for the blog.

    ReplyDelete