Friday, November 12, 2010

Analytic Functions

Some of the things that are hard to do in straight SQL are actually
very commonly requested operations, including:

a) Calculate a running total – Show the cumulative salary within a department row by row, with each row including a summation of the prior rows' salary.

b) Find percentages within a group – Show the percentage of the total salary paid to an individual in a certain department. Take their salary and divide it by the sum of the salary in the department.

c) Top-N queries – Find the top N highest-paid people or the top N sales by region.

d) Compute a moving average – Average the current row's value and the previous N rows values together.

e) Perform ranking queries – Show the relative rank of an individual's salary within their department.

Analytic functions, which have been available since Oracle 8.1.6, are designed to address these issues. They add extensions to the SQL language that not only make these operations easier to code; they make them faster than could be achieved with the pure SQL approach. These extensions are currently under review by the ANSI SQL committee for inclusion in the SQL specification.

A quick example, which calculates a running total of salaries by department, and an explanation of what exactly is happening, will give you a good initial understanding of analytic functions:

SQL> break on deptno skip 1
SQL> select ename, deptno, sal,
sum(sal) over
(order by deptno, ename) running_total,
sum(sal) over
(partition by deptno
order by ename) department_total,
row_number() over
(partition by deptno
order by ename) seq
from emp
order by deptno, ename
/

ENAME DEPTNO SAL RUNNING_TOTAL DEPARTMENT_TOTAL SEQ
---------- ---------- ---------- ------------- ---------------- ----------
CLARK 10 2450 2450 2450 1
KING 5000 7450 7450 2
ADAMS 20 1100 9850 1100 1
FORD 3000 12850 4100 2
ALLEN 30 1600 21225 1600 1
BLAKE 2850 24075 4450 2


In the above code, we were able to compute a RUNNING_TOTAL for the entire query. This was done using the entire ordered resultset, via SUM(SAL) OVER (ORDER BY DEPTNO, ENAME). We were also able to compute a running total within each department, a total that would be reset at the beginning of the next department. The PARTITION BY DEPTNO in that SUM(SAL) caused this to happen – a partitioning clause was specified in the query in order to break the data up into groups.

The ROW_NUMBER() function is used to sequentially number the rows returned in each group, according to our ordering criteria(a SEQ column was added to in order to display this position). So, we see that SCOTT is the fourth row in department 20, when ordered by ENAME. This ROW_NUMBER() feature has many uses elsewhere, for example to transpose or pivot resultsets (as we will discuss later).

This new set of functionality holds some exciting possibilities. It opens up a whole
new way of looking at the data. It will remove a lot of procedural code and complex (or inefficient) queries that would have taken a long time to develop, to achieve the same result. Just to give you a flavor of how efficient these analytic functions can be, over the old 'pure relational ways', let's compare the performance of the
above query with 1000 rows instead of just 14 rows. Both the new analytical functions and the 'old' relational methods are used here to test the performance of the query. The following two statements will set up a copy of the SCOOTT.EMP table with just the ENAME, DEPTNO, and SAL columns along with an index (the only one needed for this example). I am doing everything using DEPTNO and ENAME:

The above took 0.61 seconds and 294 logical I/Os. Now, repeat the same exact query using only 'standard' SQL functionality:

sql> select ename, deptno, sal,
(select sum(sal)
from t e2
where e2.deptno < emp.deptno
or (e2.deptno = emp.deptno and e2.ename <= emp.ename ))
running_total,
(select sum(sal)
from t e3
where e3.deptno = emp.deptno
and e3.ename <= emp.ename)
department_total,
(select count(ename)
from t e3
where e3.deptno = emp.deptno
and e3.ename <= emp.ename)
seq
from t emp
order by deptno, ename
/

How Analytic Functions Work

The first part of this section will contain the dry details of the syntax and definitions of terms. After that, we'll dive right into examples. I'll demonstrate many of the 26 new functions (not all of them, as many of the examples would be repetitive). The analytic functions utilize the same general syntax and many
provide specialized functionality designed for certain technical disciplines not used by the everyday developer. Once you get familiar with the syntax – how to partition, how to define windows of data, and so on – using these functions will become very natural.

The Syntax

The syntax of the analytic function is rather straightforward in appearance, but looks can be deceiving. It starts with:

FUNCTION_NAME(,,…)
OVER
( )

There are up to four parts to an analytic function; it can be invoked with arguments, a partition clause, an order by clause, and a windowing clause. In the example shown in the above introduction:

4 sum(sal) over
5 (partition by deptno
6 order by ename) department_total,

In this case:
❑ SUM is our FUNCTION_NAME.

❑ (SAL) is the argument to our analytic function. Each function takes between zero and three arguments. The arguments are expressions – that could have been the SUM(SAL+COMM) just as well.

❑ OVER is a keyword that identifies this as an analytic function. Otherwise, the query parser would not be able to tell the difference between SUM() the aggregate function and SUM() the analytic function. The clause that follows the OVER keyword describes the slice of data that this analytic function will be performed 'over'.

❑ PARTITION BY DEPTNO is the optional partitioning clause. If no partitioning clause exists, the entire resultset is treated as a single large partition. You will use that to break a result set into groups and the analytic function will be applied to the group, not to the entire result set. In the introductory example when the partitioning clause was left out, the SUM of SAL was generated for the entire resultset. By partitioning by DEPTNO, the SUM of SAL by DEPTNO was computed– resetting the running total for each group.

❑ ORDER BY ENAME is the optional ORDER BY clause; some functions demand it, and some do not. Functions that depend on ordered data, such as LAG and LEAD that are used to access the 'previous' and 'next' rows in a result set, necessitate the use of an ORDER BY clause. Other functions, such as AVG do not. This clause is mandatory when using a windowing function of any sort (see below in the section on The Windowing Clause for more details). This specifies how the data is ordered within a group when computing the analytic function. You did not have to order by both DEPTNO and ENAME in this case since it is partitioned by DEPTNO – it is implied that the partitioning columns are part of the sort key by definition (the ORDER BY is applied to each partition in turn).

❑ WINDOWING CLAUSE was left out in this example. This is where the syntax gets a little confusing looking in some cases. We will delve into all of the permutations for the windowing clause in detail below.

The Partition Clause

The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words 'partition' and 'group' are used synonymously here, and in the Oracle documentation. The analytic functions are applied to each group independently – they are 'reset'
for each group. For example, above when we demonstrated a cumulative SAL function, we partitioned by DEPTNO. When the DEPTNO changed in the result set, we reset the cumulative SAL to ZERO and summation started anew.

If you omit a partitioning clause, then the entire result set is considered a single group. In the introductory example we used SUM(SAL) without a partitioning clause in order to obtain a running total for the entire result set.

It is interesting to note that each instance of an analytic function in a query may have an entirely different partitioning clause; the simple example we started this chapter with does this in fact. The column RUNNING_TOTAL did not supply a partitioning clause; hence, the entire result set was its target group. The column DEPARTMENTAL_TOTAL on the other hand, partitioned the result set by
departments, allowing us to compute a running total within a department.

The partition clause has a simple syntax and is very similar in syntax to the GROUP BY clause you normally see with SQL queries:

PARTITION BY expression <, expression> <, expression>

The Order By Clause

The ORDER BY clause specifies how the data is sorted within each group (partition). This will definitely affect the outcome of any analytic function. The analytic functions are computed differently in the presence of an ORDER BY clause (or lack thereof). As a very simple example, consider what happens when we use AVG() with and without an ORDER BY clause:

sql> select ename, sal, avg(sal) over ()
from emp;
/

ENAME SAL AVG( SAL)OVER()
---------- --------- --------------
SMITH 800.00 2073.21
ALLEN 1600.00 2073.21
WARD 1250.00 2073.21
JONES 2975.00 2073.21
MARTIN 1250.00 2073.21
.
.
.

SQL> select ename, sal, avg(sal) over (ORDER BY ENAME)
from emp
order by ename
/

ENAME SAL AVG(SAL)OVER(ORDERBYENAME)
---------- --------- --------------------------
ADAMS 1100.00 1100.00
ALLEN 1600.00 1350.00
BLAKE 2850.00 1850.00
.
.
.

Without the ORDER BY clause, the average is computed over the entire group and the same value is given to every row (it is being used as a reporting function). When AVG() is used with the ORDER BY, the average for each row is the average of that row and all preceding rows (here it is used as a window function). For example, the average salary for ALLEN in the query with the ORDER BY clause is 1350
(the average of 1100 and 1600).

SQL> select ename, deptno,
sum(sal) over (order by ename, deptno) sum_ename_deptno,
sum(sal) over (order by deptno, ename) sum_deptno_ename
from emp
order by ename, deptno
/

ENAME DEPTNO SUM_ENAME_DEPTNO SUM_DEPTNO_ENAME
---------- ---------- ---------------- ----------------
ADAMS 20 1100 9850
ALLEN 30 2700 21225
BLAKE 30 5550 24075
.
.
.
Both of the SUM(SAL) columns are equally correct; one of them is computing the SUM(SAL) by DEPTNO and then ENAME whereas the other does it by ENAME and then DEPTNO. Since the result set is order by (ENAME, DEPTNO) the SUM(SAL) that is computed in that order looks more correct but they both come to the same result, the grand total is 29025.

The syntax of an ORDER BY clause with analytic functions is as follows:

ORDER BY expression ,


The Windowing Clause

This is where the syntax gets a little complicated in appearance. Although it's not that hard in reality, the terms are a little confusing at first. Terms such as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is the default window with an ORDER BY clause, are not terms that you use everyday. The syntax for a window clause is fairly complex to list out. Rather than trying to redraw the 'wire diagram' that you can review in the Oracle8i SQL Reference Manual yourself, I will list all of the
variants of the windowing clause and explain the set of data that each would use, within a group. First, though, let's look at what the windowing clause does for you.

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group. For example, the range clause RANGE UNBOUNDED PRECEDING means 'apply the analytic function to every row in the current group from the first row in the group to the current row'. The default window is an anchored
window that simply starts at the first row of a group and continues to the current row. If a window is used such as:

SUM(sal) OVER
(PARTITION BY deptno
ORDER BY ename
ROWS 2 PRECEDING) department_total2,

This would create a sliding window within a group and compute the sum of the current row's SAL column plus the previous 2 rows in that group. If we need a report that shows the sum of the current employee's salary with the preceding two salaries within a department, it would look like this:

SQL> break on deptno
SQL> select deptno, ename, sal,
sum(sal) over
(partition by deptno
order by ename
rows 2 preceding) sliding_total
from emp
order by deptno, ename
/

DEPTNO ENAME SAL SLIDING_TOTAL
---------- ---------- ---------- -------------
10 CLARK 2450 2450
KING 5000 7450
MILLER 1300 8750
20 ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075
SCOTT 3000 8975
SMITH 800 6775


The relevant portion of the query here was:
2 sum(sal) over
3 (partition by deptno
4 order by ename
5 rows 2 preceding) sliding_total

The partition clause makes the SUM(SAL) be computed within each department, independent of the other groups (the SUM(SAL) is 'reset' as the department changes). The ORDER BY ENAME clause sorts the data within each department by ENAME; this allows the window clause, rows 2 preceding, to access the 2 rows prior to the current row in a group in order to sum the salaries. For example, if you note the
SLIDING_TOTAL value for SMITH is 6775, which is the sum of 800, 3000, and 2975. That was simply SMITH's row plus the salary from the preceding two rows in the window.
We can set up windows based on two criteria: RANGES of data values or ROWS offset from the current row. We've seen the RANGE clause a couple of times, with a RANGE UNBOUNDED PRECEDING, for example. That says to get all rows in our partition that came before us as specified by the ORDER BY clause. It should be noted that in order to use a window, you must use an ORDER BY clause. We will now look at the ROW and RANGE windows and then finish up by describing the various ways in which the
windows may be specified.

Range Windows

Range windows collect rows together based on a WHERE clause. If I say 'range 5 preceding' for example, this will generate a sliding window that has the set of all preceding rows in the group such that they are within 5 units of the current row. These units may either be numeric comparisons or date comparisons and it is not valid to use RANGE with datatypes other than numbers and dates.

If I have the EMP table with the date column HIREDATE and I specify:
count(*) over (order by hiredate asc range 100 preceding) then this will find all of the preceding rows in the partition such that the HIREDATE is within 100 days of
the current row's HIREDATE. In this case, since the data is sorted by ASC (ascending, small to big), the values in the window would consist of all of the rows in the current group such that the HIREDATE was less then the current row's HIREDATE and within 100 days of it. If we used:

count(*) over (order by hiredate desc range 100 preceding)
instead and sorted the partition DESC (descending, big to small) this would perform the same basic logic but since the data in the group is sorted differently, it will find a different set of rows for the window. In this case, it will find all of the rows preceding the current row, such that the HIREDATE was greater than
the current rows HIREDATE and within 100 days of it. An example will help make this clearer. I will use a query that utilizes the FIRST_VALUE analytic function. This function returns the value of the expression using the FIRST row in a window. We can see where the window begins easily:

SQL> select ename, sal, hiredate, hiredate-100 windowtop,
first_value(ename)
over (order by hiredate asc
range 100 preceding) ename_prec,
first_value(hiredate)
over (order by hiredate asc
range 100 preceding) hiredate_prec
from emp
order by hiredate asc
/

ENAME SAL HIREDATE WINDOW_TOP ENAME_PREC HIREDATE_
---------- ---------- --------- ---------- ---------- ---------
SMITH 800 17-DEC-80 08-SEP-80 SMITH 17-DEC-80
ALLEN 1600 20-FEB-81 12-NOV-80 SMITH 17-DEC-80
WARD 1250 22-FEB-81 14-NOV-80 SMITH 17-DEC-80
JONES 2975 02-APR-81 23-DEC-80 ALLEN 20-FEB-81
BLAKE 2850 01-MAY-81 21-JAN-81 ALLEN 20-FEB-81
CLARK 2450 09-JUN-81 01-MAR-81 JONES 02-APR-81
TURNER 1500 08-SEP-81 31-MAY-81 CLARK 09-JUN-81


We ordered the single partition by HIREDATE ASC. We used the analytic function FIRST_VALUE to find the value of the first ENAME in our window and the first HIREDATE in our window. If we look at the row for CLARK we can see that his HIREDATE was 09-JUN-81, and 100 days prior to that is the date 01-MAR-81. For convenience, this date is put into the column WINDOWTOP. The analytic function then
defined as the window every row in the sorted partition that preceded the CLARK record and where the HIREDATE was between 09-JUN-81 and 01-MAR-81. The first value of ENAME for that window is JONES and this is the name that the analytic function returns in the column ENAME_PREC.

Looking at this from the HIREDATE DESC (descending) perspective, we see instead:

SQL> select ename, sal, hiredate, hiredate+100 windowtop,
first_value(ename)
over (order by hiredate desc
range 100 preceding) ename_prec,
first_value(hiredate)
over (order by hiredate desc
range 100 preceding) hiredate_prec
from emp
order by hiredate desc
/

ENAME SAL HIREDATE WINDOWTOP ENAME_PREC HIREDATE_
---------- ---------- --------- --------- ---------- ---------
ADAMS 1100 12-JAN-83 22-APR-83 ADAMS 12-JAN-83
SCOTT 3000 09-DEC-82 19-MAR-83 ADAMS 12-JAN-83
MILLER 1300 23-JAN-82 03-MAY-82 MILLER 23-JAN-82
FORD 3000 03-DEC-81 13-MAR-82 MILLER 23-JAN-82
JAMES 950 03-DEC-81 13-MAR-82 MILLER 23-JAN-82
KING 5000 17-NOV-81 25-FEB-82 MILLER 23-JAN-82
MARTIN 1250 28-SEP-81 06-JAN-82 FORD 03-DEC-81
TURNER 1500 08-SEP-81 17-DEC-81 FORD 03-DEC-81
CLARK 2450 09-JUN-81 17-SEP-81 TURNER 08-SEP-81
BLAKE 2850 01-MAY-81 09-AUG-81 CLARK 09-JUN-81

Row Windows

Row windows are physical units; physical numbers of rows, to include in the window. Using the preceding example as a ROW partition:

count (*) over ( order by x ROWS 5 preceding )


That window will consist of up to 6 rows; the current row and the five rows 'in front of' this row (where 'in front of' is defined by the ORDER BY clause). With ROW partitions, we do not have the limitations of the RANGE partition – the data may be of any type and the order by may include many columns. Here is an example similar to the one we just did above:

sql> select ename, sal, hiredate,
first_value(ename)
over (order by hiredate asc
rows 5 preceding) ename_prec,
first_value(hiredate)
over (order by hiredate asc
rows 5 preceding) hiredate_prec
from emp
order by hiredate asc
/

ENAME SAL HIREDATE ENAME_PREC HIREDATE_
---------- -------- --------- ---------- ---------
SMITH 800.00 17-DEC-80 SMITH 17-DEC-80
ALLEN 1600.00 20-FEB-81 SMITH 17-DEC-80
WARD 1250.00 22-FEB-81 SMITH 17-DEC-80
JONES 2975.00 02-APR-81 SMITH 17-DEC-80
BLAKE 2850.00 01-MAY-81 SMITH 17-DEC-80
CLARK 2450.00 09-JUN-81 SMITH 17-DEC-80

Looking at CLARK again, we see that the first value in the window ROWS 5 PRECEDING is SMITH; the first row in the window preceding CLARK going back 5 rows. In fact, SMITH is the first value in all of the preceding rows, for BLAKE, JONES, and so on. This is because SMITH is the first record in this group (SMITH is the first value for SMITH even). Sorting the group in a descending fashion reverses the windows

SQL> select ename, sal, hiredate,
first_value(ename)
over (order by hiredate desc
rows 5 preceding) ename_prec,
first_value(hiredate)
over (order by hiredate desc
rows 5 preceding) hiredate_prec
from emp
order by hiredate desc
/

ENAME SAL HIREDATE ENAME_PREC HIREDATE_
---------- -------- --------- ---------- ---------
ADAMS 1100.00 12-JAN-83 ADAMS 12-JAN-83
SCOTT 3000.00 09-DEC-82 ADAMS 12-JAN-83
MILLER 1300.00 23-JAN-82 ADAMS 12-JAN-83
JAMES 950.00 03-DEC-81 ADAMS 12-JAN-83
FORD 3000.00 03-DEC-81 ADAMS 12-JAN-83
KING 5000.00 17-NOV-81 ADAMS 12-JAN-83
MARTIN 1250.00 28-SEP-81 SCOTT 09-DEC-82
TURNER 1500.00 08-SEP-81 MILLER 23-JAN-82
CLARK 2450.00 09-JUN-81 JAMES 03-DEC-81


SQL> select ename, hiredate, sal,
avg(sal)
over (order by hiredate asc rows 5 preceding) avg_5_before,
count(*)
over (order by hiredate asc rows 5 preceding) obs_before,
avg(sal)
over (order by hiredate desc rows 5 preceding) avg_5_after,
count(*)
over (order by hiredate desc rows 5 preceding) obs_after
from emp
order by hiredate
/

ENAME HIREDATE SAL AVG_5_BEFORE OBS_BEFORE AVG_5_AFTER OBS_AFTER
---------- --------- -------- ------------ ---------- ----------- ---------
SMITH 17-DEC-80 800.00 800.00 1.00 1987.50 6.00
ALLEN 20-FEB-81 1600.00 1200.00 2.00 2104.17 6.00
WARD 22-FEB-81 1250.00 1216.67 3.00 2045.83 6.00
JONES 02-APR-81 2975.00 1656.25 4.00 2670.83 6.00
BLAKE 01-MAY-81 2850.00 1895.00 5.00 2675.00 6.00
CLARK 09-JUN-81 2450.00 1987.50 6.00 2358.33 6.00
TURNER 08-SEP-81 1500.00 2104.17 6.00 2166.67 6.00
MARTIN 28-SEP-81 1250.00 2045.83 6.00 2416.67 6.00
KING 17-NOV-81 5000.00 2670.83 6.00 2391.67 6.00
JAMES 03-DEC-81 950.00 2333.33 6.00 1587.50 4.00
FORD 03-DEC-81 3000.00 2358.33 6.00 1870.00 5.00
MILLER 23-JAN-82 1300.00 2166.67 6.00 1800.00 3.00
SCOTT 09-DEC-82 3000.00 2416.67 6.00 2050.00 2.00
ADAMS 12-JAN-83 1100.00 2391.67 6.00 1100.00 1.00

The Functions

There are over 26 analytic functions available to use with this feature. Some of them have the same name as existing aggregate functions such as AVG and SUM. Others have new names and provide new functionality. What we will do in this section is simply list the available functions and give a short description of what their purpose is.

AVG
(
expression )

Used to compute an average of an expression within a group and
window. Distinct may be used to find the average of the values in a
group after duplicates have been removed.


COUNT
(
<*>
)

This will count occurrences within a group. If you specify * or some
non-null constant, count will count all rows. If you specify an expression, count returns the count of non-null evaluations of expression. You may use the DISTINCT modifier to count occurrences of rows in a group after duplicates have been removed.

DENSE_RANK

This function computes the relative rank of each row returned from a query with respect to the other rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and continuing on up.
The rank is incremented every time the values of the ORDER BY expressions change. Rows with equal values receive the same rank (nulls are considered equal in this comparison). A dense rank returns a ranking number without any gaps. This is in comparison to RANK below.

FIRST_VALUE

This simply returns the first value from a group.

LAG
(expression,
,
)


LAG gives you access to other rows in a resultset without doing a selfjoin.
It allows you to treat the cursor as if it were an array in effect. You can reference rows that come before the current row in a given group. This would allow you to select 'the previous rows' from a group along with the current row. See LEAD for how to get 'the next rows'. Offset is a positive integer that defaults to 1 (the previous row). Default is the value to be returned if the index is out of range of the window (for the first row in a group, the default will be returned)

LAST_VALUE This simply returns the last value from a group.

LEAD
(expression,
,
)


LEAD is the opposite of LAG. Whereas LAG gives you access to the a
row preceding yours in a group – LEAD gives you access to the a row that comes after your row. Offset is a positive integer that defaults to 1 (the next row). Default is
the value to be returned if the index is out of range of the window (for the last row in a group, the default will be returned).

MAX(expression) Finds the maximum value of expression within a window of a group.

MIN(expression) Finds the minimum value of expression within a window of a group.

RANK

This function computes the relative rank of each row returned from a query with respect to the other rows, based on the values of the expressions in the ORDER BY clause. The data within a group is sorted by the ORDER BY clause and then a numeric ranking is assigned to each row in turn starting with 1 and continuing on up. Rows with the same values of the ORDER BY expressions receive the same rank;
however, if two rows do receive the same rank the rank numbers will subsequently 'skip'. If two rows are number 1, there will be no number 2 – rank will assign the value of 3 to the next row in the group. This is in contrast to DENSE_RANK, which does not skip values.

Examples

The TOP-N Query

SQL> select *
from (select deptno, ename, sal,
dense_rank()
over (partition by deptno
order by sal desc)
dr from emp)
where dr <= 3
order by deptno, sal desc
/

DEPTNO ENAME SAL DR
---------- ---------- ---------- ----------
10 KING 5000 1
CLARK 2450 2
MILLER 1300 3
20 SCOTT 3000 1
FORD 3000 1
JONES 2975 2
ADAMS 1100 3


Here the DENSE_RANK() function was used to get the top three salaries. We assigned the dense rank to the salary column and sorted it in a descending order. If you recall from above, a dense rank does not skip numbers and will assign the same number to those rows with the same value. Hence, after the resultset is built in the inline view, we can simply select all of the rows with a dense rank of three or less,
this gives us everyone who makes the top three salaries by department number. Just to show what would happen if we tried to use RANK and encountered these duplicate values:

SQL> select deptno, ename, sal,
dense_rank()
over (partition by deptno
order by sal desc) dr,
rank()
over (partition by deptno
order by sal desc) r
from emp
order by deptno, sal desc
/

DEPTNO ENAME SAL DR R
---------- ---------- ---------- ---------- ----------
10 KING 5000 1 1
CLARK 2450 2 2
MILLER 1300 3 3
20 SCOTT 3000 1 1
FORD 3000 1 1
JONES 2975 2 3
ADAMS 1100 3 4
SMITH 800 4 5


If we had used RANK, it would have left ADAMS (because he is ranked at 4) out of the resultset but ADAMS is one of the people in department 20 making the top 3 salaries so he belongs in the result. In this case, using RANK over DENSE_RANK would not have answered our specific query. Lastly, we had to use an inline view and alias the dense rank column to DR. This is because we cannot use analytic functions in a WHERE or HAVING clause directly so we had to compute the view and then filter just the rows we wanted to keep. The use of an inline view with a predicate will be a common
operation in many of our examples.

Now on to the question 'Give me up to three people who make the top salaries by department':

SQL> select *
from (select deptno, ename, sal,
count(*) over (partition by deptno
order by sal desc
range unbounded preceding)
cnt from emp)
where cnt <= 3
order by deptno, sal desc
/

DEPTNO ENAME SAL CNT
---------- ---------- ---------- ----------
10 KING 5000 1
CLARK 2450 2
MILLER 1300 3
20 SCOTT 3000 2
FORD 3000 2
JONES 2975 3
30 BLAKE 2850 1
ALLEN 1600 2
TURNER 1500 3

Pivot Query

A pivot query is when you want to take some data such as:
C1 C2 C3
-- -- --
a1 b1 x1
a1 b1 x2
a1 b1 x3

and you would like to display in the following format:
C1 C2 C3(1) C3(2) C3(3)
----- ----- ------ ----- ----
a1 b1 x1 x2 x3

This turns rows into columns. For example taking the distinct jobs within a department and making
them be columns so the output would look like:


DEPTNO JOB_1 JOB_2 JOB_3
---------- --------- --------- ---------
10 CLERK MANAGER PRESIDENT
20 ANALYST ANALYST CLERK
30 CLERK MANAGER SALESMAN


instead of this:
DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN


SQL> select deptno,
max(decode(seq,1,ename,null)) highest_paid,
max(decode(seq,2,ename,null)) second_highest,
max(decode(seq,3,ename,null)) third_highest
from (SELECT deptno, ename,
row_number() OVER
(PARTITION BY deptno
ORDER BY sal desc NULLS LAST) seq
FROM emp)
where seq <= 3
group by deptno
/

DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
10 KING CLARK MILLER
20 SCOTT FORD JONES
30 BLAKE ALLEN TURNER

This simply created an inner resultset that had a sequence assigned to employees by department number in order of salary. The decode in the outer query keeps only rows with sequences 1, 2, or 3 and assigns them to the correct 'column'. The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand what I mean by that if you see the resultset without
the GROUP BY and MAX:

SQ> select deptno,
(decode(seq,1,ename,null)) highest_paid,
(decode(seq,2,ename,null)) second_highest,
(decode(seq,3,ename,null)) third_highest
from (select deptno, ename,
row_number() over
(partition by deptno
order by sal desc nulls last)
seq from emp)
where seq <= 3
/

DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
10 KING
10 CLARK
10 MILLER
20 SCOTT
20 FORD
20 JONES
30 ALLEN
30 BLAKE
30 MARTIN
9 rows selected.

Ref : Expert Oracle - Thomas Kyte

No comments:

Post a Comment