Wednesday, November 24, 2010

Some times you can have one's cake and eat it too

Finally a blog after 3 months..finally managed to overcome my laziness and finally found something interesting to blog. The past 1 month has been exciting with me joining a new company called fcs-inc.com and i am looking forward to doing some exciting things in my work place.

Now coming to the blog, if you are wondering what the logic behind this title is then read on...

More often than not, when we are stuck up with some bad performing SQL, one of the first things we do is to look for the plan generated by the optimizer.
Most of us would be wondering sometimes as to why this optimizer is not using Indexes and going for a FULL table scan and we keep scratching our heads. There are many reasons and we will discuss one of them here and how to overcome it.

Show time starts ......

Lets try creating a table called t.


SQL> drop table t;

Table dropped.

Elapsed: 00:00:02.20
SQL> create table t (x varchar2(10), y int);

Table created.

Elapsed: 00:00:00.17

Ok, so our simple 2 column table is created.

Now, lets create a simple index on column x of table t

SQL> create index idx_x on t(x);

Index created.

Lets insert some data now that our DDL s are done.


SQL> insert into t
  2  (select rownum, rownum+10 from dual connect by level < 1000000);

999999 rows created.

Elapsed: 00:00:19.29
SQL> commit;

Commit complete.

Elapsed: 00:00:00.03

Lets try to give optimizer the maximum amount of information by gathering the statistics.


SQL> exec dbms_stats.gather_table_stats('sh','T');

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.03
SQL> exec dbms_stats.gather_index_stats('sh','IDX_X');

PL/SQL procedure successfully completed.

Here starts the real fun.

I have all distinct values for x in my table.

Lets try to run a small query and see the results.

Before that we will set the AUTOTRACE ON.

SQL> SET AUTOTRACE TRACEONLY


SQL> SELECT * FROM T WHERE X = 9999;

Elapsed: 00:00:00.34

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |   517  (11)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |   517  (11)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_NUMBER("X")=9999)  ---- (Oracle is internally using TO_NUMBER here)

Ooops...this is a FULL TABLE SCAN. So to pick up one row out of 9,99,999 Oracle is scanning the entire table when there is a index that it could have used to generate the rows. This is definitely not correct.
But why did oracle choose the full table scan ??

A careful look at the query and the predicate information will reveal this.

SELECT * FROM T WHERE X = 9999

The data type of x is varchar2 and we are comparing a varchar2 column with a number.

Hence oracle has is not using the index.

Lets modify the query a bit and see the results.


SQL> select * from t where x = '9999';

Elapsed: 00:00:00.10

Execution Plan
----------------------------------------------------------
Plan hash value: 3460324958

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    11 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    11 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_X |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"='9999')

Bingo .... that is very sweet of Oracle to give us the expected plan.
The only difference was the quotes around the value '9999' .
Look at the change in the predicates also from the first and second queries
Since oracle in this case had to compare a varchar2 with a varchar2 , there was no need for an implicit conversion and hence the index was used.

Now, lets us improvise our query a bit.

Lets try to force Oracle to use the index 

SQL> select /*+ index(idx_x) */ x, y from t where x = 9999;

Elapsed: 00:00:00.31

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |   517  (11)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |   517  (11)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_NUMBER("X")=9999)

No use in forcing Oracle to use the index as it cannot use it because of the TO_NUMBER applied in the predicates.

This is getting interesting now and lets see how the optimizer behaves when we use an explicit TO_NUMBER conversion.

SQL> SELECT * FROM T WHERE x = to_number('9999');

Elapsed: 00:00:00.32

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |   517  (11)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |   517  (11)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_NUMBER("X")=9999)

The results are the same...expected

So, how do we over come this problem.

There are 2 solutions.

First is to create a FUNCTION BASED INDEX on the column that you would be applying the predicate on.

SQL> create index idx_fn_t_x on t(to_number(x));

Index created.

Elapsed: 00:00:02.50
SQL> select * from t where x = 999999;

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
Plan hash value: 4214757644

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  9984 |   107K|    13   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T          |  9984 |   107K|    13   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_FN_T_X |  4000 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(TO_NUMBER("X")=999999)

That works. Our new function based index IDX_FN_T_X is in operation

The second solution is to change the apply the SQL function on the other side of the predicate.

SQL> SELECT * FROM T WHERE x = to_char('9999');

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3460324958

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    11 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    11 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_X |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"='9999')

There you go...the second solution is far simple to implement and you have one less index overhead.
Also, one look at the costs of  both the plans and you will see that the second solution is better than the first one.

Now who says that you cannot always have one's cake and eat it too....Thanks to wiki for having explained this here http://en.wikipedia.org/wiki/Have_one's_cake_and_eat_it_too

More on types on Indexes in the upcoming posts...until then happy reading







LINK+IMAGE