Wednesday, September 16, 2009

INDEXING

As usual, I was answering few intresting questions on OTN Database forums.
This morning,I have come across of a very interesting, funny question, where the poster gave an example with some explanation and asked why the Optimizer ignores INDEX?

== POSTING ON OTN DATABASE FORUM
SQL> CREATE table t (a VARCHAR2(10),b VARCHAR2(10))
2 /
Table created.

SQL> BEGIN
2 FOR i IN 1..10000
3 LOOP
4 INSERT INTO t VALUES (i,'A');
5 END LOOP;
6 END;
7 .
SQL> /
PL/SQL procedure successfully completed.
SQL> COMMIT;
Commit complete.
SQL> CREATE INDEX t_a_idx ON t (a)
2 /
Index created.

QL> SELECT * FROM emp WHERE empno>500
2 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=14 Bytes=560)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=2 Card=14 Bytes=560)
2 1 INDEX (RANGE SCAN) OF 'PK_EMP' (INDEX (UNIQUE)) (Cost=1 Card=14)

Above query is not ignoring the possibility of using an index on empno column,while the same query with table tignoring the usage of index on 'a' column why??

SQL> SELECT *
2 FROM t WHERE a>500
3 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=9500 Bytes= 133000)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=6 Card=9500 Bytes =133000)

SQL> SELECT *
2 FROM t WHERE a>'500'
3 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=5552 Bytes= 77728)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=6 Card=5552 Bytes =77728)
=== END OF POSTING

The poster gave two examples, using EMP table and newly created table. Example used with EMP table has worked perfectly, i.e. index has bee used. But, his question was, why in another example, used different table, Optimizer doesn't uses the INDEX?

Analyzing given examples, few points can be easily cleared.
1. After table creation, data insertion and index creation, there is no clue in the example of statistics collections.But, in the examples, there is cardinality, cost and bytes. I guess, Optimizer might have useddyanmic samplying. I am not sure, because, the poster doesn't mentioned Oracle version and Optimizer*parameters value or how dows he collected the statistics.

SQL> SELECT *
2 FROM t WHERE a>500
3 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=9500 Bytes= 133000)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=6 Card=9500 Bytes =133000)

2. Column 'a' has defined as varchar datatype.
(a VARCHAR2(10), and the predicate value has given in numeric.

SQL> SELECT *
2 FROM t WHERE a>500
3 /

Leaving it to Oracle to do the internal conversion. I remember long time ago, we faced the issue when datatype was character type and predicate value was given numeric. Therefore, Optimizer was ignoring index. I guess this could be one of the reason for ignoring INDEX.
May be, internal calculations and other theory can detailed exaplin by Jonathan Lewis, if at all, he reads my blog.

In another example,

SQL> SELECT *
2 FROM t WHERE a>'500'
3 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=5552 Bytes= 77728)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=6 Card=5552 Bytes =77728)

I dont know the high value and low value for the column 'a' (statistics). If I remember, the forumal for selectity calculation when there are no hitograms in place, is :
c1 > 'value' 1 - (High - Value / High - Low)
In the given example, the poster inserted 10,000 rows using pl/sql.
If we look at the cardinality prediated by the Optimizer is 5552, which is more than 50%. surely, oracle ignores the INDEX and favours FTS (another reason).

Jonathan had written a nice function to convert high/low raw value number. Because, the value stored in hagh/low (for character) is a raw value.

http://jonathanlewis.wordpress.com/2006/11/29/low_value-high_value/

Few other concerns.

I guess the poster used AUTOTRACE utility to **predicate** the execution plan. As Jonathan reiterates that the execution plan preciated by the AUTOTTRACE is not the real one, it could be a lie or simply pridicated.

I have recommended the poster to use DBMS_XPLAN.DISPLAY (if Oracle >=9iR2 version) to get the excecution plan and also the predicate filtering value, which was clearly missing the AUTOTTRACE utility.

I have not written this to hurt the poster nor to disapoint anybody. I was just sharing my thought on this.Any comments on thhis are welcome.

Index Rebuild Part II

I am not getting into the moot of when to rebuild index or not, since these things already discussed by many experts at their sites. I am just providing the internal machanism, pros & cons of rebuilding index.

Oracle has 4 main features with regard to its internal maintenance of
indexes that makes index rebuilds such a rare requirement.

1) 50-50 block split and self-balancing mechanism
2) 90-10 block split mechanism for monotonically increasing values
3) Reusability of deleted row space within an index node
4) Reusability of emptied nodes for subsequent index splits.

These 4 features combined almost eliminates the need for index rebuilds.

In fact most indexes that have index entries randomly distributed throughout
the index tree structure remain both balanced (as all Oracle indexes must
be) and fragmentation free *because* leaf row entries are reused.
Inserts/Updates and Deletes result in "holes" being created but importantly
"refilled" typically at even rates. The issue of wasted deleted space is
therefore typically a non-issue.

Rebuilding such indexes can actually be detrimental to overall performance
for a number of reasons. Firstly, it requires a significant amount of
resources and can conflict with the general running of the database. But
perhaps more importantly, it can actually be self-defeating in what rebuilds
are supposed to achieve. That's because after an index rebuild, the index is
more tightly packed with less overall free space (else why rebuild). This
means however that index splits are more likely to now occur which directly
impacts performance due to the additional I/O and CPU this entails. And
after the block split, we now have two blocks each with 50% free space.
After a period of time, the index potentially has "issues" due to
insufficient used space and the vicious rebuild cycle continues. The better
course of action is to do nothing and let the index evolve to it's natural
"equilibrium".

When should one perform a rebuild?

Firstly, if the index value were to have monotonically increasing values
then any deleted space could be a problem as this space may not be reused
(making feature 3 above redundant). However, if sufficient entries are
deleted resulting in index nodes being fully emptied (say via a bulk delete)
then feature 4 would kick in and the deleted space could be reused. The
question now becomes one of *when* would the equivalent amount of index
entries be reinserted from the time of the deletions, as index scans (in all
it's manifestations) would be impacted during this interim period. So
monotonically increasing values *and* sparse deletions would present one
case for an index rebuild. These types of indexes can be identified as
having predominately 90-10 splits rather than the usual 50-50 split.

Another case would be an index that has deletions without subsequent inserts
or inserts within an acceptable period of time. Such a case would result in
wasted space that can't be effectively reused as there's not the sufficient
insert activity to reclaim the space. However, in this scenario, it's really
the *table* itself rather than the indexes directly that should be rebuilt.
Because such "shrinkage" results in both the table and associated indexes
being fragmented with HWMs that need resetting (to prevent performance
issues with Full Table Scans and all types of Index Scans). Yes the index
needs rebuilding but only as a result of the dependent table being rebuilt
as well.

Index Rebuild Part I

Couple of days ago, as per devt. team request, I had split a partition of one of the partitioned tables, which was having local indexes, fortunately, the partition which I split was belong to current month partition and all the indexes status became unusable and I had to rebuild them.
I was surprised to see that while index rebuilding, it has taken double space than the original one and released 50% occupaid space after the index rebuild. I was suprised because, I was an impression that only index rebuild online would take double space because it create a jounral table and put the data there untile index rebuild finishes.

--- The reason for taking double space while rebuilding is :

"For a rebuild, the index has to be scanned, the results sorted, and new extents built
to hold the newly constructed index. These extents must co-exist with the original index until the process is complete and the original can be dropped.Thus, the rebuild requires a sort and sufficient free space to hold two versions of the index temporarily."
-------------

I have found few good notes on rebuild index offline/online, its pros and cons of rebuild indexes. Thought, it would be helpful to others.

An index rebuilt either Online or Offline.

Online Index Rebuild Features:
+ ALTER INDEX REBUILD ONLINE;
+ DMLs are allowed on the base table
+ It is comparatively Slow
+ Base table is referred for the new index
+ Base table is locked in shared mode and DDLs are not possible
+ Intermediate table stores the data changes in the base table, during the index rebuild to update the new index later

Offline Index Rebuild Features:
+ ALTER INDEX REBUILD; (Default)
+ Does not refer the base table and the base table is exclusively locked
+ New index is created from the old index
+ No DML and DDL possible on the base table
+ Comparatively faster

So, the base table is not referred for data when the index is rebuilt offline.

When an index is rebuilt offline there is no FTS on the base table. When index is rebuilt online all the blocks from the base table are accessed.

These conclusions are when we donot make an scenario when the index is unusable and then there
is data load to the base table, and finally the index is rebuilt.

As documentation says, while REBUILDing an index OFFLINE, the base table is not referred. But there are situations where the base table is accessed similar to an index create, they are:

+ index is made "unusable"
+ data is loaded with sql loader with skip_unusable_indexes = TRUE
+ index is REBUILD OFFLINE

Basically, when an index is made "unusable", it is normally meant that it will not be used at all (that is drop later) or it has to be drop/create. Here the index being unusable takes more priority than the Offline rebuild of it. An unusable index has to refer the base table while rebuilding.

1 comment: