In many of the most recent clients I have worked with and for, it has been left to the developers to design database schemas and make
the performance of the application meet user requirements. These are both tasks that have traditionally been left to DBAs but as the DBA
job changes, so does the developer job.
This presentation is geared primarily toward developers or those who might need partitioning but have not used it as of yet. I will only
briefly touch on the maintenance and availability aspects of partitioning, which mainly benefit DBAs, in favor of the advantages to
applications.
What is
Partitioning?
First, I want to say that different databases do things with partitioning very differently. In this presentation, I am very much
speaking directly about Oracle partitioning, not any other database.
A database partition is a way to physically separate the data in large tables without having to rewrite your application. The main
benefit to an application is performance. Once a table gets over a certain size, queries and other DML begin to slow. Even with
judicious indexing, large tables just take more time to scan.
To explain a partition adequately, there are some key concepts that need to be understood.
-
Data file - A data file is the physical OS disk file that stores data. All data in an Oracle database ends up in a data file. A
tablespace may be made up of 1 or more data files.
-
Segment - A segment is a logical container for an object in an Oracle database. A normal table (i.e. non partitioned) is exactly
one segment. A regular index is also a segment. A partitioned table will be made up of as many segments as it has partitions. The
same is true of partitioned indexes.
-
Tablespace - A tablespace is a logical container for segments. A tablespace may be empty but it will most likely hold one or more
segments.
-
Data Blocks & Extents - Data blocks and extents are not really pertinent to a discussion of partitioning. I am throwing these
in just for the curious. A data block is a physical chunk of data stored in a data file and an extent is a set of contiguous data
blocks. A segment is made up of extents. While a segment can span multiple data files, an extent must be contained within a single
data file. If this doesn't make sense, that's ok. I won't mention these two objects again.
Figure 1 Tablespaces, Data Files and Segments
So, with that understanding, a tablespace is made up physically of data files and logically of segments. Because a normal table is a
single segment, it will reside within a single tablespace. That same table will probably exist in multiple data files. This is an
important performance concept because this set up allows Oracle to spread a single table across multiple physical or logical disks.
When you perform a full table scan, you scan the entire segment that is that table. An index is also a segment, so when you scan an
index, you are scanning that segment. For a large table, that scan (whether table or index) can take time.
As I said, a regular table, a single segment, exists completely within a single tablespace. A partitioned table, however, is made up of
multiple segments. Each segment may reside in the same or different tablespaces as other partitions in the table.
NOTE: There is a myth (a partial holdover from the past) that I hear fairly frequently. That myth says that partitioning is not
usable or useful unless each partition exists in its own tablespace. This myth is usually used on existing systems as a reason to not
partition existing tables (who wants to create a bunch of tablespaces on an old database?). This myth is not true. In many cases,
especially from a maintenance or availability perspective, you may want to put each partition in its own tablespace but it is in no way
required and you can still get significant performance improvements without that step.
Why is Partitioning
Important?
Partitioning has traditionally been used in Very Large Databases (VLDB). A VLDB is usually a datawarehouse or other OLAP/reporting
database. Not that long ago, a few gigabytes (GB) was considered a VLDB. Today, OLTP databases can easily reach the GB level and even go
way beyond.
I have personally seen more reporting instances being combined with OLTP applications creating a kind of hybrid database. This makes
even more important to make the best use of your data.
Oracle recommends that any table over 2GB be considered as a partitioning candidate. That makes sense because if you can put your data
in different segments, then it may be possible to scan only those segments and not the entire table. The same can be said for indexes.
If an index is partitioned, and if Oracle can figure out what partitions will hold the data without looking at every segment, you will
scan less data making your statements return faster.
With the appropriate parallel configuration (which is beyond the scope of this presentation), even full scans of a partitioned table are
faster than a non-partitioned table. The true benefit, though, comes from Oracle optimizing partition access by only accessing those
partitions that contain required data (or at least only those that potentially contain the data). This is called partition
pruning and I will delve into that a bit more later.
Partition Keys
A partition key may or may not be the primary key of the table. In my experience, except for hash partitioning, it is rarely the primary
key.
A partition key will be one or more columns in your table. In the same way a primary key identifies a distinct row of data in a table, a
good partition key will be something that distinctly identifies a category of data. For example, in a retail environment, a good key
might be SALE_DATE. In a geographic application, a good key might be country or state.
The partition key is something that should exist in most, if not all, of the where clauses in queries against that table. This will
allow the Oracle optimizer to only look at those partitions who that might include the required data (partition pruning).
When you tell Oracle to create the table as a partitioned table, you identify the key(s). I will show examples of this later on. You
also tell Oracle how many partitions to create (how you do that is dependent on the type of partition). When data comes in, in the form
of inserts, Oracle looks at what the value in the key is and puts the data in the correct partition. Then when you update, delete or
select data from the table, and include the partition key in the where clause, Oracle will only look at the impacted partitions.
Figure 2 Partition Data Operations (Partition Key sale_date)
In addition to the key, when you create your partitions, you must identify the data that will exist in the partition. If we stick with
the sale_date example in the figure above, then sale_date is the key (left side of where clause). "All records from 2005" and
"all records from before 2005" would be the key values (right side of where clause).
A partition key's values must also make sure that a record can only exist in a single partition. In our example above, we could not
have created partitions like "all records from 1-jan-2005 though 31-jan-2005" and "all records from 15-jan-2005 though
15-feb-2005". Oracle would not know which partition to put the data in and it would not know which partition to query on retrieval.
All of the partition types below are dependent on the partition key (or keys) that exist in your data. The partition key itself is what
drives the type of partition you will use. Identifying the right partition key is critical to effective use of partitioning. Remember
that it should be a key that you primarily query by. This is not an absolute rule, but is one to keep in mind.
New in 11g is the ability to use virtual columns as your partition key. A virtual column is an expression rather than a specific column
in the table. For example, you can add to dates, or consolidate using last_day(order_date) or even substring out a piece of a column and
that is a virtual column. Prior releases of Oracle did not allow virtual columns as partition keys. You needed to populate the data
(usually via a trigger) on insert and that hidden column would become the partition key.
Page: 1 2 3