Oracle Essentials Oracle Database 11g (46 page)

MODEL supports analytical queries that include prior-year comparisons and share of ancestor, and it is particularly useful in budgeting, forecasting, and other statistical applications. Example MODEL usages include calculating sales differences in two geographies, calculating percentage change, and calculating net present value. The SQL MODEL clause can also use simultaneous equations and regression in calculations.

OLAP and Data Mining Capabilities

For stored cubes (objects with predefined multidimensional joins), facts, and dimensions in the relational database, Oracle introduced the OLAP Option to the Oracle9
i
database. OLAP database capabilities are most commonly accessed althrough SQL, though there is also a Java API. Oracle Database 11
g
added support for OLAP SQL

query rewrite.

As an OLAP alternative, Oracle now offers a technology that is relational database-agnostic: Essbase, from Hyperion (acquired by Oracle in 2007) is an OLAP engine that can extract data from Oracle and other databases. This OLAP solution is especially popular for Hyperion’s financial applications and in cases where business
234

|

Chapter 10: Oracle Data Warehousing and Business Intelligence
analysts want to generate their own cubes. Essbase cubes can also be accessed using Oracle Business Intelligence Enterprise Edition (OBI EE) tools.

Data-mining algorithms were first embedded in the Oracle9
i
database in the Data Mining Option. These were initially accessible via a Java API, but Oracle later added a PL/SQL API.

For applications-based data mining independent of underlying database technology, the OBI EE tools can utilize data-mining algorithms Oracle acquired with Sigma Dynamics in 2006. This technology is known as Real-Time Decisions (RTD).

We describe OLAP and data mining in the database and Oracle’s business intelligence tools in the following sections.

Database Extensibility and the Data Warehouse

A growing trend in data warehousing is the storage of multiple datatypes within the database. These extended database capabilities are described in
Chapter 14,
but we’ll quickly mention here how these options might be useful in data warehousing.

Multimedia

The Multimedia feature set (formerly knows as
inter
Media) opens up the possibilities of including documents, audio, video, and some locator functions in the warehouse. Of these, text retrieval (Oracle Text) is most commonly used in warehouses today. However, the number of organizations storing other types of data, such as images, is growing. Often, storage of these types of data is driven by a need to provide remote users with access.

Spatial Option

The Spatial Option is also relevant in a data warehouse in which data is retrieved based on proximity to certain locations. Spatial data includes some type of geographic coordinates. Typically, companies use add-on products in conjunction with Oracle’s Spatial Option. An example of this option’s use for data warehousing is a marketing analysis application that determines the viability of retail outlets at various locations.

XML

Oracle added native XML datatype support to the Oracle9
i
database, along with XML and SQL interchangeability for searching. Oracle provided key technology in the development of the XQuery standard, and began shipping a production version of XQuery with Oracle Database 10
g
Release 2. XML database performance was greatly improved in Oracle Database 11
g
through the introduction of binary XML.

Oracle estimates that binary XML offers performance gains of up to 15 times compared to the XML LOBs that were previously available.

Analytics, OLAP, and Data Mining in the Database

|

235

Managing the Data Warehouse

Once you’ve built a data warehouse topology, you could deploy multiple Oracle databases to implement the data warehouse and its data marts. Enterprise-wide warehouses are common on Unix servers, but are also appearing on clustered (RAC) Linux platforms. Smaller data marts are common on Windows and Linux. Many organizations are consolidating data marts and enterprise data warehouses on the more scalable platforms.

Oracle Enterprise Manager provides a common GUI for managing these multiple instances regardless of the underlying operating system. EM is browser-based with a multiuser repository for tracking and managing the Oracle instances. (EM is dis-

cussed in much more detail in Chapter 5.)

In warehousing, in addition to basic management, ongoing tuning for performance is crucial. Enterprise Manager supports many of the automated diagnostics and tuning features added in Oracle Database 10
g
and more recent releases.

Within the largest warehouses and data marts, you may want to manage or maintain availability to some of the data even as other parts of the database are moved offline.

Oracle’s Partitioning Option enables data partitions based on business value ranges (such as date) or discrete values for administrative flexibility, while enhancing query performance through the cost-based optimizer’s ability to eliminate access to nonrelevant partitions. For example, “rolling window” administrative operations can be used to add new data and remove old data using time ranges. A new partition can be added, loaded, and indexed in parallel, and optionally removed, all without impacting access to existing data.

Range partitioning
first became available in the Oracle8 Partitioning Option.
Hash
partitioning
was added to the Oracle8
i
Partitioning Option enabling the spread of data evenly based on a hash algorithm for performance. Hashing may be used within range partitions (
composite partitioning
) to increase the performance of queries while still maintaining the manageability offered by range partitioning. Oracle9
i
introduced
list partitioning
—partitions based on discrete values such as geographies. A composite partitioning type,
range-list partitioning
, which allows youto partition by dates within geographies, was added in Oracle9
i
Release 2. More composite types were added in Oracle Database 11
g
including
list-hash
,
list-list
,
list-range
, and
range-range partitioning
.
Interval partitioning
was also added in Oracle Database 11
g
, providing automatic creation of range partitions when needed.

Other Software for the Data Warehouse

A data warehouse isn’t necessarily built with a single software product, nor is it simply a database. In addition to the database capabilities we’ve described, if you’re going to build an effective data warehouse topology like the one we’ve outlined, your software will provide the following functionality:

236

|

Chapter 10: Oracle Data Warehousing and Business Intelligence
Data Warehouses and Backups

Early data warehousing practitioners often overlooked the need to perform backups.

Their belief was that since data for the warehouse was extracted from operational systems, the warehouses could easily be repopulated from those same systems if needed.

However, as warehouses grew and the transformations needed to create and refresh them evolved, it became evident that backups of data warehouses were necessary because the transformation process had grown extremely complicated and time-consuming. Today, planning for warehouse availability includes not only an understanding of how long loading will take, but also backup and recovery operations.

Due to the tactical nature of such warehouses, planning often also includes designs for high availability, disaster recovery, and lifecycle information management.

Extraction of data from operational data sources

The movement of needed data from source systems for the purpose of loading a data warehouse. This process might involve extracting a large amount of bulk data or a steady stream of incremental changes.

Transformation and/or cleansing of data

Because the data in a data warehouse can come from many different sources, the data must frequently be converted into a common format. Original data might also need to be cleansed to eliminate or correct invalid values.

Loading the data warehouse/marts

This process might also occur in bulk or a steady stream.

Basic reporting

Standard reports should be easily accessible by nontechnical business analysts in a browser-based portal or dashboard and could be published.

Ad hoc query and analysis

Tools business analysts can use for picking and choosing data items and building their own queries. The results can be published into reports.

Advanced OLAP for multidimensional analysis

More advanced analysis capability needed to spot business changes and trends typically where a large number of dimensions are present.

Data mining

Usually used where there are a large number of variables present to determine a best model for known outcomes and then used to predict future results where outcomes are not known.

Metadata management

Store descriptive business and technical data and enable extended management services such as versioning and impact analysis.

The following sections provide descriptions of how Oracle can deliver such functionality in various tools and database features and options.

Other Software for the Data Warehouse

|

237

Extraction, Transformation, and Loading

The first three requirements described in the previous list are often handled by what are called ETL tools (for extraction, transformation, and loading). Those who are experienced in data warehouse solutions realize that the process of understanding the data sources, designing the transformations, testing the loading process, and debugging is often the most time-consuming part of deployment. Transformations generally remove bogus data (including erroneous entries and duplicate entries), convert data items to an agreed-upon format, and filter data not considered necessary for the warehouse. These operations not only improve the quality of the data, but frequently reduce the overall amount of data, and that, in turn, improves data warehouse performance.

The frequency of extraction and loading is largely determined by the required timeli-ness of the data in the warehouse. Most extraction and loading takes place on a

“batch” basis with a known time delay (typically subhourly or hourly or daily today).

Many first-generation warehouses were completely refreshed during the loading process. As data volumes grew, this became impractical due to the limited time frames available for loading. Today, updates to tables are most common. When a need for near real-time data exists, warehouses can be loaded nearly continuously using a
trickle feed
.

Is Cleanliness Best?

Once the data in the warehouse is “clean,” is this version of the true nature of the data propagated back to the originating OLTP systems? This is an important issue for data warehouse implementation. In some cases, a “closed loop” process is implemented whereby updates are provided back to the originating systems. In addition to minimizing some of the cleansing that takes place during future extractions, operational reports become more accurate.

Another viable option is to avoid cleansing by improving the quality of the data at the time of its input into the operational system. As noted previously in this chapter, this is critical if OLTP systems are to be directly accessed for business intelligence. Improving data quality at the source also enables high-speed loading techniques to be used in near real-time data warehouses (since transformations can be eliminated).

Improving data quality at the source can sometimes be accomplished by not allowing a “default” condition as allowable input into a data field. Presenting the data-entry person with an array of valid options, one of which
must
be selected, is often a way to ensure the most consistent and valid responses. Many companies also provide education to the data-entry people, showing them how the data they’re keying in will be used and what the significance of it is.

238

|

Chapter 10: Oracle Data Warehousing and Business Intelligence
Simple extraction and transportation of data is possible using one of several Oracle database features:

Transparent Gateways and Heterogeneous Services

Provide a bridge to retrieve data from non-Oracle sources using Oracle SQL to load an Oracle database. Heterogeneous Services provide ODBC connectivity to non-Oracle relational sources. Gateways can optionally provide a higher level of performance when extracting data from non-Oracle sources.

Transportable Tablespaces

Another feature for data movement, Transportable Tablespaces enable rapid data movement between Oracle instances without export/import. Metadata (the data dictionary) is exported from the source and imported to the target. The transferred tablespace can then be mounted on the target. Oracle Database 10
g
introduced cross-platform Transportable Tablespaces, which can move a tablespace from one type of system (e.g., Solaris) to another (e.g., Linux).

Oracle Streams

Streams have been bundled with Oracle since Oracle9
i
Release 2. Oracle Streams include Oracle’s log-based replication, Advanced Queues (AQ), and since Oracle Database 10
g
, includes Transportable Tablespaces. Streams are often used for near real-time data movement. Oracle Database 10
g
added support for downstream capture, which allows changed data to be collected from log files, eliminating overhead, RMAN and Transportable Tablespaces for instantiation, support for LONG, LONG RAW, and NCLOB datatypes, and asynchronous change data capture that uses Streams to transport only changed records from a source database to a target.

Other books

There's Only Been You by Donna Marie Rogers
The Sweet Life by Rebecca Lim
Reliquary by Douglas Preston, Lincoln Child
The Fortress of Solitude by Jonathan Lethem
Raced by K. Bromberg
A Book of Five Rings by Miyamoto Musashi
Bishop's Song by Joe Nobody