DB2 Express Loading Data
Home Tutorials Reviews Weblog

Feature: Loading data into DB2 Express using the Load Wizard
Motivation: One of the key measures of DBMS ease of use is loading data



The screenshot above shows the definition of the FMASTER, Football Master Table derived from Doug Drinen and his pro-football-reference.com databases. Doug's database has a simple 3 table structure (master, season, games)that is upto date with the latest, 2004 data and stretches back to the 1920's. The raw data is available for free in CSV format so the data should be simple to load. Lets see how DB2 Express measures up.

As we can see from the screen shot above the FMASTER table is pretty simple with only 6 fields - FBID which identifies a player uniquely then first and last names followed by position played, birth date year and debut year in the NFL. There are more complex databases, particularly for baseball, but the football data is easier to query. The screen shot at the left shows the simple CSV data layout.

To start the load process, highlight the FMASTER table in the upper content pane and the right mouse click on it. A popup menu should appear and choose the Load menuitem. That in turn after about 5-8 seconds cause the Load Wizard to appear and so you are ready to load in data.

Note that the FBID field has been chosen as the primary key and it uses a unique coding of the first and last name of each player. This is one of the nice features of the DB2 load process - DB2 wizards try to anticpate your needs during each step with simple default choices that often work. But users always have recourse to simple data entry choices that can over ride the defaults easily.

Also DB2 wizards always alow shortcuts. Users can finish early and by pass unneeded step. In addition, Table schema data entry is simplified as we shall see just below.

When the Load Data Wizard activates the first step is fairly simple but important. The screen shot below
shows the main choices.


The first decision is whether to append (keep the existing data) or replace it. When I did the first load attempt I blew the delimiter definition and so the table was loaded incorrectly. So the option of replacing the data in a table is useful. Also you can lock a table during load or leave it open for read operations.

Remember the load task bypasses all triggers and constraint checks; so users must be confident that their data is clean - especially the primary key values are not null nor are they duplicated. Also foreign keys must be valid as well.

The next stage involve specifying the input file format, what the exact set of delimiters that will be used with that input format. There are four types of input file formats allowed in the Load utility: DEL- field delimited format in which some delimiter is used to separate the fields. The fields themselves may also have delimiters - text fields delimited by quotes for example. By clicking on the DEL Options button, users can specify exactly what the delimiters are or specify "none apply". ASC format implies fields in ASCII format and fixed columns, IXF is the Internal Exchange Format which has its own format rules for data exchange, and WSF is the Lotus WorkSheet Format for exchanging data between Lotus 123 spreadsheet and a database.

In this case DEL is the option and after an aborted run we clicked on the DEL Option button and specified that VARCHAR fields were not delimited and the default field delimiter was comma. Then we declared mast.csv on the D: drive as the input source file and after dragging the scroll bar down added logout.txt as the message logfile.
The next step in the Load wizard allows users to map the data columns in the source file to the columns in the table. Many times columns have to be skipped because there is no input data or conversely there is no field in the table - the Load utility allows users to map both of these cases.

In our Load operation, the input fields and table fields exactly matched and so no corrections had to be made. The next step of the Load Wizard is Performance. Here users control validity checking, creation or maintenance of indexes and other performance considerations. As this is a first load, none were required. Recovery is the fifth phase of the load Wizard and checkpoints after so many rows are loaded can be set as well as several recovery options. Since the database is relatively small no checkpoints were established and the default recovery options were otherwise used.

The sixth stage, Options, specifies fairly advanced load options associated with large-scale or complex loads (many tables and tablespaces effected) - again we let the default settings stand here and the 7th, Schedule stage (which allows for timed batch processing of the job/task). Stage 8, Summary just tells what is going to be done and so we clicked Finish and less than 10 seconds later the job was done. Not too bad.

Testing the Data

The nice thing about Control Center is that it is easy to test the data by just opening the table and inspecting the data. We did that and then did a couple of queries on the data using the query feature of Control Center.
  
As you can see from the screenshots above, Control Center allows for full SQL queries to be made and then the tabbed display allows the user to see both the query results and the access path. The access path tells exactly how the DB2 optimizer has planned out the query - this is very useful information for performance tuning of queries. In addition, the query has an automatic display throttle of 100 rows. That can be changed and more rows can be displayed by clicking on the Fetch More Rows button.

For the SQL newbies/or the rusty-of-skills, Control Center offers SQL Assist that takes a user through all the steps of putting together a query. This is an important feature not apparently available in Oracle 10g or MySQL so we shall cover it in more detail in a special review.

Summary

This time there were less documentation hiccups, and DB2 Express performed admirably on load. Admittedly the conditions were benign - no pre-existing tables, simple .csv source data, no concurrent users to complain because of slowed response time during load. But the load was fairly fast for over 3700 records. Top marks to DB2 Express so far for its Load capabilities. The only missing link, XML support; but I just didn't have the heart to go traipsing through Info Center trying to find out what DB2 Express has there.

Top of Page  Home  Tutorials