| Book Review: DB2 Loading data
Feature: What tools does DB2 Viper have for loading data ?
Well now that you have DB2 installed and running how do you load in data ? Well I am glad you asked. In this first part we show the traditional way to create databases and load in data. Yes the old CSV file - comma separated values. This is still a robust mechanism to load in data into DB2 and the target will be the Lahman baseball database which is available in several forms including already loaded into an Microsoft Access database and CSV.
The Baseball database is a very practical database because it has all the data about baseball players and managers going back to the 1870s. Do you want to know about Ty Cobb's hitting, Sandy Koufax' pitching or Casey Stengel as manager - the Lahman database has it all. And it is setup in pretty close to 3rd normal form; so aspiring DBAs will get plenty of practice on their muti-table joins, sub-queries, and foreign key management among other things - and learn some baseball lore at the same time.
Creating a DB2 Database
Like a lot of work, creating a database is easiest to do in Control center but can also be done through prepackaged SQL or directly in the DB2 command line mode of operation. As seen here
(lower center of screenshot)Create New Database is a wizard based action. Click on the link and a new screen pops up as the CreateDatabase Wizard presents its options as shown below:
We name the database baseball and set the root to be C:\. I have used a plugin Maxtor drive as my database and DB2 has no problems finding and using the USB attached database. Note also the 4 additional steps afforded. Storage allows users to specify not just the root drive but the specific directory addresses(yes more than one target can be specified) for your new database instance. This is one big step beyond what Microsoft and Oracle Express editions are offering - each controls closely what a database instance can have.
Here is the Storage step - note how simple its is to just add one or more paths to your data. This can be extremely useful when parts of your database are static and infrequently used. For example, a Winter Maintenance schedule is of limited to no interest during the late Spring to summer months. So the detail schedule and resource files which are huge in size are kept on backup store and only activated in the Fall just when they start to be needed. Having the flexibility to to store the detail tables on "offline" devices until needed is very useful.
Maintenance, Timing and Mail Server are settings to control automated support - we ignore these settings for now and go directly to the Summary/Finish step of the Wizard.
The CreateDatabase Wizard is smart and echoes back our settings to us including our default previous settings. Then with the press of the Finish button, DB2 took 25 seconds to create the empty database. I was surprised at the length of time. Oracle was done in less than half the time for the same Baseball database. Ditto for MySQL.
Once we have created our baseball database, it is now ready for loading. One of the advantages of the Lahman baseball database, besides the fun of the data contained in it, is the fact that the database is available in a variety of formats including Access MDB file and CSV - Comma Separated Values format. As it turns out the latter has become very popular among database vendors including IBM, Microsoft, MySQL, Oracle, Sybase and others as a way to load your database. So now we choose to add the Allstars table a as the first table into DB2.
The table to the immediate left is the Allstars table. As you can see it uses CSV-comma separated values layout. The Lahman people have greatly simplified load up of the database because most of the fields do not require quotes to distinguish long character fields. So in every case comma-"," is the delimiter and text or numeric fields do not require any extra delimiters. In several files, the field is completely empty and so users see a sequence of fields separated by commas like this: NL,,,45,1
So without further to do, lets fire up DB2 Control Center (DB2 | DB2COPY1 | Control Center) and start creating the baseball database and a few of its key tables. First, the table is the Allstars. And as the screenshot at the left shows the table is pretty simple - 3 columns:
playerid - 9 characters for the important playerid field
yearid - 4 characters or integer for the year of play
lgid - 2 characters for the league id NL-National Leaque and AL-American League. There are a variety of other leagues designations for players prior to 1900 in the database.
This table is easily created using the create table wizard in Control Center as seen in the
the screenshot above. Note that in the treeview I am pointing to the Tables object; so that means Create New Table link to DB2's Create Table Wizard is available. Clicking on it brings up a Wizard that Guides you through the 8 step process of creating a table.
Again we are going to take a shortcut and use just the first two and last steps. As seen in the
screenshot just above it is easy to add column or field names of a table Also note that the column define dialog will remain open if users hit Apply instead of OK (which closes the dialog). This is useful when defining tables because often though the field name changes the type and size and null characteristics do not.
Also note in the top left the sequence of buttons that the Create Table wizard presents users:
Name - name, alias and description of the table
Column - column or field names, type, size and properties (the dialog just above)
Data Partition - partitioning into multi-table spaces used with huge tables
Table Space - instead of partitioning, create a separate tablespace for a table and/or its index
Keys - define Primary, Foreign, and Unique keys
Dimensions - define the data clustering dimensions of a table for data warehousing
Constraints - specify constraints or validations for fields in a new row
Summary/Finish - last chance to go back and make any corrections
Like the Create Database wizard, the Create Table one is also easy to use. The Wizard signals when shortcuts can be taken by changing the Finish button from grayed out to active.
We take advantage of that and immediately go to Finish after defining the 3 fields in the Allstars table. And as you can see from the screenshot, DB2 creates the table right away. So finally we are ready to load in some data.
Loading DB2 Using CSV
Again it is easy to load fixed or comma separated data using the Control Center's Load Wizard.
The DB2 Information Center spells out in detail how to do the process using Command Line code. But the Load Wizard is so much easier to use and is really quite robust; so why not take advantage of that ease of operation ? In the screenshot above , a right mouse click on the Allstars brings up the Load menu option. Click on it to begin.
The screenshot below shows the start of the Load Data wizard. On initial data load, choose
replace data just in case there is some test data or a previous load has gone awry. Note the Finish button is grayed out since not enough info has been supplied to the wizard.
The next step allows the user to choose the file(s) to be used to load the Allstars table. Note
that first requirement in a CSV load is to tell what the delimiters are - a)field separator (comma is the default), b)string delimiter (none is the default) and c)the decimal point delimiter(dot is the default). In addition there are a whole series of extra settings and properties for the load - most to do with dates and time. As it turns out the Lahman database, with one or two exceptions allows users to go with defaults.
However, it is nice to know that you can refine the CSV data entry for your specific needs. For example, in Quebec there are special formats for both Date and Time data - this requirement can be readily handled here.
On another data entry problem, DB2's KEEPBLANKS option proved to be a god-send, eliminating some serious post processing requirements for a huge table load operation. In general, the biggest problem I have had is with TEXT or CLOB fields where the data is extremely long and has embedded single and double quotes. Here you have to be careful. The second problem is with floating point data but each database treats floating point constants slightly differently on load - so be careful in general with loads that have floating point values.
The full Files step is actually quite powerful. First, you can load data from a local or remote
server with the Load wizard. You must have read permission in the directory and input file specified while read/write permissions are necessary for the message file authorized for logging of the load. Also you can set a maximum number of records to be loaded.
Next you want to indicate the mapping of the columns in the CSV file to the fields/columns in
the database. This istrivial in the case of the Allstars data as seen in the screenshot. However, the ability to skip fields and change the order of data entry is quite useful. This wizard capability is sometimes missing in other CSV loaders.
The last step we are going to fill-in during this load is the Performance settings. Remember
none of the keys nor indexes have been defined for this table. So in this case we want to turn off all edit and integrity checking and literally let the load operation rip.
Afterwards we skip to the last step by clicking either the Summary or Finish button.
As with the other Control Center wizards this is your opportunity to review exactly what and how the data load will take place- any problems or concerns can be fixed by backtracking. Otherwise click on Finish and less than 3 seconds later, 4115 records have been entered. This was surprisingly faster than I had expected. But with no checks on data it should be fast.
Now with the data loaded, lets take a look and inspect the data. The data
shown to the left is one of several quick checks. Note this dialog is quite versatile - it allows users to add or delete rows or correct individual values.
By doing some quick queries with different sort and Select Distinct COUNT queries, I was able to satisfy myself that the load had been accomplished correctly. Be aware that this quick validity checking is essential when loading tables to verify that the data has loaded legitimately. 2-3 minutes of prevention can save hours of frustration trying to figure out why a join or subquery is not working later.
So the data load is done but there are 15 more tables to load including the batting, fielding, and pitching records of all baseball players back to the 1870's. Our second DB2 data load will report on that effort plus working with fixed field and data contained in other databases.
This review is about more than data loading but also how easy it is to create databases and tables with DB2 Viper Express C. We were pleasantly surprised with the ease and speed with which one get working with Express C. DB2 wizards are a breeze to use. Ditto for Control Center and this reviewer lived to tell that Command Line work was crisp and showed I need a quick review of DB2 SQL.
This is a much more complete free database than Oracle's XE or Microsoft's SQL Server Express. One can create many instances of the the DB2 database and has much of the automated framework to manage them at your disposal. In contrast, MySQL is just arriving at many of the features like triggers, stored procedures and views while Oracle and SQL Server are providing single instances of their database with some of the key properties under wraps in the case of Oracle XE.
So although all the databases we have tried have pretty good help and supporting tools, IBM lets users get a lot closer to the full DB2 experience and still manage to make that fairly straight forward and easy to do. What will happen on the database developer side is a whole new question which we will be reporting on in the months ahead particularly regarding XML integration, SOA and SAAS support. But so far, with the major exception of no version of DB2 available for Windows XP Home edition (Oracle and MySQL can do this and IBM is still promising to deliver), DB2 Viper Express C is proving to be a very impressive free database.
Top of Page Home Tutorials
Developers Weblog Graphics Reviews and Tips