A DB2 Data Load 2
Home Tutorials Reviews Weblog

Review: DB2 Express C is loaded with Wordweb database
Feature: This is a more challenging database migration due to strings and CLOBs

In our first data migration of the Lahman Baseball database from Access to DB2 we used CSV - Comma Separated Values and found the DB2's Control Center LOAD command more than equal to the task. In this the second migration task I use the MySQL-based Wordweb database of word definitions, antonyms, and synonym. Although Wordweb is fewer tables, 12 versus versus 23 for the Lahman database; the Wordweb database has 700,000 records versus the 400,000 in the Lahman database. There is a second major difference - most of the fields in the Lahman database are either integer or floating point as would be expected of baseball statistics while the Wordweb has predominately string fields and some of those are loaded into CLOBs - Character Large Objects.

So the data migration had all the look and feel of one of my database contracts including size and string predominating tables but with one notable difference - this migration is onetime while the contract databases are usually monthly or more frequent conversions. So I considered using a utility like Embarcadero's RapidSQL or SQLYog Enterprise - but these tools, as with many others, dumped to Excel or CSV files in order to get to DB2 rather than use a direct connect and transfer approach.

So as seen in the screenshot below I decided to try the IBM Migration Tool Kit:
mtk
However, the IBM website warned us that the MTK for MySQL was still alpha code. And sure enough although easy to setup with a detailed wizard, the data transfer simply did not work to completion. In contrast a test run of the Oracle to DB2 converted the Oracle data with no problems.

So then I switched back to the tried and true LOAD command in DB2's Control Center. I tried to load a CSV file of a table, DEFINITIONS, with two string fields that had managed to defy the Oracle Express loader and forced some extra steps using the RapidSQL Import command. Good fortune - DB2 Control Center's Loader ate up the 126,000 record file with no problems. So that settled the case - I had a data migration plan.

The Wordweb Data Migration Plan

I used the free SQLyog Community edition to export each table's data into a CSV. I exported the database structure of each table into a .SQL file again using SQLyog Community edition - I could have used phpMyAdmin or other free MySQL tools for these tasks but SQLyog is more than equal to the task. So now I have the table structures for the database but in MySQL CREATE TABLE syntax. A quick look into the SQL Pocket Guide shows that there are major differences between CREATE TABLE in DB2 and in MySQL. The differences are not just in the datatypes but also in declaration of keys, tablespaces, and indexes. So each of the 12 CREATE TABLE commands had to be manually translated from MySQL into DB2. That took about half an hour to get the CreatWordweb.SQL script running error free.

Now with an empty database I used DB2's Control Center LOAD command to do its trick:
dbl
Again, the LOAD module ate up the CSV data files for each table - no load took more than 20 seconds to do including the 216,000 record Word_Sense table. However, the LOAD module encountered a problem when the text CLOBs enclosed in with the double quotes " delimiter also had " embedded within quoted text. We had to switch to the tilde ~ as text delimiter for two of the tables. This required emptying the partially filled tables and re-loading them with tilde replacing double quotes in the CSV files - requiring a rerun of SQLyog.

Summary

In sum, the DB2 Load process was a bit disappointing this time. First, among the free utilities for loading databases, DB2 is frequently not supported. Second, IBM's own Migration Toolkit does not support transfer from MySQL databases with quixotic error messages even for simple two integer field loads. The LOAD module performed admirably well - fast, and yet clear error messages in the LOG file when the LOAD had problems - and those problems were easily fixed. Finally, the LOAD process uncovered a problem in the DERIVED database table. There are 417 duplicate records in which the only difference between the key fields is capitalization - afghani and Afghani. I have gone with a dual solution. In the original MySQL Wordweb database I have left the DERIVED table unchanged. In the DB2 Wordweb database, I have gone with the truncated DERIVED table, awaiting word from the database owner as to which is correct.

Considering the fact that, I am constantly needing to move data to and from DB2 databases for replication or data warehouse updates and other tasks, I am still very confident that DB2 Express C will handle those tasks with minimal concern - especially after having done the initial database creation task. Even dates, timestamps and CLOBs, the terror fields of data transfers, moved over with minimal fuss. However, data cleansing where "John B Doe" and "John B. Doe" have to distinguished as possible duplicate records - this work will require professional tools. So for a "free" database, DB2 Express C Viper Edition comes well equipped - the best among our 3 "free" databases.




Top of Page  Home  Tutorials 

Developers Weblog  Graphics Reviews and Tips