Export and Import Table Definitions






Export and Import Table Definitions

You want to move your data from one vendor's database platform to another. The first stage is to extract the metadata.

It should be possible for a system to export an entire database as a sequence of SQL commands. It also should be possible to import it again, on the same machine or on another machine, running the same database platform or a different one.

That's the theory. In reality, each database vendor has its own variation on SQL, which makes moving a database between platforms difficult, but certainly not impossible. The first step is to extract the table definitions and the relationships between tables. After that you can move the data itself.

Unsurprisingly, the vendors have invested in making it easy for you to import your database, but have not put as much investment into exporting. So, each platform is better at accepting ANSI standard SQL than it is at exporting that format. And, of course, each platform can import the SQL that it has exported (assuming you're importing into the same version you exported from), so this causes a problem only for moving between different vendors or versions.

MySQL

MySQL has facilities for exporting table definitions. The mysqldump command-line utility can display CREATE TABLE commands as well as the data itself. You can find the documentation at http://dev.mysql.com/doc/refman/5.0/ en/mysqldump.html. For exporting just the schema you can use --no-data, and if you were interested only in the data you can use --no-create-info. The output from mysqldump will need a little coaxing if you plan to import those CREATE statements into another system. The SQL that comes out of mysqldump with the default options is not going to be acceptable to any other system:

[email protected]:~> mysqldump --no-data u username -ppassword dbname staff
-- MySQL dump 10.10
--
-- Host: localhost    Database: dbname
-- ------------------------------------------------------
-- Server version       5.0.18-standard

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
--SNIP--
--
-- Table structure for table \Qstaff\Q
--

DROP TABLE IF EXISTS \Qstaff\Q;
CREATE TABLE \Qstaff\Q (
  \Qid\Q varchar(20) character set utf8 NOT NULL,
  \Qnm\Q varchar(200) character set utf8 default NULL,
  PRIMARY KEY  (\Qid\Q)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--SNIP--
/*!40111 SET [email protected]_SQL_NOTES */;

First, a number of comments appear at the start and end of the file. Your target system can safely ignore these, because /* */ delimiters indicates a comment in most SQL implementations. At worst, a target database platform will generate error messages without stopping the process. The back quotes and the ENGINE instructions will stop the other platforms from continuing, and the character set utf8 may also have to be filtered out.

A host of switches can improve the output, and by piping the output through sed you can also remove the character set information. You still get all the comments as before, but the output is much cleaner now:

[email protected]:~> mysqldump u username -ppassword --skip-opt \
>                   --compatible=ansi --no-data --skip-quote-names \
>                   dbname staff | sed 's/character set utf8 //'
--SNIP--
CREATE TABLE staff (
  id varchar(20) NOT NULL,
  nm varchar(200) default NULL,
  PRIMARY KEY  (id)
);

--SNIP--

--skip-quote-names removes the ability to support using a reserved word as a column name. If you have used a reserved word as a column name, you may have to rename the column in order to make it possible to create the new table.


PostgreSQL

PostgreSQL has a simple export command, pg_dump, which produces relatively normal SQL. It does produce commented lines starting with --, blank lines, and unwanted SET commands, but these are easy to remove with a sed command. PostgreSQL uses the term character varying for VARCHAR; you can fix that with another sed substitution:

$ pg_dump --table=staff --schema-only --no-owner scott |\
>   sed -e '/^--/d'  \
>       -e '/^SET/d' \
>       -e '/^ *$/d' \
>       -e 's/character varying/VARCHAR/'
CREATE TABLE staff (
    id VARCHAR(20) NOT NULL,
    nm VARCHAR(200),
    n integer
);
ALTER TABLE ONLY staff
    ADD CONSTRAINT staff_pkey PRIMARY KEY (id);

SQL Server

In SQL Server, you can generate a CREATE script from the Query Analyzer utility (replaced by Server Management Studio in SQL Server 2005). You can do this for a table by right-clicking on it from the Object Browser, or you can do a whole database at once by selecting the Tasks menu item from the Database right-click menu. You should set a few options to make your SQL more portable; set Include If NOT EXISTS and Script Owner to false.

Here is a typical generated script:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [staff](
    [id] [nvarchar](20) NOT NULL,
    [nm] [nvarchar](200) NULL,
PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

GO

You have to deal with a fair amount of SQL Serverspecific code, and you are going to have to hack the output before any other system will deal with it.

You can search and replace the GO instructions with a semicolon. Most systems generally accept a semicolon as a delimiter.

You can usually leave instructions such as SET ANSI_NULLS ON in the file. Most database system command interfaces support a mechanism for merely generating an error message on unrecognized commands, but continuing anyway onto the next instruction. So, for instance, if you were importing this into MySQL:

  This fails with an error message  
$ mysql -u  username  -p password   dbname  < file.sql 

  This prints an error messages, but succeeds anyway 
$ mysql -u  username  -p password   dbname 
mysql> source file.sql 

it would be easy to remove the open and close square brackets (SQL Server 2000 had an option to suppress these, but 2005 has taken away that feature).

You can fix most of the details using search and replace on any old text editor. You could instead use a few lines of Perl in the file convertMSSQL.pl:

use strict;
while (<>){                   #Foreach line of the file
  s/^GO$/;/;                  #Replace GO with semicolon
  s/\[datetime\]/TIMESTAMP/I; #Replace [datetime] with TIMESTAMP
  s/\[//g;                    #Delete [
  s/\]//g;                    #Delete ]
  s/\).*/\)/;                 #Delete all characters following )
  print;                      #Print out the new version of each line
}

You can run this script from the operating system prompt:

perl convertMSSQL.pl < mssqloutput.sql

Dates in SQL Server

You must be especially careful when importing or exporting dates from SQL Server. SQL Server uses the DATETIME data type; other systems call this TIMESTAMP. To confuse matters further SQL Server has a TIMESTAMP type, but that does something else entirely.

When exporting from SQL Server, change DATETIME to TIMESTAMP.

When importing, change each of DATE, TIME, and TIMESTAMP to DATETIME.

Oracle

You need the exp and imp utilities to get the SQL CREATE statements out of Oracle. The exp command creates a binary file called expdat.dmp that contains all of your data and metadata. There is no option to allow exp to create SQL directly. The command imp is used to reimport the .dmp file. You can run the imp command so that it doesn't actually import anything, but instead prints out the SQL commands that it would have used:

[[email protected] book]$ exp USERNAME/password 'TABLES= ( DBBOSS )'

Export: Release 10.1.0.3.0 - Production on Mon Jul 17 12:10:41 2006

Copyright (c) 1982, 2004, Oracle.  All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in UTF8 character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)

About to export specified tables via Conventional Path ...
. . exporting table                         DBBOSS         30 rows exported
EXP-00091: Exporting questionable statistics.
EXP-00091: Exporting questionable statistics.
Export terminated successfully with warnings.
[[email protected] book]$ imp USERNAME/password SHOW=Y
Import: Release 10.1.0.3.0 - Production on Mon Jul 17 12:10:46 2006

Copyright (c) 1982, 2004, Oracle.  All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

Export file created by EXPORT:V10.01.00 via conventional path
import done in UTF8 character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing USERNAME's objects into USERNAME
 "CREATE TABLE "DBBOSS" ("USERNAME" VARCHAR2(100), "MODID" NUMBER)  PCTFREE 1"
 "0 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREE"
 "LIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS"
. . skipping table "DBBOSS"

 "ALTER TABLE "DBBOSS" ADD  PRIMARY KEY ("USERNAME", "MODID") USING INDEX PCT"
 "FREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREELIST "
 "GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING ENABLE"
 "ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("USERNAME") REFERENCES "DBUSERS" ("US"
 "ERNAME") ENABLE"
 "ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("MODID") REFERENCES "DBPROG" ("MID") "
 "ENABLE"
Import terminated successfully without warnings.
[[email protected] book]$

The SQL that you can capture and reuse on another system is highlighted in the preceding code. However, the imp command has unhelpfully broken up lines and added double quotes. A little processing would take care of this, but it is actually easier to dive into the binary file expdat.dmp. This file has the SQL DDL (Data Definition Languagethe CREATE and ALTER commands) commands buried in it, and you can find them using the Unix command strings:

[[email protected] book]$ strings expdat.dmp
gEXPORT:V10.01.00
UDBRW
RTABLES
8192
                                       Mon Jul 17 12:10:41 2006expdat.dmp
#C##
#C##
+00:00
BYTE
UNUSED
INTERPRETED
DISABLE:ALL
METRICST
TABLE "DBBOSS"
CREATE TABLE "DBBOSS" ("USERNAME" VARCHAR2(100), "MODID" NUMBER)  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS
INSERT INTO "DBBOSS" ("USERNAME", "MODID") VALUES (:1, :2)
-- snip --
ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("USERNAME") REFERENCES "DBUSERS" ("USERNAME") ENABLE
ENDTABLE
TABLE "DBBOSS"
ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("MODID") REFERENCES "DBPROG" ("MID") ENABLE

Once you've got a cleaned-up version you've still got to remove the Oracle-specific code. There are simple issues, such as replacing VARCHAR2 with VARCHAR, but there are also Oracle-specific additions to the end of the statement, and these can be harder to process automatically.

Access

Microsoft Access is handy for moving data around. It has a relatively simple wizard for importing a single table from a flat file; it is also fairly easy and intuitive to set up an ODBC connection. See "Tunnel into MySQL from Microsoft Access" [Hack #44] for an example.

Getting metadata such as schemas out of Access is a little harder. The Visual Basic code shown here will output a basic table definition into the debug window; you can copy and paste from there. The script has several limitations, however. It assumes that each table has only one index, the primary key. Also, it requires that the primary key be a single field. You should be able to customize this if you need to:

Private Sub Command0_Click(  )
Dim sql As String
For i = 0 To CurrentDb.TableDefs.Count - 1
  If Left(CurrentDb.TableDefs(i).Name, 4) <> "MSys" And _
     Left(CurrentDb.TableDefs(i).Name, 4) <> "~TMP" Then
    Debug.Print "CREATE TABLE " & CurrentDb.TableDefs(i).Name & "("
    For j = 0 To CurrentDb.TableDefs(i).Fields.Count - 1
        Debug.Print "  " & CurrentDb.TableDefs(i).Fields(j).Name;
        Debug.Print "  " & _
          dataType(CurrentDb.TableDefs(i).Fields(j).Type, _
                   CurrentDb.TableDefs(i).Fields(j).Size);
        If CurrentDb.TableDefs(i).Fields(j).Required Then _
          Debug.Print " NOT NULL ";
        Debug.Print " , "
    Next
    Debug.Print "  PRIMARY KEY(" & _
                   CurrentDb.TableDefs(i).Indexes(0).Fields(0).Name & ")"
    Debug.Print ");"
  Else
    Debug.Print "--Skipping table: " & CurrentDb.TableDefs(i).Name
  End If
Next i
DoCmd.RunCommand acCmdDebugWindow
End Sub

Private Function dataType(t As Integer, sz As Integer)
If t = 10 Then dataType = "VARCHAR(" & sz & ")": Exit Function
If t = 20 Then dataType = "DECIMAL(" & sz & ")": Exit Function
If t = 3 Then dataType = "INTEGER": Exit Function
If t = 7 Then dataType = "FLOAT": Exit Function
dataType = "UNKNOWN_DATA_TYPE_" & t & " "
End Function

Here is some sample output from the preceding Visual Basic script:

--Skipping table: ~TMPCLP295561
CREATE TABLE bbcRemote(
  name  VARCHAR(50) NOT NULL  , 
  region  VARCHAR(60) , 
  area  DECIMAL(16) , 
  population  DECIMAL(16) , 
  gdp  DECIMAL(16) , 
  PRIMARY KEY(region)
);
--Skipping table: MSysAccessObjects
--Skipping table: MSysACEs
--Skipping table: MSysObjects
--Skipping table: MSysQueries
--Skipping table: MSysRelationships
CREATE TABLE t(
  a  UNKNOWN_DATA_TYPE_4  NOT NULL  , 
  b  VARCHAR(50) , 
  PRIMARY KEY(a)
);
CREATE TABLE Table1(
  test  VARCHAR(50) , 
  PRIMARY KEY(test)
);

As it happens, you can't put this output back into Access because the data types have nonstandard names. You should use LONG in place of DECIMAL(16).

All of the properties of the database are available through this Visual Basic interface. For example, you can access all of the foreign keys through the CurrentDb.Relations collection. You can iterate over that using the same techniques as shown for the CurrentDb.TableDefs and the CurrentDb.TableDefs( i ).Indexes collections.

Potential Showstoppers

Some factors can make the process of switching platforms harder.

Auto-numbers

If you've used autonumbering schemes you will have a problem. You can convert the metadata without too much difficultly (see Figure for details). However, getting the data across will be challenging. You need to suspend the generation of new numbers during import, but then you have to switch it back on again when your new system is up and running.

"Generate Unique Sequential Numbers" [Hack #57] and "Generate Sequential or Missing Data" [Hack #82] may be of use in transporting autogenerated sequences.

Auto-numbering column types for database systems
SystemColumn typeNotes
SQL Server
IDENTITY

 
MySQL INTEGER AUTO_INCREMENT  
PostgreSQL
INTEGER

Use NEXTVAL('seqName') to calculate the value to insert.
Oracle
INTEGER

Use seqName.NEXTVAL to calculate the value to insert.
Access
COUNTER

 


Spaces in table names and column names

If you have spaces in your table names or field names you will have to use the appropriate quoting mechanism. SQL Server and Access use square brackets; MySQL allows back quotes; and Oracle will let you use double quotes. You could instead convert the names to use _ rather than a space, which will solve the problem for future exports and imports.

Nonstandard functions

It is difficult to write complex queries while sticking to the standard, and in many cases it is just not possible. You will probably find that much of your SQL has to be rewritten. Figure and Figure show some of the most commonly used functions.

Compatibility of common functions (PostgreSQL, MySQL)
ANSIPostgreSQLMySQL
COALESCE(x, y) ANSIANSI
CASE WHEN b THEN t ELSE f END ANSIANSI andIF (b, t, f)
EXTRACT(MONTH FROM d) ANSI andDATE_PART('month', d) ANSI andMONTH( d )
whn + INTERVAL '5' DAY whn + INTERVAL '5 DAY'andwhn + 5 ANSI andwhn + 5
SUBSTRING(s FROM x FOR y) ANSIANSI andSUBSTRING (s, x, y)
a||b||c

ANSI
CONCAT(a,b,c)


Compatibility of common functions (Oracle, SQL Server, Access)
ANSIOracleSQL ServerAccess
COALESCE(x, y) ANSI andNLV(x, y) ANSI Nz(x, y)
CASE WHEN b THEN tELSE f END ANSIANSI IIF(b, t, f)
EXtrACT(MONTH FROM d) ANSI and MONTH( d ) DatePart( mm, d) Month( d )
whn + INTERVAL '5' DAY ANSI andwhn + 5 DateAdd( d , whn, 5)andwhn + 5 DateAdd( d, whn, 5)andwhn + 5
SUBSTRING(s FROM x FOR y) SUBSTR(s, x, y) Substring(s, x, y) MID(s, x, y)
a||b||c

ANSI a + b + c a & b & c




 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows