Importing XML into a database with Python and SQLAlchemy

Let’s begin by analyzing the XML we want to import into our database, it consists of a book’s ISBN, title, and author.

<!-- books.xml -->
<catalog>
  <book isbn="1-880985-26-8">
    <title>The Consumer</title>
    <author>M. Gira</author>
  </book>
  <book isbn="0-679775-43-9">
    <title>The Wind-Up Bird Chronicle</title>
    <author>Haruki Murakami</author>
  </book>
  <!-- imagine more entries here... -->
</catalog>


Now we can create the database table.

create table books
(
isbn varchar(14) primary key not null,
title varchar(50),
author varchar(50)
);

Our example depends on SQLAlchemy, which is a “Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.”

While our example leverages a Postgres database, given SQLAlchemy’s database agnostic approach, it can trivially be modified to access a MySQL database.

This code is derived from a SAX parser originally found in Programming Python 3rd Edition.

# bookhandler.py
from sqlalchemy import *
from sqlalchemy.orm import *

import xml.sax.handler

pg_db = create_engine('postgres:///testdb?user=homer')

metadata = MetaData(pg_db)

books_table = Table('books', metadata, autoload=True)

class Book(object):
    pass

mapper(Book, books_table)

class BookHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.buffer = ""
        self.inField = 0
        self.session = create_session(bind=pg_db)

    def startElement(self, name, attributes):
        if name == "book":
            self.isbn = attributes["isbn"]
        elif name == "title":
            self.inField = 1
        elif name == "author":
            self.inField = 1

    def characters(self, data):
        if self.inField:
            self.buffer += data

    def endElement(self, name):
        if name == "book":
            self.session.begin()
            self.newbook = Book()
            self.newbook.isbn = self.isbn
            self.newbook.title = self.title
            self.newbook.author = self.author
            self.session.save(self.newbook)
            self.session.commit()
        elif name == "title":
            self.inField = 0
            self.title = self.buffer
        elif name == "author":
            self.inField = 0
            self.author = self.buffer
        self.buffer = ""

Now that we’ve set up our sax parser and handler to parse and load the entries from books.xml into the table, lets set up a small script to drive it:

# runit.py
import bookhandler
import xml.sax

parser = xml.sax.make_parser()
handler = bookhandler.BookHandler()
parser.setContentHandler(handler)
parser.parse("books.xml")

Now let’s see if it works:

$ ls
bookhandler.py  books.xml       runit.py
$ python ./runit.py
$ psql testdb

testdb=# select * from books;
     isbn      |           title            |     author
---------------+----------------------------+-----------------
 1-880985-26-8 | The Consumer               | M. Gira
 0-679775-43-9 | The Wind-Up Bird Chronicle | Haruki Murakami
(2 rows)
This entry was posted in MySQL, Postgres, Python. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>