Importing XML into a database with Python and SQLAlchemy
Let’s begin by analyzing the XML we want to import into our database, it consists of a book’s ISBN, title, and author.
<!-- books.xml -->
<catalog>
<book isbn="1-880985-26-8">
<title>The Consumer</title>
<author>M. Gira</author>
</book>
<book isbn="0-679775-43-9">
<title>The Wind-Up Bird Chronicle</title>
<author>Haruki Murakami</author>
</book>
<!-- imagine more entries here... -->
</catalog>
Now we can create the database table.
create table books
(
isbn varchar(14) primary key not null,
title varchar(50),
author varchar(50)
);
Our example depends on SQLAlchemy, which is a “Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.”
While our example leverages a Postgres database, given SQLAlchemy’s database agnostic approach, it can trivially be modified to access a MySQL database.
This code is derived from a SAX parser originally found in Programming Python 3rd Edition.
# bookhandler.py
from sqlalchemy import *
from sqlalchemy.orm import *
import xml.sax.handler
pg_db = create_engine('postgres:///testdb?user=homer')
metadata = MetaData(pg_db)
books_table = Table('books', metadata, autoload=True)
class Book(object):
pass
mapper(Book, books_table)
class BookHandler(xml.sax.handler.ContentHandler):
def __init__(self):
self.buffer = ""
self.inField = 0
self.session = create_session(bind=pg_db)
def startElement(self, name, attributes):
if name == "book":
self.isbn = attributes["isbn"]
elif name == "title":
self.inField = 1
elif name == "author":
self.inField = 1
def characters(self, data):
if self.inField:
self.buffer += data
def endElement(self, name):
if name == "book":
self.session.begin()
self.newbook = Book()
self.newbook.isbn = self.isbn
self.newbook.title = self.title
self.newbook.author = self.author
self.session.save(self.newbook)
self.session.commit()
elif name == "title":
self.inField = 0
self.title = self.buffer
elif name == "author":
self.inField = 0
self.author = self.buffer
self.buffer = ""
Now that we’ve set up our sax parser and handler to parse and load the entries from books.xml into the table, lets set up a small script to drive it:
# runit.py
import bookhandler
import xml.sax
parser = xml.sax.make_parser()
handler = bookhandler.BookHandler()
parser.setContentHandler(handler)
parser.parse("books.xml")
Now let’s see if it works:
$ ls
bookhandler.py books.xml runit.py
$ python ./runit.py
$ psql testdb
testdb=# select * from books;
isbn | title | author
---------------+----------------------------+-----------------
1-880985-26-8 | The Consumer | M. Gira
0-679775-43-9 | The Wind-Up Bird Chronicle | Haruki Murakami
(2 rows)