iamtgc

Importing XML into a database with Python and SQLAlchemy

January 29th, 2008 by tgc

Let’s begin by analyzing the XML we want to import into our database, it consists of a book’s ISBN, title, and author.
<!-- books.xml --> <catalog> <book isbn="1-880985-26-8"> <title>The Consumer</title> <author>M. Gira</author> </book> <book isbn="0-679775-43-9"> <title>The Wind-Up Bird Chronicle</title> <author>Haruki Murakami</author> </book> <!-- imagine more entries here... --> </catalog>


Now we can create the database table.
create table books ( isbn varchar(14) primary key not null, title varchar(50), author varchar(50) );

Our example depends on SQLAlchemy, which is a “Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.”

While our example leverages a Postgres database, given SQLAlchemy’s database agnostic approach, it can trivially be modified to access a MySQL database.

This code is derived from a SAX parser originally found in Programming Python 3rd Edition.

# bookhandler.py from sqlalchemy import * from sqlalchemy.orm import * import xml.sax.handler pg_db = create_engine('postgres:///testdb?user=homer') metadata = MetaData(pg_db) books_table = Table('books', metadata, autoload=True) class Book(object): pass mapper(Book, books_table) class BookHandler(xml.sax.handler.ContentHandler): def __init__(self): self.buffer = "" self.inField = 0 self.session = create_session(bind=pg_db) def startElement(self, name, attributes): if name == "book": self.isbn = attributes["isbn"] elif name == "title": self.inField = 1 elif name == "author": self.inField = 1 def characters(self, data): if self.inField: self.buffer += data def endElement(self, name): if name == "book": self.session.begin() self.newbook = Book() self.newbook.isbn = self.isbn self.newbook.title = self.title self.newbook.author = self.author self.session.save(self.newbook) self.session.commit() elif name == "title": self.inField = 0 self.title = self.buffer elif name == "author": self.inField = 0 self.author = self.buffer self.buffer = ""
Now that we’ve set up our sax parser and handler to parse and load the entries from books.xml into the table, lets set up a small script to drive it:
# runit.py import bookhandler import xml.sax parser = xml.sax.make_parser() handler = bookhandler.BookHandler() parser.setContentHandler(handler) parser.parse("books.xml")

Now let’s see if it works:
$ ls bookhandler.py books.xml runit.py $ python ./runit.py $ psql testdb testdb=# select * from books; isbn | title | author ---------------+----------------------------+----------------- 1-880985-26-8 | The Consumer | M. Gira 0-679775-43-9 | The Wind-Up Bird Chronicle | Haruki Murakami (2 rows)

Posted in MySQL, Postgres, Python

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.