Friday, November 23, 2007

Creating great looking ebooks for the Kindle

The Kindle has finally launched. While most people have been focusing on the 80,000 books available at the Kindle Store and bitching about the price, I've been focusing on free content from the web. There are tens of thousands of free books at Project Gutenberg alone. I've never been able to read through a novel from there in the past because it's just not that comfortable to read 300 pages of text on a computer, even a laptop. Now that I own an ebook reader (the Kindle), I can finally take advantage of the plethora of free content out there.

The easiest way to get one of these free ebooks onto your Kindle is to email the html file to your Kindle's email address. You'll have to pay Amazon ten cents for this privilege, but that's still much cheaper than the several dollars you'd pay for the same book, with the same content, on the Kindle Store. If you don't mind doing a little more work, you can email it to you@free.kindle.com and get the Kindle version emailed back to you for free. Then you only need to connect your Kindle to your computer and copy the file into the documents directory.
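
If you'd rather script the email step, here is a minimal sketch in Python that builds the message with the book attached. It only constructs the message; sending it is left to whatever mail setup you have. The function name, the sender address, and the ISO-8859-1 default encoding are my assumptions for illustration, not anything Amazon documents.

```python
from email.message import EmailMessage
from pathlib import Path


def build_kindle_email(html_path, sender, recipient="you@free.kindle.com",
                       encoding="iso-8859-1"):
    """Build (but do not send) an email carrying an html book as an attachment.

    The encoding default is an assumption; use whatever your source file uses.
    """
    path = Path(html_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = path.stem
    msg.set_content("Book attached for conversion.")
    # Attach the book as text/html under its original file name.
    msg.add_attachment(
        path.read_text(encoding=encoding),
        subtype="html",
        filename=path.name,
    )
    return msg
```

You could then hand the returned message to `smtplib.SMTP(...).send_message(msg)` with your own mail server details.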

There are also websites out there that have already converted these books to MobiPocket files (which you can just copy over to your Kindle). ManyBooks is where I generally go.

The problem with all of these is that the books come out as one long file with no table of contents, no easy way to jump around the book, and worst of all: no title and author metadata so the only thing that shows up on your Kindle content page is the name of the file. This is perfectly fine if you don't have much content, but the whole reason I wanted a Kindle is so I could walk around with hundreds of books and read what I felt like at the time. It's difficult to find the book you want without this metadata.

Fortunately, it's not that difficult to create a nice looking, high quality mobi file with a table of contents and proper metadata using wine and mobigen. The following is the process that I used to create a high quality ebook for Gibbon's History of the Decline and Fall of the Roman Empire.

First get the software
  1. Go get mobigen and put it in your working directory.
  2. Install either Wine or Darwine (assuming you don't use Windows).
  3. Go get msvcr71.dll and put it in your working directory.

Now make sure that mobigen works:

% export PATH=$PATH:/Applications/Darwine/Wine.bundle/Contents/bin
% export LD_LIBRARY_PATH=/Applications/Darwine/Wine.bundle/Contents/lib
% cd /var/tmp/working-directory
% wine mobigen.exe

Now get the content.

We'll use Gibbon as an example. I got the html files from Project Gutenberg. Project Gutenberg is dedicated to plain text files, but a lot of books are also available in html, which makes our job a lot easier. You can get a working example of Gibbon here.

After downloading them I noticed that there were some really weird characters, so I wrote an emacs macro and cleaned up the text. I also didn't like how the text was divided up into 6 massive volumes. I would much prefer to have smaller chapters with a table of contents. So I wrote another emacs macro to break the 6 html files (1 per volume) into 71 html files (1 per chapter).
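
I did the splitting with an emacs macro, but the same idea can be sketched in Python if emacs isn't your thing. This is not the macro I used. It assumes each chapter starts with a heading like `<h2>Chapter XV</h2>`, which you'd need to check against your own source files, and the function name is made up for illustration.

```python
import re

# Assumption: every chapter begins with an <h2> heading whose text starts
# with the word "Chapter". Adjust the pattern to match your source files.
CHAPTER_HEADING = re.compile(r"<h2[^>]*>\s*Chapter\b", re.IGNORECASE)


def split_chapters(volume_html):
    """Return one html fragment per chapter heading found in the volume.

    Text before the first heading (front matter) is dropped. If no heading
    is found, the whole input is returned as a single fragment.
    """
    starts = [m.start() for m in CHAPTER_HEADING.finditer(volume_html)]
    if not starts:
        return [volume_html]
    bounds = starts + [len(volume_html)]
    return [volume_html[a:b] for a, b in zip(bounds, bounds[1:])]
```

Each fragment still needs to be wrapped in its own `<html>` and `<body>` tags and written out as chapter-1.html, chapter-2.html, and so on.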

I then created the opf file. This is an xml file that describes how to bind all of these html files into the MobiPocket ebook. The opf structure is just the Open eBook Publication Structure with some MobiPocket specific extensions. A simple (but nicely formatted) opf file has the following:
  • metadata: Includes the title, author, subjects, ISBN, description, and sources. This is what makes the ebook look better on the main Kindle screen.
  • manifest: Lists and names the content files, both html and images.
  • spine: Adds order to the content.
  • guide: Defines the pages that show up on the menu.
Here's an example opf file.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.2 Package//EN" "http://openebook.org/dtds/oeb-1.2/oebpkg12.dtd">
<package unique-identifier="my-gibbon-UUID">
<metadata>
<dc-metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
<dc:Title>The Decline And Fall Of The Roman Empire</dc:Title>
<dc:Language>en</dc:Language>
<dc:Identifier id="my-gibbon-UUID" scheme="ISBN">123456789X</dc:Identifier>
<dc:Creator role="aut">Edward Gibbon</dc:Creator>
<dc:Subject>Rome--History--Empire, 30 B.C.-476 A.D.</dc:Subject>
<dc:Subject>Byzantine Empire--History.</dc:Subject>
<dc:Description>Describes the fall of the Roman Empire.</dc:Description>
<dc:Source>http://www.gutenberg.org/dirs/etext97/dfre110.htm</dc:Source>
<dc:Source>http://www.gutenberg.org/dirs/etext97/dfre210.htm</dc:Source>
<!-- ... -->
<dc:Rights>http://www.gutenberg.org/wiki/Gutenberg:The_Project_Gutenberg_License</dc:Rights>
</dc-metadata>
</metadata>
<manifest>
<item id="table-of-contents" href="table-of-contents.html" media-type="text/x-oeb1-document" />
<item id="introduction" href="introduction.html" media-type="text/x-oeb1-document" />
<item id="chapter-1" href="chapter-1.html" media-type="text/x-oeb1-document" />
<item id="chapter-2" href="chapter-2.html" media-type="text/x-oeb1-document" />
<item id="chapter-3" href="chapter-3.html" media-type="text/x-oeb1-document" />
<!-- ... -->
</manifest>
<spine>
<itemref idref="table-of-contents" />
<itemref idref="introduction" />
<itemref idref="chapter-1" />
<itemref idref="chapter-2" />
<itemref idref="chapter-3" />
<!-- ... -->
</spine>
<guide>
<reference type="toc" title="Table of Contents" href="table-of-contents.html" />
</guide>
</package>
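
Typing out 71 manifest and spine entries by hand gets old fast. Here is a rough Python sketch that generates both sections from a list of ids. It assumes every content file is named after its id with an .html extension, as in the example above; the function name is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def manifest_and_spine(ids):
    """Return (manifest_xml, spine_xml) for item ids given in reading order.

    Assumes each id maps to a file called <id>.html in the same directory.
    """
    items = []
    refs = []
    for item_id in ids:
        href = item_id + ".html"
        items.append(
            f"<item id={quoteattr(item_id)} href={quoteattr(href)} "
            'media-type="text/x-oeb1-document" />'
        )
        refs.append(f"<itemref idref={quoteattr(item_id)} />")
    manifest = "<manifest>\n" + "\n".join(items) + "\n</manifest>"
    spine = "<spine>\n" + "\n".join(refs) + "\n</spine>"
    return manifest, spine


# The table of contents, the introduction, then chapters 1 through 71.
ids = ["table-of-contents", "introduction"] + [f"chapter-{n}" for n in range(1, 72)]
manifest, spine = manifest_and_spine(ids)
```

Paste the two strings into the opf file in place of the hand-written manifest and spine.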

The metadata is in Dublin Core format, which is one of the main standards, if not the standard, for digital asset metadata. I got most of this data from the Project Gutenberg website. Note the dc:Source and dc:Rights tags. These are important elements for our purposes since we're using work created by other people. I put in links to the source html files and the Project Gutenberg license.

Also note the table of contents. Project Gutenberg texts don't come with a table of contents so you'll have to create one yourself. Within MobiPocket files a table of contents is just an html file with a tag in the "guide" that says it's of type "toc".
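
Since the table of contents is just an html page of links, it's easy to generate. A minimal Python sketch, assuming you already have a list of (file name, chapter title) pairs; the function name and page layout are mine, not anything MobiPocket requires.

```python
from html import escape


def build_toc(entries):
    """Build a table-of-contents page from (href, title) pairs in reading order."""
    lines = [
        "<html>",
        "<head><title>Table of Contents</title></head>",
        "<body>",
        "<h1>Table of Contents</h1>",
        "<ul>",
    ]
    for href, title in entries:
        # Escape both values so characters like & don't break the markup.
        lines.append(
            f'<li><a href="{escape(href, quote=True)}">{escape(title)}</a></li>'
        )
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"
```

Write the result to table-of-contents.html so it matches the href in the manifest and the guide.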

Since all of these source files are just html, you can include images too. Most Project Gutenberg texts don't come with images, but I was able to find high quality scans of maps from the original printing of Gibbon. Remember, though, that the Kindle's screen is only 600x800 with 2-bit pixel depth. I haven't yet found out what the ideal image size is for the Kindle, but it's definitely not 600x800 because the Kindle automatically shrinks the pictures a little bit.
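
Whatever target size you settle on, the arithmetic for shrinking a scan to fit is the same. Here's a small Python sketch that computes the scaled dimensions while keeping the aspect ratio. The 600x800 defaults are only the screen size mentioned above, used as an upper bound; they are not a recommended image size, so pass a smaller box once you know what works.

```python
def fit_within(width, height, max_width=600, max_height=800):
    """Scale (width, height) down to fit inside the box, keeping aspect ratio.

    Images that already fit are returned unchanged (never scaled up).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, int(width * scale)), max(1, int(height * scale))
```

The function only does the math; the actual resizing would be done in whatever image tool you prefer.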

Once you have all of your content you can generate the mobi file:

% export PATH=$PATH:/Applications/Darwine/Wine.bundle/Contents/bin
% export LD_LIBRARY_PATH=/Applications/Darwine/Wine.bundle/Contents/lib
% cd /var/tmp/working-directory
% wine mobigen.exe gibbon/gibbon.opf

I hope that we can create a community that puts together a bunch of high quality free ebooks for the Kindle.

6 Comments:

Blogger Debby said...

I hope you send this information to your high school guidance counselor to see if she understands it!

12:44 PM  
Blogger JRS said...

Thank you. Thank you so much for posting this. It's the first coherent document on how to build mobis that I've found that didn't include, "next, get mobipocket publisher and..." which as a mac-head, I can't do. I got my Kindle for Christmas, and have been leaning on my publisher to make my novel available for it. And the fastest route for that turned out to be creating the mobi myself. I had just hit the wall with how to hook the table of contents up when I stumbled across your website.

After that, I was up and running in a couple hours.

-JRS

4:06 PM  
Blogger Circlet Press said...

Stuff about creating an emacs macro is beyond my abilities, but editing an HTML file by hand in a word processor... that I can do. I'm a publisher *just* starting to make books for the Kindle (been making print books for 16 years), and it's amazing how little info Amazon provides on how the hell to do this stuff. Thank you for posting this!

9:34 AM  
Blogger Circlet Press said...

Oh, and I found your post because someone in the Amazon DTP forums linked to it! Thanks again.

9:35 AM  
Anonymous John Goerzen said...

You might be interested to know that there is a Free Software mobi generator out there called mobiperl. No need to use Wine, and it should work on Macs.

https://dev.mobileread.com/trac/mobiperl

opf2mobi included in that package does exactly what's needed here.

9:10 AM  
Blogger Circlet Press said...

@ John Goerzen -- Ooooh! Thank you! I just downloaded the Mobiperl stuff thanks to your pointer. It didn't compile, but I think someone here can probably figure it out.

1:14 PM  
