How to Edit EPUB Books Manually
At one time or another, you may find yourself compelled to modify EPUB files manually. This may be the result of certain validation errors that can’t be fixed using your favourite authoring software. In those cases, it’s best to work at a low level and get your hands dirty. There is nothing to fear, though: you can wash your hands afterwards.
Our experience with Kotobee showed us many forms of validation errors that are nearly impossible to correct using software. No matter how much effort we put into Kotobee Author development, there are types of errors that are inevitable. Those errors originate from the user. For example, those who create their ebook by copying and pasting from external PDF sources can carry along with their content a number of hidden meaningless characters, invisible to the eye, but visible (and prohibited) to the EPUB validator. That is a problem.
In this brief article, we will explain step by step how to accomplish a few goals, by editing an EPUB file ourselves. If you are not yet very familiar with the EPUB format, a strongly recommended read can be found here: Alice in EPUB-land: Understanding the EPUB Format.
The basics
Just to recap: an EPUB file is, at heart, a compressed ZIP archive file. Let’s go over a few simple operations:
Extracting content
To extract the contents, rename the extension to .zip. Use any Zip extraction tool, and grab the files inside the folder. I personally use WinRAR, which lets me right-click on the file and choose Extract to ‘folder name’.
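If you would rather script this step, here is a minimal Python sketch using the standard zipfile module. The function name extract_epub and the paths you pass to it are just placeholders for illustration, not part of any tool mentioned in this article.

```python
import zipfile
from pathlib import Path


def extract_epub(epub_path, out_dir):
    """Unpack an EPUB (a ZIP archive) into out_dir and return the entry names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # zipfile ignores the file extension, so no rename to .zip is needed here.
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out)
        return archive.namelist()
```

Calling extract_epub("book.epub", "book_extracted") would leave the book's files in a book_extracted folder, ready for editing.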
Compressing back
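After completing your edits, select all the files at the root level (that you have extracted), and compress them back into a .zip file. On Windows, this can be done by right-clicking and selecting Send to > Compressed (zipped) folder from the context menu. Rename the extension to .epub. One caveat: the EPUB format expects the mimetype file to be the first entry in the archive and to be stored without compression, which a general-purpose zip tool does not guarantee. If the validator complains about the mimetype entry, rebuild the archive with a tool that lets you control this.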
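For those who prefer a script, the following Python sketch repacks an extracted folder and takes care to write the mimetype file first and uncompressed. The name pack_epub is a made-up helper, and the sketch assumes the output file is saved outside the folder being packed (otherwise it would try to include itself).

```python
import zipfile
from pathlib import Path


def pack_epub(src_dir, epub_path):
    """Zip src_dir into epub_path, with 'mimetype' first and stored uncompressed."""
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes in first, without compression.
        archive.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            name = path.relative_to(src).as_posix()
            if path.is_file() and name != "mimetype":
                # Everything else can be compressed as usual.
                archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

For example, pack_epub("book_extracted", "book_fixed.epub") writes the .epub directly, so there is no extension to rename afterwards.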
Important files
The most important file you should be after is the package file, which describes the entire book. The package file does not have a fixed location. Instead, its location is given inside the container.xml file, which you’ll find inside the META-INF folder. As a reminder, for further understanding of the EPUB anatomy, this is a recommended read: Alice in EPUB-land: Understanding the EPUB Format.
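As a rough illustration, this Python sketch reads container.xml from an extracted EPUB folder and returns the package file's path. The function name find_package_path is hypothetical; the namespace string is the one container.xml files declare on their root element.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(epub_dir):
    """Return the package file path (relative to epub_dir) named in container.xml."""
    container = Path(epub_dir) / "META-INF" / "container.xml"
    root = ET.parse(str(container)).getroot()
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    # The full-path attribute is relative to the root of the EPUB.
    return rootfile.get("full-path")
```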
The following action steps all require that you have the package file open in front of you, using any text editor. I prefer Notepad++.
Editing chapter content
- From the package file, search for <manifest>.
- Directly under it, you will see a long list of items, all the way until the closing tag </manifest>. This is an index of all the assets (chapters, images) existing inside the EPUB. The path to each asset is explicitly defined.
- To distinguish chapters from the other types of assets, chapters have a media-type value of “application/xhtml+xml”.
- There isn’t a way to preview the content (or even title) of each chapter unless you open it. So use the path defined for each chapter item, and open the document in your text editor.
- Apply the edits you need. Once you’re done, simply save. That’s all there is to it!
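To get the list of chapter files without scanning the manifest by eye, something like the following Python sketch could help. The name list_chapters is made up for this example, and the sketch assumes the href values are plain relative paths (no percent-encoding).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def list_chapters(package_path):
    """Return (id, path) pairs for manifest items whose media-type is XHTML."""
    package = Path(package_path)
    root = ET.parse(str(package)).getroot()
    chapters = []
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if item.get("media-type") == "application/xhtml+xml":
            # href values are relative to the folder that holds the package file.
            chapters.append((item.get("id"), package.parent / item.get("href")))
    return chapters
```

Each returned path can then be opened in your text editor as described above.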
Adding/removing chapters
- From the package file, search for <spine>.
- The list found underneath (all the way until the closing tag </spine>) defines the navigation order of chapters. Each item is a chapter by itself. The idref attribute defines the ID of the chapter, which links to a unique item in the manifest list through its id attribute.
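The idref-to-id link can be followed in a few lines of Python. This is only a sketch: reading_order is a hypothetical helper that returns the chapter paths in spine order, exactly as they are written in the manifest.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(package_path):
    """Return manifest hrefs in the order the spine lists them."""
    root = ET.parse(str(package_path)).getroot()
    # Map each manifest id to its href so spine idrefs can be looked up.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    return [hrefs.get(ref.get("idref"))
            for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]
```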
Removing a chapter
- Find the chapter that you want to remove. Finding it will require you to open the chapters one by one, as the package file does not describe their content. But the order in the spine should give you a hint. If the chapter is one of the last chapters, then look near the bottom of the spine list. You get the point.
- You will need to remove: the document file, the spine item, and the manifest item. Missing any one of these will cause an EPUB validation error.
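The three removals can be sketched in Python as follows. The helper name remove_chapter is hypothetical, and there are two assumptions to be aware of: ElementTree rewrites the whole package file, so comments and the original formatting are lost, and any links to the removed chapter in the table of contents still have to be cleaned up by hand. Work on a copy.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

OPF_URI = "http://www.idpf.org/2007/opf"
OPF_NS = {"opf": OPF_URI}
# Keep the usual namespace prefixes when the file is written back.
ET.register_namespace("", OPF_URI)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def remove_chapter(package_path, item_id):
    """Delete a chapter's manifest item, spine itemref, and document file."""
    package = Path(package_path)
    tree = ET.parse(str(package))
    root = tree.getroot()
    manifest = root.find("opf:manifest", OPF_NS)
    spine = root.find("opf:spine", OPF_NS)

    item = next((i for i in manifest.findall("opf:item", OPF_NS)
                 if i.get("id") == item_id), None)
    if item is None:
        raise KeyError(f"no manifest item with id {item_id!r}")
    manifest.remove(item)
    for ref in spine.findall("opf:itemref", OPF_NS):
        if ref.get("idref") == item_id:
            spine.remove(ref)

    # The href is relative to the folder that holds the package file.
    (package.parent / item.get("href")).unlink()
    tree.write(str(package), encoding="utf-8", xml_declaration=True)
```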
Adding a chapter
- Create a new HTML file anywhere, preferably beside the other HTML files. To avoid basic errors, it’s better to copy one of the existing chapters and modify it as needed.
- Inside the package file, under the manifest list, add a new item similar to any of the other existing chapters in the list (the order does not matter). Enter the new path of the chapter into the href attribute. The path must be relative to the package file. Also, enter a unique ID in the id attribute.
- Under the spine list, add a new item as such: <itemref idref="YOUR_UNIQUE_ID"/>. Here order does matter. So make sure you’ve placed it between the appropriate items.
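Here is how the manifest and spine steps might look in Python, assuming the new chapter file already exists on disk. The helper add_chapter and its after_idref parameter are invented for this sketch; as with any ElementTree rewrite, comments and original formatting in the package file are not preserved.

```python
import xml.etree.ElementTree as ET

OPF_URI = "http://www.idpf.org/2007/opf"
OPF_NS = {"opf": OPF_URI}
# Keep the usual namespace prefixes when the file is written back.
ET.register_namespace("", OPF_URI)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def add_chapter(package_path, item_id, href, after_idref=None):
    """Add a manifest item for href and place its itemref in the spine."""
    tree = ET.parse(str(package_path))
    root = tree.getroot()
    manifest = root.find("opf:manifest", OPF_NS)
    spine = root.find("opf:spine", OPF_NS)

    if any(item.get("id") == item_id for item in manifest):
        raise ValueError(f"id {item_id!r} is already used in the manifest")
    # Manifest order does not matter, so the new item simply goes last.
    ET.SubElement(manifest, f"{{{OPF_URI}}}item",
                  {"id": item_id, "href": href, "media-type": "application/xhtml+xml"})

    # Spine order does matter: insert right after after_idref, or at the end.
    position = len(spine)
    if after_idref is not None:
        for index, ref in enumerate(list(spine)):
            if ref.get("idref") == after_idref:
                position = index + 1
    spine.insert(position, ET.Element(f"{{{OPF_URI}}}itemref", {"idref": item_id}))
    tree.write(str(package_path), encoding="utf-8", xml_declaration=True)
```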
Editing properties
Each asset inside the book can have one or more properties assigned to describe it. For example, you can assign a cover-image property to an image, to indicate it’s the book cover. Or you can assign nav to an HTML document to indicate it’s the table of contents. For a list of all supported properties, click here.
- From the package file, search for <manifest>.
- From the item list underneath, search for the item in question.
- If the item does not have a properties attribute, you can add one, as follows:
<item id="chapter1.html" href="xhtml/ch1.html" properties="YOUR_PROPERTY" media-type="application/xhtml+xml"/>
- If the properties attribute exists, add your property name to the existing one(s), separating it with a space.
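Both cases (no properties attribute yet, or an existing space-separated list) can be handled by one small Python function. This is a sketch under the usual caveats: add_property is a hypothetical name, and rewriting the package file with ElementTree drops comments and original formatting.

```python
import xml.etree.ElementTree as ET

OPF_URI = "http://www.idpf.org/2007/opf"
OPF_NS = {"opf": OPF_URI}
# Keep the usual namespace prefixes when the file is written back.
ET.register_namespace("", OPF_URI)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def add_property(package_path, item_id, prop):
    """Add prop to the manifest item's properties and return the new value."""
    tree = ET.parse(str(package_path))
    for item in tree.getroot().findall("opf:manifest/opf:item", OPF_NS):
        if item.get("id") == item_id:
            # Properties are a space-separated list; avoid adding duplicates.
            values = item.get("properties", "").split()
            if prop not in values:
                values.append(prop)
            item.set("properties", " ".join(values))
            tree.write(str(package_path), encoding="utf-8", xml_declaration=True)
            return item.get("properties")
    raise KeyError(f"no manifest item with id {item_id!r}")
```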
Find a bad (hidden) character
If you get a complaint from the EPUB validator about an unsupported character, then most probably you have copied a bad (hidden) character from an external source. The EPUB validator should tell you the path to the specific file in which it was encountered. Here is an example of an error:
<message>RSC-016, FATAL, [Fatal Error while parsing file 'An invalid XML character (Unicode: 0x1f) was found in the element content of the document.'.], EPUB/xhtml/spptsa.html (22-307)</message>
- Locate the file with the error.
- Open it in Notepad++ (other advanced text editors may work as well, but avoid the basic Windows Notepad).
- In the menu, select View > Show Symbol > Show All Characters.
- You will start seeing new symbols in the document, such as LF (line feed) and CR (carriage return). Don’t worry, they are not a problem.
- Scan around the document. You should see some strange symbol in the middle of the document, clearly in the wrong place.
- Delete that symbol.
There’s a high chance that this error is repeated elsewhere in your book, so you will want a way to remove all instances of that character in bulk. That brings us to the next section. Instead of deleting the strange symbol, copy it. Let’s fight fire with fire.
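If scanning by eye proves tedious, a short Python sketch can report where the offending characters sit. It looks for the control characters that XML 1.0 does not allow (such as the 0x1f in the error above); the function name find_invalid_chars is made up, and the file is assumed to be UTF-8 encoded.

```python
import re
from pathlib import Path

# Characters that XML 1.0 forbids in document content (tab, LF and CR are allowed).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def find_invalid_chars(path):
    """Return (line, column, code point) for each forbidden character in the file."""
    text = Path(path).read_text(encoding="utf-8")
    hits = []
    # split("\n") is used instead of splitlines(), which would also split on
    # some of the very control characters being searched for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, hex(ord(match.group()))))
    return hits
```

The line and column numbers are 1-based, which should make them easy to compare with the position the validator reports.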
Bulk edits
For books consisting of several dozen chapters, you would want a more automated way to apply your edits throughout the whole book. The popular Find and Replace function is just what you need. However, you’ll need to use it in a tool that supports searching within folders instead of a single file. Again, this is where Notepad++ comes in handy.
- Open Notepad++
- From the menu, select Search > Find in Files.
- Enter what you are searching for in the Find what field.
- Enter what you want to replace it with in the Replace with field.
- In the Directory field, paste the path to the extracted EPUB folder. Do not forget this step!
You can go deeper than that if all the chapters are collected in a specific folder you know of.
- Warning: This is considered a dangerous operation since it can change literally hundreds if not thousands of files without an undo operation. Verify your entered values carefully. It is strongly recommended to click on Find All first, to make sure you’re getting what you intended.
- Click on Replace in Files. And you just saved yourself a day’s worth of work!
You can use this technique to get rid of strange symbols scattered across multiple documents. Once you find the strange symbol, using the steps outlined in the previous section, copy the symbol to your clipboard and paste it into the Find what field. Nothing will appear, as it is a hidden character. But trust me, it is there. That sneaky son of a glitch. Leave the Replace with field empty. And now you’re set. Clicking on Replace in Files will delete this character everywhere else in your EPUB.
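The same bulk cleanup can be sketched in Python for those without Notepad++. The helper strip_invalid_chars and its suffixes parameter are invented for this example; it assumes the chapter files are UTF-8 encoded and, like Replace in Files, it changes files in place with no undo, so run it on a copy of the extracted folder.

```python
import re
from pathlib import Path

# Characters that XML 1.0 forbids in document content (tab, LF and CR are allowed).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_invalid_chars(root_dir, suffixes=(".html", ".xhtml")):
    """Remove forbidden characters from matching files; return the changed paths."""
    changed = []
    for path in sorted(Path(root_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Read and write bytes so the original line endings are preserved.
            original = path.read_bytes().decode("utf-8")
            cleaned = INVALID_XML_CHARS.sub("", original)
            if cleaned != original:
                path.write_bytes(cleaned.encode("utf-8"))
                changed.append(path)
    return changed
```

The returned list tells you which files were touched, which is a useful sanity check before repacking the EPUB.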
Conclusion
Once you know the insides of an EPUB file, you will feel capable of doing anything to it. You’re not restricted to using certain software, nor do you need to request over-valued ebook services for what you will discover to be simple changes. Validation error messages are usually clear enough to indicate exactly what the problem is. From our side, we will continue to enhance Kotobee Author, to ultimately find ways to catch and correct user-generated errors that are tricky and extremely difficult to catch otherwise. And three simple words before we close the curtains.
Never underestimate Notepad++
Aziz
March 15, 2017
Sir, my epub dictionary ebook’s table of contents is not alphabetical but mixed.
I want to reorder the topics alphabetically. Is there any shortcut?
Kotobee
March 15, 2017
The table of contents is generated depending on the order of the chapters (just like any book). What exactly are you trying to achieve?
Aziz
March 15, 2017
I just want to order the TOC alphabetically
Dave Howell
February 18, 2018
Tables of Contents always show the order that the material appears in the book. What you want is an index.
Aziz
March 15, 2017
Sir, I just want to reorder the TOC alphabetically
Kotobee
March 15, 2017
In this case, order your chapters alphabetically, and the TOC will be generated alphabetically.
Alper Karazeybek
June 18, 2020
I just needed this blog post for 3 days. I finally found it! Well, the problem is that even though I didn’t edit my eBook manually and added the content via Kotobee Author, the errors in the OPF file and the HTML generated by Author are still obvious. I center a paragraph and Kotobee writes align=center on the back-end. The EPUB checker flags it as an error! I still need to know if those problems are important for third parties such as Google Play Books or others. If yes… that makes Kotobee difficult to use for e-books. But Kotobee makes sense with the Cloud services, where all functions could work perfectly! I guess! Sad post!
Kotobee
June 18, 2020
Hi Alper. Did you create your ebook from scratch? Or import (e.g. an epub) from elsewhere? Because Kotobee Author never centers paragraphs using attributes. It does it using CSS: style="text-align: center". So that align=center must be already coming from your content.
I recall you did send to our support once, correct?
Alper Karazeybek
June 18, 2020
Hey! I’m writing again to clarify:
– No importing (just copy pasted texts for the text content)
– . Not align=center (Sorry for that!) – I used the tag end then found a css solution for the centered images or html audio item.
– My main claim is this: I can’t use Kotobee functions because of back-end JS coding. And of course in this case knowledge of html/css is a must for standalone e-books.
– Only knowing them is not enough; you also need to know whether the code is good for EPUB standards. For example, I added audio files. I wanted to avoid the download button from html code . And thank you, EPUB standards! It’s not allowed as well! And thank you, Kotobee can’t function my audio file without JS!
Apart from coding those are still not solved by support. My word file which includes error codes just checked for only first page I suppose. Here are the problems not related to any html/css coding:
– xhtml file exist but not declared in the OPF manifest. (all the xhtml files)
– Referenced resource for online font is not declared in the OPF manifest.
– The property ‘remote-resources’ should be declared in the OPF file.
– File ‘EPUB/css/kotobeeInteractive.css’ could not be found.
– File ‘EPUB/js/global.js’ could not be found.
(Those are definitely export problems, I suppose)
My conclusion is very clear! Kotobee Author is good for solutions with Kotobee Hosted/Cloud/Library solutions. Otherwise I must know both coding + EPUB relevant codes.
Unfortunately I won’t be able to use it for my standalone e-book! I’m so tired of finding some codes and then seeing they are not good for EPUB just because I can’t use Kotobee’s real functions. And if I find some proper solutions… then I see Kotobee was not able to export a proper OPF manifest.
Hope to see you in cloud functions! Thanks for giving me some hope!
Kotobee
June 19, 2020
Copying and pasting is the source of the problem. It copies the structured HTML and pastes it as it is. You’ll face this problem with any epub editor.
If you paste into notepad, then copy from notepad into Kotobee Author, that is better since all the formatting is removed.
Note we provide a service where we can clean the HTML for you, so that your epub is validated according to epubcheck and accepted by online book stores. If you’d like that, go to http://www.kotobee.com/contact and select Service Inquiry from the Request type.
Alper Karazeybek
June 19, 2020
Okay, let me try and see! I’ll definitely try notepad!
What about the errors of OPF manifests, not declared remote sources etc?
Kotobee
June 19, 2020
I don’t know the specific reason, but send it to support (if you didn’t already), and ask specifically about these errors. Try to make each ticket for a specific problem/question. That makes responses faster.
Alper Karazeybek
June 19, 2020
Thanks for your answers, brother! Okay. I got so upset because it was not functioning. I’ll close the ticket I have and create a new file with fewer errors and specific errors about OPF and NCX. This audio centering will kill me. Have a great morning!
Alper Karazeybek
June 23, 2020
Sir, Kotobee claims the Author software can export EPUBs that follow the EPUB standards properly. Fixing problem-causing HTML code or OPF file problems shouldn’t be a service, because the Kotobee Author promotions don’t mention this. They just say you can export your content to EPUB standards. Let’s say I got rid of the problem-causing HTML code. The OPF file problems are definitely unacceptable. OPF files are the hearts of EPUB files. I’m not going to use Author only for Kotobee Reader or Cloud. I’m sorry, but it’s so disappointing.
Kotobee
June 24, 2020
I don’t know the specific reason, but please send this to our support and ask about these particular errors. Try to make each ticket for a specific problem/question. That makes responses faster.