Text Formatting Question

Status
Not open for further replies.
Let's say I have an e-book that needs editing, mostly because the formatting is screwed up with incorrect line breaks. The e-book's file format is azw3. Let's also assume that I can convert it to plain text and save it as a txt file. Kindle can read plain text files. Yet the crazy formatting returns once I open the plain text file on my Kindle. Converting the txt file back to azw3 does not give a better result. The first two chapters had proper line breaks, but anything after that was again riddled with incorrect line breaks. Why do these faulty line breaks occur?
 
This is the original azw3 file on the Kindle. As you can see, the line breaks are horrible.

xomd7o.jpg


This file started out as the original azw3 file, which was then converted to plain text and converted back into azw3. I also tried just the plain text file. The first two chapters look great, although the spacing between paragraphs is not consistent:

34xknds.jpg


But beyond chapter 2 it's back to terrible:

2vd5hev.jpg
 
This is happening because carriage return/line feed (CR/LF) characters have been inserted at the end of each line.

If you have Microsoft Word, you can get rid of the CR/LF characters in bulk by doing this:
1- open file in Word
2- select the entire document (Ctrl+A)
3- go to Edit > Replace (or wherever that function is in your version)
4- in the "Find what" box, type this: ^p (and leave the "Replace with" box empty)
5- then hit the "Replace All" button.

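The only drawback is that the word at the end of each line and the word at the start of the next line will be joined, since there is no space character to separate them. Typing a single space in the "Replace with" box instead of leaving it empty should avoid that.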

You may be able to insert a space before the CR/LF -- before removing the CR/LF -- by using regular expressions. Regular expressions are a sort of programming.
Microsoft Office page on regex
 
Thanks. I have LibreOffice and will look into whether it can do what you describe. So CR/LF characters are not considered formatting, I guess? Otherwise they'd be disregarded when converting to plain text?
 
Originally Posted By: BRZED
So CR/LF characters are not considered formatting, I guess? Otherwise they'd be disregarded when converting to plain text?

No, CR/LF characters are not considered formatting in the sense of being something that gets stripped off when formatting is lost. In fact, I've even seen CR/LF characters being ADDED when formatting is lost, which may be what has happened in your case.

It's been my experience that CR/LF characters are NOT stripped off because the program doesn't know which ones you want and which ones you don't.

Trailing spaces, on the other hand, are often stripped off, the program having decided that you wouldn't want them there and may not even know they are there in the first place. This is why it's tough to get the old Usenet "-- " signature marker to work properly in MS products.
 
Alright. I'll have a go at it and see if I can fix this mess somehow. Thanks again.
 
I have a suspicion that the problem you are having is because a carriage return and a line feed are not technically the same thing. They are different control codes. Some programs treat them as the same, and others do not. That's why we see these formatting nightmares all the time.
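
To make the difference concrete: a carriage return is the character "\r" and a line feed is "\n", and files may end lines with either one or with the pair. Here is a minimal Python sketch that converts all three styles to a single line feed before any other cleanup. The function name is made up for illustration, and this is only one way to do it.

```python
def normalize_newlines(text: str) -> str:
    """Turn CR/LF pairs and lone CRs into a single LF."""
    # Handle the two-character pair first so it is not counted as two breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "one\r\ntwo\rthree\nfour"
print(repr(normalize_newlines(sample)))  # 'one\ntwo\nthree\nfour'
```

Running this first means any later step only has to deal with one kind of line break.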
 
I can only presume that books are released with this kind of faulty formatting because fixing it is very difficult. I just can't accept it like that. Obviously, I do not want to have to type the whole book from scratch.
 
Originally Posted By: BRZED
I can only presume that books are released with this kind of faulty formatting

I'm finding it a bit difficult to understand how a vendor could release a file for use on a Kindle that does not render properly on a Kindle. Was this file obtained from an authorized source, or did it come from somewhere else?
 
Originally Posted By: BRZED
I can only presume that books are released with this kind of faulty formatting because fixing it is very difficult.

You can see this kind of mess, for example, when you send a Gmail message from an iPhone to someone who is using a real email client, or when you send text that has real line feeds in the "wrong" spot. There are a couple of solutions, but they're not easy or even ideal. One would be to bring it into a text editor and yank all carriage returns and line feeds. Then, have the text editor insert the line feeds at X number of columns. Even some email clients do that. I know that Eudora used to do that, and I used to play around with straightening some things out back in the day with that.

However, it is, at best, a hack, since your paragraphing will be gone. It's fine for fixing a little text document, but for something long, it's obviously problematic.
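
For anyone who wants to try that hack outside a text editor, here is a rough Python sketch using the standard textwrap module. The function name and the default width of 60 columns are arbitrary choices for illustration. As noted above, it throws away all paragraphing.

```python
import textwrap


def rewrap(text: str, width: int = 60) -> str:
    """Drop every existing line break, then re-break at `width` columns."""
    # split() with no arguments breaks on any run of whitespace,
    # which includes carriage returns and line feeds.
    words = text.split()
    # Re-insert line feeds so that no line is longer than `width`.
    return textwrap.fill(" ".join(words), width=width)


messy = "The quick brown\r\nfox jumps\r\nover the lazy dog and keeps\r\nrunning."
print(rewrap(messy, width=30))
```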

Ideally, a script that would replace every lone line feed with a space but leave two or more line feeds in a row alone would fix the problem.
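
Something along these lines might do it. This is a minimal Python sketch, assuming that paragraphs in the file really are separated by at least one blank line; the function name is made up, and each run of blank lines is collapsed to exactly one.

```python
import re


def unwrap(text: str) -> str:
    """Join the lines inside each paragraph, keeping paragraph breaks."""
    # Treat CR/LF pairs and lone CRs as plain line feeds.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more line feeds in a row (possibly with stray spaces
    # between them) mark a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, every remaining line feed becomes one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


broken = "It was a\ndark night.\n\nThe rain\r\nfell.\n\n\nEnd"
print(unwrap(broken))
```

The sample prints three paragraphs: "It was a dark night.", "The rain fell." and "End", each separated by one blank line.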
 
Originally Posted By: Tegger
Originally Posted By: BRZED
I can only presume that books are released with this kind of faulty formatting

I'm finding it a bit difficult to understand how a vendor could release a file for use on a Kindle that does not render properly on a Kindle. Was this file obtained from an authorized source, or did it come from somewhere else?


Official Amazon release. Imagine they butchered a classic like that.
 
Generally, all Kindle books are riddled with errors, mostly typos. As I understand it, most older books are digitized by scanning an actual printed book and not by using a digital source. The software will often make errors, and apparently books are no longer proofread. Newer books, which have likely been written on a word processor or computer, are generally less riddled with errors, but often they appear not to have been proofread at all. Many Stephen King books, most of those written since the late '70s, were originally written on Wang word processors, and there should be original digital files, but the e-books are also showing typos, and many more than the actual printed books. My only reasons for going to e-books are a lack of space and the option of carrying a whole library of books with me anywhere and anytime. I'd really like to fix my favorite Verne book!
 
What would happen if you tried to view a plain text file, such as a couple thousand words, that was not divided into paragraphs and had no carriage returns or line feeds? How would that appear on your Kindle?
 
Originally Posted By: Garak
What would happen if you tried to view a plain text file, such as a couple thousand words, that was not divided into paragraphs and had no carriage returns or line feeds? How would that appear on your Kindle?


The text shows up exactly like it does in a text editor, other than that it is also justified on the Kindle. What does that tell you?
 
That tells me the Kindle is doing its own formatting and is also handling the carriage returns/line feeds and making a giant mess out of the novel you're trying to read. It's splitting lines by so many characters and then splitting the line again when it encounters a CR/LF, hence, your mess.

Can you play with text size on those things to see what that does? It might make something more readable, assuming it's not too small or large for your preference, but you might be able to hack in a solution that way.
 
Other than justification, which I don't know if it even counts as formatting, what is the Kindle doing? It didn't insert line breaks or paragraphs into my giant block of plain text. Changing the font size has no effect on formatting.
 
In order to fix the book, I have to hit delete at the beginning of every line until the whole empty line before it and the line break at the end of the partially filled line before that have been deleted. Then I insert one space. I hit return twice if there is supposed to be a new paragraph. If I do this for the whole book, the formatting will be fixed. It's just so tedious!

From this:

ytutytuytuytuyruyrtrtyr

hjghjg

vhjghghjgfhjfghfhgjf


hghjghjkkgkjhg

vmmvmbnvmvmvmv
bvbnmv


To that:


ytutytuytuytuyruyrtrtyr hjghjg
vhjghghjgfhjfghfhgjf

hghjghjkkgkjhgvmmvmbnvm
vmvmvbvbnmv
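
That manual procedure could probably be scripted. Below is a rough Python sketch that assumes the pattern in the "From" sample holds for the whole book: a single empty line is just a broken line, and two or more empty lines in a row mark a real paragraph. That reading of the sample, the function name, and the parameter are all assumptions. It also always puts a space at each join, so a word that was split in the middle would still need fixing by hand.

```python
import re


def unwrap_spaced(text: str, blank_lines_for_paragraph: int = 2) -> str:
    """Join broken lines; keep a break only at runs of empty lines."""
    # Treat CR/LF pairs and lone CRs as plain line feeds.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove trailing spaces so that "empty" lines are truly empty.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # N empty lines in a row show up as N + 1 consecutive line feeds.
    run = blank_lines_for_paragraph + 1
    paragraphs = re.split(r"\n{%d,}" % run, text.strip())
    # Every other line break inside a paragraph becomes a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = (
    "ytutytuytuytuyruyrtrtyr\n\nhjghjg\n\nvhjghghjgfhjfghfhgjf\n\n\n"
    "hghjghjkkgkjhg\n\nvmmvmbnvmvmvmv\nbvbnmv"
)
print(unwrap_spaced(sample))
```

The result is two long lines separated by one empty line. The Kindle would then wrap each paragraph to the screen width on its own.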
 
If it's out of copyright, there should be other versions of the book available online.
 