
#1 2011-02-24 02:19:36

ctarwater
Member
Registered: 2009-02-05
Posts: 300

HTML Editor - converting "special characters" into entities

Hi everyone,

I'm in the middle of formatting my wife's novel into 'clean' HTML so I can use Calibre to create a well-formatted EPUB/Mobi/etc.

I've read that TextMate on Mac includes a function to "Convert Selection to Entities excluding Tags".  In other words, it will replace all special characters (ellipses, copyright sign, etc.) with their HTML entities.

I'm currently using Kate and I can't seem to find anything similar.  Is anyone aware of any Linux editors with the same function?

Thanks for the help!


#2 2011-02-24 03:31:22

upsidaisium
Member
From: Vietnam
Registered: 2006-09-16
Posts: 263

Re: HTML Editor - converting "special characters" into entities

Find and replace? Just look up a reference of HTML entities so you know that, for example, you should replace the ellipsis with:

&hellip;

Here is one reference: http://www.w3schools.com/tags/ref_entities.asp
(Note that it's broken up into two or three pages, so if you don't find all the entities you're looking for on the first page then keep looking)


I've seen young people waste their time reading books about sensitive vampires. It's kinda sad. But you say it's not the end of the world... Well, maybe it is!


#3 2011-02-24 03:37:12

ctarwater
Member
Registered: 2009-02-05
Posts: 300

Re: HTML Editor - converting "special characters" into entities

Yeah, I'm currently doing it all manually, using find and replace for one character at a time.  But apparently that Mac program lets you click a button and it replaces ALL special characters (curved quotes, ellipses, etc.).  I'm neurotic, what can I say?


#4 2011-02-24 15:01:24

upsidaisium
Member
From: Vietnam
Registered: 2006-09-16
Posts: 263

Re: HTML Editor - converting "special characters" into entities

Well, it would, of course, be nice if it were easier. I wonder if Kate supports macros or scripting of any sort? If you invested the time to create a macro that replaces all special characters with HTML entities, you could reuse it in the future without wasting any more time.




#5 2011-02-24 15:43:27

jdarnold
Member
From: Medford MA USA
Registered: 2009-12-15
Posts: 485

Re: HTML Editor - converting "special characters" into entities

I did find this on the web:

http://www.opinionatedgeek.com/dotnet/t … ncode.aspx

I'll bet there are plenty of other sites that will translate text with HTML encoding.


#6 2011-02-24 16:22:09

ctarwater
Member
Registered: 2009-02-05
Posts: 300

Re: HTML Editor - converting "special characters" into entities

I'll give that a try and check the quality against my manual entries.  So far, doing it manually isn't taking as long as I figured it would.

Thanks!


#7 2011-02-25 17:14:11

awkwood
Member
From: .au <=> .ca
Registered: 2009-04-23
Posts: 91

Re: HTML Editor - converting "special characters" into entities

If you have Ruby installed (and haven't converted it all manually yet) you could use the HTMLEntities gem for this task.
To install:

gem install htmlentities

Then:

require 'htmlentities'

coder = HTMLEntities.new
string = "Tacòs!"
puts coder.encode(string, :named)

#  Tac&ograve;s!

More info on the HTMLEntities gem.
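
To run this over a whole HTML file without mangling the markup (the "excluding Tags" part of the TextMate command), one option is to split the text on tags and encode only what sits between them. Below is a minimal sketch that uses only the standard library. The NAMED table and the encode_outside_tags helper are names made up for this example, the table is deliberately incomplete, and characters missing from it fall back to numeric references. It assumes reasonably well-formed markup and does not handle comments, scripts, or a stray "<" in the text.

```ruby
# Hypothetical, partial table of named entities; extend as needed.
NAMED = {
  "\u2026" => "&hellip;",
  "\u201C" => "&ldquo;",
  "\u201D" => "&rdquo;",
  "\u2018" => "&lsquo;",
  "\u2019" => "&rsquo;",
  "\u2014" => "&mdash;",
  "\u00A9" => "&copy;"
}.freeze

# Encode non-ASCII characters in text only; anything that looks like a
# tag (<...>) is passed through unchanged.
def encode_outside_tags(html)
  # The capture group makes split keep the tags in the result array.
  html.split(/(<[^>]*>)/).map do |chunk|
    if chunk.start_with?("<")
      chunk
    else
      chunk.gsub(/[^[:ascii:]]/) do |ch|
        # Unknown characters become decimal numeric references.
        NAMED.fetch(ch) { format("&#%d;", ch.ord) }
      end
    end
  end.join
end

puts encode_outside_tags("<p title=\"x\">\u201CWait\u2026\u201D</p>")
# prints: <p title="x">&ldquo;Wait&hellip;&rdquo;</p>
```

With the gem installed, the table lookup inside the gsub block could presumably be swapped for coder.encode(ch, :named).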

Last edited by awkwood (2011-02-25 17:14:58)


#8 2011-02-25 21:47:30

rwd
Member
Registered: 2009-02-08
Posts: 664

Re: HTML Editor - converting "special characters" into entities

I know that at least PSPad, a freeware Windows text editor that runs perfectly under Wine, can do this. It has the option under 'Tools -> User Convertors -> Chars to named HTML entity'.

Last edited by rwd (2011-02-25 21:47:45)


#9 2011-02-25 23:02:33

ctarwater
Member
Registered: 2009-02-05
Posts: 300

Re: HTML Editor - converting "special characters" into entities

Thanks for the suggestions guys, I'll check them out tonight!


#10 2011-02-27 02:13:56

Anthony Bentley
Member
Registered: 2009-12-21
Posts: 76

Re: HTML Editor - converting "special characters" into entities

What is the encoding of the files? If it’s already UTF‐8, you shouldn’t have to convert anything, as EPUB and others are XML and require Unicode.

If they’re something else (e.g., CP1252), you can use iconv to convert:

iconv -f WINDOWS-1252 -t UTF-8 infile > outfile
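
If you are not sure which case applies, a quick check is whether the file's bytes are valid UTF-8. A minimal Ruby sketch of that check follows, together with the in-memory equivalent of the iconv call. The utf8? helper and the sample strings are made up for illustration. A valid result only shows that the bytes decode as UTF-8, not that the text was originally written in it.

```ruby
# Returns true when the raw bytes form valid UTF-8.
def utf8?(bytes)
  bytes.dup.force_encoding("UTF-8").valid_encoding?
end

cp1252 = "caf\xE9".b     # "café" as Windows-1252 bytes
utf8   = "caf\u00E9".b   # the same word as UTF-8 bytes

puts utf8?(cp1252)  # prints: false
puts utf8?(utf8)    # prints: true

# Tag the bytes as Windows-1252, then transcode them to UTF-8.
converted = cp1252.force_encoding("Windows-1252").encode("UTF-8")
puts converted      # prints: café
```

For a real file, the bytes could be read with File.binread and passed to utf8? in the same way.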


#11 2011-02-27 02:17:09

ctarwater
Member
Registered: 2009-02-05
Posts: 300

Re: HTML Editor - converting "special characters" into entities

Encoding is already UTF-8 but I'm trying to make these docs as "universal" as possible and I've read that little things like replacing quotation marks and other symbols with their html entities is a good step in that direction.


#12 2011-02-27 06:13:57

Anthony Bentley
Member
Registered: 2009-12-21
Posts: 76

Re: HTML Editor - converting "special characters" into entities

ctarwater wrote:

Encoding is already UTF-8 but I'm trying to make these docs as "universal" as possible and I've read that little things like replacing quotation marks and other symbols with their html entities is a good step in that direction.

In situations with multiple encodings this might be the case, but the XML spec requires UTF‐8 support at minimum and defaults to it when the encoding is not declared. All XML parsers can deal with UTF‐8.

In fact, replacing UTF‐8 characters with entities is less portable in XML. This is because the available entities are defined by the doctype, and XML only supports five by default (&lt;, &gt;, &quot;, &amp;, and &apos;). Things like &hellip; are defined in the HTML and XHTML doctypes, but not in other dialects of XML. This used to bite RSS pretty hard (probably still does), because people assumed the entities were available everywhere when they’re really not. EPUB and Mobi are based on XHTML, but other formats might use other XML dialects.
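
If escaping is still wanted for some reason, numeric character references such as &#8230; are the portable form: like the five predefined entities, they are part of XML itself and need no doctype. A minimal Ruby sketch, where to_numeric_refs is a name made up for this example. It only rewrites non-ASCII characters and leaves &, < and > alone.

```ruby
# Replace every non-ASCII character with a decimal numeric character
# reference, which XML parsers resolve without any DTD.
def to_numeric_refs(text)
  text.gsub(/[^[:ascii:]]/) { |ch| format("&#%d;", ch.ord) }
end

puts to_numeric_refs("Wait\u2026 Tac\u00F2s!")
# prints: Wait&#8230; Tac&#242;s!
```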


#13 2011-02-27 11:53:40

ctarwater
Member
Registered: 2009-02-05
Posts: 300

Re: HTML Editor - converting "special characters" into entities

Huh, I wasn't aware of that - thanks.  I'm learning more about this every day.  I'ma have to rethink this a bit now.

