E-mail has traditionally been a plain-text medium, ever since it was
introduced on the ARPAnet in the 1970s (and possibly even earlier
than that on individual time-sharing mainframe computers). However,
some people wanted a way to use fancier formatting in their messages.
Various proprietary formats were tried, but HTML ended up as the
"standard" manner of doing this. How does HTML e-mail work, and is it
a good idea or a bad one? This article discusses the hows, whys, and
why-nots.
HTML E-mail: The Basics
HTML (HyperText Markup Language) is, of course, the format used for
Web pages. It was invented in 1990 by Tim Berners-Lee, the creator of
the Web. E-mail had existed for over a decade before that, so
obviously HTML e-mail is a latecomer. However, it became possible to
use HTML for the main body of an e-mail message once MIME headers
were introduced. These headers (which I discuss more elsewhere) are
able to specify what data format is being used, so the receiving
program knows whether the message is in plain text or HTML. Starting
in the late 1990s, mail programs began to support the sending and
receiving of HTML messages, using the MIME type text/html.
Obviously, when the first HTML-format mail started going out, it
faced a problem in that most users at the time were still using
programs that didn't support the reading of HTML. This was resolved
by using a multipart message, with both plain-text and HTML versions.
The content type header of the message as a whole is
multipart/alternative, indicating that it is composed of several
parts, of which the reader should choose only one to display. That
way, a mail program that understands MIME and HTML can display the
HTML version; a program that understands MIME but not HTML can
display the plain text version; and a program that doesn't understand
MIME will display the raw message source, so the plain text version is
placed first, where it can be read normally (with a bunch of messy code
beneath it). This order also follows the MIME rule (RFC 2046) that
alternatives are listed from plainest to fanciest, with a MIME-aware
reader displaying the last one it can handle. Some mail programs even
give the viewer a choice to display the "plain" or "fancy" versions of
messages as a configuration setting.
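To make the structure concrete, here's a minimal sketch using Python's standard email package. Python is just my choice for illustration, and the addresses, subject, and message text are placeholders. It builds a message whose top-level type is multipart/alternative, with the plain text part first and the HTML part after it:

```python
from email.message import EmailMessage

def build_alternative(text_body, html_body):
    """Build a multipart/alternative message with the plain text part first."""
    msg = EmailMessage()
    # Placeholder addresses, for illustration only.
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Meeting moved"
    # set_content() with a str creates a single text/plain body.
    msg.set_content(text_body)
    # add_alternative() converts the message to multipart/alternative,
    # keeping the text part first and appending the HTML part after it.
    msg.add_alternative(html_body, subtype="html")
    return msg

msg = build_alternative(
    "The meeting moved to *Tuesday*.\n",
    "<p>The meeting moved to <b>Tuesday</b>.</p>\n",
)
print(msg.get_content_type())                                  # multipart/alternative
print([part.get_content_type() for part in msg.iter_parts()])  # ['text/plain', 'text/html']
print(msg.as_string())  # the raw form, boundary lines and all
```

The last line prints the raw message, which is roughly what a reader that doesn't understand MIME would put on the screen.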
Mail programs that support the sending of HTML e-mail generally have
a configuration setting to determine whether to send outbound
messages in plain text or HTML form. Many default to HTML form (as
discussed and criticized below) but can be configured to send only
plain text at the user's option. A few, however, send in HTML form
and are difficult or impossible to configure any other way. Some
smarter programs decide which format to use based on what is needed
for the current message; their message editor has the ability to add
special formatting (bold, italics, headers, etc.) and hyperlinks, and
uses HTML format if any of these features are used, but plain text if
they are not.
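Here's a rough sketch of how that "smart" choice might be made, again in Python. It assumes a hypothetical editor that hands over both a plain text rendering and an HTML rendering of what the user typed, and it treats a made-up set of tags (bold, italics, headers, links, lists, images, tables) as real formatting. A message whose HTML contains none of them goes out as plain text only.

```python
from email.message import EmailMessage
from html.parser import HTMLParser

# Tags counted as "real" formatting; this set is an assumption for the sketch.
FORMATTING_TAGS = {"b", "strong", "i", "em", "u", "a", "ul", "ol",
                   "h1", "h2", "h3", "h4", "h5", "h6", "img", "table"}

class TagCollector(HTMLParser):
    """Records the names of all start tags seen in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # HTMLParser passes tag names in lower case

def uses_formatting(html_body):
    collector = TagCollector()
    collector.feed(html_body)
    collector.close()
    return bool(collector.tags & FORMATTING_TAGS)

def build_message(text_body, html_body):
    """Plain text only, unless the HTML rendering contains real formatting."""
    msg = EmailMessage()
    msg.set_content(text_body)
    if uses_formatting(html_body):
        msg.add_alternative(html_body, subtype="html")
    return msg

print(build_message("Lunch at noon?\n", "<p>Lunch at noon?</p>").get_content_type())
# text/plain
print(build_message("Lunch at *noon*?\n", "<p>Lunch at <b>noon</b>?</p>").get_content_type())
# multipart/alternative
```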
What is HTML e-mail good for?
Some will reply "Absolutely nothing!" The next section below gives
some reasons for this position. However, HTML e-mail wouldn't have
become as popular as it now is if it had no advantages at all. It can
have a useful purpose. A long message with a complex structure can be
more readable and understandable if there are headers, emphasized
passages, italicized citations, bulleted lists, and other structural
elements made possible by HTML. True hyperlinks in HTML messages may
work better than inserted URLs in a plain text message (which might
get broken in the middle if they are too long to fit on a line).
Charts, graphs, and illustrations added via inline images may be an
essential part of the information content of an article or report.
Even the more exotic things one can do in HTML, such as the embedding
of sound or video data, can have their uses; an electronic greeting
card just wouldn't be the same in plain text.
Then why do so many people hate it?
Unfortunately, for every message that uses HTML effectively, there
are hundreds that use it in a useless or counterproductive way. While
the use of well-structured, valid HTML can enhance the readability
and understandability of a message, few e-mail writers have any
interest in taking the time to do this; e-mail is generally a medium
of quick comments tossed off without much effort. Usually, the writer
will just type in some text and hit "Send", without any attempt at
special formatting or structural elements such as headers. If the
writer's program defaults to sending mail in HTML form, the resulting
message will just consist of plain text with some pointless HTML tags
wrapped around it. Often, such messages will actually be less
readable than normal non-HTML text; the reader's mail program will be
configured to display plain text in a sensible font, while HTML e-
mail contains font tags that try to force the display into a font
face, size, or color that is harder to make out. For this reason,
many people who use mail programs that give them the option to see
the plain or fancy versions of a multipart/alternative message opt to
see the plain version.
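On the receiving side, picking the plain version of a multipart/alternative message is simple with Python's email package, whose get_body() method takes a preference list. Here's a sketch with a made-up message; the font tags in its HTML part are exactly the kind of thing a plain text preference lets you skip.

```python
from email import policy
from email.parser import BytesParser

# A made-up multipart/alternative message as it might arrive over the wire.
RAW = (
    b"From: alice@example.com\n"
    b"To: bob@example.com\n"
    b"Subject: Lunch\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\n'
    b"\n"
    b"--XYZ\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"\n"
    b"Lunch at noon?\n"
    b"--XYZ\n"
    b"Content-Type: text/html; charset=us-ascii\n"
    b"\n"
    b'<p><font face="Comic Sans MS" size="1" color="pink">Lunch at noon?</font></p>\n'
    b"--XYZ--\n"
)

def displayed_body(raw, prefer_plain):
    """Return the alternative a reader would show, given its preference."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    order = ("plain", "html") if prefer_plain else ("html", "plain")
    part = msg.get_body(preferencelist=order)
    return part.get_content() if part is not None else ""

print(displayed_body(RAW, prefer_plain=True))   # Lunch at noon?
print(displayed_body(RAW, prefer_plain=False))  # the <font>-wrapped HTML
```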
There is one category of e-mail senders that actually does take the
time and effort to carefully craft HTML messages that take
advantage of the strengths of this medium -- but it's likely you
don't want to see the results of their work. These are the
advertisers and marketers who clog your inbox with spam promoting the
junk they're selling. Just like TV commercials are among the most
slickly and expensively produced things on the air, and junk paper
mail is much slicker and more colorful than ordinary personal
letters, junk e-mail makes much more use of any fancy formatting that
it's possible to wring out of today's mail reader programs than any
other sort of e-mail. This, in fact, is probably a major factor
that's driving the development of "enhanced" e-mail, and the reason
vendors like Microsoft turn HTML e-mail on by default; the better for
advertisers to make their pitches more intrusive and annoying. After
all, MS and their big-business friends have their own marketing mail
they want to send you if they can con you into "opting-in". That
other marketers with fewer scruples follow by deluging everybody
(whether opted in, out, or none-of-the-above) with tons of HTML-
formatted pitches for herbal remedies, porn, gambling, and hot
investments isn't their problem.
Also, a multipart text-and-HTML message is likely to be at least
three times the size of the same message as plain text; after all, it
includes the plain text version, plus an HTML version that repeats
all the same text plus a whole mess of code like this:
[see article on web for example code]
Hence, HTML messages are wasteful of bandwidth and disk space. If
they used clean, logical, valid HTML, they'd be nowhere near as
wasteful, but in practice many mail programs generate incredibly
messy and standards-noncompliant code. And in some cases, if you turn
on HTML mail, even the alternative plain text version that
accompanies it is malformatted; several programs screw up the line
length of messages when HTML is enabled.
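If you want to check the overhead yourself, here's a small Python sketch that builds the same short note twice, once as plain text only and once as multipart text and HTML, and compares their sizes. The wrapper in naive_html() is a hypothetical imitation of mail-program markup, not the output of any particular program, so the ratio you get depends entirely on how much markup gets piled on.

```python
import html
from email.message import EmailMessage

TEXT = "Hi Bob,\n\nAre we still on for lunch tomorrow?\n\nAlice\n"

def naive_html(text):
    """Wrap each paragraph in the sort of markup some mail programs emit.

    The markup is made up for illustration; real programs vary.
    """
    paras = "".join(
        '<p class="MsoNormal"><font face="Arial" size="2">'
        f"{html.escape(p.strip())}</font></p>\n"
        for p in text.split("\n\n")
        if p.strip()
    )
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">\n'
        "<title>Message</title></head>\n"
        f"<body>\n{paras}</body></html>\n"
    )

plain_only = EmailMessage()
plain_only.set_content(TEXT)

both = EmailMessage()
both.set_content(TEXT)
both.add_alternative(naive_html(TEXT), subtype="html")

small = len(plain_only.as_bytes())
large = len(both.as_bytes())
print(f"text only: {small} bytes; text + HTML: {large} bytes ({large / small:.1f}x)")
```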
But on the other hand...
...there are online newsletters that go out to willing subscribers
(in some cases they even pay to subscribe!), some of which use
carefully crafted HTML to present useful things like headers,
emphasis, and illustrations. It's just like paper mail, where you might
subscribe to magazines that come out on slick paper with fancy
layouts, like junk mail, but that you actually want to receive. So HTML e-mail
isn't always evil. Still, if you're publishing an e-mail newsletter,
you should give your recipients a choice of whether to get it in text
or HTML form; some may prefer plain text or have a mail program that
doesn't deal well with HTML. And for your normal non-newsletter
correspondence, stick to plain text (configuring your mail program
away from defaulting to HTML if you're using a program that does
this) unless you actually use the enhancements of HTML for something
that helps your message (putting the whole thing in a cutesy script-
style font or with a background image that looks like notepaper
probably doesn't qualify).
Including Images
There are two ways to include images in HTML e-mail. One way is to
include the images as file attachments associated with the HTML
message. To give some more technical detail, this calls for the
message to have content type multipart/related, with the first
sub-part within it being multipart/alternative (containing the nested
multipart combination of the plain text and HTML versions of the
message) and subsequent parts being the appropriate MIME type for the
images, like image/jpeg, etc. Each image has a Content-ID header
giving a unique content ID string for referring to it (I describe
these more in the page on MIME headers), so that the HTML can then
refer to them in IMG tags using cid: URLs (as described in RFC 2111,
since superseded by RFC 2392).
Whew... quite a bit of technical stuff, but fortunately you don't
generally have to know it unless you're creating a program or script
to generate this sort of mail (something I've actually done
myself)... as an end user, you probably just have to drag the image
into the message you're composing and the program does it all for
you... hopefully correctly (though you never know, especially when
it's a program from Microsoft).
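For the curious, here's a minimal sketch of that structure in Python; it's for illustration, not the script I mentioned. It nests the plain text and HTML versions in a multipart/alternative part, puts that first inside multipart/related, and attaches the image with a Content-ID that the HTML refers to through a cid: URL. The image bytes are a stand-in rather than a real picture, the subject and domain are placeholders, and the type parameter on multipart/related is the root-part type that RFC 2387 calls for.

```python
from email.message import EmailMessage, MIMEPart
from email.utils import make_msgid

# Stand-in bytes; a real program would read an actual PNG file here.
CHART_PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

def build_with_inline_image(text_body, html_template, image_bytes):
    cid = make_msgid(domain="example.com")  # looks like "<...@example.com>"
    # A cid: URL uses the Content-ID without its angle brackets.
    html_body = html_template.format(cid=cid[1:-1])

    # Inner part: the plain text and HTML versions as alternatives.
    alternative = MIMEPart()
    alternative.set_content(text_body)
    alternative.add_alternative(html_body, subtype="html")

    # Outer part: multipart/related with the alternatives first, then the image.
    msg = EmailMessage()
    msg["Subject"] = "Quarterly chart"
    msg["MIME-Version"] = "1.0"
    msg.make_related()
    msg.set_param("type", "multipart/alternative")  # root part type (RFC 2387)
    msg.attach(alternative)
    msg.add_related(image_bytes, "image", "png", cid=cid)
    return msg

msg = build_with_inline_image(
    "Sales rose this quarter (see the chart in the HTML version).\n",
    '<p>Sales rose this quarter:</p>\n<p><img src="cid:{cid}" alt="chart"></p>\n',
    CHART_PNG,
)
print([part.get_content_type() for part in msg.walk()])
# ['multipart/related', 'multipart/alternative', 'text/plain', 'text/html', 'image/png']
```

Python's own documentation example uses the other common arrangement, with just the HTML part and its images wrapped in multipart/related inside multipart/alternative; the cid: mechanism works the same way in both.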
The other way, sometimes termed "Lazy HTML", is not to attach the
images to the message, but instead include references to images on
the Web with normal http: URLs within IMG tags. There are a number of
advantages and disadvantages to each of the methods:
- When the images are referenced on the Web, they don't take up
bandwidth when the user downloads the message, or disk space in
the user's mailbox.
- They do, however, take up bandwidth every time the user reads the
message, when the image needs to be downloaded from the Web.
- If the same image is referenced in a number of e-mail messages
using the same URL, however, the user's program will probably cache
it and it won't have to be downloaded and stored repeatedly;
attaching the image would take up space and time in every message in
which it appears.
- If the user is offline while reading mail (as often happens when a
dialup connection is used; the user downloads the mail then
disconnects to save connection charges and avoid tying up the phone
line), the images won't be displayed if they're on the Web rather
than attached.
- On the other hand, if a mail program only displays plain text
messages, but can send HTML e-mail to a separate Web browser to be
displayed, then attached images probably won't work there, but images
from the Web will display correctly.
- A sender without access to a Web server to post files on has no way
to serve images from the Web, but can still attach images directly.
- Spammers sometimes embed specially-named images called "Web Bugs",
whose names encode the specific recipient of the message; when these
images are requested from the Web, this sends a signal that the
message was read so that they know your address is a "live prospect"
who can be spammed further. Because of this, some mail programs don't
display remote images from the Web by default; the user has to
specifically tell the program to show images in a particular message,
or they'll show up blank.
- Senders of bulk messages (which includes newsletters going to
willing subscribers; not all bulk mailers are spammers!) can
generally get out their mailings more quickly and efficiently with
images on the Web rather than attached; the messages are then smaller
in size and transmit faster, while the server load to send images
from the Web server is spread out over hours or days as the messages
are read.
- Still, if it's a very large bulk mailing, load on the Web server to
serve the images may be heavy; you'd better have a server that's up
to this load (much of which will come all at once within moments of
sending the message, as the more attentive readers open it
immediately).
- Users might keep messages archived in their mail program's folders
for a long time (I've got some archived messages from years ago), but
images on Web servers might go away eventually. If they do, the old
archived messages will no longer display correctly.
As you can see, there are arguments to be made for both approaches,
but on the whole, attached images usually work better than remote
ones.
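A mail program that holds back remote images first has to tell them apart from attached ones. Here's a rough Python sketch using the standard library's HTML parser: it sorts IMG sources into cid: references (attached images) and http:/https: URLs (remote images, any of which might be a Web bug). The tracking URL in the example is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class ImageSourceScanner(HTMLParser):
    """Sorts IMG sources into attached (cid:) and remote (http/https) images."""

    def __init__(self):
        super().__init__()
        self.attached = []
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        scheme = urlsplit(src).scheme.lower()
        if scheme == "cid":
            self.attached.append(src)
        elif scheme in ("http", "https"):
            # Fetching this would tell the remote server the message was opened.
            self.remote.append(src)

def scan_images(html_body):
    scanner = ImageSourceScanner()
    scanner.feed(html_body)
    scanner.close()
    return scanner

found = scan_images(
    '<p><img src="cid:chart1@example.com" alt="chart"></p>'
    '<img src="http://tracker.example.net/open?id=bob42" width="1" height="1">'
)
print(found.attached)  # ['cid:chart1@example.com']
print(found.remote)    # ['http://tracker.example.net/open?id=bob42']
```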
Single-Part HTML-Only Messages
There are a few mail programs (Hotmail seems to be the main offender)
that send HTML mail as a single part, not a multipart message with
both text and HTML versions. Their creators probably justified this
on the grounds that hardly any mail program these days doesn't
support HTML, so there's no need to waste space attaching a text
version too. However, doing this is a bad idea for a number of
reasons:
- Believe it or not, there are still some people reading mail in non-
HTML-supporting readers. This includes some grizzled system
administrator types, set in their ways of reading mail in text mode
from a Unix prompt like they've been doing for the last 20 years or
so. You don't want to anger these people... they're the ones who keep
your servers running!
- Some users have mail programs perfectly capable of HTML, but choose
to display the plain text version instead, which they find more
readable without the graphical "fluff".
- Some users even spam-filter HTML-only messages, because the vast
majority of them are spam. Spammers tend to send HTML messages with
no plain text version (they're so in love with their snazzy graphical
ads that they wouldn't dream of trying to duplicate their content in
something as dull as plain text), while most regular HTML e-mail is
multipart. So your single-part HTML message might not be seen by its
recipient. (Unfortunately, there are some spammers that get past
these filters by doing multipart messages where the plain text
version is something rude like "Your mail program doesn't support
HTML, so you can't read this." Like I should get a different mail
program just to read their spam!)
- It'll probably screw up in AOL, too... see the next article.
- If you write on mailing lists, you may find that some of them
reject non-plain-text messages. If your HTML e-mail is multipart, the
list software will probably just strip the HTML portion and use the
plain text one, maybe adding a line like [Non-text portions of this
message have been removed] (which should be a hint that
maybe you ought to switch to sending text only so you don't get this
added every time), but at least your message will get sent. If you
use no-alternative HTML, it will be rejected altogether.
- Even in mailing lists that accept HTML mail, there may be digest or
archive versions that use only the plain text versions of messages.
Your HTML-only message may get removed altogether there, and replaced
with a note like [This message is not in displayable format]. If you
want everybody to be able to read your writing, avoid this!
Thus, you should avoid this format. If your mail program only sends
HTML mail this way, that's all the more reason to switch to plain
text.
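To illustrate why a multipart message survives where single-part HTML doesn't, here's a sketch of what a strict text-only mailing list might do with incoming posts. It's a hypothetical policy written in Python, not the behavior of any particular list manager: keep the text/plain alternative if there is one (adding a removal note if anything else was dropped), and reject the message if there isn't.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Optional

NOTE = "[Non-text portions of this message have been removed]"

def text_only_list_filter(raw: bytes) -> Optional[EmailMessage]:
    """Hypothetical policy for a text-only mailing list.

    Returns a plain text copy of the post, or None if it should be rejected.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    plain = msg.get_body(preferencelist=("plain",))
    if plain is None:
        # No text/plain part at all (e.g. single-part text/html): reject.
        return None
    out = EmailMessage()
    for name in ("From", "To", "Subject", "Date"):
        if msg[name] is not None:
            out[name] = str(msg[name])
    text = plain.get_content()
    # Every other leaf part (the HTML alternative, images, ...) is dropped.
    dropped = [p for p in msg.walk() if not p.is_multipart() and p is not plain]
    if dropped:
        text = text.rstrip("\n") + "\n\n" + NOTE + "\n"
    out.set_content(text)
    return out
```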
Unfortunately, some mail programs that send multipart messages with
a plain-text version along with an HTML version do the plain-text one
badly, and you never notice if your own mail program shows you only
the HTML version while viewing messages. Sometimes, the plain-text
message has no clear separation between quoted material and
responses, if this distinction in the HTML version was made through
things like colors and fonts that go away when the HTML tags are
stripped. Other bizarre things sometimes show up in the plain-text
version, like the word "Message" being added awkwardly at the
beginning of the text because that was the TITLE element of the HTML
version and the part of the mail program that creates the text
version stupidly grabs it as part of the text.
Even worse, there
are some messages (usually part of bulk mailings, but this doesn't
mean it's just spam; it happens in legitimate bulk mailings such as
subscribed-to newsletters) that have a completely empty plain-text
version, so that if your mail program is configured to show plain
text in preference to HTML, you see nothing at all. This is apparently
the result of a program that's set up to include both formats, but
requires the sender to supply the contents of each version separately
(not a bad idea for bulk mailings, as it allows the sender to create
well-formatted versions for each instead of having the text version
created automatically, and often badly, from the HTML version), but
the sender failed to supply any plain text, so that part ended up
empty. If you're not going to supply any plain text, you shouldn't
include a plain text part at all. Some mail programs cope with a
missing plain text version better than with an empty one; when you
choose to display plain text in preference to HTML, such a program
still displays the HTML if that's all there is, but displays the plain
text version (even an empty one) instead if one is present.
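The more forgiving behavior described at the end of that paragraph is easy to sketch, and a reader could go one step further by treating a blank plain text part as if it were missing. Here's a hypothetical Python function that does that; it isn't taken from any particular mail program, and the newsletter in the example is made up.

```python
from email.message import EmailMessage

def body_to_show(msg):
    """Prefer plain text, but fall back to HTML if the text part is absent or blank."""
    plain = msg.get_body(preferencelist=("plain",))
    if plain is not None and plain.get_content().strip():
        return plain.get_content()
    html_part = msg.get_body(preferencelist=("html",))
    return html_part.get_content() if html_part is not None else ""

# A newsletter whose sender never filled in the plain text version.
broken = EmailMessage()
broken.set_content("\n")  # a blank text/plain part
broken.add_alternative("<h1>Issue 12</h1>\n<p>This month's news...</p>\n", subtype="html")

print(body_to_show(broken))  # falls back to the HTML part
```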
Email Rejection: An Amusing Example
As I've noted, some recipients won't accept HTML-formatted e-mail or
other mail with non-text attachments, because it triggers filters
designed to keep out spam or viruses. Among those who bounce non-text
messages are some companies' technical support and customer service
departments, who will send back messages with attachments and tell
you to resend them as plain text. One amusing example of such is
Bonzi, which has a free download that supposedly "enhances" your PC
experience (I don't recommend you install it; it's reputed to be
annoying adware, and maybe "spyware" too; once it gets into your
system, it won't go away, and might also be sending personal info of
yours to its manufacturer). Anyway, I found out about the automated
response they send to anybody who e-mails them in non-text form
because apparently some virus e-mailed itself to them, forging my
address in the "From" line, which triggered this response to me even
though I had never e-mailed them in my life. It includes this passage:
BonziMAIL messages include attachments and are not accepted by our
mail system. Please open your regular e-mail program and write us a
message using only plain text.
So, apparently, their own e-mail program, that's included as part of
the software you download from them, produces mail of a format that
they, themselves, reject!
Links
Why You Should Use Plaintext Email
HTML Email is Evil
HTML Email -- Still Evil?
HTML Email Isn't Rich
The Dying Art of Plain Text Email
Why you shouldn't make newsgroup postings in HTML
Configuring your e-mail program to use plain text
Email Style and Formats -- a (somewhat outdated) discussion of the
interoperability problems caused by various mail programs'
"enhancements" to mail format
RFC 2111 -- cid: and mid: URLs
Next: What do you call a mail reader that doesn't handle real HTML,
but tries to render a limited subset of HTML tags -- even in plain
text messages? AOL calls it "HTML Lite", but "Half-Assed HTML" is a
better name for it.
---
* Origin: [adminz] tech, security, support (192:168/0.2)
generated by msg2page 0.06 on Jul 21, 2006 at 19:03:58