Web Writers Learn Your Medium: HTML For Dummies Q and A


One of the comments left in part one, Web Writers Learn Your Medium: HTML For Dummies Will Save Your Life, of the “Know Your Medium” series was:

Okay, okay, I get it! But, I have a question. I use MSWord for spelling and grammar check. I MUST have a spelling and grammar checker. What should I use? Can I just use HTML within MSWord?

The short answer is, “No.” I addressed this in an article entitled Welcome to the Web: Know Your Enemy on 30 May 2008, but it has since been buried in the blog archives. In it I wrote:

Why are word processors so heinous?
Programs installed on your PC have access to the PC’s operating system, font definitions and formatting options. They also use proprietary codes to work their magic. Web servers do not have these nifty attributes. They expect you to provide plain text. When your text contains codes the web form can’t decipher, horrible, nasty things happen.

If you have dodged the bullet until now, chances are you were just keeping it simple when typing in Word. Trust me, it will bite you eventually.

This is the perfect time to revisit the topic and expound
Software like Microsoft Word has its place. It has many features a writer needs and wants. It does what it does very well. What it does not do well is create web-friendly documents, especially if the intended destination of the document is the submission form of a writing site. Word processing programs are great as spell checkers, grammar checkers, and word counters. They are not good at creating web-ready text. Word processing software was originally created to print documents. Microsoft Word does that very well, but it does it within the confines of itself. Once you try to move (copy & paste) the on-screen Word document to a web form, all hell breaks loose.

Some sites try to put in Word converters. Gather.com did it very well. Gather needed to do it because the people contributing there are mostly amateurs. They want to write in Word and post their thoughts immediately. Writers – professionals who are writing for money – should never depend on such a brittle crutch.

My primary argument is you should use the tools (HTML) specifically designed for your trade. The second reason is – software changes. Ask anyone who upgraded to Word 2007 if the web site widget converters kept up. They will tell you no. There was a lag while the web site programmers tried to program a new converter. Their old Word 2003 converters didn’t handle the new software. Some didn’t even bother writing a new translator and I don’t blame them. If I were running a writing site, I’d insist all submissions be plain text or HTML. Period. I’d grant no quarter to people who copy & paste directly from MS Word and I’d certainly not waste my time writing some complex translator that would be useless when the new word processing software version was released next year.

In short, there are simply too many word processing software programs out there for a web site to cost-effectively keep up. This is the internet and the internet has its own standards. Standards which, I will add, are much easier to follow than print publishers’ submission standards, but that’s another topic.

Concrete Example
Smart quotes. I hate them. You do too, but you don’t know why. Microsoft Word has this cute feature that creates quotation marks that curl in at the beginning and end of the enclosed quotation. “They look like this.” This sleight of hand is not done with simple text. The curly marks are special characters outside the plain-text (ASCII) set, and Word quietly substitutes them for the straight quotes you typed. When you copy these creatures and paste them into a web form that expects a different character encoding, they lose their meaning. The web site you submit them to can’t recognize them and you end up with something that looks like «¶ÅTM or some such nonsense.
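
For the curious, here is a minimal Python sketch of one likely way the nonsense arises, assuming the web form reads the pasted bytes in a different character encoding than the one they were written in. The helper name misread_as_cp1252 is made up for illustration, and the exact garbage you see on a real site depends on which encodings are involved.

```python
# Assumption: the text was saved as UTF-8 but the receiving side
# decodes it as Windows-1252. Real sites may mix other encodings.
OPEN_QUOTE = "\u201c"  # the curly opening quote Word inserts


def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode("cp1252")


print(OPEN_QUOTE.isascii())           # False: not a plain-text character
print(OPEN_QUOTE.encode("cp1252"))    # b'\x93' (one byte in Windows-1252)
print(OPEN_QUOTE.encode("utf-8"))     # b'\xe2\x80\x9c' (three bytes in UTF-8)
print(misread_as_cp1252(OPEN_QUOTE))  # â€œ
```

One curly quote goes in and three junk characters come out. A straight quote (") is plain ASCII and survives the same trip untouched.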

The Safe Thing To Do
Use Microsoft Word or any software you like to write, grammar & spell check, and word count. Do not copy & paste directly from said software into a web form. I must repeat this as if it were a red-alert klaxon. Do not paste directly from Word. Word’s internal formatting voodoo will not translate, especially for fancy stuff like bullets and footnotes. And don’t leave comments like “I never have a problem.” You’ve just been lucky.

The safe thing to do is either save your Word document as text (*.txt) or paste it into a simple text editor like Notepad (Notepad comes with MS Windows and is located in your Accessories folder). You can then either open the TXT file or work from the Notepad window, and copy/paste from there to the web form. This intermediary step reduces the Word voodoo to its basic elements, making it safe for the web.
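
If you are comfortable running a small script, the same cleanup idea can be sketched in Python. This is only an illustration under my own assumptions: the replacement table and the names to_plain_text and leftover_non_ascii are hypothetical, and the table covers only a handful of the characters Word likes to substitute.

```python
# Hypothetical mapping of common "fancy" characters to plain ASCII.
# It is a starting point, not a complete list.
REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_plain_text(text: str) -> str:
    """Swap each mapped character for its plain-text stand-in."""
    return text.translate(str.maketrans(REPLACEMENTS))


def leftover_non_ascii(text: str) -> list:
    """List any characters that are still outside ASCII after cleanup."""
    return sorted({ch for ch in text if not ch.isascii()})


draft = "\u201cIt\u2019s a test\u2026\u201d"
clean = to_plain_text(draft)
print(clean)                      # "It's a test..."
print(leftover_non_ascii(clean))  # []
```

Checking for leftovers matters because accented letters and other legitimate characters are not in the table. The sketch reports them rather than silently deleting them.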

Bonus
That’s the end of the answer, but I wanted to add this for those who like to look under the hood. I typed the simple sentence, “A test text export of a Microsoft Word document.” complete with quotation marks in Microsoft Word 2003. I then chose the File > Web Page Preview menu item. The following is what MS Word hath wrought. Note that 99% of this output is Microsoft trying to maintain its fancy, internal formatting on the web page by using XML and style sheet (CSS) definitions. These things would explode the web form if you tried to submit this to Associated Content or any other site. Here’s what MS Word looks like when the emperor has no clothes:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 11">
<meta name=Originator content="Microsoft Word 11">
<link rel=File-List href="681D1F4B_files/filelist.xml">
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>barefoot</o:Author>
  <o:Revision>1</o:Revision>
  <o:TotalTime>12</o:TotalTime>
  <o:Created>2009-01-19T02:14:00Z</o:Created>
  <o:Pages>1</o:Pages>
  <o:Words>7</o:Words>
  <o:Characters>46</o:Characters>
  <o:Company>barefoot</o:Company>
  <o:Lines>1</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>52</o:CharactersWithSpaces>
  <o:Version>11.9999</o:Version>
 </o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:View>Normal</w:View>
  <w:Zoom>0</w:Zoom>
  <w:SpellingState>Clean</w:SpellingState>
  <w:GrammarState>Clean</w:GrammarState>
  <w:PunctuationKerning/>
  <w:ValidateAgainstSchemas/>
  <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
  <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
  <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
  <w:Compatibility>
   <w:BreakWrappedTables/>
   <w:SnapToGridInCell/>
   <w:WrapTextWithPunct/>
   <w:UseAsianBreakRules/>
   <w:DontGrowAutofit/>
  </w:Compatibility>
  <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
 </w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:LatentStyles DefLockedState="false" LatentStyleCount="156">
 </w:LatentStyles>
</xml><![endif]-->
<style>
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
	{mso-style-parent:"";
	margin:0in;
	margin-bottom:.0001pt;
	mso-pagination:widow-orphan;
	font-size:12.0pt;
	font-family:"Times New Roman";
	mso-fareast-font-family:"Times New Roman";}
@page Section1
	{size:8.5in 11.0in;
	margin:1.0in 1.25in 1.0in 1.25in;
	mso-header-margin:.5in;
	mso-footer-margin:.5in;
	mso-paper-source:0;}
div.Section1
	{page:Section1;}
-->
</style>
<!--[if gte mso 10]>
<style>
 /* Style Definitions */
 table.MsoNormalTable
	{mso-style-name:"Table Normal";
	mso-tstyle-rowband-size:0;
	mso-tstyle-colband-size:0;
	mso-style-noshow:yes;
	mso-style-parent:"";
	mso-padding-alt:0in 5.4pt 0in 5.4pt;
	mso-para-margin:0in;
	mso-para-margin-bottom:.0001pt;
	mso-pagination:widow-orphan;
	font-size:10.0pt;
	font-family:"Times New Roman";
	mso-ansi-language:#0400;
	mso-fareast-language:#0400;
	mso-bidi-language:#0400;}
</style>
<![endif]--><!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext="edit" spidmax="2050"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>
 </o:shapelayout></xml><![endif]-->
</head>

<body lang=EN-US style='tab-interval:.5in'>

<div class=Section1>

<p class=MsoNormal>“A test text export of a Microsoft Word document.”<span
style='mso-spacerun:yes'>  </span></p>

</div>

</body>

</html>
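
To see how little of that output is actual writing, here is a rough Python sketch that pulls only the visible text out of Word-style HTML using the standard library's html.parser module. The class VisibleText, the function visible_text, and the trimmed-down SAMPLE are my own illustrative stand-ins, not anything a writing site is known to use.

```python
from html.parser import HTMLParser

# Elements whose contents are never shown to the reader.
HIDDEN = ("style", "script", "title")


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring comments and hidden elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip = 0    # depth inside hidden elements
        self.chunks = []  # visible text fragments, in order

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Word's conditional comments never reach here; the parser
        # routes them to handle_comment, which this class leaves empty.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A shortened stand-in for the Word output shown above.
SAMPLE = """<html><head>
<meta name=Generator content="Microsoft Word 11">
<!--[if gte mso 9]><xml><o:Words>7</o:Words></xml><![endif]-->
<style>
<!--
 p.MsoNormal {margin:0in; font-size:12.0pt;}
-->
</style>
</head>
<body lang=EN-US>
<div class=Section1>
<p class=MsoNormal>\u201cA test text export of a Microsoft Word document.\u201d<span
style='mso-spacerun:yes'>  </span></p>
</div>
</body></html>"""

print(visible_text(SAMPLE))
```

Everything except the one sentence falls away. Note that the curly quotes survive, so the text would still need the plain-text treatment before going into a web form.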

3 Comments

  1. Thank you so much! I understand even more about what I should be doing now. Probably this is the reason (or one of them) that my work has not brought the satisfactory results I am wanting. I will do as you recommend.

    Your time is appreciated.
    Sadie

  2. Awww man…I’ve done everything in word and upgraded to Office 2007 this past summer. Now I have to figure out how to that .txt thing. No, I’ve never had a problem, yet. But I trust you and so I will now go learn something new. I did notice that 2007 saves files under .docx. So I try to save it under .doc so things don’t get FUBARed.

  3. Thanks, Barefoot. I have heard about word’s funkiness and have been trying the txt thing – but now you have helped me understand it a bit more.
