Web Documents Require Structure

We typically think of documents as having content and formatting, but when writing for the web, we have to also think in terms of structure.

If you’re new to writing or designing for the web, there’s a hidden something that you need to learn, a fundamental way of viewing documents that can shift your thinking dramatically. This reading will reveal that hidden element, explain why it is important, and show you how to use it.

Word Processing Documents

In order to explain, let’s first talk about how most people think about print documents or documents created using word processing software like Microsoft Word or Google Docs. Here’s just such a document:

Looking at this, you can see two major aspects of the document: its content and its formatting.

The content, of course, is made up of the actual words on the page. Think: letters, words, punctuation. The content is usually created by typing on a keyboard.

The formatting consists of how those words are styled and laid out on the page. Think: font, size, color, alignment, spacing. The formatting is usually created by highlighting text and choosing options from a toolbar provided by the software.

The formatting options given in the ribbon in Microsoft Word

Most users who want to designate part of the content as special somehow (say, as a title or heading) use formatting to show that, as you see in the document pictured above. But there are some weaknesses with this system. For one thing, anyone who’s written a document of any significant length and complexity knows it can be hard to get the formatting on similar elements exactly the same across the entire document. On the first page, you format your first-level heading as centered 16-pt. bold Arial, and then a dozen pages later when another first-level heading appears, you can’t remember exactly what you did. Throw in a few more levels of headings and you’re sunk.

Second, using formatting alone to show the reader how different parts of the text should be read isn’t accessible to all users. Blind and low-vision users, as well as anyone else who uses screen reading software to engage with your content, are all but locked out of understanding how the visual formatting is meant as a clue to the document’s organization.

What’s missing from this approach to word processing documents is an understanding of the third key element, in addition to content and formatting, that makes up what a document is. That element is called structure.

What is Structure?

Structure refers to the organizational aspects of a document, the way that different types of text are used, ordered, and combined to create and explain a logical flow of ideas from beginning to end.

If you’ve ever hit the return/enter key to start a new paragraph, then you’ve already worked with structure. That act of breaking a paragraph is not the same as punctuating a sentence. Punctuation is part of the content of a document, an indelible part of making the words make grammatical sense. But breaking a paragraph produces meaning on a higher, more abstract level; it says, “these previous sentences shared a common topic or purpose, and the topic or purpose of this next group of sentences is somehow different.”

Now I know what you’re thinking. You’re thinking, “But hitting return inserts a bit of formatting!” And it’s true, a new paragraph is signaled by a line break. But it also inserts a bit of data that the computer, which can’t generally read the meaning in your formatting decisions, can understand as an indicator of the document’s structure.

And that brings up another important concern about using only formatting to signal structure. Just as blind and low-vision users can’t access such indicators to understand a document’s structure, so too is a computer unable to parse such indicators reliably. So what if you’ve carefully formatted all your headings and subheadings just so? To the software, those headings are just short paragraphs, no different from the longer paragraphs that make up the rest of your content.

Paragraphs and headings aren’t the only elements that contribute to a document’s structure, though they are by far the most common ones. Other elements include the following:

- bulleted and numbered lists
- block quotes (which, like their smaller cousins the quotation marks, indicate that material has been borrowed from another source)
- images and captions
- headers and footers

Inserting Structure into a Traditional Document

Okay, we agree that structure is useful and that it needs to be added to documents in a way that is distinct from merely formatting text in different ways. So how do you do that?

Well, in Microsoft Word or Google Docs, you use the Styles function. Perhaps you’ve noticed the Styles section on the ribbon in Word and wondered what it was for? Or perhaps you’ve used it—but only as a formatting shortcut, since the default styling on a “Heading 1” seemed appropriate for a bit of text in your work?

The ribbon in Microsoft Word has a section devoted to Styles

If you have, you’ve unwittingly encoded your document with structural information, revealing to your computer what used to only be accessible to sighted humans. In fact, doing so enables some pretty convenient bonus features, like an automatically generated table of contents (the software assumes the headings are the parts that should be included in the table) or Outline view (again, using levels of headings to generate the levels of the outline). Even more useful, instead of manually changing the formatting on all the parts of your document if you change your mind about what looks good, you can merely update the formatting assigned to a style, and all text with that style attached will automatically update, preserving your formatting’s consistency across the entire document.

The auto-generated table of contents from my dissertation used level-1 and level-2 headings from the document.

A document that is written with content, formatting, and structure in mind is more convenient, consistent, and accessible—not only to the readers, human and otherwise, but also to the writer.

What Does Any of This Have to Do with Writing for the Web?

In word processing software, assigning a style to a bit of text is what creates the document’s structure. On the web, we “mark up” a text to create that structure. After all, HTML stands for “Hypertext Markup Language.”

A webpage’s structure is marked up using tags. Tags are made of text in <angle brackets> that surround the text they are marking up. If you look at a webpage’s source code, you’ll see dozens of these tags scattered all throughout the text. The web browser that displays the page uses these tags to understand how it should render the text, that is, how it should format things.

Here's a look at the HTML behind this very part of this very webpage—notice the tags marking up the text to give it its structure?

Let’s look at some simple examples. Let’s say you want to mark up a bit of text on your webpage as a paragraph. There’s a tag for that: <p>. So you put an opening <p> tag at the beginning of your paragraph and a closing tag° at the end:

° Closing tags always add a slash after the opening angle bracket, like this: </p>.

<p>This is a one-sentence paragraph.</p>

The browser reads those tags and knows that this is a structural unit in the document and applies the default formatting for a paragraph when it displays the page to the user:

This is a one-sentence paragraph.

Or let’s say you want to add a level-1 heading to your document to introduce your paragraph. The tag for that is <h1>, so you can implement that like this:

<h1>An Example Heading</h1>

<p>This is a one-sentence paragraph.</p>

Again, the browser now knows more about the structure of your document and applies some default formatting for its display:

An Example Heading

This is a one-sentence paragraph.

Of course, the big innovation with HTML was a new structural element, the hyperlink. Links are those clickable words on websites that propel you to another location on the web. Links are created with an anchor tag, or <a>. In order to make the link send the clicker somewhere new, you designate the destination in the tag using the word “href,” like so:

<a href="http://groversenglish.com/">link</a>

Embed this bit of structure into your webpage, and you get this:

link
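
To see how these pieces fit together, here is a minimal sketch of a complete page that combines the heading, paragraph, and link from the examples above. The doctype and the html, head, title, and body wrappers are not covered in this article, and the page title is made up; they are included only on the assumption that you want a file a browser can open directly.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Hypothetical page title, shown in the browser tab -->
  <title>An Example Page</title>
</head>
<body>
  <!-- Level-1 heading: introduces the content that follows -->
  <h1>An Example Heading</h1>

  <!-- Paragraph containing a hyperlink; href names the destination -->
  <p>This is a one-sentence paragraph with a <a href="http://groversenglish.com/">link</a>.</p>
</body>
</html>
```

Nothing in this file says what the heading or the link should look like. It records only what each piece of text is.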

HTML has a large assortment of tags available to mark up a document and give structure to its content, but explaining those is beyond the scope of this article. (Still, if you're interested, you can see a list of all the HTML tags here.)

Did you notice what I just said? I said

HTML has a large assortment of tags available to mark up a document and give structure to its content.
—S. David Grover (emphasis added)

Structure and content? But what about formatting!? Where does that come in?

To answer that, we need to look at a bit of history.

A Bit of History

In its early form, HTML allowed one to define the content and structure of a document but left most of the formatting to the web browser a user employed when accessing the page. That meant that if you accessed a webpage using, say, Internet Explorer, it would look different than if you accessed it using, say, Netscape Navigator. Did you catch how weird that is? The reader, not the writer, had most of the control over how a webpage looked.

Actually, some writer-determined formatting was allowed even in early HTML, and that functionality was quickly expanded to involve quite an array of options, making webpage formatting the rival of what you could do with a word processor. But this created a problem: the same one you face when you manually format your Word docs. If you wanted to change how an element was formatted, you’d have to manually change the tags every time they appeared. For example, let’s say your site formatted links as underlined blue text, but you’ve decided they would look better as green text without underlining. Good luck finding every link on the page and rewriting the markup for each and every one! And that’s for just one page. If your website involves several pages (or dozens, or hundreds!), making one small change in formatting could involve hours (or days, or weeks!) of work. Even in the 90s, when the internet was still pretty basic, this wasn’t a very workable option.
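
As a rough illustration of the problem, here is what formatting baked into the markup could look like, using the now-obsolete font element to color link text. The file names and link text are hypothetical. Every link carries its own copy of the color, so a change of mind means editing each one by hand.

```html
<!-- Obsolete, presentational markup: the color is repeated inside every link -->
<p>Read the <a href="first.html"><font color="blue">first article</font></a>.</p>
<p>Then read the <a href="second.html"><font color="blue">second article</font></a>.</p>
<p>Finish with the <a href="third.html"><font color="blue">third article</font></a>.</p>
<!-- Switching to green requires changing all three font tags,
     on this page and on every other page that uses them -->
```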

The solution came in the form of CSS, or Cascading Style Sheets, which allow designers to separate the formatting of a webpage from its content and structure. What this means (at least in one possible iteration) is that any webpage you build can actually be split across two files. The HTML file contains the content and the structure of the page, while a separate CSS file indicates how different elements are to be formatted.

So, for example, your HTML page contains, say, dozens of links, or <a> tags. The CSS file contains only one entry instructing the browser how all <a> tags are to be formatted:

a { color: blue; text-decoration: underline; }

If you want to change how all the links on the page are formatted, and keep them all perfectly consistent, all you have to do is change the CSS file entry for <a>, and voila! All the links are updated to the new formatting. It’s exactly like applying Styles to a Word doc.
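
For instance, the switch from underlined blue links to green links without underlining comes down to editing that single rule. The sketch below places the rule in a style element so the example fits in one file; in the two-file setup it would sit in the separate CSS file instead. The page title and link destinations are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Link Styling Example</title>
  <style>
    /* One rule covers every <a> element on the page.
       Previously: color: blue; text-decoration: underline; */
    a { color: green; text-decoration: none; }
  </style>
</head>
<body>
  <!-- Neither link mentions color; both pick it up from the rule above -->
  <p>Read the <a href="first.html">first article</a>.</p>
  <p>Then read the <a href="second.html">second article</a>.</p>
</body>
</html>
```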

And here’s the kicker: Every page on your website can be linked to the same CSS file, so that one formatting change can actually apply to not only one webpage but your entire website!
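
Connecting a page to a shared style sheet is typically done with a link element in the page's head. In this sketch, styles.css is an assumed file name and the title is a placeholder; each page on the site would carry the same link line.

```html
<head>
  <meta charset="utf-8">
  <title>Any Page on the Site</title>
  <!-- Assumed file name; every page points at the same style sheet -->
  <link rel="stylesheet" href="styles.css">
</head>
```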

If you’d like to see a truly inspiring demonstration of the power of separating formatting from content and structure, check out the website CSS Zen Garden, where web designers take the exact same HTML file but apply their own CSS to it, resulting in dramatically different layouts and designs, but all using the same structural base.

These three very different looking webpages are actually the same HTML file with three different CSS files attached. Content- and structure-wise, they are identical.

The Power of Structure on the Web

Do I really need to spell out again why paying attention to the structure of your documents and webpages is a good idea? Okay, I will.

For the Writer

Thinking about structure while you write forces you to consider big-picture and little-picture organization more explicitly. For example, in the article “Chunking Info for Readability,” I commented: “In writing this article, I revised the headings structure three times to get it right, moved several paragraphs from one section to another, and regrouped and reordered content significantly.” All this because I was paying attention to structure.

For the Reader

- A document that is carefully and logically structured is easier to navigate, parse, and remember.
- Visual cues like formatting are no longer the only way to make sense of the document, making it more accessible to users of all abilities and means of access (e.g., screen readers).
- In fact, it’s possible to have a personal CSS file that you apply to webpages you visit, overriding how the writer meant things to look and substituting your own preferences. Do you have dyslexia? You can force webpages to display in a more readable font, like Dyslexie. Are you far-sighted? You can set the default text size as large as you want it (in fact, anytime you zoom on a site, you are in effect overriding the writer’s formatting choices).

For the Computer

When a webpage’s structure is defined in a machine-readable way, computers can more accurately and consistently perform operations such as
- indexing websites for search engines
- autogenerating outlines and site maps
- creating or maintaining databases and other repositories of information
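
As a small illustration of the outline case, consider the headings of this very section marked up with nested heading levels. The levels chosen here are an assumption about how the page might be coded, not a copy of its actual source. A program can derive the outline shown in the comment from the tags alone, without looking at fonts or sizes.

```html
<h2>The Power of Structure on the Web</h2>
<h3>For the Writer</h3>
<h3>For the Reader</h3>
<h3>For the Computer</h3>

<!-- Outline a program could derive from the heading levels:
     The Power of Structure on the Web
       For the Writer
       For the Reader
       For the Computer
-->
```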

Okay, hopefully it's clear by now: writing for the web means writing structured documents. But that doesn't necessarily mean learning how to write HTML.

Separation of Powers

There’s another big reason why structure, and separating structure from formatting, is such a good tool on the modern web, and it involves another separation: between designers and writers.

Writers Write; Designers Design

Nowadays it’s rare for a major site to be designed by the same person or people who produce the content for that site. Web design, which includes developing the architecture for a site as well as its look and functionality, involves a specialized set of skills, ones that generally don’t overlap with the skills needed to be a good writer.

Take The New York Times, for example. The content on this site is produced by journalists and editors, people who specialize in the research and reporting of news. The design of the site is created by, well, designers: graphic artists who understand typography and layout, reading patterns and preferences. People who have an artistic eye and know-how.

Online New York Times articles all adhere to a strong design aesthetic reminiscent of a print newspaper but appreciative of the digital format.

So if the designers are responsible for the look of a website, but the writers supply most of the content, how is that workflow managed?

Well, one approach would be to have all the writers compose their articles in Word and then send them to the designers, who could encode them into HTML and upload them to the site. But that would mean a lot of extra work for the designers, not to mention introducing the chance of errors—what a writer intended as a heading might be interpreted by the designer as something else.

Another approach would be to teach all the writers HTML and let them encode their own articles—but you can imagine how well that might go. Writers who struggle to learn it would probably produce error-ridden HTML that would have to be heavily edited by the designers (more work), and those who excel at it might take matters of formatting into their own hands, adding some fancy additional markup to change fonts and sizes and colors and otherwise undo all the designers’ careful work.

No, the best approach is to leave formatting to the designers and leave content to the writers—all while enabling the writers to write enough structure into their documents that they work well with the designers' intended formatting.

Here are two popular methods of achieving that.

Method One: Employ an Interface

You've probably used some kind of text editor interface to create content for a webpage in your life. If you've ever posted on Facebook, Instagram, Twitter, or another social media site; if you've ever blogged with WordPress, Blogger, or Tumblr; if you've ever participated in a discussion board on Blackboard, Canvas, or some other educational software, then you've used an interface designed to facilitate the creation of HTML with no knowledge of HTML needed.

Here's the rich text editor that Canvas provides users for its discussion boards. Notice the dropdown menu near the top left that says "paragraph"—that menu also includes options for headers and other structural elements.

The benefit of throwing a text editor at your site's writers is that such editors are generally familiar and pretty intuitive to use. The downside is that giving writers such an interface doesn't actually teach them to write with structure in mind. They are still just as likely to make their headings out of formatting rather than designate headings the proper, structural way. So, if you're a designer, you have to train them, which takes effort, and the effort never ends because of course new writers will be hired as old ones move on. If you're a writer, every time you arrive at a new place you have to learn a new interface and the methods the designers want you to follow.
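
The difference is easy to miss on screen but plain in the markup such an editor produces. The sketch below is an assumption about typical editor output, not a capture from any particular product: the first version is what bolding a line tends to generate, and the second is what choosing a heading from the structural menu tends to generate.

```html
<!-- Formatting only: looks like a heading, but software sees a short paragraph -->
<p><strong>Method One: Employ an Interface</strong></p>

<!-- Structural: marked as a level-2 heading, so screen readers and
     outline tools can recognize it as one -->
<h2>Method One: Employ an Interface</h2>
```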

The New York Times takes this approach, and in order to make it work well, they (1) built their own bespoke text editor, called Oak, that is finely tuned to create the sort of content the site wants and (2) trained every single member of the newsroom staff (1700 people spread across the globe!) and built tutorials and tips into the interface. It's an awe-inspiring, but never-ending, project.

Method Two: Use a Lightweight Markup Language

The other big approach is to have your writers use a simplified markup language that inherently foregrounds the structuring of a text.

See, HTML is a full-featured markup language: it can do all the things it can do, which is a lot. And because it is full of tags, it is difficult to read in its raw state. But people have come up with lightweight markup languages that convert to HTML and are easy to learn, easy to type, and easy to read. The most widely used one is called Markdown.

The way it works is that you write in Markdown (you can just use any plain text editor), and then the resulting document can be passed through a program that reformats it into an HTML file that has all the right tags in all the right places, ready to be incorporated into a webpage. Markdown only has syntax for the most common HTML structural elements, including

- paragraphs
- headings (levels 1–6)
- links
- images
- ordered (numbered) lists
- unordered (bulleted) lists
- blockquotes
- code
- horizontal rules

and the only formatting elements it allows are italics and bold.
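
To give a feel for the conversion, the sketch below shows the HTML that a typical Markdown converter would produce, with the Markdown source for each element noted in a comment above it. The sample text is made up, and details such as whitespace can vary between converters.

```html
<!-- Markdown source:  # An Example Heading -->
<h1>An Example Heading</h1>

<!-- Markdown source:  This is a paragraph with *italics*, **bold**,
     and a [link](http://groversenglish.com/). -->
<p>This is a paragraph with <em>italics</em>, <strong>bold</strong>,
and a <a href="http://groversenglish.com/">link</a>.</p>

<!-- Markdown source (two lines):
     - first item
     - second item -->
<ul>
<li>first item</li>
<li>second item</li>
</ul>

<!-- Markdown source:  > A borrowed sentence. -->
<blockquote>
<p>A borrowed sentence.</p>
</blockquote>
```

The writer types only the short forms in the comments; the tags are generated by the converter.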

Dillinger is a browser-based Markdown editor with side-by-side interface—you write your Markdown on the left and it previews the output on the right. You can export files as Markdown files (.md), HTML files, or PDFs.

The benefits to a system like this are many:

- Writers can learn to use a language like Markdown in very little time, and it requires no expensive software or proprietary interface.
- Learning Markdown teaches writers to think structurally and prevents them from messing around with formatting (keeping the designers happy!).
- Once someone knows Markdown, they can apply it to any new site that uses it (and a lot of sites use it).
- Learning Markdown is a gateway to learning more about HTML.

The downside is that Markdown isn't flashy, and it freaks some people out at first. Also, there are several flavors of Markdown with slightly different (or expanded) syntax, so it isn't exactly universal.

Wrapping Up

So there you have it: content, formatting, and structure. The three elements that make up web (and print) documents and that, as a writer, you need to understand in order to effectively create texts that serve your readers' needs and accomplish your purposes.

Now that you're aware, you might find yourself noticing the structure inherent in webpages you read, critiquing another author's failure to structure things well or praising their intuitive approach. Such is the road to developing your personal taste and style. Good luck out there!

David Grover is the cofounder of Grover's English and a professor of English at Park University. He earned his doctorate in Technical Communication and Rhetoric from Texas Tech University in 2017.