HyperText Mark-up Language Quick Reference, December 1994
This is yet another HTML quick reference, containing advice on how to
write correct HTML, as well as practical tips. It is mainly based on version
1.22 of the HTML 2.0 standard (including some HTML3 extensions which are
already implemented by a number of browsers; the extensions are all clearly
labeled as such). Some material is from another quick reference by Tom Fine.
This is not necessarily a good guide for absolute beginners.
General:
The HTML language represents hypertext data, for use as part of the
World-wide Web. HTML is one specific language defined using the general
SGML meta-language. HTTP is a transport
protocol, used to deliver HTML documents (as well as other types of files)
over networks.
<tagname attribute=value attribute="Value"> contained
stuff </closingtag>
A "tag" is everything between the `<' and `>' characters. The tag
name should come directly after the `<' character, with no intervening
whitespace. Tag names and attribute names are case-insensitive, as are the
values of certain attributes. If an attribute value contains
whitespace, or any characters other than a-z, A-Z, `.' or
`-' it should be quoted. For this reason, most
URL's should be quoted (the fact that some
implementations may tamper with the alphabetic case of unquoted attribute
values means that it is good style to quote all URL's). Some attributes (such
as COMPACT) do not need a value.
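The quoting rules above can be sketched with a few isolated tags. These are fragments, not a complete document, and the URL and field name are made up for illustration:

```html
<!-- Letters only: quotes are optional, and TEXT would work as well as text -->
<INPUT TYPE=text NAME=city>

<!-- Contains a colon, slashes and a tilde, so it should be quoted -->
<A HREF="http://myhost.edu/~myself/file.html">My file</A>

<!-- Contains whitespace and an apostrophe, so it should be quoted -->
<IMG SRC="socks.gif" ALT="Chelsea's cat Socks">

<!-- COMPACT takes no value at all -->
<UL COMPACT>
<LI>First item
</UL>
```

When in doubt, quoting every attribute value is harmless.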
An "element" is made up of the opening tag, its
matching closing tag, and everything that is contained between the two (which can
include other tags, and also text which is not part of any tag):
<X>Stuff In Element</X>. The closing tags for some elements are
optional (as noted below), and some tags cannot have a corresponding
closing tag (namely, <BR>,
<HR>, <IMG>,
<INPUT>, the non-<TITLE> tags in
<HEAD>...</HEAD>, and the
SGML pseudo-tags <!DOCTYPE> and
<!-- -->).
Details of text formatting in the HTML source (such as the position of
linebreaks) are not preserved when the document is displayed, and extra
whitespace is ignored.
- <!DOCTYPE ...>
- SGML document type declaration; if it is used, it comes first
in the file. (If you don't know what this is, don't worry about it.)
- <HTML> ... </HTML>
- Encloses the entire document (except <!DOCTYPE>) and
identifies it as HTML. The optional VERSION attribute specifies the
HTML version used.
- <HEAD> ...
</HEAD>
- Encloses the header (<TITLE>, <LINK>,
etc.).
- <BODY> ...
</BODY>
- Encloses the body of the document.
All other tags besides these, and all text which is not part of a tag,
should be contained within a
<HEAD>...</HEAD> or
<BODY>...</BODY> element,
which should be in turn contained within the overall
<HTML>...</HTML>.
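As a minimal sketch of this nesting, a complete document might look like the following. The title and text are invented, and the <!DOCTYPE> line (shown here with the HTML 2.0 public identifier) may be left out:

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Quick Guide to Example Markup</TITLE>
</HEAD>
<BODY>
<H1>Example Markup</H1>
<P>All text, and every other tag, goes inside HEAD or BODY.
</BODY>
</HTML>
```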
- <TITLE> ... </TITLE>
- The title of the document; should not contain any other tags. A
title is OBLIGATORY in
<HEAD>...</HEAD>! There should be no non-tag
text in <HEAD>...</HEAD> except that which is
contained in <TITLE>...</TITLE>.
Web search programs use the title to index the document,
so it should not be something that is cryptic when out of
context, like "Intro". (See also
<H1>...</H1> below.)
- <LINK HREF="URL">
- Specifies general relationships of this document to other resources.
The type of relationship is described by a REL= or REV=
attribute (other attributes are URN=,
TITLE=, and METHODS=). This is not generally implemented
yet, except for <LINK REV="made" HREF="mailto:...">,
used to specify the e-mail address of the author.
- <BASE
HREF="URL">
- Specifies context-independent URL of current file.
- <NEXTID N="...">
- Next anchor name to use (for HTML editors).
- <META CONTENT="...">
- Provides HTTP header info (other attributes are
NAME= and HTTP-EQUIV=).
- <ISINDEX>
- Document is searchable index. (This tag can also appear in
<BODY>...</BODY> element.)
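A header using several of these tags might look roughly like this. The host, path, and e-mail address are hypothetical:

```html
<HEAD>
<TITLE>Socks the Cat: Photo Index</TITLE>
<BASE HREF="http://myhost.edu/~myself/subdir/file.html">
<LINK REV="made" HREF="mailto:myself@myhost.edu">
</HEAD>
```

Note that the title still makes sense when seen on its own in a list of search results.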
- <Hn> ...
</Hn>
- Section headings; "n" ranges from 1 (highest-level) to 6
(least important); <H4>-<H6> are too small to be
usable in the default configuration of some versions of NCSA Mosaic. Since
the TITLE in the HEAD element is
displayed on the window bar (and should be context-independent), the
<H1>...</H1> element is generally used for the
actual within-document title.
- <BLOCKQUOTE> ... </BLOCKQUOTE>
- Encloses a block of text that is a quote.
- <ADDRESS> ... </ADDRESS>
- Information about the author and the document itself (such as
copyright, sources, last update, acknowledgements, etc.). Shouldn't include
lists or high-level tags (except <P>).
Often displayed as italic.
- <HR>
- Horizontal line (pseudo page-break).
- <PRE> ... </PRE>
- Encloses block of text to be shown verbatim in a fixed-width font
(whitespace is significant). This is the only way to do columns or aligned
tables in HTML 2.0. The WIDTH= attribute gives
a display hint to browsers (the default is WIDTH=80). A
<PRE>...</PRE> element cannot contain any
list or high-level tags except <HR>.
- <P>
- Begin a paragraph (the closing </P> tag is optional).
Cannot contain lists, or any of the above tags. Use
attributes ALIGN=CENTER or ALIGN=RIGHT
to control text position (this is not yet a part of standard HTML 2.0).
These high-level elements all imply both a
preceding and a following paragraph break (except after the optional
</P> tag).
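The following sketch shows the high-level elements above in use inside a document body. All of the content is invented, and the <PRE> block shows the column alignment that HTML 2.0 otherwise cannot express:

```html
<H1>Feeding Schedule</H1>
<P>The first paragraph. Its closing tag is left out.
<P>The second paragraph, closed explicitly this time.</P>
<BLOCKQUOTE>
A block of quoted text goes here.
</BLOCKQUOTE>
<PRE>
Day        Morning   Evening
Monday     dry       canned
Tuesday    dry       dry
</PRE>
<HR>
<ADDRESS>Maintained by A. N. Author. Last updated December 1994.</ADDRESS>
```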
- <UL><LI>... </UL>
- Unordered list.
- <OL><LI>... </OL>
- Ordered list.
- <MENU><LI>... </MENU>
- Menu list (for brief items; not much used -- you can also try
<UL COMPACT> or <OL COMPACT>).
- <DIR><LI>... </DIR>
- Directory list (should be multi-column, but isn't in most
implementations).
- <LI>
- Begins each item in the above lists. An
inline image can be used as a custom bullet (preceding
the list item) using the attribute
BULLET="URL". (This is not yet a part of standard
HTML 2.0.)
- <DL><DT>...
<DD>... </DL>
- Definition list. Can also be useful for writing dialogue (as in a
play). Use <DL COMPACT> for tighter rendering. There does not
actually have to be a <DD> for each <DT> or vice
versa.
- <DT>
- Begins each item title in DL.
- <DD>
- Begins each item definition in DL.
The list item closing tags </LI>, </DT>,
and </DD> are optional.
Lists can be nested (i.e. included in an
<LI>...</LI> item in a
<UL>...</UL> or a
<OL>...</OL> list, or inside a
<DD>...</DD> item in a
<DL>...</DL> list). List items are not supposed
to directly contain <H1>-<H6>
headings, <HR>, or <ADDRESS> (though
<LI> and <DD> elements can contain a
<BLOCKQUOTE> or <FORM> which itself includes
them).
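A short sketch of nesting and of a definition list used for dialogue follows. The content is invented, and the optional closing tags for list items are omitted:

```html
<UL>
<LI>Fruit
  <OL>
  <LI>Apples
  <LI>Pears
  </OL>
<LI>Vegetables
</UL>

<DL COMPACT>
<DT>FIRST SPEAKER
<DD>Who goes there?
<DT>SECOND SPEAKER
<DD>A friend.
</DL>
```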
- <A NAME="..."
HREF="URL"> ... </A>
- Creates a link (HREF=) or anchor (NAME=) or both.
(Less commonly used attributes are URN=, REL=,
REV=, TITLE=, and METHODS=.) Non-HTML resource
types referenced in <A HREF="..."> links can be
displayed by external viewers. An anchor element MUST
contain something other than whitespace, or it won't work on many browsers.
It is better if text contained within a link element is
not something relatively meaningless like <A...>Click
Here</A>, but rather something which describes what the link is
pointing to: <A...>Chelsea's cat Socks</A>. (Remember
that not everybody is using a mouse anyway, so the word "Select" is
preferable to "Click".)
Anchors/links CANNOT BE NESTED, directly or indirectly,
so that even code like
<A...>...<X>...<A...>...
</A>...</X>...</A>
is forbidden. (In the upcoming HTML3 language, the
attribute ID="...", usable with most tags,
will replace <A
NAME="...">, so that almost any element can be
the target of a link.)
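As an illustration, one anchor and two links could be written like this. The anchor name and URL are hypothetical:

```html
<H2><A NAME="feeding">Feeding</A></H2>
<P>See the <A HREF="#feeding">feeding schedule</A> in this document, or
read about <A HREF="http://myhost.edu/~myself/subdir/file.html">Chelsea's
cat Socks</A>.
```

Each link's text says what the link points to, and no anchor is empty or nested inside another.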
- <IMG SRC="URL"
ALT="...">
- Inserts an image from the URL as part of the
surrounding text flow (if any); GIF 87a (.gif) and X Bitmap
(.xbm) formats are supported. (JPEG's are supported by Netscape.)
- ALIGN=TOP
- ALIGN=MIDDLE
- These attributes control the
placement of short captions alongside an image (but will probably not do what
you expect in the middle of text).
- WIDTH=
- HEIGHT=
- Specify the width and height of the image in pixels.
Greatly speed display of document in Netscape (not yet part of standard HTML
2.0).
- ISMAP
- The image is a clickable imagemap.
Be sure to specify meaningful text in the ALT
attribute value (for use in non-graphic environments),
especially if the image is in a link. If the image is
purely decorative, use ALT="" to avoid annoying "[IMAGE]"
clutter in Lynx.
Using too many or too large inline bitmaps can be very inconsiderate,
especially on your home page and other pages that are linked to from outside
(unless they are publicized as picture galleries). Many people are using
14.4k modems, and it is particularly frustrating to have to wait, with no
advance warning, for a lot of big .GIF's to load -- before you're even able to
decide whether or not there is actually anything of interest on the page. In
any case, inline images will often be shown with few colors (only 50 in some
versions of Mosaic), whereas external images will be shown with the maximum
available number of colors -- so it is best to use a small sample (thumbnail)
as a link to the full size image.
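The thumbnail approach might be sketched as follows. The file names, pixel sizes, and file size are invented, and WIDTH= and HEIGHT= are among the non-standard attributes described above:

```html
<P><A HREF="socks-full.gif"><IMG SRC="socks-small.gif"
ALT="Socks on the lawn" WIDTH=80 HEIGHT=60></A>
Select the small picture for the full-size version (GIF, about 200K).

<!-- Purely decorative image: an empty ALT keeps text browsers quiet -->
<P><IMG SRC="rule.gif" ALT="">
```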
- <BR>
- Forces a line break.
- <EM> ... </EM>
- Emphasized (often rendered as italic).
- <STRONG> ... </STRONG>
- Strong emphasis (often rendered bold).
- <CITE> ... </CITE>
- Citation of book, article, movie, etc. (often rendered italic).
- <CODE> ... </CODE>
- Piece of computer source code (often rendered in fixed-width font).
- <KBD> ... </KBD>
- Example of keyboard entry (user input).
- <SAMP> ... </SAMP>
- Literal characters (e.g. computer output).
- <VAR> ... </VAR>
- Name of variable (often rendered as italic).
- <DFN> ... </DFN>
- Word to be introduced/defined (not yet part of standard HTML 2.0).
- <B> ... </B>
- Bold font.
- <I> ... </I>
- Italic font.
- <TT> ... </TT>
- Typewriter (fixed-width) font.
- <U> ... </U>
- Underlined (not yet part of standard HTML 2.0; can also create
confusion with links, which are rendered as underlined
on many browsers).
It is preferable to use logical styles rather than
hard-wired fonts (bold, italic, etc. may not be available
in non-graphical environments, anyway). Styles and
fonts are NOT guaranteed to be rendered
cumulatively (i.e. <B><I>Text</I></B> may
look the same as plain <I>Text</I>, and the italic text
in <H1>RomanText <I>ItalicText</I></H1> may
not be the appropriate size for an H1
heading).
The logical style, font, and
link/anchor elements generally can contain only each
other (and <IMG> and
<BR>), and not lists and
high-level tags. The headings
<H1>-<H6>,
<DT>...</DT> in a
<DL>...</DL> list, and
<LI>...</LI> in MENU or DIR
can also contain only these tags. It is best not to have
whitespace after an opening tag of a style,
font, or anchor element, or before
a closing tag (i.e. <B>Text</B> is preferable to
<B> Text </B>); such whitespace produces
displeasing visual results on some browsers.
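A small sketch of logical styles, and of the whitespace advice, with invented text and a hypothetical file name:

```html
<P>Press <KBD>q</KBD> to quit; the program prints <SAMP>Goodbye</SAMP>.
<P>This is <EM>important</EM>, and this is <STRONG>very important</STRONG>.

<!-- Preferred: no whitespace just inside the tags -->
<P>Read the <A HREF="notes.html">release notes</A> <EM>before</EM> installing.

<!-- Avoid: whitespace just inside the tags looks bad on some browsers -->
<P>Read the <A HREF="notes.html"> release notes </A> <EM> before </EM> installing.
```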
- &amp; or &#38;
- &
- &lt; or &#60;
- <
- &gt; or &#62;
- >
These three characters should be escaped with the above ampersand
entities everywhere in a document where they are not intended to be used with
their HTML meanings. Other entities (such as "&eacute;" etc.)
are available to encode the alphabetic characters in positions 192-255 of the
ISO 8859-1 Latin 1 character set for European languages. Numeric entities can
be used for characters in the range 160-191 with some hope of success (such as
&#169; for the copyright symbol, since not all browsers
understand &copy;). Not all browsers understand &nbsp;
or even treat &#160; as a space -- a safe
alternative is &#32; (but this will not act as non-breaking on
most browsers). The range 127-159 is undefined in ISO 8859-1, and should not
be used. A double-quote character must be escaped as &quot; or
&#34; inside an attribute value.
Characters in URL's are best escaped with %-hex-digits
(e.g. %26 for "&").
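The escaping rules can be illustrated with a few lines of source. The file name and the search URL are hypothetical:

```html
<P>In C, the test <CODE>a &lt; b &amp;&amp; b &gt; c</CODE> uses all three
special characters.
<P>Caf&eacute; menu, &#169; 1994.
<P><IMG SRC="sign.gif" ALT="Sign reading &quot;Keep Out&quot;">
<P><A HREF="search?q=cats%26dogs">Cats &amp; dogs</A>
```

In the last line the ampersand inside the URL is written as %26, while the ampersand in the visible link text is written as an entity.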
- <!-- comments go here -- >
- The stuff in such a tag is ignored. The final "--" marks
the end of the comment. Theoretically, a comment can include other HTML tags,
but you're much wiser NOT doing this, since many
implementations don't support it. Some implementations restrict comments to a
single line.
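A cautious use of comments, keeping each one on a single line and free of tags, might look like this (the text is invented):

```html
<!-- Last revised December 1994 -->
<P>Visible text.
<!-- Reminder: add the new photos here -->
```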
- General form:
- protocol://host:port/path#anchor
Where protocol is one of http,
gopher, ftp, file, telnet, wais,
news, mailto, etc. The "#anchor" is optional, and ":port"
defaults to 80 for http URL's if left out.
A fully absolute URL contains a protocol prefix, and a full hostname for
external DNS resolution.
- Absolute URL:
- http://myhost.edu/~myself/subdir/file.html#anchor
A URL can be relative in several ways:
- Server-relative URL:
- /~myself/subdir/file.html#anchor
- Document-relative URL:
- subdir/file.html#anchor
(Uses the protocol, host, and port of the current document.)
- Document-internal URL:
- #anchor
URL's which are document-relative, but specify something outside the
current directory (i.e. http URL's which contain
a `/' character, but do not start with a `/' character, after the optional
protocol prefix) can sometimes confuse browsers (especially relative URL's
that start with "../" -- in general, ".." will be
interpreted in terms of the logical Web file system, rather than the physical
file system).
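To compare these forms, suppose (purely as an assumption for this sketch) that the current document is http://myhost.edu/~myself/index.html. The first three links below then point to the same place, while the last points to an anchor inside index.html itself:

```html
<UL>
<LI><A HREF="http://myhost.edu/~myself/subdir/file.html#anchor">Absolute</A>
<LI><A HREF="/~myself/subdir/file.html#anchor">Server-relative</A>
<LI><A HREF="subdir/file.html#anchor">Document-relative</A>
<LI><A HREF="#anchor">Document-internal</A>
</UL>
```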
To implement forms (or <ISINDEX> or
<IMG ISMAP>) you need a program on the
HTTP server (such as a CGI script) in addition to your HTML file.
- <FORM ...> ... </FORM>
- Encloses the entire form.
- ACTION="URL"
- The URL to which the completed form is submitted
- METHOD=
- GET or POST
- ENCTYPE=
- MIME type of representation of form data
- <INPUT ...>
- Some type of input field.
- TYPE=
- Types are text, password,
checkbox, radio, submit, reset,
image, hidden
- NAME=
- Name of the field
- VALUE=
- Value of button (label for submit and
reset)
- SRC="URL"
- URL of inline
image (image)
- CHECKED
- This item selected by default (checkbox/radio)
- SIZE=
- Displayed field width, in characters
- MAXLENGTH=
- Maximum field width, in characters
- ALIGN=
- Image alignment (image)
- <SELECT ...><OPTION ...> ...
</SELECT>
- A list of items to select.
- NAME=
- Name of the field
- SIZE=
- Use scrollable list with SIZE # options shown
- MULTIPLE
- Multiple selections allowed
- <OPTION ...>
- Precedes each item in the option list. The closing
</OPTION> tag is optional.
- SELECTED
- This option is selected by default
- VALUE=
- Value sent when this option is chosen
- <TEXTAREA ...> ... </TEXTAREA>
- A multiline text field. The enclosed text is the default value
displayed in the field.
- NAME=
- Name of the field
- ROWS=
- Number of rows in the field
- COLS=
- Number of columns in the field
<FORM>...</FORM> is a
high-level element, and so should not be contained
inside a heading,
<ADDRESS>...</ADDRESS>,
<PRE>...</PRE>, <P>,
<DT>...</DT>,
style, font, or
anchor element (or
<LI>...</LI> in MENU or DIR).
<INPUT>, <SELECT>, and <TEXTAREA>
should only be contained within a form. Forms cannot be nested.
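Putting the pieces together, a small form might be sketched as follows. The ACTION URL and all field names are hypothetical, and a program at that URL would be needed to process the submission:

```html
<FORM ACTION="/cgi-bin/feedback" METHOD=POST>
<P>Name: <INPUT TYPE=text NAME="name" SIZE=30 MAXLENGTH=60>
<P><INPUT TYPE=checkbox NAME="reply" VALUE="yes" CHECKED> Send me a reply
<P>Topic:
<SELECT NAME="topic">
<OPTION SELECTED>General
<OPTION VALUE="bug">Bug report
</SELECT>
<P><TEXTAREA NAME="comments" ROWS=5 COLS=40>Type your comments here.</TEXTAREA>
<P><INPUT TYPE=submit VALUE="Send"> <INPUT TYPE=reset VALUE="Clear">
</FORM>
```

The paragraphs are inside the form rather than the other way round, since a form should not be contained in <P>.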
</quickreference>