(Post 25/10/2005)
Historically, the word markup has been used to
describe annotation or other marks within a text intended to instruct a
compositor or typist how a particular passage should be printed or laid
out. Examples include wavy underlining to indicate boldface, special
symbols for passages to be omitted or printed in a particular font and
so forth. As the formatting and printing of texts was automated, the
term was extended to cover all sorts of special markup codes inserted
into electronic texts to govern formatting, printing, or other
processing.
Generalizing from that sense, we define markup, or
(synonymously) encoding, as any means of making explicit an
interpretation of a text. Encoding a text for computer processing is, in
principle, like transcribing a manuscript from scriptio continua, a
process of making explicit what is conjectural or implicit, a process of
directing the user as to how the content of the text should be
interpreted.
By markup language we mean a set of markup
conventions used together for encoding texts. A markup language must
specify what markup is allowed, what markup is required, how markup is
to be distinguished from text, and what the markup means.
Historically, markup was used to refer to:
- The process of marking manuscript copy for typesetting with
directions for use of type fonts and sizes, spacing,
indentation, etc. (from the Chicago Manual of Style, the bible
of most publishers.)
- Electronic markup originally referred to the internal,
sometimes invisible codes in documents which described the
formatting.
- In WYSIWYG systems, the system inserts the codes. In early
WYSIWYG systems such as WordStar, the markup was visible on the
screen.
Markup can be classified as one of
two types:
- Procedural markup, which is concerned with the appearance of
text: its font, spacing, etc.
- Descriptive or declarative markup, which is concerned with
the structure or function of the tagged item.
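To make the distinction concrete, here is a minimal Python sketch. The element names (`note`, `title`, `warning`) and the style table are invented for illustration: the XML records only what each piece of text *is* (descriptive markup), and a separate table decides how each element should look, producing appearance-oriented HTML formatting tags (procedural markup).

```python
import xml.etree.ElementTree as ET

# Descriptive markup: says what each part is, not how it looks.
# The element names <note>, <title>, <warning> are illustrative only.
DESCRIPTIVE = "<note><title>Safety</title><warning>Unplug first.</warning></note>"

# Hypothetical style table mapping structural roles to procedural
# (appearance-oriented) HTML formatting tags.
STYLE = {"title": "b", "warning": "i"}

def render(xml_text: str, style: dict) -> str:
    """Turn descriptive markup into procedural markup using a style table."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        tag = style.get(child.tag, "span")  # unknown roles get an unstyled span
        parts.append(f"<{tag}>{child.text}</{tag}>")
    return " ".join(parts)

print(render(DESCRIPTIVE, STYLE))  # <b>Safety</b> <i>Unplug first.</i>
# Changing the appearance needs only a new table, not new content:
print(render(DESCRIPTIVE, {"title": "u", "warning": "b"}))
```

Because the content never mentions bold or italics, the same document can be restyled (or fed to a non-visual application) without touching the text itself.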
Markup languages permit you to use
your information for applications beyond traditional publishing. For
example:
- World Wide Web home pages
- Information databases
- Diagnostic/expert systems
- Electronic mail
- Hypermedia and hypertext documents
- Database publishing
- CD-ROM publishing
- Interactive Electronic Technical Manuals (IETMs)
- Electronic review
Markup languages in use include
SGML, HTML, XML and, most recently, WML.
SGML
The encoding scheme defined by the TEI Guidelines for
Electronic Text Encoding and Interchange is formulated as an
application of a system known as the Standard Generalized Markup
Language (SGML).
SGML is an international standard for the description
of marked-up electronic text. More exactly, SGML is a metalanguage, that
is, a means of formally describing a language, in this case, a markup
language.
There are three characteristics of SGML which
distinguish it from other markup languages: its emphasis on descriptive
rather than procedural markup; its document type concept; and its
independence of any one system for representing the script in which a
text is written.
SGML is the basis of two essential Internet
standards:
- HTML, the language of web pages;
- XML, the new solution for electronic documents and
electronic commerce.
HTML
HTML is short for HyperText Markup Language, the authoring
language used to create documents on the World Wide Web. HTML was
defined as an SGML application, although in practice browsers do not
treat it as strict SGML.
HTML defines the structure and layout of a Web
document by using a variety of tags and attributes. The correct
structure for an HTML document starts with <HTML><HEAD>(information
about the document, such as its title)</HEAD><BODY> and ends with
</BODY></HTML>. All the information you'd like to include in your Web
page goes between the <BODY> and </BODY> tags.
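As a quick illustration of that skeleton, the following Python sketch uses only the standard library's `html.parser` module. The sample page and the `BodyText` class are made up for this example. It feeds a minimal page through a parser and keeps just the text found inside the BODY element, ignoring the title in the HEAD.

```python
from html.parser import HTMLParser

PAGE = """<HTML><HEAD><TITLE>My first page</TITLE></HEAD>
<BODY><H1>Welcome</H1><P>Hello, Web!</P></BODY></HTML>"""

class BodyText(HTMLParser):
    """Collects the text that appears between <BODY> and </BODY>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case, so <BODY> arrives as "body".
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Keep non-blank text only while inside the body.
        if self.in_body and data.strip():
            self.chunks.append(data.strip())

parser = BodyText()
parser.feed(PAGE)
parser.close()
print(parser.chunks)  # ['Welcome', 'Hello, Web!']
```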
XHTML 1.0 is the current W3C
Recommendation
W3C produces what are known as "Recommendations" for
HTML. These are specifications, developed by W3C working groups, and
then voted in by Members of the Consortium. A W3C Recommendation
indicates that consensus has been reached among the Consortium Members
that a specification is appropriate for widespread use.
XHTML 1.0 is W3C's recommendation for the latest
version of HTML, following on from earlier work on HTML 4.01, HTML 4.0,
HTML 3.2 and HTML 2.0. XHTML 1.0 is a
reformulation of HTML 4.01 in XML, combining the element set of HTML 4
with the stricter, extensible syntax of XML.
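One practical consequence of XHTML being HTML 4.01 expressed in XML is that every page must be well-formed XML: elements must be closed and tag names are case-sensitive. The Python sketch below checks well-formedness only, not validity against an XHTML DTD (the standard library's ElementTree does not validate against DTDs); the sample fragments are illustrative.

```python
import xml.etree.ElementTree as ET

# In HTML 4 an empty element such as <br> needs no end tag, but XHTML is
# XML, so every element must be closed (or written as an empty-element tag).
HTML_STYLE = "<p>Line one<br>Line two</p>"
XHTML_STYLE = "<p>Line one<br />Line two</p>"
# XML is case-sensitive, so <P> and </p> do not match.
MIXED_CASE = "<P>Hello</p>"

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(HTML_STYLE))   # False: <br> is never closed
print(is_well_formed(XHTML_STYLE))  # True
print(is_well_formed(MIXED_CASE))   # False: mismatched tag case
```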
Three "flavors" of XHTML:
XHTML 1.0 is specified in three "flavors".
XHTML Transitional: Most people writing Web pages for the general
public will want to use this flavor of XHTML. The idea is to take
advantage of XHTML features, including style sheets, while still making
small adjustments to your markup for the benefit of those viewing your
pages with older browsers which can't understand style sheets. These
adjustments include using BODY with the bgcolor, text and link
attributes.
XHTML Strict: Use
this when you want really clean structural markup, free of any tags
associated with layout. Use this together with W3C's Cascading Style
Sheets language (CSS) to get the font, color, and layout effects you
want.
XHTML Frameset:
Use this when you want to use HTML Frames to partition the browser
window into two or more frames.
XHTML documents must conform to the markup rules defined in
the corresponding XHTML 1.0 DTD.
For use on Net devices, XHTML is being split into modules, a
process known as modularization. This enables XHTML pages to be read
on many different platforms.
A device designer, using standard building blocks,
will specify which elements are supported. Content creators will then
target these building blocks, or modules.
Because these modules conform to certain standards,
XHTML's extensibility ensures that layout and presentation stay
true-to-form over any platform.
Dynamic HTML is
the next big thing on the Internet. With Dynamic HTML, you can layer
multiple images on top of one another, precisely control the layout of
your page, add new interactivity and much more!
The next phase of work on HTML will seek to complete
the transition to XML, with continued work on modularization, work on
XML Schemas for XHTML, and registering an Internet Media Type for XHTML
following the guidelines set out by the W3C-IETF liaison group studying
Internet Media types for applications of XML.
XML
XML, or eXtensible Markup Language, is a
recommendation from the World Wide Web Consortium (W3C) issued in early
1998. It is a language designed to deliver structured information over
the web more effectively than current languages used for web publishing,
namely HTML. XML separates the content of a document from its
presentation and provides a common format for transferring data across
the World Wide Web or a company intranet. The result is a technology
that makes data available regardless of the proprietary systems
involved. Innumerable analysts and visionaries predict that XML will
surpass HTML as the "lingua franca" of the Internet.
XML represents a capacity for sharing information
that didn't exist before. Virtually any kind of data can be encapsulated
in XML, moved across networks, processed automatically, and published
dynamically. Ultimately, XML opens a new world of possibilities for
sharing, managing and publishing information on the web.
XML allows you to tag your documents with meaningful
tags such as <productname>, <chapter>, and <title>. You can leverage
these tags in your search technology to allow users to search for words
only in chapter titles, for example, or to search for all product names.
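The kind of targeted search described above can be sketched in a few lines of Python with the standard library's `xml.etree.ElementTree`. The document, product names and the `search_chapter_titles` helper are all hypothetical, chosen only to show how meaningful tags let a program restrict a search to chapter titles or collect every product name.

```python
import xml.etree.ElementTree as ET

# An illustrative document using the kind of meaningful tags described above.
CATALOG = """
<book>
  <title>Gardening Tools</title>
  <chapter>
    <title>Choosing a Spade</title>
    <para>The <productname>DeepDig 200</productname> suits clay soil.</para>
  </chapter>
  <chapter>
    <title>Caring for Shears</title>
    <para>Oil the <productname>SnipMaster</productname> after use.</para>
  </chapter>
</book>
"""

root = ET.fromstring(CATALOG)

def search_chapter_titles(word: str) -> list[str]:
    """Return chapter titles containing `word`, ignoring the book's own title."""
    return [t.text for t in root.findall("./chapter/title")
            if word.lower() in t.text.lower()]

# Every product name anywhere in the document, regardless of context.
product_names = [p.text for p in root.iter("productname")]

print(search_chapter_titles("spade"))  # ['Choosing a Spade']
print(product_names)                   # ['DeepDig 200', 'SnipMaster']
```

Note that a search for "tools" finds nothing, because the only match is the book title, which is not a chapter title.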
cXML
Commerce XML is a new set of document type
definitions (DTDs) for the XML specification. cXML works as a
meta-language that defines necessary information about a product. It
will be used to standardize the exchange of catalog content and to
define request/response processes for secure electronic transactions
over the Internet. These processes include purchase orders, change
orders, acknowledgments, status updates, ship notifications and payment
transactions.
cXML began as a collaborative effort among 40+
companies looking to reduce the costs of online business. This
standardized methodology will allow participating companies, and others
who implement the cXML framework, to constantly improve and streamline
electronic commerce.
Some questions about SGML, HTML and
XML
Aren't XML, SGML, and HTML all
the same thing?
Not quite. SGML is the ‘mother tongue’, used for
describing thousands of different document types in many fields of human
activity, from transcriptions of ancient Irish manuscripts to the
technical documentation for stealth bombers, and from patients' clinical
records to musical notation.
HTML is just one of these document types, the one
most frequently used in the Web. It defines a simple, fixed type of
document with markup designed for a common class of office or technical
report, with headings, paragraphs, lists, illustrations, etc, and some
provision for hypertext and multimedia.
XML is an abbreviated version of SGML, to make it
easier for you to define your own document types, and to make it easier
for programmers to write programs to handle them. It omits the more
complex and less-used parts of SGML in return for the benefits of being
easier to write applications for, easier to understand, and more suited
to delivery and interoperability over the Web. But it is still SGML, and
XML files may still be parsed and validated the same as any other SGML
file.
What is the difference between
SGML/XML and C or C++?
C and C++ (and other languages like Fortran, or
Pascal, or Basic, or Java or dozens more) are programming languages with
which you specify calculations, actions, and decisions to be carried
out.
SGML and XML are markup specification languages with
which you can design ways of describing information, usually for
storage, transmission, or processing by a program.
On its own, a file of SGML or XML text (including
HTML) doesn't do anything: you have to run a program to do something
with it.
Why not just carry on extending
HTML?
HTML is already overburdened with dozens of
interesting but incompatible inventions from different manufacturers,
because it provides only one way of describing your information.
XML allows groups of people or organizations to
create their own customized markup applications for exchanging
information in their domain (music, chemistry, electronics,
hill-walking, finance, surfing, petroleum geology, linguistics, cooking,
knitting, stellar cartography, history, engineering, rabbit-keeping,
mathematics, et cætera ad infinitum).
HTML is at the limit of its usefulness as a way of
describing information, and while it will continue to play an important
role for the content it currently represents, many new applications
require a more robust and flexible infrastructure.
Do I have to know HTML or SGML
before I learn XML?
No, but it would be useful, because much of the
terminology and practice is common to SGML, HTML, and XML.
Be aware that ‘knowing HTML’ is not the same as
‘understanding SGML’. Although HTML was written as an SGML application,
browsers ignore large parts of SGML (which is why so many useful
things don't work), so just because something is done a certain way in
an HTML browser does not mean it's correct.
Why should I use XML instead of
HTML?
Authors and providers can design their own document
types using XML, instead of being stuck with HTML. Document types can be
explicitly tailored to an audience, so the cumbersome fudging that has
to take place with HTML to achieve special effects can become a thing of
the past: authors and designers are free to invent their own markup
elements.
Information content can be richer and easier to use,
because the hypertext linking abilities of XML are much greater than
those of HTML.
XML can provide more and better facilities for
browser presentation and performance, using CSS (Cascading Style Sheets)
and XSL (Extensible Stylesheet Language).
It removes many of the underlying complexities of
SGML in favour of a more flexible model, so writing programs to handle
XML is much easier than doing the same for full SGML.
Information will be more accessible and reusable,
because the more flexible markup of XML can be used by any XML software
instead of being restricted to specific manufacturers as has become the
case with HTML.
Does XML replace HTML?
No. XML itself does not replace HTML: instead, it
provides an alternative which allows you to define your own set of
markup elements. HTML is expected to remain in common use for some time
to come, and Document Type Definitions for HTML are available in XML
versions as well as in original SGML. XML is designed to make the
writing of DTDs much simpler than with full SGML.
WML
Wireless Application Protocol (WAP) is a result of
continuous work to define an industry wide standard for developing
applications over wireless communication networks. WML (Wireless Markup
Language) is a markup language based on XML, and is intended for use in
specifying content and user interface for narrowband devices, including
cellular phones and pagers. WML is designed with the constraints of
small narrowband devices in mind. The official WML specification is
developed and maintained by the WAP Forum, an industry-wide consortium
founded by Nokia, Phone.com, Motorola, and Ericsson.
WML offers software developers an entirely new,
exciting platform on which to deploy their applications. With this new
platform, however, comes a host of tradeoffs and challenges. A new
wrinkle will be added to the design process as things like server
round-trips, bandwidth, and display sizes become issues to contend with.
While it may take several iterations for developers and vendors to get
their product offerings right, there is no doubt that WAP opens the door
to a new era in application development and deployment.
Other markup languages used for specific applications
include:
- VRML (Virtual Reality Modeling Language)
- MathML (Mathematical Markup Language)
- CML (Chemical Markup Language)
- FpML (Financial Products Markup Language)
- FML (Forms Markup Language)
W3C
W3C is short for World Wide Web Consortium, an international
consortium of companies involved with the Internet and the Web. The W3C
was founded in 1994 by Tim Berners-Lee, the original architect of the
World Wide Web. The organization's purpose is to develop open standards
so that the Web evolves in a single direction rather than being
splintered among competing factions. The W3C is the chief standards body
for HTTP and HTML.
DTD
DTD is short for document type definition, a file (or part of a
document) associated with SGML and XML documents that declares which
elements and attributes are allowed and how they may be nested, and so
how the markup should be interpreted by the application processing the
document. The HTML DTD, which is part of the HTML specification and
defines the elements Web pages may contain, is one example. XML
promises to expand the formatting capabilities of Web documents by
supporting additional DTDs.