XML Interview Questions with Answers Page IV
From freshersonline.com
1. I'm trying to understand the XML Spec: why does it have such difficult terminology?
For implementation to succeed, the terminology needs to be precise. Design goal eight of the specification tells us that ‘the design of XML shall be
formal and concise’. To describe XML, the specification therefore uses formal language drawn from several fields, specifically those of text engineering,
international standards and computer science. This often confuses people who are unused to these disciplines, because these disciplines use well-known English
words in a specialised sense which can be very different from their common meanings, for example grammar, production, token, or terminal.
The specification does not explain these terms because of the other part of the design goal: the specification should be concise. It doesn't repeat
explanations that are available elsewhere: it assumes that you either know the definitions or are capable of finding them. In essence this
means that to grok the fullness of the spec, you do need some knowledge of SGML and computer science, and some exposure to the language of formal
standards.
Sloppy terminology in specifications causes misunderstandings and makes consistent implementation hard, so formal standards have to be phrased in
formal terminology. This FAQ is not a formal document, and the astute reader will already have noticed it refers to ‘element names’ where ‘element type
names’ is more correct; but the former is more widely understood.
2. Can I still use server-side inclusions?
Yes, so long as what they generate ends up as part of an XML-conformant file (ie either valid or just well-formed).
Server-side tag-replacers like shtml, PHP, JSP, ASP, Zope, etc., store almost-valid files using comments, Processing Instructions, or non-XML markup, which
get replaced at the point of service by text or XML markup (it is unclear why some of these systems use non-HTML/XML markup). There are also some
XML-based preprocessors for formats like XVRL (eXtensible Value Resolution Language) which resolve specialised references to external data and output a
normalized XML file.
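A minimal Python sketch of this rule follows. It assumes a hypothetical template in which an "insert-user" processing instruction marks the spot a server-side replacer fills in; the PI name and the render() helper are illustrative, not part of any real tag-replacer. The sketch escapes the substituted text and then re-parses the result with the standard library's ElementTree, so a substitution that breaks well-formedness raises an error instead of being served.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical template: the <?insert-user?> processing instruction is a
# placeholder that a server-side replacer swaps for generated text.
TEMPLATE = "<greeting>Hello, <?insert-user?>!</greeting>"


def render(template: str, user_name: str) -> str:
    """Substitute the placeholder, then check the output is still well-formed."""
    # escape() turns &, < and > into &amp;, &lt; and &gt;
    output = template.replace("<?insert-user?>", escape(user_name))
    ET.fromstring(output)  # raises ET.ParseError if the output is not well-formed
    return output


print(render(TEMPLATE, "Tom & Jerry"))
# <greeting>Hello, Tom &amp; Jerry!</greeting>
```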
3. Can I (and my authors) still use client-side inclusions?
The same rule applies as for server-side inclusions, so you need to ensure that any embedded code which gets passed to a third-party engine (eg calls to
SQL, VB, Java, etc) does not contain any characters which might be misinterpreted as XML markup (ie no angle brackets or ampersands). Either use a CDATA
marked section to avoid your XML application parsing the embedded code, or use the standard &lt;, &gt;, and &amp; character entity references instead.
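The two options above can be compared with a short Python sketch. The SQL string and the "query" element name are made-up examples. Both wrappers should give back exactly the original code once the document is parsed; the CDATA variant also refuses input containing "]]>", because that sequence would end the CDATA section early.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Embedded code bound for a third-party engine; the < and & would
# otherwise be read as XML markup.
SQL = "SELECT name FROM parts WHERE qty < 10 AND code = 'A&B'"


def wrap_with_references(code: str) -> str:
    # escape() replaces &, < and > with &amp;, &lt; and &gt;
    return "<query>" + escape(code) + "</query>"


def wrap_with_cdata(code: str) -> str:
    # A CDATA section ends at the first "]]>", so that sequence cannot appear inside it
    if "]]>" in code:
        raise ValueError("code contains ']]>' and cannot sit in one CDATA section")
    return "<query><![CDATA[" + code + "]]></query>"


for wrapped in (wrap_with_references(SQL), wrap_with_cdata(SQL)):
    # Either way, the parser hands back exactly the original code
    assert ET.fromstring(wrapped).text == SQL
    print(wrapped)
```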
4. How can I include a conditional statement in my XML?
You can't: XML isn't a programming language, so you can't embed conditional logic such as an ‘if’ statement in the markup itself.
If you need to make an element optional, based on some internal or external criteria, you can do so in a Schema. DTDs have no internal referential
mechanism, so it isn't possible to express this kind of conditionality in a DTD at the individual element level.
It is possible to express presence-or-absence conditionality in a DTD for the whole document, by using parameter entities as switches to include or
ignore certain sections of the DTD based on settings either hardwired in the DTD or supplied in the internal subset. Both the TEI and DocBook DTDs use
this mechanism to implement modularity.
Alternatively you can make the element entirely optional in the DTD or Schema, and provide code in your processing software that checks for its presence
or absence. This defers the checking until the processing stage: one of the reasons for Schemas is to provide this kind of checking at the time of
document creation or editing.
5. I have to do an overview of XML for my manager/client/investor/advisor. What should I mention?
- XML is not a markup language. XML is a ‘metalanguage’, that is, it's a language that lets you define your own markup languages.
- XML is a markup language [putting two (seemingly) contradictory statements one after another is an attention-getting device that I'm fond of], not a
programming language. XML is data: it does not ‘do’ anything, it has things done to it.
- XML is non-proprietary: your data cannot be held hostage by someone else.
- XML allows multi-purposing of your data.
- Well-designed XML applications most often separate ‘content’ from ‘presentation’. You should describe what something is rather than what it looks
like (the exception being data content which never gets presented to humans).
Saying ‘the data is in XML’ is a relatively useless statement, similar to saying ‘the book is in a natural language’. To be useful, the former needs to
specify ‘we have used XML to define our own markup language’ (and say what it is), similar to specifying ‘the book is in French’.
A classic example of multipurposing and separation that I often use is a pharmaceutical company. They have a large body of data on a particular drug that
they need to publish as:
- reports to the FDA;
- drug information for publishers of drug directories/catalogs;
- ‘prescribe me!’ brochures to send to doctors;
- little pieces of paper to tuck into the boxes;
- labels on the bottles;
- two pages of fine print to follow their ad in Reader's Digest;
- instructions to the patient that the local pharmacist prints out;
- etc.
Without separation of content and presentation, they need to maintain essentially identical information in 20 places. If they miss a place, people die,
lawyers get rich, and the drug company gets poor. With XML (or SGML), they maintain one set of carefully validated information, and write 20 programs to
extract and format it for each application. The same 20 programs can now be applied to all the hundreds of drugs that they sell.
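A toy Python version of "one source, many programs" is sketched below. The drug record, its element names and the drug name "Examplol" are invented for illustration. Each formatter stands in for one of the 20 output programs and reads the same validated record, so a correction made in the record reaches every output.

```python
import xml.etree.ElementTree as ET

# One hypothetical drug record, maintained and validated in one place.
# The element names are illustrative only.
DRUG = """<drug>
<name>Examplol</name>
<dose>10 mg once daily</dose>
<warning>Do not take with alcohol.</warning>
</drug>"""


def bottle_label(record: ET.Element) -> str:
    """Short format for a bottle label."""
    return f"{record.findtext('name')}: {record.findtext('dose')}"


def patient_sheet(record: ET.Element) -> str:
    """Longer format for a pharmacist's patient printout."""
    return (f"Medicine: {record.findtext('name')}\n"
            f"How to take: {record.findtext('dose')}\n"
            f"Warning: {record.findtext('warning')}")


record = ET.fromstring(DRUG)
for formatter in (bottle_label, patient_sheet):
    print(formatter(record))
```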
In the Web development area, the biggest thing that XML offers is fixing what is wrong with HTML:
- browsers allow non-compliant HTML to be presented;
- HTML is restricted to a single set of markup (‘tagset’).
If you let broken HTML work (be presented), then there is no motivation to fix it. Web pages are therefore tag soup that is useless for further
processing. XML specifies that processing must not continue if the XML is non-compliant, so you keep working at it until it complies. This is more work
up front, but the result is not a dead-end.
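The "stop on error" behaviour is easy to observe with Python's built-in XML parser. The parse_or_report() helper is illustrative only. Where a browser would quietly render the mis-nested HTML fragment below, the XML parser rejects it, and the exact error text comes from the underlying expat library.

```python
import xml.etree.ElementTree as ET


def parse_or_report(text: str):
    """Return the root element, or the parser's error message if the input
    is not well-formed. The parser does not guess at a repair."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"


print(parse_or_report("<p><b>bold</p>"))     # mis-nested: reported as an error
print(parse_or_report("<p><b>bold</b></p>").tag)
```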
If you wanted to mark up the names of things: people, places, companies, etc in HTML, you don't have many choices that allow you to distinguish among
them. XML allows you to name things as what they are:
<person>Charles Goldfarb</person> worked at <company>IBM</company>
This gives you a flexibility that you don't have with HTML:
<B>Charles Goldfarb</B> worked at <B>IBM</B>
With XML you don't have to shoe-horn your data into markup that restricts your options.
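To see what that flexibility buys, here is a short Python sketch that pulls people and companies out of the sentence separately. The enclosing "para" root element is an assumption added so that the fragment forms a complete document. With the HTML version, both names are just B elements and cannot be told apart this way.

```python
import xml.etree.ElementTree as ET

# The sentence from the text, wrapped in a hypothetical <para> root element
# so that it forms a complete document.
DOC = ("<para><person>Charles Goldfarb</person> worked at "
       "<company>IBM</company></para>")

root = ET.fromstring(DOC)
people = [e.text for e in root.iter("person")]
companies = [e.text for e in root.iter("company")]
print(people, companies)  # ['Charles Goldfarb'] ['IBM']
```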
6. What is the purpose of XML namespaces?
XML namespaces are designed to provide universally unique names for elements and attributes. This allows people to do a number of things, such as:
- Combine fragments from different documents without any naming conflicts. (See example below.)
- Write reusable code modules that can be invoked for specific elements and attributes. Universally unique names guarantee that such modules are invoked
only for the correct elements and attributes.
- Define elements and attributes that can be reused in other schemas or instance documents without fear of name collisions. For example, you might use
XHTML elements in a parts catalog to provide part descriptions. Or you might use the nil attribute defined in XML Schemas to indicate a missing value.
As an example of how XML namespaces are used to resolve naming conflicts in XML documents that contain element types and attributes from multiple XML
languages, consider the following two XML documents:
<?xml version="1.0" ?>
<Address>
<Street>Apple 7</Street>
<City>Color</City>
<State>State</State>
<Country>Country</Country>
<PostalCode>H98d69</PostalCode>
</Address>
and:
<?xml version="1.0" ?>
<Server>
<Name>OurWebServer</Name>
<Address>888.90.67.8</Address>
</Server>
Each document uses a different XML language and each language defines an Address element type. Each of these Address element types is different -- that
is, each has a different content model, a different meaning, and is interpreted by an application in a different way. This is not a problem as long as
these element types exist only in separate documents. But what if they are combined in the same document, such as a list of departments, their addresses,
and their Web servers? How does an application know which Address element type it is processing?
One solution is to simply rename one of the Address element types -- for example, we could rename the second element type IPAddress. However, this is not
a useful long term solution. One of the hopes of XML is that people will standardize XML languages for various subject areas and write modular code to
process those languages. By reusing existing languages and code, people can quickly define new languages and write applications that process them. If we
rename the second Address element type to IPAddress, we will break any code that expects the old name.
A better answer is to assign each language (including its Address element type) to a different namespace. This allows us to continue using the Address
name in each language, but to distinguish between the two different element types. The mechanism by which we do this is XML namespaces.
Note that by assigning each Address name to an XML namespace, we actually change the name to a two-part name consisting of the name of the XML namespace
plus the name Address. This means that any code that recognizes just the name Address will need to be changed to recognize the new two-part name.
However, this only needs to be done once, as the two-part name is universally unique.
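A minimal Python sketch of the combined document is shown below. The Department root element and both namespace URIs under example.com are hypothetical. ElementTree reports each two-part name as {namespace-URI}local-name, so code can ask for one Address element type without ever matching the other.

```python
import xml.etree.ElementTree as ET

# A combined document; the Department element and both URIs are hypothetical.
ADDR_NS = "http://www.example.com/address"
SRV_NS = "http://www.example.com/server"

DOC = f"""<Department xmlns:addr="{ADDR_NS}" xmlns:srv="{SRV_NS}">
<addr:Address><addr:City>Color</addr:City></addr:Address>
<srv:Server><srv:Address>888.90.67.8</srv:Address></srv:Server>
</Department>"""

root = ET.fromstring(DOC)
# {{{ADDR_NS}}} in an f-string yields "{" + URI + "}", i.e. a two-part name
postal = root.find(f"{{{ADDR_NS}}}Address")
server_addr = root.find(f"{{{SRV_NS}}}Server/{{{SRV_NS}}}Address")
print(postal.tag)        # {http://www.example.com/address}Address
print(server_addr.text)  # 888.90.67.8
```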
7. What is an XML namespace?
An XML namespace is a collection of element type and attribute names. The collection itself is unimportant; in fact, a reasonable argument can be made
that XML namespaces don't actually exist as physical or conceptual entities. What is important is the name of the XML namespace, which is a URI. This
allows XML namespaces to provide a two-part naming system for element types and attributes. The first part of the name is the URI used to identify the
XML namespace -- the namespace name. The second part is the element type or attribute name itself -- the local part, also known as the local name.
Together, they form the universal name.
This two-part naming system is the only thing defined by the XML namespaces recommendation.
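The two parts can be pulled apart in Python. The universal_name() helper is a hypothetical convenience for this sketch and is not a standard-library function. It also shows that the prefix is not part of the universal name: two documents using different prefixes for the same URI produce the same name.

```python
import xml.etree.ElementTree as ET


def universal_name(tag: str):
    """Split an ElementTree tag into (namespace name, local name).
    Names in no namespace get None as their namespace name."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


a1 = ET.fromstring('<google:A xmlns:google="http://www.google.org/"/>')
a2 = ET.fromstring('<x:A xmlns:x="http://www.google.org/"/>')
print(universal_name(a1.tag))  # ('http://www.google.org/', 'A')
print(a1.tag == a2.tag)        # True: the prefix is not part of the name
```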
8. Does the XML namespaces recommendation define anything except a two-part naming system for element types and attributes?
No.
This is a very important point and a source of much confusion, so we will repeat it:
THE XML NAMESPACES RECOMMENDATION DOES NOT DEFINE ANYTHING EXCEPT A TWO-PART NAMING SYSTEM FOR ELEMENT TYPES AND ATTRIBUTES.
In particular, the recommendation does not provide or define any of the following:
- A way to merge two documents that use different DTDs.
- A way to associate XML namespaces and schema information.
- A way to validate documents that use XML namespaces.
- A way to associate element type or attribute declarations in a DTD with an XML namespace.
9. What do XML namespaces actually contain?
XML namespaces are collections of names, nothing more. That is, they contain the names of element types and attributes, not the elements or attributes
themselves. For example, consider the following document.
<google:A xmlns:google="http://www.google.org/">
<B google:C="google" D="bar"/>
</google:A>
The element type name A and the attribute name C are in the http://www.google.org/ namespace because they are mapped there by the google prefix. The
element type name B and the attribute name D are not in any XML namespace because no prefix maps them there. On the other hand, the elements A and B and
the attributes C and D are not in any XML namespace, even though they are physically within the scope of the http://www.google.org/ namespace
declaration. This is because XML namespaces contain names, not elements or attributes.
XML namespaces also do not contain the definitions of the element types or attributes. This is an important difference, as many people are tempted to
think of an XML namespace as a schema, which it is not.
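A namespace-aware parser shows the same split. The sketch below feeds the example document to Python's ElementTree. Only the names mapped by the google prefix carry the namespace URI. The xmlns:google declaration is not reported as an ordinary attribute.

```python
import xml.etree.ElementTree as ET

DOC = """<google:A xmlns:google="http://www.google.org/">
<B google:C="google" D="bar"/>
</google:A>"""

root = ET.fromstring(DOC)
b = root[0]
print(root.tag)          # {http://www.google.org/}A
print(root.attrib)       # {} : the declaration is not an attribute here
print(b.tag)             # B : unprefixed, so in no namespace
print(sorted(b.attrib))  # ['D', '{http://www.google.org/}C']
```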
10. Are the names of all element types and attributes in some XML namespace?
No.
If an element type or attribute name is not specifically declared to be in an XML namespace -- that is, it is unprefixed and (in the case of element type
names) there is no default XML namespace -- then that name is not in any XML namespace. If you want, you can think of it as having a null URI as its
name, although no "null" XML namespace actually exists. For example, in the following, the element type name B and the attribute names C and E are not in
any XML namespace:
<google:A xmlns:google="http://www.google.org/">
<B C="bar"/>
<google:D E="bar"/>
</google:A>
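The Python sketch below checks this example and adds one contrast that the text states only in passing: a default namespace declaration applies to unprefixed element type names but not to unprefixed attribute names. The default-namespace URI used for the contrast is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# The example from the text: B, C and E are in no namespace.
DOC = ('<google:A xmlns:google="http://www.google.org/">'
       '<B C="bar"/><google:D E="bar"/></google:A>')
root = ET.fromstring(DOC)
b, d = root
print(b.tag, list(b.attrib))  # B ['C']
print(d.tag, list(d.attrib))  # {http://www.google.org/}D ['E']

# With a default namespace (hypothetical URI), unprefixed element type
# names are in it, but unprefixed attribute names still are not.
root2 = ET.fromstring('<A xmlns="http://www.example.com/default"><B C="bar"/></A>')
print(root2[0].tag, list(root2[0].attrib))
# {http://www.example.com/default}B ['C']
```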
11. Do XML namespaces apply to entity names, notation names, or processing instruction targets?
No.
XML namespaces apply only to element type and attribute names. Furthermore, in an XML document that conforms to the XML namespaces recommendation, entity
names, notation names, and processing instruction targets must not contain colons.
12. Who can create an XML namespace?
Anybody can create an XML namespace: all you need to do is assign a URI as its name and decide what element type and attribute names are in it. The URI
must be under your control and should not already be in use to identify a different XML namespace (for example, one created by a coworker).
(In practice, most people who create XML namespaces also describe the element types and attributes whose names are in it: their content models and
types, their semantics, and so on. However, this is not part of the process of creating an XML namespace, nor does the XML namespace include or provide a
way to discover such information.)
13. Do I need to use XML namespaces?
Maybe, maybe not.
If you don't have any naming conflicts in the XML documents you are using today, as is often the case with documents used inside a single organization,
then you probably don't need to use XML namespaces. However, if you do have conflicts today, or if you expect conflicts in the future due to distributing
your documents outside your organization or bringing outside documents into your organization, then you should probably use XML namespaces.
Regardless of whether you use XML namespaces in your own documents, it is likely that you will use them in conjunction with some other XML technology,
such as XSL, XHTML, or XML Schemas. For example, the following XSLT (XSL Transformations) stylesheet uses XML namespaces to distinguish between element
types defined in XSLT and those defined elsewhere:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Address">
<!-- The Addresses element type is not part of the XSLT namespace. -->
<Addresses>
<xsl:apply-templates/>
</Addresses>
</xsl:template>
</xsl:stylesheet>
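Python's standard library has no XSLT processor, so the sketch below does not run the stylesheet. Instead it parses the stylesheet and labels each element by namespace. This is roughly the distinction an XSLT processor draws between its own elements and literal result elements such as Addresses. The classify() helper is illustrative only.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

STYLESHEET = """<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Address">
<Addresses><xsl:apply-templates/></Addresses>
</xsl:template>
</xsl:stylesheet>"""


def classify(stylesheet_text: str):
    """Return (local name, is-in-XSLT-namespace) for every element, in document order."""
    result = []
    for elem in ET.fromstring(stylesheet_text).iter():
        in_xslt = elem.tag.startswith("{" + XSLT_NS + "}")
        result.append((elem.tag.split("}")[-1], in_xslt))
    return result


for name, in_xslt in classify(STYLESHEET):
    print(name, "XSLT namespace" if in_xslt else "no namespace")
```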
14. What is the relationship between XML namespaces and the XML 1.0 recommendation?
Although the XML 1.0 recommendation anticipated the need for XML namespaces by noting that element type and attribute names should not include colons, it
did not actually support XML namespaces. Thus, XML namespaces are layered on top of XML 1.0. In particular, any XML document that uses XML namespaces is
a legal XML 1.0 document and can be interpreted as such in the absence of XML namespaces. For example, consider the following document:
<google:A xmlns:google="http://www.google.org/">
<google:B google:C="bar"/>
</google:A>
If this document is processed by a namespace-unaware processor, that processor will see two elements whose names are google:A and google:B. The google:A
element has an attribute named xmlns:google and the google:B element has an attribute named google:C. On the other hand, a namespace-aware processor will
see two elements with universal names {http://www.google.org/}A and {http://www.google.org/}B. The {http://www.google.org/}A element does not have any
attributes; instead, it has a namespace declaration that maps the google prefix to the URI http://www.google.org/. The {http://www.google.org/}B element has an
attribute named {http://www.google.org/}C.
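Both views can be reproduced with Python's expat binding. It behaves as a namespace-unaware parser by default and as a namespace-aware one when given a namespace separator character. With the separator, expat reports each name as the URI, a space, and the local name, and it consumes the xmlns:google declaration instead of reporting it as an attribute. The element_events() helper is a hypothetical name for this sketch.

```python
import xml.parsers.expat

DOC = b"""<google:A xmlns:google="http://www.google.org/">
<google:B google:C="bar"/>
</google:A>"""


def element_events(namespace_aware: bool):
    """Return (element name, attributes) for each start tag, as expat reports them."""
    if namespace_aware:
        # Names are reported as "URI<separator>local-name"
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        parser = xml.parsers.expat.ParserCreate()
    events = []

    def start(name, attrs):
        events.append((name, dict(attrs)))

    parser.StartElementHandler = start
    parser.Parse(DOC, True)
    return events


print(element_events(False))  # prefixed names; xmlns:google is an attribute
print(element_events(True))   # URI + local names; no xmlns attribute
```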
Needless to say, this has led to a certain amount of confusion. One area of confusion is the relationship between XML namespaces and validating XML
documents against DTDs. This occurs because the XML namespaces recommendation did not describe how to use XML namespaces with DTDs. Fortunately, a
similar situation does not occur with XML schema languages, as all of these support XML namespaces.
The other main area of confusion is in recommendations and specifications such as DOM and SAX whose first version predates the XML namespaces
recommendation. Although these have since been updated to include XML namespace support, the solutions have not always been pretty due to backwards
compatibility requirements. All recommendations in the XML family now support XML namespaces.
15. What is the difference between versions 1.0 and 1.1 of the XML namespaces recommendation?
There are only two differences between XML namespaces 1.0 and XML namespaces 1.1:
- Version 1.1 adds a way to undeclare prefixes. For more information, see question 4.7.
- Version 1.1 uses IRIs (Internationalized Resource Identifiers) instead of URIs. Basically, URIs are restricted to a subset of ASCII characters, while
IRIs allow much broader use of Unicode characters. For complete details, see section 9 of Namespaces in XML 1.1.
NOTE: As of this writing (February 2003), Namespaces in XML 1.1 is still a candidate recommendation and not widely used.
PART II: DECLARING AND USING XML NAMESPACES
