XML Interview Questions with Answers Page III


From freshersonline.com




1. Can I encode mathematics using XML?

Yes, if the document type you use provides for math, and your users' browsers are capable of rendering it. The mathematics-using community has developed the MathML Recommendation at the W3C, which is a native XML application suitable for embedding in other DTDs and Schemas.

It is also possible to make XML fragments from other DTDs, such as ISO 12083 Math, or OpenMath, or one of your own making. Browsers which display math embedded in SGML existed for many years (eg DynaText, Panorama, Multidoc Pro), and mainstream browsers are now rendering MathML. David Carlisle has produced a set of stylesheets for rendering MathML in browsers. It is also possible to use XSLT to convert XML math markup to LaTeX for print (PDF) rendering, or to use XSL-FO.

Please note that XML is not itself a programming language, so concepts such as arithmetic and if-statements (if-then-else logic) are not meaningful in XML documents.
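As a minimal sketch of what "embedding in other DTDs and Schemas" looks like in practice, the Python below places a MathML fragment for x² inside a host element. It uses only the standard library's xml.etree.ElementTree. The MathML namespace URI is the real W3C one. The host element name para and the m: prefix are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"  # the W3C MathML namespace
ET.register_namespace("m", MATHML)  # serialise MathML elements with an m: prefix


def para_with_square(prefix: str, var: str) -> str:
    """Return a hypothetical <para> containing MathML for var squared."""
    para = ET.Element("para")
    para.text = prefix
    math = ET.SubElement(para, f"{{{MATHML}}}math")
    msup = ET.SubElement(math, f"{{{MATHML}}}msup")  # base with a superscript
    ET.SubElement(msup, f"{{{MATHML}}}mi").text = var  # identifier (the base)
    ET.SubElement(msup, f"{{{MATHML}}}mn").text = "2"  # number (the exponent)
    return ET.tostring(para, encoding="unicode")


print(para_with_square("The area is ", "x"))
```

The host document's own elements stay in no namespace. The MathML elements are kept apart by their namespace, so a MathML-aware renderer can pick them out.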


2. How will XML affect my document links?

The linking abilities of XML systems are potentially much more powerful than those of HTML, so you'll be able to do much more with them. Existing href-style links will remain usable, but the new linking technology is based on the lessons learned in the development of other standards involving hypertext, such as TEI and HyTime, which let you manage bidirectional and multi-way links, as well as links to a whole element or span of text (within your own or other documents) rather than to a single point. These features have been available to SGML users for many years, so there is considerable experience and expertise available in using them. Currently only Mozilla Firefox implements XLink.

The XML Linking Specification (XLink) and the XML Extended Pointer Specification (XPointer) documents contain the details. An XLink can be either a URI or a TEI-style Extended Pointer (XPointer), or both. A URI on its own is assumed to be a resource; if an XPointer follows it, it is assumed to be a sub-resource of that URI; an XPointer on its own is assumed to apply to the current document (all exactly as with HTML).

An XLink may use one of #, ?, or |. The # and ? mean the same as in HTML applications; the | means the sub-resource can be found by applying the link to the resource, but the method of doing this is left to the application. An XPointer can only follow a #.

The TEI Extended Pointer Notation (EPN) is much more powerful than the fragment address on the end of some URIs, as it allows you to specify the location of a link end using the structure of the document as well as (or in addition to) known, fixed points like IDs. For example, the linked second occurrence of the word ‘XPointer’ two paragraphs back could be referred to with the URI (shown here with linebreaks and spaces for clarity: in practice it would of course be all one long string):


http://xml.silmaril.ie/faq.xml#ID(hypertext)
  .child(1,#element,'answer')
  .child(2,#element,'para')
  .child(1,#element,'link')

This means the first link element within the second paragraph within the answer in the element whose ID is hypertext (this question). Count the objects from the start of this question (which has the ID hypertext) in the XML source:

1. the first child object is the element containing the question;

2. the second child object is the answer (the answer element);

3. within this element go to the second paragraph;

4. find the first link element.

Eve Maler explained the relationship of XLink and XPointer as follows:

XLink governs how you insert links into your XML document, where the link might point to anything (eg a GIF file); XPointer governs the fragment identifier that can go on a URL when you're linking to an XML document, from anywhere (eg from an HTML file).

[Or indeed from an XML file, a URI in a mail message, etc…Ed.]

David Megginson has produced an xpointer function for Emacs/psgml which will deduce an XPointer for any location in an XML document. XML Spy has a similar function.
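The TEI pointer above is essentially a walk down the element tree. The Python sketch below imitates the ID(...) and child(n,#element,'name') steps on a small, hypothetical copy of this FAQ entry, using only the standard library. It is not an XPointer implementation. The faq, qandaentry and question element names are invented; only hypertext, answer, para and link come from the pointer in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical FAQ source: only the id value and the answer/para/link
# names come from the pointer in the text; the rest is invented.
FAQ = """<faq>
  <qandaentry id="hypertext">
    <question>How will XML affect my document links?</question>
    <answer>
      <para>The linking abilities of XML systems are more powerful.</para>
      <para>See the <link>XPointer</link> and <link>XLink</link> specs.</para>
    </answer>
  </qandaentry>
</faq>"""


def child(elem, n, name):
    """Mimic the EPN step child(n,#element,'name'): return the n-th
    (1-based) child element of elem whose element type is name."""
    matches = [c for c in elem if c.tag == name]
    if not 1 <= n <= len(matches):
        raise LookupError(f"no child {name} number {n}")
    return matches[n - 1]


def resolve(root, ident, steps):
    """Start at the element whose id attribute is ident (the ID(...) part),
    then apply each (n, name) child step in turn."""
    node = root.find(f".//*[@id='{ident}']")
    if node is None:
        raise LookupError(f"no element with id {ident}")
    for n, name in steps:
        node = child(node, n, name)
    return node


root = ET.fromstring(FAQ)
target = resolve(root, "hypertext", [(1, "answer"), (2, "para"), (1, "link")])
print(target.text)
```

Because each step counts only elements of the named type, the answer is child(1,...,'answer') even though it is the second child object overall.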


3. How does XML handle metadata?

Because XML lets you define your own markup languages, you can make full use of the extended hypertext features of XML (see the question on Links) to store or link to metadata in any format (eg using ISO 11179, as a Topic Maps Published Subject, with Dublin Core, Warwick Framework, or with Resource Description Framework (RDF), or even Platform for Internet Content Selection (PICS)).

There are no predefined elements in XML, because it is an architecture, not an application, so it is not part of XML's job to specify how or if authors should or should not implement metadata. You are therefore free to use any suitable method. Browser makers may also have their own architectural recommendations or methods to propose.
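As one illustration of "any suitable method", here is a minimal Python sketch that reads Dublin Core metadata embedded directly in a document. The Dublin Core element-set namespace URI is the real one. The report and body elements and the metadata values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical document: the report/body names and values are invented.
DOC = f"""<report xmlns:dc="{DC}">
  <dc:title>Annual Widget Survey</dc:title>
  <dc:creator>Judy O'Grady</dc:creator>
  <dc:date>2004-05-01</dc:date>
  <body>...</body>
</report>"""


def read_metadata(xml_text):
    """Collect every Dublin Core element that is a direct child of the root."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{DC}}}"
    return {
        child.tag[len(prefix):]: child.text
        for child in root
        if child.tag.startswith(prefix)
    }


print(read_metadata(DOC))
```

The namespace is what lets a program tell the metadata apart from the document's own elements, whatever the author chose to call them.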


4. Can I use JavaScript, ActiveX, etc in XML files?

This will depend on what facilities your users' browsers implement. XML is about describing information; scripting languages and languages for embedded functionality are software which enables the information to be manipulated at the user's end, so these languages do not normally have any place in an XML file itself, but in stylesheets like XSL and CSS where they can be added to generated HTML.

XML itself provides a way to define the markup needed to implement scripting languages: as a neutral standard it neither encourages nor discourages their use, and does not favour one language over another, so it is possible to use XML markup to store the program code, from where it can be retrieved by (for example) XSLT and re-expressed in an HTML script element.

Server-side script embedding, like PHP or ASP, can be used with the relevant server to modify the XML code on the fly, as the document is served, just as it can with HTML. Authors should be aware, however, that embedding server-side scripting may mean the file as stored is not valid XML: it only becomes valid when processed and served, so care must be taken when using validating editors or other software to handle or manage such files. A better solution may be to use an XML serving solution like Cocoon, AxKit, or PropelX.
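The text suggests XSLT for the retrieval step. Python's standard library has no XSLT processor, so the sketch below does the same job with xml.etree.ElementTree instead. The widget and program element names and the language attribute are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: a <program> element holding JavaScript in a CDATA
# section, so the < and && in the code need no escaping.
SOURCE = """<widget>
  <program language="javascript"><![CDATA[if (a < b && b > 0) { show(b); }]]></program>
</widget>"""


def to_html_script(xml_text):
    """Retrieve stored code and wrap it in an HTML <script> element.

    HTML treats <script> content as raw text, so the code is written out
    unescaped. Code containing the string "</script" would break this.
    """
    program = ET.fromstring(xml_text).find("program")
    lang = program.get("language")
    return f'<script type="text/{lang}">{program.text}</script>'


print(to_html_script(SOURCE))
```

The XML file only stores the code as characters. Nothing runs until a browser loads the generated HTML.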


5. Can I use Java to create or manage XML files?

Yes, any programming language can be used to output data from any source in XML format. There is a growing number of front-ends and back-ends for programming environments and data management environments to automate this. Java is just the most popular one at the moment.

There is a large body of middleware (APIs) written in Java and other languages for managing data either in XML or with XML input or output.
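Any language will do. To keep one language across this page's examples, the sketch below uses Python's standard xml.etree.ElementTree rather than a Java API. It turns in-memory records into person markup like the example in the parsing question. The people wrapper element and the record field names are assumptions.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialise a list of dicts as XML; ElementTree escapes & and < for us."""
    root = ET.Element("people")
    for rec in records:
        person = ET.SubElement(root, "person", corpid=rec["corpid"])
        name = ET.SubElement(person, "name")
        ET.SubElement(name, "forename").text = rec["forename"]
        ET.SubElement(name, "surname").text = rec["surname"]
    return ET.tostring(root, encoding="unicode")


rows = [{"corpid": "abc123", "forename": "Judy", "surname": "O'Grady"}]
print(records_to_xml(rows))
```

Building a tree and serialising it avoids the classic bug of writing XML by string concatenation and forgetting to escape & or <.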


6. How do I execute or run an XML file?

You can't and you don't. XML itself is not a programming language, so XML files don't ‘run’ or ‘execute’. XML is a markup specification language and XML files are just data: they sit there until you run a program which displays them (like a browser), does some work with them (like a converter which writes the data in another format, or a database which reads the data), or modifies them (like an editor).

If you want to view or display an XML file, open it with an XML editor or an XML browser (see question B.3).

The water is muddied by XSL (both XSLT and XSL-FO), which uses XML syntax to implement a declarative programming language. In these cases it is arguable that you can ‘execute’ XML code, by running a processing application like Saxon, which compiles the directives specified in XSLT files into Java bytecode to process XML.
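To make the point concrete, here is a minimal Python converter of the kind described above. The XML file just sits there until this program reads it and writes the same data as CSV. The people, person and name element names are assumptions, chosen to match the example in the parsing question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def people_to_csv(xml_text):
    """A tiny 'converter': read person records from XML, write them as CSV."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["corpid", "forename", "surname"])
    for person in root.iter("person"):
        writer.writerow([
            person.get("corpid"),
            person.findtext("name/forename"),
            person.findtext("name/surname"),
        ])
    return out.getvalue()


data = """<people><person corpid="abc123"><name><forename>Judy</forename>
<surname>O'Grady</surname></name></person></people>"""
print(people_to_csv(data), end="")
```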


7. How do I control formatting and appearance?

In HTML, default styling was built into the browsers because the tagset of HTML was predefined and hardwired into browsers. In XML, where you can define your own tagset, browsers cannot possibly be expected to guess or know in advance what names you are going to use and what they will mean, so you need a stylesheet if you want to display formatted text.

Browsers which read XML will accept and use a CSS stylesheet at a minimum, but you can also use the more powerful XSLT stylesheet language to transform your XML into HTML, which browsers, of course, already know how to display (and that HTML can still use a CSS stylesheet). This way you get all the document management benefits of using XML, but you don't have to worry about your readers needing XML smarts in their browsers.
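The XSLT route maps your own element names onto HTML ones. The Python below is a rough sketch of that idea without an XSLT processor: it walks the tree with xml.etree.ElementTree and renames elements using a hypothetical mapping. Real XSLT stylesheets can do much more, such as reordering content, generating text and producing conditional output.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from our own tag names to HTML equivalents;
# anything not listed becomes a <span>.
TAG_MAP = {"article": "div", "title": "h1", "para": "p", "emph": "em"}


def to_html(elem):
    """Recursively copy elem, renaming each element via TAG_MAP and
    keeping its text and tail so mixed content survives."""
    html = ET.Element(TAG_MAP.get(elem.tag, "span"))
    html.text = elem.text
    html.tail = elem.tail
    for child in elem:
        html.append(to_html(child))
    return html


src = ET.fromstring("<article><title>Hello</title><para>Some <emph>new</emph> text.</para></article>")
print(ET.tostring(to_html(src), encoding="unicode", method="html"))
```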


8. How do I use graphics in XML?

Graphics have traditionally just been links which happen to have a picture file at the end rather than another piece of text. They can therefore be implemented in any way supported by the XLink and XPointer specifications (see question 2, ‘How will XML affect my document links?’), including using similar syntax to existing HTML images. They can also be referenced using XML's built-in NOTATION and ENTITY mechanism in a similar way to standard SGML, as external unparsed entities.

However, the SVG specification (see the tip below, by Peter Murray-Rust) lets you use XML markup to draw vector graphics objects directly in your XML file. This provides enormous power for the inclusion of portable graphics, especially interactive or animated sequences, and it is now slowly becoming supported in browsers.

The XML linking specifications for external images give you much better control over the traversal and activation of links, so an author can specify, for example, whether or not to have an image appear when the page is loaded, or on a click from the user, or in a separate window, without having to resort to scripting.

XML itself doesn't prescribe or restrict graphic file formats: GIF, JPG, TIFF, PNG, CGM, EPS, and SVG at a minimum would seem to make sense; however, vector formats (EPS, SVG) are normally essential for non-photographic images (diagrams).

You cannot embed a raw binary graphics file (or any other binary [non-text] data) directly into an XML file because any bytes happening to resemble markup would get misinterpreted: you must refer to it by linking (see above). It is, however, possible to include a text-encoded transformation of a binary file as a CDATA Marked Section, using something like UUencode with the markup characters ], & and > removed from the map so that they could not occur as an erroneous CDATA termination sequence and be misinterpreted. You could even use simple hexadecimal encoding as used in PostScript. For vector graphics, however, the solution is to use SVG (see the tip below, by Peter Murray-Rust).
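The hexadecimal option mentioned above can be sketched in Python with the standard library. Hex digits (0-9, a-f) can never form markup, so no CDATA section is needed. The binary element and its name and encoding attributes are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET


def embed_binary(data: bytes, name: str) -> str:
    """Wrap raw bytes in a hypothetical <binary> element as hex text."""
    el = ET.Element("binary", name=name, encoding="hex")
    el.text = data.hex()  # only 0-9 and a-f, so nothing looks like markup
    return ET.tostring(el, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Recover the original bytes from embed_binary's output."""
    return bytes.fromhex(ET.fromstring(xml_text).text)


payload = b"\x89PNG<&]]>\x00\xff"  # includes bytes that resemble markup
doc = embed_binary(payload, "logo.png")
print(doc)
```

Hex doubles the size of the data, which is one reason linking to the external file is usually the better choice.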

Sound files are binary objects in the same way that external graphics are, so they can only be referenced externally (using the same techniques as for graphics). Music files written in MusiXML or an XML variant of SMDL could, however, be embedded in the same way as for SVG.

The point about using entities to manage your graphics is that you can keep the list of entity declarations separate from the rest of the document, so you can re-use the names if an image is needed more than once, but only store the physical file specification in a single place. This is available only when using a DTD, not a Schema.


9. How do I include one XML file in another?

This works exactly the same as for SGML. First you declare the entity you want to include, and then you reference it by name:

<?xml version="1.0"?>
<!DOCTYPE novel SYSTEM "/dtd/novel.dtd" [
  <!ENTITY chap1 SYSTEM "mydocs/chapter1.xml">
  <!ENTITY chap2 SYSTEM "mydocs/chapter2.xml">
  <!ENTITY chap3 SYSTEM "mydocs/chapter3.xml">
  <!ENTITY chap4 SYSTEM "mydocs/chapter4.xml">
  <!ENTITY chap5 SYSTEM "mydocs/chapter5.xml">
]>
<novel>
  <header>
    ...blah blah...
  </header>
  &chap1;
  &chap2;
  &chap3;
  &chap4;
  &chap5;
</novel>


The difference between this method and the one used for including a DTD fragment (see question D.15, ‘How do I include one DTD (or fragment) in another?’) is that this uses an external general (file) entity which is referenced in the same way as for a character entity (with an ampersand).

The one thing to make sure of is that the included file must not have an XML or DOCTYPE Declaration on it. If you've been using one for editing the fragment, remove it before using the file in this way. Yes, this is a pain in the butt, but if you have lots of inclusions like this, write a script to strip off the declaration (and paste it back on again for editing).
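The Python sketch below shows what a parser does with &chap1;. It uses the standard library's low-level expat binding, which hands each external entity reference to a callback. The chapter texts are hypothetical in-memory stand-ins for the files in mydocs/. The DOCTYPE omits the external DTD for brevity, and only one level of inclusion is handled.

```python
import xml.parsers.expat

# Hypothetical stand-ins for mydocs/chapter1.xml and mydocs/chapter2.xml.
# Neither has an XML or DOCTYPE declaration, as the text requires.
FILES = {
    "chapter1.xml": b"<chapter>It was a dark and stormy night.</chapter>",
    "chapter2.xml": b"<chapter>The end.</chapter>",
}

NOVEL = b"""<?xml version="1.0"?>
<!DOCTYPE novel [
<!ENTITY chap1 SYSTEM "chapter1.xml">
<!ENTITY chap2 SYSTEM "chapter2.xml">
]>
<novel>&chap1;&chap2;</novel>"""


def chapter_texts(document: bytes, files: dict) -> list:
    """Parse document, expanding external entities from files, and return
    the text of each <chapter> element in document order."""
    chapters = []
    inside = [False]  # mutable flag shared by the handlers below

    def start(name, attrs):
        if name == "chapter":
            chapters.append("")
            inside[0] = True

    def end(name):
        if name == "chapter":
            inside[0] = False

    def text(data):
        if inside[0]:
            chapters[-1] += data

    parser = xml.parsers.expat.ParserCreate()

    def external_entity(context, base, system_id, public_id):
        # expat calls this for &chap1; etc. The sub-parser inherits the
        # handlers set on parser, so chapter events arrive as usual.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(document, True)
    return chapters


print(chapter_texts(NOVEL, FILES))
```

A real program would open the file named by system_id (relative to base) instead of looking it up in a dictionary.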


10. What is parsing and how do I do it in XML?

Parsing is the act of splitting up information into its component parts (schools used to teach this in language classes until the teaching profession collectively caught the anti-grammar disease).

‘Mary feeds Spot’ parses as

1. Subject = Mary, proper noun, nominative case

2. Verb = feeds, transitive, third person singular, present tense

3. Object = Spot, proper noun, accusative case

In computing, a parser is a program (or a piece of code or API that you can reference inside your own programs) which analyses files to identify the component parts. All applications that read input have a parser of some kind, otherwise they'd never be able to figure out what the information means. Microsoft Word contains a parser which runs when you open a .doc file and checks that it can identify all the hidden codes. Give it a corrupted file and you'll get an error message.

XML applications are just the same: they contain a parser which reads XML and identifies the function of each of the pieces of the document, and it then makes that information available in memory to the rest of the program.

While reading an XML file, a parser checks the syntax (pointy brackets, matching quotes, etc) for well-formedness, and reports any violations (reportable errors). The XML Specification lists what these are.

Validation is another stage beyond parsing. As the component parts of the document are identified, a validating parser can compare them with the pattern laid down by a DTD or a Schema, to check that they conform. In the process, default values and datatypes (if specified) can be added to the in-memory result of the validation that the validating parser gives to the application.


<person corpid="abc123" birth="1960-02-31" gender="female"> <name> <forename>Judy</forename> <surname>O'Grady</surname> </name> </person>

The example above parses as:

1. Element person identified with Attribute corpid containing abc123 and Attribute birth containing 1960-02-31 and Attribute gender containing female containing ...

2. Element name containing ...

3. Element forename containing text ‘Judy’ followed by ...

4. Element surname containing text ‘O'Grady’

(and lots of other stuff too).

As well as built-in parsers, there are also stand-alone parser-validators, which read an XML file and tell you if they find an error (like missing angle-brackets or quotes, or misplaced markup). This is essential for testing files in isolation before doing something else with them, especially if they have been created by hand without an XML editor, or by an API which may be too deeply embedded elsewhere to allow easy testing.
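Python's standard library can demonstrate the two stages on the person example above. xml.etree.ElementTree checks well-formedness while it builds the in-memory tree, but it does not validate. The date check below is a hand-written stand-in for a Schema datatype (xs:date) test, not a real Schema validator. Note that birth="1960-02-31" is perfectly well-formed XML but not a real calendar date.

```python
import datetime
import xml.etree.ElementTree as ET

PERSON = """<person corpid="abc123" birth="1960-02-31" gender="female">
  <name><forename>Judy</forename><surname>O'Grady</surname></name>
</person>"""


def parse_person(text):
    """Stage 1, parsing: raises ET.ParseError if text is not well-formed."""
    root = ET.fromstring(text)
    return {
        "corpid": root.get("corpid"),
        "birth": root.get("birth"),
        "gender": root.get("gender"),
        "forename": root.findtext("name/forename"),
        "surname": root.findtext("name/surname"),
    }


def birth_is_valid(person):
    """Stage 2, a hand-rolled stand-in for an xs:date check: is birth a
    real calendar date in YYYY-MM-DD form?"""
    try:
        datetime.date.fromisoformat(person["birth"])
    except ValueError:
        return False
    return True


person = parse_person(PERSON)
print(person["forename"], person["surname"], birth_is_valid(person))

try:
    parse_person("<person><name></person>")  # end-tag mismatch
except ET.ParseError as err:
    print("not well-formed:", err)
```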


11. When should I use a CDATA Marked Section?

You should almost never need to use CDATA Sections. The CDATA mechanism was designed to let an author quote fragments of text containing markup characters (the open-angle-bracket and the ampersand), for example when documenting XML (this FAQ uses CDATA Sections quite a lot, for obvious reasons). A CDATA Section turns off markup recognition for the duration of the section (it gets turned on again only by the closing sequence of double end-square-brackets and a close-angle-bracket).

Consequently, nothing in a CDATA section can ever be recognised as anything to do with markup: it's just a string of opaque characters, and if you use an XML transformation language like XSLT, any markup characters in it will get turned into their character entity equivalent.

If you try, for example, to use:

some text with <![CDATA[<em>markup</em>]]> in it.

in the expectation that the embedded markup would remain untouched, it won't: it will just output

some text with &lt;em&gt;markup&lt;/em&gt; in it.

In other words, CDATA Sections cannot preserve the embedded markup as markup. Normally this is exactly what you want because this technique was designed to let people do things like write documentation about markup. It was not designed to allow the passing of little chunks of (possibly invalid) unparsed HTML embedded inside your own XML through to a subsequent process, because that would risk invalidating the output.

As a result you cannot expect to keep markup untouched simply because it looked as if it was safely ‘hidden’ inside a CDATA section: it can't be used as a magic shield to preserve HTML markup for future use as markup, only as characters.
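Python's standard xml.etree.ElementTree shows the same behaviour. It is used here only as a convenient stand-in for any XML processor. Once the CDATA section has been parsed, its boundary is gone, and writing the element back out escapes the markup characters.

```python
import xml.etree.ElementTree as ET

source = "<p>some text with <![CDATA[<em>markup</em>]]> in it.</p>"
para = ET.fromstring(source)

# After parsing there is no <em> element: it is all just character data.
print(len(para), repr(para.text))

# Serialising it again escapes the markup characters as &lt; and &gt;.
print(ET.tostring(para, encoding="unicode"))
```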


13. How can I handle embedded HTML in my XML?

Apart from using CDATA Sections, there are two common occasions when people want to handle embedded HTML inside an XML element:

1. when they have received (possibly poorly-designed) XML from somewhere else which they must find a way to handle;

2. when they have an application which has been explicitly designed to store a string of characters containing &lt; and &amp; character entity references with the objective of turning them back into markup in a later process (eg FreeMind, Atom).

Generally, you want to avoid this kind of trick, as it usually indicates that the document structure and design have been insufficiently thought out. However, there are occasions when it becomes unavoidable, so if you really need or want to use embedded HTML markup inside XML, and have it processable later as markup, there are a couple of techniques you may be able to use:

  • Provide templates for the handling of that markup in your XSLT transformation or whatever software you use, which simply replicate what was there, eg

    <xsl:template match="b">
      <b>
        <xsl:apply-templates/>
      </b>
    </xsl:template>

  • Use XSLT's ‘deep copy’ instruction, which outputs nested well-formed markup verbatim, eg

    <xsl:template match="ol">
      <xsl:copy-of select="."/>
    </xsl:template>

  • As a last resort, use the disable-output-escaping attribute on the xsl:text element of XSL[T], which is available in some processors, eg

    <xsl:text disable-output-escaping="yes"><![CDATA[<b>Now!</b>]]></xsl:text>

  • Some processors (eg JX) are now providing their own equivalents for disabling output escaping. Their proponents claim it is ‘highly desirable’ or ‘what most people want’, but it still needs to be treated with care to prevent unwanted (possibly dangerous) arbitrary code from being passed untouched through your system. It also adds another dependency to your software.

For more details of using these techniques in XSL[T], see the relevant question in the XSL FAQ.
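The second case above involves escaped markup stored as text that is to be turned back into markup later. There, the later process can simply parse the stored string again, provided it is well-formed. Here is a sketch using Python's xml.etree.ElementTree. The note element is hypothetical, and in practice untrusted content should be checked before it is passed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored XML: the HTML has been escaped into plain text.
STORED = "<note>&lt;b&gt;Now!&lt;/b&gt; Read this.</note>"


def recover_markup(xml_text):
    """Read the note's text (already unescaped by the parser) and re-parse
    it as markup. The text is wrapped in a <div> so that mixed content
    (markup plus trailing text) forms a single well-formed element."""
    html_text = ET.fromstring(xml_text).text  # '<b>Now!</b> Read this.'
    return ET.fromstring(f"<div>{html_text}</div>")


div = recover_markup(STORED)
print(ET.tostring(div, encoding="unicode"))
```

If the stored string is not well-formed (for example, legacy HTML with unclosed tags), the second parse raises an error, which is one more reason the text advises avoiding this design.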


14. What are the special characters in XML?

For normal text (not markup), there are no special characters: just make sure your document refers to the correct encoding scheme for the language and/or writing system you want to use, and that your computer correctly stores the file using that encoding scheme. See the question on non-Latin characters for a longer explanation.

If your keyboard will not allow you to type the characters you want, or if you want to use characters outside the limits of the encoding scheme you have chosen, you can use a symbolic notation called ‘entity referencing’. Entity references can either be numeric, using the decimal or hexadecimal Unicode code point for the character (eg if your keyboard has no Euro symbol (€) you can type &#8364; or &#x20AC;); or they can be character entity references, using an established name which you declare in your DTD (eg <!ENTITY euro "&#x20AC;">) and then use as &euro; in your document. If you are using a Schema, you must use the numeric form for all except the five below, because Schemas have no way to make character entity declarations.

If you use XML with no DTD, then these five character entities are assumed to be predeclared, and you can use them without declaring them:

&lt;
The less-than character (<) starts element markup (the first character of a start-tag or an end-tag).

&amp;
The ampersand character (&) starts entity markup (the first character of a character entity reference).

&gt;
The greater-than character (>) ends a start-tag or an end-tag.

&quot;
The double-quote character (") can be symbolised with this character entity reference when you need to embed a double-quote inside a string which is already double-quoted.

&apos;
The apostrophe or single-quote character (') can be symbolised with this character entity reference when you need to embed a single-quote or apostrophe inside a string which is already single-quoted.

If you are using a DTD then you must declare all the other character entities you need to use (if any). Strictly, the XML Specification requires processors to recognise the five above whether or not they are declared, but for interoperability a valid document should declare any of the five that it uses. If you are using a Schema, you must use the numeric form for all except the five above because Schemas have no way to make character entity declarations.
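The rules above can be tried out with Python's standard library. xml.etree.ElementTree resolves numeric references and the five predefined entities without any DTD. It also expands a named entity declared in an internal DTD subset and rejects an undeclared one. xml.sax.saxutils.escape does the reverse job when writing text. The sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Numeric references need no declaration: decimal and hex forms of U+20AC.
print(ET.fromstring("<p>&#8364; &#x20AC;</p>").text)

# The five predefined entities work with no DTD at all.
print(ET.fromstring("<p>&lt; &amp; &gt; &quot; &apos;</p>").text)

# A named entity must be declared first, here in an internal DTD subset.
declared = '<!DOCTYPE p [<!ENTITY euro "&#x20AC;">]><p>&euro;</p>'
print(ET.fromstring(declared).text)

# Without the declaration, &euro; is an error.
try:
    ET.fromstring("<p>&euro;</p>")
except ET.ParseError as err:
    print("error:", err)

# When writing XML, escape the markup characters in text content.
print(escape("Fish & Chips < £5"))
```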


15. Do I have to change any of my server software to work with XML?

The only changes needed are to make sure your server serves up .xml, .css, .dtd, .xsl, and whatever other file types you will use as the correct MIME content (media) types.

The details of the settings are specified in RFC 3023. Most new versions of Web server software come preset.

If not, all that is needed is to edit the mime-types file (or its equivalent: as a server operator you already know where to do this, right?) and add or edit the relevant lines for the right media types. In some servers (eg Apache), individual content providers or directory owners may also be able to change the MIME types for specific file types from within their own directories by using directives in a .htaccess file. The media types required are:

  • text/xml for XML documents which are ‘readable by casual users’;
  • application/xml for XML documents which are ‘unreadable by casual users’;
  • text/xml-external-parsed-entity for external parsed entities such as document fragments (eg separate chapters which make up a book) subject to the readability distinction of text/xml;
  • application/xml-external-parsed-entity for external parsed entities subject to the readability distinction of application/xml;
  • application/xml-dtd for DTD files and modules, including character entity sets.

The RFC has further suggestions for the use of the +xml media type suffix for identifying ancillary files such as XSLT (application/xslt+xml).

If you run scripts generating XHTML which you wish to be treated as XML rather than HTML, they may need to be modified to produce the relevant Document Type Declaration as well as the right media type if your application requires them to be validated.
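The sketch below shows how a Python program that labels files through the standard mimetypes module could be given the types listed above. The extension-to-type pairs are assumptions; in particular, .ent for external parsed entities is a common convention, not a requirement. On Apache, the equivalent is an AddType directive in the server configuration or in a .htaccess file.

```python
import mimetypes

# Media types from the list above, keyed by the file extension we assume
# each kind of file uses.
XML_TYPES = {
    ".xml": "application/xml",
    ".ent": "application/xml-external-parsed-entity",
    ".dtd": "application/xml-dtd",
    ".xsl": "application/xslt+xml",
    ".css": "text/css",
}


def register_xml_types():
    """Override the module's type map so files are labelled as listed."""
    for ext, media_type in XML_TYPES.items():
        mimetypes.add_type(media_type, ext)


register_xml_types()
for name in ("book.xml", "chap1.ent", "novel.dtd", "style.xsl"):
    print(name, mimetypes.guess_type(name)[0])
```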
