XML and Oracle

by Roger Schrag
Database Specialists, Inc.

About Database Specialists, Inc.
Database Specialists, Inc. provides remote DBA services or onsite database support for your mission critical Oracle systems. Since 1995, we have been providing Oracle database consulting in Solaris, HP-UX, Linux, AIX, and Windows environments. We are DBAs, speakers, educators, and authors. Our team is continually recognized by Oracle, at national conferences and by leading trade publications. Learn more about our remote DBA, database tuning, and consulting services. Or, call us at 415-344-0500 or 888-648-0500.

Introduction

XML, short for “extensible markup language,” is quite a buzzword these days. Everyone is talking about XML-enabling their applications, and the Oracle Technology Network has an entire section devoted to XML. Steve Muench has even published a 700+ page book just on the topic of building XML Oracle applications. So what is all the fuss about?

In this paper we will look at what XML is, why it is so promising, and tools and capabilities Oracle provides in order to leverage XML in your database applications. We will stay at a pretty high level. This paper is meant to be a quick read in order to get you started on understanding what XML is all about and how to go about incorporating it into an Oracle environment. A listing of references at the end of the paper will direct you to more detailed information.

An Overview of XML

XML is a standard for representing structured data in human-readable text form. Any type of data-from a purchase order to a stock quote-can be represented in XML. XML uses a system of open and close tags similar to HTML. However, a key difference from HTML is that you can use any tags you want in XML instead of a fixed set of tags listed in the HTML standard. When you represent your data with XML, you can choose to use a set of tags common to your industry or you can make up your own tags from scratch as you like.

One logical unit of data represented in XML is often called an XML “document.” A purchase order or stock quote marked up in XML could be called an XML document. When the XML data is moved between systems, the packet of data is often called an XML “datagram.”

We say an XML document is “well formed” if it follows all of the strict syntax rules of the XML standard-such as starting with a special XML tag and having every open tag matched by a close tag. Here is a sample XML document for a drink recipe. I made up the tag structure myself. You can easily see how XML’s system of tags allows you to specify a clear structure for your data.




<drink-recipe name="Fuzzy Navel">
  <ingredients>
    <ingredient quantity="1" unit="ounce">
      Vodka
    </ingredient>
    <ingredient quantity="1" unit="ounce">
      Peach schnapps
    </ingredient>
    <ingredient quantity="4" unit="ounce">
      Orange juice
    </ingredient>
  </ingredients>
  <preparation>
    <step>
      Pour ingredients into a highball glass almost filled with ice.
    </step>
    <step>
      Stir.
    </step>
  </preparation>
</drink-recipe>

From this XML document you can see that to fix yourself a fuzzy navel you will need an ounce of vodka, an ounce of peach schnapps, and four ounces of orange juice. To make the drink, you first pour the ingredients into a highball glass almost filled with ice and then you stir.

If somebody handed you a more complicated XML document, it might not be clear to you how to interpret the data contained inside. Because tags in XML are extensible and not fixed, the reader of an XML document needs a roadmap in order to make sure they read the data the way the creator of the document intended. Such a roadmap is called a Document Type Definition or DTD.

A DTD defines the vocabulary of tags and how they may be used for a specific type of XML document. Certain elements of data might be optional, while others might be required. Certain elements may occur only once, while others may occur one or more times. Certain elements may only be allowed when nested inside of other elements of data. When an XML document follows all of the rules specified in its corresponding DTD, we say the XML document is “valid.”

Earlier we looked at a sample XML document that contained a fuzzy navel recipe. That XML contained a document of type “drink-recipe.” The DTD for the drink-recipe document type appears below.

<!ELEMENT drink-recipe (ingredients, preparation)>
<!ATTLIST drink-recipe name CDATA #REQUIRED>
  <!ELEMENT ingredients (ingredient+)>
    <!ELEMENT ingredient (#PCDATA)>
    <!ATTLIST ingredient quantity CDATA #REQUIRED
                         unit     CDATA #REQUIRED>
  <!ELEMENT preparation (step+)>
    <!ELEMENT step (#PCDATA)>

You can see that an XML document of type drink-recipe must have a name attribute, and one ingredients and one preparation section. The ingredients section must have one or more ingredient elements. Each ingredient must have a quantity and unit, and free form text. The preparation section must have one or more step elements. Each step has free form text.

So what can you do with an XML document? If you save an XML document into a file, then you can do anything with that file that you could do with an ordinary text file. For example, you can use a text editor like vi or Notepad to edit the XML document. Or you can use FTP or HTTP to move the document to another server. Or you can store the XML document in a VARCHAR2 or CLOB column in your database. (Oracle 9i features a new xmltype datatype specifically for storing XML documents in tables.)

Of course, an XML document is more than just an ordinary text file and there are additional things you can do only with XML documents. There are XML-aware text editors that can read your DTD and validate your XML document as you edit it. Also, Internet Explorer 5 or later and Netscape 6 or later can display your XML document in a way that demonstrates the structure of the data.

There are some additional, very powerful things that you can do with XML documents. Along with the XML 1.0 specification there are additional specifications and recommendations for manipulating and querying XML documents with a standard application programming interface or API and also specifications for standard APIs for transforming XML documents into other data formats.

Document Object Model, or DOM, is an API for representing an XML document as a tree structure. We call the tree structure an XML document’s “infoset.” Using DOM, you can extract the infoset from an XML document, traverse it, make changes, and write the infoset back to the file or datagram that held the XML document. XPath is an API for querying a document’s infoset. Using DOM and XPath, you can query and update XML documents using a fixed set of APIs that are clearly defined in the XML specification.

XML Stylesheet Language for Transformation, or XSLT, is a process for converting XML documents from one document type to another, or for reformatting XML documents to a non-XML format. An XSLT stylesheet is itself an XML document that describes a transformation using a standardized XSLT vocabulary. XSLT is a powerful tool for transforming an XML document from one DTD to another, or to a totally different format such as HTML or a proprietary flat file format.

The Value of XML

The XML 1.0 specification was the outgrowth of a World Wide Web Consortium recommendation put together in early 1998. The specification covers XML syntax and DTDs. DOM, XPath, and XSLT are defined in separate specifications. All major software vendors, including Oracle, Microsoft, and IBM have embraced this specification. The specification is extremely rigid and unambiguous, and at the same time fully extensible.

The specific yet extensible nature of the XML specification has led different vendors to share a vision in how they XML-enable their applications and product offerings. Thus XML has become a vendor neutral, platform neutral, language neutral technology for information exchange. For example, if you have a Microsoft Windows application that can export data formatted as XML documents and if your Oracle General Ledger can import XML data, then your two very different systems will be able to share information with a minimal amount of programming effort.

HTTP has become a ubiquitous protocol for moving content between disparate systems. Since XML documents are plain text, HTTP can easily be used to transfer structured data in the form of XML. This enables companies to share information without having to know anything about the sender or recipient other than the DTD they use. Industry and trade groups are developing DTDs in an attempt to standardize document types. When the document sender and recipient use different DTDs, XSLT can be used to translate XML documents from one layout to another. XSLTs can even be used to transform XML documents into proprietary non-XML file formats.

The implications of the XML specification when coupled with HTTP are far reaching. Using XML it is now very easy to publish structured data on the web. Just as HTML and free web browsers led to the great world wide web of information at your fingertips, XML and the proliferation of XML-enabled tools and applications are leading to a great repository of information becoming available.

Using XML, you can publish more than just linear content on the web. You can publish complexly structured data. Coupling XML with relational databases like Oracle, you can leverage the proven reliability and scalability of Oracle into new levels of flexibility and interoperability.

Oracle Tools for XML-Enabling Database Applications

Oracle offers an XML Developer Kit, or XDK, that provides a lot of XML functionality in the Oracle 8i and Oracle 9i database. Oracle interMedia Text, starting in Oracle 8i, has also been enhanced to be more XML-aware. In addition, JDeveloper has a lot of functionality built in for working with XML.

For those still working with an Oracle database prior to Oracle 8i, there is a PLSXML utility available that gives you a slight bit of XML functionality for your PL/SQL applications.

The XML Developer Kit

Beginning with the first release of Oracle 8i, Oracle has offered the XDK for the Oracle database. For Oracle 8i releases 1 and 2 (8.1.5 and 8.1.6) you must download the XDK from Oracle Technology Network and install it manually. With Oracle 8i release 3 (8.1.7) and Oracle 9i, the XDK is integrated into the database and is installed automatically when the database is installed.

The XDK may be accessed from PL/SQL, Java, C, and C++ applications. PL/SQL applications must run inside the database (i.e. stored procedures, packages, triggers, or anonymous PL/SQL blocks) in order to access the XDK. Java applications may reside inside or outside the database. C and C++ applications, of course, reside outside the database.

In order to access the XDK from PL/SQL or Java applications residing inside the database, you must install Oracle’s JVM (sometimes called the Java option or Oracle JServer). It may seem counter-intuitive that you must have Oracle’s JVM installed in order to use the XDK from PL/SQL, but there is a reason for it: The XDK is itself written in Java. Oracle merely put PL/SQL wrappers on the Java code in order to make it accessible from PL/SQL.

The XDK contains an XML parser, an XSLT processor, an XPath engine, an XSQL page processor and servlet, and an XML SQL utility. The first three are Oracle’s implementations of the XML 1.0 specification, while the last two are features unique to Oracle.

Note that the XSQL page facility is only accessible from Java applications. Also note that in earlier releases of the XDK, the XML SQL utility was a separate tool that you had to install separately from the XDK. You can download the XML SQL utility from Oracle Technology Network.

The XML Parser, XSLT Processor, and XPath Engine

The XML parser enables you to parse and validate an XML document into an infoset data structure in memory, manipulate the infoset, and write the data back to an XML document. PL/SQL application developers access the XML parser by calling the xmlparser and xmldom packages. Java classes are provided for access from Java applications. There is also an oraxml program that lets you perform a syntax check and optional DTD validation of an XML document from the command line.

The XSLT processor enables you to transform XML documents into different XML document types, or any other file format you wish. From PL/SQL you access the XSLT processor with the xslprocessor package. There is also an oraxsl program to perform transformations from the command line. You simply provide the filename of the XML document to be transformed, the filename of the stylesheet describing the transformation, and a filename for the output. This can be useful for transforming many documents in a batch from a script.

The XPath engine in the XDK is used by both the XML parser and the XSLT processor. Calling the xmldom PL/SQL package, for example, will put the XPath engine to work. You don’t typically interact with the XDK’s XPath engine directly.

XSQL Pages

The XSQL page facility provides an extensible framework for publishing data in XML. This feature of the XDK is only available to Java applications. The XSQL page facility consists of an XSQL page processor, an XSQL Java servlet, and a program you can run from the command line. The servlet can run under Apache, Oracle 9iAS, or inside an Oracle 8i release 3 (8.1.7) or Oracle 9i database. The XSQL page processor is called by the Java servlet and the command line program.

XSQL pages make publishing data easy. You create an XML document using the XSQL DTD. This document encapsulates a SQL query and identifies parameters that can be passed in (on a URL or as command line arguments). When you call the servlet or use the command line program, the query gets executed and the results are converted into XML. A query against the EMP table might generate output that looks like:

<ROWSET>
  <ROW num="1">
     <ENAME>King</ENAME>
     <EMPNO>7839</EMPNO>
     <JOB>President</JOB>
  </ROW>
  <ROW num="2">
     <ENAME>Blake</ENAME>
     <EMPNO>7698</EMPNO>
     <JOB>Manager</JOB>
</ROW>
</ROWSET>

You can incorporate an XSLT so that the output conforms to a specific DTD or is marked up in a form suitable for human viewing such as HTML.

The command line program included in the XSQL page facility is called xsql. This program can be useful for publishing large numbers of documents in batch form.

The XML SQL Utility

The XML SQL utility can simplify loading and retrieving XML documents from the database when you use tables, rows, and columns to represent an XML document instead of stuffing entire documents into CLOB or VARCHAR2 columns.

You can pass an XML document that has a structure (similar to the XSQL page example above) to the XML SQL utility, and the utility will insert the rows into a database table. You can also submit a query to the utility and it will format the query results into an XML document.

XSLTs can be used before inserting or after querying in order to make the inputs and outputs more flexible. There are also facilities for updating or deleting data in the database using the XML SQL utility.

You can call the XML SQL utility using Java APIs, PL/SQL APIs, or the command line. In PL/SQL the XML SQL utility appears in the form of packages called dbms_xmlquery and dbms_xmlsave. The command line utility is invoked with “java OracleXML”.

interMedia Text

interMedia Text is an Oracle option that adds powerful text search capabilities to the database. interMedia Text allows you to search text documents stored in CLOBs or BFILEs for specific words or themes. Features such as fuzzy match, stemming, and proximity are also supported.

interMedia Text has built in filters to intelligently search documents stored in popular formats such as HTML or Microsoft Word. Beginning in Oracle 8i release 2 (8.1.6), interMedia Text has built in filters and extra functionality for XML documents as well.

For example, when searching XML documents, you can limit the search to text within certain tags or attribute values. In Oracle 9i, interMedia Text can also search XML documents that are stored in the database using the new xmltype.

JDeveloper

JDeveloper is an interactive development tool that runs on Windows. You can use JDeveloper to build Java applications, and these applications can leverage XML functionality.

JDeveloper has built in capabilities to help you create XML documents and check them for syntax. You can also use JDeveloper to work with XSLT stylesheets, and to create and run XSQL pages. (We talked about XSQL pages in the XDK section above.)

The PLSXML Utility

Oracle Corporation’s true commitment to XML seems to begin with Oracle 8i. The XDK-the cornerstone to XML-enabling your applications-will not work with Oracle 8.0 or earlier databases. The PLSXML utility is the one offering from Oracle I have found that adds XML capabilities to your Oracle 8.0 or earlier databases.

PLSXML is a PL/SQL package that you can call with a SQL query and the results will be output in the form of an XML document. PLSXML requires the Oracle Application Server or Oracle 9iAS-the XML output is written to the htp.p buffer.

This utility is extremely limited in its capabilities, but it does give pre-Oracle 8i users a taste of XML capability in the database. You can download PLSXML from Oracle Technology Network. (You’ll find it near the very bottom of the XML home page, or you can search for “PLSXML”.)

Conclusion

The concepts behind XML and its related specifications are very simple. Simplicity, rigid standards, and extensibility have worked together to yield a powerful technology for cross-system data exchange. Oracle Corporation has jumped on the XML bandwagon and added strong XML support to its database and application development products.

The goal of this paper was to give you a quick introduction to XML, its strengths, and capabilities within the Oracle database. If you want to delve deeper into XML and Oracle, here are some references to check out:

About the Author

Roger Schrag has been an Oracle DBA and application architect for over twelve years. He started out at Oracle Corporation on the Oracle Financials development team and moved into the roles of production DBA and database architect at various companies in the San Francisco Bay Area. Roger is a frequent speaker at Oracle OpenWorld and the IOUG Live! conferences. He is also vice-president of the Northern California Oracle Users Group. In 1995, Roger founded Database Specialists, Inc., (http://www.dbspecialists.com) a consulting firm specializing in business solutions based on Oracle technology. In addition to consulting, the company offers flexible solutions including part-time DBA support and Database Rx (http://www.dbspecialists.com/database_rx.html), a web-based monitoring and alert notification service for Oracle databases. In 2001, the San Francisco Business Times named Database Specialists one of the Top 150 Fastest-Growing Private Companies in the Bay Area.

 

 

 

Copyright 2001 Database Specialists, Inc. http://www.dbspecialists.com