Home » Microsoft SQL Server » 10 - XML Programming in SQL Server 2005

XML and SQL Server

XML programming in SQL server

XML is a universal format used to represent information. XML has already emerged as a front runner among data manipulation and data transmission tools. An XML document consists of data encapsulated within identifying tags. The tags may contain descriptive attributes about the data. The following is an XML representation of a Book Object from our database. We encapsulate information about the bookid, title, author, year first published, and category.

 <book>
<id>1</id>
<title>1000 Years, 1000 People: Ranking the
Men and Women Who Shaped the Millennium</title>
<author>Gottlieb, Agnes Hooper</author>
<pubdate>1999</pubdate>
<category>Non Fiction</category>
</book>

An XML document may be represented as a tree. The root of the tree is the base or root element, and the innermost elements form the leaves. The book element in the above XML is the root; all other elements are trees. More complex documents may have many intermediate levels. An XML document may contain any sort of information and is extremely useful as a universal datasource. The utility of XML documents is enhanced by XML schemas - a Schema defines the structure of an XML document. Schemas are themselves XML documents with the extension ".xsd". XML validators check XML documents against a specified schema. Validators perform an important function; entire databases may be contained in a single XML document; the integrity of the data may be easily verified using a schema.

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="id" type="xs:integer"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="pubdate" type="xs:string"/>
<xs:element name="category" type="xs:string"/>
<xs:element name="price" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

The above lines declare the version of XML and schema that book.xsd complies with. It also declares an element called book. 'complexType' elements contain other elements. The 'Sequence' directive defines the exact order of elements within the complex type. Here, the 'book' root element contains all the book property elements. Note that id is declared as an integer (xs:integer) while 'title' is declared as a string. The 'xs:decimal' xml type may be used to hold currency values like our book price element. Validators used to verify XML against the schema throw errors if the structure of xml does not pertain to the structure specified in the xsd or if any elements have a different datatype.

XML Validation Through XSD
XML Validation Through XSD

XML stylesheets contain specifications on translating the XML document to HTML or some other markup. XSL Transformers are used to modify XML documents according to a provided XML stylesheet's specifications. XML Stylesheets have the extension ".xsl". Several commercial products such as XML Spy and others offer an Environment to create XML documents, StyleSheets, and Schemas.

XSL Transformer
XSL Transformer

The booksbyauthor stylesheet converts a list of several book elements held within start and end <booksbyauthor> tags into an HTML document with the heading 'Books By Selected Author'. This displays books in a tabular format within the xml document. Take a look at this file along with the book XML and XSD to get a better idea about how XML stylesheets work.

SQL Server 2005 supports an XML data type. This data type may be used to hold entire xml documents or well formed XML fragments (like the book element xml above). Further, the xml data type column may be 'Typed' by associating it with a schema stored within a SQL Server schema collection. All new values for the XML data typed column are validated with the schema before input into the database.

Also, XML columns associated with a schema are automatically broken down into individual elements and stored internally in an efficient manner. SQL Server automatically reconstructs the XML document or fragment when queried for the same. Untyped XML is stored as a simple character string. However, even untyped columns should contain well formed XML - all start tags should be complemented by corresponding end tags in the correct sequence.

In this chapter, we will create a bookxml table that holds a single XML data type column. This column (bookelement) will house XML fragments containing important book related information. We will associate the XML fragment with an xsd and run sample pass/fail validations. We will also use programming techniques to use an xsl stylesheet (also stored in a database table) to convert the xml into html and display the result on a web page. A stored procedure will extract all books by authors whose lastname starts with a given string and return an xml string containing the returned xml fragments. Some .NET CLR routines will be used within this procedure to convert this xml list to html and send it back to the user who performed the search.

Let us first create a bookxml table and xml schema collection for the bookelement XML column in the table. We will then construct XML fragments for our existing books and insert these into the table. First, create an XML schema collection. Open Management Studio and connect to your default server. Right click booksDB and click on 'New Query'. Copy and paste the following query in the query window and run it by clicking '!' Execute or hitting the F5 key.

USE [booksdb]
GO
CREATE XML SCHEMA COLLECTION [dbo].[BookSchema]
AS
N'<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="id" type="xs:integer"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="pubdate" type="xs:string"/>
<xs:element name="category" type="xs:string"/>
<xs:element name="price" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
'

Expand the booksdb database and the 'Programmability' folder. Expand 'Types'. Right click on the XML Schema Collections and click 'Refresh'. Now, expand the 'XML Schema Collections' Folder. You will see the BookSchema collection that we have just created.

The New Schema Collection
The New Schema Collection

Next, let us create a bookxml table. Right click on the 'tables' folder under booksdb and click 'New Table..'. Enter 'bookelement' in the column name and select 'XML' from the Data Type list. Click on the box under 'Allow Nulls' so that it allows fields that are empty. Click on the down arrow at the end of the XML Type Specification in the 'Column Properties' panel and select 'dbo.BookSchema' from the list. Now, right click the table tab and save the table with the name 'bookxml'. You have now created a table with a typed xml field.

bookxml Table Definition
bookxml Table Definition

In general, tables like 'bookxml' are indexed on a primary key for ease of retrieval. We use a single column in our example as we will not have a lot of data in our table. Let us now construct some xml fragments from our existing books for insertion into the bookxml table. Run the following command in a query editor to create a list of 'insert into' statements. This command simply retrieves data out of a query that joins the author, book, category, and subject tables and uses the string concatenation operator and several string literals to construct a valid book xml fragment using existing data. We are just experimenting; a real world scenario would not require this type of insertion; data would be procured from sources in XML format.

select 'insert into bookxml values(''<book><id>'+
rtrim(convert(char(6),bookid)) + '</id><title>'
+ title + '</title>' +
'<author>' + lastname + ', ' + firstname + '</author>' +
'<pubdate>' + pubdate +
'</pubdate><category>' + categorydesc +
'</category><price>' +
ltrim(convert(char(10),price)) + '</price></book>'')'
from book, author, subject, category where
author.authorid = book.authorid and
subject.subjectid = book.subjectid and
category.categoryid = subject.categoryid

Once you run the above query, you will see a list of generated insert statements in the bottom panel. Right click on the top bar of the results and click on 'Select All..'. Right click once more and select copy. Open a new query window by right clicking on 'booksdb' and clicking 'New Query'. Now, paste the contents you just copied onto the new window by pressing down the Ctrl key at the bottom left of the keyboard and typing 'V' (Ctrl -V').

Copy Paste Generated XML
Copy Paste Generated XML

There is a small problem with our generated xml. All single quotes have not been escaped. SQL Server considers single quotes to be string delimiters; all single quotes that are part of a string to be inserted into the database (e.g. O'Donell in author name or It's a Long Day in title) should be 'escaped' by inserting an extra single quote. To do this, click on the Edit -> Find and Replace -> Quick Replace on the top bar menu or hold down the Ctrl key and type 'H' (Ctrl - H). This will open the quick replace dialogue.

Fixing single Quotes in generated Insert Statements
Fixing single Quotes in generated Insert Statements

The string {[^(]}'{[^)]} is a regular expression. It tells SQL Server to search for occurrences of single quotes that do not immediately follow an opening bracket or immediately precede a closing bracket. This string is composed of inline single quotes. Enter this string in the 'Find' box. Enter \1''\2 in the Replace box. This string indicates that the entire matched string from 'Find' should be replaced by \1 (the stuff matched within the first pair of '{}') followed by two single quotes (in place of one) followed by \2 (the stuff matched within the second pair of '{}'). Now, select 'Current Document' under 'Look In'. Expand the 'Find Options' box by clicking on the plus sign next to it. Click on the 'Use' box so that it is checked and select 'Regular Expressions'. Now, click 'replace all'. This action should show you a pop-up window with 'Replaced 20 Occurrences'. Close the pop-up and the Find-Replace window. Now, simply select all the contents of the query window with the generated inserts and click '!' Execute or hit the f5 key to insert all the records. You should see several '1 Row(s) affected' messages.

The only reason we were able to insert all these XML fragments is because they comply with the Schema Object dbo.BookSchema. Try to execute the following commands. One has a value for price that is not of type 'xs:decimal' and the other has a pubdate element before the author element. Both fail with the error shown below each. Thus, no invalid data may be entered into the bookxml table.

insert into bookxml values('<book><id>235</id>
<title>What Is Life</title>
<author>Schrodinger, Erwin</author>
<pubdate>2044</pubdate>
<category>Non Fiction</category>
<price>##badprice##</price></book>')
Msg 6926, Level 16, State 1, Line 1
XML Validation: Invalid simple type value: '##badprice##'.
ocation: /*:book[1]/*:price[1]
insert into bookxml values('<book><id>235</id>
<title>What Is Life</title>
<pubdate>2044</pubdate>
<author>Schrodinger, Erwin</author>
<category>Non Fiction</category>
<price>22.22</price></book>')
Msg 6965, Level 16, State 1, Line 1
XML Validation: Invalid content. Expected element(s):author
where element 'pubdate' was specified.
Location: /*:book[1]/*:pubdate[1]

We will now create a table that will hold XSL stylesheet specifications for XML data columns. Create a table in booksdb using the following command. This command creates a table with an automatically incremented id, a column name in the xmlcol field (for the xml column that the xsl corresponds to), and an xml data typed column that contains the xsl file to use to transform the contents of the column name in 'xmlcol' to HTML.

USE [booksdb]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[xsl](
[xslid] [smallint] IDENTITY(1,1) NOT NULL,
[xmlcol] [varchar](50) NOT NULL,
[xslspec] [xml] NOT NULL,
CONSTRAINT [PK_xsl] PRIMARY KEY CLUSTERED
(
[xslid] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF

Save the bookxsl.sql file onto your computer and open it using File -> Open -> File and execute its contents by selecting the contents and clicking '!' Execute or pressing the F5 key. You just ran an insert command that simply inserts a line with the xsl we will use later. Make sure that the row has been entered into the table.

CLR stands for Commmon language Runtime. This is a .Net mechanism that allows .Net code to be registered and used within SQL server. The code segment within the XSLTTransform.cs file contains code (from msdn.com) that calls an XML transformer with two input parameters - the xml file and xsl to use for conversion. A function called transform within this code returns an HTML representation of the input XML. Knowing how this code works is really not necessary; it is enough to know that we will create an assembly based on this code within the SQL server and use its functions in a Stored Procedure. Save the file onto your computer.

Open a command window (Run -> Cmd) and go to the 'C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727' folder using the 'cd' command. You may not have the v2.0.50727 folder; go to 'C:\WINDOWS\Microsoft.NET\Framework\' folder through windows explorer and check for any directory with a 'v2' prefix and type the CD command for this path. Link the above code using the following command. Note that the path should correspond to the path to XSLTTransform.cs on your computer. This will create a XSLTTransform.dll file in the 'C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727' directory. Move the XSLTTransform.dll file to an appropriate location on your computer. Now, go back to Management Studio, open a new query window for booksdb and execute the following command:

Assembly Creation
Assembly Creation

CREATE ASSEMBLY XsltTransform
from 'C:\\XSLT\\XSLTTransform.dll'

The path to the .dll file should be customized to the path on your computer; slashes in the path should be escaped by double slashes. The XSLTTransform.cs file contains a 'Transform' function that does the work of XML to HTML translation. Information about this function is held in the assembly we just created. However, we need to map it to a SQL Server function to use it in stored procedures and commands. Execute the following command on a booksdb query window to map this function to an applyXsltTransform function that takes the xml and xsl stylesheet as input. The last line maps the Transform function within the XSLTTransform assembly to the new function. This line has the syntax <assembly_name>.<class_name>.<function_name>

create function ApplyXsltTransform( @inputXML xml, @inputTransform xml )
returns xml
as external name XSLTTransform.XSLTTransform.Transform

Once you have created the assembly and the function, save the findbooksbyauthors.sql file onto your system and open it by clicking on the File -> Open -> File top bar menu in Management Studio. Execute the contents of the file by hitting the F5 key or clicking '!' Execute.

Take a look at the code in the stored procedure. The procedure takes as input a string with characters from the required author's last name. It returns as output a long xml string corresponding to an HTML file that displays information about the books by the given author(s) in tabular format. Consider the 'XQuery' command that gets the corresponding entries from the bookxml file. XQuery is a language used to query XML documents:

set @strLength = len(@search)
set @search = UPPER(LEFT(@search, 1)) +
LOWER(SUBSTRING(@search, 2, (@strLength - 1)))
set @bookXMLString = '<booksbyauthor>'
DECLARE books_sum CURSOR FOR
SELECT convert(varchar(500),bookelement)
FROM bookxml
WHERE bookelement.exist
('/book/author[substring(., 1,sql:variable("@strLength")) =
sql:variable("@search")]') = 1

The initial two lines obtain and save the length of the input parameter into a @strLength variable and format @search to be a lowercase string with an uppercase initial character (the format of our author name - an initial upper case letter followed by all lowercase). Note that XQuery, unlike the default collation of SQL server, is case sensitive. This means that 'sh' or 'sH' will not match 'Shakespeare' or 'Shaw'. We reformat @search to take care of this issue. We declare a bookXMLString variable and set it to a root '<booksbyauthor>' element. Then, we declare a cursor for a select query that uses Xpath. The bookelement.exist function returns 1 each time it finds an 'author' element within a 'book' element ('/book/author[..') whose first few letters upto the length of the input string (substring(., 1,sql:variable("@strLength")) ) contains the string we got as input ('...sql:variable("@search")]'). Note that stored procedure variables are used in SQL Server XQueries by prefixing sql:variable to the variable name within parenthesis and double quotes.

We next iterate through the books_sum cursor and append each found xml fragment to the bookXMLString variable. Once we close the cursor after processing every returned row, we add a closing tag of the root element to bookXMLString (select @bookXMLString = @bookXMLString + '</booksbyauthor>')

The next few lines use the ApplyXsltTransform function to transform the @bookXMLString variable's XML to HTML using the XSL stylesheet stored for the bookelement column in the xsl table we created earlier and stores this HTML in the output parameter @bookHTML for retrieval.

 select @xml = (select @bookXMLString)
select @xslt = (select xslspec from xsl where xmlcol = 'bookelement')
set @bookHTML = dbo.ApplyXsltTransform( @xml, @xslt )

Our method is quite unique as all application steps including html generation are performed in the database layer itself. This means that very little application code is required to deploy our application. A more intuitive use for XML data is to create web services that directly use XML output (we would not have to transform the data in such cases). Web services may be designed to use HTTP endpoints created for our stored procedures.

To test our stored procedure, run it using the following command. Right click on the result and click 'select all', right click again and click copy. Paste onto notepad or some other text editor to view the output HTML

declare @myHTML xmlexec findbooksbyauthor 'pre', @myHTML OUTPUTselect @myHTML

Deploying our stored procedure on the web is quite easy.