Using SQL Server XML Datatype

SQL Server has an XML datatype you can use to natively store XML data in SQL Server databases. If you are still using SQL Server 2000, you can store XML, but it must be in a string-based column, or you must shred the data into relational columns using OPENXML or BulkLoad. By using a native XML type, SQL Server 2005 and 2008 can support richer operations against your XML data, such as constraints, cross-domain queries that combine relational data and XQuery, and XML indexes.

Another benefit of using the native XML datatype is that XML data is inherently different from relational data in its structure. XML data is in a hierarchical structure that can be recursive, and XML supports a different query language than relational systems.

There are many scenarios where using relational modeling is a better choice than XML, and vice versa. For example, if you have data that is very interrelated, such as customers, their orders, the products in the orders, and the stores that sell those products, you could try to implement a solution using XML, but it would be quite challenging. How do you structure your hierarchy? Do you want a customer to be a top-level node and then have orders for each customer appear underneath? How do you write a query that returns all customers with at least five orders, where each order is greater than $1,000, and the name of the store where the customers purchased the products? Another problem is that you will repeat data throughout the hierarchy, such as product names, product prices, and so on, because of the hierarchical nature of XML. Plus, if you want to delete a customer but not the products or orders under that customer, you can’t do so, because the orders and products are children under the customer element. On the other hand, using a relational model as a solution, you can quickly model your data and query the information. You may be thinking that in this scenario, you should just shred your XML data into the relational database, as you saw in the previous chapter. However, shredding has its own issues, in that you do not always get back what you put in, since you are not guaranteed the same XML when you reconstitute the shredded data. Shredding adds another layer of complexity in terms of code creation and maintenance. Also, any reasonably complex XML document will need to be shredded across many tables, requiring extensive join operations across those tables to reconstitute the XML. You’ll also end up with a complex, annotated schema full of tables and many foreign key relations into which to shred that XML.

Now, there are scenarios where modeling your data using XML is very useful. First, XML can be more flexible than relational models. So, if you need a free-form structure to store data, XML can be a good choice. Also, XML is self-describing and easily portable across applications or even platforms. Plus, if your data has sparse entries or needs rich multivalue functionality, XML is a good choice as your data format. Finally, if you truly have document-centric data such as Microsoft Office documents, you will want to store this information as XML, since Microsoft Office documents lack rigid structures. XML provides the flexibility to store and query the information in the documents in a rich way.

Even if you choose XML as the format for your data, you will need to decide between using the XML datatype, shredding your XML into relational columns, and storing the XML using the (n)varchar(max) or varbinary(max) type. If you care about the order of elements, and you want the ability to use XML programming paradigms such as XPath and XQuery, you will want to use the XML datatype to store your XML data. If your XML data is best represented using a relational model, you can shred your data into relational columns using annotated schemas, just as you could in SQL

Server 2000. Finally, if you need to preserve the XML data exactly as it was created, including whitespace and attribute ordering, then you will want to store the XML in an (n)varchar(max) or a varbinary(max) column. Some scenarios (such as legal documents) may require this.

Finally, SQL Server can support a hybrid model, whereby you may use the XML datatype but promote certain properties—for example, key document properties such as author, last modification time, or last save time—into relational columns, or you may shred your XML into relational columns but keep the original copy in an nvarchar column. SQL Server provides the flexibility to meet the needs of your application when working with XML data.

We want to make one thing very clear, though, since this will cause you issues in the long term if you do not remind yourself of it regularly: if your data is quite structured, in that your XML does not look hierarchical and is normalized, you should use the relational model. Do not use XML. XML is targeted at semistructured or unstructured data. If you need to dynamically add schemas or data on the fly that you never expected, XML is your best choice. Do not make the mistake of thinking everything is a nail to bang with the XML hammer in SQL Server.

Source of Information : Apress Accelerated SQL Server 2008


Subscribe to Developer Techno ?
Enter your email address:

Delivered by FeedBurner