Query Efficiency

SQL Server automatically generates a random IV to help prevent statistical analysis attacks on columns of data. The need to eliminate patterns from encrypted data is at odds with the need to index and quickly search the same data. Indexing takes advantage of these patterns to organize data for efficient search and retrieval.

A hacker who knows the relative frequency with which certain pieces of encrypted data occur in a given column could use that information to deduce even further information about it. For example, a corporate database containing employee information in a table encrypted without the use of random IVs might leak additional information from the patterns provided. Consider the HumanResources.Employee table in the AdventureWorks database. Most of the executive and managerial titles occur only once, while the lower-level positions may occur dozens of times. A hacker might be able to infer additional information from this pattern, including information about which employees are paid the most. The hacker might use knowledge like this to help focus his attack. SQL Server’s random IV generation helps to eliminate these patterns from encrypted data. This has two main implications for T-SQL developers:

• The same IV used during encryption is required during decryption.

• The encryption functions are nondeterministic, which means that encrypting the same plain text multiple times with the same key will not generate the same encrypted text.

The nondeterministic nature of the SQL 2008 encryption functions makes it useless to index an encrypted column directly. Searching encrypted columns requires decrypting every value in the column and comparing them one by one. This is very inefficient and can be a bottleneck in your applications if your tables are large. Some methods have been suggested for increasing the efficiency of searching encrypted data. These methods generally include storing a hash of the encrypted data for indexing. The main problem with these methods is that they reintroduce the statistical patterns eliminated by the random IVs. You can take several approaches to strike a balance between data security and search efficiency. The most important recommendation is to not encrypt columns you will use heavily in your query search criteria (WHERE clause), sort criteria (ORDER BY clause), or grouping (GROUP BY clause).

However, sometimes you might not have a choice—you may need to encrypt a column that is part of your WHERE clause or other query criteria. One thing you can do to make this more efficient is to narrow down your results using other criteria involving nonencrypted columns first.

You can create a “pseudo-index” of your data by adding an additional column to your table with a one-way hash code of your plain text, and creating an index on that column. The built-in SQL Server 2008 HashBytes function can be used to generate a one-way MD5, SHA-1, or other hash value of your plain text and store it in the new column. Indexing this new plain text hash value column can make equality searches (using the T-SQL = operator) much more efficient. Range searches (operators like <, >, BETWEEN, and so on), however, cannot be used on hashed or encrypted data. One of the implications of pseudo-indexing with a hash value is that it once again opens the door for statistical analysis attacks using the hash values as a guide. Using a hash value as an index also makes dictionary attacks against the hashed values possible. A dictionary attack is one in which a hacker uses a large list of plain text values to try to guess the plain text of a hashed or an encrypted value by brute force.

Another method of pseudo-indexing encrypted data is a variation on the previous method, except that it uses a hashed message authentication code (HMAC) in place of the hash value. The HMAC basically takes a “secret” value, combines it with the plain text, and generates a hash value based on that data. Although the HMAC method provides protection against dictionary attacks, it doesn’t provide any additional protection against statistical analysis

The main thing to consider when using SQL Server’s data-encryption facilities is that encryption and search efficiency are opposing goals. The purpose of encryption is data security, often at the expense of search efficiency. While you can use the methods suggested here to increase the efficiency of SQL queries on encrypted data, the hash value and HMAC index methods require more storage and can actually circumvent SQL Server’s protection against statistical analysis (via random IV)..

Source of Information : Apress Accelerated SQL Server 2008

0 comments


Subscribe to Developer Techno ?
Enter your email address:

Delivered by FeedBurner