Utility platforms

Most (but not all) Western countries use some form of computer-controlled system to manage their electricity grid, gas supply and water distribution. It is highly likely that in the future a nation-state will exploit these command-and-control systems using DDoS methods, which could lead to serious power outages and could even bring down the stock markets.

The US government is already aware that Russia and China have infiltrated the electrical grid and deployed logic bombs that could be used to sabotage or disrupt it. The US is one of a number of countries whose electricity grid is actually connected to the Internet, so it is conceivable that it could simply be disconnected in the event of an attack.

In July 2009, North Korea was claimed (by security experts) to have been behind a cyber attack that paralysed the websites of US and South Korean government agencies, banks and businesses. South Korea's National Intelligence Service (NIS) claimed at the time that the attack appeared to have been elaborately prepared and staged by a certain organisation or state, and attributed it to North Korea. The attack didn't appear to sabotage South Korean or US networks; investigation by South Korean and US authorities showed it to have been a DDoS attack.

South Korea has for some time warned of the dangers of cyber espionage by Chinese and North Korean hackers. The country's Defence Security Command was at the time logging 95,000 daily attempts to penetrate its military network, though there is no evidence that the same level of daily cyber attacks was seen in July of this year (2010). AhnLab, the South Korean Internet security vendor, did tell me earlier this year that it was preparing for a cyber attack, but I understand it never came.

Source of Information : Hakin9 November 2010

Cyber Sabotage – the SCADA threat

Sabotage can occur at many levels (military, government, utility platforms such as electricity, oil, banking, stock markets and transport, or corporate), and all of them use the same attack methods. Sabotage is one of the major threats to our everyday lives and could in essence take a nation-state back to the dark ages. With ever-increasing reliance on the Internet (remember, we are running out of IPv4 addresses, hence IPv6), it's no surprise that cyber sabotage is going to increase the risks to every nation-state on Earth.

Cyber sabotage can take many forms: reprogramming existing source code; controlling or changing the way programmable logic controllers (PLCs) work by embedding a rootkit in the code; editing or deleting source code and/or readable documents; and taking command and control of a server's files and folders. These are just some of the ways attackers (or a nation-state) might inflict damage on another network or system.

One of the most destructive sabotage attack vectors seen to date has to be Stuxnet. Stuxnet (first seen earlier in 2010) is a Windows-based computer worm that specifically attempts to sabotage, and in some instances reprogram, industrial SCADA software systems, which are used to control and monitor industrial processes across the globe. The worm targeted Siemens SCADA control systems (spreading via USB flash drives) and is believed to have been created in the West and used in a sabotage attack on a nuclear plant in Iran.

The worm's impact on the plant will probably never be known; what is known is that it delayed the plant's start-up by some weeks. The malware itself was unusual in that its writers must have had first-hand knowledge of industrial processes. Equally strange was that it was coded in C and C++ (malware isn't normally written in these languages) and that it carried two stolen digital certificates. One significant advance on previous malware was its ability to update itself over peer-to-peer connections, so all in all this is probably the most sophisticated malware of its type to date.

Who wrote it? This was probably Western-sponsored cyber sabotage. The Stuxnet worm marks the dawn of a new era in cyber weaponry and leaves leading security researchers in no doubt that we will now see the Stuxnet family evolve.

Source of Information : Hakin9 November 2010

Cyber Espionage – the network threat

Espionage isn't something new, and neither is cyber espionage. Some of the most sophisticated cyber espionage networks reside in Russia and China. Cyber espionage normally involves stealing secret (and/or classified) documents from other nation-state governments, individuals, military establishments, rivals, enemies and businesses.

Cyber warrior units run by nation-states, and closely tied to private hacker groups, are normally responsible for developing cracking techniques and sophisticated malware, Trojans, backdoor traps and logic bombs to gain unauthorised access to a foreign network or server.

Recently, leading security researchers identified and tracked a sophisticated cyber espionage network based in China called Shadow. The Shadow worm systematically snoops through files stored on the targeted computer (looking for documents classified as secret, restricted or confidential) and then sends the harvested data over the web to core servers located in China. The attackers used social media platforms such as Twitter and Facebook, redundant cloud-based computing systems and Google Groups as the command-and-control infrastructure for the Shadow rollout.

This espionage network targeted computers in several foreign countries, including systems belonging to the Indian government and military. The Shadow network (an espionage ring similar to GhostNet from 2009) was found to have compromised the UN, computer systems belonging to the Indian government, and the Embassy of Pakistan in the US. Unfortunately, the targeted computers weren't secure, as the data had been moved out of a secure environment.

Source of Information : Hakin9 November 2010

The cyber security threat areas again:
• Cyber warfare
• Cyber terrorism
• Cyber attacks (organised crime)

Cyber warfare
Cyber warfare sits alongside the four traditional defence domains of air, sea, land and space; it has emerged as the fifth domain of warfare. Cyber warfare involves one nation-state attacking another by using digital attack code to bring about the collapse of that nation-state's infrastructure. The attack will target the energy system (gas, electricity and oil, for example) as well as the financial hubs. This can be achieved using DDoS attacks, Trojans, malware, or logic bombs and trap doors planted in source code.


Cyber terrorism
Cyber terrorism uses Internet-based attacks related to terrorist activities. This might include DDoS attacks on government networks, or stealing individuals' personal information to commit fraud (which is used to raise funds for the terrorists to commit their terrorist acts).


Cyber attacks
Cyber attacks usually involve organised criminal gangs that target individual and networked computers to extract personal and business information in order to commit financial fraud.

Source of Information : Hakin9 November 2010

2010 Storage Devices You Care The Most About

The cost of SSDs dropped rapidly over the course of 2009 and 2010, making an SSD feasible for almost any build where you want snappy responsiveness with boot-up and application loading. Of course, capacity vs. cost is still an issue, but many drives now offer enough space to handle Windows 7 and some of the games you want to play. For multimedia storage, HDDs are still ideal, and you’ll find a number of 1TB and 2TB options on the market.


Winner: OCZ Vertex 2 (240GB)
$599.99; www.ocz.com
The OCZ Vertex 2 features the SandForce SF-1200 controller, which offers a significant performance increase over the Indilinx Barefoot, including a nearly 100MBps increase in the maximum write speed (from 180MBps to 275MBps). The 240GB OCZ Vertex 2 has a maximum read speed of 285MBps, and OCZ indicates that it can produce sustained write speeds of up to 250MBps.

The OCZ Vertex 2 includes native TRIM support, which erases blocks of NAND flash before they are re-written, so that the SSD doesn’t slow down over time. Note that you’ll need to be running Win7 or a version of Linux with kernel 2.6.33 to benefit from TRIM. The Vertex 2 can also be used with OCZ’s Toolbox Software, which lets you quickly update the Vertex 2’s firmware, securely erase the drive, and optimize the sectors to match the OS. OCZ also throws in a 3.5-inch desktop adapter bracket, so you don’t need to find any creative ways to stow the SSD in your build. The 240GB Vertex 2 is backed by a three-year warranty from OCZ and offers a 2-million hour MTBF.


First Runner-Up: Crucial RealSSD C300
$599.99; www.crucial.com
Crucial added a Marvell 88SS9174-BJP2 controller to provide native support for 6Gbps file transfers, and the 256GB RealSSD C300 is built using 34nm Micron ONFI 2.1-compliant MLC NAND flash memory. Crucial indicates that the 256GB RealSSD C300 offers read speeds up to 355MBps and write speeds up to 215MBps. Even though it was introduced in April, the RealSSD C300 is still one of the fastest consumer SSDs around.

There’s a 128MB built-in cache, where Crucial stores mapping levels and access history for quick performance with recent tasks. Of course, you’ll want to pair the Crucial RealSSD C300 with a motherboard that features a 6Gbps SATA interface to take full advantage of the SSD’s speed. Otherwise, the RealSSD C300’s maximum read speed drops to 265MBps. Crucial offers a three-year limited warranty on the SSD and indicates it should last 1.2 million hours.


Second Runner-Up: 3TB Western Digital Caviar Green
$239.99; www.wdc.com
Due to the 2.2TB limit imposed by the master boot record partitioning scheme used in our PCs, manufacturers have been hesitant to create drives greater than 2TB. With the 3TB Caviar Green, Western Digital includes an AHCI-compliant host bus adapter that enables the operating system to address all the space on the hard drive. The 3TB Caviar Green is built using a 750GB-per-platter areal density and Advanced Format technology. The latter increases media format efficiency, so the drive can read and write more data in the same amount of space than it could before.

The Caviar Green line offers Western Digital’s IntelliPower technology, which optimizes spin speed, transfer rate, and caching to save power. Western Digital indicates that the 3TB can hold up to 1,150 hours of DVD-quality video or up to 750,000 MP3s. The drive is covered by a three-year limited warranty.


Source of Information : Computer Power User (CPU) January 2011

Server Selection Criteria

Below we list (non-exhaustively) selection criteria appropriate for an IT professional to use in choosing a server, followed by a discussion of each criterion in turn:

• Availability of applications and development tools
• System availability
• Data integrity
• Security
• Performance
• Scalability
• Price
• Support for the client/server model
• Architectural maturity
• Investment risk

Note that the list is given in no particular order; when selecting a server, the individual needs of the business will drive the relative weighting of the criteria.

The availability of a vast catalogue of applications and development tools eases the construction of business solutions and is an indication that the server family under consideration has a strong and active marketplace. Such an active market has beneficial indirect effects: reduced prices because of volume and competitive effects; enhanced confidence in the continuity of the vendor and platform; widespread availability of experts in the applications and the tools. We should note in passing that the success of a platform is rarely an effect of the intrinsic merits of its technology (whether hardware or software); it is the marketplace that decides.

Since business activities rely more and more on their data processing resources, three criteria of increasing importance are availability, data integrity, and security. By availability, we mean the ability of the system to perform its function every time that it is requested; integrity corresponds to the system’s ability to manage its data without damage to the data; and security is the ability to ensure that there is no unauthorized access to that data.

The performance of a system is an important characteristic, but it is difficult to characterize with a single number. The best way of capturing performance is to identify what the system must do to fulfill its mission and then identify the values that allow one to characterize performance for that mission. If the analysis of the workload shows that one or more standard benchmarks is a close approximation to the workload, then one may use the relative performance of different systems on the relevant benchmark(s) to provide a good predictor of performance on the company's workload. If no benchmarks fit well, then one can compare performance only by running tests on the real workload. Since a system's needs often change over time, it is generally a good idea to leave some "headroom" in available performance when choosing a system.

Scalability—the ability of a system to match the dimensions of the problem it has to manage—is closely related to performance. But given the choice of two systems, one may prefer a less powerful system with good scalability over a more powerful system with less growth potential.

Over the course of time, the ways of comparing price have changed, shifting from a simple price comparison of the systems under consideration (both hardware and software) with support and maintenance costs added in, to a “Total Cost of Ownership” approach, which also integrates internal costs, such as the staff needed to operate the system, system management and user support costs, the economic impact on the company were the system to become unavailable, etc.

Support for the client/server model depends on the availability of appropriate middleware; some platforms are better served than others in this area. For example, standard systems present a significant potential market, and so attract the attention of middleware publishers.

Although the various concepts used in server architectures are widely known, the development of a new architecture often takes longer than its developers originally expect. The various aspects of maturity are powerful forces; one effect is the difficulty innovative architectures often meet in going to market.

Installing a server and the necessary applications represents a major investment, in terms of both immediate expenditure (acquisition costs of hardware and software) and associated costs (training, implementing new applications, etc.). Given the usual size of these costs, the investment risk is an important criterion. It is particularly important that a server vendor stay in existence for as long as its customers are using its products, and so this issue presents a particular problem to start-ups, since so many collapse while young. This difficulty can be less severe for suppliers of products like storage subsystems, since these have extremely well defined interfaces with their host systems—so, if the supplier collapses, all is not lost—alternative products from other vendors can be deployed in the same way and with the same connections as the products from the ill-fated start-up. In general, however, the subsystems purchased from the defunct vendor must be replaced, since such subsystems will, in general, only run software provided by their vendor.

Source of Information :  Elsevier Server Architectures 2005

The concept of a cluster has been known for a long time: since the end of the 1970s for Tandem and since 1983 for Digital. UNIX clusters appeared at the very start of the 1990s, while Windows 2000 clusters did not arrive until the late 1990s. UNIX clusters offer not only excellent functionality but also extremely good stability, acquired over years of experience. For Microsoft, this experience is still in its very early stages, so one may judge that UNIX clusters have a solid advantage today.

The problems faced by UNIX clusters are due to their diversity; each cluster solution is specific to a UNIX vendor, and each vendor offers its own software interfaces. Each UNIX cluster vendor therefore has to support the development of its own cluster extensions and, at the same time, any third-party software vendor who wants to take advantage of the high availability and scalability of the various cluster solutions must somehow handle this wide diversity.

Converging on a smaller number of UNIX versions, perhaps triggered by the introduction of IA-64, could remedy these diversity problems, unless the systems vendors decide to reserve the cluster extensions for use only on their own systems. Because Microsoft's cluster solution is an integration of proprietary interfaces and implementation, it answers just one part of the problem. The key problem remains unanswered: application software must still be qualified on each system platform.

Finally, we should note that Linux-based cluster solutions could offer a threat to Windows-based systems similar to the threat they offer to the fragmented and divided UNIX market.

Source of Information :  Elsevier Server Architectures 2005

Hardware cost reductions should continue. At least at the entry level, we can distinguish two pricing models: constant performance with reducing price, and constant price with increasing performance. It is better to make comparisons using the price/performance ratio. It should be noted that the increasing investment necessary to build such systems will reduce the number of vendors, and that such a reduction in competitors could diminish the rate of cost reduction.

As for software, the logic of a volume market applies just as it does for hardware, and the phenomenon of free software will tend to stabilize or even reduce the cost of commercial offerings.

Source of Information :  Elsevier Server Architectures 2005

CHINA HIJACKS INTERNET TRAFFIC

China Telecom Corp., the nation’s biggest fixed-line phone carrier, denied it hijacked Internet traffic (by exploiting Border Gateway Protocol routing tables) after a U.S. government report said the company wrongly diverted international Web data. China Telecom sent erroneous Internet-traffic instructions that briefly diverted about 15 percent of global Web traffic through servers in China, the U.S.-China Economic and Security Review Commission said in a report yesterday. The incident, which lasted about 18 minutes on April 8, affected U.S. military and government sites, as well as sites run by companies including Yahoo! Inc. and Microsoft Corp., the report said.

XtraCharts Main Features

The XtraCharts Suite makes 2D and 3D charting easier than ever. It includes multiple palettes to automatically color your series, so that appearances are never a concern. It can even show a preview of a chart before you've supplied any data for it. You can access any element's settings by simply clicking that element. And there are so many additional automatic adjustments, that it takes little effort to create a smart, professional looking chart.

With the XtraCharts Suite you can visualize data stored almost anywhere - from a database to a collection. You can even supply point data directly to a chart which is in unbound mode. And the XtraCharts Suite is extremely flexible, not only with the data it displays, but also in the ways you can output that data. Using the Swift Plot, you can easily create a lightweight chart for quick processing of very large amounts of data points. Our charting engine allows you to display charts not only on Windows Forms, but also on Web Sites, in reports created with the XtraReports Suite, or even print them directly.


How Else Can a Daybook Help When Debugging?

As well as maintaining a record of your experiments, a daybook can also be useful for the following:

• Writing out hypotheses. Getting things onto paper can help identify flaws in assumptions, especially when the hypothesis is complex.

• Keeping track of details such as stack traces, argument values, and variable names. Not only does this help with finding things again, but it also helps you communicate with colleagues when explaining the problem, avoiding the need to rely upon memory.

• Keeping a list of ideas to try. Often you will notice something else you want to investigate, or a possible follow up experiment will occur to you, but you don’t want to abandon the current experiment to pursue it. A “to-do” list ensures that you don’t forget to come back to it later.

• Doodling when you need to take your mind off the problem.


Source of Information : Paul Butcher - Debug It! Find, Repair, and Prevent Bugs in Your Code

AxCrypt 1.7.2126

Axantum
Software AB
www.axantum.com
Free

TrueCrypt is to AxCrypt as a forklift is to a pair of tweezers: TrueCrypt encrypts whole drives and partitions, while AxCrypt simply encrypts single files with 128-bit AES encryption. We know this seems like a simple job, and that there are lots of programs that can do it, so what’s so special about AxCrypt? The key is simplicity. AxCrypt installs in Windows quickly and easily, and there’s no AxCrypt GUI; it works completely from Explorer. Just right-click a file to encrypt it. AxCrypt can also make self-decrypting files if you need to pass files to people who don’t have AxCrypt. Otherwise, when you encounter an encrypted file, double-clicking it causes AxCrypt to automatically decrypt the file and open the program that can edit the file (such as Word for an encrypted DOCX file). When you save and close the file, AxCrypt automatically re-encrypts it.

Source of Information : CPU Computer Power User November 2010

Partition recovery just got easier with EASUS Software’s Partition Master 6.1.1. New to 6.1.1 is a recover partition function that ensures data security and salvages data lost or damaged due to partition failure. The recovery function operates in manual or automatic mode and can be customized quickly through a simple setup wizard. With the feature enabled, partitions can be undeleted, recovered, and repaired, and the root directory can be previewed to ensure recovery of the proper partition. Partition Master 6.1.1 also allows quick and easy partition duplication, movement, creation, and deletion. EASUS Partition Master 6.1.1 comes in five editions: Home for 32-bit home users, Professional for 64-bit machines, Server for Windows Server users, Unlimited for unlimited use within a company, and Technician for IT professionals. Pricing ranges from free for the Home version to US$199 for the Windows Server version. Currently, a Web special knocks US$40 off the Server edition price. Visit http://partition-tool.com/ for details.


What Has Changed Since Windows Vista?

You might have already upgraded your Windows application to Windows Vista, or perhaps you are happy with what Windows XP gives you. Either way, we understand that you might be hesitant about possibly having to handle breaking changes in the operating system.

As a developer, all you really want is to be able to build applications that meet certain requirements, while maintaining application compatibility so that the application doesn’t break on the new operating system. At the same time, you want to provide end users with a work environment they understand and feel comfortable navigating.

Helping you to do this is one of the core tenets the development team at Microsoft followed when building Windows 7. Although Windows 7 is built on the foundation laid with Windows Vista, the operating system fundamentals have been improved and a rock-solid infrastructure has been created that enables you to provide compelling user experiences.

Hundreds of changes have been made to the underlying infrastructure in Windows 7, and it’s important to note that these are not breaking changes. These are changes that were made to the operating system to improve its performance and reliability. Windows 7 has a much smaller disk footprint than Windows Vista, it consumes significantly less memory, it consumes fewer CPU resources when idle, it performs less I/O activity, and, above all, the system consumes less power. At the same time, Windows 7 is much more responsive to user input and delivers better performance. There are no disturbances in the user interface, and user interactions with the system—whether it is handling input or showing the Start menu or taskbar—are immediate.

Windows 7 loads faster than ever, and it can scale up to 256 processors. This might sound like a large number, but keep in mind that Windows 7 needs to be in service for the next 5 to 8 years, and it’s more than likely that we’ll see such multicore desktop systems at some time during that period. Changes were made to the internal threading dispatchers, removing locks from critical resources to enable much faster context switching. All of that (and much more) was done while minimizing breaking changes and keeping a focus on stability, reliability, and performance. And most importantly, you will not be disturbed with changes that force you to spend time and money on nonfunctional features.

Microsoft also understands that for you to be a successful Windows developer, the operating system needs to support your efforts and include built-in tools and technologies that will help boost your productivity. Windows 7 ships with a large number of development tools and frameworks to assist with your development efforts and increase ease of deployment. Technologies such as .NET 3.5 Service Pack 1 (SP1), Windows PowerShell 2, MSI 5.0, the Native Web Services API, and a new Ribbon Framework are just some examples of Windows 7's built-in technologies.

This is your opportunity to create new and exciting user experiences using all the great features that Windows 7 has to offer developers.

Source of Information : Microsoft Press - Introducing Windows 7 for Developers

C# LINQ Keywords - The orderby Clause

The orderby clause is used to sort the sequence of results in a query. Following the orderby keyword is the item you want to sort by, which is commonly some property of the range variable. You can sort in either ascending or descending order, and if you don’t specify that with either the ascending or descending keyword, ascending is the default order. Following the orderby clause, you can have an unlimited set of subsorts simply by separating each sort item with a comma, as demonstrated here:

<pre class="brush:csharp">
using System;
using System.Linq;
using System.Collections.Generic;

public class Employee
{
public string LastName { get; set; }
public string FirstName { get; set; }
public string Nationality { get; set; }
}

public class OrderByExample
{
static void Main() {
var employees = new List<Employee>() {
new Employee {
LastName = "Glasser", FirstName = "Ed",
Nationality = "American"
},
new Employee {
LastName = "Pupkin", FirstName = "Vasya",
Nationality = "Russian"
},
new Employee {
LastName = "Smails", FirstName = "Spaulding",
Nationality = "Irish"
},
new Employee {
LastName = "Ivanov", FirstName = "Ivan",
Nationality = "Russian"
}
};

var query = from emp in employees
orderby emp.Nationality,
emp.LastName descending,
emp.FirstName descending
select emp;
foreach( var item in query ) {
Console.WriteLine( "{0},\t{1},\t{2}",
item.LastName,
item.FirstName,
item.Nationality );
}
}
}
</pre>

Notice that because the select clause simply returns the range variable, this whole query expression is nothing more than a sort operation. But it sure is a convenient way to sort things in C#. In this example, I sort first by Nationality in ascending order, then the second expression in the orderby clause sorts the results of each nationality group by LastName in descending order, and then each of those groups is sorted by FirstName in descending order.

At compile time, the compiler translates the first expression in the orderby clause into a call to the OrderBy standard query operator extension method. Any subsequent secondary sort expressions are translated into chained ThenBy extension method calls. If orderby is used with the descending keyword, the generated code uses OrderByDescending and ThenByDescending respectively.
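
For illustration, here is a hedged sketch of roughly what that translation looks like for the query above (the exact code the compiler emits may differ):
<pre class="brush:csharp">
// The first orderby key becomes OrderBy; each subsequent key marked
// descending becomes a chained ThenByDescending call.
var sorted = employees
    .OrderBy( emp => emp.Nationality )
    .ThenByDescending( emp => emp.LastName )
    .ThenByDescending( emp => emp.FirstName );
</pre>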

Source Of Information : Apress Accelerated C Sharp 2010

C# LINQ Keywords - The where Clause and Filters

Following one or more from clause generators, and any join clauses, you typically place one or more filter clauses. Filters consist of the where keyword followed by a predicate expression. The where clause is translated into a call to the Where extension method, and the predicate is passed to the Where method as a lambda expression. Calls to Enumerable.Where, which are used if you are performing a query on an IEnumerable type, convert the lambda expression into a delegate. Conversely, calls to Queryable.Where, which are used if you perform a query on a collection via an IQueryable interface, convert the lambda expression into an expression tree.
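
As a small illustration (using the Employee type with a Nationality property from the orderby example, not code from the book), the same predicate can be captured either as a delegate, which is what Enumerable.Where accepts, or as an expression tree, which is what Queryable.Where accepts and can inspect and translate:
<pre class="brush:csharp">
using System;
using System.Linq.Expressions;

// The same filter written two ways: a compiled delegate vs. an
// expression tree describing the lambda.
class WherePredicates
{
    static readonly Func<Employee, bool> AsDelegate =
        emp => emp.Nationality == "Russian";

    static readonly Expression<Func<Employee, bool>> AsExpressionTree =
        emp => emp.Nationality == "Russian";
}
</pre>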

Source Of Information : Apress Accelerated C Sharp 2010

C# LINQ Keywords - The join Clause

Following the from clause, you might have a join clause used to correlate data from two separate sources. Join operations are not typically needed in environments where objects are linked via hierarchies and other associative relationships. However, in the relational database world, there typically are no hard links between items in two separate collections, or tables, other than the equality between items within each record. That equality operation is defined by you when you create a join clause. Consider the following example:
<pre class="brush:csharp">
using System;
using System.Linq;
using System.Collections.Generic;

public class EmployeeId
{
public string Id { get; set; }
public string Name { get; set; }
}

public class EmployeeNationality
{
public string Id { get; set; }
public string Nationality { get; set; }
}

public class JoinExample
{
static void Main() {
// Build employee collection
var employees = new List<EmployeeId>() {
new EmployeeId{ Id = "111-11-1111",
Name = "Ed Glasser" },
new EmployeeId{ Id = "222-22-2222",
Name = "Spaulding Smails" },
new EmployeeId{ Id = "333-33-3333",
Name = "Ivan Ivanov" },
new EmployeeId{ Id = "444-44-4444",
Name = "Vasya Pupkin" }
};

// Build nationality collection.
var empNationalities = new List<EmployeeNationality>() {
new EmployeeNationality{ Id = "111-11-1111",
Nationality = "American" },
new EmployeeNationality{ Id = "333-33-3333",
Nationality = "Russian" },
new EmployeeNationality{ Id = "222-22-2222",
Nationality = "Irish" },
new EmployeeNationality{ Id = "444-44-4444",
Nationality = "Russian" }
};

// Build query.
var query = from emp in employees
join n in empNationalities
on emp.Id equals n.Id
orderby n.Nationality descending
select new {
Id = emp.Id,
Name = emp.Name,
Nationality = n.Nationality
};

foreach( var person in query ) {
Console.WriteLine( "{0}, {1}, \t{2}",
person.Id,
person.Name,
person.Nationality );
}
}
}
</pre>
In this example, I have two collections. The first one contains just a collection of employees and their employee identification numbers. The second contains a collection of employee nationalities in which each employee is identified only by employee ID. To keep the example simple, every piece of data is a string. Now, I want a list of all employee names and their nationalities, and I want to sort the list by nationality in descending order. A join clause comes in handy here because there is no single data source that contains this information. But join lets us meld the information from the two data sources, and LINQ makes this a snap! In the query expression, note the join clause. For each item that the range variable emp references (that is, for each item in employees), it finds the item in the collection empNationalities (represented by the range variable n) where the Id is equivalent to the Id referenced by emp. Then, my projector clause, the select clause, takes data from both collections when building the result and projects that data into an anonymous type. Thus, the result of the query is a single collection where each item from both employees and empNationalities is melded into one. If you execute this example, the results are as shown here:
<pre class="brush:text">
333-33-3333, Ivan Ivanov, Russian
444-44-4444, Vasya Pupkin, Russian
222-22-2222, Spaulding Smails, Irish
111-11-1111, Ed Glasser, American
</pre>
When your query contains a join operation, the compiler converts it to a Join extension method call under the covers unless it is followed by an into clause. If the into clause is present, the compiler uses the GroupJoin extension method which also groups the results. For more information on the more
esoteric things you can do with join and into clauses, reference the MSDN documentation on LINQ or see Pro LINQ: Language Integrated Query in C# 2008 by Joseph C. Rattz, Jr. (Apress, 2007).

There’s no reason you cannot have multiple join clauses within the query to meld data from multiple different collections all at once. In the previous example, you might have a collection that represents languages spoken by each nation, and you could join each item from the empNationalities collection with the items in that language’s spoken collection. To do that, you would simply have one join clause following another.
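
As a quick, hedged illustration (reusing the employees and empNationalities collections from the example above, not code from the book), adding an into clause turns the join into a group join, which the compiler translates to GroupJoin rather than Join:
<pre class="brush:csharp">
// For each nationality record, collect the matching employees into a
// group; "members" is a sequence you can query further (here just counted).
var grouped = from n in empNationalities
              join emp in employees
                  on n.Id equals emp.Id into members
              select new {
                  Nationality = n.Nationality,
                  MatchCount = members.Count()
              };

foreach( var g in grouped ) {
    Console.WriteLine( "{0}: {1}", g.Nationality, g.MatchCount );
}
</pre>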

Source Of Information : Apress Accelerated C Sharp 2010

C# LINQ Keywords - The from Clause and Range Variables

Each query begins with a from clause. The from clause is a generator that also defines the range variable, which is a local variable of sorts used to represent each item of the input collection as the query expression is applied to it. The from clause is just like a foreach construct in the imperative programming style, and the range variable is identical in purpose to the iteration variable in the foreach statement. A query expression might contain more than one from clause. In that case, you have more than one range variable, and it’s analogous to having nested foreach clauses. The next example uses multiple from clauses to generate the multiplication table you might remember from grade school, albeit not in tabular format:
<pre class="brush:csharp">
using System;
using System.Linq;

public class MultTable
{
static void Main() {
var query = from x in Enumerable.Range(0,10)
from y in Enumerable.Range(0,10)
select new {
X = x,
Y = y,
Product = x * y
};
foreach( var item in query ) {
Console.WriteLine( "{0} * {1} = {2}",
item.X,
item.Y,
item.Product );
}
}
}
</pre>
Remember that LINQ expressions are compiled into strongly typed code. So in this example, what is the type of x and what is the type of y? The compiler infers the types of those two range variables based upon the type argument of the IEnumerable<T> interface returned by Range. Because Range returns a type of IEnumerable<int>, the type of x and y is int. Now, you might be wondering what happens if you want to apply a query expression to a collection that only supports the nongeneric IEnumerable interface. In those cases, you must explicitly specify the type of the range variable, as shown here:
<pre class="brush:csharp">
using System;
using System.Linq;
using System.Collections;

public class NonGenericLinq
{
static void Main() {
ArrayList numbers = new ArrayList();
numbers.Add( 1 );
numbers.Add( 2 );
var query = from int n in numbers
select n * 2;
foreach( var item in query ) {
Console.WriteLine( item );
}
}
}
</pre>
You can see where I am explicitly typing the range variable n to type int. At run time, a cast is performed, which could fail with an InvalidCastException. Therefore, it’s best to strive to use the generic, strongly typed IEnumerable<T> rather than IEnumerable so these sorts of errors are caught at compile time rather than run time.

As I’ve emphasized throughout this book, the compiler is your best friend. Use as many of its facilities as possible to catch coding errors at compile time rather than run time. Strongly typed languages such as C# rely upon the compiler to verify the integrity of the operations you perform on the types defined within the code. If you cast away the type and deal with general types such as System.Object rather than the true concrete types of the objects, you are throwing away one of the most powerful capabilities of the compiler. Then, if there is a type-based mistake in your code, and quality assurance does not catch it before it goes out the door, you can bet your customer will let you know about it, in the most abrupt way possible!

Source Of Information : Apress Accelerated C Sharp 2010

Extension Methods and Lambda Expressions Revisited

Before I break down the elements of a LINQ expression in more detail, I want to show you an alternate way of getting the work done. In fact, it’s more or less what the compiler is doing under the covers. The LINQ syntax is very foreign looking in a predominantly imperative language like C#. It’s easy to jump to the conclusion that the C# language underwent massive modifications in order to implement LINQ. Actually, the compiler simply transforms the LINQ expression into a series of extension method calls that accept lambda expressions.

If you look at the System.Linq namespace, you’ll see that there are two interesting static classes full of extension methods: Enumerable and Queryable. Enumerable defines a collection of generic extension methods usable on IEnumerable types, whereas Queryable defines the same collection of generic
extension methods usable on IQueryable types. If you look at the names of those extension methods, you’ll see they have names just like the clauses in query expressions. That’s no accident because the extension methods implement the standard query operators I mentioned in the previous section. In fact, the query expression in the previous example can be replaced with the following code:
<pre class="brush:csharp">
var query = employees
.Where( emp => emp.Salary > 100000 )
.OrderBy( emp => emp.LastName )
.OrderBy( emp => emp.FirstName )
.Select( emp => new {LastName = emp.LastName,
FirstName = emp.FirstName} );
</pre>
Notice that it is simply a chain of extension method calls on IEnumerable, which is implemented by employees. In fact, you could go a step further and flip the statement inside out by removing the extension method syntax and simply call them as static methods, as shown here:
<pre class="brush:csharp">
var query =
Enumerable.Select(
Enumerable.OrderBy(
Enumerable.OrderBy(
Enumerable.Where(
employees, emp => emp.Salary > 100000),
emp => emp.LastName ),
emp => emp.FirstName ),
emp => new {LastName = emp.LastName,
FirstName = emp.FirstName} );
</pre>
But why would you want to do such a thing? I merely show it here for illustration purposes so you know what is actually going on under the covers. Those who are really attached to C# 2.0 anonymous methods could even go one step further and replace the lambda expressions with anonymous methods. Needless to say, the Enumerable and Queryable extension methods are very useful even outside the context of LINQ. And as a matter of fact, some of the functionality provided by the extension methods does not have matching query keywords and therefore can only be used by invoking the extension methods directly.
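
For example (a small sketch against the employees collection used in the example above, not code from the book), operators such as Count, Take, and Any have no corresponding query keyword in C#, so you call the extension methods directly:
<pre class="brush:csharp">
// Direct calls to standard query operators that have no C# query keyword.
int highEarners = employees.Count( emp => emp.Salary > 100000 );
var firstTwo    = employees.Take( 2 );
bool anyExecs   = employees.Any( emp => emp.Salary > 500000 );
</pre>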


Source Of Information : Apress Accelerated C Sharp 2010

LINQ Query Expressions

At first glance, LINQ query expressions look a lot like SQL expressions. But make no mistake: LINQ is not SQL. For starters, LINQ is strongly typed. After all, C# is a strongly typed language, and therefore, so is LINQ. The language adds several new keywords for building query expressions. However, their implementation from the compiler standpoint is pretty simple. LINQ query expressions typically get translated into a chain of extension method calls on a sequence or collection. That set of extension methods is clearly defined, and they are called standard query operators.

This LINQ model is quite extensible. If the compiler merely translates query expressions into a series of extension method calls, it follows that you can provide your own implementations of those extension methods. In fact, that is the case. For example, the class System.Linq.Enumerable provides implementations of those methods for LINQ to Objects, whereas System.Linq.Queryable provides implementations of those methods for querying types that implement IQueryable<T> and are commonly used with LINQ to SQL.
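
To make that concrete, here is a hedged sketch (not the BCL implementation) of a hand-rolled Where operator for IEnumerable<T>; whichever compatible extension methods are in scope are the ones a query expression binds to:
<pre class="brush:csharp">
using System;
using System.Collections.Generic;

// A minimal, illustrative Where implementation. If this were the method
// in scope instead of System.Linq.Enumerable.Where, a "where" clause over
// an IEnumerable<T> would bind to it.
public static class MyQueryOperators
{
    public static IEnumerable<T> Where<T>( this IEnumerable<T> source,
                                           Func<T, bool> predicate ) {
        foreach( T item in source ) {
            if( predicate(item) ) {
                yield return item;
            }
        }
    }
}
</pre>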

Let’s jump right in and have a look at what queries look like. Consider the following example, in which I create a collection of Employee objects and then perform a simple query:
<pre class="brush:csharp">
using System;
using System.Linq;
using System.Collections.Generic;

public class Employee
{
public string FirstName { get; set; }
public string LastName { get; set; }
public Decimal Salary { get; set; }
public DateTime StartDate { get; set; }
}

public class SimpleQuery
{
static void Main() {
// Create our database of employees.
var employees = new List<Employee> {

new Employee {
FirstName = "Joe",
LastName = "Bob",
Salary = 94000,
StartDate = DateTime.Parse("1/4/1992") },
new Employee {
FirstName = "Jane",
LastName = "Doe",
Salary = 123000,
StartDate = DateTime.Parse("4/12/1998") },
new Employee {
FirstName = "Milton",
LastName = "Waddams",
Salary = 1000000,
StartDate = DateTime.Parse("12/3/1969") }
};

var query = from employee in employees
where employee.Salary > 100000
orderby employee.LastName, employee.FirstName
select new { LastName = employee.LastName,
FirstName = employee.FirstName };
Console.WriteLine( "Highly paid employees:" );
foreach( var item in query ) {
Console.WriteLine( "{0}, {1}",
item.LastName,
item.FirstName );
}
}
}
</pre>
First of all, you will need to import the System.Linq namespace, as I show in the following section titled "Standard Query Operators." In this example, the query expression is the from ... select statement assigned to the query variable. It’s quite shocking if it’s the first time you have seen a LINQ expression! After all, C# is a language that syntactically evolved from C++ and Java, and the LINQ syntax looks nothing like those languages.

Prior to the query expression, I created a simple list of Employee instances just to have some data to work with. Each query expression starts off with a from clause, which declares what’s called a range variable. The from clause in our example is very similar to a foreach statement in that it iterates over the employees collection and stores each item in the collection in the variable employee during each iteration. After the from clause, the query consists of a series of clauses in which we can use various query operators to filter the data represented by the range variable. In my example, I applied a where clause and an orderby clause, as you can see. Finally, the expression closes with select, which is a projection operator. When you perform a projection in the query expression, you are typically creating another collection of information, or a single piece of information, that is a transformed version of the collection iterated by the range variable. In the previous example, I wanted just the first and last names of the employees in my results.

Another thing to note is my use of anonymous types in the select clause. I wanted the query to create a transformation of the original data into a collection of structures, in which each instance contains a FirstName property, a LastName property, and nothing more. Sure, I could have defined such a structure prior to my query and made my select clause instantiate instances of that type, but doing so defeats some of the convenience and expressiveness of the LINQ query. And most importantly, as I’ll detail a little later in the section "The Virtues of Being Lazy," the query expression does not execute at the point the query variable is assigned. Instead, the query variable in this example implements IEnumerable<T>, and the subsequent use of foreach on the query variable produces the end result of the example.
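
As a small, hedged illustration of that laziness (reusing the employees list and Employee type from the example above, not code from the book), the query does not run when it is assigned, so changes made to the source collection before enumeration are reflected in the results:
<pre class="brush:csharp">
// Build the query; nothing is executed yet.
var highlyPaid = from emp in employees
                 where emp.Salary > 100000
                 select emp.LastName;

// Modify the source after the query variable is assigned.
employees.Add( new Employee {
    FirstName = "Bill", LastName = "Lumbergh",
    Salary = 150000, StartDate = DateTime.Parse("2/2/1985") } );

// Enumeration happens here, so "Lumbergh" appears in the output.
foreach( var name in highlyPaid ) {
    Console.WriteLine( name );
}
</pre>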

Building the query expression culminates in what’s called a query variable, which is query in this example. Notice that I reference it using an implicitly typed variable. After all, can you imagine what the type of query is? If you are so inclined, you can send the result of query.GetType() to the console, and you’ll see that the type is as shown here:
<pre class="brush:csharp">
System.Linq.Enumerable+<SelectIterator>d__b`2[Employee,
<>f__AnonymousType0`2[System.String,System.String]]
</pre>
For those of you familiar with SQL, the first thing you probably noticed is that the query is backward from what you are used to. In SQL, the select clause is normally the beginning of the expression. There are several reasons why the reversal makes sense in C#. One reason is so that Intellisense will work. In the example, if the select clause appeared first, Intellisense would have a hard time knowing which properties employee provides because it would not even know the type of employee yet.


Source Of Information : Apress Accelerated C Sharp 2010

LINQ: Language Integrated Query

C-style languages (including C#) are imperative in nature, meaning that the emphasis is placed on the state of the system, and changes are made to that state over time. Data acquisition languages such as SQL are functional in nature, meaning that the emphasis is placed on the operation and there is little or no mutable data used during the process. LINQ bridges the gap between the imperative programming style and the functional programming style. LINQ is a huge topic that deserves entire books devoted to it and what you can do with LINQ. There are several implementations of LINQ readily available: LINQ to Objects, LINQ to SQL, LINQ to Dataset, LINQ to Entities, and LINQ to XML. I will be focusing on LINQ to Objects because I’ll be able to get the LINQ message across without having to incorporate extra layers and technologies.

LINQ does a very good job of allowing the programmer to focus on the business logic while spending less time coding up the mundane plumbing that is normally associated with data access code. If you have experience building data-aware applications, think about how many times you have found yourself coding up the same type of boilerplate code over and over again. LINQ removes some of that burden.

Development for LINQ started some time ago at Microsoft and was born out of the efforts of Anders Hejlsberg and Peter Golde. The idea was to create a more natural and language-integrated way to access data from within a language such as C#. However, at the same time, it was undesirable to implement it in such a way that it would destabilize the implementation of the C# compiler and become too cumbersome for the language. As it turns out, it made sense to implement some building blocks in the language in order to provide the functionality and expressiveness of LINQ. Thus we have features like lambda expressions, anonymous types, extension methods, and implicitly typed variables. All are excellent features in themselves, but arguably were precipitated by LINQ.


Source Of Information : Apress Accelerated C Sharp 2010

Collection Initializers

C# 3.0 introduced a new abbreviated syntax for initializing collections, similar to the object initializer syntax shown in the section titled "Object Initializers." If the collection type instance you are initializing implements IEnumerable or IEnumerable<T> and contains a public Add method that accepts one parameter of the contained type, you can utilize this new syntax. Alternatively, your type could just implement ICollection<T> from the System.Collections.Generic namespace because it also implements IEnumerable<T>. The collection initializer syntax is shown in the following:
<pre class="brush:csharp">
using System;
using System.Collections.Generic;

public class Employee
{
public string Name { get; set; }
}

public class CollInitializerExample
{

static void Main() {
var developmentTeam = new List<Employee> {
new Employee { Name = "Michael Bolton" },
new Employee { Name = "Samir Nagheenanajar" },
new Employee { Name = "Peter Gibbons" }\
};
Console.WriteLine( "Development Team:" );
foreach( var employee in developmentTeam ) {
Console.WriteLine( "\t" + employee.Name );
}
}
}
</pre>
Under the covers the compiler generates a fair amount of code to help you out here. For each item in the collection initialization list, the compiler generates a call to the collection’s Add method. Notice that I have also used the new object initializer syntax to initialize each of the instances in the initializer list.

As I’ve mentioned, the collection type must implement ICollection<T> or implement IEnumerable<T> and a public Add method. If it does not, you will receive compile-time errors. Additionally, the collection must implement only one specialization of ICollection<T>; that is, it can only implement ICollection<T> for one type T. And finally, each item in the collection initialization list must be implicitly convertible to the type T.
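
Roughly speaking (a hedged sketch, not the exact generated code), the initializer in the example above expands into something like this:
<pre class="brush:csharp">
// One Add call per item in the collection initializer list; each item
// itself uses the object initializer syntax shown earlier.
var developmentTeam = new List<Employee>();
developmentTeam.Add( new Employee { Name = "Michael Bolton" } );
developmentTeam.Add( new Employee { Name = "Samir Nagheenanajar" } );
developmentTeam.Add( new Employee { Name = "Peter Gibbons" } );
</pre>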


Source Of Information : Apress Accelerated C Sharp 2010

Types That Produce Collections

I’ve already touched upon the fact that a collection’s contents can change while an enumerator is enumerating the collection. If the collection changes, it could invalidate the enumerator. In the following sections on iterators, I show how you can create an enumerator that locks access to the container while it is enumerating. Although that’s possible, it may not be the best thing to do from an efficiency standpoint. For example, what if it takes a long time to iterate over all of the items in the collection? The foreach loop could do some lengthy processing on each item, during which time anyone else could be blocked from modifying the collection.

In cases like these, it may make sense for the foreach loop to iterate over a copy of the collection rather than the original collection itself. If you decide to do this, you need to make sure you understand what a copy of the collection means. If the collection contains value types, then the copy is a deep copy, as long as the value types within don’t hold on to reference types internally. If the collection contains reference types, you need to decide if the copy of the collection must clone each of the contained items. Either way, it would be nice to have a design guideline to follow in order to know when to return a copy.

The current rule of thumb when returning collection types from within your types is to always return a copy of the collection from methods, and return a reference to the actual collection if accessed through a property on your type. Although this rule is not set in stone, and you’re in no way obligated to follow it, it does make some semantic sense. Methods tend to indicate that you’re performing some sort of operation on the type and you may expect results from that operation. On the other hand, property access tends to indicate that you need direct access to the state of the object itself. Therefore, this rule of thumb makes good semantic sense. In general, it makes sense to apply this same semantic separation to all properties and methods within your types.
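
A small sketch of that rule of thumb (illustrative only; the Department type and its members are invented for this example, and Employee is the type from the collection initializer section):
<pre class="brush:csharp">
using System.Collections.Generic;

public class Department
{
    private readonly List<Employee> staff = new List<Employee>();

    // Property access implies direct access to the object's state, so the
    // live collection is returned.
    public List<Employee> Staff {
        get { return staff; }
    }

    // A method implies an operation with results, so a snapshot copy is
    // returned and callers cannot disturb the internal list.
    public List<Employee> GetStaffSnapshot() {
        return new List<Employee>( staff );
    }
}
</pre>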


Source Of Information : Apress Accelerated C Sharp 2010

You’ve seen how you can use the C# foreach statement to conveniently iterate over a collection of objects, including a System.Array, ArrayList, List<T>, and so on. How does this work? The answer is that each collection that expects to work with foreach must implement the IEnumerable<T> or IEnumerable interface that foreach uses to obtain an object that knows how to enumerate, or iterate over, the items in the collection. The iterator object obtained from IEnumerable<T> must implement the IEnumerator<T> or IEnumerator interface. Generic collection types typically implement IEnumerable<T>, and the enumerator object implements IEnumerator<T>. IEnumerable<T> derives from IEnumerable, and IEnumerator<T> derives from IEnumerator. This allows you to use generic collections in places where nongeneric collections are used. Strictly speaking, your collection types are not required to implement enumerators, and users can iterate through the collection using a for loop if you provide an index operator by implementing IList<T>, for example. However, you won’t make many friends that way, and once I show you how easy it is to create enumerators using iterator blocks, you’ll see that it’s a piece of cake to implement IEnumerable<T> and IEnumerator<T>.

Many of you may already be familiar with the nongeneric enumerator interfaces and how to implement enumerators on your collection types. In the rest of this section, I’ll quickly go over the salient points of creating enumerators from scratch, and I’ll quickly transition to how to create enumerators the new and improved way using iterator blocks. If you’d like, you may skip to the next section on iterators. Or if you want a refresher on implementing enumerators, go ahead and read the rest of this section.

The IEnumerable<T> interface exists so that clients have a well-defined way to obtain an enumerator on the collection. The following code defines the IEnumerable<T> and IEnumerable interfaces:
<pre class="brush:csharp">
public interface IEnumerable<T> : IEnumerable
{
IEnumerator<T> GetEnumerator();
}

public interface IEnumerable
{
IEnumerator GetEnumerator();
}
</pre>
Since both interfaces implement GetEnumerator with the same overload signature (remember, the return value doesn’t take part in overload resolution), any collection that implements IEnumerable<T> needs to implement one of the GetEnumerator methods explicitly. It makes the most sense to implement the non-generic IEnumerable.GetEnumerator method explicitly to make the compiler happy.

The IEnumerator<T> and IEnumerator interfaces are shown here:
<pre class="brush:csharp">
public interface IEnumerator<T> : IEnumerator, IDisposable
{
T Current { get; }
}

public interface IEnumerator
{
object Current { get; }
bool MoveNext();
void Reset();
}
</pre>
Again, the two interfaces implement a member that has the same signature, which, in this case, is the Current property. When implementing IEnumerator<T>, you should implement IEnumerator.Current explicitly. Also, notice that IEnumerator<T> implements the IDisposable interface. Later, I’ll explain why this is a good thing.

Now I’m going to show you how to implement IEnumerable<T> and IEnumerator<T> for a homegrown collection type. Good teachers always show you how to do something the "hard way" before introducing you to the "easy way." I think this technique is useful because it forces you to understand what is happening under the covers. When you know what’s happening underneath, you’re more adept at dealing with the technicalities that may come from using the "easy way." Let’s look at an example of implementing IEnumerable<T> and IEnumerator<T> the hard way by introducing a home-grown collection of integers. I’ll show how to implement the generic versions, because that implies that you must also implement the nongeneric versions as well. In this example, I haven’t implemented ICollection<T> so as not to clutter the example, because I’m focusing on the enumeration interfaces:
<pre class="brush:csharp">
using System;
using System.Threading;
using System.Collections;
using System.Collections.Generic;

public class MyColl<T> : IEnumerable<T>
{
    public MyColl( T[] items ) {
        this.items = items;
    }

    public IEnumerator<T> GetEnumerator() {
        return new NestedEnumerator( this );
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }

    // The enumerator definition.
    class NestedEnumerator : IEnumerator<T>
    {
        public NestedEnumerator( MyColl<T> coll ) {
            Monitor.Enter( coll.items.SyncRoot );
            this.index = -1;
            this.coll = coll;
        }

        public T Current {
            get { return current; }
        }

        object IEnumerator.Current {
            get { return Current; }
        }

        public bool MoveNext() {
            if( ++index >= coll.items.Length ) {
                return false;
            } else {
                current = coll.items[index];
                return true;
            }
        }

        public void Reset() {
            // Position the enumerator before the first element again.
            current = default(T);
            index = -1;
        }

        public void Dispose() {
            try {
                current = default(T);
                index = coll.items.Length;
            }
            finally {
                Monitor.Exit( coll.items.SyncRoot );
            }
        }

        private MyColl<T> coll;
        private T current;
        private int index;
    }

    private T[] items;
}

public class EntryPoint
{
    static void Main() {
        MyColl<int> integers =
            new MyColl<int>( new int[] {1, 2, 3, 4} );
        foreach( int n in integers ) {
            Console.WriteLine( n );
        }
    }
}
</pre>
This example initializes the internal array within MyColl<T> with a canned set of integers, so that the enumerator will have some data to play with. Of course, a real container should implement ICollection<T> to allow you to populate the items in the collection dynamically. The foreach statement expands into code that obtains an enumerator by calling the GetEnumerator method on the IEnumerable<T> interface. The compiler is smart enough to use IEnumerable<T>.GetEnumerator rather than IEnumerable.GetEnumerator in this case. Once it has the enumerator, it starts a loop, where it first calls MoveNext and then initializes the variable n with the value returned from Current. If the loop contains no other exit paths, it continues until MoveNext returns false. At that point, the enumerator has finished enumerating the collection, and you must call Reset on it in order to use it again.
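
To make that expansion concrete, here is a rough, hand-written equivalent of the foreach loop in Main. This is a sketch rather than the literal compiler output; the compiler uses hidden temporaries and a try/finally block that calls Dispose, which is exactly what the using block expresses here:
<pre class="brush:csharp">
// Approximate expansion of: foreach( int n in integers ) { ... }
using( IEnumerator<int> enumerator = integers.GetEnumerator() ) {
    while( enumerator.MoveNext() ) {
        int n = enumerator.Current;
        Console.WriteLine( n );
    }
}
</pre>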

Even though you could create and use an enumerator explicitly, I recommend that you use the foreach construct instead. You have less code to write, which means fewer opportunities to introduce inadvertent bugs. Of course, you might have good reasons to manipulate the enumerators directly. For example, your enumerator could implement special methods specific to your concrete enumerator type that you need to call while enumerating collections. If you must manipulate an enumerator directly, be sure to always do it inside a using block, because IEnumerator<T> implements IDisposable.

Notice that there is no synchronization built into enumerators by default. Therefore, one thread could enumerate over a collection, while another thread modifies it. If the collection is modified while an enumerator is referencing it, the enumerator is semantically invalid, and subsequent use could produce undefined behavior. If you must preserve integrity within such situations, then you may want your enumerator to lock the collection via the object provided by the SyncRoot property. The obvious place to obtain the lock would be in the constructor for the enumerator. However, you must also release the lock at some point. You already know that in order to provide such deterministic cleanup, you must implement the IDisposable interface. That’s exactly one reason why IEnumerator<T> implements the IDisposable interface. Moreover, the code generated by a foreach statement creates a try/finally block under the covers that calls Dispose on the enumerator within the finally block. You can see the technique in action in my previous example.


Source Of Information : Apress Accelerated C Sharp 2010

Collection Types - Efficiency

When given a choice, you should always prefer the generic collection types over the nongeneric versions because of added type safety and higher efficiency. Let’s consider the efficiency standpoint a little more closely. When containing value types, the generic types avoid any unnecessary boxing and unboxing. Boxing is definitely a much more expensive operation than unboxing, because boxing requires a heap allocation but an unboxing operation doesn’t. Rico Mariani pinpoints many other efficiency bottlenecks in his blog, Rico Mariani’s Performance Tidbits. He indicates that the development teams spent a lot of time focusing specifically on performance issues and simplifying things to make them better. One excellent example that he provides illustrates how List<T> is remarkably faster than ArrayList when used in many foreach iterations. However, the speed is not because of the obvious boxing/unboxing reasons, but rather because ArrayList makes gratuitous use of virtual methods, especially during enumeration. ArrayList.GetEnumerator is virtual, and the nested enumerator type ArrayListEnumeratorSimple also implements the MoveNext method and the Current property virtually. That adds up to many costly virtual calls during enumeration. Unless you’re enumerating an ArrayList like a crazed demon, you won’t notice this performance penalty, but it just goes to show how much attention the BCL development team has been putting on efficiency lately.

This is a great example of why you want to analyze your class designs clearly to ensure that you’re making your classes inheritable for a good reason. Don’t make a method virtual unless you’re positive someone will need to override it, and if you do, make sure you use the NVI pattern. It is my firm belief that you should tend toward creating sealed classes, unless you’re absolutely sure that there is a good reason why people would want to inherit from your class. If you can’t think of a reason why they would want to, don’t leave it unsealed just because you think someone may come up with a good reason in the future. If you don’t come up with a good reason, then it’s unlikely that you created your class with inheritance in mind, and it may not work as expected for whatever derives from your class. Inheritability should be a conscious decision and not a subconscious one.

There is one caveat to everything mentioned so far: gratuitous use of generics, or any feature for that matter, without knowing the ramifications is never good. Whenever a fully constructed type is created, the runtime must generate the code for that type in memory. Also, fully constructed types created from generic types with static fields will each get their own copy of the static fields, and they’ll each get their own version of the static constructor. So, if the generic type contains a static field like this:
<pre class="brush:csharp">
public class MyGeneric<T>
{
    public static int staticField;
}
</pre>
then MyGeneric<int>.staticField and MyGeneric<long>.staticField will both reference different storage locations.
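
Here is a quick demonstration of that point, assuming the MyGeneric<T> definition above; the Demo class is purely illustrative:
<pre class="brush:csharp">
using System;

public class Demo
{
    static void Main() {
        // Each fully constructed type gets its own copy of the static field.
        MyGeneric<int>.staticField = 42;
        MyGeneric<long>.staticField = 7;

        Console.WriteLine( MyGeneric<int>.staticField );   // prints 42
        Console.WriteLine( MyGeneric<long>.staticField );  // prints 7
    }
}
</pre>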

The moral of the story is that you must consider the engineering trade-off. Although generics help avoid boxing and generally create more efficient code, they can also increase the size of your application’s working set. If in doubt, measure the results using performance-analysis tools to determine the proper route to take.

Even if your class derives from a class that uses virtual methods, it will be more efficient if you declare it sealed, because the compiler can then call those virtual methods nonvirtually when calling through a reference to the derived type.

Source Of Information : Apress Accelerated C Sharp 2010

Collection Types - System.Collections.ObjectModel

For those of you who need to define your own collection types, you’ll find the types defined in the System.Collections.ObjectModel namespace most useful. In fact, you should derive your implementations from the types in this namespace, if at all possible. This namespace contains only five types, and the fact that it exists has been the source of some controversy. There were two main reasons these types were broken out into their own namespace. First, the Visual Basic environment already provides a Collection type, defined in a namespace it imports by default, and the Visual Basic team was concerned that VB users could become confused by seeing two types with similar names and drastically different behaviors popping up in IntelliSense. Second, the Base Class Libraries (BCL) team thought that users would rarely need the types in this namespace. Whether that is true will be shown over time. My opinion is that these types are extremely useful for writing libraries or for code consumed by others. One of Microsoft’s guidelines even suggests that you consider creating a subclass of these types when exposing collections, even if only to provide a richer type name describing the collection and an easily accessible extensibility point.

These types are extremely useful if you’re defining collection types of your own. You can derive your type from Collection<T> easily in order to get default collection behavior, including implementations of ICollection<T>, IList<T>, and IEnumerable<T>. Collection<T> also implements the nongeneric interfaces ICollection, IList, and IEnumerable. However, you may have to cast the instance to one of these interfaces explicitly to access their properties and methods, because many of them are implemented explicitly. Moreover, the Collection<T> type uses the NVI pattern to provide the derived type with a set of protected virtual methods that you can override. I won’t list the entire public interface to Collection<T> here, because you can find the details in the MSDN documentation. However, the protected virtual methods that you may override are shown in the following code:

<pre class="brush:csharp">
public class Collection<T> : ICollection<T>, IList<T>, IEnumerable<T>,
                             ICollection, IList, IEnumerable
{
    ...
    protected virtual void ClearItems();
    protected virtual void InsertItem( int index, T item );
    protected virtual void RemoveItem( int index );
    protected virtual void SetItem( int index, T item );
    ...
}
</pre>

You cannot modify the storage location of the collection by overriding these methods. Collection<T> manages the storage of the items, and the items are held internally through a private field of type IList<T>. However, you can override these methods to manage extra information triggered by these operations. Just be sure to call through to the base class versions in your overrides.
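
As a minimal sketch of how you might use these hooks (the LoggingCollection<T> name is mine, purely for illustration), the following type reacts to insertions while leaving the actual storage to Collection<T>:
<pre class="brush:csharp">
using System;
using System.Collections.ObjectModel;

// Illustrative only: logs insertions, then defers storage to the base class.
public class LoggingCollection<T> : Collection<T>
{
    protected override void InsertItem( int index, T item ) {
        Console.WriteLine( "Inserting {0} at index {1}", item, index );

        // Always call through to the base class version so the item
        // actually ends up in the underlying IList<T>.
        base.InsertItem( index, item );
    }
}
</pre>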

Finally, the Collection<T> type offers two constructors: one creates an empty instance, and the other accepts an IList<T>. The constructor copies the passed-in contents of the IList<T> instance into the new collection in the order that they are provided by the enumerator returned from IList<T>.GetEnumerator. This ordering is important to note, as you’ll see a way to control it in the following section on enumerators and iterator blocks. The implementation of the source list’s enumerator can do such things as reverse the order of the items as they’re put into the collection, simply by providing a proper enumerator implementation. Personally, I believe there should be more constructors on Collection<T> that accept an interface of type IEnumerator<T> and IEnumerable<T> in order to provide more flexible ways to fill a collection. You can solve this problem by introducing the extra constructors into a type that derives from Collection<T>, as I’ve shown here:

<pre class="brush:csharp">
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class MyCollection<T> : Collection<T>
{
    public MyCollection() : base() {
    }

    public MyCollection( IList<T> list )
        : base(list) { }

    public MyCollection( IEnumerable<T> enumerable )
        : base() {
        foreach( T item in enumerable ) {
            this.Add( item );
        }
    }

    public MyCollection( IEnumerator<T> enumerator )
        : base() {
        while( enumerator.MoveNext() ) {
            this.Add( enumerator.Current );
        }
    }
}

public class EntryPoint
{
    static void Main() {
        MyCollection<int> coll =
            new MyCollection<int>( GenerateNumbers() );

        foreach( int n in coll ) {
            Console.WriteLine( n );
        }
    }

    static IEnumerable<int> GenerateNumbers() {
        for( int i = 4; i >= 0; --i ) {
            yield return i;
        }
    }
}
</pre>

In Main, you can see the instance of MyCollection<int> created by passing in the IEnumerable<int> returned from the GenerateNumbers method. If the yield keyword in GenerateNumbers looks foreign to you, that may be because it’s a feature added in C# 2.0. I’ll explain this keyword a little later on in this chapter. Essentially, it defines what’s called an iterator block, which creates a compiler-generated enumerator from the code. After creating a MyCollection<T> constructed type, you can still hold on to it and use it solely through a Collection<T> reference; after all, MyCollection<T> is-a Collection<T>. Incidentally, I didn’t bother creating constructors that accept the nongeneric IEnumerable and IEnumerator, simply because I want to favor stronger type safety.

You may have noticed the existence of List<T> in the System.Collections.Generic namespace. It would be tempting to use List<T> in your applications whenever you need to provide a generic list type to consumers. However, instead of using List<T>, consider Collection<T>. List<T> doesn’t implement the protected virtual methods that Collection<T> implements; therefore, if you derive your list type from List<T>, your derived type has no way to respond when modifications are made to the list. On the other hand, List<T> serves as a great tool when you need to embed a raw, list-like storage implementation within a type, because it is devoid of the virtual method calls that Collection<T> makes and is more efficient as a result.

Another useful type within the System.Collections.ObjectModel namespace is ReadOnlyCollection<T>, which is a wrapper you can use to implement read-only collections. Since the C# language lacks any notion of using the const keyword for const-correctness as in C++, it is essential to create immutable types when necessary and pass those to methods in lieu of const parameters. The constructor for ReadOnlyCollection<T> accepts an IList<T> parameter, so you can use a ReadOnlyCollection<T> to wrap any type that implements IList<T>, including Collection<T>. Naturally, if users access the ICollection<T>.IsReadOnly property, the answer will be true. Any time users call a modifying method such as ICollection<T>.Clear, an exception of type NotSupportedException will be thrown. Moreover, in order to call the modifying methods at all, the ReadOnlyCollection<T> reference must be cast to the interface containing the method, because ReadOnlyCollection<T> implements all modifying methods explicitly. The biggest benefit of that explicit implementation is that accidental attempts to modify the collection tend to surface at compile time rather than at run time.
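
Here is a small sketch of ReadOnlyCollection<T> in action; the class and variable names are mine, but the behavior is the one described above:
<pre class="brush:csharp">
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class ReadOnlyExample
{
    static void Main() {
        List<int> numbers = new List<int> { 1, 2, 3 };
        ReadOnlyCollection<int> view = new ReadOnlyCollection<int>( numbers );

        // IsReadOnly is implemented explicitly, so cast to the interface first.
        Console.WriteLine( ((ICollection<int>) view).IsReadOnly );   // True

        // The modifying methods are also implemented explicitly, and calling
        // one of them throws NotSupportedException at run time.
        try {
            ((ICollection<int>) view).Clear();
        }
        catch( NotSupportedException ) {
            Console.WriteLine( "The collection is read-only." );
        }
    }
}
</pre>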

Source Of Information : Apress Accelerated C Sharp 2010

Collection Types - Sets

The .NET 3.5 Framework introduced yet another useful collection class, known as HashSet<T>, which is defined in the System.Collections.Generic namespace. HashSet<T> implements the typical set operations that you would expect. For example, you can call the IntersectWith method to modify the current set so that it contains the intersection of the current items and the items in the given IEnumerable<T>. Conversely, UnionWith modifies the current set to contain the union of the two sets. Other useful methods include IsSubsetOf, IsSupersetOf, ExceptWith, SymmetricExceptWith, and Contains, and these are just a few of the methods available for sets.

As is typical with set operations, you can only add unique values to instances of HashSet. For example, if you have already added the values 1, 2, and 3 to a HashSet<int> instance, then you cannot add another integer corresponding to one of those values. This is the reason the Add method returns a Boolean indicating whether the operation succeeded or not. It would be inefficient to throw an exception in such cases, so the result is indicated via the return value from Add.

Notice that the various set operation methods implemented by HashSet accept parameters of type IEnumerable<T>. This is very handy because it allows you to use any collection type as the parameter to these methods rather than only HashSet instances.
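
A short sketch ties these points together; the names here are mine, but the calls are standard HashSet<T> members:
<pre class="brush:csharp">
using System;
using System.Collections.Generic;

public class SetExample
{
    static void Main() {
        HashSet<int> set = new HashSet<int> { 1, 2, 3 };

        Console.WriteLine( set.Add(4) );    // True  - 4 was not yet in the set
        Console.WriteLine( set.Add(2) );    // False - 2 is already there

        // The set-operation methods take any IEnumerable<int>,
        // so a plain array works just as well as another HashSet<int>.
        set.IntersectWith( new int[] { 2, 3, 4, 5 } );

        foreach( int n in set ) {
            Console.WriteLine( n );         // 2, 3, 4 (order not guaranteed)
        }
    }
}
</pre>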

Source Of Information : Apress Accelerated C Sharp 2010

Collection Types - Dictionaries

The .NET 2.0 Framework introduced the IDictionary<TKey, TValue> type as a generic and thus strongly typed counterpart to IDictionary. As usual, concrete types that implement IDictionary<TKey, TValue> should implement IDictionary as well. There is a lot of overlap, and the generic interface declares more type-safe versions of some properties and methods declared in IDictionary. However, there is also a new method available on IDictionary<TKey, TValue> called TryGetValue, which you can use to attempt to get a value based on the given key. The method returns the value through an out parameter, and the actual return value from the method indicates whether the item was in the dictionary. Although you can achieve the same thing using the index operator and catching the KeyNotFoundException when the item is not present, it is more efficient to avoid the exception when there is a good chance the item won’t be there. Using exceptions for control flow is a practice to avoid for two reasons. First, it is inefficient, because exceptions are expensive. Second, it trivializes the fact that an exception is a truly exceptional event; when using exceptions for control flow, you’re using them to handle an expected event. You’ll find more cases of this Try... method call pattern throughout the .NET Framework, because the .NET team made a concerted effort to avoid efficiency bottlenecks such as these.
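
The following sketch contrasts the two approaches; the dictionary contents are made up for illustration:
<pre class="brush:csharp">
using System;
using System.Collections.Generic;

public class DictionaryExample
{
    static void Main() {
        Dictionary<string, int> ages = new Dictionary<string, int>();
        ages["Ada"] = 36;

        // Preferred: no exception when the key is missing.
        int age;
        if( ages.TryGetValue( "Charles", out age ) ) {
            Console.WriteLine( age );
        } else {
            Console.WriteLine( "Not found" );
        }

        // Works, but uses an exception to handle an expected event.
        try {
            Console.WriteLine( ages["Charles"] );
        }
        catch( KeyNotFoundException ) {
            Console.WriteLine( "Not found" );
        }
    }
}
</pre>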

When implementing generic dictionaries, you have a couple of choices from which to derive implementations. First, you can use SortedDictionary<TKey, TValue>, which provides O(log n) retrieval and implements IDictionary<TKey, TValue> as well as the collection interfaces. However, you can also choose KeyedCollection<TKey, TItem> in the System.Collections.ObjectModel namespace. Although it doesn’t actually implement the dictionary interfaces, it does provide O(1) retrieval most of the time. For more details, see the MSDN documentation.

Source Of Information : Apress Accelerated C Sharp 2010

Collection Types - Lists

One thing that is missing from ICollection<T>, and for good reason, is an index operator that allows you to access the items within the collection using the familiar array-access syntax. The fact is that not all concrete types that implement ICollection<T> need an index operator, and in some cases it makes no sense for them to have one. For example, an index operator for a list of integers would probably accept a parameter of type int, whereas a dictionary type would accept a parameter of the same type as the key type in the dictionary.

If you’re defining a collection where it makes sense to index the items, then you want that collection to implement IList<T>. Concrete generic list collection types typically implement the IList<T> and IList interfaces. IList<T> implements ICollection<T>, and IList implements ICollection, so any type that is a list is also a collection. The IList<T> interface looks like the following:

<pre class="brush:csharp">
public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
    T this[ int index ] { get; set; }
    int IndexOf( T item );
    void Insert( int index, T item );
    void RemoveAt( int index );
}
</pre>

The IList interface is a bit larger:

<pre class="brush:csharp">
public interface IList : ICollection, IEnumerable
{
    bool IsFixedSize { get; }
    bool IsReadOnly { get; }
    object this[ int index ] { get; set; }
    int Add( object value );
    void Clear();
    bool Contains( object value );
    int IndexOf( object value );
    void Insert( int index, object value );
    void Remove( object value );
    void RemoveAt( int index );
}
</pre>

As you can see, there is some overlap between IList<T> and IList, but there are plenty of useful properties and methods in IList that a generic container such as List<T>, or any other generic list that you create, would want. As with ICollection<T> and ICollection, the typical pattern is to implement both interfaces. You should explicitly implement the methods of IList that overlap in functionality with those of IList<T>, so that the only way to get to them is to convert the instance reference to the IList type first.
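
From the caller’s side, the effect of this pattern is easy to see with List<T>, which follows it; the example below is a sketch of my own, not from the book:
<pre class="brush:csharp">
using System;
using System.Collections;
using System.Collections.Generic;

public class ExplicitAccessExample
{
    static void Main() {
        List<int> numbers = new List<int> { 10, 20, 30 };

        // The strongly typed IList<T> members are available directly.
        numbers.Insert( 1, 15 );

        // The overlapping nongeneric IList members are implemented explicitly,
        // so you must go through an IList reference to reach them.
        IList untyped = numbers;
        untyped.Insert( 0, 5 );             // parameter is object, not int
        Console.WriteLine( untyped[0] );    // 5

        foreach( int n in numbers ) {
            Console.WriteLine( n );         // 5 10 15 20 30
        }
    }
}
</pre>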

Source Of Information : Apress Accelerated C Sharp 2010

Collection Types - ICollection<T> and ICollection

The most obvious additions to the collection types starting with the .NET 2.0 Framework are the types defined within the System.Collections.Generic namespace. These types are strongly typed, thus giving the compiler a bigger type-safety hammer to wield when ferreting out type-mismatch bugs at compile time. In addition, when used to contain value types, they are much more efficient, because there is no gratuitous boxing. Arguably, the root type of all the generic collection types is ICollection<T>. I have included the declaration for it here:

<pre class="brush:csharp">
public interface ICollection<T> : IEnumerable<T>, IEnumerable
{
    int Count { get; }
    bool IsReadOnly { get; }
    void Add( T item );
    void Clear();
    bool Contains( T item );
    void CopyTo( T[] array, int arrayIndex );
    bool Remove( T item );
}
</pre>

For the sake of comparison, I’ve included the nongeneric ICollection interface definition as well:

<pre class="brush:csharp">
public interface ICollection : IEnumerable
{
    int Count { get; }
    bool IsSynchronized { get; }
    object SyncRoot { get; }
    void CopyTo( Array array, int index );
}
</pre>

Now, let’s take a look at the differences and what that means for your code. One thing that has been missing with the nongeneric collections is a uniform interface for managing the contents of the collection. For example, the nongeneric Stack and Queue types both have a Clear method to erase their contents. As expected, they both implement ICollection. However, because ICollection doesn’t contain any modifying methods, you generally can’t treat instances of these two types polymorphically within code. Thus, you would always have to cast an instance variable to type Stack in order to call Stack.Clear, and cast to type Queue in order to call Queue.Clear.

ICollection<T> alleviates this problem by declaring some methods for modifying the collection. As with most general-use solutions, it does not necessarily apply to all situations. For example, ICollection<T> also declares an IsReadOnly property, because sometimes you need to introduce an immutable collection in your design. For those instances, you would expect calls to Add, Clear, and Remove to throw a NotSupportedException.

Since a main purpose of ICollection<T> is to provide stronger type safety, it only makes sense that ICollection<T> provides its own strongly typed version of CopyTo. Whereas ICollection.CopyTo accepts a System.Array reference as its first parameter, ICollection<T>.CopyTo is given the concrete array type in its first parameter. Clearly, you can only pass a single-dimensional array to ICollection<T>.CopyTo. The nongeneric ICollection.CopyTo only accepts a single-dimensional array as well, but because the compiler cannot determine the rank of a System.Array type at compile time, you get a runtime exception of type ArgumentException if you pass an array with more than one dimension to a proper implementation of ICollection.CopyTo. Notice that I said "a proper implementation": not only is the caller of ICollection.CopyTo supposed to know this rule, but so is the type implementing ICollection. The added type information in ICollection<T>.CopyTo not only protects both the caller and the implementer from making this mistake, it also provides greater efficiency.
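
Here is a small sketch of the difference; the variable names are mine, and the nongeneric call goes through List<T>’s ICollection implementation:
<pre class="brush:csharp">
using System;
using System.Collections;
using System.Collections.Generic;

public class CopyToExample
{
    static void Main() {
        List<int> numbers = new List<int> { 1, 2, 3 };

        // ICollection<T>.CopyTo is strongly typed: the target must be int[].
        int[] target = new int[3];
        numbers.CopyTo( target, 0 );

        // ICollection.CopyTo takes any System.Array, so passing an array of
        // the wrong rank only fails at run time with an ArgumentException.
        ICollection untyped = numbers;
        Array twoDim = Array.CreateInstance( typeof(int), 3, 3 );
        try {
            untyped.CopyTo( twoDim, 0 );
        }
        catch( ArgumentException ) {
            Console.WriteLine( "Multidimensional target rejected at run time." );
        }
    }
}
</pre>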

You’ll notice that all of the generic collection types implement both ICollection<T> and ICollection. Both interfaces provide useful utility to the container type. Any methods in ICollection that overlap with ICollection<T> should be implemented explicitly.

For better performance, it’s recommended that calling code first check the IsReadOnly property to determine whether such operations are forbidden, thus avoiding the exception altogether. Of course, if your response to IsReadOnly returning true is simply to throw an exception yourself, then there is no gain.

When defining your own collection types, you should derive from Collection<T> in the System.Collections.ObjectModel namespace unless there is a good reason not to do so; for instance, Collection<T> might have some functionality that you don’t want, or you might need to be explicit about how the items are stored in the collection. Collection<T> manages the storage for you and exposes protected virtual methods that you can override to control its behavior. When you don’t derive from Collection<T>, your job is much more laborious, because you must reimplement most of what Collection<T> already implements. If you are creating your own custom dictionary type, consider deriving from KeyedCollection<TKey, TItem>, mentioned earlier. And if you need collections that are safe to share across threads, look at the concurrent collection types created by the Parallel Computing Platform team at Microsoft; their locking techniques are finely tuned for efficiency in concurrent multithreaded environments.

Source Of Information : Apress Accelerated C Sharp 2010

Multidimensional Jagged Arrays

If you come from a C/C++ or Java background, you’re probably already familiar with jagged arrays, because those languages don’t support rectangular multidimensional arrays like C# does. The only way to implement multidimensional arrays is to create arrays of arrays, which is precisely what a jagged array is. However, because each element of the top-level array is an individual array instance, each array instance in the top-level array can be any size. Therefore, the array isn’t necessarily rectangular—hence, the name jagged arrays.

The syntactical pattern for declaring a jagged array in C# is similar to its cousins C++ and Java. The following example shows how to allocate and use a jagged array:

<pre class="brush:csharp">
using System;
using System.Text;

public class EntryPoint
{
    static void Main() {
        int[][] jagged = new int[3][];
        jagged[0] = new int[] {1, 2};
        jagged[1] = new int[] {1, 2, 3, 4, 5};
        jagged[2] = new int[] {6, 5, 4};

        foreach( int[] ar in jagged ) {
            StringBuilder sb = new StringBuilder();
            foreach( int n in ar ) {
                sb.AppendFormat( "{0} ", n );
            }
            Console.WriteLine( sb.ToString() );
        }
        Console.WriteLine();

        for( int i = 0; i < jagged.Length; ++i ) {
            StringBuilder sb = new StringBuilder();
            for( int j = 0; j < jagged[i].Length; ++j ) {
                sb.AppendFormat( "{0} ", jagged[i][j] );
            }
            Console.WriteLine( sb.ToString() );
        }
    }
}
</pre>

As you can see, allocating and creating a jagged array is a bit more complex than a rectangular array, because you must handle all of the subarray allocations individually, whereas a rectangular array gets allocated all at once. Because each subarray has a different size, the program’s output looks jagged too: the first line contains two values, the second five, and the third three, and the second loop then prints the same three lines again.

In the example, I show two ways to iterate through the array just to show the syntax for accessing the individual items within a jagged array and how that syntax differs from accessing items within a rectangular array. The syntax is similar to that of C++ and Java. The foreach method of iterating through the array is more elegant, and as I’ll cover later on, using foreach allows you to use the same code to iterate through collections that may not be arrays.

It often makes sense to use jagged arrays rather than rectangular arrays. For example, you may be reading in information from a database, and each entry in the top-level array may represent a collection where each subcollection may have a widely varying amount of items in it. If most of the subcollections contain just a handful of items and then one of them contains 100 items, a rectangular array would waste a lot of space because it would allocate 100 entries for each subcollection no matter what. Jagged arrays are generally more space efficient, but the trade-off is that accessing items within a jagged array requires more care, because you cannot assume that each subarray has the same number of items in it.

It’s preferable to use foreach to iterate through arrays and collections. That way, you can change the type of the container later and the foreach block won’t have to change. If you use a for loop instead, you may have to change the method used to access each individual element. Additionally, foreach handles cases where the array has a nonzero lower bound.

Jagged arrays can potentially be more computationally efficient, because jagged arrays are typically arrays of single-dimension, zero-lower-bound arrays, which the CLR represents with vectors.

Source Of Information : Apress Accelerated C Sharp 2010

Multidimensional Rectangular Arrays

C# and the CLR contain direct support for multidimensional arrays, also known as rectangular arrays. You can easily declare an array with a rank greater than one in C#: simply introduce commas into the square brackets to separate the dimensions, as shown in the following example:

<pre class="brush:csharp">
using System;

public class EntryPoint
{
    static void Main() {
        int[,] twoDim1 = new int[5,3];
        int[,] twoDim2 = { {1, 2, 3},
                           {4, 5, 6},
                           {7, 8, 9} };

        foreach( int i in twoDim2 ) {
            Console.WriteLine( i );
        }
    }
}
</pre>

There are several things to note when using rectangular arrays. All usage of these arrays boils down to method calls on a CLR-generated reference type, and the built-in vector types don’t come into play here. Notice the two declarations. In each case, you don’t need the size of each dimension when declaring the type. Again, that’s because arrays are typed based on their containing type and rank. However, once you create an instance of the array type, you must provide the size of the dimensions. I did this in two different ways in this example. In creating twoDim1, I explicitly said what the dimension sizes are, and in the creation of twoDim2, the compiler figured it out based upon the initialization expression.

In the example, I listed all of the items in the array using the foreach loop as shown. foreach iterates over all items in the array in row-major order. I could have achieved the same goal using two nested for loops, and I definitely would have needed to do that if I wanted to iterate over the array elements in any other order. When doing so, keep in mind that the Array.Length property returns the total number of items in the array. In order to get the count of each dimension, you must call the Array.GetLength method, supplying the dimension that you’re interested in. For example, I could have iterated over the items in the array using the following syntax, and the results would have been the same:

<pre class="brush:csharp">
using System;

public class EntryPoint
{
    static void Main() {
        int[,] twoDim = { {1, 2, 3},
                          {4, 5, 6},
                          {7, 8, 9} };

        for( int i = 0; i != twoDim.GetLength(0); ++i ) {
            for( int j = 0; j != twoDim.GetLength(1); ++j ) {
                Console.WriteLine( twoDim[i,j] );
            }
        }

        for( int i = twoDim.GetLowerBound(0);
             i <= twoDim.GetUpperBound(0);
             ++i ) {
            for( int j = twoDim.GetLowerBound(1);
                 j <= twoDim.GetUpperBound(1);
                 ++j ) {
                Console.WriteLine( twoDim[i,j] );
            }
        }
    }
}
</pre>

For good measure, I’ve shown how to iterate over the dimensions of the array using two methods. The first method assumes that the lower bound of each dimension is 0, and the second does not. In all of the calls to GetLength, GetUpperBound, and GetLowerBound, you must supply a zero-based dimension of the Array that you’re interested in.

When you access the items of a multidimensional array, the compiler generates calls to Get and Set methods, which are similar to GetValue and SetValue. These methods are overloaded to accept a variable list of integers to specify the ordinal of each dimension within the array.

When mapping multidimensional arrays to mathematical concepts, the rectangular array is the most natural and preferred way to go. However, creating methods where an argument may be an array of varying rank is tricky, because you must accept the argument as type System.Array and dynamically deal with the rank of the array. You can access the rank of an array using the Array.Rank property. Thus, creating rank-general code is tricky due to the syntactical burden of accessing all array items through method calls to System.Array, but it is entirely possible. Moreover, the most general array-manipulation code should also handle the case of nonzero lower bounds in the individual ranks.

All arrays created within C# using the standard C# array declaration syntax will have a lower bound of 0. However, if you’re dealing with arrays used for mathematical purposes, as well as arrays that come from assemblies written in other languages, you may need to consider that the lower bound may not be 0.
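
As a small sketch of that point (the names here are mine), System.Array lets you create and traverse an array whose lower bounds are not 0, using the same GetLowerBound and GetUpperBound calls shown above:
<pre class="brush:csharp">
using System;

public class NonZeroBoundExample
{
    static void Main() {
        // A 2x3 array whose indices start at 1 rather than 0 in each dimension.
        Array grid = Array.CreateInstance( typeof(int),
                                           new int[] { 2, 3 },    // lengths
                                           new int[] { 1, 1 } );  // lower bounds

        for( int i = grid.GetLowerBound(0); i <= grid.GetUpperBound(0); ++i ) {
            for( int j = grid.GetLowerBound(1); j <= grid.GetUpperBound(1); ++j ) {
                grid.SetValue( i * 10 + j, i, j );
                Console.WriteLine( grid.GetValue( i, j ) );
            }
        }
    }
}
</pre>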

Source Of Information : Apress Accelerated C Sharp 2010

Vectors vs. Arrays

It’s interesting to note that the CLR supports two special kinds of types to deal with arrays in C# code. If your array happens to be single-dimensional and happens to have a lower bound of 0, which is usually true for C# arrays, then the CLR uses a special built-in type called a vector, which is actually a subtype of System.Array. The CLR defines special IL instructions that work directly with vectors. If your array is multidimensional, then a CLR vector type is not used and an array object is used instead. To demonstrate this, let’s take a quick look at some IL code generated by the following short example:

<pre class="brush:csharp">
public class EntryPoint
{
    static void Main() {
        int val = 123;
        int newVal;

        int[] vector = new int[1];
        int[,] array = new int[1,1];

        vector[0] = val;
        array[0,0] = val;

        newVal = vector[0];
        newVal = array[0,0];
    }
}
</pre>

Take a look at the generated IL for the Main method:

<pre>
.method private hidebysig static void Main() cil managed
{
    .entrypoint
    // Code size       46 (0x2e)
    .maxstack  4
    .locals init ([0] int32 val,
                  [1] int32 newVal,
                  [2] int32[] 'vector',
                  [3] int32[0...,0...] 'array')
    IL_0000:  nop
    IL_0001:  ldc.i4.s   123
    IL_0003:  stloc.0
    IL_0004:  ldc.i4.1
    IL_0005:  newarr     [mscorlib]System.Int32
    IL_000a:  stloc.2
    IL_000b:  ldc.i4.1
    IL_000c:  ldc.i4.1
    IL_000d:  newobj     instance void int32[0...,0...]::.ctor(int32, int32)
    IL_0012:  stloc.3
    IL_0013:  ldloc.2
    IL_0014:  ldc.i4.0
    IL_0015:  ldloc.0
    IL_0016:  stelem.i4
    IL_0017:  ldloc.3
    IL_0018:  ldc.i4.0
    IL_0019:  ldc.i4.0
    IL_001a:  ldloc.0
    IL_001b:  call       instance void int32[0...,0...]::Set(int32, int32, int32)
    IL_0020:  ldloc.2
    IL_0021:  ldc.i4.0
    IL_0022:  ldelem.i4
    IL_0023:  stloc.1
    IL_0024:  ldloc.3
    IL_0025:  ldc.i4.0
    IL_0026:  ldc.i4.0
    IL_0027:  call       instance int32 int32[0...,0...]::Get(int32, int32)
    IL_002c:  stloc.1
    IL_002d:  ret
} // end of method EntryPoint::Main
</pre>


Notice the difference between the usages of the two C# arrays. On line IL_0005, the newarr IL instruction creates the instance represented by the vector variable. The multidimensional array held in the variable array is created on line IL_000d. In the first case, a native IL instruction handles the operation, whereas a regular constructor call handles it in the second case. Similarly, when accessing the elements, the IL instructions stelem and ldelem, on lines IL_0016 and IL_0022 respectively, are used for the vector, whereas regular method calls (Set and Get) handle access to the elements of the multidimensional array.

Because vector support is handled by specific IL instructions tailored specifically for vectors, it’s safe to assume that vector use tends to be more efficient than multidimensional array use, even though instances of both derive from System.Array.

Source Of Information : Apress Accelerated C Sharp 2010

