Cloud computing - Parallelization

Horizontal scaling and parallelization go hand in hand; however, today the scale and implementation have changed. On a microscopic scale, software can use vertical scaling on symmetric multiprocessors to spawn multiple threads, where parallelization can speed operations or reduce response time. But with today’s compute environments shifting toward x86-architecture servers with two and four sockets, vertical scaling offers only as much parallel processing capability as the server has cores (or as many cores as have been purchased and allocated to a particular virtual machine). On a macroscopic scale, software that can use parallelization across many servers can scale to thousands of servers, offering far more scalability than was possible with symmetric multiprocessing.
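The microscopic, in-server form of parallelism described above can be sketched with a thread pool. This is a minimal illustration, not from the white paper; the `process` function is a hypothetical stand-in for per-item work:

```python
from concurrent.futures import ThreadPoolExecutor

def process(item):
    # Placeholder for real per-item work (e.g., transforming user data).
    return item * item

items = list(range(8))

# Spawn multiple threads on an SMP machine; useful parallelism is bounded
# by the number of cores available to the process or virtual machine.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, items))
```

The pool size caps concurrency at the core count the server (or VM allocation) actually provides, which is the limit the paragraph above describes.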

In a physical environment, parallelization is often implemented with load balancers or content switches that distribute incoming requests across a number of servers. In a cloud computing environment, the same role is played by a load-balancing appliance or content switch that distributes incoming requests across a number of virtual machines. In both cases, applications can be designed to recruit additional resources to accommodate workload spikes.

The classic example of parallelization with load balancing is a pool of stateless Web servers, all accessing the same data, with the incoming workload distributed across the pool.
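Because the Web servers are stateless, the balancer can route each request to any member of the pool. A minimal sketch of round-robin distribution, with hypothetical server names, might look like this:

```python
import itertools

class RoundRobinBalancer:
    """Distributes incoming requests across a pool of identical servers."""

    def __init__(self, servers):
        # cycle() yields servers in order, wrapping around indefinitely.
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        # Stateless backends mean any server can handle any request.
        server = next(self._cycle)
        return server, request

balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(f"req-{i}")[0] for i in range(6)]
```

Real load balancers add health checks and weighting, but the core idea is the same: no request depends on which server handled the previous one.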

There are many other ways to use parallelization in cloud computing environments. An application that uses a significant amount of CPU time to process user data might use the model illustrated in Figure 10. A scheduler receives jobs from users, places each job's data into a repository, and then starts a new virtual machine for each job, handing the virtual machine a token that allows it to retrieve the data from the repository. When the virtual machine has completed its task, it passes a token back to the scheduler that allows the scheduler to return the completed results to the user, and the virtual machine terminates.
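The scheduler-and-token flow can be simulated in a single process. This sketch is illustrative only: the `Scheduler` class and its methods are hypothetical, the repository is a dictionary, and `_launch_worker` stands in for starting a real virtual machine:

```python
import uuid

class Scheduler:
    def __init__(self):
        self.repository = {}   # token -> job data awaiting processing
        self.completed = {}    # token -> finished results

    def submit(self, job_data):
        # Place the data in the repository under a fresh token,
        # then hand that token to a new worker.
        token = str(uuid.uuid4())
        self.repository[token] = job_data
        self._launch_worker(token)   # stands in for booting a VM per job
        return token

    def _launch_worker(self, token):
        # The worker uses its token to retrieve the data, processes it,
        # and passes the result back before terminating.
        data = self.repository.pop(token)
        result = sorted(data)        # placeholder for the real computation
        self.completed[token] = result

    def collect(self, token):
        # The scheduler returns the completed job to the user.
        return self.completed.pop(token)

sched = Scheduler()
token = sched.submit([3, 1, 2])
output = sched.collect(token)
```

In a real deployment the repository would be shared storage, the worker would run asynchronously in its own VM, and the token would be the only credential the VM needs.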

Divide and conquer
Applications can be parallelized only to the extent that their data can be partitioned so that independent systems can operate on it in parallel. A good application architecture includes a plan for dividing and conquering data, and a variety of real-world examples illustrate the wide range of approaches:

• Hadoop is an implementation of the MapReduce pattern, which is itself a form of the master/worker parallelization pattern.

• Database sharding can be accomplished through a range of partitioning techniques, including vertical, range-based, and directory-based partitioning. The approach used depends entirely on how the data is to be used.

• Major financial institutions have refactored their fraud detection algorithms so that what was once more of a batch data-mining operation now runs on a large number of systems in parallel, providing real-time analysis of incoming data.

• Some high-performance computing applications that deal with three-dimensional data have been designed so that the state of one cubic volume (of gas, liquid, or solid) can be calculated for time t by one process. The state of that cube is then passed to the processes representing the eight adjoining cubes, and the state is calculated for time t+1.
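The first example above, MapReduce, shows the divide-and-conquer idea in its simplest form: workers map over independent partitions of the data, and a master merges the partial results. This is a single-process sketch of the pattern, not Hadoop's API:

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    # Worker: compute a partial word count over its own data partition.
    return Counter(chunk.split())

def reduce_phase(left, right):
    # Master: merge partial counts from two workers.
    return left + right

# Each document is an independent partition, so the map calls could run
# on separate machines with no coordination between them.
documents = ["the quick brown fox", "the lazy dog", "the fox"]
partials = [map_phase(doc) for doc in documents]
totals = reduce(reduce_phase, partials)
```

The pattern works precisely because the data was partitioned up front: each map call touches only its own chunk, and the merge step is the only point of coordination.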

The partitioning of data has a significant impact on the volume of data transferred over networks, making data physics the next in the list of considerations.

Source of Information : Introduction to Cloud Computing architecture White Paper 1st Edition, June 2009

