Cloud computing - Data physics

Data physics considers the relationship between processing elements and the data on which they operate. Because most compute clouds store data in the cloud rather than on a physical server's local disks, it takes time to bring data to a server for processing. (Clouds are good at storing data, but not necessarily at archiving it and destroying it on a predefined schedule.) Data physics is governed by a simple equation that describes how long it takes to move an amount of data between where it is generated, stored, processed, and archived. Large amounts of data, or low-bandwidth pipes, lengthen the time it takes to move data:

time = bytes * 8 / bandwidth

This equation is relevant for both the moment-by-moment processing of data and for long-term planning. It can help determine whether it makes sense, for example, to implement a surge computing strategy where it might take longer to move the data to a public cloud than it does to process it. It can also help determine the cost of moving operations from one cloud provider to another: whatever data has accumulated in one cloud provider’s datacenter must be moved to another, and this process may take time.
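As a minimal sketch, the transfer-time equation above can be applied directly to a surge-computing decision. The data size and bandwidth figures below are illustrative assumptions, not measurements:

```python
# Illustrative sketch of the transfer-time equation: time = bytes * 8 / bandwidth.
# All figures below are assumptions chosen for the example.

def transfer_time_seconds(data_bytes: float, bandwidth_bps: float) -> float:
    """Time to move data_bytes over a link of bandwidth_bps (bits per second)."""
    return data_bytes * 8 / bandwidth_bps

# Example: moving 1 TB of accumulated data over a 100 Mbit/s pipe.
terabyte = 10**12
time_s = transfer_time_seconds(terabyte, 100e6)
print(f"{time_s / 3600:.1f} hours")  # roughly 22.2 hours
```

If processing that terabyte in a public cloud would take less time than the 22 hours needed to move it there, the transfer, not the computation, dominates the surge-computing decision.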

The cost of moving data can be expressed both in time and in bandwidth charges. The hybrid model illustrated in Figure 5, where a company's private cloud is colocated with its cloud provider's public cloud, can help to reduce costs significantly. Bandwidth within a colocation facility is generally both plentiful and free, making this strategy a win-win proposition for both the time and the dollar cost of moving data around.



The relationship between data and processing
Data physics is a reminder to consider the relationship between data and processing, and that moving data from storage to processing can take both time and money. Some aspects of this relationship to consider include:

• Data stored without compute power nearby has limited value, and cloud providers should be transparent regarding the network relationship between these two components. What is the size of their pipes? What is the latency? What is the reliability of the connection? Cloud providers should be forthcoming with answers to all of these questions.

• Cloud architects should be able to specify the locality of virtual components and services so that there is a well-defined relationship between virtual machines and the storage they access.

• Cloud providers may optimize this relationship automatically for customers, but consider whether their optimization is right for the application at hand.

• In a networked environment, it is sometimes more efficient (faster, less latency) to calculate a value than it is to retrieve it from networked storage. Consider the trade-off between using compute cycles and moving data around.
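The last trade-off above can be sketched numerically. This is a simplified model with assumed latency, bandwidth, and compute-time figures, not a benchmark:

```python
# Hedged sketch of the fetch-vs-recompute trade-off: retrieving a value from
# networked storage pays a round-trip latency plus transfer time, while
# recomputing it pays only local CPU time. All numbers are assumptions.

def fetch_time(data_bytes: float, bandwidth_bps: float, latency_s: float) -> float:
    """Time to retrieve a value from networked storage."""
    return latency_s + data_bytes * 8 / bandwidth_bps

def cheaper_to_recompute(recompute_s: float, data_bytes: float,
                         bandwidth_bps: float, latency_s: float) -> bool:
    """True when recomputing locally beats fetching over the network."""
    return recompute_s < fetch_time(data_bytes, bandwidth_bps, latency_s)

# A 4 KB cached result over a 10 ms, 100 Mbit/s link vs. 1 ms of CPU time:
print(cheaper_to_recompute(0.001, 4096, 100e6, 0.010))  # True: recompute wins
```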



Programming strategies
Cloud computing calls for using programming strategies that consider the movement of data:

• Moving pointers around is usually better than moving the actual data. Note how the scheduler/worker model illustrated in Figure 10 uses a central storage service and passes tokens between application components rather than the actual data.

• Pointers should be treated as a capability, with care taken to ensure that they are difficult to forge.

• Architectural styles and protocols such as Representational State Transfer (REST) and the Simple Object Access Protocol (SOAP) help to reduce application state while managing the transfer of state data.
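The first two strategies above can be sketched together: pass a signed token that points at data in the central storage service, rather than the data itself, and sign it so it is difficult to forge. This is an illustrative sketch using an HMAC; the key handling and object naming are simplified assumptions, not a production design:

```python
# Sketch of treating a storage pointer as a capability: an HMAC signature
# makes the token hard to forge. Key distribution is simplified for illustration.
import hashlib
import hmac

SECRET_KEY = b"example-key"  # assumption: shared by scheduler and workers

def make_token(object_id: str) -> str:
    """Issue a signed pointer to an object in the central storage service."""
    sig = hmac.new(SECRET_KEY, object_id.encode(), hashlib.sha256).hexdigest()
    return f"{object_id}:{sig}"

def verify_token(token: str):
    """Return the object id if the signature checks out, else None."""
    object_id, _, sig = token.rpartition(":")
    expected = hmac.new(SECRET_KEY, object_id.encode(), hashlib.sha256).hexdigest()
    return object_id if hmac.compare_digest(sig, expected) else None

token = make_token("bucket/results/run-42")
assert verify_token(token) == "bucket/results/run-42"
assert verify_token(token[:-1] + "x") is None  # tampered signature rejected
```

Workers exchange only these short tokens; the bulk data stays in the storage service until a worker actually needs it.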



Compliance and data physics
Maintaining compliance with governmental regulations and industry requirements adds another layer of considerations to the management of data. A cloud architect needs to be able to specify both topological and geographical constraints on data storage. A cloud provider should make it easy to specify the relationship between data and the virtual machines that process it, and also where the data is stored physically:

• Companies handling personal data may be required to adhere to governmental regulations regarding the handling of that data. For example, those doing business in the European Union may violate local laws if they store their data in the United States, because the two jurisdictions' laws protect personal data differently. In cases like this, cloud providers should provide the capability to specify constraints on how and where data can be moved.

• Companies subject to industry standards, such as those imposed by credit card processing authorities, may face restrictions on where data is stored and how and when it is destroyed. In cases like this, freed disk blocks cannot be allowed to be incorporated into another customer’s block of storage. They must be securely erased before reuse.

When choosing a cloud provider for data storage, consider not just whether the provider is trustworthy. Consider whether the cloud provider is certified according to standards appropriate for the application.



Security and data physics
Data is often the most valuable of a company's assets, and it must be protected with as much vigilance as any other asset. It is easy to argue that even more vigilance is needed, because an intruder can potentially reach a company's data from anywhere on the Internet. Some steps to take include:

• Encrypt data at rest so that, if an intruder penetrates a cloud provider's security or a configuration error makes the data accessible to unauthorized parties, the data cannot be interpreted.

• Encrypt data in transit. Assume that the data will pass over public infrastructure and could be observed by any party in between.

• Require strong authentication between application components so that data is transmitted only to known parties.

• Pay attention to cryptography: algorithms are cracked and replaced by stronger ones over time. For example, now that MD5 has been shown to be vulnerable to collision attacks, use a stronger hash function such as SHA-256.

• Manage who has access to the application and how:
- Consider using strong, token-based authentication for administrator roles.
- For customer login/password access, consider who manages the authentication server and whether it is under the company or the cloud provider’s control.
- For anonymous access to storage, for example anonymous FTP, consider whether a customer would have to register with the cloud provider for access or whether the cloud provider could federate with the company’s authentication server.
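The hash-algorithm guidance above is straightforward to follow in practice. As a small sketch using Python's standard `hashlib` (the data value is an assumed placeholder):

```python
# Sketch of the hash-algorithm guidance: MD5 is broken, so use a stronger
# algorithm such as SHA-256, available in Python's standard hashlib module.
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 fingerprint of data; avoid hashlib.md5 for security."""
    return hashlib.sha256(data).hexdigest()

digest = fingerprint(b"customer-record-001")  # assumed placeholder payload
print(len(digest))  # 64 hex characters, i.e. 256 bits
```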


Source of Information : Introduction to Cloud Computing architecture White Paper 1st Edition, June 2009
