THE MOBILE NETWORK - Throughput and Latency

Although these acronyms and the constant evolution of cellular and wireless network technology can be baffling, the important thing to understand is that a variety of networks are used to provide data connections to your users’ devices. As with the physical and diversity-related challenges of the devices themselves, you need to be cautious about assumptions for these connections.

Speed, or throughput, of the network connection is an obvious constraint. At the end of 2010, according to Akamai, the average fixed-line broadband speed in the United States was 5Mbps, many times faster than even the theoretical peak speed of most mobile networks. This has a direct impact on the users’ Web experience because it defines the minimum time that an uncached web page takes to download to a device. You’re probably not surprised to read that many mobile devices also do not have comprehensive or long-lived caching capabilities, thanks to their memory constraints.

A user with a 3G UMTS connection in the United States might expect an average download speed of 250Kbps, and 750Kbps on HSDPA (although such speeds are drastically affected by movement and the density of other data users in the local area). Even this is six times slower than a typical wired desktop experience: a web page containing a 1MB video file might load in 2 or 3 seconds on a desktop, but it would take at least 15 seconds on a fast mobile network. That may be longer than an impatient user on the go is prepared to wait for the download. If you expect to deliver rich media to your mobile web users, you certainly need to look at limiting or adapting file sizes.
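
As a rough illustration, the following minimal sketch compares the raw transfer time of a single 1MB asset over the connection speeds quoted above. The speeds are the nominal figures from the text, and the calculation ignores latency and protocol overhead, so treat the output as an order-of-magnitude guide rather than a measurement.

# Rough download-time comparison for a single 1 MB asset (8 megabits),
# using the nominal connection speeds quoted in the text.
ASSET_MEGABITS = 8.0  # 1 MB = 8 megabits

links_kbps = {
    "Fixed broadband (5 Mbps)": 5000,
    "HSDPA (750 Kbps)": 750,
    "3G UMTS (250 Kbps)": 250,
}

for name, kbps in links_kbps.items():
    # megabits -> kilobits, divided by Kbps, gives seconds
    seconds = ASSET_MEGABITS * 1000 / kbps
    print(f"{name}: ~{seconds:.1f} s")

Running this prints roughly 1.6 s for the broadband case and between 11 s and 32 s for the mobile cases, which is consistent with the 2-to-3-second versus 15-second comparison above.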

In addition to pure speed, other factors significantly affect the impact of the network on the user experience. One is the setup time for the data connection itself. A desktop or laptop computer usually has a persistent connection to the Web, and the first page visited by a user starts to download immediately. Most mobile devices, on the other hand, connect on demand (in order to preserve power when the data connection is not in use), and this can add as much as 5 to 7 seconds to the time for the first page to be requested. Your users may already be impatient by the time your page even starts downloading.

A more persistent but often overlooked consideration is roundtrip latency. This is a measure of the time taken for data packets to travel from the device, through the network, to the destination service, and back again, excluding any time the server actually spends on processing. It is influenced by the type and topology of the network, the route the packets take, and any intermediate proxies or gateways that process the data en route.

On a fixed-line ADSL connection, latency is so low that it is barely considered. Regardless of the throughput speed, a ping time of less than 80 milliseconds to most web servers can be assumed from within the United States, and at most a few hundred milliseconds to internationally hosted servers.

On a mobile network, however, latency is a real consideration. This is partly because packets sent from a mobile device to a public web server traverse a number of sub-networks and their boundaries. First, there is the cellular air interface to a nearby cell station — which has a particularly significant impact on latency — then a backhaul link to the network carrier’s switching center. Then there is a sequence of gateways that connect the traffic, often through firewalls and proxies, to the Internet, and then finally the packet proceeds through web infrastructure to the server. The effects on latency can be significant.

AT&T quotes a latency overhead of between 100ms and 200ms for requests to servers immediately external to their current UMTS and HSDPA networks, and 600ms over their GPRS and EDGE networks. While this is impressive, given the complexity of the cellular network involved, you should still expect the latency of a typical browser-to-server-to-browser roundtrip to be an order of magnitude longer than for a broadband connection.

In some respects, latency is more important than the raw throughput of the network, and this is particularly true for web applications, where a page is made up of a large number of component
parts. The request for each graphic, each style sheet, and each external script is delayed by the latency of the network, and if those objects are small, this can easily dominate the total time taken for the page to fully download. Web developers can take real action to mitigate these effects, such as combining CSS files, using sprite techniques for interactive images, and tuning cache settings for external objects.
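
To see how quickly per-object round trips add up, here is a back-of-envelope sketch in which every object on a page pays one round trip of latency plus its transfer time. The object count, object size, latency, and throughput figures are assumptions chosen only for illustration.

# Back-of-envelope page-load estimate: each object pays one round trip of
# latency plus its transfer time. All figures below are hypothetical.
def page_load_ms(num_objects, avg_object_kb, latency_ms, throughput_kbps,
                 parallel_connections=1):
    # total payload in kilobits, divided by throughput, converted to ms
    transfer_ms = num_objects * avg_object_kb * 8 * 1000 / throughput_kbps
    # one round trip per object, spread over the parallel connections
    roundtrip_ms = num_objects * latency_ms / parallel_connections
    return transfer_ms + roundtrip_ms

# 30 small objects of 3 KB each, fetched over two parallel connections:
print("Broadband (80 ms RTT, 5 Mbps):", page_load_ms(30, 3, 80, 5000, 2), "ms")   # ~1344 ms, mostly round trips
print("UMTS (200 ms RTT, 250 Kbps): ", page_load_ms(30, 3, 200, 250, 2), "ms")    # ~5880 ms

Halving the object count, for example by combining style sheets or spriting images, cuts the round-trip term in half, which is exactly the mitigation described above.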

Source of Information : Wiley - Professional Mobile Web Development with WordPress Joomla and Drupal

THE MOBILE NETWORK - Data Networks

As you might imagine, the networks responsible for bringing content and services to a mobile device are also notably different from the networks to which you connect laptops and desktops. Many contemporary mid- and high-end mobile devices include WiFi data connectivity, and when used in that mode, the device is connecting to a local access point, which, in turn, is most likely connected to a regular broadband-grade connection. In this case, the network connection for the device is fast, reliable, and low in latency — just as it is for the other computers and devices that are connected through the same access point and service provider.

But leave the limits of the hotspot, and your mobile device needs to revert to the cellular network for its data connection. At this point, the behavior and characteristics of the connection can change dramatically. A responsible web developer needs to be aware of these likely characteristics — and how the device mitigates them — in order to ensure that the user still retains a pleasant experience.

Throughout the world, a small number of prevalent cellular and mobile data technologies are in
regular use. The most widespread, by both geography and number of users, is the Global System for Mobile Communications (GSM) and its evolutions, which provide over 4 billion connections in over 200 countries worldwide, including the United States. A rival family of standards, broadly termed CDMA, is also popular in the United States, China, and a number of other countries. Japanese networks offer particular implementations of various types, including some proprietary network technologies.

In most developed and some developing markets, network technologies have reached a third generation of data access, sometimes known as 3G, providing speeds up to 2Mbps. These include UMTS (also known as W-CDMA), generally deployed by GSM network carriers, and CDMA2000, deployed by their CDMA brethren. In the United States, AT&T and T-Mobile offer UMTS data networks, while Verizon and Sprint have CDMA2000 networks.

Markets that have not yet reached widespread 3G coverage but still provide data services (notably in the least-developed countries and many developing countries) tend to provide slower 2.5G or 2.75G data technologies. The most common of these are the GSM-based GPRS, which provides speeds up to 60Kbps, and EDGE, which provides speeds up to 240Kbps.

Looking forward, fourth-generation mobile technologies, including LTE Advanced (Long Term Evolution), are, at the time of writing, in the process of being specified and standardized, but theoretically offer speeds up to 1Gbps. Sadly, such networks and devices are unlikely to be widespread for several years. In the interim, many networks provide transitional 3.5G technologies, such as HSDPA, EV-DO, and WiMAX, all of which, with speeds of between 3Mbps and 14Mbps, offer significant increases in speed and capacity over the 3G platform.

Source of Information : Wiley - Professional Mobile Web Development with WordPress Joomla and Drupal

Less than 5 years ago, when most mobile devices used their own embedded or simple real-time operating systems, it appeared to mobile web developers as though there were as many operating system versions as there were models of devices. Given that these operating systems tended to contain their own varied browser implementations, with no option for users to upgrade them or install alternatives, the challenge of delivering a reliable web experience to all of them was almost insurmountable. Such browsers, typically very limited and often derived from their WAP browser precedents, provided limited or incomplete XHTML and CSS capabilities, low-fidelity media support, and little or no opportunity to use JavaScript or AJAX in web pages.

In 2005, Opera, a Norwegian browser manufacturer, launched Opera Mini, a browser that could be installed by the user on such devices and which has since become a very popular third-party browser for older and low- to mid-range handsets. (Opera also provides a more capable browser, Opera Mobile, which runs on high-end devices, primarily those running Symbian and Android.) Using a proxy architecture to compress, adapt, and re-lay out the content on behalf of the device, this browser provided the first glimpse that rich and complex websites could be rendered well and made relatively usable on a typical mobile device screen.

At about the same time, Nokia released a new browser for its high-end S60 range of devices that was based on code from the open-source WebKit browser project. Given its desktop heritage, the WebKit engine provided an unprecedented level of support for HTML, CSS, and JavaScript on a mobile device. This was something of a watershed in the history of mobile browsers, and since then, a number of significant mobile device platforms have shipped with WebKit-based browsers, including Apple’s iPhone, Google’s Android, Palm’s WebOS, and most recently BlackBerry. Microsoft’s mobile operating systems do not provide WebKit-based browsers, but the capabilities of their default browsers have risen significantly in recent releases.

While the different implementations of each of these browsers can vary radically — device diversity is not going away any time soon — they do at least share a common open-source ancestry. This has greatly helped the cause of efficient mobile web development, because a developer or designer can at least assume a reasonable level of support for images and media, CSS, and AJAX (although not Flash, video, or vector graphics, whose support remains variable across browsers).

Unfortunately, it’s easy to forget that not all users are necessarily running the latest and greatest smartphones. Many cheaper handsets still run on embedded operating systems with weak web browsers.

Source of Information : Wiley - Professional Mobile Web Development with WordPress Joomla and Drupal

Pick up any mobile device and look at it carefully. Of course, the first thing you notice is the size, shape, and weight of its physical implementation. Very old mobile devices can be easily recognized by their larger, ungainly form, but following a period in the 1990s and 2000s during which manufacturers sought to develop ever smaller devices, most modern devices are now of broadly similar size and weight. Not by coincidence, this is about the size that fits comfortably into an adult hand.

A limited selection of device form factors tends to prevail in today’s mobile marketplace. Some are more popular in certain parts of the world than others or among certain demographic groups, so a conscientious mobile web developer needs to be aware of all of them. These broad categories include the following:

Candybar — These devices are rectangular and static, typically with the main screen on the top half of one face and navigational control keys and a numeric keypad on the lower part of the same face, as with the Samsung SGH-t349. This form factor tends to be prevalent for cheaper or legacy models, although a wider, flatter candybar form, with a larger screen and complete QWERTY keyboard, is often used for more pricey business devices, such as the RIM BlackBerry range and some of the Nokia E-Series range.

Slate — These devices are also rectangular and static, but with much larger screens and fewer physical buttons on the casing than the candybar form factor. The rise in popularity of slate devices has been largely driven by improvements in touch-screen technology, which allow for point- and swipe-based navigation and for numeric and QWERTY keyboards to be displayed in software. Often, these devices can be rotated between landscape and portrait mode to maximize screen usage for particular types of applications. With the advent of the Apple iPhone and Android-based devices, this form factor has become very popular on expensive consumer devices, although some mid-range devices are now exhibiting similar characteristics. Additionally, a larger variant of the slate form factor, exemplified by the Apple iPad, Amazon Kindle, and other tablet devices, is starting to inhabit the space between pocket-sized mobile devices and laptops, while remaining quite feasible as a web client for users on the move.

Slider — These devices are rectangular and of similar proportions to candybar devices when closed. However, the two halves of the device, one supporting the screen and one the keyboard, are able to slide relative to each other. This extends the size of the device and exposes the keyboard for use. Portrait-style sliders are popular, often on low-end models, because the longer “open” shape can be easier to use for making calls. However, many contemporary handsets slide in a landscape manner, exposing a QWERTY keyboard to use with a wider screen aspect ratio, as with the HTC P4350.

Flip — These devices are also designed to open up and expose a concealed keyboard, but do so with a hinge, rather than a slider. As a result, the primary screen is not visible in the closed state and is generally smaller as a proportion of the device than for the other form factors. Some handsets provide a smaller secondary screen on the outside of the device, but this rarely supports a web browser. The Motorola i410 exhibits a classic flip form.

Despite all the differences between these form factors, you need to make some reasonable assumptions for the purposes of delivering mobile web content to a capable device. First, you can be fairly sure that the device has a screen upon which its browser can render your content, but also that it is fairly small, both physically and in terms of pixel dimensions — relative to a desktop or laptop.

Source of Information : Wiley - Professional Mobile Web Development with WordPress Joomla and Drupal

What's in It for Developers?

Your ability to write a webbot can distinguish you from a pack of lesser developers. Web developers—who've gone from designing the new economy of the late 1990s to falling victim
to it during the dot-com crash of 2001—know that today's job market is very competitive. Even today's most talented developers can have trouble finding meaningful work. Knowing how to write webbots will expand your ability as a developer and make you more valuable to your employer or potential employers.

A webbot writer differentiates his or her skill set from that of someone whose knowledge of Internet technology extends only to creating websites. By designing webbots, you demonstrate that you have a thorough understanding of network technology and a variety of network protocols, as well as the ability to use existing technology in new and creative ways.


Webbot Developers Are in Demand
There are many growth opportunities for webbot developers. You can demonstrate this for yourself by looking at your website's file access logs and recording all the non-browsers that have visited your website. If you compare current server logs to those from a year ago, you should notice a healthy increase in traffic from nontraditional web clients or webbots. Someone has to write these automated agents, and as the demand for webbots increases, so does the demand for webbot developers.

Hard statistics on the growth of webbot use are hard to come by, since many webbots defy detection and masquerade as traditional web browsers. In fact, the value that webbots bring to businesses forces most webbot projects underground. I can't talk about most of the webbots I've developed because they create competitive advantages for clients, and they'd rather keep those techniques secret. Regardless of the actual numbers, it's a fact that webbots and spiders comprise a large amount of today's Internet traffic and that many developers are required to both maintain existing webbots and develop new ones.


Webbots Are Fun to Write
In addition to solving serious business problems, webbots are also fun to write. This should be welcome news to seasoned developers who no longer experience the thrill of solving a problem or using a technology for the first time. Without a little fun, it's easy for developers to get bored and conclude that software is simply a sequence of instructions that do the same thing every time a program runs. While predictability makes software dependable, it also makes it tiresome to write. This is especially true for computer programmers who specialize in a specific industry and lack diversity in tasks. At some point in their careers, nearly all of the programmers I know have become very tired of what they do, in spite of the fact that they still like to write computer programs.

Webbots, however, are almost like games, in that they can pleasantly surprise their developers with their unpredictability. This is because webbots operate on data that changes frequently, and they respond slightly differently every time they run. As a result, webbots become impulsive and lifelike. Unlike other software, webbots feel organic! Once you write a webbot that does something wonderfully unexpected, you'll have a hard time describing the experience to those writing traditional software applications.


Webbots Facilitate "Constructive Hacking"
By its strict definition, hacking is the process of creatively using technology for a purpose other than the one originally intended. By using web pages, newsgroups, email, or other online technology in unintended ways, you join the ranks of innovators who combine and alter existing technology to create totally new and useful tools. You'll also broaden the possibilities for using the Internet.

Unfortunately, hacking also has a dark side, popularized by stories of people breaking into systems, stealing private data, and rendering online services unusable. While some people do write destructive webbots, I don't condone that type of behavior here. In fact, KEEPING WEBBOTS OUT OF TROUBLE is dedicated to this very subject.

Source of Information : Webbots Spiders and Screen Scrapers A Guide to Developing Internet Agents with PHP CURL

How to Use OpsMgr

Using OpsMgr is relatively straightforward. The OpsMgr monitoring environment can be accessed through three sets of consoles: an Operations Console, a Web Console, and a command shell. The Operations Console provides full monitoring of agent systems and administration of the OpsMgr environment, whereas the Web Console provides access only to the monitoring functionality. The command shell provides command-line access to administer the OpsMgr environment.


Managing and Monitoring with OpsMgr
As mentioned in the preceding section, two methods are provided to configure and view OpsMgr settings. The first approach is through the Operations Console and the second is through the command shell.

Within the Administration section of the Operations Console, you can easily configure the security roles, notifications, and configuration settings. Within the Monitoring section of the Operations Console, you can easily monitor a quick up/down status, view active and closed alerts, and confirm overall environment health.

In addition, a web-based monitoring console can be run on any system that supports Microsoft Internet Explorer 6.0 or higher. This console can be used to view the health of systems, view and respond to alerts, view events, view performance graphs, run tasks, and manage maintenance mode of monitored objects. New to OpsMgr 2007 R2 is the capability to display the health explorer in the Web Console.


Reporting from OpsMgr
OpsMgr management packs commonly include a variety of preconfigured reports to show information about the operating system or the specific application to which they were designed to work. These reports are run in SQL Reporting Services. The reports provide an effective view of systems and services on the network over a custom period, such as weekly, monthly, or quarterly. They can also help you monitor your networks based on performance data, which can include critical pattern analysis, trend analysis, capacity planning, and security auditing. Reports also provide availability statistics for distributed applications, servers, and specific components within a server.

Reports are particularly useful for executives, managers, and application owners. These reports show the availability of any object within OpsMgr, including a server, a database, or even a service such as Lync Server 2010 that includes a multitude of servers and components. The availability report in Figure 13.6 shows that the LS1 Server was available until after 8:00 a.m. and then was down through 11:00 a.m.

The reports can be run on demand or at scheduled times and delivered through e-mail. OpsMgr can also generate HTML-based reports that can be published to a web server and viewed from any web browser. Vendors can also create additional reports as part of their management packs.


Using Performance Monitoring
Another key feature of OpsMgr is the capability to monitor and track server performance. OpsMgr can be configured to monitor key performance thresholds through rules that are set to collect predefined performance data, such as memory and CPU usage over time. Rules can be configured to trigger alerts and actions when specified performance thresholds have been met or exceeded, enabling network administrators to act on potential performance issues. Performance data can be viewed from the OpsMgr Operations Console. In addition, performance monitors can establish baselines for the environment and then alert the administrator when the counter subsequently falls outside the defined baseline envelope.


Using Active Directory Integration
Active Directory integration provides a way to install management agents on systems without environmental-specific settings. When the agent starts, the correct environmental settings, such as the primary and failover management servers, are stored in Active Directory. The configuration of Active Directory integration provides advanced search and filter capabilities to fine-tune the dynamic assignment of systems.


Integrating OpsMgr Non-Windows Devices
Network management is not a new concept. Simple management of various network nodes has been handled for quite some time through the Simple Network Management Protocol (SNMP). Quite often, simple or even complex systems that use SNMP to provide system monitoring are in place in an organization to provide varying degrees of system management on a network. OpsMgr can be configured to integrate with non-Windows systems through monitoring of syslog information, log file data, and SNMP traps. OpsMgr can also monitor TCP port communication and website transaction sequencing for information-specific data management.

New to OpsMgr 2007 R2 is the capability to monitor non-Microsoft operating systems
such as Linux and UNIX, and the applications that run on them, such as Apache and
MySQL. OpsMgr monitors the file systems, network interfaces, daemons, configurations,
and performance metrics. Operations Manager 2007 R2 supports monitoring of the following
operating systems:

. HP-UX 11i v2 and v3 (PA-RISC and IA64)
. Sun Solaris 8 and 9 (SPARC) and Solaris 10 (SPARC and x86)
. Red Hat Enterprise Linux 4 (x86/x64) and 5 (x86/x64) Server
. Novell SUSE Linux Enterprise Server 9 (x86) and 10 SP1 (x86/x64)
. IBM AIX v5.3 and v6.1

Special connectors can be created to provide bidirectional information flows to other management products. OpsMgr can monitor SNMP traps from SNMP-supported devices as well as generate SNMP traps to be delivered to third-party network management infrastructures.


Exploring Third-Party Management Packs
Software and hardware developers can create their own management packs to extend OpsMgr’s management capabilities beyond Microsoft-specific applications. Each management pack is designed to contain a set of rules and product knowledge required to support its respective products. Currently, management packs have been developed for APC, Cisco, Citrix, Dell, F5, HP, IBM, Linux, Oracle, Solaris, UNIX, and VMware, to name a few.

A complete list of management packs can be found at the following Microsoft site: http://pinpoint.microsoft.com/en-US/systemcenter/managementpackcatalog.

Source of Information : Pearson-Microsoft Lync Server 2010 Unleashed

OpsMgr Architecture

OpsMgr is primarily composed of five basic components: the operations database, reporting database, Root Management Server, management agents, and Operations Console. These components make up a basic deployment scenario. There are also several optional components that provide functionality for advanced deployment scenarios.

OpsMgr was specifically designed to be scalable and can subsequently be configured to meet the needs of any size company. This flexibility stems from the fact that all OpsMgr components can either reside on one server or can be distributed across multiple servers.

Each of these various components provides specific OpsMgr functionality. OpsMgr design scenarios often involve the separation of parts of these components onto multiple servers. For example, the database components can be delegated to a dedicated server, and the management server can reside on a second server.

The following list describes the different OpsMgr components:

. Operations database—The operations database stores the monitoring rules and the active data collected from monitored systems. This database has a seven-day default retention period.

. Reporting database—The reporting database stores archived data for reporting purposes. This database has a 400-day default retention period.

. Root Management Server—This is the first management server in the management group. This server runs the software development kit (SDK) and Configuration service, and is responsible for handling console communication, calculating the health of the environment, and determining what rules should be applied to each agent.

. Management Server—Optionally, an additional management server can be added for redundancy and scalability. Agents communicate with the management server to deliver operational data and pull down new monitoring rules.

. Management agents—Agents are installed on each managed system to provide efficient monitoring of local components. Almost all communication is initiated from the agent with the exception of the actual agent installation and specific tasks that run from the Operations Console. Agentless monitoring is also available with a reduction of functionality and environmental scalability.

. Operations Console—The Operations Console is used to monitor systems, run tasks, configure environmental settings, author rules, subscribe to alerts, and generate and subscribe to reports.

. Web Console—The Web Console is an optional component used to monitor systems, run tasks, and manage maintenance mode from a web browser.

. Audit Collection Services—This is an optional component used to collect security events from managed systems; this component is composed of a forwarder on the agent that sends all security events, a collector on the management server that receives events from managed systems, and a special database used to store the collected security data for auditing, reporting, and forensic analysis.

. Gateway Server—This optional component provides mutual authentication through certificates for nontrusted systems in remote domains or workgroups.

. Command shell—This optional component is built on PowerShell and provides full command-line management of the OpsMgr environment.

. Agentless Exception Monitoring—This component can be used to monitor Windows and application crash data throughout the environment and provides insight into the health of the productivity applications across workstations and servers.

. Connector Framework—This optional component provides a bidirectional web service for communicating, extending, and integrating the environment with third-party or custom systems.

Source of Information : Pearson-Microsoft Lync Server 2010 Unleashed

OpsMgr Functionality

OpsMgr provides for several major pieces of functionality as follows:

. Management packs—Application-specific monitoring rules are provided within individual files called management packs. For example, Microsoft provides management packs for Windows server systems, Exchange Server, SQL Server, SharePoint, DNS, and DHCP, along with many other Microsoft technologies including Lync Server 2010. Management packs are loaded with the intelligence and information necessary to properly troubleshoot and identify problems. The rules are dynamically applied to agents based on a custom discovery process provided within the management pack. Only applicable rules are applied to each managed server.

. Event monitoring rules—Management pack rules can monitor for specific event log data. This is one of the key methods of responding to conditions within the environment.

. Performance monitoring rules—Management pack rules can monitor for specific performance counters. This data is used for alerting based on thresholds or archived for trending and capacity planning.

. State-based monitors—Management packs contain monitors, which allow for advanced state-based monitoring and aggregated health rollup of services. Monitors also provide self-tuning performance threshold monitoring based on a two- or three-state configuration.

. Alerting—OpsMgr provides advanced alerting functionality by enabling email alerts, paging, short message service (SMS), instant messaging (IM), and functional alerting roles to be defined. Alerts are highly customizable, with the capability to define alert rules for all monitored components.

. Reporting—Monitoring rules can be configured to send monitored data to both the operations database for alerting and the reporting database for archiving.

. End-to-end service monitoring—OpsMgr provides service-oriented monitoring based on System Definition Model (SDM) technologies. This includes advanced object discovery and hierarchical monitoring of systems.

Source of Information : Pearson-Microsoft Lync Server 2010 Unleashed

What Is New in OpsMgr R2?

System Center Operations Manager 2007 R2 was released in spring 2009 and includes many improvements over the previous version, Operations Manager 2007 Service Pack 1. Some of these improvements include

. Cross-platform support—This supports non-Microsoft platforms such as UNIX and Linux. This enables administrators to have a single-pane view of their entire IT environment in OpsMgr.

. Integration with System Center Virtual Machine Manager 2008—This integrates with VMM 2008 and enables synergies such as Performance and Resource Optimization (PRO) Tips, which provide virtual machine recommendations based on observed performance and the capability to implement the recommendation at the click of a button.

. Notifications—The notification system has been revamped and now includes an Outlook rule–style interface. Notifications can be generated for specific alerts, and notifications can be sent out as high-priority emails.

. Overrides view—Rather than hunt for overrides within all the management packs, OpsMgr R2 has an authoring view that shows all the overrides defined in the system.

. Improved management pack maintenance—OpsMgr 2007 R2 enables Microsoft management packs to be browsed, downloaded, and imported directly from the console. It even includes versioning, dependency checks, and the capability to search for management pack updates.

. Service level monitoring—Applications can be defined from various monitored objects, and the service level of the application can be monitored and reported on against defined target SLAs.

. Better scaling of URL monitoring—The URL monitor now scales to thousands of websites without undue performance impact.

. Improved database performance—The overall performance of the database and console has been dramatically improved.

These improvements bring the platform to a new level of performance and interoperability while retaining the look and feel of the original Operations Manager 2007 tool.

Source of Information : Pearson-Microsoft Lync Server 2010 Unleashed

Operations Manager 2007 R2 includes one of the best management packs for monitoring and maintaining Lync Server 2010. This management pack was developed by the product group and includes detailed knowledge about the product.

The Lync Server 2010 management pack monitors all the Lync Server 2010 server roles and has separate views for each of the roles to enable targeted monitoring in the console. The Lync Server 2010 management pack includes the following features:

. Automatic Lync topology discovery through the Central Discovery script—The Central Discovery script is a PowerShell script that runs on an automatically selected Central Discovery Watcher Node. This is the Front End pool where the Central Management Store is installed. The script automatically discovers and initiates monitoring of all Lync roles, components, and services.

. Automatic monitoring of pools—Front End and Edge pools are discovered automatically and monitored for a variety of availability, configuration, performance, and security conditions.

. Automatic alerting—The Lync Server management alerts on thousands of different conditions, enabling the administrator to immediately be aware of any potential problems in the infrastructure.

. Synthetic transactions to simulate user traffic—The management pack leverages the built-in Lync Server synthetic transaction PowerShell cmdlets.

The Microsoft Lync Server Management Pack monitors all aspects of the Microsoft Lync Server infrastructure. The management pack structures the monitoring into the services paradigm used by Lync Server. These services include
. Application Service
. Archiving Service
. Central Management Service
. Conferencing Service
. Core Service
. Edge Service
. Mediation Service
. Provisioning Service
. Registration Service
. User Service
. Web Service

On all these services, administrators can generate availability reports to ensure that the servers and systems are meeting the Service Level Agreements (SLAs) set by the organization. For each of the services, the management pack has views for
. Service alerts
. Service performance
. Service state

In addition, the OpsMgr platform monitors the Lync Server 2010 dependencies to ensure that the Lync Server 2010 infrastructure doesn’t fail due to a failure of the dependent systems such as the operating system, Active Directory, DNS, and IIS. The features of the management packs for the following major systems include

. Windows Operating System Management Pack—Monitors and alerts on all the major elements of the Windows server that Lync Server 2010 runs on, including processor, memory, network, disk, and event logs. It gathers performance metrics and alerts on thresholds and critical events.

. Active Directory Management Pack—Monitors and alerts on Active Directory key metrics such as replication latency, domain controller response times, and critical events. The management pack generates synthetic transactions to test the response time of the PDC emulator, LDAP, and other domain services.

. DNS Management Pack—Monitors and alerts on DNS servers for resolution failures, latency, and critical events.

. IIS Management Pack—Monitors and alerts on IIS services, application pools, performance, and critical events.

Source of Information : Pearson-Microsoft Lync Server 2010 Unleashed

System Center Operations Manager (OpsMgr) 2007 R2 provides the best-of-breed approach to monitoring and managing Lync Server 2010 within the environment. OpsMgr helps to identify specific environmental conditions before they evolve into problems through the use of monitoring and alerting components.

OpsMgr provides a timely view of important Lync Server 2010 conditions and intelligently links problems to knowledge provided within the monitoring rules. Critical events and known issues are identified and matched to technical reference articles in the Microsoft Knowledge Base for
troubleshooting and quick problem resolution. The management pack also provides for synthetic transactions to monitor services end-to-end.

The monitoring is accomplished using standard operating system components such as Windows Management Instrumentation (WMI), Windows event logs, and Windows performance counters, along with Lync Server 2010–specific API calls and PowerShell cmdlets. OpsMgr-specific components are also designed to perform synthetic transactions and track the health and availability of network services.

In addition, OpsMgr provides a reporting feature that enables administrators to track problems and trends occurring on the network. Reports can be generated automatically, providing network administrators, managers, and decision makers with a current and long-term historical view of environmental trends. These reports can be delivered through e-mail or stored on file shares to power web pages.

The following sections focus on defining OpsMgr as a monitoring system for Lync Server 2010. This chapter provides specific analysis of the way OpsMgr operates and presents OpsMgr design best practices, specific to deployment for Lync Server 2010 monitoring.

Source of Information : Pearson-Microsoft Lync Server 2010 Unleashed

Message Cost Model

The third equation is the Message Cost Model. The Message Cost Model breaks down the cost of sending a message from one end to the other in terms of its fixed and variable costs. Simply put, the Message Cost Model equation is as follows:

C = a + bN

» C is the cost of sending the message from one end, say A, to the other, say B
» a is the upfront cost for sending the message
» b is the cost per byte of the message
» N is the number of bytes of the message

This equation is simple to understand and there are two key takeaways from this model:

» Transfer of a message, irrespective of its size, involves an upfront fixed cost. For messages, overhead for connection establishment, handshake, and setup is quite common.

» The cost of a message transfer is directly and linearly related to the message size.

The Message Cost Model provides some interesting insights into the costs associated with transmission of messages across a network. On a gigabit Ethernet, a is about 300 microseconds, which is 0.3 milliseconds, and b is 1 second per 125 MB. 1 Gigabit is 1000 Mb, or 125 MB, so gigabit Ethernet implies a transmission rate of 125 MBps. A cost of 1 second per 125 MB is the same as 1 ms per 125 KB, because 1000 ms make up a second and 1000 KB make up an MB. This means 100 messages of 10 KB each take 100 multiplied by (0.3 + 10/125) ms, which is 38 ms, whereas 10 messages of 100 KB take only 10 multiplied by (0.3 + 100/125) ms, which is 11 ms. Therefore, a way to optimize message cost is to send as big a packet as possible each time, thereby amortizing the upfront cost over a much larger size.
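
As a quick check of the arithmetic above, here is a minimal sketch of C = a + bN using the quoted gigabit-Ethernet figures (a = 0.3 ms fixed cost, b = 1 ms per 125 KB). The numbers are the illustrative ones from the text, not measurements.

# Message Cost Model, C = a + b*N, with the gigabit-Ethernet figures from the text.
A_MS = 0.3              # fixed per-message cost in milliseconds
B_MS_PER_KB = 1 / 125   # variable cost: 1 ms per 125 KB

def total_cost_ms(num_messages, message_kb):
    # each message pays the fixed cost plus a size-proportional cost
    return num_messages * (A_MS + B_MS_PER_KB * message_kb)

print(total_cost_ms(100, 10))   # 100 x 10 KB messages  -> 38.0 ms
print(total_cost_ms(10, 100))   # 10 x 100 KB messages  -> 11.0 ms

The same total payload (1000 KB) costs almost four times as much when split into small messages, which is the amortization argument made above.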

In a theoretical calculation, a, the fixed cost in the Message Cost Model, is considered fixed for all message sizes, but usually that’s not the case. The value of a varies depending on the message size.

Source of Information : NoSQL

Little’s Law

Little’s Law applies to parallel computing but has its origins in the world of economics and queuing theory. The law appears deceptively simple but provides a probability-distribution-independent way of analyzing the load on stable systems. The law states that the average number of customers in a stable system is the product of the average arrival rate and the time each customer spends in the system. In terms of a formula, it appears as follows:

» L = kW
» L is the average number of customers in a stable system
» k is the average arrival rate
» W is the time a customer spends in the system

To understand this a bit further, consider a simple system, say a small gas station with cash-only payments over a single counter. If four customers arrive every hour at this gas station and each customer takes about 15 minutes (0.25 hours) at the gas station, there should be an average of only one customer at any point in time at this station. If more than four customers arrive per hour, it becomes clear that this would lead to bottlenecks in the system. If gas station customers get frustrated by waiting longer than normal and leave without filling up, you are likely to have higher exit rates than arrival rates, and in such a situation the system would become unstable.

Viewing a system in terms of Little’s Law, it becomes evident that if a customer, or an active process when translated to parallel programs, takes a certain amount of time, say W, to complete, and the maximum capacity of the system allows handling of only L processes at any time, then the arrival rate cannot be more than L/W per unit of time. If the arrival rate exceeds this value, the system becomes backed up and the computation time and volume are impacted.
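
The following is a minimal sketch of both uses of the law: computing average occupancy from the gas-station numbers above, and rearranging L = kW to find the highest sustainable arrival rate for a system of a given capacity. The capacity figure in the second call is a made-up illustration.

# Little's Law, L = k * W: average occupancy from arrival rate and time in system.
def average_in_system(arrival_rate_per_hour, hours_per_item):
    return arrival_rate_per_hour * hours_per_item

print(average_in_system(4, 0.25))   # gas-station example -> 1.0 customer on average

# Rearranged: a system that can hold at most L concurrent items, each taking
# W time units, can sustain an arrival rate of at most L / W.
def max_arrival_rate(max_in_system, hours_per_item):
    return max_in_system / hours_per_item

print(max_arrival_rate(8, 0.25))    # hypothetical capacity of 8 -> 32 arrivals per hour before backlog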

Source of Information : NoSQL

Amdahl’s Law

Amdahl’s Law provides a formula for finding the maximum improvement in performance of an overall system when only a part of the system is improved. Amdahl’s Law is named after Gene Amdahl (www.computer.org/portal/web/awards/amdahl), a well-known computer architect who contributed to the making of the IBM mainframes.

Amdahl’s Law can succinctly be explained using a simple example. Say you have a process that runs for 5 hours and this process can be divided into sub-tasks that can be parallelized. Assume that you can parallelize all but a small part of the program that takes 25 minutes to run. Then this part of the program, the one that takes 25 minutes to complete, ends up defining the best speed that the overall program can achieve. Essentially, the linear (serial) part of the program limits the performance.

In mathematical terms this example could be seen as follows:
» Total time taken for the program to run: 5 hours (300 minutes)
» Time taken for the serial part of the program: 25 minutes
» Percentage of the overall program that can be parallelized: ~91.6%
» Percentage that cannot be parallelized (or is serial in nature): 8.4%
» Therefore, the maximum increase in speed of the parallelized version compared to the nonparallelized version is 1 / (1 – 0.916) = ~11.9

In other words, the completely parallelized version could be more than 11 times faster than the nonparallel version of the same program. Amdahl’s Law generalizes this calculation of speed improvement in an equation, which is as follows:

1 / ((1 – P) + P/S)

where P represents the proportion of the program that is parallelized and S the factor by which the parallelized part is sped up compared to the non-parallelized version.

This generalized equation takes into account different levels of speed increase for different parts of a program. So, for example, a program can be parallelized into four parts, P1, P2, P3, and P4, where P1, P2, P3, and P4 are 10%, 30%, 40%, and 20%, respectively. If P1 can be sped up by 2x, P2 by 3x, and P3 by 4x, but P4 cannot be sped up, then the overall running time is as follows:

0.10/2 + 0.30/3 + 0.40/4 + 0.20/1 = 0.45

Therefore, the maximum speed increase is 1/0.45, or about 2.22, more than double the speed of the non-parallel program.
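
Here is a small sketch of the generalized form, expressing overall speedup as 1 divided by the sum of fraction/speedup over each part of the program. It reproduces both worked examples above; the infinite speedup used for the parallel fraction in the first call is an idealization, not something a real machine achieves.

# Generalized Amdahl's Law: speedup = 1 / sum(fraction_i / speedup_i).
def overall_speedup(parts):
    # parts: list of (fraction_of_runtime, speedup_of_that_part) tuples
    return 1 / sum(fraction / speedup for fraction, speedup in parts)

# Single serial bottleneck: 91.6% perfectly parallelizable, 8.4% serial.
print(overall_speedup([(0.916, float("inf")), (0.084, 1)]))        # -> ~11.9

# Four-part example from the text: 10% at 2x, 30% at 3x, 40% at 4x, 20% at 1x.
print(overall_speedup([(0.10, 2), (0.30, 3), (0.40, 4), (0.20, 1)]))  # -> ~2.22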

You can read more about Amdahl’s Law at www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf.

Amdahl’s Law applies as much to MapReduce parallelization as it does to multi-core programming.

Gustafson’s Law (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.6348) reevaluates Amdahl’s Law. It states that, given more computing power, more complex problems can be solved in the same time that a simpler problem takes with less computing power. Therefore, Gustafson’s Law contradicts the scalability limits implied by the linear part of the program, especially when large, complex, repetitive tasks are carried out using more computing resources.

Source of Information : NoSQL

MIGRATING FROM RDBMS TO NOSQL

Migrating from a structured schema to a schema-less form is not very hard. In many cases you could simply export the data from RDBMS tables and move it into NoSQL collections. However, things get complicated when the NoSQL database is a column-family store, a sorted ordered column-family store, or a key/value store. Changes in paradigm often lead to redesign efforts.

The greater impedance mismatch is around ad-hoc querying and secondary indexes, which are often difficult to support in a NoSQL environment. NoSQL looks at the data store from a query perspective and not from a generic storage viewpoint.

To facilitate data importation from RDBMS to Hadoop for NoSQL-style manipulations, Cloudera has created an open-source product called Sqoop. Sqoop is a command-line tool with the following capabilities:

» Imports individual RDBMS tables or entire databases to files in HDFS
» Generates Java classes to allow you to interact with your imported data
» Provides the ability to import from SQL databases straight into your Hive data warehouse

You can learn more about Sqoop at https://github.com/cloudera/sqoop.

Source of Information : NoSQL

Polyglot Persistence at Facebook

Facebook in particular uses MySQL for many mission-critical features. Facebook is also a big HBase user. Facebook’s optimizations to MySQL were presented in a Tech Talk, the recordings of which are available online at www.livestream.com/facebookevents/video?clipId=flv_cc08bf93-7013-41e3-81c9-bfc906ef8442. Facebook is about large volume and superior performance, and its MySQL optimizations are no exception. Its work is focused on maximizing queries per second and controlling the variance of request-response times. The numbers presented in the November 2010 presentation are very impressive. Some of the key metrics shared in the context of its online transaction processing system were as follows:

» Read responses were an average of 4ms and writes were 5ms.

» Maximum rows read per second scaled up to a value of 450 million, which is obviously very large compared to most systems.

» 13 million queries per second were processed at peak.

» 3.2 million row updates and 5.2 million InnoDB disk operations were performed in boundary cases.

Facebook has focused on reliability more than on maximizing queries per second, although the queries-per-second numbers are very impressive too. Active sub-second-level monitoring and profiling allows Facebook database teams to identify points of server performance fractures, called stalls. Slower queries and problems have been progressively identified and corrected, leading to an optimal system. You can get the details from the presentation.

Facebook is also the birthplace of Cassandra. However, Facebook has lately moved away from Cassandra in favor of HBase. The current Facebook messaging infrastructure is built on HBase. Facebook’s new messaging system supports storage of more than 135 billion messages a month. As mentioned earlier, the system is built on top of HBase. A note from the engineering team, accessible online at www.facebook.com/note.php?note_id=454991608919, explains why Facebook chose HBase over other alternatives. Facebook chose HBase for multiple reasons. First, its strong consistency model was favored. HBase scales well and has the infrastructure available for a highly replicated setup. Failover and load balancing come out of the box, and the underlying distributed filesystem, HDFS, provides an additional level of redundancy and fault tolerance in the stack. In addition, ZooKeeper, the coordination system, could be reused with some modifications to support a user service.

Therefore, it’s clear that companies like Facebook have adopted polyglot persistence strategies that enable them to use the right tool for the job. Facebook engineering teams have not shied away from making changes to the system to suit their needs, but they have demonstrated that choosing between RDBMS and NoSQL is not as relevant as choosing an appropriate database. Another theme that has emerged time and again from Facebook is that it has used the tools it is most familiar with. Instead of chasing a trend, it has used tools that its engineers can tweak and work with. For example, sticking with MySQL and PHP has been good for Facebook because it has managed to tweak them to suit its needs. Some have argued that legacy has stuck, but the performance numbers clearly show that Facebook has figured out how to make the stack scalable.

Like Facebook, Twitter and LinkedIn have adopted polyglot persistence. Twitter, for example, uses MySQL and Cassandra actively. Twitter also uses a graph database, named FlockDB, for maintaining relationships, such as who’s following whom and who you receive phone notifications from. Twitter’s popularity and data volume have grown immensely over the years. Kevin Weil’s September 2010 presentation (www.slideshare.net/kevinweil/analyzing-big-data-attwitter-web-20-expo-nyc-sep-2010) claims tweets and direct messages now add up to 12 TB/day, which when linearly scaled out implies over 4 petabytes every year. These numbers are bound to keep growing as more people adopt Twitter and use tweets to communicate with the world. Manipulating this large volume of data is a huge challenge. Twitter uses Hadoop and MapReduce functionality to analyze the large data set, and leverages the high-level language Pig (http://pig.apache.org/) for data analysis. Pig statements lead to MapReduce jobs on a Hadoop cluster. A lot of the core storage at Twitter still depends on MySQL, which is heavily used for multiple features within Twitter. Cassandra is used for a select few use cases like storing geocentric data.

LinkedIn, like Twitter, relies on a host of different types of data stores. Jay Kreps provided a preview into the large data architecture and manipulation at LinkedIn at the Hadoop Summit last year. The slides from that presentation are available online at www.slideshare.net/ydn/6-dataapplicationlinkedinhadoopsummmit2010. LinkedIn uses Hadoop for many large-scale analytics jobs, like probabilistically predicting people you may know. The data set acted upon by the Hadoop cluster is fairly large, usually more than 120 billion relationships a day, and is processed by around 82 Hadoop jobs that require over 16 TB of intermediate data. The probabilistic graphs are copied over from the batch offline storage to a live NoSQL cluster. The NoSQL database is Voldemort, an open-source clone of Amazon’s Dynamo that represents data in key/value pairs. The relationship graph data is read-only, so Voldemort’s eventual consistency model doesn’t cause any problems. The relationship data is processed in a batch mode but filtered through a faceted search in real time. These filters may lead to the exclusion of people who a person has indicated they don’t know.

Looking at Facebook, Twitter, and LinkedIn it becomes clear that polyglot persistence has its benefits and leads to an optimal stack, where each data store is appropriately used for the use case in hand.

Source of Information : NoSQL

Revisiting Site Search, Capturing the Second Term

In further refining our search knowledge to understand how users are interacting with our site, we may want to track the initial referring search from SEO or paid search, and the following site search query (if any). Tracking the secondary search terms provides further insight into how users refine their searches: users may be coming to the site on a very general term, then refining their search on the entry page.

You can use the data you collect to help reduce bounce rates. For every user that refines her search through your site upon entry, there are plenty of users who do not. By applying what you learn from the more determined users to the less determined users, you may be able to lower bounce rates for that segment of traffic. For example, suppose you find that large volumes of traffic come to a page on your site from the term “red cars,” and that many of these users go on to do secondary searches on your site for “sports cars.”

Assess the landing page, and determine if there is any content on the page relevant to “sports cars” (or links to such content elsewhere on your site). If there is, is it prominently displayed on the page? If not, move the content around on the page (or provide cross-links to help the user find her way to the content). Again, I cannot stress enough how effective A/B testing is for improving the usability of your site, not just from a search perspective but from a user perspective, too. You can run A/B or multivariate tests to try out different wordings of section titles, different imagery, or alternate positionings of content on a page.

Tracking on-site follow-up searches to referring search terms can also provide you with insight into what your users see as relevant content. You may discover that you have the wrong page organically ranking for a certain term, and you may need to focus on getting a different page to rank organically for that same term. Understanding the traffic you are getting and the language your visitors use will help you ensure that you are presenting them with relevant content earlier in their experience with your site as opposed to forcing them to find the content through their own clicking and determination.

You will not be able to satisfy all people all of the time, and you will have to learn where and when to draw the line. Otherwise, you will find that you are constantly trying to optimize your content to other sets of terms. To make sure you don’t get stuck in this cycle, use the data you have access to in order to determine which changes will likely improve revenue the most.

You should consider setting two statistical bars. The first will determine what percentage of people arriving at your site from external search go on to use your site search. This may range from 3% to 30%, or more or less; it depends on the content and the users. The second bar to set is the volume of secondary searches you may see on a term before you decide to incorporate content related to that term into your landing page. Where you set this bar will depend on the page itself, your ability to modify the content, and whether you feel the secondary search term is highly relevant to the topic of the page.
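
As a rough illustration of the first bar, the sketch below computes the share of search-referred visits that go on to use site search. The visit records and field names are hypothetical; substitute whatever your analytics tool actually exports.

# Share of search-referred visits that refine via site search.
# The records and field names below are made up for illustration.
visits = [
    {"referrer_type": "organic_search", "used_site_search": True},
    {"referrer_type": "organic_search", "used_site_search": False},
    {"referrer_type": "paid_search",    "used_site_search": True},
    {"referrer_type": "direct",         "used_site_search": False},
]

search_visits = [v for v in visits
                 if v["referrer_type"] in ("organic_search", "paid_search")]
refined = sum(1 for v in search_visits if v["used_site_search"])

print(f"{refined / len(search_visits):.1%} of search-referred visits went on to use site search")

Tracking this ratio over time, and per landing page, tells you where the second bar (the volume of follow-up searches on a given term) is worth investigating.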

Source of Information : MASTERING SEARCH ANALYTICS MEASURING SEO, SEM AND SITE SEARCH

Site Search—Capturing and Using the Second Term

Your site search provides a rich idea of what your website encompasses. It can be thought of as a navigational tool, providing an index of all the pages on your site and how they are related. Further, as we have seen, you can track referring search terms and follow-up site search terms to get a better idea of what visitors to your site are really looking for. Knowing this, you can get into more sophisticated uses of your data. If you can track what terms brought a person to your site, you can apply a secondary search against your site search engine and return the results dynamically in your web page, to further aid your user in navigating your site (as well as correlating this data against the other related searches people have performed on your site). That is, you can basically create a section on a page that shows what other users thought was relevant or related to the landing page. You can also use site search data to refine faceted navigation elements on your site. Amazon and Google are very effective examples of this: both provide a spot where correlated or related searches are presented to the user.

Offering up related searches can provide a more predictive path through your site. Providing refinements (facets) based on data you have already collected can also help you retain users, by ensuring that you return results you know to be relevant. While developing these sorts of real-time applications that utilize your site statistics is beyond the scope of this book, this example illustrates how you can move from using data from search in a reactionary manner to providing a real-time service to your end users. All the user engagement information you need is already available based on site search and SEO traffic and terms.

Establishing real-time applications that draw upon your search analytics data unleashes the power of the information you have been collecting. You can use this data everywhere from your home page to articles to product pages. Integrating search data into your site to provide navigation points dynamically lets the voice of your customer drive those elements and can surface relevant links between pieces of content that you may never have thought to connect.

You will begin to see that everything we have been tracking through SEO and paid search can provide data points to apply to your site search. You can even take SEO and paid search queries and track their pathing through your site to further aid in optimizing your site search algorithm. Providing a set of related searches, as well as a set of results that show that people who looked at page “A” also looked at page “B,” may further improve the usability of your site. These are again all features that will require A/B or multivariate testing on your end, but they are also features that can be enabled through the data you have collected.
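
Below is a minimal Python sketch of the "people who looked at page A also looked at page B" idea, counting page co-occurrence within sessions. The session data layout is an assumption for illustration only.

from collections import defaultdict
from itertools import permutations

sessions = [
    ["/cars/sports", "/cars/convertibles", "/financing"],
    ["/cars/sports", "/cars/convertibles"],
    ["/cars/sedans", "/financing"],
]

# Count how often two pages are viewed within the same session.
co_views = defaultdict(lambda: defaultdict(int))
for pages in sessions:
    for a, b in permutations(set(pages), 2):
        co_views[a][b] += 1

def also_viewed(page: str, top_n: int = 3) -> list[str]:
    """Return the pages most often viewed alongside the given page."""
    companions = co_views.get(page, {})
    return [p for p, _ in sorted(companions.items(), key=lambda kv: -kv[1])[:top_n]]

print(also_viewed("/cars/sports"))  # e.g. ['/cars/convertibles', '/financing']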

Source of Information: Mastering Search Analytics: Measuring SEO, SEM and Site Search

Applying Site Search Patterns to SEO/Paid Search

Site search can also provide a better understanding of search funnels, or keyword pathing. A search funnel is a search followed by one or more further searches. To get any sort of insight, though, we need to look at the data in aggregate.

A typical pattern is a keyword search followed by several related searches. Investigating these patterns may also prove valuable in discovering where your site search is breaking down. A search followed by further searches usually indicates one of two things: either an unsuccessful search result, where refinement occurs because of irrelevant content, or a successful result delivering content that invites further investigation. How can you distinguish the two? You can't, really. This will require some good old-fashioned investigation by clicking and following the paths of your users. Then you will need to make some assumptions about what the users were looking for or saw.

Looking at site search patterns gives us a unique insight into how users string words and terms together. You can capture searches in Google Analytics along with the other related terms that were searched. More advanced analytics tools can be configured to track the search path, or even entire paths through a site. In analyzing your pathing reports, you should look for reports with multiple interactions with site search.

You should be able to track the original search term, the page clicked to, the following search term, and the following page clicked to. As you start to see these patterns, you can look for multiple instances of the same patterns occurring, and for any evolution in these patterns. This can help you identify pages that could be improved with the addition of cross-linkages, as well as potential problems with your content. People who search for a term and then repeat a search for a very similar term can usually be categorized as having received unsatisfactory search results.
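
A minimal Python sketch of this kind of path aggregation follows; the path records and the classification heuristic are assumptions for illustration, not rules from the source material.

from collections import Counter

# Each record: (first term, page clicked, follow-up term, page clicked).
paths = [
    ("sports cars", "/cars", "sport cars", "/cars"),
    ("sports cars", "/cars", "convertible sports cars", "/cars/convertibles"),
    ("sports cars", "/cars", "convertible sports cars", "/cars/convertibles"),
]

def classify(first_term: str, second_term: str) -> str:
    """Rough heuristic: broad-to-narrow looks like refinement; near-identical
    wording suggests an unsatisfactory first result."""
    w1, w2 = set(first_term.lower().split()), set(second_term.lower().split())
    if w1 < w2:
        return "refinement (narrower search)"
    if len(w1 & w2) >= max(len(w1), len(w2)) - 1:
        return "repeat or near-repeat (possibly a poor result)"
    return "new topic"

for (t1, p1, t2, p2), count in Counter(paths).most_common():
    print(f"{count}x  '{t1}' -> {p1} -> '{t2}' -> {p2}  [{classify(t1, t2)}]")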

Folks who do a broad search and then a narrower version of that search give you a better idea of how search refinement may work. One trick you can use with Google Analytics to capture data on search refinement is to include a refinement option for users to select. For example, you can include a category drop-down in your site search, allowing users to select whether they're looking for pages related to products, news, support, and so on. Google Analytics can capture that selection as well as the search term.

Capturing these refinements in site search can help you improve your landing pages for your site search users, but you may also be able to apply this data to your SEO or paid search campaigns. The caveat here is that site search users who are familiar with your products and are searching for a specific product may lean toward support content, as they are already users (as opposed to people new to your products who may be looking to purchase). There is no easy answer here; your analytics tools will provide you with insights into the areas to target, but you will need to do some A/B or multivariate testing to validate what works for SEO/paid search traffic as opposed to site search users. There will always be some differences in usage, but the data can still correlate and indicate which areas you should be focusing on. Apply, where and when you can, both what the analytics tell you and what your own investigation of the data and the user experience reveals.

Source of Information: Mastering Search Analytics: Measuring SEO, SEM and Site Search

Pulling Terms from Site Search for SEO/Paid Search

Site search can provide a wealth of information, potentially offering more insight into users' search patterns and behavior than any of your other search touch points. Site search, when tracked, can return any and all variations of the search patterns people use on your site. Granted, you are dealing with a specific segment of users who are likely familiar with your products, services, or offerings, but you still have access to valuable raw search data.

Depending on the volume of traffic your site gets, you may or may not be able to glean useful insights from site search patterns. However, if you can establish some patterns of site search, you may be able to figure out where in the search path you have gaps.

Search leads from SEO deliver only the traffic that the search engines feel is relevant to your site, while paid search delivers only the traffic generated by words you think are relevant to your site. Site search lets you see what your customers are actually entering as search terms, without the search engine or your own assumptions interfering in the query being tracked. Tracking and managing site search should be a top priority, not just to help you improve your other search campaigns but also to improve the usefulness of your website. You should not be surprised if site search is one of the most-used parts of your website, especially if you have a very large site.

Utilizing site search to build keyword lists to sample for both SEO and paid search may prove very fruitful. Once you’ve identified the most frequently searched terms on your site, you can compare those terms against the terms that drive traffic to your site via organic and paid search. Begin with your SEO keywords. Terms that are highly searched via site search should be cross-referenced against terms that appear in your SEO list. Any terms that do not appear in your SEO list should be added to a list of terms that should be tested for relevancy. For any terms that do appear in your site search list, look at the volume of traffic generated as well as the rankings of those terms in the search engine result pages. Any term that ranks poorly is also a candidate for testing. You should end up with a list of potentially relevant words. You may need to further filter this list to eliminate completely irrelevant terms. Once you have your short list, validate that these words are not in your paid search campaigns. What you are left with should be prime candidates for some paid search testing. Set up campaigns for each of these terms to validate keyword volumes as well as the quality of traffic from these search terms.
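
Here is a minimal Python sketch of that cross-referencing step, assuming you have exported site search volumes, SEO terms with rankings, and your current paid search term list; the sample data and the ranking threshold are purely illustrative.

site_search_terms = {"sports cars": 1200, "convertibles": 640, "lease deals": 310}
seo_terms = {"sports cars": {"visits": 900, "rank": 4}, "used cars": {"visits": 400, "rank": 8}}
paid_terms = {"used cars"}

candidates = []
for term, volume in sorted(site_search_terms.items(), key=lambda kv: -kv[1]):
    seo = seo_terms.get(term)
    if seo is None:
        candidates.append((term, "not in SEO list"))
    elif seo["rank"] > 10:            # adjust the "ranks poorly" threshold to taste
        candidates.append((term, f"ranks poorly ({seo['rank']})"))

# Keep only terms not already covered by paid search; these are test candidates.
to_test = [(t, reason) for t, reason in candidates if t not in paid_terms]
for term, reason in to_test:
    print(f"Test in paid search: '{term}' ({reason})")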

You should be tracking what has become your standard dashboard of raw traffic volumes, exit rates, conversion rates, sales assists (if trackable), and average value per visitor. You will also want to track and log who else is competing on these terms. Are they highly competitive terms, or are they low-competition terms? You may find that there is a set of terms that it never occurred to you to target with your paid search or SEO campaigns. Testing the terms in paid search allows you to perform quick evaluations with minimal spends to determine how much traffic they generate and the relevancy of that traffic. You may want to test these terms against several existing landing pages, or develop a new landing page with content you think may be relevant to searchers on these terms. To get an idea of what may work, look at the sites with the top SEO rankings on these terms.

While you can use tools such as Google Website Optimizer for A/B and multivariate testing, you can also run these tests manually. It is important to understand how each variation differs, and what impact these changes have on your results. Build up a test case for these keywords and log how the terms perform against different landing pages, using paid search to deliver the traffic. Landing pages and entry points can and do impact revenue, and beyond landing pages, users may hit other pages, all of which can be tracked in your clickstream data. Suppose, for example, that one landing page (call it page-1) shows the highest average value per visitor and the highest conversion rate. As the most successful landing page in the test, it is the one I would focus on for capturing elements to improve conversions and maximize revenue.
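
A minimal Python sketch of that comparison follows; the figures are invented, and the metrics simply follow the definitions above (conversions divided by visitors, revenue divided by visitors).

test_results = {
    "page-1": {"visitors": 1000, "conversions": 45, "revenue": 9000.0},
    "page-2": {"visitors": 980,  "conversions": 22, "revenue": 4300.0},
}

for page, r in test_results.items():
    conversion_rate = 100.0 * r["conversions"] / r["visitors"]
    value_per_visitor = r["revenue"] / r["visitors"]
    print(f"{page}: conversion {conversion_rate:.1f}%, value/visitor ${value_per_visitor:.2f}")

# Pick the landing page with the highest average value per visitor for further work.
best = max(test_results, key=lambda p: test_results[p]["revenue"] / test_results[p]["visitors"])
print(f"Focus further optimization on {best}")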

Building out a new page for a second set of paid search tests would be beneficial before investing heavily in SEO activity. Test this page against your new baseline metrics and validate how it compares against the current pages. I would set this page up as a noindex page for the search spiders, as it will eventually replace an existing page on the site.

You will want to maintain the old page's inbound links and PageRank while you're testing out the new page, to capitalize on improving conversions and assists. When you do launch your new page, have it replace the old page. If your paid search campaign was showing positive results, you may want to continue to run it concurrently with your SEO efforts and monitor revenue from both the SEO and paid search traffic.

Using site search to expand your SEO and paid search campaigns can be a very cost-effective way to build a keyword list, using terms that your customers are telling you are relevant to them. The first rule in understanding your search campaigns is to exhaust all the data you have access to that provides insights directly from your customers before expanding out. The only time you would want to break this rule is if you are new to the market or are introducing a new product and have no footprint from which to leverage data (in which case, you may look for third-party case studies or other sources of data).

Source of Information: Mastering Search Analytics: Measuring SEO, SEM and Site Search

EMC has developed a shared vision for the private cloud along with its key partners like VMware, Cisco, and AT&T. This group sees lots of opportunity in providing technology and services to companies looking for a better approach to managing IT infrastructure.

And although some companies may use private clouds as an entry point and then transition to public clouds, EMC sees the private cloud as much more than just a staging ground for public clouds. EMC and partners want to help you create a flexible set of IT resources by federating your private clouds with external infrastructures provided by third-party providers.

Not surprisingly, EMC’s contribution is concentrated on providing storage, backup, archiving, and security (from RSA) to support the data centers in a private cloud environment. When all IT resources (servers, network, and storage) are pooled in the virtualized data center model, many things need to change.

Storage must be designed and managed differently. For example, many EMC products require a dedicated pair of servers, and this requirement won't fly in a virtualized environment. New tools and processes are required to plan and manage IT resources and ensure information security. For example, your company can use EMC's Atmos cloud storage service to build a scalable internal storage cloud and then tie it to an external cloud storage service. Cisco brings the network, and the capability to build a scalable network, to the mix. VMware's vSphere serves as the cloud operating system.

Source of Information: Cloud Computing For Dummies (2010)

HP has been working on cloudlike implementations with its customers since 2001. These implementations have typically included consulting and integration support and have leveraged HP’s extensive collection of technology management products.

Based on experiences in these customer engagements, HP has put a special emphasis on helping customers who want to create hybrid cloud environments. The company is leveraging its extensive services teams (including the EDS division) to help educate and lead its customers down an appropriate path to the cloud. EDS has significant experience with vertical-market managed services (hosted services specialized for different industries), and HP will leverage this knowledge and intellectual property (IP) in its evolving cloud strategy.

HP’s teams of business and IT consultants and engineers get involved with the design and implementation of many different types of cloud environments. For example, HP’s Infrastructure Design Service will help you design compute, storage, data center, and Infrastructure as a Service implementations. Other teams provide management consulting, business technology optimization, and testing services.

While companies can easily incorporate a CRM Software as a Service implementation into their IT environment, large-scale adoption of cloud computing requires IT to adopt a services focus; HP is designing some of its consulting services with this in mind. In addition, HP has expanded its cloud environment consulting teams to help companies focus on the quality of service delivered across all business lines.

HP is packaging its hardware for private cloud implementations. Two key examples:

✓ ProLiant SL, a scale-out server environment based on commodity servers

✓ Blade Matrix, a cloud in a box that includes the preintegration of networks, servers, storage, and automation capabilities

Source of Information: Cloud Computing For Dummies (2010)

With many of its large enterprise customers determined to transform their data centers to become more efficient, IBM has already done a lot of private and hybrid cloud implementations. While the majority of IBM’s initial efforts have been directed toward packaging private and hybrid solutions for enterprise data centers, in the longer term we expect to see a much broader strategy that includes all aspects of the cloud, including public clouds for SaaS, IaaS, and PaaS. IBM has created a centralized cloud computing organization with a goal of creating offerings that encompass software, hardware, and services.

IBM anticipates a lot of demand for solutions to manage the interface between public and private clouds. For example, IBM’s Blue Business platform supports both public and private cloud interfaces. In this scenario, the customer has a physical box on-site in the data center. This way the customer can have a private cloud inside the firewall that also supports the ability to burst out into the public cloud when they need additional compute capacity or storage.

A key element of the IBM private and hybrid cloud strategy is to offer solutions based on varying customer-driven workloads. These solutions are organized together as IBM Smart Business Cloud.

These solutions are delivered via three consumption models:

✓ Smart Business on the IBM Cloud (public cloud) is a set of standardized services delivered by IBM on the IBM cloud.

✓ Smart Business Cloud (private cloud) provides private cloud services, behind the client’s firewall, built and/or managed by IBM.

✓ Smart Business Systems (cloud in a box) are preintegrated, workload-optimized systems for clients who want to build their own cloud with hardware and software.

In addition, IBM has a packaged private cloud offering. IBM combines the hardware, software, storage, virtualization, networking, and service management components in one package and adds options for services and financing. This package can include some preestablished connections to public cloud services.

As of August 2009, several categories of workload solutions are available for private cloud implementations, including the IBM Smart Analytics System. The following workloads are currently available:

✓ Development and test: Many organizations have a lot of variation in the demand for test and development resources, making these types of workloads a very practical first step for companies looking to improve data center and IT efficiency and cost-effectiveness. This offering is a private cloud implementation that provides customers with a self-service portal to develop and test on their own. This same service can be implemented inside a customer’s firewall. IBM also has a public cloud offering for this area.

✓ Desktop and devices: End-user connections to desktops and mobile devices are another workload type that IBM has identified as a requirement for private clouds. Companies want their users to access applications from anywhere (at any time) by using thin clients or other Internet-connected devices. This cloud service provides the technology infrastructure for these user environments.

✓ Infrastructure storage: IBM is offering access to storage on demand in various ways. Customers can install the IBM Smart Business Storage Cloud behind the firewall in the data center. Customers can also buy hardware with the virtual image of hardware and software required for additional storage. IBM also has an option for customers to buy on demand storage on the IBM public cloud.

✓ Infrastructure compute: This offering is IBM’s version of computing power on demand. This large enterprise offering has shared virtual images on the IBM cloud. IBM has partnered with Amazon and Google to add its middleware Software as a Service model in the Amazon and Google cloud environments.

In keeping with its strategy of providing packaged solutions to help companies get up to speed quickly, IBM also offers its IBM CloudBurst appliance, a family of preintegrated hardware, storage, virtualization, and networking with built-in service management.

Source of Information: Cloud Computing For Dummies (2010)

Comparing public, private, and hybrid

We wish we could tell you that there are clear distinctions between private and public clouds. Unfortunately, the lines are blurring between these two approaches. Hybrid approaches also are starting to take hold. For example, some public cloud companies are now offering private versions of their public clouds. Some companies that only offered private cloud technologies are now offering public versions of those same capabilities. In this section we offer some issues to consider when you’re making your business decision.


Going public
When is a public cloud the obvious choice? Here are some examples:
✓ Your standardized workload for applications is used by lots of people. Email is an excellent example.
✓ You need to test and develop application code.
✓ You have SaaS (Software as a Service) applications from a vendor who has a well-implemented security strategy.
✓ You need incremental capacity (to add compute capacity for peak times).
✓ You’re doing collaboration projects.
✓ You’re doing an ad-hoc software development project using a Platform as a Service (PaaS) offering.


Many IT department executives are concerned about public cloud security and reliability. You need to get security right and handle any legal and governance issues, or the short-term cost savings could turn into a long-term nightmare.


Keeping things private
In contrast, when would a private cloud be the obvious choice? Here are some examples:
✓ Your business is your data and your applications. Therefore, control and security are paramount.

✓ Your business is part of an industry that must conform to strict security and data privacy requirements. A private cloud can meet those requirements.

✓ Your company is large enough that you have the economies of scale to run a next generation cloud data center efficiently and effectively.


Driving a hybrid
Now add one more choice into the mix: the hybrid cloud. When would you use it? It isn’t about making an either/or choice between a public or private cloud. In most situations, we think a hybrid environment will satisfy many business needs. Here are a few examples:

✓ Your company likes a SaaS application and wants to use it as a standard throughout the company; you’re concerned about security. To solve this problem, your SaaS vendor creates a private cloud just for your company inside their firewall. They provide you with a virtual private network (VPN) for additional security. Now you have both public and private cloud ingredients.

✓ Your company offers services that are tailored for different vertical markets. For example, you might offer to handle claims payments for insurance agents, shipping services for manufacturers, or credit checking services for local banks. You may want to use a public cloud to create an online environment so each of your customers can send you requests and review their account status. However, you might want to keep the data that you manage for these customers within your own private cloud.


Although private and public cloud environments each have management requirements by themselves, these requirements become much more complex when you need to manage private, public, and traditional data centers all together. You need to add capabilities for federating (linking distributed resources) these environments. In addition, your service levels need to focus on how a service is working rather than how a server is working.

Source of Information: Cloud Computing For Dummies (2010)

Are you a Microsoft fan? Do you want to combine your passion for Microsoft products with computer-related skills in a fulfilling career? If so, you are in luck because there are many career avenues for Microsoft lovers. With the proper training and savvy networking, you can secure one of the many different jobs available for people who love Microsoft.

Education and Training

In pursuing a Microsoft-related career, it is important that you obtain an adequate level of education and training. The best way to accomplish this is with a formal Microsoft training or certification program. You can choose among many specialized programs, depending on the type of Microsoft job you want to pursue. For more information on Microsoft-related education and training, check out some online Microsoft certifications.

Microsoft Jobs

There is a wide range of jobs for people who love Microsoft. These include:

  • Artist/Graphic Designer: Accounting for factors such as color, form, and light, artists and graphic designers concentrate on the visual and aesthetic aspects of Microsoft games, applications, and displays.
  • Software Development Engineers: Software engineers write the code and work with the computing languages that help transform computing ideas into practical Microsoft features and applications. Collaborating with other Microsoft professionals such as test engineers and program managers, software developers help create quality Microsoft products at the cutting edge of technology.
  • Test Engineers: Test engineers test newly developed Microsoft products. Pushing the products to the limit, test engineers help ensure that they meet a high standard of quality and are capable of meeting customers' expectations.
  • Program Managers: Program managers oversee and work with multiple Microsoft professionals to ensure that computing ideas get carried out and result in quality Microsoft products. Program managers are involved in all phases of product creation, from understanding and anticipating the needs of the customer and brainstorming ideas to creating and testing the product and developing a marketing plan.
  • Audio Designers: Audio designers produce the voice work, sound, audio effects, and music for Microsoft games and applications.
  • Content Publishers: Content publishers handle the conception, design, development, production, editing, and publishing of material for Web audiences, Microsoft games and applications, customer support, and Microsoft hard-copy literature.
  • Game Designers: Game designers create and design Microsoft games, considering such factors as user experience, aesthetics, usability, software compatibility, level of gaming difficulty, audio and visual landscape, emotional elements, and how real or "lifelike" the game appears to the player.
  • User Experience: Microsoft professionals in the user experience specialization take existing or newly developed Microsoft products and develop and facilitate interactions with computing users. By getting feedback from users, these professionals can help make changes to Microsoft products before they hit the market or fine-tune existing products.

Additional Microsoft Jobs

Additional jobs for lovers of Microsoft products include build engineer, international product and localization engineer, and product planner. There are also many electronics stores and computer outlet chains in which an individual can sell, support, or repair Microsoft products.

Defining a private cloud

There’s confusion — as well as passionate debate — over the definition of a private cloud. When we say private cloud, we mean a highly virtualized cloud data center located inside your company’s firewall. It may also be a private space dedicated to your company within a cloud vendor data center designed to handle your company’s workloads.

The characteristics of the private cloud are as follows:
✓ Allows IT to provision services and compute capability to internal users in a self-service manner
✓ Automates management tasks and lets you bill business units for the services they consume
✓ Provides a well-managed environment
✓ Optimizes the use of computing resources such as servers
✓ Supports specific workloads
✓ Provides self-service based provisioning of hardware and software resources

You might think this sounds a lot like a public cloud! A private cloud exhibits the key characteristics of a public cloud, including elasticity, scalability, and self-service provisioning. (Please refer to Chapter 1 for detailed information on cloud characteristics.) The major difference is control over the environment. In a private cloud, you (or a trusted partner) control the service management.

It might help to think of the public cloud as the Internet and the private cloud as the intranet.

If private and public clouds are so similar, why would you develop a private cloud instead of ordering capacity on demand from an Infrastructure as a Service provider or using Software as a Service? Here are several good reasons companies are using a private rather than a public cloud:

✓ Your organization has a huge, well-run data center with a lot of spare capacity. It would be more expensive to use a public cloud even if you have to add new software to transform that data center into a cloud.

✓ Your organization offers IT services to a large ecosystem of partners as part of your core business. Therefore, a private cloud could be a revenue source.

✓ Your company’s data is its lifeblood. You feel that to keep control you must keep your information behind your own firewall.

✓ You need to keep your data center running in accordance with rules of governance and compliance.

✓ You have critical performance requirements, meaning you need 99.9999 percent availability. Therefore, a private cloud may be your only option. This higher level of service is more expensive, but is a business requirement.

Some early adopters of private cloud technology have experienced server use rates of up to 90 percent. This is a real breakthrough, particularly in challenging economic times.

Source of Information: Cloud Computing For Dummies (2010)

Talking to Your Cloud Vendor about Data

You’re thinking about using some of the data services in the cloud. Before you sign the contract, remember that data (especially your company’s data) is a precious asset and you need to treat it as such.

In addition to issues surrounding security and privacy of your data that we cover earlier in the chapter, we recommend asking your potential vendor about the following topics:

✓ Data integrity: What controls do you have to ensure the integrity of my data? For example, are there controls to make sure that all data input to any system or application is complete, accurate, and reasonable? What about any processing controls to make sure that data processing is accurate? And, there also need to be output controls in place to ensure that any output from any system, application, or process can be verified and trusted. This dovetails with the next bullet about any specific compliance issues that your particular industry might have.

✓ Compliance: You are probably aware of any compliance issues particular to your industry. Obviously, you need to make sure that your provider can comply with these regulations.

✓ Loss of data: What provisions are in the contract if the provider does something to your data (loses it because of improper backup and recovery procedures, for instance)? If the contract says that your monthly fee is simply waived, you need to ask some more questions.

✓ Business continuity plans: What happens if your cloud vendor’s data center goes down? What business continuity plans does your provider have in place: How long will it take the provider to get your data back up and running? For example, a SaaS vendor might tell you that they back up data every day, but it might take several days to get the backup onto systems in another facility. Does this meet your business imperatives?

✓ Uptime: Your provider might tell you that you will be able to access your data 99.999 percent of the time; however, read the contract. Does this uptime include scheduled maintenance? (A quick calculation of what such figures actually allow is sketched after this list.)

✓ Data storage costs: Pay-as-you-go and no-capital-purchase options sound great, but read the fine print. For example, how much will it cost to move your data into the cloud? What about other hidden integration costs? How much will it cost to store your data? You should do your own calculations so you’re not caught off guard. Find out how the provider charges for data storage. Some providers offer a tiered pricing structure. Others offer pricing based on server capacity.

✓ Contract termination: How will data be returned if the contract is terminated? If you’re using a SaaS provider and it has created data for you too, will any of that get turned over to you? You need to ask yourself if this is an issue. Some companies just want the data destroyed. Understand how your provider would destroy your data to make sure that it isn’t floating around in the cloud.

✓ Data ownership: Who owns your data after it goes into the cloud? Some service providers might want to take your data, merge it with other data, and do some analysis.

✓ Switching vendors: If you create applications with one cloud vendor and then decide to move to another vendor, how difficult will it be to move your data? In other words, how interoperable are the services? Some of these vendors may have proprietary APIs and it might be costly to switch. You need to know this before you enter into an agreement.
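
As referenced in the uptime item above, here is a minimal Python sketch that converts an uptime percentage into the downtime it actually permits, so you can judge whether a quoted number meets your business needs.

def allowed_downtime(uptime_percent: float) -> tuple[float, float]:
    """Return the permitted downtime in minutes per year and per month."""
    minutes_per_year = 365.25 * 24 * 60
    down_fraction = 1 - uptime_percent / 100.0
    per_year = minutes_per_year * down_fraction
    return per_year, per_year / 12

for uptime in (99.9, 99.99, 99.999):
    per_year, per_month = allowed_downtime(uptime)
    print(f"{uptime}% uptime allows about {per_year:.1f} min/year ({per_month:.1f} min/month) of downtime")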

Source of Information: Cloud Computing For Dummies (2010)

Databases and data stores in the cloud

Given the scale of some of these applications, it isn’t surprising that new database technologies are being developed to support this kind of computing.

Some database experts believe that relational database models may have difficulty processing data across large numbers of servers — in other words, when the data is distributed across multiple machines. Performance can be slow when you’re executing complex queries that involve a join across a distributed environment. Additionally, in an old-style database cluster, data must either be replicated across the boxes in the cluster or partitioned between them. According to other database experts, this makes it hard to provision servers on demand.

In response, some large cloud providers have developed their own databases. Here’s a sample listing:

✓ Google Bigtable: This hybrid is sort of like one big table. Because tables can be large, they’re split at row boundaries into tablets, which might be 100 megabytes or so. MapReduce is often used for generating and modifying data stored in Bigtable. Bigtable is also the data storage vehicle behind Google’s App Engine (a platform for developing applications).

✓ Amazon SimpleDB: This Web service is for indexing and querying data. It's used with two other Amazon products to store, process, and query data sets in the cloud. Amazon likens the database to a spreadsheet in that it has columns and rows with attributes and items stored in each. Unlike a spreadsheet, however, each cell can have multiple values, and each item can have its own set of associated attributes. Amazon then automatically indexes the data. (A minimal illustration of this item/attribute model appears after these lists.)

✓ Cloud-based SQL: Microsoft has introduced a cloud-based SQL relational database service called SQL Data Services (SDS). SDS provides data storage by using a relational model in the cloud and access to that data from cloud and client applications. It runs on the Microsoft Azure services platform. The Azure platform is an Internet-scale cloud-services platform hosted in Microsoft data centers; the platform provides an operating system and a set of developer services.

Numerous open-source databases are also being developed:
✓ MongoDB (schema-free, document-oriented data store written in C++)
✓ CouchDB (Apache open-source database)
✓ LucidDB (Java/C++ open-source data warehouse)
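
As referenced in the SimpleDB item above, here is a minimal, purely illustrative Python sketch of an item/attribute model in which each attribute can hold multiple values. It is plain Python, not a call to any Amazon SDK; the domain name, items, and query helper are invented for illustration.

domain = {
    "item-001": {
        "category": ["sports car", "convertible"],   # multi-valued attribute
        "color": ["red"],
        "price": ["42000"],                           # values kept as strings
    },
    "item-002": {
        "category": ["sedan"],
        "color": ["blue", "grey"],
    },
}

def query(domain: dict, attribute: str, value: str) -> list[str]:
    """Return item names where any value of the attribute matches,
    the kind of lookup that automatic indexing makes cheap."""
    return [name for name, attrs in domain.items() if value in attrs.get(attribute, [])]

print(query(domain, "category", "convertible"))  # ['item-001']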

Source of Information: Cloud Computing For Dummies (2010)

Large-scale data processing in Cloud

The lure of cloud computing is its elasticity: You can add as much capacity as you need to process and analyze your data. The data might be processed on clusters of computers. This means that the analysis is occurring across machines.

Companies are considering this approach to help them manage their supply chains and inventory control. Or, consider the case of a company processing product data, from across the country, to determine when to change a price or introduce a promotion. This data might come from the point-of-sale (POS) systems across multiple stores in multiple states. POS systems generate a lot of data, and the company might need to add computing capacity to meet demand.

This model is large-scale, distributed computing, and a number of frameworks are emerging to support it, including:

✓ MapReduce, a software framework introduced by Google to support distributed computing on large sets of data. It is designed to take advantage of cloud resources. The computing is done across large numbers of computers grouped into clusters; each computer in a cluster is referred to as a node. MapReduce can deal with both structured and unstructured data. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges the intermediate values associated with the same key. (A minimal sketch of this pattern appears after this list.)

✓ Apache Hadoop, an open-source distributed computing platform written in Java and inspired by MapReduce. It pools computers, each running the Hadoop Distributed File System (HDFS), and uses hash-based partitioning to group related data elements together. The map phase produces organized key/value pairs that can be output to a table, to memory, or to temporary files for further analysis. By default, three copies of the data are kept so that nothing gets lost.
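
As referenced in the MapReduce item above, here is a minimal single-process Python sketch of the map/reduce pattern (word counting). Real frameworks run these phases across many nodes; this only shows the shape of the computation.

from collections import defaultdict

def map_phase(doc_id: str, text: str):
    """Map: emit an intermediate (key, value) pair per word."""
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: merge all intermediate values for one key."""
    return word, sum(counts)

documents = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}

# Shuffle: group intermediate values by key (frameworks do this between the phases).
grouped: dict[str, list[int]] = defaultdict(list)
for doc_id, text in documents.items():
    for word, count in map_phase(doc_id, text):
        grouped[word].append(count)

results = dict(reduce_phase(w, c) for w, c in grouped.items())
print(results)  # e.g. {'the': 3, 'quick': 1, 'fox': 2, ...}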

Source of Information: Cloud Computing For Dummies (2010)

MongoDB has the notion of drivers. Drivers for most mainstream languages are available for interfacing and interacting with MongoDB. CouchDB uses web-standard ways of interaction, so you can connect to it from any programming language that supports the web idiom of communication. Wrappers for some languages make communication with CouchDB work like drivers for MongoDB, though CouchDB always has the RESTful HTTP interface available.

Redis, Membase, Riak, HBase, Hypertable, Cassandra, and Voldemort have language bindings for most mainstream languages. Many of these wrappers use language-independent service layers like Thrift or serialization mechanisms like Avro under the hood, so it becomes important to understand the performance characteristics of the various serialization formats.

One good benchmark that provides insight into the performance characteristics of serialization formats on the JVM is the jvm-serializers project at https://github.com/eishay/jvm-serializers/wiki. The project measures the performance of a number of data formats. The formats covered are as follows:

protobuf 2.3.0 — Google data interchange format. http://code.google.com/p/protobuf/

thrift 0.4.0 — Open sourced by Facebook. Commonly used by a few NoSQL products, especially HBase, Hypertable, and Cassandra. http://incubator.apache.org/thrift/

avro 1.3.2 — An Apache project. Replacing Thrift in some NoSQL products. http://avro.apache.org/

kryo 1.03 — Object graph serialization framework for Java. http://code.google.com/p/kryo/

hessian 4.0.3 — Binary web services protocol. http://hessian.caucho.com/

sbinary 0.3.1-SNAPSHOT — Describing binary format for scala types. https://github.com/harrah/sbinary

google-gson 1.6 — Library to convert Java objects to JSON. http://code.google.com/p/google-gson/

jackson 1.7.1 — Java JSON-processor. http://jackson.codehaus.org/

javolution 5.5.1 — Java for real-time and embedded systems. http://javolution.org/

protostuff 1.0.0.M7 — Serialization that leverages protobuf. http://code.google.com/p/protostuff/

woodstox 4.0.7 — High-performance XML processor. http://woodstox.codehaus.org/

aalto 0.9.5 — Aalto XML processor. www.cowtowncoder.com/hatchery/aalto/index.html

fast-infoset 1.2.6 — Open-source implementation of Fast infoset for binary XML. http://fi.java.net/

xstream 1.3.1 — Library to serialize objects to XML and back. http://xstream.codehaus.org/

The performance runs are on a JVM, but the results may be relevant to other platforms as well. They show that protobuf, protostuff, kryo, and hand-written manual serialization are among the most efficient for serialization and de-serialization, while Kryo and Avro are among the most efficient formats in terms of serialized size and compressed size.
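
To make the kind of measurement concrete, here is a minimal Python sketch that times serialization and de-serialization and records raw and compressed sizes. It uses the standard library's json and pickle as stand-in formats; it is not the jvm-serializers harness, just the same style of comparison.

import gzip
import json
import pickle
import time

record = {"id": 42, "name": "media-item", "tags": ["video", "hd"], "plays": [1, 5, 9]}

def measure(name, dumps, loads, rounds=20000):
    """Time serialization and de-serialization, and report raw/compressed sizes."""
    start = time.perf_counter()
    for _ in range(rounds):
        blob = dumps(record)
    ser_time = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(rounds):
        loads(blob)
    deser_time = time.perf_counter() - start

    print(f"{name:8s} ser={ser_time:.3f}s deser={deser_time:.3f}s "
          f"size={len(blob)}B gzip={len(gzip.compress(blob))}B")

measure("json", lambda r: json.dumps(r).encode(), lambda b: json.loads(b.decode()))
measure("pickle", pickle.dumps, pickle.loads)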

Having gained a view into the performance of formats, the next section segues into benchmarks of NoSQL products themselves.

Source of Information: NoSQL

