Web services and the Grid are converging! The prospect of grid-based commodity computers delivering run-anywhere, anytime Web services across the Internet has hype-o-meters rising fast and marketing departments everywhere gearing up. Standards are still winding their way through community processes and early-adopter products are just coming to market, but that hasn't stopped some industry watchers from proclaiming "Grid services" the next big thing. The Butler Group, for example, sees the coming boom in Grid services dwarfing even the Internet in terms of its impact, as they transform IT from a products-based to a services-oriented industry. Irrational exuberance? Maybe not. Coupling Web services' strong standards heritage with the Grid's experience securing and managing resources in heterogeneous environments could be a boon for SOA, B2B, and e-commerce. The answer ultimately lies in the Web services community's ability to turn the Grid's strengths into standards, reference implementations, and products as part of the convergence.
Ben Worthen, in his September 2002 article for CIO magazine, "Web Services Still Not Ready for Prime Time," singled out security and reliability as the two major hurdles Web services needed to overcome. He saw the lack of standard security protocols as an issue for all but a few firms wanting to use Web services on the Internet, and the absence of an "open-line, telephone-like connection" as derailing B2B and e-commerce applications by undermining the reliability of machine-to-machine transactions. Two years and dozens of standards later, the problems remain. Can the Grid's influence change the equation?
Addressing the Security Problem
Security on the Internet, by its very nature, is problematic. The Internet operates as a confederation of independent domains where each domain potentially has its own domain authority and security policies. Figure 1 illustrates some of the problems this creates for Web services applications. In the example, a user interacts with a Web service, Service A, running in security Domain A. Domain A uses X.509 certificates for authentication and authorization. Service A uses a second service, Service B, running in security Domain B, that uses Kerberos. Service B, in turn, uses Service C, running in security Domain C, that uses user-ids and passwords. Reconciling the user's identity, privileges, and credentials across the domains is the problem.
SAML and XACML approach this problem by moving the differences out of the services themselves and into Policy Decision Points (PDPs). This decreases the number of nodes involved and introduces a common mechanism, assertions, for bridging domains. SAML and XACML are relatively new standards, and products implementing them are just beginning to emerge. That leaves a large "legacy" base and a long transition period unaddressed. The interim solution is to integrate existing security mechanisms in a coherent way. For the foreseeable future, that means bridges between domains, and proxy credentials standing in for real credentials where there is no common credential mechanism or authority.
In my article "Who's Master of Your Domain? Web services security in an unfriendly world," (WSJ, Vol. 4, issue 6), I discussed how Web services standards (WS-Policy, WS-Security, and WS-Trust) are now in place for implementing basic security services, such as authentication, authorization, confidentiality, and integrity, in Web services environments. The standards provide the grammar for communicating security policies and the flexibility necessary to support any of the traditional security strategies. They also support SAML and XACML. As you would expect, the standards specify "how" to convey security information, but not "how" to implement any specific service or bridge domains. They leave that to implementers.
The Grid's experience operating in heterogeneous environments goes directly to solving the bridging problem. The Grid Security Infrastructure (GSI), built on the Transport Layer Security (TLS) protocol and the Internet X.509 Infrastructure, creates a basic security framework that does not require a centralized certificate management authority. The Grid community has successfully worked out the problems of integrating non-X.509-based authentication and authorization mechanisms into this infrastructure by creating bridges and methods for global-to-local identity mapping, proxying credentials, and delegating privileges. Some refactoring is necessary, but the Grid community's work provides the roadmap needed to advance the discussion within the Web services community beyond abstract models to concrete reference implementations and products. The Grid's experience could cut years off product introduction cycles and provide bridging solutions both among the traditional approaches and between those approaches and next-generation SAML- and XACML-based products.
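As a rough illustration of the global-to-local identity mapping mentioned above, the sketch below looks up a user's local identity in each of three domains from a single global identity (an X.509-style subject name). The table contents, domain labels, and the `map_identity` function are hypothetical and are not taken from GSI; the sketch only shows the lookup pattern a bridge might use.

```python
# Hypothetical mapping tables, one per security domain, keyed by the
# user's global identity (here an X.509-style subject name).
IDENTITY_MAP = {
    "domain-a": {"/O=Example/CN=Jane Doe": "jdoe"},              # X.509 domain
    "domain-b": {"/O=Example/CN=Jane Doe": "jane@EXAMPLE.ORG"},  # Kerberos domain
    "domain-c": {"/O=Example/CN=Jane Doe": "jd1042"},            # user-id domain
}


def map_identity(domain: str, global_id: str) -> str:
    """Return the local identity that global_id maps to in domain.

    Raises PermissionError when no mapping exists, so an unmapped user
    is refused rather than silently given a default account.
    """
    try:
        return IDENTITY_MAP[domain][global_id]
    except KeyError:
        raise PermissionError(
            f"{global_id!r} has no local identity in {domain!r}"
        ) from None


if __name__ == "__main__":
    user = "/O=Example/CN=Jane Doe"
    for domain in ("domain-a", "domain-b", "domain-c"):
        print(domain, "->", map_identity(domain, user))
```

A real bridge would also have to translate credentials and privileges, not just names; this sketch covers only the naming step.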
Addressing the Reliability Problem
The Internet poses even greater reliability challenges, particularly for Web services that integrate components belonging to multiple owners and use the Internet as the backbone.
When you pick up a phone, you get a dial tone. The telephone is always on; it's always available. That is the level of reliability Ben Worthen was talking about when he said there isn't an "open-line, telephone-like connection" for Web services. When Worthen used the term reliability, he was actually using it as shorthand for the "ilities" - a broader set of Quality of Service (QoS) metrics characterizing an application (see Table 1 for a more comprehensive list). Like a dial tone, when the "ilities" are present, an application is flexible, stable, reliable, and there when you need it. When they are missing or unevenly implemented, all bets are off.
A "dial-tone"-quality service must be able to handle both scheduled and unscheduled outages. It must also be able to deal with surges in demand, both in the number of clients requesting the service and in the resources any particular request may need. From a client's perspective, it doesn't really matter whether a service is "down" or simply too busy to respond; it is either there or it isn't. The keys to building "dial-tone"-quality services are redundancy, managed environments, and high-quality Web services. To better understand this point, let's look at what it means to build a "dial-tone"-quality service for a simple, self-contained environment, and then extend the discussion to include the Internet.
Redundancy, redundancy, redundancy. The first key to building a "dial-tone"-quality service is including enough resources to eliminate single points of failure and resource bottlenecks. The resources in question include hardware, software, networks, and Web service instances. The level of redundancy needed depends on the level of failover the service must provide, not only during normal operations, but also during peak demand and routine maintenance outages.
The three levels of failover possible are cold, warm, and hot (stateful). Cold failover brings up a new component, from a down or offline status, in response to unexpected demand or an outage. For "dial-tone"-quality services, cold failover is generally an option only at the application, or Web service, level because of the time needed to bring up system or network components cold. Warm failover keeps spare components or capacity online. With warm failover, a service can gracefully deal with both scheduled and unscheduled outages by dynamically restarting or switching components, and with surges by distributing load. Stateful failover goes a step further by transparently switching from a failing or degraded component to a backup while maintaining application state.
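To make the distinction concrete, here is a minimal sketch of stateful failover, assuming a simple primary/standby pair that copies application state to the standby after every request. The `Instance` and `StatefulPair` classes are hypothetical illustrations, not part of any product or standard discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A hypothetical service instance holding application state in memory."""
    name: str
    healthy: bool = True
    state: dict = field(default_factory=dict)

    def handle(self, key: str, value: str) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.state[key] = value


class StatefulPair:
    """Primary/standby pair that replicates state after every request."""

    def __init__(self, primary: Instance, standby: Instance) -> None:
        self.active = primary
        self.standby = standby

    def handle(self, key: str, value: str) -> None:
        try:
            self.active.handle(key, value)
        except ConnectionError:
            # Failover: promote the standby, which already holds the
            # replicated state, and retry the request once.
            self.active, self.standby = self.standby, self.active
            self.active.handle(key, value)
        # Replicate so the standby can take over without losing context.
        self.standby.state = dict(self.active.state)


if __name__ == "__main__":
    a, b = Instance("a"), Instance("b")
    pair = StatefulPair(a, b)
    pair.handle("cart", "1 item")
    a.healthy = False                 # simulate an unscheduled outage
    pair.handle("order", "placed")    # served by b, with the cart intact
    print(pair.active.name, pair.active.state)
```

Dropping the replication step turns this into warm failover: the standby is online and can take the request, but the earlier "cart" entry would be lost.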
The second key to achieving "dial-tone"-quality services is managed environments. In a managed environment, management products control different components and levels of the infrastructure, detecting, recovering, and repairing, or at least compensating for, failing or degraded components. Management products include load balancing, clustering engines, and network, systems, and applications management services - which are available from firms such as BEA, Cisco, Computer Associates, IBM, HP, Oracle, Quest, and Veritas. The management products monitor key metrics, resource status, and fault diagnostics, and use redundant resources to maintain consistent levels of service.
The third key is "dial-tone"-quality Web services. A "dial-tone"-quality Web service is well written, error free, does not contain any unwanted or unknown side effects, and does not request resources beyond those needed, nor hold resources longer than necessary. The service abstracts out lifetime, location, and resource management functions and incorporates interfaces for interacting with management products that provide those capabilities. For stateful failover, the service also maintains the application "context" necessary for "failing over" to another instance.
Figure 2 brings these concepts together into a notional "dial-tone"-quality service. Load balancing provides scalability, clustering provides availability and reliability, transaction and database replication provide failover, geographic separation provides disaster recovery, and server and network redundancy provide the underlying resources necessary to eliminate any single points of failure and to meet surge requirements. The implementation includes not two, but three instances of critical components to ensure there is enough redundancy to guarantee high availability and reliability even when some components are offline for maintenance. Gartner estimates that it costs approximately 3.5 times as much to create such an environment as one for standard applications.
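The case for three instances rather than two can be sketched numerically. The function below uses the standard formula for redundant components, under the simplifying assumption that instances fail independently (real outages are often correlated, so treat the results as an upper bound). The 99% per-instance figure is a hypothetical value chosen for illustration, not one taken from this article.

```python
def parallel_availability(instance_availability: float, instances: int) -> float:
    """Availability of a group that is up while at least one instance is up.

    Assumes instance failures are independent of one another.
    """
    if instances < 1:
        raise ValueError("need at least one instance")
    return 1.0 - (1.0 - instance_availability) ** instances


if __name__ == "__main__":
    per_instance = 0.99  # hypothetical availability of a single instance
    for count in (1, 2, 3):
        print(count, "instance(s):", f"{parallel_availability(per_instance, count):.6f}")
```

With three instances, taking one offline for maintenance still leaves two running, so the group keeps the two-instance level of protection instead of falling back to a single point of failure.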
How does the Internet affect this picture? The "ilities" are difficult to build into distributed, Internet-based Web services applications because Web services applications are compositional by nature, i.e., you create larger applications by linking together smaller services or components, and the Internet introduces diversity in terms of the underlying systems, network, and manageability infrastructure. The overall reliability of such a service is a function of the reliability of the individual component services and their supporting and connecting infrastructures.
Figure 2 illustrates this problem. In the example, a user connects to Service A, which is 99.9% available on a 24x7 basis (three nines, by the way, equates to less than 2 minutes of downtime per day and less than 9 hours of downtime per year). Service A connects to Service B, which is only 99.5% available but still 24x7. Service B, in turn, connects to Service C, which is 95% available but only 12x7. Assuming all three services are critical to the application's functionality, the overall availability of Service A is only 47.2% (.999*.995*.95*.5) when you look at it from a 24x7 perspective.
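The arithmetic in this example is easy to check. The sketch below multiplies the four factors quoted in the text (three availabilities plus the 0.5 coverage factor for a 12x7 service viewed on a 24x7 basis) and converts three nines into downtime. The function names are illustrative; the product of the four factors works out to roughly 0.472.

```python
from math import prod


def composite_availability(factors: list[float]) -> float:
    """Availability of a chain in which every link must be up at once."""
    return prod(factors)


def downtime_minutes_per_day(availability: float) -> float:
    return (1.0 - availability) * 24 * 60


def downtime_hours_per_year(availability: float) -> float:
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    # Service A, Service B, Service C, and Service C's 12x7 coverage.
    chain = [0.999, 0.995, 0.95, 0.5]
    print(f"overall availability: {composite_availability(chain):.1%}")
    print(f"three nines: {downtime_minutes_per_day(0.999):.2f} min/day, "
          f"{downtime_hours_per_year(0.999):.2f} h/year")
```

The weakest link dominates: removing the 0.5 coverage factor alone would lift the chain to about 94%.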
While the challenges are different, the keys to building "dial-tone"-quality services for the Internet remain redundancy, managed environments, and high-quality Web services. The X-factor in this equation is managed environments. Management products geared to the Internet cannot take the centralized management approach shown in Figure 3. Instead, they must take a decentralized, federated approach. This presupposes standards for products communicating with one another, with infrastructure-level resources such as servers, memory, and network devices, and with application-level services such as Web services. It also presupposes standards for products requesting and providing the metrics, status, and resource information necessary for management products to relocate and dynamically control resources within the environment. Web services is the logical choice for providing these standards, but stateful resources, context maintenance, and standard Web services interfaces for interacting with management products and resources have, up until this point, been missing. This is where the Grid comes in. Part of the convergence between Web services and the Grid has been getting the "right" interfaces and standards in place to specifically address the manageability question.
Adding Stateful Resources to Web Services
Fault-tolerant and stateful failover solutions assume the ability to interact with stateful services and resources such as servers, disk drives, and memory. Web services is built on a family of stateless protocols: HTTP, SOAP, WSDL, etc. They are stateless in the sense that each client/server interaction is discrete, with no assumption that one message relates to another. While higher-level protocols such as WS-Coordination, WS-Transaction, and BPEL4WS provide patterns for implementing stateful message sequences, they also assume stateless interactions at the message level. So, up until recently, none of the existing standards accounted for stateful services or resources.
The Web Services Resource Framework (WSRF) proposal corrects this omission by defining a WS-Resource as a combination of a Web service and a stateful resource expressed as an XML document and a WS-Addressing endpoint. WSRF creates the framework for message exchanges between Web services components and a stateful resource, thereby paving the way for exposing and sharing such resources as Web services and for creating stateful Web services.
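The pairing described here can be pictured with a small sketch: a stateless service function receives an endpoint reference that names which stateful resource a message targets, and the resource's state is held as an XML document. Everything below is a hypothetical illustration. The element names, the `RESOURCES` store, and the `EndpointReference` class are invented for the example and are not taken from the WSRF or WS-Addressing schemas.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    """Hypothetical stand-in for an endpoint reference."""
    address: str       # where the Web service lives
    resource_id: str   # which stateful resource the message targets


# Hypothetical resource store: each resource's state is an XML document.
RESOURCES = {
    "disk-42": "<Disk><CapacityGB>500</CapacityGB><FreeGB>120</FreeGB></Disk>",
}


def get_property(epr: EndpointReference, name: str) -> str:
    """Stateless service operation: all state comes from the named resource."""
    document = ET.fromstring(RESOURCES[epr.resource_id])
    element = document.find(name)
    if element is None:
        raise KeyError(name)
    return element.text or ""


if __name__ == "__main__":
    epr = EndpointReference("http://example.org/storage", "disk-42")
    print(get_property(epr, "FreeGB"))
```

The service code itself keeps no state between calls; the endpoint reference carried with each message is what ties the interaction to a particular resource.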
Adding Context
The Web Services Composite Application Framework (WS-CAF) complements WSRF by adding the context necessary for managing state-aware, compositional applications in stateful environments. WS-CAF introduces the concepts of participants, which share a common context, and coordinators, which orchestrate and manage that context, thereby enabling the coordinators to ensure a common outcome across the application.
Adding the Management Product Interfaces
WSRF, Web Services Notification (WS-Notification), and Web Services Distributed Management (WSDM) complete the picture by defining the Web services interfaces and metrics necessary for managing Web services and for integrating management products across heterogeneous environments.
WSRF defines the interfaces for managing service lifetimes, properties, and faults. WS-Notification adds a publish-and-subscribe interface that management products can use for overseeing and reporting state and property changes. WSDM builds on WSRF, introducing the concept of "manageable resources" and identifying the interfaces necessary for management products to interact with those resources.
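The publish-and-subscribe pattern itself is simple to sketch. The in-memory broker below is a hypothetical illustration of the pattern only; it does not implement the WS-Notification message formats or interfaces, and the topic name and message fields are invented for the example.

```python
from collections import defaultdict
from typing import Any, Callable


class NotificationBroker:
    """Minimal in-memory publish-and-subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver message to every subscriber of topic; return the count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = NotificationBroker()
    seen: list[dict] = []
    # A management product registers interest in one service's state changes.
    broker.subscribe("service-a/state", seen.append)
    broker.publish("service-a/state", {"state": "degraded"})
    print(seen)
```

The managed service only publishes; it does not need to know which management products, if any, are listening.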
WSDM is a two-part specification. WSDM Management Using Web Services (MUWS) defines the interfaces for management products to manage resources using Web services messages. WSDM Management of Web Services (MOWS) defines the use of MUWS for managing Web services resources to:
- Monitor QoS
- Enforce SLAs
- Control tasks
- Manage resource life cycles
Basic WSDM products can achieve simple fault tolerance; WSDM products acting as WS-CAF coordinators can monitor and manage the context necessary for providing stateful failover.
When will WSDM products be available? They are here today with CA and HP on the leading edge. CA's Unicenter WSDM is one of the first products available, and HP is working closely with the WSDM standards committee, contributing elements of its Web Services Management Framework (WSMF) to the WSDM standard. Smaller companies, such as AmberPoint, Blue Titan, and Infravio, also have products in this space. Bigger players such as BMC, IBM, Microsoft, and Veritas are rapidly following.
Conclusion
As Figure 4 summarizes, the Grid is significantly influencing Web services standards. Web services can finally clear the hurdles Ben Worthen identified by leveraging the Grid's experience in security and service manageability. The Internet is still a challenge, but even there the necessary pieces are starting to fall into place. The standards, however, are still in their infancy: many are at Version 1.0 or lower, and some deconfliction still needs to occur. Management products built to the standards are relatively new or still on the drawing board, meaning widespread adoption is still several years off. It may be 5-10 years before we understand the Grid's full impact on Web services, but the convergence is a thread worth watching. If the Web services and Grid communities leverage their synergy, this is one instance where the reality may live up to the hype, and given the potential, this one could turn out to be the big kahuna.
References
Worthen, B. "Web Services Still Not Ready for Prime Time." CIO Magazine, Sept. 1, 2002. www.cio.com/archive/090102/prime.html
Foster, I., et al. "The Anatomy of the Grid." Globus Organization, 2001. www.globus.org/research/papers/anatomy.pdf
Foster, I., et al. "The Physiology of the Grid." Globus Organization, June 2002. www.globus.org/research/papers/ogsa.pdf
Tuecke, S. "Grid Security Infrastructure (GSI) Roadmap." Internet Draft, Global Grid Forum, February 2001. www.gridforum.org/Security/ggf1_2001_03/drafts/draft-ggf-gsi-roadmap-02.doc
Tuecke, S., et al. "Open Grid Services Infrastructure (OGSI) Version 1.0." Global Grid Forum, 2003. www.ggf.org/ogsi-wg
"Web Services Resource Framework Version 1.0." March 5, 2004. www-106.ibm.com/developerworks/library/ws-resource/ws-wsrf.pdf
Bunting, D., et al. "Web Services Composite Application Framework (WS-CAF) Ver 1.0." Arjuna Technologies, Ltd; Fujitsu Limited; IONA Technologies Ltd; Oracle Corporation; and Sun Microsystems, Inc., 2003. http://developers.sun.com/techtopics/webservices/wscaf/primer.pdf
Graham, S., et al. "Web Services Notification (WS-Notification)." International Business Machines Corporation, Sonic Software Corporation, SAP AG, Hewlett-Packard Development Company, Akamai Technologies Inc, and Tibco Software Inc., 2003-2004. ifr.sap.com/ws-notification/ws-notification.pdf
DeCarlo, J., et al. "Management Using Web Services: Architecture." OASIS, 2003. xml.coverpages.org/WSDM-MUWSArch20031208.pdf
Dharmawan, A., et al. "Web Services Distributed Management: Management Using Web Services (MUWS 0.5)." OASIS Open, 2003-2004. www.oasis-open.org/committees/download.php/6234/cd-wsdm-muws-0.5.pdf
DeCarlo, J., et al. "Web Services Distributed Management: Management of Web Services (WSDM-MOWS 0.5)." OASIS Open, 2003-2004. www.oasis-open.org/committees/download.php/6201/cd-wsdm-mows-0.5.20040402.pdf