
Amazon shows price of cloud leadership

Stephen Ellis | August 05, 2008

IT'S a truism in information technology that it rarely pays to be a user of the early versions of anything, and especially of the first release.

Still, someone has to do the dirty work so the rest of us can benefit later on in the life cycle of the product or service.

One of the current frontlines on which early adopters are doing the heavy lifting is cloud computing, where both the provision and the commercial use of remote, shared computing infrastructure are still in their early days.

Early days, but hardly its infancy.

Quite a few big-name companies already rely on cloud systems such as Amazon's AWS (Amazon Web Services) or Google Apps for critical parts of their online infrastructure, including the likes of Nasdaq, US Major League Baseball, Hasbro and SAP.

Many others are testing such systems.

So when Amazon's S3 (Simple Storage Service, part of its broader AWS offerings) cloud went down for eight hours on July 20, the outage affected thousands of businesses that use S3 to remotely store and retrieve data such as pictures used in web pages or customer records.

Amazon is arguably the leading player among a handful of huge web-focused companies that know more than anyone about how to build, operate and maintain a complex, distributed IT architecture capable of providing true cloud services.

While firms such as Google, Microsoft, 3Tera and Yahoo claim to be peers, none can currently match Amazon's operational experience and service breadth.

At one level, it is perhaps not surprising that outages have been most troublesome in Amazon's S3 service rather than in one of its other offerings.

Of the three fundamental resources used in computation (processor cycles, bandwidth and storage), storage is probably the one best suited to cloud provision and, at least anecdotally, the one of most interest to current users.

Without trivialising the challenges of designing a truly secure, scalable, fault-tolerant and highly available shared storage cloud, it is fair to say that the basic user requirements, and the best technical ways to meet most of them, are well understood.

Storing multiple redundant copies of data, using authentication and encryption to provide users with privacy and security and designing automated failover and repair mechanisms to compensate for inevitable periodic component-level failures (by disc drives or RAID controllers, for instance) are, more or less, the building blocks of every cloud storage project. As it turns out, Amazon's storage service is no different, and it didn't go down because of some error in its use of these basic foundations.
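To make those building blocks concrete, here is a minimal Python sketch of a replicated object store, assuming a toy in-memory setup: every object is written to several replicas, a SHA-256 checksum recorded at write time is checked on every read, and reads fail over to another copy when a node is down or returns corrupted data. The `Replica` and `ReplicatedStore` classes are hypothetical illustrations, not Amazon's design or API.

```python
import hashlib
import random


class Replica:
    """A hypothetical storage node holding object copies in memory."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.blobs = {}

    def _check_up(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")

    def put(self, key, data):
        self._check_up()
        self.blobs[key] = data

    def get(self, key):
        self._check_up()
        return self.blobs[key]


class ReplicatedStore:
    """Writes each object to every replica and verifies a checksum on read."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.checksums = {}  # key -> SHA-256 digest recorded at write time

    def put(self, key, data):
        self.checksums[key] = hashlib.sha256(data).hexdigest()
        for replica in self.replicas:
            replica.put(key, data)  # toy: assumes all replicas are up at write time

    def get(self, key):
        expected = self.checksums[key]
        corrupt = []
        # Try replicas in random order so reads spread across the copies.
        for replica in random.sample(self.replicas, len(self.replicas)):
            try:
                data = replica.get(key)
            except (ConnectionError, KeyError):
                continue  # failover: node down or copy missing, try the next one
            if hashlib.sha256(data).hexdigest() == expected:
                for bad in corrupt:
                    bad.put(key, data)  # repair corrupt copies seen during this read
                return data
            corrupt.append(replica)
        raise IOError(f"no healthy copy of {key!r}")


nodes = [Replica("a"), Replica("b"), Replica("c")]
store = ReplicatedStore(nodes)
store.put("photo.jpg", b"jpeg bytes")
nodes[0].up = False                       # simulate a failed node
nodes[1].blobs["photo.jpg"] = b"garbage"  # simulate a silently corrupted copy
print(store.get("photo.jpg"))             # b'jpeg bytes'
```

Real systems add much more (quorum writes, background scrubbing, placement across racks and data centres), but the pattern of redundant copies plus verification plus automatic failover is the same.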

Instead, S3 ground to a temporary halt because of a breakdown in Amazon's approach to a final, trickier part of meeting the challenge of cloud computing - the need for a layer of co-ordinated metadata above the storage itself, to provide system-wide knowledge of where all the various pieces of information are stored, which data is owned by which users and so on.

In distributed systems, there is usually some sort of background chatter between servers that perform tasks (such as storing or retrieving a file) to ensure they all have a shared, coherent, synchronised picture of what is where.
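The article does not say how S3's servers exchange this information, so the following is only a toy Python illustration of one common approach, a push-pull gossip (or "epidemic") protocol: each node keeps its own copy of a placement map, tags every entry with a version number, and periodically swaps maps with a randomly chosen peer, keeping whichever entry is newer. `Node`, `gossip_round` and the version scheme are hypothetical.

```python
import random


class Node:
    """A hypothetical server holding its own copy of the placement metadata."""

    def __init__(self, name):
        self.name = name
        self.view = {}  # object key -> (version, location)

    def record(self, key, location, version):
        self.view[key] = (version, location)

    def merge(self, other_view):
        """Adopt any entry the peer holds with a newer version number."""
        for key, (version, location) in other_view.items():
            if key not in self.view or version > self.view[key][0]:
                self.view[key] = (version, location)


def gossip_round(nodes):
    """Each node swaps views with one randomly chosen peer (push-pull)."""
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        node.merge(peer.view)  # pull the peer's newer entries
        peer.merge(node.view)  # push everything the node now knows


nodes = [Node(f"server-{i}") for i in range(8)]
nodes[0].record("photo.jpg", "disk-17", version=1)
rounds = 0
while any("photo.jpg" not in n.view for n in nodes):
    gossip_round(nodes)
    rounds += 1
print(f"all {len(nodes)} servers know where photo.jpg lives after {rounds} round(s)")
```

Because every node contacts a random peer each round, an update usually reaches every server within a handful of rounds. The same property means a wrong entry can spread just as quickly.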

According to Amazon's commendably clear and honest explanation of what went wrong with S3, corruption in some of these maps of its data storage system caused servers to ask each other for the correct map with increasing frequency, until the whole system jammed up.
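One generic safeguard against corrupted metadata is to attach a checksum to every state message, so a receiver can reject a damaged copy instead of acting on it. Below is a minimal Python sketch, assuming JSON-encoded maps with a SHA-256 digest prefixed to each message (the message format and function names are hypothetical, not Amazon's). Flipping a single bit turns `disk-17` into `disk-16`, producing a message that still parses but points to the wrong place, and the checksum catches it.

```python
import hashlib
import json

DIGEST_LEN = 32  # bytes in a SHA-256 digest


def encode_state(state):
    """Serialise a metadata map and prefix it with a SHA-256 digest of the payload."""
    payload = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(payload).digest() + payload


def decode_state(message):
    """Return the map, or raise ValueError if the payload fails its checksum."""
    digest, payload = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("corrupted state message")
    return json.loads(payload)


message = bytearray(encode_state({"photo.jpg": "disk-17"}))
i = message.index(b"disk-17", DIGEST_LEN) + 6    # position of the final "7"
message[i] ^= 0x01                               # flip one bit: "7" (0x37) -> "6" (0x36)
print(json.loads(bytes(message[DIGEST_LEN:])))   # {'photo.jpg': 'disk-16'}
try:
    decode_state(bytes(message))
except ValueError as err:
    print("rejected:", err)                      # rejected: corrupted state message
```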

In a distributed system this is exactly what should happen if a server thinks its version of the system metadata might be corrupted.

It should find out whether this is the case and, if it is, either obtain the correct metadata or shut itself down. However, automatically doing the right thing can create unexpected feedback and feed-forward loops in complex, massive computing clouds, and this seems to be what happened at Amazon.
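The article does not describe what Amazon changed. One widely used, generic way to damp this kind of retry storm is exponential backoff with random jitter: each failed request waits a randomly chosen, growing interval before trying again, so thousands of servers stop asking for the same map in lockstep. The sketch below shows that textbook technique in Python, not Amazon's fix; `backoff_delays`, `fetch_with_backoff` and their default parameters are illustrative assumptions.

```python
import random
import time


def backoff_delays(base=0.1, cap=30.0, attempts=8, rng=random.random):
    """Yield "full jitter" delays in seconds: rng() * min(cap, base * 2**n)."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)


def fetch_with_backoff(fetch, sleep=time.sleep, **backoff):
    """Call fetch() until it succeeds, sleeping a jittered, growing delay between tries.

    Re-raises the last ConnectionError once the attempts are used up.
    """
    delays = list(backoff_delays(**backoff))
    for i, delay in enumerate(delays):
        try:
            return fetch()
        except ConnectionError:
            if i == len(delays) - 1:
                raise  # out of attempts: give up rather than retry forever
            sleep(delay)
```

Passing `rng=lambda: 1.0` removes the jitter and makes the growth easy to see: `list(backoff_delays(base=1, cap=10, attempts=6, rng=lambda: 1.0))` gives `[1.0, 2.0, 4.0, 8.0, 10.0, 10.0]`. In practice the random factor matters most, because it spreads retries from many servers over time instead of letting them arrive together.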

Fortunately, it didn't take Amazon long to fix its problems, and of course the firm learned from its experience and has subsequently adapted its architecture to reduce the chances of a repetition.

Amazon is far from the only player in cloud computing to have run into such problems.

Its occasional difficulties just happen to be more visible than those of its rivals because it has been more successful at building its business in this area.

In addition, most business users of the early cloud services on offer today are aware these technologies are still immature, and in many cases they have fallback plans or only use them for tasks in which they can afford occasional downtime (which, of course, most users could just as easily suffer if they were handling online storage internally).

Nevertheless, the incident is a reminder that large-scale commercial cloud computing is still a relatively new technical achievement. As it scales and develops, it will no doubt encounter further hurdles that affect the paying users who are doing the rest of us the favour of helping cloud providers perfect their recipes.

