Amazon Brings Data Warehousing to the Cloud
Less than seven years ago, Amazon Web Services disrupted traditional datacenter computing with its cloud-based infrastructure services, allowing enterprise customers to provision compute and storage and pay based on usage without having to make capital outlays for hardware or software. Many who have moved to this model of paying for IT infrastructure as an operational expense have enjoyed considerable reductions in capital expenditures.
Now, Amazon is looking to similarly upend the way organizations deploy data warehouses.
Kicking off its first-ever partner and customer conference on Wednesday, Amazon launched a cloud-based data warehousing service called Redshift. Amazon says the service will substantially reduce the cost of deploying data warehouses by eliminating the need to acquire conventional software and hardware provided by the likes of EMC, Hewlett-Packard, IBM, Microsoft, Oracle, SAP and Teradata.
In his opening keynote address at the company's re:Invent conference in Las Vegas, Amazon Web Services Senior VP Andy Jassy cited an IBM-commissioned report that found typical data warehouse installations can cost anywhere from $19,000 to $25,000 per terabyte per year. Using reserved data warehouse instances on Amazon's forthcoming Redshift, the average annual cost will come to less than $1,000 per terabyte, according to Jassy.
"It allows you to easily and rapidly analyze petabytes of data. It's about a tenth of the cost of traditional data warehouse solutions. It automates the deployment and it works with the popular business intelligence tools," Jassy told 6,000 attendees present at re:Invent and 12,000 registered viewers (including yours truly) of the live webcast.
Customers can choose either 16 TB or 2 TB nodes and can configure clusters of up to 100 nodes, for a maximum of 1.6 petabytes. Pricing starts at 85 cents per hour for a 2 TB node. The data is stored in columnar format, Jassy said, which means a query reads only the columns it needs, so I/O is reduced and queries return much faster than on a typical data warehouse. The service supports standard SQL queries and connections over JDBC and ODBC, he noted.
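To make the columnar point concrete, here is a toy sketch of why column-oriented storage cuts I/O for analytic queries. It is an illustration only, not how Redshift is implemented: the sample rows, the field names and both helper functions are made up, and "values touched" is a crude stand-in for disk reads.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The data and function names are hypothetical, for illustration only.

rows = [
    {"order_id": 1, "region": "US", "amount": 20.0},
    {"order_id": 2, "region": "EU", "amount": 35.5},
    {"order_id": 3, "region": "US", "amount": 12.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(records, field):
    """Sum one field in a row layout; every field of every row is read."""
    total = 0.0
    touched = 0
    for record in records:
        touched += len(record)  # the whole row comes off disk together
        total += record[field]
    return total, touched


def sum_column_store(columns, field):
    """Sum one field in a column layout; only that column is read."""
    values = columns[field]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_row_store(rows, "amount")
col_total, col_touched = sum_column_store(columns, "amount")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts give the same total, but the row layout touches every field of every row while the column layout touches only the one column being summed. The gap widens as tables get more columns, which is typical of warehouse schemas.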
Amazon's flagship retail site, the parent business of AWS, has been testing Redshift for several months. Jassy said the group took 2 billion rows of data and ran six of the most complex queries it typically performs in its existing Netezza (now part of IBM) data warehouse. On two 16-terabyte Redshift nodes, the cost was $3.65 per hour, which works out to about $32,000 per year. "Instead of spending millions of dollars, they spent $32,000 a year and ended up with 10 times faster queries," Jassy said.
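The annual figure is easy to check. The sketch below assumes the cluster runs around the clock for a 365-day year, that the quoted $3.65 per hour covers both nodes together, and that all 32 TB is usable; none of those assumptions were spelled out in the keynote, and the function names are my own.

```python
# Back-of-the-envelope check of the quoted Redshift cost figures.
# Assumes 24x365 operation and that $3.65/hour is the whole-cluster rate.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_cost(hourly_rate_usd):
    """Yearly cost of running at a given hourly rate without interruption."""
    return hourly_rate_usd * HOURS_PER_YEAR


def annual_cost_per_tb(hourly_rate_usd, capacity_tb):
    """Yearly cost divided by raw cluster capacity in terabytes."""
    return annual_cost(hourly_rate_usd) / capacity_tb


cluster_hourly_usd = 3.65
cluster_capacity_tb = 2 * 16  # two 16 TB nodes

print(round(annual_cost(cluster_hourly_usd)))
print(round(annual_cost_per_tb(cluster_hourly_usd, cluster_capacity_tb)))
```

Under those assumptions the yearly total comes to $31,974, in line with the $32,000 Jassy quoted, and the per-terabyte figure comes to about $999 per year.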
"Some multi-hour queries finish in under an hour, and some queries that took five to 19 minutes on our current data warehouse are now returning in seconds with Amazon Redshift," said Erik Selberg, manager of Amazon.com's data warehouse team, in a statement.
Redshift's underlying data warehouse engine is powered by ParAccel, a venture-backed company with a deep bench of data warehousing veterans that offers its own high-performance analytic database. Initially, Redshift will support BI tools from Jaspersoft and MicroStrategy, but Jassy said it will also support other leading tools, including Cognos from IBM and BusinessObjects from SAP. Early customers already participating in a private beta include Flipboard, NASA's Jet Propulsion Laboratory, Netflix and Schumacher Group.
The service is available now in limited preview, and Amazon is targeting early next year for commercial availability.
So will Redshift take a bite out of the traditional data warehousing business? That remains to be seen, but if Amazon delivers the price-performance it's promising, it will offer a compelling alternative, particularly to organizations that need to analyze information but can't afford a traditional data warehouse today.
"It doesn't necessarily mean customers are going to chuck the data warehouses they've already got," said Forrester Research analyst James Staten in a telephone interview. "If you've already got one you've already sunk that cost in. But if you're going to have to double or triple that data warehouse in size, it's really going to be hard to justify the cost of keeping them on premises."
That said, despite the rapid growth of the cloud business of Amazon and other providers, many organizations remain reluctant to move mission-critical or sensitive data off premises, and that could certainly affect how quickly data warehousing and big data analytics move to the cloud.
Jassy made clear that Amazon intends to remain a disruptive force in datacenter computing, and I'll spell that out in a post on Thursday following Amazon CTO Werner Vogels' keynote.
Posted by Jeffrey Schwartz on November 28, 2012