Amazon Brings Data Warehousing to the Cloud
    		Less than seven years ago, Amazon Web Services disrupted  traditional datacenter computing with its cloud-based infrastructure services,  allowing enterprise customers to provision compute and storage and pay based on  usage without having to make capital outlays for hardware or software. Many  who have moved to this model of paying for IT infrastructure as an operational  expense have enjoyed considerable reductions in capital expenditures. 
		Now,  Amazon is looking to similarly upend the way organizations deploy data  warehouses.
		Kicking off its first-ever partner and customer conference  on Wednesday, Amazon launched a cloud-based data warehousing service called Redshift.  Amazon says the service will substantially reduce the cost of deploying data  warehouses by eliminating the need to acquire conventional software and  hardware provided by the likes of EMC, Hewlett-Packard, IBM, Microsoft, Oracle,  SAP and Teradata.
		In his opening keynote address at the company's re:Invent conference in Las Vegas, Amazon Web Services Senior VP Andy Jassy cited an IBM-commissioned report that found typical data warehouse installations can cost anywhere from $19,000 to $25,000 per terabyte per year. Using reserved data warehouse instances on Amazon's forthcoming Redshift, the average cost will amount to less than $1,000 per terabyte per year, according to Jassy.
		"It allows you to easily and rapidly analyze petabytes  of data. It's about a tenth of the cost of traditional data warehouse  solutions. It automates the deployment and it works with the popular business  intelligence tools," Jassy told 6,000 attendees present at re:Invent and  12,000 registered viewers (including yours truly) of the live webcast. 
		Customers can choose between 16 TB nodes and 2 TB nodes and can configure clusters of up to 100 nodes, for a maximum of 1.6 petabytes. Pricing starts at 85 cents per hour for a 2 TB node. The data is stored in columnar format, Jassy said, which means a query reads only the columns it needs, so I/O is much lighter and results return much faster than with a typical data warehouse solution. The service supports standard SQL queries and connections over JDBC and ODBC, he noted.
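To make the columnar point concrete, here is a minimal sketch of why a column-oriented layout cuts I/O for analytic queries. It is a toy illustration only: the table, the column names and the two counting helpers are hypothetical, and it says nothing about how Redshift actually lays data out on disk.

```python
# Toy illustration only: a hypothetical three-row table, not Redshift's storage format.
rows = [
    {"order_id": 1, "region": "west", "amount": 20.0},
    {"order_id": 2, "region": "east", "amount": 35.5},
    {"order_id": 3, "region": "west", "amount": 12.25},
]

# Column-oriented layout: one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def values_read_row_store(table):
    """A row store has to pass over every field of every row."""
    return sum(len(row) for row in table)


def values_read_column_store(cols, needed):
    """A column store only touches the columns the query asks for."""
    return sum(len(cols[name]) for name in needed)


# Equivalent of: SELECT SUM(amount) FROM orders
total = sum(columns["amount"])
row_reads = values_read_row_store(rows)
col_reads = values_read_column_store(columns, ["amount"])

print(f"sum={total}, row-store values read={row_reads}, column-store values read={col_reads}")
```

With three columns, the aggregate touches a third of the values in the columnar layout. On wide tables with billions of rows, that gap is where most of the claimed speedup would come from.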
		Amazon's flagship retail site, the parent company of AWS, has been testing Redshift for several months. Jassy said the group took 2 billion rows of data and ran six of the most complex queries it typically performs in its existing Netezza (now part of IBM) data warehouse. Running them on two 16-terabyte Redshift nodes cost $3.65 per hour, which equates to about $32,000 per year. "Instead of spending millions of dollars, they spent $32,000 a year and ended up with 10 times faster queries," Jassy said.
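The arithmetic behind that annual figure is easy to check. The sketch below uses only the numbers quoted above and assumes the cluster runs around the clock for a 365-day year and that the $3.65 hourly rate covers both nodes; the variable names are mine.

```python
# Figures quoted in the keynote; running 24x7 for 365 days is an assumption.
HOURS_PER_YEAR = 24 * 365          # 8,760 hours

hourly_cost = 3.65                 # dollars per hour for the two-node cluster
nodes = 2
tb_per_node = 16

annual_cost = hourly_cost * HOURS_PER_YEAR       # roughly $32,000
capacity_tb = nodes * tb_per_node                # 32 TB of raw capacity
annual_cost_per_tb = annual_cost / capacity_tb   # just under $1,000

print(f"annual cost:        ${annual_cost:,.0f}")
print(f"capacity:           {capacity_tb} TB")
print(f"annual cost per TB: ${annual_cost_per_tb:,.0f}")
```

Under those assumptions the per-terabyte cost lands just under $1,000 a year, which lines up with the figure Jassy cited for reserved instances.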
		"Some multi-hour queries finish in under an hour, and  some queries that took five to 19 minutes on our current data warehouse are now  returning in seconds with Amazon Redshift," said Erik Selberg, manager of Amazon.com's  data warehouse team, in a statement.
		Redshift's underlying data warehouse engine is powered by ParAccel, a venture-backed company with a deep bench of data warehousing veterans that offers its own high-performance analytic database. Initially, Redshift will support BI tools from Jaspersoft and MicroStrategy, but Jassy said it will also support other leading tools, including Cognos from IBM and BusinessObjects from SAP. Early customers already participating in a private beta include Flipboard, a team at NASA's Jet Propulsion Laboratory, Netflix and Schumacher Group.
		The service is available now for a limited preview but Amazon  is targeting early next year to make Redshift commercially available.
		So will Redshift take a bite out of the traditional data warehousing business? That remains to be seen, but if Amazon delivers the price-performance it's promising, it'll offer a compelling alternative, particularly to organizations that need to analyze information but can't afford a traditional data warehouse today.
		"It doesn't necessarily mean customers are going to chuck the data warehouses they've already got," said Forrester Research analyst James Staten in a telephone interview. "If you've already got one you've already sunk that cost in. But if you're going to have to double or triple that data warehouse in size, it's really going to be hard to justify the cost of keeping them on premises."
		That said, despite the rapid growth of the cloud business of Amazon and other providers, many organizations remain reluctant to move mission-critical or sensitive data off premises, and that could certainly affect how quickly data warehousing and big data analytics move to the cloud.
		Jassy made clear that Amazon intends to remain a disruptive force in datacenter computing, and I'll spell that out in a post on Thursday following Amazon CTO Werner Vogels' keynote.
 
	Posted by Jeffrey Schwartz on November 28, 2012