
Microsoft Readies Azure-Based Big Data Offerings

Microsoft on Monday gave an update on some forthcoming additions to its Big Data portfolio, including Azure Data Lake Store.

First announced this past April, Azure Data Lake Store will be released as a preview later this year. It is a new data store compatible with the Hadoop Distributed File System (HDFS), aimed at enabling organizations to run large analytics workloads. Microsoft describes it as a single repository that lets users capture data of any size or format without requiring changes to the application as data scales. Data can be securely stored and shared, as well as processed and queried from HDFS-based applications and tools, T. K. "Ranga" Rengarajan, Microsoft's corporate vice president for data platform, wrote in a blog post on Monday.

"Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages," Rengarajan said. "It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications."

Complementing Azure Data Lake Store will be the newly announced Azure Data Lake Analytics, an Apache YARN-based service designed to dynamically scale to handle large Big Data workloads. The new Azure Data Lake Analytics service will be based on U-SQL, a language that will "unify the benefits of SQL with the power of expressive code," Rengarajan said. "U-SQL's scalable distributed query capability enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse."

In an MSDN blog post published on Monday, Michael Rys, a principal program manager for Big Data at Microsoft, explained why U-SQL is suited for Azure Data Lake Analytics:

"Taking the issues of both SQL-based and procedural languages into account, we designed U-SQL from the ground-up as an evolution of the declarative SQL language with native extensibility through user code written in C#. This unifies both paradigms, unifies structured, unstructured, and remote data processing, unifies the declarative and custom imperative coding experience, and unifies the experience around extending your language capabilities.

"U-SQL is built on the learnings from Microsoft's internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL, and Hive. For example, we base our SQL and programming language integration and the execution and optimization framework for U-SQL on SCOPE, which currently runs hundred thousands of jobs each day internally. We also align the metadata system (databases, tables, etc.), the SQL syntax, and language semantics with T-SQL and ANSI SQL, the query languages most of our SQL Server customers are familiar with. And we use C# data types and the C# expression language so you can seamlessly write C# predicates and expressions inside SELECT statements and use C# to add your custom logic. Finally, we looked to Hive and other Big Data languages to identify patterns and data processing requirements and integrate them into our framework."
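
The idea Rys describes, a declarative query that calls into user-written code, can be illustrated by analogy. The sketch below is not U-SQL and does not use any Azure service: it assumes Python's standard sqlite3 module as a stand-in, and the table, the sample rows and the domain_of function are hypothetical names invented for illustration. It registers a Python function with the SQL engine and then uses it inside a SELECT statement, both as a predicate and as a projected expression.

```python
import sqlite3


def domain_of(email):
    """Hypothetical user-defined logic: return the lowercased domain of an email."""
    return email.split("@", 1)[1].lower()


def run_demo():
    """Run a SQL query that calls user code; an analogy only, not U-SQL."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name domain_of (one argument).
    conn.create_function("domain_of", 1, domain_of)
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [
            ("ann", "ann@Example.com"),
            ("bob", "bob@contoso.com"),
            ("cy", "cy@example.com"),
        ],
    )
    # The declarative part stays SQL; the custom predicate and expression
    # are evaluated by the registered Python function.
    rows = conn.execute(
        "SELECT domain_of(email) AS domain, COUNT(*) AS n "
        "FROM users "
        "WHERE domain_of(email) <> 'contoso.com' "
        "GROUP BY domain_of(email) "
        "ORDER BY domain"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

In U-SQL, according to Rys, the user code would be written in C# rather than Python, and the query would run as a distributed job rather than against a local in-memory database.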

Microsoft also announced the general availability of managed clusters for its Azure HDInsight service on Linux, which the company says is backed by a 99.9 percent uptime SLA. In addition, the company is offering Azure Data Lake Tools for Visual Studio and said that ISV applications can be offered in the Azure Marketplace.

About the Author

Jeffrey Schwartz is editor of Redmond magazine and also covers cloud computing for Virtualization Review's Cloud Report. In addition, he writes the Channeling the Cloud column for Redmond Channel Partner. Follow him on Twitter @JeffreySchwartz.
