Microsoft Sets Sights on Data Mining Dominance
Redmond hopes to challenge SAS Institute and SPSS for data mining and predictive analytic bragging rights.
- By Stephen Swoyer
- May 07, 2008
When it comes to data mining and predictive analytics, Microsoft Corp. might
not be the first company that comes to mind.
That could change, however, especially if Donald Farmer, Redmond's principal
program manager for SQL Server Data Mining, has his way.
Microsoft has come a long way in the data mining and predictive analytics segment,
Farmer says, and with a game-changing Excel 2007 release under its belt -- and
a promising SQL Server 2008 revision in the pipeline -- Redmond hopes to challenge
established powers SAS Institute Inc. and SPSS Inc. for data mining and predictive
analytic bragging rights.
"[We don't] have all the functionality of something like a SAS or an SPSS,
because that's just not our market," he conceded.
It comes down to a difference of scale, according to Farmer. SAS and SPSS typically
target larger, more expensive deployments whose users are well-versed in their
tools. Microsoft is targeting a different kind of data
mining consumer: the Excel analyst, for example, who might not have much (if
any) experience with data mining, predictive analytics or statistical analysis,
for that matter.
"I was looking at the SPSS figures from last quarter that came through,
and it was one of their best quarters ever. They added 16 new customers. Obviously
16 customers is a pretty good number [for a company like SPSS], but for Microsoft,
16 customers doesn't even cover my travel expenses for a year. Our market just
has to be a much larger market," Farmer said.
"By the way, I don't mean to say we can't hit the high-end. Within Microsoft,
we have our own database marketing team. We're one of the largest companies
in the world. We have a huge database marketing team who do classic customer
analysis. These guys were all SAS users, but when they joined Microsoft, they
started using our tools. The entire process runs on our database, they actually
use the Excel [data mining] add-ins to do it. It's not that there's nothing
they don't miss, [it's that] they are able to achieve the same business results
using our tools."
Last year, Microsoft released a data mining and predictive analytic add-on
for its Excel 2007 product. The
add-on, which is similar to Microsoft's well-known SQL Server BI Accelerator
products, integrates natively with Excel 2007. It introduces a new "Data
Mining" tab that exposes several pre-built functions, including forecasting,
accuracy charting, cross-validation, exception highlighting, category detection,
key influencers, shopping basket analysis (the last is a SQL Server 2008-only
function) and many others.
The key, Farmer argued, is that the transition from straightforward analysis
in Excel to data mining is relatively seamless. A user need only click on the
Data Mining tab and select one of several canned functions (e.g., "Accuracy
Chart," "Highlight Exceptions"). A wizard then walks the user through the
rest of the process.
In other words, he said, the experience feels organic to the Excel user, who
may not even realize that data mining is involved.
"For [a function such as] 'Detect Categories,' what [the add-in is] doing
is building a clustering model in the background [either on a local or remote
SQL Server instance], but we don't expect the Excel user to understand that.
We just [call it] 'Build Categories,'" Farmer explained.
"What this does is actually build a clustering model on the server. It
finds the five most significant clusters, and then returns that to the user
again in Excel, so the user gets that in a user interface that lets them understand
what the clusters mean, just using standard Excel visualization."
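To make the idea concrete, here is a minimal sketch of what "building a clustering model" means, using plain k-means in Python. It is an illustration only, not the algorithm the add-in or SQL Server actually runs; the function name, the sample rows and the choice of two clusters (rather than the five Farmer mentions) are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group numeric rows into k clusters with plain k-means (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen rows
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each row to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the rows assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical spreadsheet rows: (age, purchases per year).
rows = [(22, 3), (25, 4), (24, 2), (51, 20), (55, 22), (49, 19)]
labels, centers = kmeans(rows, k=2)
```

Each entry in `labels` is the category assigned to the matching row, which is roughly the kind of result a spreadsheet user would see written back next to the data.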
Ditto for a function such as exception highlighting. "When I run 'Highlight
Exceptions'...we're actually building a clustering model [on the server] that
looks for outliers from the data. But the user doesn't have to understand any
of that. The idea here is to use standard Excel features to give them a sense
that they already know how to do this," Farmer said.
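One simple way to picture outlier detection on top of a clustering model is to flag rows that sit unusually far from every cluster center. The sketch below assumes that approach; it is not Microsoft's implementation, and the function name, the cutoff rule (three times the median distance) and the data are invented for illustration.

```python
import math
import statistics


def highlight_exceptions(rows, centroids, factor=3.0):
    """Return indexes of rows unusually far from every cluster center."""
    # Distance from each row to its nearest cluster center.
    distances = [min(math.dist(r, c) for c in centroids) for r in rows]
    # Assumed rule of thumb: anything beyond factor x the median distance is an exception.
    cutoff = factor * statistics.median(distances)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Hypothetical rows and cluster centers (both invented for illustration).
rows = [(22, 3), (25, 4), (24, 2), (51, 20), (55, 22), (49, 19), (30, 60)]
centers = [(23.7, 3.0), (51.7, 20.3)]
flagged = highlight_exceptions(rows, centers)
```

Here only the last row lands far from both centers, so it is the one a user interface would highlight.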
Microsoft's Accuracy Charting feature lets users compare the efficacy of different
data models. This can also be a boon to data mining hotshots, according to Farmer,
because they can compare the models they design against
those that Microsoft provides out of the box.
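At its simplest, comparing models means scoring each one against data whose outcomes are already known. The sketch below assumes a plain accuracy measure, which is far cruder than a full accuracy chart; both models, their rules and the held-out rows are hypothetical.

```python
def accuracy(model, rows):
    """Fraction of held-out rows where the model's prediction matches the outcome."""
    correct = sum(1 for features, label in rows if model(features) == label)
    return correct / len(rows)


# Two hypothetical models predicting whether a customer buys (True/False).
def built_in_model(customer):
    return customer["visits"] >= 3


def custom_model(customer):
    return customer["visits"] >= 3 and customer["age"] < 50


# Invented held-out rows: (features, actual outcome).
holdout = [
    ({"visits": 5, "age": 30}, True),
    ({"visits": 4, "age": 62}, False),
    ({"visits": 1, "age": 25}, False),
    ({"visits": 3, "age": 41}, True),
]

scores = {
    "built-in": accuracy(built_in_model, holdout),
    "custom": accuracy(custom_model, holdout),
}
best = max(scores, key=scores.get)  # name of the higher-scoring model
```

On this toy data the custom model scores higher, which is the sort of side-by-side result an expert would look for before trusting a hand-built model over a stock one.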
Microsoft isn't leaving data mining experts in the lurch, Farmer said. They can
design their own custom functions and embed them in the Excel toolbar, as well
as design (and refine) their own data models.
"The functions that we use are all public functions. We have not extended
anything here. There are no private [i.e., proprietary] protocols. It's all
publicly available interfaces, so, in theory, anybody could write this,"
he said.
Surprisingly, Microsoft's SQL Server Data Mining Add-In isn't based on technology
Redmond picked up via its acquisition of the former ProClarity Corp. two years
ago. Instead, Farmer said, it's all homegrown.
"All of this has been developed in-house by the data mining team. There's
nothing of ProClarity in there," he said. "It's starting to get very
interesting, [because] now that we have ProClarity in-house, we're talking about
how we [can] get these capabilities together. We're doing a lot of work behind
the scenes on that."
Another important trend, Farmer pointed out, is cross-pollination between
the Excel and SQL Server Analysis Services teams.
"I now have a couple of program managers on my team who came from Excel
and who are now working with us designing Analysis Services tools. Suddenly
we now have people from Analysis Services who've gone and joined the Excel team,"
he said.
The takeaway, Farmer stressed, is that Microsoft doesn't hope to compete with
SAS or SPSS on a feature or functionality basis; its angle, he insisted, is
usability. Redmond's Excel-based, wizard-driven approach to data mining and
predictive analytics might lack some of the analytical heft of solutions from
SAS or SPSS, but it's eminently more usable, Farmer said.
On the programming side, too, Microsoft is making it easier for developers
to expose data mining or predictive analytic functionality to non-traditional
users (via portals, dashboards or Web applications), according to Farmer. In
this respect, programmers can use OLE DB or ADO.NET to embed analytic capabilities
in their custom applications. Redmond's competitors are doing this, too, Farmer
said, but few other companies can claim to be as developer-friendly -- or, for
that matter, as popular with developers -- as Microsoft.
"What we've been saying is that in many ways we don't compete with SAS
or SPSS. If we're being cheeky about it, we say, 'We're just doing interventions,'"
he said. "We're seeing a lot of interest in the Excel-side [data mining],
for one thing, but we're also seeing [interest] in the embed-ability, too. The
people who are actually pushing this are from the developer side.
"We just did a 25-city road show in Europe. Everywhere we did this it
was absolutely packed with developers. They're continually looking for new ways...[to]
offer differentiating functionality to their internal users, and [our] message
of not [having to] recode your business logic [to expose new data mining functionality]
is really resonating with them.
"For a developer, capturing business logic is actually very difficult.
Understanding the business case and hard-coding it into applications -- that's
very difficult. What we're doing [with SQL Server-based data mining] offers
them a way to do that with lower maintenance costs, and they're getting very
excited about that."