How one Solution Provider is using Site Server 3.0’s powerful search capabilities to bring Web content directly and continually to employees.

Search the World Over

Maybe I should start this article with the moral, which is: “Automatically add at least one more week to your project plan the first time you try this.”

Like many consultants and trainers, I have a weakness: I love playing with new technologies—especially those put out by Microsoft. Somehow, when that happens, the MCSE lobe of my brain goes into high gear, and I start speaking in tongues. Another little problem: I love trying something no one else has done before—at least, no one in my corner of the world.

Enter: Microsoft Site Server 3.0. Exit: Most of my time in the month of July.

The idea was simple: Deploy a platform to allow my company, QuickStart Technologies, to move our “knowledge management” plans forward. As a service company, we’re constantly looking for ways to capture our shared experiences and best practices and to share that knowledge with our entire staff. Also, management of knowledge at QuickStart was made more difficult by our inability to search all relevant sources of information in our enterprise from a single point. As with most networks, we’d have to look in various file shares, search Exchange public folders separately, and search each Web site (internal and external) separately. In light of those challenges, Site Server 3.0 seemed the platform to help us meet our business needs.

The project would be deployed in several phases. Phase one: Deploy the platform (that’s MCSE-speak for set up the box). Phase two: Implement a Search Server solution to catalog the company’s known universe (internal resources, the intranet, public Web sites, and public folders) and some important sites on the Internet. This would allow employees to conduct searches on a set of knowledge specifically pertaining to our business. While many Internet search engines would give us lots of Internet content, this would give us the one-stop shopping we wanted. Naturally, I jumped at the chance to oversee the project.

Site Server 3.0 actually contains more than just search capabilities. Site Server’s core centers around content publishing and management, search, and content delivery. (For a description of the individual components, see John West’s article "Broaden Your Sites: The Site Server 3.0 Story" in this issue). Ultimately, our goal would be to use the content management capabilities to capture and categorize content, use search to give our employees access to all available content, and then use the content delivery mechanisms to personalize and monitor the “user experience”—what the user actually sees at the site.

How Hard Could Setup Be?

The first thing we did, believe it or not, was to read the directions. Site Server comes with an excellent online set of documentation. Without a doubt, the docs saved me hours of problems. That’s because, as with many of Microsoft’s most complex products, Site Server requires a particular order to its installation. You’ll want to follow the order suggested, partly out of necessity (one product will look for another), and partly out of a desire not to start the process over again (trust me on this one).

The first thing I did was to set up our server. You know, the standard stuff: Windows NT Server 4.0 with Service Pack 3. I even gave it a creative NetBIOS name—TribalKnowledge—that leveraged all 15 characters available. Big mistake, but more on that later.

I opted for the de facto standard for Web servers: a stand-alone box. I could easily set up the platform as a stand-alone entity first, then merge it into the domain that the server would eventually call “home” (more on that later too). Since I’m not a domain administrator, this was also the least painful route.

Many Microsoft products require large amounts of disk space, processor power, or memory. Site Server doesn’t discriminate—it likes all three. My server (a Pentium II 333MHz) currently houses about 128M of RAM (and I’m in the process of doubling that), and about 6G of storage space.

Because many of Site Server’s complementary programs (such as IIS 4.0) were going to need to throw things in the system partition, I opted for a large system partition (2G) and used the rest of the disk as a single large data partition (4G). In general, I’d recommend that you install Site Server in a separate partition from your \WINNT directory. Site Server likes to store things in its own directory by default (search catalogs, etc.) and it’s much easier to deal with that issue if you install Site Server in a data partition.

Microsoft recommends a minimum virtual memory size of 128M. That worked fine for me for a while (about three days of active searching, to be exact). Then those nasty little “out of virtual memory” errors sprang up as I started to implement a search. After that, I doubled the size of virtual memory, to 256M, and all has been well since. I’d recommend that you double the minimum requirement if you’re doing any memory- or processor-intensive tasks, such as Search or Analysis, with Site Server.

Installing the Database

The next step involved installing SQL Server 6.5. While this didn’t seem like a necessity at the time (I could have used a remote SQL Server or local Access database), I decided to go with experience and avoid the local Access database. The local SQL Server has also made prototyping a dream. I can do all my later work with the Analysis and Membership databases from the confines of a single server. While the database performance wasn’t as fast as I’d want it to be, I also didn’t need to wait for an act of Congress for my DBA to approve the addition of a new table. Later on, I moved all my database content to a separate SQL Server and simply changed the pointers on my ODBC references.

Installation was easy, since I read the directions a couple of times. Not only did I need to install SQL using the Local System Account security context, but I needed to increase the Master Database Size to 50M and activate TCP/IP connectivity. SQL Server installed beautifully. Then came the requisite patches: SQL Server Service Pack 4 (unlike the beta, SP 3 won’t do it), and a file called SQLSERVR.EXE, on the Site Server 3.0 CD. I needed to replace the file in the MSSQL\BINN (not BIN) directory. Once again, read all the release notes and documentation!

At this point, all I needed to do was create my data devices and databases, update the ODBC drivers from Microsoft’s Web site, create data source names (DSNs) that point to my database, and tune SQL Server according to the documentation. No headaches there, since everything was fairly self-explanatory. Honestly, none of it is required for Site Server Search; but I figured it would be better to have everything in place for using content management and personalization down the road.

Then I installed Internet Explorer 4.01 (a requisite). I left out the Active Desktop, since it wasn’t necessary.

Figure 1. QuickStart’s application of Site Server 3.0 manages the corporate knowledge base on a platform of NT Server and IIS 4.0. It uses the Catalog Build Server to maintain information culled from several resources, including Web sites and network files. Search statistics and user preferences reside in a SQL Server database as an analysis database and personalization database, respectively.

Finally, I installed Option Pack 4.0. The little helpful “hint” in the documentation mentioned that I should also install Transaction Server and Index Server, so who was I to argue? Next time, I might do a little arguing. Index Server 2.0 is invaluable for much of the intranet functionality of Site Server 3.0 (specifically content management), but I was able to disable it from the Services Control Panel Applet. You’ll need to get all the way down in the Search documentation for that little tidbit. I’ve gotten much better system performance on my dedicated Search Server without Index Server.

The next step required a little digging as well. I needed to “activate” the FrontPage extensions that shipped with the NT Option Pack. So, I opened up the FrontPage Server Administrator and installed the FrontPage Extensions. Funny, I thought I just did that as part of installing Option Pack.

Then, I installed Visual InterDev, FrontPage 98, and the FrontPage 98b patch from Microsoft’s Web site. I also decided to go to the Microsoft site and update the FrontPage Server extensions. Call me paranoid.

After all this, you’re probably thinking, “So would you go ahead and install Site Server already?”

What’s in a Name?

Site Server installation seemed easy. I created two Administrator-level accounts on the server, one for the Publishing Component and one for the Search Component. Click Setup, answer a couple of questions, and you’re off! And then…BOOM! LDAP Error!

If you’re unfamiliar with it, LDAP is the Lightweight Directory Access Protocol, the RFC that will solve all of your worries. Actually, it’s a great protocol for accessing directory information, particularly the Site Vocabulary that Site Server uses for knowledge management and the membership directory components. It’s also the new standard appearing in Windows NT 5.0. In this case, LDAP was going to help provide personalization for my search Web pages, so I needed it to be functional from the start.

Naturally, still being the worrier that I am, I started to rebuild the server. I figured that it was me—or something I’d done wrong. BOOM! Still no luck.

At this point, I want to thank the Microsoft PSS engineer who spent three hours on the phone with me, to no avail, trying to diagnose the problem. I finally did some digging and realized I had used the name INSTRUCTOR (it’s the MCT in me) during one of my successful installations earlier that month. It turns out that Site Server doesn’t like a 15-character name, like “TribalKnowledge.” In fact, it made LDAP initialization fail. With a 10-character name, like “Instructor,” no problem. So, some 20 installations later, I changed the rules as well as the name of my server. Success!
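If you'd rather catch this before the twentieth reinstall, a pre-flight check on the proposed computer name is easy to script. The Python sketch below is only an illustration: the function name and the "safe" length of 10 are assumptions taken from the one name that worked here, not a documented Site Server limit. The only hard fact in it is that NetBIOS computer names top out at 15 characters.

```python
# Hypothetical pre-install check; not part of Site Server or Windows NT.
NETBIOS_MAX = 15  # maximum length of a NetBIOS computer name


def check_computer_name(name: str, safe_length: int = 10) -> str:
    """Classify a proposed server name by its length.

    safe_length is an assumption: 10 characters is simply the length of
    the name that worked in this article, not a published threshold.
    """
    length = len(name)
    if length > NETBIOS_MAX:
        return "invalid"  # too long to be a NetBIOS name at all
    if length > safe_length:
        return "risky"    # legal, but longer than the name known to work
    return "ok"


if __name__ == "__main__":
    for candidate in ("TribalKnowledge", "Instructor"):
        print(candidate, len(candidate), check_computer_name(candidate))
```

The exact point between 10 and 15 characters where LDAP initialization starts to fail isn't established here, so treat anything flagged "risky" as a prompt to test on a scratch box rather than as a verdict.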

Now all I had to do was configure my server and create a Web site. No problem, right?

Since I now had a functioning box, it was time to request the assistance of our IS staff. They “opened the door” for me and created a machine account in the domain using Server Manager. After successfully adding the Site Server machine to its new home domain, I had our IS staff create an “access user account” in the domain for Site Server to search resources on my own internal network. For simplicity’s sake, I gave it the same name and password as the Search Service account on my local server. Unlike the local account, it didn’t have administrator-level privileges. No sense creating a back door into the domain.

In Search of…?

Our first step in creating a Search Server solution was to configure the Catalog Build Server. From the MMC, I had to go into the properties of the Catalog Build Server. As I mentioned earlier, I created a content access account on my home domain. I had to specify the account (DOMAIN\USERNAME) and password.

The hardest part—and the part that required the most planning—was the definition of our catalogs. A catalog is simply all the data that you decide to index. The Catalog Build Server uses a catalog definition by which to index or “crawl.” Ultimately, that catalog gets “handed-off” to the Search Server service, which will query it according to user requests.

I decided to split our search into four different catalogs: Microsoft, External, Internal, and Exchange. I figured out this trick after creating one large catalog that took about 14 hours to build each night. Since I’m using the same server to build the catalog and run the searches, I needed a more efficient way of searching.

The internal resources catalog was actually the easiest to build. I started by going through each of our servers and enumerating its shared directories. I then set up a file crawl in the search for each file share. Our content access account needed read access to each of those shared directories. I had two choices: manually give the content access account read access to each of the directories and their shares, or add the content access account to existing groups that have read access. Just wait until you ask your IS manager to have an account added to every single group except administrators.

Important safety tip: Treat your content access account password like you would your Administrator or other BackOffice account passwords. Keep it a secret! It is, however, very safe if you’re using NTFS security (and I hope you are). The Catalog Builder not only reads the file, but also reads the ACLs (permissions) on the file. Your search will never give anyone improper access to a file if you use Basic or NTLM authentication on your server. It will check your name against the ACLs in the search and produce only accessible references.
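The idea behind that ACL check can be pictured with a short Python sketch. This is a conceptual model only, not Site Server's implementation or object model: the IndexedDoc class, the visible_results function, and the sample paths and group names are all made up for illustration. It assumes the crawler stored each file's readers alongside the index entry and that the searcher's identity is known from authentication.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """A catalog entry plus the readers captured from the file's ACL at crawl time."""
    path: str
    readers: set = field(default_factory=set)  # users or groups with read access


def visible_results(hits, user, user_groups):
    """Return only the hits whose stored ACL names the user or one of the user's groups."""
    principals = {user} | set(user_groups)
    return [doc for doc in hits if doc.readers & principals]


if __name__ == "__main__":
    # Hypothetical sample data.
    hits = [
        IndexedDoc(r"\\server\share\handbook.doc", {"Everyone"}),
        IndexedDoc(r"\\server\share\salaries.xls", {"Managers"}),
    ]
    for doc in visible_results(hits, "jsmith", ["Everyone"]):
        print(doc.path)
```

The point of the sketch is that the filtering happens per user at query time, which is why anonymous access would defeat it: without an authenticated name there is nothing to compare against the stored permissions.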

Since we host our Internet presence on-site, I also included our site as a Web crawl in the search. In order to decrease the time it takes to index our entire site, I increased the number of documents that the Catalog Build Server could grab simultaneously. Normally, it’s five at a time. That assumes you’re hitting over the Internet (50K—T1 bandwidth), not a 100BaseT connection. In the Catalog Build Server properties (the server object itself, not the individual search), I changed the Maximum Request Frequency on the Timing tab to include *.domain.com as Unlimited. Now, the nightly crawls on that site are finished before daybreak on the East Coast.

Our intranet was also a little different in terms of configuring the crawl. Since I wanted to include the ACLs on each of our Web pages, I had to do a little creative configuring. Normally, you’d do a Web crawl on an HTTP server. In this case, I did a file crawl.

I started by exposing the Inetpub\wwwroot directory as a hidden sharepoint. I then created a file crawl for each subdirectory that housed a virtual Web (\\server\sharepoint$\subweb). The trick is to set up a “virtual mapping” that ties each UNC file name to a location on your Web server. In the Mappings area (the URL tab in your individual Search properties), I added a mapping from \\server\sharepoint$ to http://server. It worked like a charm. Managers get to all the “secret stuff,” and no one else is the wiser. At least, not until this article appears...
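What the mapping does to each crawled file name can be sketched in a few lines of Python. This is an illustration of the translation, not the product's code: the function name unc_to_url is hypothetical, and the sketch assumes a simple prefix swap with backslashes turned into forward slashes.

```python
from urllib.parse import quote


def unc_to_url(unc_path: str, unc_prefix: str, url_prefix: str) -> str:
    """Translate a crawled UNC file name into the URL a user should click.

    Assumption: the mapping is a plain prefix substitution, compared
    case-insensitively because Windows share names ignore case.
    """
    if not unc_path.lower().startswith(unc_prefix.lower()):
        raise ValueError(f"{unc_path!r} is not under {unc_prefix!r}")
    remainder = unc_path[len(unc_prefix):]
    if remainder and not remainder.startswith("\\"):
        # Guard against a partial match such as \\server\sharepoint$old
        raise ValueError(f"{unc_path!r} is not under {unc_prefix!r}")
    web_path = remainder.replace("\\", "/").lstrip("/")
    return url_prefix.rstrip("/") + "/" + quote(web_path)


if __name__ == "__main__":
    print(unc_to_url(r"\\server\sharepoint$\subweb\page.htm",
                     r"\\server\sharepoint$",
                     "http://server"))
```

Because the crawl reads files over the share, the NTFS permissions come along with each entry, while the mapped URL is what shows up in the results page.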

In order to maximize our internal searches, we set the frequency to one full crawl per night and one incremental crawl in the afternoon.

10 Things to Remember
  1. Test everything. From your browser, enter every base URL you intend to crawl. If you can’t see it in Internet Explorer, Site Server won’t see it either.
  2. Crawl internal Web servers using file crawls. This will preserve any NTFS security you have in place. Then map the crawls to HTTP access.
  3. Plan. Write out everything you plan to search. Group items into catalogs based on like content and frequency of changes.
  4. Keep your computer names short. You’ll eventually want to run LDAP for personalization support.
  5. Make friends with your Exchange Administrator and DBA. You’ll need their help to get everything connected and running.
  6. Read the documentation.
  7. Limit the crawls on big sites using site rules. Only crawl those directories of interest to your audience.
  8. Back up your search configuration. Back up your Catalog Builder settings (it’s a task in the MMC) and each of your Catalog Definitions (also a task in the MMC).
  9. Avoid installing Site Server into the system partition. You may find your catalogs filling all available disk space and killing your server.
  10. Use site indexes when available. They’ll give you better searches than default or home pages.

Larry Cooper

Searching Exchange

The next step was to create a catalog that would browse our Exchange public folders on a frequent basis during the day.

The hardest part of this was the proper configuration of both Site Server and Exchange servers. Our Content Access account had to be granted Administrator rights on the Configuration Object in our Exchange site. Doing it was easy. Convincing our e-mail admin that it was safe was entirely different. I also had to go into my Services Control Panel Applet on my Site Server machine and change the Site Server Search service to run under that account.

The tricky part was the configuration of my Search host’s Exchange information. The documentation is somewhat lacking as to the format of the name to use in this dialog. My success came from typing in the bare name of the Exchange Server, not the UNC name with the NetBIOS backslashes or the fully qualified domain name (server.domain.com). It took me a few tries to get it right. I entered the same information for my Outlook Web Access (OWA) server name. The Site Server Search engine assumes that you’re hosting OWA in the http://server/exchange directory (the default for Exchange 5.x).

Searching the Rest of the World

From a standpoint of working with all the other sites on the Internet, I learned a couple of lessons the hard way.

First, I’ve found out that I’m a lazy Web-surfer. I type www.domain.com and expect a home page to come up. Unfortunately, when you’re indexing, you may not want to start at the home page. Instead, find the site index and have your crawl actually start on the site index (for example, http://www.domain.com/index.htm). That will give you 100 percent of the site’s information (assuming the index is up-to-date). Also, don’t guess! Use your browser, and actually type in the entire URL to the site. Then cut and paste it into your search catalog definition. Since Web pages can be .htm, .html, .asp, and so forth, it’s important to start off with the correct page name and extension.
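A quick sanity check on a list of start addresses can catch the sloppiest mistakes before a crawl is ever scheduled. The Python sketch below is a hypothetical helper, not anything that ships with Site Server. It only inspects the shape of the URL (scheme, host, and an explicit page name), so it complements the browser test rather than replacing it.

```python
from urllib.parse import urlsplit


def start_url_problems(url: str) -> list:
    """List the obvious problems with a crawl start address (empty list = looks fine)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported scheme")
    if not parts.netloc:
        problems.append("missing host name (check for the // after http:)")
    else:
        # Only meaningful once the host has been parsed out correctly.
        last_segment = parts.path.rsplit("/", 1)[-1]
        if "." not in last_segment:
            problems.append("no explicit page name and extension")
    return problems


if __name__ == "__main__":
    for url in ("www.domain.com",
                "http://www.domain.com",
                "http://www.domain.com/index.htm"):
        print(url, start_url_problems(url))
```

A clean result here says nothing about whether the page exists or is really the site index; loading it in the browser is still the real test.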

The other thing I learned is, sites can be really big. Yep…I transgressed the borders of Olympus and tried to index parts of the Microsoft site. And to my surprise, it worked very well. Too well, in fact. Because http://www.microsoft.com is so interwoven, I started getting lots of content that wasn’t in English or that I had no interest in. I’m sure that many people enjoy Microsoft Golf, but I didn’t need to spend time indexing its product homepage.

To solve this problem, I told my search catalog to crawl http://www.microsoft.com and then told it not to crawl www.microsoft.com. I’ll explain.

In your search catalog’s properties, you have a “General” tab where you set up a site crawl. It starts on your defined URL and starts crawling. In this case, I set it to crawl www.microsoft.com/siteserver/default.asp and www.microsoft.com/exchange/default.asp.

To limit the site crawl, you’ll need to go to the “Sites” tab and set rules. In my case, I set up www.microsoft.com as a site, then created a few rules within the site. The first rule: to avoid crawling www.microsoft.com. I then set up two rules enabling crawling of the www.microsoft.com/siteserver and www.microsoft.com/exchange directories. As a final note, be sure to demote the “avoid crawling” rule to the bottom of your list. That way, you get the directories you want and avoid the directories you don’t want.
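Why the order matters is easier to see in code. The Python sketch below models the rule list as "first matching rule wins, top to bottom," which is an assumption inferred from the advice to demote the avoid rule; it is not Site Server's actual rule engine, and may_crawl, the RULES list, and the fallback for unmatched addresses are all illustrative.

```python
def _strip_scheme(url: str) -> str:
    """Drop any leading scheme (http://) and lowercase for comparison."""
    return url.split("://", 1)[-1].lower()


def may_crawl(url: str, rules, default: bool = True) -> bool:
    """Evaluate ordered (prefix, allow) rules; the first matching rule decides.

    default is an assumption about what happens when no rule matches.
    """
    target = _strip_scheme(url)
    for prefix, allow in rules:
        if target.startswith(_strip_scheme(prefix)):
            return allow
    return default


# The two "crawl" rules sit above the broad "avoid" rule.
RULES = [
    ("www.microsoft.com/siteserver", True),
    ("www.microsoft.com/exchange", True),
    ("www.microsoft.com", False),
]

if __name__ == "__main__":
    for url in ("http://www.microsoft.com/siteserver/default.asp",
                "http://www.microsoft.com/exchange/default.asp",
                "http://www.microsoft.com/games/default.asp"):
        print(url, may_crawl(url, RULES))
```

Under this model, moving the avoid rule to the top would make it match every www.microsoft.com address first, and the two directories you actually want would never be crawled.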

Once I had gone through all of the trials of building, testing, and scheduling my catalogs, I had to create an interface to give my audience access to the data.

While you may opt for using the Knowledge Manager application and creating pre-built searches (briefs), I decided to take the grassroots approach and use the sample site that came with Site Server 3.0. You’ll find it in the Microsoft Site Server\SiteServer\Knowledge\Search directory. Be sure you grab everything (including subdirectories) in that sample.

I literally copied every ounce of the code into my Web server’s root directory. I then created a new default.asp frame set to accommodate my site’s banner and the two sample pages (searchleft.asp and searchright.asp). Then I was off and running.

I’ve found that Site Server Search’s object model is extremely easy to learn and follow. If you primarily concentrate on infrastructure and the systems side of life, you’ll still be able to decipher the VBScript code used to build these pages. I found it to be a little easier than using the .ASP pages for Index Server. If you’re new to VBScript, go to www.microsoft.com/train_cert/download/download.htm and get the free self-paced course, “Essentials of Visual Basic Scripting Edition 3.0 for Web Site Development.”

The Moral of the Story Is…

As you know if you’ve been paying much attention to the industry, “knowledge management” is a buzzword for the new millennium. And because Site Server 3.0 fits into Microsoft’s knowledge management strategy, it’s a product you can expect to hear more and more about.

As you’ve seen from my story, the implementation of Site Server Search is, in theory, a simple proposition. What it requires, though, is a full understanding of your environment and a decent amount of planning. (Or, if you’re like me, little or no planning and a decent amount of time.)

Hence, the moral of my story…
