Cybergroup Selects dtSearch

May 09, 2005 12:05 AM
DevConnections

asp:CaseStudy

 


dtSearch's Text Retrieval Engine Powers Web-based Business Intelligence Mining Library

 

 

Cybergroup's client requested that Cybergroup develop a Web-based business intelligence mining library, including Web-based searching that seamlessly combines both its structured SQL database and its separate document collection.

 

Project Requirements and Background

Cybergroup's client realized that database information, although critical to its business intelligence, represented only a small portion of all its corporate information. By the client's estimate, its corporate database contained a mere 20% of business-decision information, while the remaining 80% could be found in other sources: Web site pages, Microsoft Office documents, PDFs, etc. The client needed a single search to cover both the SQL database and the file repository, as well as to return unified results from both sources.

 

To ensure that a search of the combined database and document repository retrieves all relevant information, the client further required not only basic search functionality, such as word and phrase searching, but also advanced search features. The client wanted search features like stemming and fuzziness for word misspellings, as well as phonic searching. The client also wanted concept searching, including the capability for synonym expansion using both pre-defined thesaurus terms and a user-defined thesaurus/synonym list.

 

For sorting search results, the client wanted a variety of advanced relevancy ranking options. Finally, for ease of browsing search results, the client specified that the search must return retrieved SQL database entries and documents with highlighted hits (as well as a preferably WYSIWYG display of Web content like HTML, PDF, and XML, along with the highlighted hits).

 

Going forward in terms of digital library management, the client needed Cybergroup to develop a solution allowing multiple contributors to upload documents to the Web library. Upon document check-in, the client further needed a mechanism to add metadata regarding the document to the client's main SQL database.

 

Solution Overview

To meet all of the above requirements for the project's search functionality, Cybergroup chose the dtSearch Text Retrieval Engine for Win & .NET by dtSearch Corp. (http://www.dtsearch.com). A single dtSearch index could include both the SQL database and the separate document repository, and support searching with all the above advanced search features, ranking capabilities, and hit-highlighted display options.

 

To use these built-in capabilities, Cybergroup needed to write custom VB.NET code to "drag along" certain fields from the database that would be associated with each document and stored in the searchable index. Cybergroup also needed to write a custom ASP.NET-based server control using the dtSearch Engine APIs. Cybergroup called this application its "dtResults Control"; screenshots and a detailed description of Cybergroup's dtResults Control follow.

 


Figure 1: Cybergroup's dtResults Control.

 


Figure 2: Cybergroup's dtResults Control.

 

Cybergroup's Description of dtResults Control

Like any .NET control, the dtResults Control can be dragged and dropped right into a development environment. Cybergroup implemented the dtResults Control by inheriting from the DataGrid control, leveraging the existing power of the DataGrid. Cybergroup chose the DataGrid as a foundation for its server control because it offers built-in paging and a robust programming model.

 

The following code is from Cybergroup's sample application; it runs when the user enters a search term or phrase and clicks the Search button:

 

Private Sub GetResults()

    'Setting the location of the index
    SearchResultList1.IndexPath = "c:\dbconnectorindex"

    'Mapping virtual path of documents to physical path
    Dim rptd As New SearchResultList.SearchResultList.ResultPathTranslationDictionary
    rptd.Add("c:\testdocs", "./testdocs")

    'Setting various search settings
    SearchResultList1.RelativePathTranslations = rptd
    SearchResultList1.SortCaseInsensitive = cbCaseInsensitive.Checked
    SearchResultList1.SortAscending() = ddAscendingFlag.SelectedValue
    SearchResultList1.SearchType = ddSearchType.SelectedValue
    SearchResultList1.SortType = ddSort.SelectedValue
    SearchResultList1.Stemming = cbStemming.Checked

    If cbFuzzyness.Checked = True Then
        SearchResultList1.Fuzzy = True
        SearchResultList1.FuzzLevel = ddFuzzyness.SelectedValue
    Else
        SearchResultList1.Fuzzy = False
    End If

    SearchResultList1.Phonic = cbPhonic.Checked

    SearchResultList1.Synonyms = cbSynonyms.Checked

    'Defining dtSearch custom fields to be displayed
    Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
    SearchResultList1.CustomFieldNames = cfn

    Dim cffn As String() = {"Supplier ID #", "Company Name"}
    SearchResultList1.CustomFieldFriendlyNames = cffn

    If chkSearchWithin.Checked = True Then
        SearchResultList1.SearchWithin = True
        SearchResultList1.PreviousSearchFilter = Session("psf")
    End If

    'Executing the search and binding the results
    SearchResultList1.GetResults(tbSearch.Text)

    'Storing the "previous search filter" to be used later if user clicks "Search Within Results"
    Session("psf") = SearchResultList1.PreviousSearchFilter

    Literal1.Text = "Search: <B><I>" & tbSearch.Text & "</I></B> returned: " & CType(SearchResultList1.DataSource, DataTable).Rows.Count & " results"

End Sub

 

The following provides a flavor of the design and functionality behind Cybergroup's dtResults Control.

 

Using the GetResults method of the dtResults Control, Cybergroup reduced the task of creating the search and results display to one line of code in the simplest case. We can execute a search and then display results by passing a search string input by the user on the search form, as in this example:

 

SearchResultList1.GetResults(tbSearch.Text) 'Only one line of code

 

Of course, a developer can also leverage the power of the dtResults Control through its properties. Take, for example, the SortType property. Simply put, the SortType property allows the developer to sort the information in the results display. Let's say the developer wants to have the most recently modified documents appear first in the results display. The developer would set the SortType property to "date" and the Ascending property to False; for example:

 

SearchResultList1.SortType = "date"

SearchResultList1.Ascending = False

 

Internally, the control checks the SortType string against a canned set of values such as "date", "hits", and "title", and checks the Ascending variable. It then builds a single value containing the corresponding dtSearch sort flags, combined bitwise, and passes it to its sort function. The binary manipulations are hidden from the developer, who can even bind the properties to checkboxes or dropdown lists with single lines of code.

 

Here's the code in the dtResults Control for the SortType property:

 

Dim flags As dtengine.SortType

'If a raw flag value (sortf) was supplied, use it as-is
If Not (sortf = 0) Then
    flags = sortf
ElseIf sortt = "hits" Then
    flags = dtengine.SortType.stSortByHits
ElseIf sortt = "index" Then
    flags = dtengine.SortType.stSortByIndex
ElseIf sortt = "date" Then
    flags = dtengine.SortType.stSortByDate
ElseIf sortt = "timeofday" Then
    flags = dtengine.SortType.stSortByTime
ElseIf sortt = "title" Then
    flags = dtengine.SortType.stSortByTitle
ElseIf sortt = "name" Then
    flags = dtengine.SortType.stSortByName
ElseIf sortt = "filetype" Then
    flags = dtengine.SortType.stSortByType
ElseIf sortt = "size" Then
    flags = dtengine.SortType.stSortBySize
Else
    'Any other value is treated as the name of a custom field
    flags = dtengine.SortType.stSortByUserField
End If

'Combine modifier bits with Or, so a bit already present in sortf is not added twice
If sascend Then
    flags = flags Or dtengine.SortType.stSortAscending
End If

If cinsens Then
    flags = flags Or dtengine.SortType.stSortCaseInsensitive
End If

res.Sort(flags, sortt)
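
The lookup-and-combine step described above can be tried without the dtSearch assembly. The following is a minimal, self-contained sketch rather than Cybergroup's code: the SortFlags enum, its numeric values, and the BuildSortFlags function are hypothetical stand-ins for dtengine.SortType, and the listing assumes a VB 10 or later compiler for the collection initializer. It replaces the ElseIf chain with a dictionary lookup and combines modifier bits with Or.

```vbnet
Imports System
Imports System.Collections.Generic

Module SortFlagSketch

    ' Hypothetical stand-in for dtengine.SortType; the values are illustrative only.
    <Flags()>
    Enum SortFlags
        None = 0
        ByHits = 1
        ByDate = 2
        ByName = 4
        ByUserField = 8
        Ascending = 16
        CaseInsensitive = 32
    End Enum

    ' Maps the canned strings to a base flag; anything else is treated as a custom field.
    Private ReadOnly KnownSorts As New Dictionary(Of String, SortFlags)(StringComparer.OrdinalIgnoreCase) From {
        {"hits", SortFlags.ByHits},
        {"date", SortFlags.ByDate},
        {"name", SortFlags.ByName}
    }

    Function BuildSortFlags(sortType As String, ascending As Boolean, caseInsensitive As Boolean) As SortFlags
        Dim flags As SortFlags
        If Not KnownSorts.TryGetValue(sortType, flags) Then
            flags = SortFlags.ByUserField
        End If
        ' Or is idempotent: setting a bit that is already set leaves the value unchanged.
        If ascending Then flags = flags Or SortFlags.Ascending
        If caseInsensitive Then flags = flags Or SortFlags.CaseInsensitive
        Return flags
    End Function

    Sub Main()
        Console.WriteLine(BuildSortFlags("date", False, False))
        Console.WriteLine(BuildSortFlags("Region", True, True))
    End Sub

End Module
```

The second call uses "Region", which is not one of the canned strings, so the sketch falls back to the user-field flag, mirroring the final Else branch of the control's code.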

 

Critically important to our project is the ability to extract "custom field" data from the dtSearch index. Custom fields are columns that we have extracted from the database during the indexing process and now wish to present in a search results display.

 

Through the use of the CustomFieldNames and CustomFieldFriendlyNames properties, a developer can easily and attractively display database information in the results display.

 

The CustomFieldNames property is a string array of the names of custom fields (i.e., database columns) in the index that the developer wishes to include in the results. When defined, the strings in it should appear exactly as they do in the index. For example, {"SupplierID", "CompanyName", "Region"}.

 

The CustomFieldFriendlyNames property is a string array that represents the names of the fields that the developer would like to have appear in the control. This provides for a high degree of customization in results presentation. Rather than display cryptic database column names, the developer can display understandable labels. These names are connected to actual custom fields by their position in the array, relative to the CustomFieldNames property above. If the array is longer than CustomFieldNames, then the extra entries at the end are discarded. If it is shorter, then the names of the remaining custom fields default to their actual names. For example, {"ID # of Supplier", "Supplier Name", "Supplier's Region"}.

 

To return the custom field information in the results display, the developer would simply set the properties as in the following example:

 

Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
SearchResultList1.CustomFieldNames = cfn

Dim cffn As String() = {"Supplier ID #", "Company Name"}
SearchResultList1.CustomFieldFriendlyNames = cffn
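
The positional rule for friendly names (extra entries discarded, missing entries defaulting to the real field name) can be sketched as follows. This is an illustration of the rule as described, not the control's actual implementation; ResolveLabels is a hypothetical helper name.

```vbnet
Imports System

Module FriendlyNameSketch

    ' Hypothetical helper: pairs each custom field with the label to display.
    Function ResolveLabels(fieldNames As String(), friendlyNames As String()) As String()
        Dim labels(fieldNames.Length - 1) As String
        For i As Integer = 0 To fieldNames.Length - 1
            If i < friendlyNames.Length Then
                labels(i) = friendlyNames(i)   ' a friendly name exists at this position
            Else
                labels(i) = fieldNames(i)      ' fall back to the actual field name
            End If
        Next
        ' Friendly names beyond fieldNames.Length are never read, so they are dropped.
        Return labels
    End Function

    Sub Main()
        Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
        Dim cffn As String() = {"Supplier ID #", "Company Name"}
        Console.WriteLine(String.Join(", ", ResolveLabels(cfn, cffn)))
    End Sub

End Module
```

With the three field names and two friendly names from the example above, the sketch prints "Supplier ID #, Company Name, Region": the third column keeps its index name because no friendly name was supplied for it.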

 

Following is a complete list of the dtResults Control properties and methods:

 

Ascending: If true, the results will be sorted in ascending order by whatever criterion is specified in SortType. If false, results are sorted in descending order. Defaults to false.

 

CustomFieldNames: This string array represents the names of custom fields in the index that the developer chooses to include in the results. The strings in it should appear exactly as they do in the index; for example, {"SupplierID", "CompanyName", "Region"}.

 

CustomFieldFriendlyNames: This string array represents the names of the fields that the developer wants to appear in the control. These names are connected to actual custom fields by their position in the array, relative to CustomFieldNames. If the array is longer than CustomFieldNames, then the extra entries at the end are discarded. If it is shorter, then the names of the remaining custom fields default to their actual names. For example, {"ID # of Supplier", "Supplier Name", "Supplier's Region"}.

 

Fuzzy and FuzzLevel: These control the tolerance of the search; for example, searching for "alphabet" with Fuzzy = True and FuzzLevel = 1 would also search for "alphaqet" or "albhabet". Searching for "alphabet" with Fuzzy on and FuzzLevel at 3 would also find "alpkaqet".
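
The article does not describe how dtSearch scores fuzzy matches, so the following is only a rough proxy for building intuition about the examples above: it counts single-character edits using the classic Levenshtein distance, which is a swapped-in technique and not necessarily what the engine does. EditDistance is a hypothetical helper name.

```vbnet
Imports System

Module FuzzySketch

    ' Classic Levenshtein distance: minimum single-character insertions, deletions, or substitutions.
    Function EditDistance(a As String, b As String) As Integer
        Dim d(a.Length, b.Length) As Integer
        For i As Integer = 0 To a.Length
            d(i, 0) = i
        Next
        For j As Integer = 0 To b.Length
            d(0, j) = j
        Next
        For i As Integer = 1 To a.Length
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
            Next
        Next
        Return d(a.Length, b.Length)
    End Function

    Sub Main()
        Console.WriteLine(EditDistance("alphabet", "alphaqet"))
        Console.WriteLine(EditDistance("alphabet", "alpkaqet"))
    End Sub

End Module
```

Under this measure "alphaqet" is one edit away from "alphabet" and "alpkaqet" is two, which is at least consistent with the first being found at FuzzLevel 1 and the second at a higher level.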

 

IndexPath: This is the location of the dtSearch index files to use for searching. If it is not set, then SearchResults will look for an "IndexPath" key in Web.config.

 

Phonic: Controls phonic searching; for example, with Phonic = True, searching for "Smith" would also find "Smythe".

 

PreviousSearchFilter: This allows the developer to create "Search Within Results" functionality, in conjunction with the SearchWithin property, described below. This property should be saved to a session variable after the initial search, and restored from it when the user triggers a "Search Within Results".

 

RelativePathTranslations: A SearchResultList.ResultPathTranslationDictionary that maps the absolute paths of documents stored in the dtSearch index to relative paths. This allows a URL to be generated for the link to the document, given only an absolute path on the server. For example, one might include the following in an initialization method:

 

Dim rptd As New SearchResultList.SearchResultList.ResultPathTranslationDictionary

rptd.Add("c:/Inetpub/website/search/documents", "documents")
rptd.Add("c:/Inetpub/website/tutorials", "../tutorials")

SearchResultList1.RelativePathTranslations = rptd
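
How such a dictionary might be applied to a stored path can be sketched as below. This is an assumption about the general technique (prefix replacement), not the control's actual code: ToRelativeUrl is a hypothetical helper, a plain Dictionary stands in for ResultPathTranslationDictionary, and the file name intro.pdf is invented for the demonstration.

```vbnet
Imports System
Imports System.Collections.Generic

Module PathTranslationSketch

    ' Hypothetical helper: rewrites a server path as a relative URL using the first matching prefix.
    Function ToRelativeUrl(absolutePath As String, translations As Dictionary(Of String, String)) As String
        Dim normalized As String = absolutePath.Replace("\", "/")
        For Each pair As KeyValuePair(Of String, String) In translations
            Dim prefix As String = pair.Key.Replace("\", "/").TrimEnd("/"c)
            If normalized.StartsWith(prefix & "/", StringComparison.OrdinalIgnoreCase) Then
                Return pair.Value.TrimEnd("/"c) & normalized.Substring(prefix.Length)
            End If
        Next
        Return Nothing   ' no translation registered for this path
    End Function

    Sub Main()
        Dim map As New Dictionary(Of String, String) From {
            {"c:/Inetpub/website/search/documents", "documents"},
            {"c:/Inetpub/website/tutorials", "../tutorials"}
        }
        ' intro.pdf is a made-up file name used only for this demonstration.
        Console.WriteLine(ToRelativeUrl("c:\Inetpub\website\tutorials\intro.pdf", map))
    End Sub

End Module
```

The sample call prints "../tutorials/intro.pdf". If registered prefixes can overlap, a real implementation would probably want to test the longest prefix first instead of relying on enumeration order.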

 

SearchType: A string. Valid values are "allwords", "anywords", "phrase", and "boolean". In the "allwords" setting, dtSearch will search for any document containing each word in the search, in any order or proximity. In the "anywords" setting, dtSearch will search for any documents containing any of the words in the search query, not necessarily all of them in the same document. In the "phrase" setting, dtSearch will treat the entire search query as a single unit, and search for documents containing the exact query. In the "boolean" setting, the user can use Boolean logic to specify a query. dtSearch provides the following guidance:

  • tart apple pie - the entire phrase must be present
  • apple pie and pear tart - both phrases must be present
  • apple pie or pear tart - either phrase must be present
  • apple pie and not pear tart - only apple pie must be present
  • apple w/5 pear - apple must occur within 5 words of pear
  • apple not w/27 pear - apple must not occur within 27 words of pear
  • subject contains apple pie - finds apple pie in a subject field
  • use parentheses if the query contains more than one connector

 

SearchWithin: If this property is set to True, and the PreviousSearchFilter property is set to a value obtained from it after a previous GetResults call, then the results of the current search will be a subset of the results of the previous search.
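
The subset behavior can be pictured with a simple model. The article does not say how the control represents a search filter internally, so the sketch below assumes, purely for illustration, that a filter is the set of document ids matched by the earlier search; the SearchWithin function and the ids are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module SearchWithinSketch

    ' Hypothetical model: a "filter" is just the set of document ids a search matched.
    Function SearchWithin(currentMatches As IEnumerable(Of Integer), previousFilter As HashSet(Of Integer)) As List(Of Integer)
        Dim narrowed As New List(Of Integer)
        For Each docId As Integer In currentMatches
            ' Keep only documents that the previous search also returned.
            If previousFilter.Contains(docId) Then narrowed.Add(docId)
        Next
        Return narrowed
    End Function

    Sub Main()
        Dim firstSearch As New HashSet(Of Integer)(New Integer() {1, 2, 3, 5, 8})
        Dim secondSearch As Integer() = {2, 4, 5, 9}
        For Each docId As Integer In SearchWithin(secondSearch, firstSearch)
            Console.WriteLine(docId)
        Next
    End Sub

End Module
```

Here the second search on its own matches documents 2, 4, 5, and 9, but only 2 and 5 survive because they were also in the first result set. This is why the filter must be saved after each search and restored before the next one.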

 

SortType: A string. Meaningful values are "hits", "date", "name", and "size". If set to "hits", the documents containing the most occurrences of the search query, or the highest score, will appear on top. If set to "date", the most recently modified documents will appear on top. If set to "name", the documents will be sorted in alphabetical order of their title. If set to "size", the documents with the largest file sizes will appear on top. If the field has a different value than any of these, it is assumed to be the name of a custom field in the index by which to sort.

 

Stemming: Controls the word stemming capability of dtSearch. For example, if Stemming = True, searches for "apply", "applying", "applier", or "applies" are all equivalent.

 

Synonyms: Uses an English thesaurus to search for synonyms of the search query in addition to the search query itself.

 

GetResults(SearchText As String): Simply put, this function runs a search on the query string passed, using the settings determined by the control's properties, and displays the results in a human-readable format, with 10 results per page and a pager control. Until this method is called, the control is invisible to the user.

 

Greg Bean is President of Cybergroup, Inc., a developer of advanced Internet and intranet developer search tools in Baltimore, MD. E-mail him at gbean@cybergroup.com.

 

dtSearch

dtSearch offers over a decade of experience in text search and retrieval. Large enterprises typically use dtSearch products for general information retrieval, Internet and intranet site searching, access to technical documentation, and embedding in applications for distribution. dtSearch is also on the US Government's GSA Schedule. The company has distributors worldwide, including coverage on six continents. For more information visit http://www.dtsearch.com.