SearchInform Competitors
1. Major Differences of Our Technology from Its Competitors
2. Competitors Description
2.1. dtSearch
2.2. iSYS
2.3. Hummingbird Search Server
2.4. Verity
2.5. Google Desktop Search Enterprise
2.6. Copernic Desktop Search
2.7. iSleuthHound Prof. Deluxe + SleuthHound Server
2.8. Archivarius 3000
2.9. Autonomy
3. Comparison of Indexing Speed
3.1. Test 1 Comparison of Indexing Speed
3.2. Test 2 Comparison with Hummingbird (Fulcrum)
3.3. Summary Table of Indexing Speed
4. Competitors Summary
1. Major Differences of Our Full text Search Technology from Its Competitors
None of our competitors offers a working search for documents with similar content, although some of them declare this feature. In practice, their versions of similar-content search produce no useful results.
Unfortunately, it was impossible to test every competitor's text search engine, since a number of vendors do not provide a trial version. In our opinion, this only suggests that they are not ready to position their information retrieval solutions as top-notch. Some general conclusions can still be drawn about the rest. In a nutshell, SearchInform indexes and searches text information about twice as fast as the acknowledged search market leaders. Document search is also 2-4 times faster than in the most advanced full text search technologies. It is also important that the index produced by SearchInform is much smaller, which in turn contributes to the search speed.
The text search tests were run on various data sets, so for each test we compare the results of our document retrieval system with those of its competitors. For ease of testing we used the desktop versions of the applications. The analysis focused on the world's leading companies, plus some search technologies developed in Russia (iSleuthHound and Archivarius).
2. Competitors Description
2.1. dtSearch
www.dtsearch.com
The dtSearch Corp. product line comprises a wide range of search engines, both for home use and for work within an enterprise network. In particular, dtSearch Desktop with the built-in dtSearch Spider can index and search not only files on the user's computer, but also websites (to a preset depth) and local network resources. It can also use external indexes created on other computers.
dtSearch recognizes various character sets, including Cyrillic, as well as a number of text file formats, such as .doc, .xls, .rtf, .pdf, .html, and popular databases (via ODBC). It is noteworthy that in databases you can search text by the content of specific fields and tags.
In addition to the conventional search in "natural language" or by means of formal queries, you can choose from several more advanced methods: morphological search (stemming, with a search for all forms of each word); fuzzy search, which tolerates possible errors and misprints; phonetic search, which takes similar-sounding words into account; and synonym search.
dtSearch Desktop 7.0 indexed the 11 gigabytes of test texts in a quite reasonable 2 hours 57 minutes, and the resulting index took up 4.15 Gb on disk. As far as document search is concerned, no errors with Russian text were found. One weakness is that a query of several dozen words slowed the search system down substantially before it displayed the result. Search for documents with content similar to the query text is not declared and is not available.
2.2. iSYS
www.isys.com
The ISYS company has been on the market for 16 years and has acquired over 10,000 users of its products. Since its foundation, the software developed by ISYS has been aimed at business and corporate users. The ISYS software range includes search programs for desktop computers, for corporate networks and for the Internet.
The corporate full text search system by ISYS is designed to provide fast and convenient search, whether on a personal computer, on the Internet or in the corporate network of an enterprise. ISYS indexes data and retrieves documents using statements and key phrases, just as an Internet search engine does.
ISYS supports several query methods (Command Line Query, Menu-Assisted Query, Natural Language Query). It uses a document relevance algorithm and the linguistic features of the language, which makes possible such features as synonym search, fuzzy search (search that tolerates errors) and so on.
ISYS supports 125 text file formats (including Microsoft Office and WordPerfect documents, electronic mail, PDF, XML, databases and so on) and 30 languages, including Chinese, Japanese and Korean.
The full text search system indexed the 11 gigabytes of test texts with about the same results as dtSearch, but slightly faster, in 2 hours 53 minutes. The index, however, was a fraction larger at 4.2 gigabytes. To an inexperienced user, the somewhat complicated approach to document searching with its different query modes may seem inconvenient at first, but a closer look resolves most questions. The main point is that the application refuses to search by a "long" query consisting of several words in the standard mode; this type of text search is handled by additional features. The tests also revealed some problems with Russian morphology in ISYS. This, however, does not prevent the retrieval system from holding a leading position in the world ratings. Among the strengths of the application is its high-quality automatic document categorization (rubrication): as soon as indexing was complete, ISYS assigned all processed documents to the appropriate rubrics and presented them in a convenient form. Search for documents with similar content is not featured in the system and, as far as we have been able to find out, there are no plans to add it.
2.3. Hummingbird Search Server
www.hummingbird.com
Hummingbird Ltd. is one of the leaders in developing software for enterprises. The main product of the company is Hummingbird Enterprise™, an integrated platform for managing the information circulating within a company. Founded in 1984, Hummingbird currently delivers corporate information solutions to over 33 thousand enterprises worldwide. These companies rely on Hummingbird to consolidate their business processes, information and staff.
The Hummingbird Search Server application (formerly known as Fulcrum/Open Docs), built into Hummingbird Enterprise™, is an information search system that includes similar-document search and meta-search. Comparative tests of Hummingbird Search Server and SearchInform showed that our system indexes data almost four times faster and that the index generated by SearchInform is almost three times smaller. The tests were run on a 6-gigabyte text database. SearchInform completed the test in 1 hour 19 minutes and created an index of about 1.26 gigabytes. The Hummingbird Search Server results are far more modest: indexing took 4 hours 50 minutes and the index size was 3.5 gigabytes. The search for documents similar to the query text that Hummingbird Search Server declares does not work in practice: the application finds identical documents, but not documents similar in content and meaning, so the feature remains a promise yet to be fulfilled. As evidence, after the Russian representative of Hummingbird ran the relevant tests, the decision was made to build our full text search server into the Hummingbird Enterprise™ document management system for promotion in the Russian market.
2.4. Verity
www.verity.com
Located in Sunnyvale, California, Verity develops software that incorporates tools for searching, classifying and analyzing information within an enterprise. The Verity technology serves as the basis for over 260 third-party programs. The products developed by Verity run on a variety of industry platforms and offer high efficiency and scalability; in particular, they can serve many thousands of concurrent external users. Verity counts thousands of major clients among world leaders in industry and the government sector. Gartner Group research names Verity the world's number one developer of full text analytical technologies for corporate information resources, business applications, corporate portals and the Internet.
Verity K2 Enterprise 5.5 enables you to search text information both in the corporate network of an enterprise and in electronic correspondence (although it supports only Outlook). The application supports virtually all existing text document formats, both widespread and special-purpose. In addition, the search engine makes use of taxonomies and the Open Navigation feature, which includes classification tools that dynamically provide the user with the required information. Verity K2 Enterprise 5.5 includes an advanced categorization and classification system. It processes incoming information and automatically determines which user needs a particular text document at a given moment. Unfortunately, it was impossible to test Verity K2 Enterprise 5.5 or Verity Enterprise Desktop Search, so we cannot give a comprehensive assessment of its indexing and search speed. In conclusion, the system does not offer search for documents with similar content.
2.5. Google Desktop Search Enterprise
desktop.google.com/enterprise
A freeware product from Google designed for information retrieval on a user's computer, on the Internet and in the corporate network of an enterprise.
Google Desktop Search Enterprise can index and search documents in dozens of the most widespread text formats, as well as electronic mail, audio and video file tags, and images. Note that in order to tell the application which files and folders to index, you have to install an additional component, gdetweak. Without this add-in, Google Desktop Search indexes all the information it can access on the user's computer and in the enterprise network. Google Desktop Search Enterprise completed the indexing test in 3 hours 41 minutes and created a 1.9-gigabyte index. The search speed is satisfactory and on the same level as that of the broadly acknowledged market participants.
Compared with, for example, ISYS and dtSearch (document management systems such as Hummingbird or Documentum are out of scope here because of their bulkiness), Google Desktop Search Enterprise rightly claims the most user-friendly interface. However, in terms of administration and setting up work in the local network, it undoubtedly falls behind its competitors. It is quite complicated to set up network operation the way a particular situation requires, because the search system tries to do everything for you. The only way to fine-tune the engine is to install additional components, which is a major disadvantage. It would be fair to say that as a desktop search system, Google Desktop Search with the gdetweak component has no equal. (It does not even have problems with Russian, though the indexing and search speed could be higher.) But it is still a long way from being ready for corporate use. The promised search for documents with similar content (originally offered on the Internet as "similar pages") performs quite poorly. Apparently for this very reason it is not included in either the desktop or the network version of the search engine.
2.6. Copernic Desktop Search
www.copernic.com
Copernic Desktop Search allows you to search various files, electronic mail messages (it supports Outlook Express 5.x/6.x, Outlook 2000/XP/2003 and the Windows Address Book), Word, Excel, PowerPoint and Acrobat PDF documents, music and video files, graphics and so on. Full text search can be performed both on the local computer and on the Internet. Built-in viewers for various file types let you inspect the search results. For example, if you select the thumbnail of an HTML document in the main window of the application, Copernic Desktop Search displays its contents. After installation, the application places a small window in the taskbar, where you can enter a search query and perform quick search set-up. The application's speed of operation deserves a special mention, as does its low consumption of computer resources. Among the weaknesses of the application is its total inability to work with Russian texts in the .txt and .html formats, although in Microsoft Office documents Copernic finds Russian text perfectly well. The application indexed the test database in 5 hours 11 minutes, a record time for desktop search engines (the index size was 3 gigabytes). However, it turned out that most of the files from the test selection of texts remained unindexed. For example, the Russian .txt and .html files were touched just "on the surface": only the file names were indexed, not the texts themselves. Hence the question: why can't the system find Russian text in these file formats when it otherwise handles the Russian language perfectly well?
2.7. iSleuthHound Prof. Deluxe + SleuthHound Server
www.iSleuthHound.com/ru
iSleuthHound Technologies is dedicated to developing tools for intelligent, prompt information retrieval. The company develops both special-purpose utilities for searching complex information with access via the Internet and intranet, and simple solutions aimed at the end user.
SleuthHound and iSleuthHound Prof. are practical text search applications of the company's technologies and know-how for prompt local search of unstructured information. The utility has a user-friendly interface and is aimed at the end user.
SleuthHound Server enhances your ability to find the required document in the corporate network. The utility supports searching documents in the following formats: .txt, .doc, .rtf, .htm, .html and any other extension if the file format is compatible with ASCII, ANSI or Unicode (for example, .c, .cpp, .bas, .pas and so on). MS PowerPoint presentations (.ppt, .pps), MS Excel documents (.xls), ASP documents (.asp) and several others are supported only when additional modules are installed.
The search engine of iSleuthHound Technologies took as many as 21 hours 12 minutes to digest the 11 gigabytes of texts. The resulting index file was the same 11 gigabytes that the full text search utility received as input. iSleuthHound showed the poorest results of all the applications in the test group. The same goes for the speed of document searching: when the query phrase consisted of dozens of words, the application "froze" hopelessly. Unsurprisingly, no search for documents with similar content is provided.
2.8. Archivarius 3000
www.wizetech.com/ru/document-search
Archivarius 3000 is a utility for searching text documents and mail messages on a computer, in the local network and on removable disks (CD, DVD, etc.). Documents can be searched by key words or with a query language much like that of Internet search engines.
The application features a multi-language interface including Russian, Ukrainian and Belarusian. The system supports semantic search in Russian, Ukrainian, Belarusian, English, German, Spanish, French, Czech, Danish, Greek, Hungarian, Italian, Portuguese and Swedish.
Archivarius supports the most popular text file formats from MS Office and PDF to TXT and LEX, searching in ZIP, RAR, ARJ archives and many others. It searches mail messages in Outlook, Outlook Express, MS Exchange and The Bat!. In addition, you can access your work documents from home via the Internet.
In principle, Archivarius 3000 is a commendable desktop full text search system that enables users to work with the information on their personal computers. It goes without saying that the utility is not equipped with network abilities. In addition, this search engine by Wizetech Software refuses to index anything larger than 1.5 gigabytes. Therefore, the information has to be "prepared" for search, that is, "fed" to the application in portions. This is slightly irritating. However, Archivarius redeems itself with the ability to search across all created indexes. As far as speed is concerned, the Archivarius search engine digested 1.5 gigabytes of texts in 27 minutes (resulting in an 800-megabyte index file). We can assume that 11 gigabytes would be indexed in approximately 4 hours. Search for documents with content similar to the query text is not even an option.
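The 11-gigabyte estimate for Archivarius can be reproduced with a simple proportion. The sketch below assumes that indexing time and index size grow linearly with the input volume, which the tests did not verify; the function name `extrapolate` is hypothetical.

```python
# Linear extrapolation of the Archivarius 3000 measurement.
# Assumption: indexing time and index size grow in proportion to input volume.
MEASURED_GB = 1.5          # size of the batch actually indexed
MEASURED_MINUTES = 27      # measured indexing time for that batch
MEASURED_INDEX_GB = 0.8    # measured index size (800 megabytes)
TARGET_GB = 11             # size of the full test collection


def extrapolate(measured_value: float, measured_gb: float, target_gb: float) -> float:
    """Scale a measured value linearly from measured_gb to target_gb."""
    return measured_value * target_gb / measured_gb


minutes = extrapolate(MEASURED_MINUTES, MEASURED_GB, TARGET_GB)
index_gb = extrapolate(MEASURED_INDEX_GB, MEASURED_GB, TARGET_GB)

print(f"Estimated indexing time: {minutes / 60:.1f} hours")
print(f"Estimated index size: {index_gb:.1f} Gb")
```

The strict proportion gives about 3.3 hours and about 5.9 Gb, so the figures of approximately 4 hours and about 6 Gb used in this article are rounded upward.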
2.9. Autonomy
www.autonomy.com
Autonomy Corporation is one of the leading developers of enterprise software infrastructure. The Autonomy technology brings information from various sources into a single structured whole, whether the source is a CRM, a knowledge management system, the local network of an enterprise or the company's online resources. Of all the search systems considered in this article, Autonomy is the only one that, like SearchInform, can consolidate the information circulating within the enterprise from various sources. That is why Autonomy's customers include over a thousand major companies all over the world, among them Ford, Ericsson, Shell, Nestle, BBC, Reuters, Hutchison 3G, Royal Sun Alliance, Sun Microsystems, Philips, Boeing, Schneider Electric and Coca Cola, to name a few.
The technology developed by Autonomy enables the computer to derive the semantic meaning of unstructured information by means of mathematical pattern-matching algorithms that determine the basic concepts contained in fragments of information. The products developed by Autonomy provide tools for solving the fundamental tasks facing any company or organization, and can be used in any application that works with information flows and arrays, including information and corporate portals, e-commerce systems, customer relationship management systems, data analysis systems, etc. One of the greatest strengths of this system is automatic categorization of information based on document clustering.
However, it would be illogical to consider Autonomy a competitor in terms of information search. It is a robust information management system, but it makes no provision for quick phrase search or for adequate search of documents with content similar to the query text. Its similar-content search has nothing in common with our technology: in Autonomy this feature is based on text document clustering (automatic rubrication), which does not allow documents with similar content to be sorted in order of relevance. In addition, this search engine is unable to solve the problem of information fuzziness and many others.
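To make the idea of sorting similar documents "in order of relevance" concrete, here is a minimal sketch that scores each document against a query text and sorts by that score. It uses plain bag-of-words cosine similarity purely for illustration; it is not the algorithm of SearchInform, Autonomy or any other product mentioned here, and the document names and texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its word occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity(query_text: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Return (document name, score) pairs, most similar first."""
    query = term_counts(query_text)
    scored = [
        (name, cosine_similarity(query, term_counts(body)))
        for name, body in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection used only for illustration.
documents = {
    "contract.txt": "supply contract for office equipment and delivery terms",
    "invoice.txt": "invoice for delivery of office equipment",
    "memo.txt": "staff meeting moved to friday",
}
ranking = rank_by_similarity("office equipment delivery contract", documents)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A clustering-only approach would merely place the first two documents in the same group; a per-document score additionally says which of them is closer to the query.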
3. Comparison of Indexing Speed
3.1. ̉est 1 Comparison of Indexing Speed
11 Gb of information were indexed.
Computer: AMD Barton 2.5 MHz, 1 Gb random access memory.
Search system | Indexing duration | Index size
iSleuthHound Prof Deluxe | 21 hours 12 minutes | 11 Gb
Archivarius 3000 | 4 hours | About 6 Gb
Isys desktop 7.0 | 2 hours 53 minutes | 4.2 Gb
DtSearch 7.0 | 2 hours 57 minutes | 4.15 Gb
Google Desktop Search Enterprise | 3 hours 41 minutes | 1.9 Gb
Copernic Desktop Search* | 5 hours 11 minutes | 3 Gb
SearchInform | 1 hour 31 minutes | 1.9 Gb
*Most of the .html and .txt documents containing Russian text were processed, but they could be found only by their file names, not by their contents. This is a depressing result for a Russian-speaking user, although on the whole it is a good system.
3.2. Test 2 Comparison with Hummingbird (Fulcrum)
This test was aimed at comparing our technology with the Fulcrum search engine built into Hummingbird products. After the tests were completed, the decision was made to build our search engine into the Hummingbird document management system for promotion in the Russian market.
6 Gb of texts were indexed
System | Indexing duration | Index size
Hummingbird | 4 hours 50 minutes | 3.5 Gb
SearchInform | 1 hour 19 minutes | 1.26 Gb
3.3. Summary Table of Indexing Speed
The table shows, for each system, how many times its indexing duration and index size exceed those of SearchInform.
System | Indexing duration | Index size
iSleuthHound Prof Deluxe 4.5 | 14 times worse | 5 times more
Archivarius 3000 | 2.63 times worse | 3 times more
Isys desktop 7.0 | 1.9 times worse | 2.1 times more
DtSearch 7.0 | 1.9 times worse | 2.1 times more
Hummingbird | 3.6 times worse | 2.8 times more
Google Desktop Search Enterprise | 2.5 times worse | The same*
Copernic Desktop Search | 3.4 times worse | 1.5 times more
*When search for documents with similar content is deactivated in SearchInform, the index is half the size.
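The factors in the summary table can be recomputed from the raw results of Test 1 and Test 2. The sketch below is one way to do that; the helper names `parse_minutes` and `factors` are hypothetical, "About 6 Gb" is taken as 6.0, and Hummingbird is compared against the SearchInform result from Test 2 because it was measured on the 6 Gb collection. The computed values may differ slightly from the table, which appears to round some figures down.

```python
import re

# SearchInform results used as the baseline in each test (sections 3.1 and 3.2).
TEST1_BASELINE = ("1 hour 31 minutes", 1.9)
TEST2_BASELINE = ("1 hour 19 minutes", 1.26)

# (system, indexing duration, index size in Gb, baseline of the test it ran in)
RESULTS = [
    ("iSleuthHound Prof Deluxe", "21 hours 12 minutes", 11.0, TEST1_BASELINE),
    ("Archivarius 3000", "4 hours", 6.0, TEST1_BASELINE),  # "About 6 Gb"
    ("Isys desktop 7.0", "2 hours 53 minutes", 4.2, TEST1_BASELINE),
    ("DtSearch 7.0", "2 hours 57 minutes", 4.15, TEST1_BASELINE),
    ("Hummingbird", "4 hours 50 minutes", 3.5, TEST2_BASELINE),
    ("Google Desktop Search Enterprise", "3 hours 41 minutes", 1.9, TEST1_BASELINE),
    ("Copernic Desktop Search", "5 hours 11 minutes", 3.0, TEST1_BASELINE),
]


def parse_minutes(text: str) -> int:
    """Convert a string such as '2 hours 53 minutes' to a number of minutes."""
    hours = re.search(r"(\d+)\s*hour", text)
    minutes = re.search(r"(\d+)\s*minute", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def factors(duration: str, index_gb: float, baseline: tuple[str, float]) -> tuple[float, float]:
    """Return (time factor, index size factor) relative to the baseline."""
    base_duration, base_index_gb = baseline
    time_factor = parse_minutes(duration) / parse_minutes(base_duration)
    return time_factor, index_gb / base_index_gb


for system, duration, index_gb, baseline in RESULTS:
    time_factor, size_factor = factors(duration, index_gb, baseline)
    print(f"{system}: {time_factor:.2f} times slower, {size_factor:.2f} times larger index")
```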
4. Competitors Summary
As the summary table of indexing speed and resulting index size shows, SearchInform is a great deal more efficient than the existing full text search systems. None of the applications (in the case of document management systems, the search technology built into the system was considered) can boast of an adequate and prompt tool for searching text documents with similar content. In spite of their lag in indexing and searching speed, the free desktop applications Google Desktop Search and Copernic Desktop Search are, for a home user, quite competitive with SearchInform, at least in part. This is due to their appealing, user-friendly interface and simplicity of use. However, in the case of Copernic Desktop Search this statement fully applies to English-speaking users only, as this search utility proved to be not quite friendly with the Russian language.
But as regards the technology itself and the use of the search system in the corporate sector, where speed is of prime importance along with the quality of data indexing and of the results produced, SearchInform currently has no equal. Search for documents with content similar to the query text, together with very high speed of text indexing and searching, brings the SoftInform technology to the forefront of any tests, whether for speed or for relevance of the documents found.