SearchInform  

SPEED TESTS OF APPLIED SOFTWARE BASED ON THE SOFTINFORM SEARCH TECHNOLOGY

INTRODUCTION

To cover the main aspects of full-text search, we ran several tests on data of various types and sizes. The most widely used data formats are TXT, HTML, DOC, RTF and PDF.

From the point of view of search technology, tests on simple formats are the most representative. As a rule, data is stored in a DBMS or in data archives and is passed to the search system as plain text.

The test results report indexing time and index size. Note that a SearchInform index carries a fixed overhead of about 50 MB, so even for a small amount of text the index will be fairly large. With large data volumes the extra 50 MB makes little difference.

The tests were run on a mid-range computer. Test configuration: CPU AMD Athlon 2.2 GHz; RAM 2 GB DDR400; two 160 GB IDE hard drives (the data was stored on one drive and the index was created on the other).

DESCRIPTION OF VOLUMES TO BE INDEXED

To test indexing and search speed we selected several data volumes of various sizes containing documents in various formats. The volumes are listed from smallest to largest, and each larger volume includes the smaller one before it. For example, volume "21.85" includes volume "11.1", and so on.

Note: each volume is named after the size of its data in gigabytes.

Volumes "11.1", "21.85", "41.17" and "83.22" consist of English-language patents in HTML format. The documents are physically stored in ZIP archives, 5,000 to 10,000 files per archive.
In addition to the 83.22 GB of HTML patents, volume "132.26" contains the data from the DOC, RTF and PDF test volumes, as well as the texts of volume "10.7".
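
The tables below report a "pure text size" next to the raw size of the documents. As a rough illustration of how such a figure could be measured for HTML files stored in ZIP archives, here is a minimal Python sketch. The tag-stripping rule, the file names and the helper names (`TextExtractor`, `pure_text`, `measure_archive`) are assumptions made for this example; the source does not describe how SearchInform itself extracts text.

```python
import io
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (a deliberately simple notion of 'pure text')."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def pure_text(html):
    """Return the text content of an HTML string with all tags dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def measure_archive(archive):
    """Return (document count, raw bytes, pure-text bytes) for the HTML files in a ZIP."""
    docs = raw = text = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith((".html", ".htm")):
                continue
            data = zf.read(info)
            docs += 1
            raw += len(data)
            decoded = data.decode("utf-8", errors="replace")
            text += len(pure_text(decoded).encode("utf-8"))
    return docs, raw, text


# Build a tiny in-memory archive so the sketch runs without any real data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("patent_0001.html", "<html><body><h1>Title</h1><p>Claim text</p></body></html>")
    zf.writestr("patent_0002.html", "<html><body><p>Another claim</p></body></html>")
buffer.seek(0)

docs, raw_bytes, text_bytes = measure_archive(buffer)
print(docs, raw_bytes, text_bytes)
```

On real data the same function would be called once per archive and the three counters summed over all archives of a volume.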

INDEXING SPEED TESTS

Table 1


Test volume             "11.1"          "21.85"         "41.17"       "83.22"             "132.26"
Size of documents       11.1 GB         21.85 GB        41.17 GB      83.22 GB            132.26 GB
Documents total         319 695         619 018         1 118 513     1 993 149           2 888 202
Unique words            2 527 473       4 016 495       6 157 339     11 276 270          18 912 257
Pure text size          7.92 GB         15.5 GB         28.97 GB      59.42 GB            77.57 GB
Index size              1.76 GB         3.29 GB         6.03 GB       12.12 GB            16.29 GB
Indexing time           30 min 36 sec   59 min 30 sec   1 h 53 min    3 h 56 min 15 sec   6 h 06 min
Average speed, GB/hour  21.76           21.99           21.72         21.14               21.68
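
The last row of Table 1 is the indexing throughput in gigabytes per hour: the size of the documents divided by the indexing time. The short Python sketch below recomputes it from the sizes and durations in the table. The function name `gb_per_hour` and the dictionary layout are assumptions made for this example. The results agree with the published figures for volumes "11.1", "83.22" and "132.26"; for "21.85" and "41.17" they come out about 0.04 and 0.14 higher, which may reflect rounding in the published durations.

```python
# (size of documents in gigabytes, indexing time as hours, minutes, seconds),
# copied from Table 1
TABLE_1 = {
    "11.1": (11.1, (0, 30, 36)),
    "21.85": (21.85, (0, 59, 30)),
    "41.17": (41.17, (1, 53, 0)),
    "83.22": (83.22, (3, 56, 15)),
    "132.26": (132.26, (6, 6, 0)),
}


def gb_per_hour(size_gb, duration):
    """Throughput = data size divided by elapsed time expressed in hours."""
    hours, minutes, seconds = duration
    elapsed_hours = hours + minutes / 60 + seconds / 3600
    return size_gb / elapsed_hours


speeds = {name: gb_per_hour(size, t) for name, (size, t) in TABLE_1.items()}
for name, speed in speeds.items():
    print(f"{name:>7}: {speed:.2f} gigabytes per hour")
```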

Table 2


 

Test volume             "10.7"      DOC            RTF         PDF
Size of documents       10.7 GB     1.9 GB         325 MB      5.39 GB
Documents total         48 222      7 791          769         526
Unique words            4 408 347   439 354        220 262     942 295
Pure text size          9.88 GB     179 MB         33.27 MB    126 MB
Index size              2.06 GB     118 MB         86.91 MB    160 MB
Indexing time           32 min      1 min 34 sec   29 sec      12 min 05 sec
Average speed, GB/hour  20.06       72.7           39.4        26.8
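
The introduction notes that a SearchInform index carries roughly 50 megabytes regardless of data size. The Python sketch below illustrates that point with figures from Tables 1 and 2 by computing, for each sample, the ratio of index size to pure text size and the share of the index that 50 megabytes would represent. Several things here are assumptions made for this example: the helper names, the conversion of 1 gigabyte to 1024 megabytes, and the treatment of the RTF and PDF index sizes as megabytes.

```python
MB_PER_GB = 1024  # assumption: binary units

# (pure text size in MB, index size in MB), taken from Tables 1 and 2
SAMPLES = {
    "132.26": (77.57 * MB_PER_GB, 16.29 * MB_PER_GB),
    "10.7": (9.88 * MB_PER_GB, 2.06 * MB_PER_GB),
    "DOC": (179, 118),
    "RTF": (33.27, 86.91),  # index size assumed to be in megabytes
    "PDF": (126, 160),      # index size assumed to be in megabytes
}

FIXED_OVERHEAD_MB = 50  # approximate figure quoted in the introduction


def index_ratio(text_mb, index_mb):
    """Index size relative to the pure text it was built from."""
    return index_mb / text_mb


def overhead_share(index_mb, overhead_mb=FIXED_OVERHEAD_MB):
    """Fraction of the index that the fixed overhead would account for."""
    return overhead_mb / index_mb


ratios = {name: index_ratio(text, index) for name, (text, index) in SAMPLES.items()}
shares = {name: overhead_share(index) for name, (_, index) in SAMPLES.items()}

for name in SAMPLES:
    print(f"{name:>7}: index/text = {ratios[name]:.2f}, overhead share = {shares[name]:.1%}")
```

For the largest volume the index is about a fifth of the pure text and the 50 megabytes are negligible, while for the small RTF sample the index is larger than the text itself.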

The tests showed that SearchInform indexes about 3-4 times faster than comparable products. This document does not include our competitors' results; if you would like to see them, send a request to support@searchinform.com and our experts will provide the relevant information.