SPEED TESTS OF APPLIED SOFTWARE BASED ON SOFTINFORM SEARCH TECHNOLOGY
INTRODUCTION
To cover the main aspects of full-text search, several tests were run on data of various types and sizes. The most widely used data formats are TXT, HTML, DOC, RTF and PDF.
From the point of view of search technology, the most representative tests are those on simple formats: as a rule, data is stored in a DBMS or in data archives and is passed to the search system as pure text.
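The "pure text size" rows in the tables below give the size of the documents once markup has been removed. How SearchInform extracts text is not described in this document; the following is only a minimal sketch of the idea using Python's standard html.parser module. The names TextExtractor and html_to_text are illustrative assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML string with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


sample = "<body><p>A method &amp; apparatus</p></body>"
print(html_to_text(sample))  # prints: A method & apparatus
```

Comparing the byte length of the input with that of the returned text gives a rough per-document analogue of the "size of documents" and "pure text size" figures.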
The test results report indexing time and index size. Note that any index produced by SearchInform includes about 50 MB of fixed overhead, so for a small amount of text data the index will still be relatively large. With large data volumes the extra 50 MB makes little difference.
The tests were run on a computer of average capacity. Test computer configuration: CPU AMD Athlon 2.2 GHz; RAM 2 GB DDR400; HDD two 160 GB IDE hard drives (the data was stored on one drive and the index was created on the other).
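To put the 50 MB figure into perspective, the sketch below computes what share of an index it represents for the smallest and the largest index in the tables that follow. It rests on several assumptions: the 50 MB is treated as a constant part of every index, the RTF index size of 86.91 is taken to be in megabytes, and 1 GB is counted as 1024 MB.

```python
FIXED_OVERHEAD_MB = 50.0  # approximate figure quoted in the text
MB_PER_GB = 1024.0        # assumption: binary gigabytes


def overhead_share(index_size_mb: float) -> float:
    """Fraction of an index taken up by the fixed overhead."""
    return FIXED_OVERHEAD_MB / index_size_mb


small_index_mb = 86.91              # RTF volume, assumed to be in MB
large_index_mb = 16.29 * MB_PER_GB  # volume "132.26"

print(f"RTF volume:    {overhead_share(small_index_mb):.1%}")
print(f"Volume 132.26: {overhead_share(large_index_mb):.1%}")
```

Under these assumptions the overhead accounts for more than half of the RTF index but only about 0.3% of the largest one.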
DESCRIPTION OF VOLUMES TO BE INDEXED
To test indexing and search speed, we selected several data volumes of various sizes containing documents in various formats. The volumes are listed from smallest to largest, and each larger volume includes the smaller ones. For example, volume "21.85" includes volume "11.1", and so on.
Note: each volume is named after its data size in gigabytes.
Volumes "11.1", "21.85", "41.17" and "83.22" consist of English-language patents in HTML format. The documents are physically stored in ZIP archives, 5,000 to 10,000 files per archive.
In addition to the patents in HTML, which take up 83.22 GB, volume "132.26" contains the test volumes in DOC, RTF and PDF formats, as well as the "10.7" texts.
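A storage layout like this can be inspected without unpacking the archives to disk. The sketch below uses Python's standard zipfile module to count the HTML files in each archive and to read documents one at a time. It is not the tool used in these tests; the function names and the flat directory of *.zip files are assumptions made for illustration.

```python
import zipfile
from pathlib import Path

HTML_SUFFIXES = (".html", ".htm")


def scan_archives(root: Path):
    """Yield (archive name, HTML file count, uncompressed bytes) per ZIP in root."""
    for archive_path in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive_path) as archive:
            html_entries = [
                info for info in archive.infolist()
                if not info.is_dir()
                and info.filename.lower().endswith(HTML_SUFFIXES)
            ]
            total_bytes = sum(info.file_size for info in html_entries)
            yield archive_path.name, len(html_entries), total_bytes


def read_documents(archive_path: Path):
    """Yield (member name, raw bytes) for each HTML file in one archive."""
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith(HTML_SUFFIXES):
                yield info.filename, archive.read(info)
```

Summing the counts and byte totals returned by scan_archives over all archives would give figures comparable to the "documents total" and "size of documents" rows of the tables.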
INDEXING SPEED TESTS
Table 1
Test volume               "11.1"             "21.85"            "41.17"            "83.22"            "132.26"
Size of documents         11.1 GB            21.85 GB           41.17 GB           83.22 GB           132.26 GB
Documents total           319,695            619,018            1,118,513          1,993,149          2,888,202
Unique words              2,527,473          4,016,495          6,157,339          11,276,270         18,912,257
Pure text size            7.92 GB            15.5 GB            28.97 GB           59.42 GB           77.57 GB
Index size                1.76 GB            3.29 GB            6.03 GB            12.12 GB           16.29 GB
Indexing duration         30 min 36 sec      59 min 30 sec      1 h 53 min         3 h 56 min 15 sec  6 h 06 min
Average speed, GB/hour    21.76              21.99              21.72              21.14              21.68
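The average speed row is the size of the documents divided by the indexing duration expressed in hours. The sketch below recomputes it from the Table 1 sizes and durations; the function name gb_per_hour and the table_1 dictionary are illustrative.

```python
from datetime import timedelta


def gb_per_hour(size_gb: float, duration: timedelta) -> float:
    """Average indexing speed in gigabytes per hour."""
    return size_gb / (duration.total_seconds() / 3600.0)


# (size of documents in GB, indexing duration) per test volume in Table 1
table_1 = {
    "11.1": (11.1, timedelta(minutes=30, seconds=36)),
    "21.85": (21.85, timedelta(minutes=59, seconds=30)),
    "41.17": (41.17, timedelta(hours=1, minutes=53)),
    "83.22": (83.22, timedelta(hours=3, minutes=56, seconds=15)),
    "132.26": (132.26, timedelta(hours=6, minutes=6)),
}

for name, (size_gb, duration) in table_1.items():
    print(f"{name}: {gb_per_hour(size_gb, duration):.2f} GB/hour")
```

The recomputed values for volumes "11.1", "83.22" and "132.26" match the table (21.76, 21.14 and 21.68). Those for "21.85" and "41.17" come out slightly higher, at about 22.03 and 21.86, possibly because the published durations are rounded.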
Table 2
Test volume               "10.7"           DOC              RTF              PDF
Size of documents         10.7 GB          1.9 GB           325 MB           5.39 GB
Documents total           48,222           7,791            769              526
Unique words              4,408,347        439,354          220,262          942,295
Pure text size            9.88 GB          179 MB           33.27 MB         126 MB
Index size                2.06 GB          118 MB           86.91 MB         160 MB
Indexing duration         32 min           1 min 34 sec     29 sec           12 min 05 sec
Average speed, GB/hour    20.06            72.7             39.4             26.8
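The speed figures in Table 2 mix megabytes with gigabytes and involve durations of under a minute, so the units need to be normalised before dividing. The sketch below does this under two assumptions: the DOC and PDF durations are read as 1 min 34 sec and 12 min 5 sec, and 1 GB is counted as 1024 MB. The helper names are illustrative.

```python
MB_PER_GB = 1024  # assumption: binary units


def to_gb(value: float, unit: str) -> float:
    """Convert a size given in MB or GB to gigabytes."""
    if unit == "GB":
        return value
    if unit == "MB":
        return value / MB_PER_GB
    raise ValueError(f"unsupported unit: {unit}")


def gb_per_hour(size_gb: float, minutes: int, seconds: int = 0) -> float:
    """Average indexing speed in gigabytes per hour."""
    return size_gb * 3600 / (minutes * 60 + seconds)


# (size, unit, minutes, seconds) per test volume in Table 2
table_2 = {
    "10.7": (10.7, "GB", 32, 0),
    "DOC": (1.9, "GB", 1, 34),   # duration read as 1 min 34 sec (assumption)
    "RTF": (325, "MB", 0, 29),
    "PDF": (5.39, "GB", 12, 5),  # duration read as 12 min 5 sec (assumption)
}

for name, (size, unit, minutes, seconds) in table_2.items():
    speed = gb_per_hour(to_gb(size, unit), minutes, seconds)
    print(f"{name}: {speed:.2f} GB/hour")
```

With these readings the recomputed speeds agree with the published row to within rounding: about 20.06, 72.77, 39.40 and 26.76 GB per hour.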
According to our tests, SearchInform indexes about 3 to 4 times faster than comparable products. This document does not include our competitors' results; to obtain them, send a request to support@searchinform.com, and our experts will provide the relevant information.