1. SoftInform Search Technology
1.1 Information Search Problems
One of the major challenges companies face today is quickly finding documents in large volumes of data. How well data access is organized depends directly on the technologies and software used to process information quickly and efficiently. Many technologies perform phrase search today (Google, Hummingbird, Verity and others), but, unlike SoftInform technology, they do not fully solve the problem of information search.
For example, suppose a certain piece of information has to be found among tens of thousands of documents. With phrase search, even with ideally chosen keywords and phrases, it is next to impossible to get a quick and adequate result. To get a more or less acceptable result, you have to browse through numerous documents, try new keywords, and waste time reading irrelevant information. It would be much easier to find at least one text close to the topic in question and simply click a button to search for similar documents.
1.2 SoftInform Search Technology. Searching Similar Documents
SoftInform Search Technology is a technology for searching and processing information in text files on users' PCs and in local networks, in databases, and in information systems. It includes all the tools needed to structure scattered information within a company and is an efficient solution to any problem related to searching for and consolidating information.
The main advantage of SoftInform Search Technology, and what distinguishes it from existing technologies and search systems, is its function for finding documents whose content is similar to the query text, patented by SoftInform.
Similar-document search takes into account all the words used in the document, all their word forms, and a synonym dictionary. Once processing is complete, the result list shows the documents most similar to the query text, each with its relevance. A 100% match means that a duplicate document has been found; a document with a lower match level is similar to the query text to a correspondingly lower degree. The technology determines the relevance of each found document to the query with high accuracy, even if the query text has been changed (sections deleted or replaced).
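The source does not disclose how the patented similarity search works. As a rough illustration of the general idea only, the following Python sketch scores documents by cosine similarity of term-frequency vectors after folding word forms and synonyms onto common terms, so that an exact duplicate scores 100%. The `WORD_FORMS` and `SYNONYMS` tables and all function names are hypothetical stand-ins for real morphology and synonym dictionaries; this is a swapped-in textbook technique, not SoftInform's algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical normalization tables; a real system would use full
# morphological dictionaries and a synonym dictionary for each language.
WORD_FORMS = {"documents": "document", "searching": "search", "searches": "search"}
SYNONYMS = {"file": "document", "lookup": "search"}


def normalize(text: str) -> Counter:
    """Map every word to a canonical term (word form first, then synonym) and count it."""
    terms = Counter()
    for word in re.findall(r"\w+", text.lower()):
        word = WORD_FORMS.get(word, word)  # reduce an inflected form to its base form
        word = SYNONYMS.get(word, word)    # fold synonyms onto one shared term
        terms[word] += 1
    return terms


def similarity(query: str, document: str) -> float:
    """Cosine similarity of the two term-frequency vectors, as a percentage (0-100)."""
    q, d = normalize(query), normalize(document)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return round(100.0 * dot / norm, 2) if norm else 0.0


def rank_similar(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Return (name, relevance %) pairs, most similar first."""
    scores = [(name, similarity(query, text)) for name, text in documents.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, ranking `"searching documents quickly"` against an exact duplicate, a reworded text that says "file" instead of "documents", and an unrelated text puts the duplicate first at 100.0, the reworded text below it, and the unrelated text last at 0.0.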
1.3 The Potential of SoftInform Search Technology
At present, software based on SoftInform Search Technology offers the most comprehensive functionality and the highest speed among comparable technologies and solutions:
- processing speed of 30 GB per hour, even on office PCs with modest performance
- resulting index size of about 20-25% of the plain-text volume
- support for over 50 common file formats (including archives, PDF, MHT, CHM, MDB, etc.); the list of supported formats is extended with each new version of software based on SoftInform Search Technology
- indexing and searching of e-mail messages in MS Outlook, Outlook Express and TheBat!, as well as instant messenger logs (ICQ 99-2005 and MS Messenger)
- consolidation of company information from various sources (searching and processing information from various databases, information systems and so on) and differentiation of user access rights (based on NTFS access permissions)
- support for information security within the company
- language independence: all language-specific components can be added as plug-ins
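The throughput and index-size figures above translate into simple capacity estimates. The Python sketch below assumes that the 30 GB per hour figure is indexing throughput, that it stays constant as the data volume grows, and that the index takes 20-25% of the plain-text size; the function name `estimate_indexing` is hypothetical.

```python
# Figures taken from the list above; treating them as fixed constants is an assumption.
INDEXING_RATE_GB_PER_HOUR = 30.0
INDEX_SIZE_RATIO = (0.20, 0.25)  # index size as a fraction of plain-text size


def estimate_indexing(text_volume_gb: float) -> tuple[float, float, float]:
    """Return (hours to index, smallest index size in GB, largest index size in GB)."""
    hours = text_volume_gb / INDEXING_RATE_GB_PER_HOUR
    low, high = (text_volume_gb * ratio for ratio in INDEX_SIZE_RATIO)
    return hours, low, high


hours, low, high = estimate_indexing(120)
print(f"{hours:.1f} h to index, index size {low:.0f}-{high:.0f} GB")
# prints "4.0 h to index, index size 24-30 GB"
```

Under these assumptions, 120 GB of plain text takes about 4 hours to index and produces an index of roughly 24 to 30 GB.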
Thanks to its data-source concept, the technology core allows SoftInform Search Technology to be adapted, with minimal changes, to any database or information system. The data sources indexed by the program can be highly diverse and located in different places.
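The data-source concept can be illustrated with a minimal adapter pattern. In this hypothetical Python sketch, every backend (a file share, a SQLite table) only has to yield `Record` objects, and a single `Indexer` builds one inverted index over all of them; each record also carries the set of users allowed to read it, loosely mirroring the access-rights differentiation listed above. The class names, the `readers` column, and the comma-separated user list are assumptions made for illustration, not SoftInform's actual interfaces.

```python
import re
import sqlite3
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class Record:
    """One indexable piece of text plus the users allowed to read it."""
    source: str
    key: str
    text: str
    readers: frozenset


class DataSource(Protocol):
    """Hypothetical adapter interface: a backend only has to yield Records."""
    def records(self) -> Iterator[Record]: ...


class FileShareSource:
    """Hypothetical source over files already loaded as {path: (text, readers)}."""

    def __init__(self, files: dict):
        self.files = files

    def records(self) -> Iterator[Record]:
        for path, (text, readers) in self.files.items():
            yield Record("files", path, text, frozenset(readers))


class SQLiteSource:
    """Hypothetical source over a table with columns (id, body, readers)."""

    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn = conn
        self.table = table  # trusted configuration value, not user input

    def records(self) -> Iterator[Record]:
        query = f"SELECT id, body, readers FROM {self.table}"
        for row_id, body, readers in self.conn.execute(query):
            yield Record(self.table, str(row_id), body, frozenset(readers.split(",")))


class Indexer:
    """One inverted index (term -> records) built over any number of sources."""

    def __init__(self) -> None:
        self.postings: defaultdict = defaultdict(set)

    def add_source(self, source: DataSource) -> None:
        for record in source.records():
            for term in re.findall(r"\w+", record.text.lower()):
                self.postings[term].add(record)

    def search(self, term: str, user: str) -> list:
        """Return sorted (source, key) hits for `term` that `user` may read."""
        hits = self.postings.get(term.lower(), set())
        return sorted((r.source, r.key) for r in hits if user in r.readers)


# Usage: one file share and one database table indexed together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (id INTEGER, body TEXT, readers TEXT)")
conn.execute("INSERT INTO contracts VALUES (1, 'Steel supply contract', 'alice,bob')")
indexer = Indexer()
indexer.add_source(FileShareSource({"/share/memo.txt": ("Steel prices memo", {"alice"})}))
indexer.add_source(SQLiteSource(conn, "contracts"))
print(indexer.search("steel", "alice"))
print(indexer.search("steel", "bob"))
```

In this design, supporting a new kind of repository means writing one more class with a `records()` method; the indexing and search code stay unchanged.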
1.4 Technology Scalability
Scalability is currently supported in several directions, both to increase query processing speed and to enlarge the volume of indexed data. Tests showed that using 10 computers instead of one increases system response speed about sixfold.
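The reported result (about 6x with 10 computers) corresponds to a parallel efficiency of 6/10 = 60%. If one assumes, purely for illustration, that Amdahl's law describes the system, the measurement can be inverted to estimate the fraction of the work that parallelizes. The source does not say which scaling model applies, and the function names below are hypothetical.

```python
def parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / n), to solve for p."""
    # 1/S = 1 - p * (1 - 1/n)  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1 - 1 / speedup) / (1 - 1 / workers)


def amdahl_speedup(p: float, workers: int) -> float:
    """Predicted speedup on `workers` machines when a fraction p of the work is parallel."""
    return 1 / ((1 - p) + p / workers)


measured_speedup, machines = 6.0, 10
p = parallel_fraction(measured_speedup, machines)
print(f"efficiency {measured_speedup / machines:.0%}, parallel fraction {p:.3f}")
print(f"model ceiling {1 / (1 - p):.1f}x")
```

Under that assumption about 92.6% (exactly 25/27) of the work would be parallel, and the model's ceiling 1/(1 - p) would be 13.5x however many computers are added; a real deployment may not follow this model.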
1.5 Potential Customers for SearchInform Technologies
Search technologies are the core on which custom information-processing projects are built. They are intended to solve current problems of corporate clients that have proved too difficult for our competitors.
Virtually any company with more than 20 computers processing textual information (whether in a document management system, as analytical documents on disks, etc.) is a potential customer.
For a large company, introducing an information system based on SearchInform is even more important. It allows the company to migrate to a new unified system that consolidates information for search without replacing its existing systems. In other words, SearchInform is installed on top of the existing systems and manages the information-processing workflow.