SearchInform on the Internet
1. Introduction
2. Test Version Restrictions
3. Using SearchInform Search Engine
3.1. Creating an Internet Search Engine
3.2. Topical Search
3.3. User Archive
3.4. RSS Aggregator
3.5. News Processing
3.6. Indexing Blogs and Forums
1. Introduction
SoftInform Search Technology has a broad range of applications: desktop search, corporate search systems, and embedding in document management systems. Another promising direction is using the technology to build Internet solutions such as an Internet search engine, a topical search system, an RSS aggregator, and so on.
2. Test Version Restrictions
The SearchInform tool in its current form took no more than two days to develop. Its main purpose is simply to give a rough demonstration of the potential of SoftInform search technology.
This Internet search shell cannot be considered a full-fledged resource, because some features inherent in conventional search engines work only partially and others are not implemented at all:
- search results are not cached, so every search is executed anew (which, no doubt, slows down the work);
- no spider for discovering and indexing new resources has been implemented, so the information indexed by this search engine is static;
- query processing does not take advantage of the server's multiple processors (only one processor is busy);
- when searching for similar documents, the whole page is analyzed rather than a separate page fragment, which largely cancels out the benefit of this feature;
- automatic rubrication (automatic sorting of documents into rubrics) is not supported (a problem with IE).
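The first limitation in the list, recomputing every query from scratch, is what a query result cache addresses. The sketch below is a hypothetical illustration, not SearchInform code: `search_fn` stands in for the engine's real query routine, which the text does not describe, and the normalization rule and capacity are arbitrary assumptions. Results are kept in least-recently-used order and dropped wholesale after re-indexing.

```python
from collections import OrderedDict
from typing import Callable, List


class QueryResultCache:
    """LRU cache for search results, keyed by normalized query text.

    Hypothetical sketch: `search_fn` stands in for the engine's own
    query routine, which is not described in the original text.
    """

    def __init__(self, search_fn: Callable[[str], List[str]], capacity: int = 1000):
        self.search_fn = search_fn
        self.capacity = capacity
        self._entries: "OrderedDict[str, List[str]]" = OrderedDict()
        self.misses = 0  # number of queries that actually hit the engine

    @staticmethod
    def _normalize(query: str) -> str:
        # Treat "Foo  Bar" and "foo bar" as the same query.
        return " ".join(query.lower().split())

    def search(self, query: str) -> List[str]:
        key = self._normalize(query)
        if key in self._entries:
            # Cache hit: mark as most recently used and return stored results.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: run the (expensive) search and remember the result.
        self.misses += 1
        results = self.search_fn(key)
        self._entries[key] = results
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return results

    def invalidate(self) -> None:
        # Call after re-indexing, since cached results may be stale.
        self._entries.clear()
```

For example, `cache.search("Search Engine")` followed by `cache.search("search   engine")` reaches the engine only once. Clearing the whole cache on re-index is the simplest policy that avoids serving stale results. A production engine would more likely invalidate selectively.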
All of the above shortcomings (and many others not mentioned here) follow directly from the goals set for the alpha version. Demonstrating the potential of our technology and building a competitive project on the Internet are two entirely different objectives. In its current state the search engine is a test of our technology. However, with sufficient funding we may well be able to create a version of the search engine that meets all requirements and is ready for full-scale use.
3. Using SearchInform Search Engine
3.1. Creating an Internet Search Engine
With sufficient funding, completing (or developing from scratch) a turnkey Internet search engine is quite feasible. Indexing a large volume of information (HTML without images) would take 10 computers several days and nights (against a nominal throughput of 100 gigabytes per hour, the actual indexing speed is 60-80 gigabytes per hour). Further optimization includes developing a "spider" for indexing and refining the query algorithm so that it uses all of the server's processors. Thus, within a reasonably short period a beta version of the search engine can be completed and delivered to users. During operation, the indexed database will be supplemented every day. This is some food for thought for Yandex adherents.
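The throughput figures above can be turned into a rough capacity estimate. The sketch rests on two assumptions the text does not state: the 60-80 GB/h rate applies to each of the 10 machines independently, and indexing runs around the clock with linear scaling. The 50,000 GB corpus in the usage note is purely hypothetical.

```python
def daily_capacity_gb(machines: int, gb_per_hour: float, hours: float = 24.0) -> float:
    """Total gigabytes indexed per day, assuming linear scaling across machines."""
    return machines * gb_per_hour * hours


def days_to_index(corpus_gb: float, machines: int, gb_per_hour: float) -> float:
    """Days needed to index `corpus_gb` gigabytes at the given per-machine rate."""
    return corpus_gb / daily_capacity_gb(machines, gb_per_hour)


for rate in (60, 80):  # actual indexing rates quoted in the text, GB/h
    print(f"{rate} GB/h: {daily_capacity_gb(10, rate):,.0f} GB per day")
# 60 GB/h: 14,400 GB per day
# 80 GB/h: 19,200 GB per day
```

Under these assumptions, a hypothetical 50,000 GB of HTML at 70 GB/h per machine takes `days_to_index(50_000, 10, 70)`, or about 3.0 days. That figure is consistent with "several days and nights". If the quoted rate is instead the total for all 10 machines, divide the daily capacity by 10.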
3.2. Topical Search
The SoftInform Company is currently developing a topical search system for several authoritative online media outlets. The system includes tools for indexing, searching and classifying information already published on topic-related resources (for example, sites about computer games), as well as tools for processing new data as soon as it appears. The method is convenient because it relies on searching for documents similar to the query text. First, an initial set of resources known to the user is entered into the system for indexing. The topical search system then queries the Google search engine with keywords related to the topic and indexes the pages returned in the search results. The steadily growing list of indexed resources is re-indexed at a fixed interval. This lets our system react to changes in resource content much faster than Google, whose index is not narrowly focused and which picks up new resources rather slowly. As a result, a user of the topical search system always stays up to date with the latest changes on the sites in his or her database.
A search within a given topic returns only information related to the requested topic, automatically divided into smaller rubrics. Indexing only resources on a specific topic, real-time monitoring by a specially developed "spider", and the unique similar-document search give users the latest information on their topic of interest, organized into related sections.
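The indexing cycle described above (seed resources, expansion through keyword searches, periodic re-indexing) can be sketched as follows. This is a hypothetical illustration, not SoftInform's implementation. `web_search` stands in for a query to an external engine such as Google and `fetch` for downloading a page. Both are assumed callables, not real APIs. The index is a plain word-to-URL inverted index, and the similar-document search is not modelled.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


class TopicalIndexer:
    """Hypothetical sketch of the topical search cycle.

    `web_search` (keyword -> URLs) and `fetch` (URL -> page text) are
    assumed callables standing in for an external engine and a downloader.
    """

    def __init__(self, seeds: Iterable[str], keywords: List[str],
                 web_search: Callable[[str], List[str]],
                 fetch: Callable[[str], str], reindex_interval: float):
        self.resources: Set[str] = set(seeds)  # steadily growing resource list
        self.keywords = keywords
        self.web_search = web_search
        self.fetch = fetch
        self.reindex_interval = reindex_interval
        self.last_indexed: Dict[str, float] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)  # word -> URLs

    def _index_page(self, url: str) -> None:
        # Drop the page's old postings so removed words disappear on re-index.
        for postings in self.index.values():
            postings.discard(url)
        for word in self.fetch(url).lower().split():
            self.index[word].add(url)

    def run_cycle(self, now: float) -> List[str]:
        # 1. Expand the resource list with search results for topic keywords.
        for keyword in self.keywords:
            self.resources.update(self.web_search(keyword))
        # 2. (Re)index every resource that is new or older than the interval.
        refreshed = []
        for url in sorted(self.resources):
            last = self.last_indexed.get(url)
            if last is None or now - last >= self.reindex_interval:
                self._index_page(url)
                self.last_indexed[url] = now
                refreshed.append(url)
        return refreshed

    def query(self, word: str) -> Set[str]:
        return set(self.index.get(word.lower(), set()))
```

Passing `now` explicitly, instead of reading the clock, keeps the re-indexing schedule easy to test. A real deployment would call `run_cycle` from a timer.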
3.3. User Archive
This feature lets each user build his or her own index of information on the server for subsequent searching. Web pages are saved on the server and can later be accessed with a unique login and password. Uploading your own files from your computer or from the Internet is also supported. After logging in, the user gets access to his or her personal index.
This feature can be extended in several ways. First, users could work with selections and lists of Web pages, including adding and deleting list elements. Second, documents arriving in the archive could be rubricated automatically. Third, a number of nice-to-haves could be added, such as tracking whether a Web page has been viewed or sending information to a friend.
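A per-user archive reduces to keeping a separate document store for every login and checking credentials before each access. The sketch below is a hypothetical, in-memory illustration. Passwords are salted and hashed with the standard library's `hashlib.pbkdf2_hmac`. Session handling, persistence, and downloading pages are deliberately omitted. The class and method names are invented for illustration.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Tuple


class UserArchive:
    """Hypothetical sketch: each user's index is reachable only with valid credentials."""

    def __init__(self) -> None:
        self._credentials: Dict[str, Tuple[bytes, bytes]] = {}  # login -> (salt, hash)
        self._docs: Dict[str, Dict[str, str]] = {}              # login -> {doc_id: text}

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, login: str, password: str) -> None:
        if login in self._credentials:
            raise ValueError("login already taken")
        salt = os.urandom(16)
        self._credentials[login] = (salt, self._hash(password, salt))
        self._docs[login] = {}

    def _check(self, login: str, password: str) -> None:
        salt, stored = self._credentials.get(login, (b"", b""))
        # Constant-time comparison of the stored and recomputed hashes.
        if not stored or not hmac.compare_digest(stored, self._hash(password, salt)):
            raise PermissionError("invalid login or password")

    def save(self, login: str, password: str, doc_id: str, text: str) -> None:
        # A saved Web page or uploaded file goes into this user's index only.
        self._check(login, password)
        self._docs[login][doc_id] = text

    def search(self, login: str, password: str, word: str) -> List[str]:
        self._check(login, password)
        word = word.lower()
        return sorted(doc_id for doc_id, text in self._docs[login].items()
                      if word in text.lower().split())
```

Keeping documents keyed by login means one user's query can never touch another user's pages. In this sketch, that isolation is what "personal index" amounts to.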
3.4. RSS Aggregator
This is a service that downloads RSS news from various resources, indexes it, and presents users with a convenient global RSS search engine. Another strength of this prospective development is the use of similar-content search and automatic rubrication. These features will let users build selections of RSS news by topic, both across the whole service and for each individual user.
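The download-and-index part of such an aggregator can be sketched with the standard library's `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions: feeds are RSS 2.0 documents already fetched as strings, items are de-duplicated by their `<link>`, and the index is a plain word-to-link mapping. Similar-content search and rubrication are not modelled here. `RssAggregator` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set


def parse_rss(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, link and description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        }
        for item in root.iter("item")
    ]


class RssAggregator:
    """Hypothetical sketch: collects items from many feeds into one index."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, str]] = {}          # link -> item
        self.index: Dict[str, Set[str]] = defaultdict(set)  # word -> links

    def add_feed(self, xml_text: str) -> int:
        """Index new items from one feed and return how many were new."""
        added = 0
        for item in parse_rss(xml_text):
            if not item["link"] or item["link"] in self.items:
                continue  # skip items without a link and already-seen items
            self.items[item["link"]] = item
            text = item["title"] + " " + item["description"]
            for word in text.lower().split():
                self.index[word].add(item["link"])
            added += 1
        return added

    def search(self, word: str) -> List[str]:
        return sorted(self.index.get(word.lower(), set()))
```

Returning the number of new items from `add_feed` makes it cheap to poll the same feed repeatedly, because unchanged feeds add nothing to the index.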
3.5. News Processing
Nowadays more and more people pay attention to processing information, including news. However, the vast torrent of news contains only a few items that might interest a given user. Reading, say, 500 news briefs a day instead of the 5-10 most interesting ones is a waste of time, and faced with such a volume, a user may simply ignore them all.
Automatic rubrication is built on the similar-document search technology. The user only has to assign the first thousand or so documents to rubrics manually, marking the ones he or she finds interesting. After that, the system processes new information, automatically assigns it to specific rubrics, and decides which items deserve to be displayed first. Documents assigned by the system are marked in a special way, and to accept the system's decision the user just clicks "confirm". The system is thus self-learning and saves the user a great deal of time.
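The rubrication loop described above (manual labelling, automatic assignment, user confirmation, learning from confirmed decisions) can be sketched as nearest-neighbour classification. The similarity measure is an assumption. The sketch substitutes cosine similarity over raw word counts, a standard technique, for SoftInform's own similar-document search, whose internals the text does not describe. The 0.2 threshold is arbitrary, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Rubricator:
    """Nearest-neighbour rubric assignment with user confirmation.

    Cosine similarity over word counts stands in for SoftInform's own
    similar-document search; the threshold is an arbitrary assumption.
    """

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold
        self.labeled: List[Tuple[Counter, str]] = []     # (vector, rubric)
        self.pending: Dict[str, Tuple[str, float]] = {}  # doc_id -> (rubric, score)

    def label(self, text: str, rubric: str) -> None:
        # Manual assignment by the user (the initial training set).
        self.labeled.append((vectorize(text), rubric))

    def classify(self, doc_id: str, text: str) -> Optional[str]:
        vec = vectorize(text)
        best_rubric, best_score = None, 0.0
        for other, rubric in self.labeled:
            score = cosine(vec, other)
            if score > best_score:
                best_rubric, best_score = rubric, score
        if best_rubric is None or best_score < self.threshold:
            return None  # not similar enough to anything the user labelled
        # Mark the automatic decision so the user can confirm it later.
        self.pending[doc_id] = (best_rubric, best_score)
        return best_rubric

    def confirm(self, doc_id: str, text: str) -> None:
        # A confirmed decision becomes a new training example (self-learning).
        rubric, _ = self.pending.pop(doc_id)
        self.label(text, rubric)

    def ranked_pending(self) -> List[str]:
        # Documents most similar to the user's examples are shown first.
        return sorted(self.pending, key=lambda d: self.pending[d][1], reverse=True)
```

Each confirmation adds an example, so later decisions draw on a larger training set. This mirrors the self-learning behaviour described above, although the real system's learning mechanism may differ.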
3.6. Indexing Blogs and Forums
A number of large companies pay increasing attention to tracking the opinions of potential clients, and blogs and forums are now among the most reliable sources of such opinions. However, Google cannot always be used to track all potentially relevant information. It returns only the first 1000 results for a keyword, so there is a good chance that a given forum or blog will never reach you.