Internet Search Engines
General considerations and choice of search engine
Internet search engines are constantly changing, with new ones appearing and new features being added to existing ones. This section does not attempt to describe all currently available search engines, but provides a comparison of some which are deemed to be among the most useful. The selection is totally subjective!
The choice of search engine will be dictated by a number of factors. There are differences in the number of web sites covered, the depth to which they are covered (i.e. whether all pages, some pages or just the home page are indexed), and the frequency with which sites are re-indexed. For example, Search Engine Watch compares a number of different search engines such as Yahoo!, Google, HotBot, Northern Light, Excite, Lycos and WebCrawler.
The overall speed with which a search engine can be accessed and the speed with which it processes a query may also be important in determining choice. For example, Lycos Pro offers some very sophisticated features, but frequently downloads so slowly as to be almost unusable. Speed is one reason why some of the smaller and probably less accessed regional search engines may also be useful.
The available search features vary in sophistication and are compared below. The order in which the search results are displayed will depend on the algorithms used by the search engines to process the results and assess relevancy.
These change frequently, so the same search may give different results from week to week. The criteria used by the search engines to determine relevance may not necessarily relate to the relevance of the site content. Various tricks of the trade are used by professional web site optimisers in order to provide their clients with high placings in searches carried out on keywords chosen by the client. Looking at some of the lower rankings may sometimes pull up highly relevant pages. Another strategy is to carry out a search on a meta-searcher. These are not search engines as such, but they process queries for simultaneous submission to a number of different search engines, so that pages are identified on the basis of a number of sets of relevancy criteria. Two meta-searchers are described below.
It should always be remembered that many relevant pages on the Web may not be identified as such by a search engine. One of the best ways of finding these pages is by following links supplied by other relevant pages which have been retrieved. A number of meta-sites consist only of links to pages concerned with one or more related topics. When following a large number of links, the best strategy is to right click on each link in turn and then select either to bookmark it for future evaluation (it is quite useful to put these into a temporary "evaluation" folder), or open it in a new browser window. Once the link has been evaluated (and bookmarked if useful), the window can be closed, leaving visible the first window with the original list of links.
Comparison of search features
Detailed comparisons and much other information about search engines can be found on Search Engine Watch.
In many cases, both a simple and an advanced search mode are available. Boolean operators are often permitted only in the advanced mode. A description of how these and other operators are used can be found in Search Basics.
Refer to the Help facility of individual search engines for further information on less commonly required search features.
Easy-to-use search engines: Google and Yahoo!
The most commonly used search engine is Google. In the simple mode, it allows strings of words to be searched without any knowledge of Boolean or other descriptors, although complex descriptors and searches restricted by file type are also supported. It has a spell-checking facility and suggests alternative spellings.
In the advanced mode, Google allows searching in 41 different languages, and the menu-driven option settings allow searches for results with all or any of the search terms in a string of words, or for entire phrases. Google allows searches in one of six different file formats as well as in any file format, and facilitates searching within the URL, title, body of the page or links from the page. Searches can be restricted to the last 3, 6 or 12 months or to specific topics. Google Scholar is a way to search for scientific articles and researchers.
Yahoo! does not have the same spell checking and suggestion facilities as Google. In the simple search mode, strings of words can be searched.
The advanced mode dispenses with the need for Boolean or other types of descriptors, since it allows the user to enter the search terms, decide whether all or any of the words should appear, and exclude results containing other terms. It also allows the user to restrict searches according to whether the search term appears in the title, the URL or anywhere in the page.
Yahoo! allows the user to search for pages in 37 languages, to search for results in a particular file format, and to search within a particular website or domain.
AltaVista
AltaVista operates in 36 languages. It now forms part of the Yahoo! group of companies and, thus, uses the Yahoo! search database.
The search engine is case-sensitive for upper case but not for lower case: a term entered with capital letters matches only that capitalisation, whereas a term entered in lower case matches any capitalisation. If no operators are used, the default is for AltaVista to search for any rather than all of the input search terms (i.e. it assumes the Boolean operator OR between each term).
In simple search mode, the permitted operators are " ", +, - as well as * for truncation. The advanced mode also supports the Boolean operators AND, OR, AND NOT, NEAR (search terms separated by not more than 10 words), as well as entry of date ranges and the use of parentheses. However, it does not permit the use of + and -, although " " and * may be used. In either mode, it is possible to limit a search to a domain (e.g. .co.uk), URL (i.e. a web address), page title, etc., by putting in the corresponding operator, e.g. domain: url: title: for the examples given (refer to AltaVista Help for the full list), and also to search for a specified media type, e.g. images. It is also possible to limit the search by the language of the retrieved pages. AltaVista also offers a "refine search" capability. Selecting this will display a number of terms found in the retrieved document set. These can be used as filters to include or exclude certain concepts in a subsequent search carried out only on the retrieved set.
The advanced search screen contains two fields. The lower field takes a Boolean expression which will be used for the search itself, while the top field may, but need not, be used to input keywords which determine the order by which the search will be ranked. For example, if a Boolean search is conducted on transgenic AND (welfare OR "adverse effect*" OR pathol*), entering mice, rats, rodents into the upper field will ensure that documents containing the word "mice" will be ranked ahead of those containing the word "rats", but not "mice", while documents containing the word "rodents" and neither of the other two words will be given the lowest rankings.
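The effect of the ranking field can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not AltaVista's actual ranking algorithm, which is not documented here; the function name rank_by_keywords and the sample documents are invented for the example.

```python
import re

def rank_by_keywords(documents, ranking_terms):
    """Order documents so that those containing an earlier ranking term come first.

    Hypothetical helper: documents matching none of the terms keep their
    relative order at the end, because Python's sort is stable.
    """
    def priority(text):
        words = set(re.findall(r"[a-z]+", text.lower()))
        for position, term in enumerate(ranking_terms):
            if term in words:
                return position          # first matching term decides the rank
        return len(ranking_terms)        # no ranking term present: lowest rank
    return sorted(documents, key=priority)

# Invented documents, all assumed to satisfy the Boolean expression already.
docs = [
    "transgenic rodents and welfare",
    "welfare of transgenic rats",
    "pathology in transgenic mice and rats",
]
ranked = rank_by_keywords(docs, ["mice", "rats", "rodents"])
for doc in ranked:
    print(doc)
```

The document mentioning "mice" is listed first even though it also mentions "rats", and the one mentioning only "rodents" is listed last.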
In simple search mode, AltaVista will in certain cases classify the retrieved documents into topics from which a selection can be made. For example, a search on diabetes resulted in AltaVista asking whether the purpose of the search was to find dietary recommendations, information on therapies, support groups, etc. This feature is probably limited to popular topics. Another feature in simple mode is the introduction of the AskJeeves natural language search facility, i.e. it is possible to input the sort of question that would be asked of a human being rather than just keywords or a search expression. The results obtained with this feature are sometimes not as might be expected!
Try a search on AltaVista
Euroseek
If no operators are used, the default is to assume the Boolean operator AND between each search term, i.e. the search is for pages which contain all rather than any of the search terms. The permitted operators are AND, OR, and "", and parentheses may also be used. The search can be limited to a specific region and/or language. Euroseek operates in 40 different languages.
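The practical difference between a default of AND and a default of OR can be seen in a small Python sketch. It is a simplified model of whole-word matching only, written for illustration; the function name matches and the sample pages are assumptions and do not reflect how any particular search engine is implemented.

```python
def matches(text, terms, default_operator="AND"):
    """Return True if the text satisfies the terms under the default operator."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    # AND requires every term to be present, OR requires at least one.
    return all(hits) if default_operator == "AND" else any(hits)

# Invented page texts used only to show the two behaviours.
pages = ["transgenic mice welfare", "mice housing", "plant genetics"]
query = ["transgenic", "mice"]

and_results = [p for p in pages if matches(p, query, "AND")]
or_results = [p for p in pages if matches(p, query, "OR")]
print(and_results)  # ['transgenic mice welfare']
print(or_results)   # ['transgenic mice welfare', 'mice housing']
```

A default of AND gives a smaller, more focused result set, while a default of OR retrieves every page containing at least one of the terms.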
Try a search on Euroseek
Excite
In the normal search mode, Excite accepts the following operators: "", +, -, AND, OR, AND NOT and also permits the use of parentheses. The "power search" does not permit any of these operators, but instead offers some of the same facilities through a menu system. This can be used to specify that the retrieved documents "must", "can" or "must not" contain a specified search word or phrase.
If one of the retrieved documents is of particular relevance, it is possible to request "more like this". The index terms assigned by Excite to this document are then used as the basis for a further search.
Excite uses Intelligent Concept Extraction to find relationships between words and concepts. Retrieved documents will contain not only the input search terms, but also words which are conceptually related to these terms. The relationships are built up during Excite’s indexing procedures.
Try a search on Excite
HotBot
HotBot provides menu options to specify that a word or phrase "must", "should" or "must not" appear in the retrieved documents. The default is to search for documents containing all the search terms, but the menu allows this to be changed to "match any". In addition, it is possible to use the operators +, - and "", and also * for truncation although this last operator is not mentioned in the HotBot help files.
The search can be limited by date and by language. It is possible to limit the search to pages containing specific items such as file extensions, e.g. Acrobat (.pdf) files. Stemming can be turned on or off.
The stemming will include not only the plural form of an input search term, but other grammatical forms related to the term. It is also possible to select "Boolean Search" from the search options, which will permit the use of AND, OR, NOT and nested parentheses. A useful feature is the ability to conduct a further search on the retrieved set of documents.
HotBot also provides a link to a facility to search newsgroup archives.
Try a search on HotBot
Northern Light
Northern Light differs from other search engines because it searches not only through the Web, but also through its own "special collections" of documents. These include journal articles and texts from other sources. Special collection documents are listed together with retrieved web pages. However, the full text of these documents can only be accessed on payment of a fee.
Northern Light supports the operators AND, OR, NOT with nested parentheses, as well as +, - and "". Wildcards can be used not only for truncation, but also within a term. A single character is represented by % and multiple characters by *. A wildcard must have at least 4 characters in front of it. Stemming is automatic, but only for the singular and plural forms of common words.
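The wildcard rules can be modelled by translating a search term into a regular expression. The Python sketch below is an approximation under stated assumptions: the function name wildcard_to_regex is invented, and * is taken to mean "zero or more characters", which the description above does not specify.

```python
import re

def wildcard_to_regex(term):
    """Translate a term using % (one character) and * (several characters).

    Assumption: * may also match zero characters. The rule that a wildcard
    needs at least 4 characters in front of it is enforced explicitly.
    """
    first = min((i for i, ch in enumerate(term) if ch in "%*"), default=None)
    if first is not None and first < 4:
        raise ValueError("a wildcard needs at least 4 characters in front of it")
    pattern = "".join(
        r"\w" if ch == "%" else r"\w*" if ch == "*" else re.escape(ch)
        for ch in term
    )
    return re.compile(pattern, re.IGNORECASE)

truncated = wildcard_to_regex("patho*")        # truncation at the end of a term
internal = wildcard_to_regex("organi%ation")   # single-character wildcard inside a term

print(bool(truncated.fullmatch("pathology")))     # True
print(bool(internal.fullmatch("organisation")))   # True
print(bool(internal.fullmatch("organization")))   # True
```

A term such as pat* would be rejected, because only three characters precede the wildcard.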
The Power Search mode allows a search to be limited by URL, title, date, domain, language or country. Search results can be viewed from within a number of folders which separate the retrieved documents according to concept or source. For example, web pages from commercial companies might be segregated into a "commercial" folder, those from academic establishments into "education", etc. The categorisation of web page source is performed automatically according to domain name. Unfortunately, this means that some misclassification must inevitably occur.
Try a search on Northern Light
Meta-searchers
Meta-searching is the ability to perform a search on more than one search engine simultaneously.
The two meta-searchers described below both support a number of operators. The search profile is optimised for each search engine. Operators which are not supported by a search engine are stripped from the profile submitted to that engine. Consequently, the quality of the results may differ significantly both within and between searches.
Dogpile
Dogpile submits searches to a number of leading search engines including Google, Yahoo!, Live Search and Ask.com. In addition it sends searches to newsgroup archives at AltaVista News, Reference.com, Dejanews and HotBot News. It also searches on a number of current newswire services and on FTP Search (for downloadable software).
It is possible to exclude any number of these search engines from the search and also to change the order in which they are searched.
If no operators are used, the default assumption is AND between all the search terms. The operators supported by Dogpile (but not necessarily by all the search engines) are AND, OR, NEAR, NOT. NEAR is substituted by AND in cases where it is not supported, while NOT is submitted either as NOT or as AND NOT, depending on the requirements of the search engines. Parentheses may also be used, as well as double quotation marks to indicate phrases.
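The substitution rules for NEAR and NOT can be sketched as follows. This Python example only illustrates the behaviour described above and is not Dogpile's actual code; the function name adapt_query and its parameters are hypothetical, and the query is assumed to be already split into tokens.

```python
def adapt_query(tokens, supports_near, not_form):
    """Rewrite a tokenised query for one target search engine.

    supports_near: whether the engine understands NEAR.
    not_form: how the engine expects negation, "NOT" or "AND NOT".
    """
    adapted = []
    for token in tokens:
        if token == "NEAR" and not supports_near:
            adapted.append("AND")       # NEAR falls back to AND
        elif token == "NOT":
            adapted.append(not_form)    # NOT is sent in the engine's own form
        else:
            adapted.append(token)
    return " ".join(adapted)

query = ["welfare", "NEAR", "mice", "NOT", "rats"]
print(adapt_query(query, supports_near=True, not_form="NOT"))
# welfare NEAR mice NOT rats
print(adapt_query(query, supports_near=False, not_form="AND NOT"))
# welfare AND mice AND NOT rats
```

Because NEAR is weakened to AND for some engines, those engines may return pages in which the two terms are far apart, which is one reason the results can differ between engines.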
Dogpile searches three search engines at a time and displays the results in the order in which they are received. If fewer than 10 documents are retrieved, the search is automatically moved to the next three search engines on the list until 10 hits are obtained. If more than 10 hits are obtained, the next set of search engines can be selected. If more than 10 hits are obtained from any search engine, a link is provided to the search engine page so that the remaining results can be viewed if desired. The exact form of the search profile submitted to each search engine is displayed next to the results from that engine. If the same document is found by more than one search engine, it will be displayed in each group of results.
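The batching behaviour can be sketched in Python. The example below is a simplified, sequential model written for illustration: a real meta-searcher would query the engines concurrently over the network, and the names batched_search and fake_engine, together with the stand-in engines, are invented for the example.

```python
def batched_search(engines, query, batch_size=3, wanted=10):
    """Query engines in batches until at least `wanted` hits are collected.

    engines: list of (name, search_function) pairs, in search order.
    Duplicates are kept, grouped by the engine that returned them.
    """
    results = []
    for start in range(0, len(engines), batch_size):
        for name, search in engines[start:start + batch_size]:
            results.extend((name, hit) for hit in search(query))
        if len(results) >= wanted:
            break                       # enough hits: later engines are not queried
    return results

def fake_engine(count):
    """Stand-in for a real engine: returns `count` made-up hits."""
    return lambda query: [f"{query}-{i}" for i in range(count)]

engines = [("A", fake_engine(2)), ("B", fake_engine(3)), ("C", fake_engine(1)),
           ("D", fake_engine(4)), ("E", fake_engine(0)), ("F", fake_engine(2)),
           ("G", fake_engine(5))]
hits = batched_search(engines, "mice")
print(len(hits))                               # 12
print(sorted({name for name, _ in hits}))      # ['A', 'B', 'C', 'D', 'F']
```

The first batch yields only 6 hits, so the second batch is searched as well; the seventh engine is never queried because the threshold of 10 has already been reached.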
The same search can also be transferred to Metafind, which performs a further search on search engines, and also to the Electric Library. The latter is a paid subscription service offering access to the full text of journal, magazine and newspaper articles and transcripts of radio and television broadcasts.
Try a search on Dogpile
Cyber411
Cyber411 performs a simultaneous search on AltaVista, GoogleGroups/Dejanews, Excite, Galaxy, Goto, HotBot, LookSmart, Lycos, Magellan, PlanetSearch, Search.com, Snap, Thunderstone, Webcrawler, What-U-Seek and Yahoo!. Any of these search engines can be excluded. Duplicate retrievals are removed before the results are displayed.
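Removal of duplicate retrievals can be sketched as a merge that remembers which pages have already been seen. This Python example is an assumption about how such a step might work, not a description of Cyber411's implementation; the function name, the engine names used as keys and the example.org addresses are invented, and two results are treated as duplicates only if their addresses are identical.

```python
def merge_without_duplicates(results_by_engine):
    """Merge per-engine result lists, keeping the first occurrence of each address."""
    seen = set()
    merged = []
    for engine, urls in results_by_engine.items():
        for url in urls:
            if url not in seen:         # exact string comparison only
                seen.add(url)
                merged.append(url)
    return merged

# Invented results keyed by hypothetical engine names.
results = {
    "engine_one": ["http://example.org/a", "http://example.org/b"],
    "engine_two": ["http://example.org/b", "http://example.org/c"],
}
merged = merge_without_duplicates(results)
print(merged)
# ['http://example.org/a', 'http://example.org/b', 'http://example.org/c']
```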
The operators supported by Cyber411 (but not necessarily by all the search engines) are AND, OR, NEAR, NOT. NEAR is substituted by AND in cases where it is not supported, while NOT is submitted either as NOT or as AND NOT, depending on the requirements of the search engines. Parentheses may also be used, as well as double quotation marks to indicate phrases.
A very useful feature of Cyber411 is the ability to place a time limit on the duration of the search.










