General considerations and choice of search engine
Internet search engines are constantly changing, with new ones appearing and new features being added to existing ones. This section does not attempt to describe all currently available search engines, but offers advice on how to get the best from them.
Choice of search engine will be dictated by a number of factors such as:
- number of web sites covered
- depth to which they are covered
- whether all pages, some pages or just the home page of each site are indexed
- frequency with which sites are re-indexed
- overall speed with which a search engine can be accessed
- sophistication of available search features
Search Engine Watch compares a number of different search engines and provides tips and information about searching the web.
The order in which search results are displayed will depend on the algorithms used to process the results and assess relevancy. These change frequently, so the same search may give different results from week to week. The criteria used by the search engines to determine relevance may not necessarily relate to the relevance of the site content.
Professional web site optimisers use various tricks to give their clients high placings in searches on keywords chosen by the client. Highly relevant pages may therefore appear lower down the rankings, so it is worth looking beyond the first few results. Another strategy is to carry out the search on a meta-searcher (see below).
Many relevant pages on the Web may not be identified as such by a search engine. One way to find these pages is by following links from other relevant pages.
When following a large number of links, the best strategy is to right-click on each link in turn and then choose either to bookmark it for future evaluation (it is useful to put these bookmarks into a temporary "evaluation" folder) or to open it in a new browser window. Once the link has been evaluated (and bookmarked if useful), the window can be closed, leaving visible the first window with the original list of links.
Comparison of search features
Detailed comparisons and much other information about search engines can be found on Search Engine Watch.
In many cases, both a simple and an advanced search mode are available. Boolean operators are often permitted only in the advanced mode. A description of how these and other operators are used can be found in Search Basics.
Refer to the Help facility of individual search engines for further information on less commonly required search features.
Easy-to-use search engines: Google and Yahoo!
The most commonly used search engine is Google. In the simple mode it allows strings of words to be searched without any knowledge of Boolean or other operators. It has a spell-checking facility and suggests alternative spellings.
In the advanced mode, Google allows searching in 41 different languages. The menu-driven option settings also allow searches based on region, last update, domain name, where terms appear on the page, reading level and file type. Google Scholar is a way to search for scientific articles and researchers.
Yahoo! does not have the same spell checking and suggestion facilities as Google. In the simple search mode, strings of words can be searched.
The advanced mode (available from the Options pull-down menu, top right) dispenses with the need for Boolean or other operators: the user enters the search terms, decides whether all or any of the words should appear, and lists terms whose presence should exclude a result. It also allows the user to restrict searches according to whether the search term appears in the title, in the URL or anywhere in the page.
Yahoo! allows the user to search for pages in 37 languages, to restrict results to a particular file format and to search within a particular website or domain.
AltaVista operates in 36 languages. It now forms part of the Yahoo! group of companies and, thus, uses the Yahoo! search database.
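The menu-driven approach can be pictured as three word lists applied to each page. The following is only a minimal sketch of that idea, not Yahoo!'s implementation; the function name `menu_match` and its parameters are hypothetical, and pages are reduced to plain strings.

```python
def menu_match(text, all_words=(), any_words=(), none_words=()):
    """Return True if text satisfies the three menu-style constraints."""
    words = set(text.lower().split())
    # Every word in all_words must be present.
    has_all = all(w.lower() in words for w in all_words)
    # At least one word in any_words must be present (if the list is used).
    has_any = not any_words or any(w.lower() in words for w in any_words)
    # No word in none_words may be present.
    has_none = not any(w.lower() in words for w in none_words)
    return has_all and has_any and has_none


pages = [
    "jaguar speed in the wild",
    "jaguar car dealer prices",
    "big cat habitat and speed",
]
hits = [p for p in pages if menu_match(p, all_words=["jaguar"], none_words=["car"])]
print(hits)
```

Here the second page is rejected because it contains the excluded word, and the third because it lacks the required one.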
The search engine is case-sensitive for terms entered in upper case but not for terms entered in lower case: a lower-case term matches any capitalisation, whereas a term containing capitals matches only that exact form. If no operators are used, the default is for AltaVista to search for any rather than all of the input search terms (i.e. it assumes the Boolean operator OR between each term).
In simple search mode, the permitted operators are " " (phrase), + (term must appear), - (term must not appear) and * (truncation). The advanced mode (available from the More pull-down menu) links to the Yahoo! advanced search page and therefore operates in the same way.
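As an illustration of how the simple-mode operators combine, the sketch below evaluates a query containing " ", +, - and * against a piece of text, with OR assumed between unsigned terms. It is an assumption-laden toy, not AltaVista's algorithm: the names `parse`, `term_in` and `matches` are hypothetical, and matching is case-insensitive throughout for simplicity.

```python
import re

# An optional sign, followed by either a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse(query):
    """Split a query into (sign, term) pairs; sign is '+', '-' or ''."""
    return [(sign, (phrase or word).lower())
            for sign, phrase, word in TOKEN.findall(query)]


def term_in(term, text):
    """Match a word, a phrase, or a prefix ending in * against text."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text.lower()) is not None


def matches(query, text):
    terms = parse(query)
    required = [t for s, t in terms if s == "+"]
    excluded = [t for s, t in terms if s == "-"]
    optional = [t for s, t in terms if s == ""]
    if any(term_in(t, text) for t in excluded):
        return False
    if not all(term_in(t, text) for t in required):
        return False
    # With no + terms, at least one plain term must match (implicit OR).
    return bool(required) or any(term_in(t, text) for t in optional)


print(matches('+"search engine" -yahoo index*', "A search engine indexing the web"))
```

In this toy model, unsigned terms only decide whether a page qualifies when no + terms are present; a real engine would also use them for ranking.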
In the normal search mode, Excite accepts the following operators: "", +, -, AND, OR, AND NOT and also permits the use of parentheses. The "power search" does not permit any of these operators, but instead offers some of the same facilities through a menu system. This can be used to specify that the retrieved documents "must", "can" or "must not" contain a specified search word or phrase.
If one of the retrieved documents is of particular relevance, it is possible to request "more like this". The index terms assigned by Excite to this document are then used as the basis for a further search.
Excite uses Intelligent Concept Extraction to find relationships between words and concepts. Retrieved documents will contain not only the input search terms, but also words which are conceptually related to these terms. The relationships are built up during Excite’s indexing procedures.
Meta-searching is the ability to perform a search on more than one search engine simultaneously.
The two meta-searchers described below both support a number of operators. The search profile is optimised for each search engine. Operators which are not supported by a search engine are stripped from the profile submitted to that engine. Consequently, the quality of the results may differ significantly both within and between searches.
Dogpile submits searches to a number of leading search engines including Google, Yahoo!, Live Search and Ask.com. In addition it sends searches to various newsgroup archives, a number of current newswire services and FTP Search (for downloadable software).
It is possible to exclude any number of these search engines from the search and also to change the order in which they are searched.
If no operators are used, the default assumption is AND between all the search terms. The operators supported by Dogpile (but not necessarily by all the search engines) are AND, OR, NEAR, NOT. NEAR is substituted by AND in cases where it is not supported, while NOT is submitted either as NOT or as AND NOT, depending on the requirements of the search engines. Parentheses may also be used, as well as double quotation marks to indicate phrases.
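The substitution rules can be sketched as a small rewriting step applied to the query before it is sent to each engine. This is an illustration only: the function `translate` and its parameters `supports_near` and `not_form` are hypothetical and do not reflect how Dogpile is actually implemented.

```python
import re


def translate(query, supports_near, not_form="NOT"):
    """Rewrite one query for one engine.

    supports_near: whether the target engine understands NEAR.
    not_form: how that engine expects negation, "NOT" or "AND NOT".
    """
    # Keep quoted phrases as single tokens so their words are never rewritten.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    out = []
    for token in tokens:
        if token == "NEAR" and not supports_near:
            out.append("AND")       # fall back to the weaker operator
        elif token == "NOT":
            out.append(not_form)    # use the engine's own spelling of NOT
        else:
            out.append(token)
    return " ".join(out)


print(translate("solar NEAR panel NOT roof", supports_near=False, not_form="AND NOT"))
```

Because NEAR is replaced by the less restrictive AND, an engine that lacks NEAR will tend to return more, and less precise, results for the same query.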
Dogpile searches three search engines at a time and displays the results in the order in which they are received. If fewer than 10 documents are retrieved, the search is automatically moved to the next three search engines on the list until 10 hits are obtained. If more than 10 hits are obtained, the next set of search engines can be selected. If more than 10 hits are obtained from any search engine, a link is provided to the search engine page so that the remaining results can be viewed if desired. The exact form of the search profile submitted to each search engine is displayed next to the results from that engine. If the same document is found by more than one search engine, it will be displayed in each group of results.
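The batching behaviour can be sketched as a loop that queries engines three at a time and stops once enough hits have accumulated. The code below is a simplified illustration under stated assumptions: `metasearch` and `fake_engine` are hypothetical names, the engines are stand-in functions rather than real services, and each batch is queried sequentially instead of in order of arrival.

```python
def metasearch(engines, query, batch_size=3, min_hits=10):
    """Query engines in batches until at least min_hits results are found.

    engines: list of (name, search_function) pairs in the order to be tried.
    Returns one (name, results) group per engine queried; duplicates
    across engines are deliberately kept, one copy per group.
    """
    groups = []
    total = 0
    for start in range(0, len(engines), batch_size):
        for name, search in engines[start:start + batch_size]:
            results = search(query)
            groups.append((name, results))
            total += len(results)
        if total >= min_hits:
            break  # enough hits; later batches are left for the user to request
    return groups


def fake_engine(name, n):
    """Stand-in engine that always returns n made-up hits."""
    return name, lambda query: [f"{name}: {query} #{i}" for i in range(n)]


engines = [fake_engine(f"engine{i}", 2) for i in range(7)]
groups = metasearch(engines, "tea")
print([name for name, _ in groups])
```

With seven stand-in engines returning two hits each, the first batch yields six hits, so a second batch is queried and the seventh engine is never contacted.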
Cyber411 performs a simultaneous search on AltaVista, GoogleGroups/Dejanews, Excite, Galaxy, Goto, HotBot, LookSmart, Lycos, Magellan, PlanetSearch, Search.com, Snap, Thunderstone, Webcrawler, What-U-Seek and Yahoo!. Any of these search engines can be excluded. Duplicate retrievals are removed before the results are displayed.
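Duplicate removal of this kind can be sketched as merging the result lists while remembering which addresses have already been seen. This is a minimal illustration, not Cyber411's method: the function name `merge_unique` is hypothetical, results are reduced to URL strings, and two URLs are treated as the same document only if they are identical apart from a trailing slash.

```python
def merge_unique(result_lists):
    """Merge several result lists, keeping the first copy of each URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = url.rstrip("/")  # crude normalisation; an assumption
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


merged = merge_unique([
    ["http://a.example/", "http://b.example"],
    ["http://a.example", "http://c.example"],
])
print(merged)
```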
The operators supported by Cyber411 (but not necessarily by all the search engines) are AND, OR, NEAR, NOT. NEAR is substituted by AND in cases where it is not supported, while NOT is submitted either as NOT or as AND NOT, depending on the requirements of the search engines. Parentheses may also be used, as well as double quotation marks to indicate phrases.
A very useful feature of Cyber411 is the ability to place a time limit on the duration of the search.
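One way to picture such a time limit is to query all engines in parallel and keep only the answers that arrive before the deadline. The sketch below uses Python's standard `concurrent.futures` module; `search_with_deadline` and the two stand-in engines are hypothetical and say nothing about how Cyber411 itself works.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_with_deadline(engines, query, seconds):
    """Return results only from engines that answer within the time limit.

    engines: dict mapping an engine name to a search function.
    """
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, _not_done = wait(futures, timeout=seconds)
    pool.shutdown(wait=False)  # do not block on the slow engines
    return {futures[f]: f.result() for f in done}


def fast(query):
    return [f"fast hit for {query}"]


def slow(query):
    time.sleep(1.0)  # simulates an engine that responds too late
    return [f"slow hit for {query}"]


results = search_with_deadline({"fast": fast, "slow": slow}, "tea", seconds=0.2)
print(results)
```

The trade-off is the one the time limit implies: a shorter deadline returns sooner but may omit results from slower engines.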