This section will cover basic principles needed to construct search filters for use on Internet search engines or on online databases. It includes general considerations relating to the choice of search terms and to the use of operators to define the links between the terms.
The choice of search terms (words) and the way they are combined is very important, but other factors also affect the efficiency of a search.
The effectiveness of a search can be measured in terms of recall and precision. These ratios vary between databases, between searches, and between search engines.
Recall ratio = relevant records retrieved/total relevant records (i.e. how comprehensive the search is)
Precision ratio = relevant records retrieved/total records retrieved (i.e. how accurate the search is)
Recall and precision are inversely related, so broadening the search (e.g. by including synonyms) will increase the recall, but at the cost of decreasing precision, because the probability of retrieving irrelevant material increases. Narrowing the search (by using specific terms or spellings) will increase precision, but at the risk of not retrieving some potentially very relevant material.
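The two ratios and the trade-off between them can be illustrated with a minimal sketch. The record IDs and the function name recall_precision are hypothetical and chosen only for illustration. In practice the total number of relevant records is rarely known exactly.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved record IDs.

    `relevant` is the set of all relevant records in the database searched.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical record IDs, for illustration only.
relevant = {1, 2, 3, 4}
narrow_search = {1, 2}                    # specific terms only
broad_search = {1, 2, 3, 4, 5, 6, 7, 8}   # synonyms added

print(recall_precision(narrow_search, relevant))  # (0.5, 1.0)
print(recall_precision(broad_search, relevant))   # (1.0, 0.5)
```

In this made-up case the broader search finds every relevant record, but half of what it retrieves is irrelevant.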
Both ratios are affected by the capabilities of the search language (the operators that can be used). In databases, they are also affected by the indexing system and, in search engines, by the algorithms used to assign relevancy.
If the recall ratio is expressed as a percentage of all relevant documents that exist rather than the total relevant records in the database being searched, it will also be affected by the coverage of the database. This will depend on which journals and other sources are monitored. Likewise, recall will be affected by the number of web sites indexed by a search engine and the depth to which these sites are indexed.
As a result, there may be great differences in the results obtained from different databases or search engines. It is important to bear this in mind: to search effectively, it is better to use more than one database or search engine.
Choice of search terms
Appropriate search terms for documents relevant to the Three Rs are listed in Planning a search, so are not specifically covered here.
Searching for information on the Three Rs requires combining appropriate Three Rs terms with search terms related to the scientific topic of interest.
Important factors in creating a search include:
- case specificity
- spelling
- synonyms and related words
- operators used to link the search terms (see below)
Case Specificity - Some search systems are case-specific for upper and lower case letters. In these instances, the use of capital letters where appropriate, for example for acronyms and proper names, may reduce the number of irrelevant documents retrieved.
Spelling - Correct spelling of search terms is vital, because search software looks for exact matches. To maximise the chance of finding relevant documents, it is sometimes necessary to use both British and American spellings, such as: anaesthesia, anesthesia, haemoglobin, hemoglobin.
Synonyms - It is also useful to include synonyms and related words which could be used by authors, for example the phrase "liver cells" might be used instead of the word "hepatocytes".
Although the phrase "in vitro" is commonly used in searches for replacement alternative methods, an exhaustive search may need additional use of terms such as "cell culture", "tissue culture", "organ culture", "subcellular fractions", etc.
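One way to keep track of spelling variants and synonyms is to collect them in a list and join them into a single group of alternatives. The sketch below assumes a search system that accepts OR between alternatives, parentheses for grouping and quotation marks around phrases. The exact syntax varies between systems, and the helper name or_group is hypothetical.

```python
def or_group(terms):
    """Join alternative terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


print(or_group(["anaesthesia", "anesthesia"]))
# (anaesthesia OR anesthesia)
print(or_group(["hepatocytes", "liver cells"]))
# (hepatocytes OR "liver cells")
print(or_group(["in vitro", "cell culture", "tissue culture", "organ culture"]))
# ("in vitro" OR "cell culture" OR "tissue culture" OR "organ culture")
```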
Certain options available in some search systems can simplify some of the above considerations. These include:
Symbols such as * may be used in some search systems as wildcards to signify one or more characters. This can be a useful way of including different ways of spelling a word without having to input both versions. For example:
sul*ur will retrieve both sulfur and sulphur, while the truncated term sulph* will retrieve sulphuric, sulphurous, sulphate, sulphite, etc., but not sulfuric, sulfurous, sulfate, sulfite, etc.
Note: It is not possible to combine a spelling wildcard with a truncation one.
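The behaviour of the two patterns above can be checked with a small sketch using Python's standard fnmatch module. This is only an approximation of a search system's wildcard: in fnmatch, * matches zero or more characters, and real systems differ in their exact rules. The word list is a hypothetical vocabulary.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary, for illustration only.
words = ["sulfur", "sulphur", "sulphuric", "sulphate", "sulfuric", "sulfate"]


def matching(pattern, vocabulary):
    """Return the words in `vocabulary` that match the wildcard pattern."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]


print(matching("sul*ur", words))  # ['sulfur', 'sulphur']
print(matching("sulph*", words))  # ['sulphur', 'sulphuric', 'sulphate']
```

The truncated pattern misses every word spelled with an f, which is why both spellings may need to be searched.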
Some pitfalls associated with the use of truncation operators are avoided in search engines which use stemming, i.e. those which automatically search for terms grammatically related to the search terms. This may be confined to combining the singular and plural forms of common words. In other cases, stemming applies to the combination of different grammatical forms, such as experiment, experimentation, experimental, experimenter, experimenting.
Boolean and other operators
The need to understand the use of operators may be questioned, given that database hosts and search engines now tend to provide a menu-driven search mode as well as an advanced interface in which operators may be used. However, menu-driven search interfaces may be limited in the search possibilities they offer.
Most online databases and Internet search engines permit the use of search operators. These operators define the relationship between the search terms being used. They act as commands to the search engine to specify how the sets of identified documents should be combined. The best known are the Boolean operators.
The results of Boolean-based searches can be represented using a Venn diagram. Each circle in the diagram represents the set of documents containing one of the search terms.
AND (also signified by & ) is used to find documents which contain both of the search terms linked by the operator and to eliminate documents which contain only one or neither of the search terms. For example:
to find documents on transgenic mice:
transgenic AND mice
OR (also signified by | ) is used to find documents which contain either one or both of the search terms. For example:
to find all documents referring to the kidney, the liver or both organs:
kidney OR liver
NOT (also signified by ! ) is used to exclude documents from a retrieved set. For example:
to find documents on rodents which do not deal with rats: rodents NOT rats
The NOT operator should be used with care because it can eliminate pertinent information. In the above example, it would exclude all documents which mention rats in addition to other rodent species. However, it can be useful in reducing an unmanageable quantity of retrieved information and also to increase the precision of a search term. For example:
pigs NOT guinea
In some cases, NOT cannot be used alone and has to be combined with AND: rodents AND NOT rats.
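The three Boolean operators correspond to intersection, union and difference of document sets. The following sketch shows this on a hypothetical toy collection. The documents and the helper name containing are invented for illustration, and real systems match terms in more sophisticated ways than splitting text on spaces.

```python
# Hypothetical toy collection: document ID -> text.
docs = {
    1: "transgenic mice in kidney research",
    2: "liver toxicity in rats",
    3: "rodents such as hamsters and gerbils",
    4: "rodents including rats and mice",
    5: "transgenic plants",
}


def containing(term):
    """Return the set of document IDs whose text contains the word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


print(containing("transgenic") & containing("mice"))  # AND -> {1}
print(containing("kidney") | containing("liver"))     # OR  -> {1, 2}
print(containing("rodents") - containing("rats"))     # NOT -> {3}
```

Document 4 is lost in the last search even though it also covers mice, which is the pitfall of NOT described above.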
NEAR is not a Boolean operator, but can be used on some search systems to indicate a requirement for proximity, where it will retrieve documents in which the key terms are within a certain number of words from each other. This can narrow down a search when searching for a sequence of words, if the exact sequence is not known, or if no other means is available to indicate that the key words should be treated as a phrase (see below). In some cases, the order in which the search terms are linked by the proximity operator will define the order in which they must appear in the document. For example:
the search profile (directive NEAR transport NEAR animals) AND European gave 16 hits on AltaVista, as compared to 987 when the search terms were simply joined with AND using no parentheses. Entry of the search terms into the simple search mode with no operators gave nearly 7 million hits, because the AltaVista default is to place OR between all search terms.
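A proximity test can be sketched as a comparison of word positions. The function below is a simplified illustration, not the behaviour of any particular search engine: the name near, the default window of 10 words and the ordered flag are all assumptions.

```python
def near(text, first, second, max_gap=10, ordered=False):
    """True if `first` and `second` occur within `max_gap` words of each other.

    With ordered=True, `first` must also appear before `second`.
    """
    words = text.lower().split()
    positions_first = [i for i, w in enumerate(words) if w == first]
    positions_second = [i for i, w in enumerate(words) if w == second]
    for a in positions_first:
        for b in positions_second:
            distance = b - a
            if ordered and distance < 0:
                continue  # wrong order for an order-sensitive operator
            if 0 < abs(distance) <= max_gap:
                return True
    return False


# Hypothetical sentence, for illustration only.
text = "the directive on the transport of live animals"

print(near(text, "directive", "transport", max_gap=3))                # True
print(near(text, "directive", "animals", max_gap=3))                  # False
print(near(text, "transport", "directive", max_gap=3, ordered=True))  # False
```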
A number of other operators are commonly permitted by search engines and can be used to refine searching carried out in the simple search mode. The most useful of these are:
+ : used to specify words which must be present in the retrieved documents. For example:
+transgenic +mice will only retrieve documents which contain both words, while +transgenic mice will give a higher placing to documents containing both words, but will also retrieve documents containing the word transgenic but not the word mice.
- : used to exclude documents containing the search term. For example:
+transgenic -plants -mice will retrieve documents which contain the word transgenic but which do not contain the words plants or mice. Similar care should be used with this operator as with the Boolean operator NOT.
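The effect of the + and - prefixes can be sketched as a filter followed by a crude ranking. This is an illustration under simplifying assumptions: the function name score is hypothetical, and counting matched optional words is only a stand-in for the relevancy algorithms real search engines use.

```python
def score(query, text):
    """Return None if the document is rejected, otherwise the number of
    optional (unprefixed) query words it contains, as a crude rank."""
    words = set(text.lower().split())
    optional_hits = 0
    for token in query.lower().split():
        if token.startswith("+"):
            if token[1:] not in words:
                return None  # a required word is missing
        elif token.startswith("-"):
            if token[1:] in words:
                return None  # an excluded word is present
        elif token in words:
            optional_hits += 1
    return optional_hits


print(score("+transgenic +mice", "transgenic plants"))          # None
print(score("+transgenic mice", "transgenic plants"))           # 0
print(score("+transgenic mice", "transgenic mice"))             # 1
print(score("+transgenic -plants -mice", "transgenic plants"))  # None
```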
" " : used to enclose a series of words which are to be treated as a phrase. For example:
"Animals (Scientific Procedures) Act" will find only documents containing this specific phrase.
Many search engines define a list of common stop words (e.g. and, the, of) which are stripped out of search expressions. Enclosing these words in quotation marks will ensure that they are retained.
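The interaction between stop words and quotation marks can be sketched as follows. The stop word list contains only the three examples given above, and the function name parse is hypothetical. Real search engines use longer lists and their own parsing rules.

```python
import re

STOP_WORDS = {"and", "the", "of"}  # only the examples given in the text


def parse(query):
    """Split a query into quoted phrases and single words.

    Stop words are dropped unless they appear inside quotation marks.
    """
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]+"', " ", query)
    words = [w for w in remainder.split() if w.lower() not in STOP_WORDS]
    return phrases, words


print(parse("the welfare of animals"))
# ([], ['welfare', 'animals'])
print(parse('"the welfare of animals" legislation'))
# (['the welfare of animals'], ['legislation'])
```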
Other operators may include ways in which to limit the search to words occurring within a URL, within an Internet domain, within the title of the file, etc. These are not necessary for normal searching, but may sometimes be useful. The help pages of most search engines will list all the acceptable operators and how they may be used.
In some databases and search engines parentheses can be used together with Boolean operators to construct more-complicated search profiles. The operators within a pair of parentheses are treated as a single unit which is processed first. For example:
to find documents which mention cell culture or tissue culture: cultur* AND (cell* OR tissue).
Note the use of truncation to cover variants such as cultured, culturing, cultures and cells.
If proximity operators are permitted, (cell* OR tissue) NEAR cultur* might be preferable, but it may be necessary to invert the search expression as well if the proximity operator also defines the order in which the words must appear. For example:
(cultur* NEAR (cell* OR tissue)) OR ((cell* OR tissue) NEAR cultur*) will recover documents mentioning cell culture or tissue culture and also documents referring to cultured cells, etc.
The above search filter uses nested parentheses which are permitted on many search systems. Another example is:
transgenic AND (mice OR rats OR (pigs NOT guinea)), which will retrieve documents on transgenic mice, transgenic rats and transgenic pigs, but not transgenic guinea pigs.
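The way nested parentheses are processed, innermost first, can be sketched with a small recursive evaluator. Here the search profile is written as nested tuples rather than parsed from text, and the documents, the helper containing and the function evaluate are all hypothetical and serve only as illustration.

```python
# Hypothetical toy collection: document ID -> text.
docs = {
    1: "transgenic mice",
    2: "transgenic rats",
    3: "transgenic pigs",
    4: "transgenic guinea pigs",
    5: "wild type mice",
}


def containing(term):
    """Return the set of document IDs whose text contains the word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


def evaluate(expr):
    """Evaluate a nested (operator, operand, ...) tuple to a set of IDs."""
    if isinstance(expr, str):
        return containing(expr)
    op, first, *rest = expr
    result = evaluate(first)  # inner groups are evaluated before combining
    for operand in rest:
        other = evaluate(operand)
        if op == "AND":
            result = result & other
        elif op == "OR":
            result = result | other
        elif op == "NOT":
            result = result - other
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# transgenic AND (mice OR rats OR (pigs NOT guinea))
query = ("AND", "transgenic",
         ("OR", "mice", "rats", ("NOT", "pigs", "guinea")))
print(sorted(evaluate(query)))  # [1, 2, 3]
```

Document 4 (transgenic guinea pigs) is removed by the innermost group, and document 5 is removed by the outer AND.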