Search Basics

 

 

Introduction



This section will cover basic principles applying to the construction of search profiles for use on Internet search engines or on any online database. It will include general considerations relating to the choice of search terms and to the use of operators to define the links between the terms.

 

The discussion will not be specific to any one information provider, although reference will be made to individual search engines in some of the examples given.


Search efficiency


The choice of search terms and the manner in which these terms are combined into a search profile are obviously of major importance. However, a number of other factors affect the efficiency with which relevant information can be obtained from a search.


The effectiveness of a search can be measured in terms of recall and precision.


Recall ratio = relevant records retrieved/total relevant records

Precision ratio = relevant records retrieved/total records retrieved
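
As an illustration of the two ratios, the following minimal Python sketch computes them for two searches. The record numbers and the function name recall_and_precision are hypothetical and chosen only for the example; in practice the total number of relevant records in a database is rarely known exactly.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: set of record IDs returned by the search
    relevant:  set of all relevant record IDs in the database
    """
    hits = retrieved & relevant  # relevant records retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical database in which records 1-10 are relevant to the topic.
relevant = set(range(1, 11))

# A narrow search: 4 records retrieved, 3 of them relevant.
narrow = {1, 2, 3, 50}
# A broad search: 20 records retrieved, 8 of them relevant.
broad = set(range(1, 9)) | set(range(100, 112))

narrow_recall, narrow_precision = recall_and_precision(narrow, relevant)
broad_recall, broad_precision = recall_and_precision(broad, relevant)
print(narrow_recall, narrow_precision)  # 0.3 0.75
print(broad_recall, broad_precision)    # 0.8 0.4
```

In this made-up case the broad search finds more of the relevant records (higher recall), but a smaller share of what it retrieves is relevant (lower precision).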

These ratios will vary from one search to another and from one database or search engine to another when the same search is carried out on each of them.

 

They will be affected by the capabilities of the search language (operators) that can be used. In the case of databases, they will also be affected by the indexing system and indexing policy, the quality of the indexing and the extent to which the searcher is familiar with the structure and indexing system.

In the case of search engines, they will be affected by the algorithms used to assign relevancy and hence ranking position to the documents and also by the indexing policy used.


If the recall ratio is expressed as a percentage of all relevant documents that exist rather than the total relevant records in the database being searched, it will also be affected by the coverage of the database. This in turn will depend on database policy about which journals and other sources are or are not monitored. Likewise, recall will be affected by the number of web sites indexed by a search engine and the depth to which these sites are indexed.


As a result, there may be great differences in the results obtained from different databases or search engines. The overlap in documents retrieved by the same search on the three major biomedical databases, Medline, Embase and Biosis, has been found to be as low as 30%. It is important to bear this in mind if a search has to be exhaustive.


Many researchers are tempted to restrict their searches solely to Medline, especially since it is now freely available. This may not always be a wise decision. In the same way, it may prove useful to perform an Internet search on more than one search engine.


Broadening the search strategy will increase recall, but at the cost of decreasing precision, which means that a greater proportion of the retrieved documents will be irrelevant to the topic of the search. Conversely, narrowing the search will increase precision, but at the risk of not retrieving some potentially very relevant material. Most often, it is necessary to choose between high recall with low precision and low recall with high precision. Even so, it has been estimated that a carefully optimised search strategy designed to maximise recall will typically give recall ratios of only 25-30%.


The addition of indexing terms and codes to the database record is intended to increase the information provided by the title and abstract, and thus to increase search success. It is helpful to become familiar with the indexing systems of frequently used databases and assess whether any of the controlled index terms could be used to increase the precision of the search. However, it is not always possible to gain access to the indexing manuals.


Nevertheless, key index terms can often be identified from the full records of highly relevant documents and used to refine the search profile. If a key document is already known to the searcher, its record can be deliberately retrieved by searching on author, date and some words from the title.

 

 

Choice of search terms


Appropriate search terms for documents relevant to the Three Rs are listed elsewhere. When searching for information on the Three Rs, it will be necessary to combine appropriate terms from this list with search terms related to the scientific topic of interest. Once the search terms have been chosen, other factors also become relevant.

 

These include the operators used to link the search terms (see below) as well as case specificity, spelling, synonyms and related words.


Some, but not all, search systems are case-sensitive for upper-case letters. In these instances, the use of capital letters where appropriate, for example for acronyms and proper names, may reduce the number of irrelevant documents retrieved.

 

However, the usefulness of this facility on Internet search engines is reduced, because many web site developers add key words to their pages in both upper and lower case. This maximises the chance of being picked up by searches on case-sensitive search engines, regardless of whether the searcher has the caps lock key switched on!


Correct spelling of search terms is vital, because the search software looks for an exact match. In addition, it is necessary to take into account differences in spelling between British and American English.

 

E.g.:

anaesthesia, anesthesia, haemoglobin, hemoglobin.

 

Both variants should be used if it is important to maximise the chances of identifying relevant documents.


In addition, it is necessary to take account of possible synonyms and related words which could be used instead of the selected words.

 

E.g.:

the phrase "liver cells" might be used instead of the word "hepatocytes".

 

Although the phrase "in vitro" is commonly used in searches for replacement alternative methods, an exhaustive search may need the additional use of search terms such as "cell culture", "tissue culture", "organ culture", "subcellular fractions", etc. In the case of on-line databases, the database indexing manual, if available, will list the preferred term(s) used in the list of index terms assigned to each record. Where no such resource is available, all possible synonyms may need to be entered.
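
One way to keep track of synonyms and spelling variants is to build each group of alternatives once and join it with OR. The sketch below is only an illustration: the helper name or_group is hypothetical, the term lists are examples rather than a complete vocabulary, and it assumes a search system that accepts OR, AND, parentheses and quoted phrases.

```python
def or_group(terms):
    """Join alternative terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


# Illustrative lists only; a real search would draw on the database's
# indexing manual where one is available.
in_vitro_terms = ["in vitro", "cell culture", "tissue culture", "organ culture"]
anaesthesia_terms = ["anaesthesia", "anesthesia"]

profile = or_group(in_vitro_terms) + " AND " + or_group(anaesthesia_terms)
print(profile)
```

The printed profile can then be adapted to the syntax of the search system being used.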


Certain options which are available in some search systems can simplify some of the above considerations. These include:

Wildcards
Certain symbols, often *, may be used in some search systems as wildcards to signify one or more characters. Their use is most frequently permitted only for truncation at the end of a search term. In those search systems which permit the use of wildcards within search terms, e.g. Northern Light, this can be a useful way of including different ways of spelling a word without having to input both versions separately:


E.g.:

sul*ur will retrieve both sulfur and sulphur, while truncation of the term to sulph* will retrieve sulphuric, sulphurous, sulphate, sulphite, etc., but not sulfuric, sulfurous, sulfate, sulfite, etc.



Truncation can result in too many irrelevant retrievals:

 

E.g.:

the truncated term diet* will retrieve documents containing the words diet, dietary, dietetic, dietician, but also any references to diethyl compounds.
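
The behaviour of wildcards and truncation can be tried out with the fnmatch module from Python's standard library. This is a sketch of the matching idea only, not of how any particular search engine works. Note that in fnmatch the * symbol matches zero or more characters, which may differ slightly from the wildcard rules of a given search system. The word lists are made up for the example.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match a shell-style wildcard pattern."""
    return [w for w in words if fnmatchcase(w, pattern)]


spellings = ["sulfur", "sulphur", "sulphuric", "sulfuric", "sulphate", "sulfate"]

# Wildcard inside the term: covers both spellings of the element.
internal = wildcard_matches("sul*ur", spellings)
print(internal)   # ['sulfur', 'sulphur']

# Truncation at the end of the term: misses the American spellings.
truncated = wildcard_matches("sulph*", spellings)
print(truncated)  # ['sulphur', 'sulphuric', 'sulphate']

# Truncation can also retrieve unrelated words.
diet_words = ["diet", "dietary", "dietician", "diethyl"]
diet_hits = wildcard_matches("diet*", diet_words)
print(diet_hits)  # ['diet', 'dietary', 'dietician', 'diethyl']
```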

Stemming
The pitfalls associated with the use of truncation operators are avoided in search engines which use stemming, i.e. which automatically search for terms grammatically related to the search terms. In some cases, e.g. on Northern Light, this may be confined to combining the singular and plural forms of common words. In other cases, e.g. on Euroferret, stemming also applies to the combination of different grammatical forms, such as experiment, experimentation, experimental, experimenter, experimenting.
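
The difference between stemming and truncation can be illustrated with a deliberately crude suffix-stripping function. This is a toy sketch under stated assumptions: the suffix list and the names toy_stem and stem_search are hypothetical, and real search engines use far more elaborate rules that are not described here.

```python
# Suffixes are checked in this order, longest first.
SUFFIXES = ("ation", "ing", "al", "er", "s")


def toy_stem(word, min_stem=4):
    """Strip one known suffix, provided at least min_stem letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_search(term, words):
    """Return the words that share a stem with the search term."""
    target = toy_stem(term)
    return [w for w in words if toy_stem(w) == target]


forms = ["experiment", "experimentation", "experimental",
         "experimenter", "experimenting", "experiments"]
experiment_hits = stem_search("experiment", forms)
print(experiment_hits == forms)  # True: every form shares the stem

# Unlike the truncated term diet*, the stem "diet" does not match diethyl.
diet_hits = stem_search("diet", ["diet", "diets", "diethyl"])
print(diet_hits)  # ['diet', 'diets']
```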

Concept linking
Some search engines now provide more sophisticated ways to refine and/or expand a search strategy. Excite uses "Intelligent Concept Extraction" to find relationships between words and ideas, in order to retrieve relevant documents which may not necessarily contain the words that were used as search terms.

 

Both Euroferret and AltaVista have a means of refining a search in which important index words from the retrieved documents are presented. The searcher can include, and in the case of AltaVista also exclude, some of these words in a subsequent search. In Excite, it is also possible to indicate retrieved documents of particular relevance and ask the search engine to search for other similar documents. All of these possibilities reduce the need to consider synonyms and alternative spellings in advance.


Boolean and other operators
The need to understand the use of operators may be questioned, given that both database hosts and a number of search engines are tending to provide a menu-driven search mode as well as an advanced interface in which operators may be used. However, although menu-driven search interfaces have the advantage of assuming no knowledge of search operators, they may be limited in the search possibilities they offer.

Most online databases and a large number of Internet search engines permit the use of a varying number of search operators. These operators are used to define the relationship between the search terms that are being used. They act as commands to the search engine to specify how the sets of identified documents should be combined. The best known are the Boolean operators.


Database hosts, such as Dialog, Dimdi and STN, have their own command languages, which may modify the way in which the operators are signified from that described below. Host command languages will not be covered in this guide.

Boolean operators

The results of Boolean searches can be represented using a Venn diagram. Each circle in the diagram represents the set of documents containing one of the search terms.

 

AND (can also be signified by & ) is used to find documents which contain both of the search terms linked by the operator and to eliminate documents which contain only one or neither of the search terms:


E.g.:

to find documents on transgenic mice: transgenic AND mice

OR (can also be signified by | ) is used to find documents which contain either one or both of the search terms:

 

E.g.:

to find all documents referring to the kidney, the liver or both organs: kidney OR liver

NOT (can also be signified by ! ) is used to exclude documents from a retrieved set:

 

E.g.:

to find documents on rodents which do not deal with rats: rodents NOT rats

The NOT operator is a dangerous one, because pertinent information can be lost through its use. In the above example, it would exclude all documents which mention rats in addition to other rodent species. However, it can be useful in reducing an unmanageable quantity of retrieved information and also to increase the precision of a search term:

 

E.g.:

pigs NOT guinea

 

In some cases, for example when searching on AltaVista and HotBot, NOT cannot be used alone and has to be combined with AND: rodents AND NOT rats.
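
Because each search term identifies a set of documents, the three Boolean operators correspond to ordinary set operations. The sketch below uses Python sets to show this. The mini-index and its document numbers are invented for the example and do not represent any real database.

```python
# Hypothetical mini-index: search term -> IDs of documents containing it.
index = {
    "transgenic": {1, 2, 3},
    "mice": {2, 3, 4},
    "kidney": {5, 6},
    "liver": {6, 7},
    "rodents": {8, 9, 10},
    "rats": {9},
}

# transgenic AND mice: documents containing both terms (intersection).
and_hits = index["transgenic"] & index["mice"]
print(sorted(and_hits))  # [2, 3]

# kidney OR liver: documents containing either or both terms (union).
or_hits = index["kidney"] | index["liver"]
print(sorted(or_hits))   # [5, 6, 7]

# rodents NOT rats: documents on rodents with the rat documents removed
# (set difference).
not_hits = index["rodents"] - index["rats"]
print(sorted(not_hits))  # [8, 10]
```

Document 9 illustrates the risk of NOT: it mentions rodents and rats, so it is dropped even if it also covers other rodent species.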

 


NEAR is not one of the classic Boolean operators, but it can be used with them on some search systems in order to indicate a requirement for proximity, where it will retrieve documents in which the key terms are within a certain number of words from each other. This can be useful to narrow down a search when searching for a sequence of words, if the exact sequence is not known, or if no other means is available to indicate that the key words should be treated as a phrase (see below). In some cases, the order in which the search terms are linked by the proximity operator will define the order in which they must appear in the document.



E.g.:

the search profile (directive NEAR transport NEAR animals) AND European gave 16 hits on AltaVista, as compared to 987 when the search terms were simply joined with AND using no parentheses. Entry of the search terms into the simple search mode with no operators gave nearly 7 million hits, because the AltaVista default is to place OR between all search terms.
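
The idea behind a proximity operator can be sketched as a check on word positions. The function below is a hypothetical illustration, not the implementation used by any search engine. The default window of 10 words and the ordered flag are assumptions made for the example, since systems differ in the distance they allow and in whether word order matters.

```python
import re


def near(text, first, second, max_distance=10, ordered=False):
    """Return True if the two words occur within max_distance words.

    If ordered is True, `first` must appear before `second`.
    """
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    for i in first_pos:
        for j in second_pos:
            gap = j - i
            if ordered and gap < 0:
                continue  # second word comes before the first
            if 0 < abs(gap) <= max_distance:
                return True
    return False


doc = "The directive on the transport of animals was adopted."
close = near(doc, "directive", "transport", max_distance=5)
reversed_order = near(doc, "transport", "directive", max_distance=5, ordered=True)
far_apart = near("directive " + "filler " * 20 + "transport", "directive", "transport")
print(close, reversed_order, far_apart)  # True False False
```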

Parentheses
Where these are supported by the search engine/database, parentheses can be used together with Boolean operators to construct more complicated search profiles. This is especially useful in situations where it is not possible to restrict a search to a subset of documents identified in a previous search. The operators within a pair of parentheses are treated as a single unit which is processed first.


E.g.:

to find documents which mention cell culture or tissue culture: cultur* AND (cell* OR tissue).

 

Note the use of truncation to cover variants such as cultured, culturing, cultures and cells.


If proximity operators are permitted, (cell* OR tissue) NEAR cultur* might be preferable, but it may be necessary to invert the search expression as well if the proximity operator also defines the order in which the words must appear:


(cultur* NEAR (cell* OR tissue)) OR ((cell* OR tissue) NEAR cultur*) will recover documents mentioning cell culture or tissue culture and also documents referring to cultured cells, etc.



The above search profile uses nested parentheses, which are permitted on many search systems. Another example is the search profile transgenic AND (mice OR rats OR (pigs NOT guinea)), which will retrieve documents on transgenic mice, transgenic rats and transgenic pigs, but not transgenic guinea pigs.
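
The effect of the parentheses in the transgenic example can be checked with Python sets, where each set holds the documents containing one term. The document numbers are hypothetical. The second expression shows one possible reading of the same terms without the outer parentheses, assuming that AND is applied first; how an ungrouped profile is actually processed depends on the search system.

```python
# Hypothetical documents:
# 1 transgenic mice, 2 transgenic rats, 3 transgenic pigs,
# 4 transgenic guinea pigs, 5 transgenic (no species named),
# 6 rats only (not transgenic).
index = {
    "transgenic": {1, 2, 3, 4, 5},
    "mice": {1},
    "rats": {2, 6},
    "pigs": {3, 4},
    "guinea": {4},
}
t, m, r = index["transgenic"], index["mice"], index["rats"]
p, g = index["pigs"], index["guinea"]

# transgenic AND (mice OR rats OR (pigs NOT guinea))
grouped = t & (m | r | (p - g))
print(sorted(grouped))    # [1, 2, 3]

# (transgenic AND mice) OR rats OR (pigs NOT guinea)
ungrouped = (t & m) | r | (p - g)
print(sorted(ungrouped))  # [1, 2, 3, 6]
```

With the grouping, the non-transgenic rat document (6) is correctly left out.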

Other operators


The search engines which accept Boolean operators normally do so only in the advanced search mode. However, a number of other operators are commonly permitted by search engines and can be used to refine searching carried out in the simple search mode. The most useful of these are:


+ : used to specify words which must be present in the retrieved documents


for example: +transgenic +mice will only retrieve documents which contain both words, while +transgenic mice will give a higher placing to documents containing both words, but will also retrieve documents containing the word transgenic but not the word mice.


- : used to exclude documents containing the search term

for example, +transgenic -plants -mice will retrieve documents which contain the word transgenic but which do not contain the words plants or mice. Similar care should be used with this operator as with the Boolean operator NOT.

" " : used to enclose a series of words which are to be treated as a phrase

 

for example, "Animals (Scientific Procedures) Act" will find only documents containing this specific phrase.
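
The following sketch shows how a simple-mode query using +, - and quotation marks might be interpreted. It is an illustration under stated assumptions, not a description of any particular search engine: the names parse_query and matches are hypothetical, matching ignores case, and no ranking of results is attempted.

```python
import re

# An optional + or - sign, followed by a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into required, excluded and optional terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(text, query):
    """Return True if the text would be retrieved by the query."""
    text = text.lower()
    required, excluded, optional = parse_query(query)

    def contains(term):
        # Match the term as a whole word or whole phrase.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        return re.search(pattern, text) is not None

    if any(contains(term) for term in excluded):
        return False
    if required:
        return all(contains(term) for term in required)
    return any(contains(term) for term in optional)


print(matches("Transgenic plants in agriculture", "+transgenic +mice"))  # False
print(matches("Transgenic plants in agriculture", "+transgenic mice"))   # True
print(matches("Transgenic plants in agriculture", "+transgenic -plants -mice"))  # False
print(matches("The Animals (Scientific Procedures) Act 1986",
              '"Animals (Scientific Procedures) Act"'))                  # True
```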


Many search engines define a list of common stop words (e.g. and, the, of...) which are stripped out of search expressions. Enclosing these words in quotation marks will ensure that they are retained.

Other operators may include ways in which to limit the search to words occurring within a URL, within an Internet domain, within the title of the file, etc. These are not necessary for normal searching, but may sometimes be useful. The help pages of most search engines will list all the acceptable operators and how they may be used.