What is a thesaurus? Explain its functions

The word 'thesaurus' comes from the Greek term 'thesauros', meaning a storehouse or treasury. The Oxford English Dictionary defines "thesaurus" as an archaeological term, "a treasury of temple, etc.", and quotes its use in 1736 as a treasury or storehouse of knowledge. Another dictionary defines it as "a book of words or of information about a particular field or a set of concepts, specially a dictionary of synonyms".

A dictionary lists words along with their meanings, synonyms, etc. in alphabetical order, whereas a thesaurus assembles all the words related to an idea in one place. Modern usage may be said to date from 1852, when Peter Mark Roget conceived of his thesaurus as a classification of ideas. Roget's Thesaurus had nothing to do with information retrieval, but his novel idea was later profitably employed in the compilation of thesauri for information retrieval. Helen Brownson is said to be the first person to have used the term 'thesaurus' in the context of information retrieval, in a paper presented at the Dorking Conference on Classification Research in 1957.

H. P. Luhn was probably the first person to think in terms of an information retrieval thesaurus; he suggested the compilation, for indexing purposes, of 'families of notions' and a dictionary of 'notion families'. The first thesaurus used in an information retrieval system was developed by Du Pont in the USA around 1959, and since then many thesauri have been brought out in different subject fields. A number of standards have also come into existence to provide guidelines for the design and development of monolingual and multilingual thesauri.


In the context of information retrieval, a thesaurus (plural: "thesauri") is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.

A thesaurus guides both an indexer and a searcher in selecting the same preferred term or combination of preferred terms to represent a given subject. ISO 25964, the international standard for information retrieval thesauri, defines a thesaurus as a "controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms."

A thesaurus is composed of at least three elements: 1-a list of words (or terms), 2-the relationships among the words (or terms), indicated by their hierarchical relative position (for example parent/broader term; child/narrower term, synonym, and so forth), 3-a set of rules on how to use the thesaurus.
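The three elements listed above can be pictured as a small data structure. The following is only a minimal sketch, not a standard implementation: the class names `Term` and `Thesaurus`, the method names, and the sample terms are all hypothetical and chosen for illustration. The one "rule of use" it models is that any lead-in entry must resolve to its preferred term.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Term:
    """Element 1: a preferred term, plus its relationships (element 2)."""
    label: str
    broader: set[str] = field(default_factory=set)    # parent / broader terms
    narrower: set[str] = field(default_factory=set)   # child / narrower terms
    used_for: set[str] = field(default_factory=set)   # synonyms (lead-in entries)
    scope_note: str = ""


class Thesaurus:
    """Hypothetical container; names are illustrative assumptions."""

    def __init__(self):
        self.terms: dict[str, Term] = {}
        self._lookup: dict[str, str] = {}  # lowercased entry -> preferred label

    def add_term(self, label, used_for=(), scope_note=""):
        self.terms[label] = Term(label, used_for=set(used_for), scope_note=scope_note)
        self._lookup[label.lower()] = label
        for entry in used_for:
            self._lookup[entry.lower()] = label

    def add_hierarchy(self, broader, narrower):
        # Raises KeyError unless both labels are already preferred terms.
        self.terms[broader].narrower.add(narrower)
        self.terms[narrower].broader.add(broader)

    def preferred(self, word):
        """Element 3 (a rule of use): resolve any entry to its preferred term."""
        return self._lookup.get(word.strip().lower())


thesaurus = Thesaurus()
thesaurus.add_term("Motor vehicles")
thesaurus.add_term("Automobiles", used_for=["Cars", "Motorcars"],
                   scope_note="Passenger road vehicles only.")
thesaurus.add_hierarchy("Motor vehicles", "Automobiles")

print(thesaurus.preferred("cars"))  # Automobiles
```

Because the indexer and the searcher both go through `preferred`, they arrive at the same term whichever synonym they start from.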


Wherever there have been large collections of information, whether on paper or in computers, scholars have faced a challenge in pinpointing the items they seek. The use of classification schemes to arrange the documents in order was only a partial solution. Another approach was to index the contents of the documents using words or terms, rather than classification codes. In the 1940s and 1950s some pioneers, such as Calvin Mooers, Charles L. Bernier, Evan J. Crane and Hans Peter Luhn, collected their index terms in various kinds of list that they called a "thesaurus" (by analogy with the well-known thesaurus developed by Peter Roget). The first such list put seriously to use in information retrieval was the thesaurus developed in 1959 at the E I Dupont de Nemours Company.

The first two of these lists to be published were the Thesaurus of ASTIA Descriptors (1960) and the Chemical Engineering Thesaurus of the American Institute of Chemical Engineers (1961), a descendant of the Dupont thesaurus. More followed, culminating in the influential Thesaurus of Engineering and Scientific Terms (TEST), published jointly by the Engineers Joint Council and the US Department of Defense in 1967. TEST did more than just serve as an example; its Appendix 1 presented thesaurus rules and conventions that have guided thesaurus construction ever since. Many thesauri have been produced since then, perhaps thousands. The most notable innovations since TEST have been: (a) extension from monolingual to multilingual capability; and (b) addition of a conceptually organized display to the basic alphabetical presentation.

Here we mention only some of the national and international standards that have built steadily on the basic rules set out in TEST:

  • UNESCO Guidelines for the establishment and development of monolingual thesauri. 1970 (followed by later editions in 1971 and 1981)
  • DIN 1463 Guidelines for the establishment and development of monolingual thesauri. 1972 (followed by later editions)
  • ISO 2788 Guidelines for the establishment and development of monolingual thesauri. 1974 (revised 1986)
  • ANSI American National Standard for Thesaurus Structure, Construction, and Use. 1974 (revised 1980 and superseded by ANSI/NISO Z39.19-1993)
  • ISO 5964 Guidelines for the establishment and development of multilingual thesauri. 1985
  • ANSI/NISO Z39.19 Guidelines for the construction, format, and management of monolingual thesauri. 1993 (revised 2005 and renamed Guidelines for the construction, format, and management of monolingual controlled vocabularies.)
  • ISO 25964 Thesauri and interoperability with other vocabularies. Part 1 (Thesauri for information retrieval) published 2011; Part 2 (Interoperability with other vocabularies) published 2013.

The most clearly visible trend across this history of thesaurus development has been from the context of small-scale isolation to a networked world. Access to information was notably enhanced when thesauri crossed the divide between monolingual and multilingual applications. More recently, as can be seen from the titles of the latest ISO and NISO standards, there is a recognition that thesauri need to work in harness with other forms of vocabulary or knowledge organization system, such as subject heading schemes, classification schemes, taxonomies and ontologies. The official website for ISO 25964 gives more information, including a reading list.

In information retrieval, a thesaurus can be used as a form of controlled vocabulary to aid in the indexing of appropriate metadata for information-bearing entities. A thesaurus helps to express the manifestations of a concept in a prescribed way, which aids in improving precision and recall. This means that the semantic conceptual expressions of information-bearing entities are easier to locate because of the uniformity of language. Additionally, a thesaurus is used for maintaining a hierarchical listing of terms, usually single words or bound phrases, that helps the indexer in narrowing the terms and limiting semantic ambiguity.
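The effect on precision and recall can be illustrated with a toy calculation. This is a sketch under stated assumptions: the four documents, their index terms, and the relevance judgements are invented for illustration, and the helper names `search` and `precision_recall` are hypothetical. Precision is taken as the share of retrieved documents that are relevant, and recall as the share of relevant documents that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def search(index, term):
    """Return ids of documents indexed under exactly this term."""
    return {doc for doc, terms in index.items() if term in terms}


# Invented data: d1-d3 are about automobiles, d4 is about railway cars.
relevant = {"d1", "d2", "d3"}

# Uncontrolled indexing: each document keeps its author's own wording.
free_text_index = {
    "d1": {"cars"},
    "d2": {"automobiles"},
    "d3": {"motorcars"},
    "d4": {"cars"},  # homonym: railway cars
}

# Controlled indexing: synonyms collapse to one preferred term,
# and the homonym is separated under its own descriptor.
controlled_index = {
    "d1": {"Automobiles"},
    "d2": {"Automobiles"},
    "d3": {"Automobiles"},
    "d4": {"Railway cars"},
}

free = precision_recall(search(free_text_index, "cars"), relevant)
controlled = precision_recall(search(controlled_index, "Automobiles"), relevant)
print(free)        # precision 0.5, recall 1/3
print(controlled)  # precision 1.0, recall 1.0
```

In this invented case the free-text search misses two relevant documents that use other synonyms and picks up one irrelevant homonym, while the controlled search avoids both problems.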

The Art and Architecture Thesaurus, for example, is used by countless museums around the world to catalogue their collections. AGROVOC, the thesaurus of the UN's Food and Agriculture Organization, is used to index and/or search its AGRIS database of worldwide literature on agricultural research.

Functions of a Thesaurus

a) it provides a standard vocabulary for a given subject field by exercising control over the terms used in an indexing language. Methods of controlling the vocabulary are:

i) out of all possible synonyms and quasi-synonyms, only one term is selected as a descriptor;

ii) the scope of the meaning of the term is clearly indicated in a scope note, so that the selected meaning is unambiguous;

iii) a definite rule is followed for compound terms;

iv) word forms, number (singular/plural) and spellings are standardized; and

v) homonyms are differentiated by qualifiers;

b) it shows the intrinsic semantic relationships existing between terms, and thus provides a system of references between terms;

c) it helps the indexer and the searcher in the choice of preferred terms; 

d) it provides hierarchical display of terms so that a search can be broadened or narrowed systematically; 

e) it increases the speed of retrieval by use of indexing terms and search terms; and 

f) it provides a map of a given subject field, which helps to understand the structure of the field.
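Function (d), broadening or narrowing a search through the hierarchy, can be sketched in a few lines. This is an illustrative assumption rather than a prescribed method: the `NARROWER` table, its sample terms, and the function names `narrow`, `broaden` and `explode` are all hypothetical.

```python
# Hypothetical hierarchy: broader term -> list of its narrower terms.
NARROWER = {
    "Vehicles": ["Motor vehicles", "Bicycles"],
    "Motor vehicles": ["Automobiles", "Trucks"],
}


def narrow(term):
    """Terms one level below `term` (to make a search more specific)."""
    return set(NARROWER.get(term, []))


def broaden(term):
    """Terms one level above `term` (to make a search more general)."""
    return {bt for bt, nts in NARROWER.items() if term in nts}


def explode(term):
    """`term` plus all of its descendants; assumes the hierarchy has no cycles."""
    result = {term}
    for nt in NARROWER.get(term, []):
        result |= explode(nt)
    return result


print(sorted(narrow("Motor vehicles")))  # ['Automobiles', 'Trucks']
print(sorted(broaden("Automobiles")))    # ['Motor vehicles']
print(sorted(explode("Vehicles")))
# ['Automobiles', 'Bicycles', 'Motor vehicles', 'Trucks', 'Vehicles']
```

A searcher who finds too few documents under "Automobiles" can step up with `broaden`, and one who finds too many under "Vehicles" can step down with `narrow`; `explode` gathers a term together with everything beneath it for a single wide search.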


