Help


What is chemical search?


Chemical search is the ability to search a database by normalized chemical formulas and structures, rather than by keywords alone. It depends on preprocessing the database to recognize chemical entities and convert them to a uniform representation (e.g., SMILES). This preprocessing requires sophisticated parsing, and it greatly improves the thoroughness with which a search can be performed.

To illustrate the difference between a conventional keyword search and a chemical search, consider the following example:

You wish to find all references to "aspirin" in the database. Using a keyword search, you would enter "aspirin," and all documents containing that literal word would be returned in response to your query. However, any documents that did not contain the word "aspirin" would be missed, even if they included terms that, chemically speaking, mean "aspirin." For example, "acetylsalicylic acid" is the common chemical name for aspirin. The IUPAC name is 2-(acetyloxy)benzoic acid. The formula is C9H8O4. A document that used only these terms would not be found by a conventional keyword search for "aspirin."

With a chemical search, on the other hand, parsing processes have examined the entire database ahead of time, and whenever the parsers find a chemical entity, they convert it into a uniform representation of that chemical. Our database uses SMILES strings for this purpose. The SMILES string for aspirin is CC(=O)Oc1ccccc1C(=O)O. No matter how a document refers to aspirin, whether by a trade name, a common chemical name, an IUPAC name, etc., the reference is turned into the same SMILES string. Then, when you run your query, it is also converted into that SMILES string behind the scenes. There is now an exact correspondence between your search input and all the various ways "aspirin" may have been represented in the original documents, so you can find them all, even if the documents you seek never contain the literal word "aspirin".
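
The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how our database is implemented: the synonym table, the sample documents, and the function names (keyword_search, build_index, chemical_search) are all hypothetical, and a real system would rely on chemical name parsers and SMILES canonicalization rather than a hard-coded lookup.

```python
from collections import defaultdict

# SMILES string for aspirin, as given in the text above.
ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"

# Hypothetical synonym table. A real system would derive the SMILES string
# with a chemical name parser instead of a hard-coded dictionary.
NAME_TO_SMILES = {
    "aspirin": ASPIRIN,
    "acetylsalicylic acid": ASPIRIN,
    "2-(acetyloxy)benzoic acid": ASPIRIN,
}

# Hypothetical sample documents, keyed by document id.
DOCUMENTS = {
    1: "Patients were given aspirin daily.",
    2: "Synthesis of acetylsalicylic acid from salicylic acid.",
    3: "2-(acetyloxy)benzoic acid was recrystallized from ethanol.",
    4: "Caffeine content was measured by HPLC.",
}


def keyword_search(term):
    """Return ids of documents that contain the literal search term."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if term in text.lower())


def build_index():
    """Preprocess every document once, mapping each SMILES string to doc ids."""
    index = defaultdict(set)
    for doc_id, text in DOCUMENTS.items():
        lowered = text.lower()
        for name, smiles in NAME_TO_SMILES.items():
            if name in lowered:
                index[smiles].add(doc_id)
    return index


def chemical_search(query, index):
    """Normalize the query to SMILES, then look it up in the prebuilt index."""
    # A query that is not a known name is treated as a SMILES string already.
    smiles = NAME_TO_SMILES.get(query.lower(), query)
    return sorted(index.get(smiles, set()))


index = build_index()
print(keyword_search("aspirin"))          # literal matches only
print(chemical_search("aspirin", index))  # matches every known synonym
```

In this sketch the keyword search returns only the document that contains the literal word, while the chemical search also returns the documents that mention the common chemical name or the IUPAC name, because all three were reduced to the same SMILES string when the index was built.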

Chemical search is necessary when searching for chemical compounds if you need your search to be thorough. The number of synonyms that many chemicals have makes it virtually impossible to perform a complete search using only a conventional keyword search. For example, Valium (also called diazepam, 7-chloro-1-methyl-5-phenyl-1,3-dihydro-2H-1,4-benzodiazepin-2-one, and C16H13ClN2O) has over 100 synonyms.
