What is InChI?
"InChI" stands for IUPAC International Chemical Identifier. Like SMILES, it is a text-based way of specifying molecular structures.
Here is a sample InChI entry for CH3CH2OH (ethanol):
InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
Unlike a SMILES string, every InChI entry starts with "InChI=", followed by the version number (currently "1"; standard InChIs append an "S", giving "1S").
The "/" characters divide an InChI string into layers and sublayers, which are:
- Main layer
  - Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
  - Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer describes which atoms are connected by bonds to which other ones.
  - Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are connected to each of the other atoms.
- Charge layer
  - Charge sublayer (prefix: "q"). Gives the net charge, which may be positive or negative.
  - Proton sublayer (prefix: "p"). Gives the number of protons added or removed.
- Stereochemical layer
- Isotopic layer
- Fixed-H layer
- Reconnected Layer
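The layer structure above can be illustrated with a short Python sketch that splits an InChI string on "/" and reads the one-letter prefix of each segment. The function name split_inchi is hypothetical and not part of any InChI software; the sketch assumes the first segment after the version is always the chemical formula, and it does not validate the contents of any layer.

```python
def split_inchi(inchi):
    """Split an InChI string into its version and a list of (prefix, value) pairs.

    Illustrative helper only: it assumes the first segment after the version
    is the chemical formula and performs no chemical validation.
    """
    header = "InChI="
    if not inchi.startswith(header):
        raise ValueError("an InChI string must start with 'InChI='")

    # Everything after "InChI=" is a "/"-separated sequence of segments.
    version, *segments = inchi[len(header):].split("/")

    layers = []
    for index, segment in enumerate(segments):
        if index == 0:
            # The chemical formula carries no prefix.
            layers.append(("formula", segment))
        elif segment:
            # Later segments start with a one-letter prefix such as "c" or "h".
            layers.append((segment[0], segment[1:]))
    return version, layers


version, layers = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(version)  # 1
for prefix, value in layers:
    print(prefix, value)
```

For the ethanol example this prints the version "1", then the formula C2H6O, the connection sublayer 1-2-3, and the hydrogen sublayer 3H,2H2,1H3. A list of pairs is used rather than a dictionary because the same prefix can appear more than once in longer InChI strings.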
InChI strings are human-readable, but they should not be confused with the InChIKey, a fixed-length (27-character) hashed representation of an InChI string that is not human-readable. InChIKeys were developed to make compounds easier to search for, because a full InChI string can be long enough to cause problems for search engines.
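Because an InChIKey has a fixed shape, it is easy to tell apart from a full InChI string in code. The following is a minimal sketch, assuming the layout used by current standard InChIKeys: a block of 14 uppercase letters, a hyphen, a block of 10 uppercase letters, a hyphen, and one final letter. The name looks_like_inchikey is hypothetical, and the check only inspects the format; it cannot tell whether a key corresponds to a real compound.

```python
import re

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the shape of an InChIKey (format check only)."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


# The key commonly listed for ethanol's standard InChI.
print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
# A full InChI string does not match the key format.
print(looks_like_inchikey("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Generating a key from an InChI string requires the hashing procedure implemented in the official InChI software or a cheminformatics toolkit, which this sketch does not attempt.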