Help

What is InChI?


"InChI" stands for IUPAC International Chemical Identifier. Like SMILES, it is a text-based way of specifying molecular structures. Here is a sample InChI for CH3CH2OH (ethanol):

InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3

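Unlike a SMILES string, every InChI starts with the prefix "InChI=", followed by the version number (currently "1"). Standard InChIs add an "S" after the version number ("InChI=1S/...").

Each "/" in an InChI string starts a new segment. Each segment is one sublayer, and related sublayers are grouped into layers: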


  • Main layer

    • Chemical formula (no prefix). This is the only sublayer that must occur in every InChI.
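    • Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens) are numbered in sequence; this sublayer lists which numbered atoms are bonded to each other. In the ethanol example, "c1-2-3" means atom 1 (carbon) is bonded to atom 2 (carbon), which is bonded to atom 3 (oxygen).
    • Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are attached to each of the other atoms. In the ethanol example, "h3H,2H2,1H3" means atom 3 carries one hydrogen, atom 2 carries two, and atom 1 carries three.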

  • Charge layer

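    • Charge sublayer (prefix: "q"). Gives the net charge, whether positive or negative.
    • Proton sublayer (prefix: "p"). Gives the number of protons added to or removed from the structure.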

  • Stereochemical layer (prefixes: "b", "t", "m", "s")
  • Isotopic layer (prefix: "i")
  • Fixed-H layer (prefix: "f")
  • Reconnected layer (prefix: "r")
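
The layer structure described above can be illustrated with a short Python sketch that splits an InChI string on "/". This is a simplified illustration, not a validating parser: the function name split_inchi is hypothetical, and the code assumes the first segment after the version is the unprefixed chemical formula and that every later segment begins with a one-letter prefix.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula, and prefixed sublayers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("an InChI string must start with 'InChI='")
    version, *segments = inchi[len(prefix):].split("/")
    # Assumption: the first segment after the version is the unprefixed formula.
    formula = segments[0] if segments else None
    # Every later segment is treated as a sublayer: one prefix letter, then content.
    sublayers = [(seg[0], seg[1:]) for seg in segments[1:] if seg]
    return {"version": version, "formula": formula, "sublayers": sublayers}


parsed = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(parsed["version"])    # 1
print(parsed["formula"])    # C2H6O
print(parsed["sublayers"])  # [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
```

The sublayers are kept as a list of pairs rather than a dictionary because the same prefix letter can appear more than once in longer InChI strings.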


InChI strings are human-readable and should not be confused with the InChIKey, a fixed-length (27-character) hashed form of an InChI string that is not human-readable. Because it is a hash, an InChIKey cannot be converted back into the InChI string it was generated from. InChIKeys were developed to make compounds easier to search for, since full InChI strings can be long enough to cause problems for search engines.
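
Because an InChIKey has a fixed layout, its shape can be checked with a regular expression. The Python sketch below assumes the current 27-character layout: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and 1 uppercase letter. The helper name looks_like_inchikey is hypothetical, and the check covers the format only; it cannot confirm that a key corresponds to a real compound.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter: 27 characters in total.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the shape of an InChIKey (format check only)."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


# The InChIKey for ethanol's standard InChI, used here as a format example.
print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))       # True
# A full InChI string is not an InChIKey.
print(looks_like_inchikey("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))  # False
```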
