The National Institute of Standards and Technology (NIST) has updated its database of chemical fingerprints, called mass spectra, that are used to identify unknown chemical compounds. The NIST Mass Spectral Library, now updated to a version called NIST20, is used in health care, drug discovery, foods and fragrances, oil and natural gas, environmental protection, forensic science, and almost every other industry that manufactures or measures physical items.

“If you have a mysterious substance—you have no idea what it is—you generate its fingerprints then run those prints through our library,” said Tytus Mak, NIST biostatistician. “If you find a match, you know what the substance is.”

Those chemical fingerprints are generated using a mass spectrometer, which breaks molecules into pieces and then lines those pieces up on a graph according to their mass. The resulting mass spectrum appears as a series of vertical lines that form a unique pattern for each compound. The NIST Mass Spectral Library comes pre-installed on many instruments, and users can purchase the update from their instrument manufacturer or other distributors. Collections of mass spectra used in specialized areas of research can be downloaded for free from the NIST website.
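To make the idea of running a fingerprint through a library concrete, here is a minimal sketch in Python. It is an illustration only, not the search algorithm NIST or any instrument vendor uses. It assumes each spectrum can be represented as a mapping from peak position (mass-to-charge ratio, rounded to an integer) to intensity, and it scores matches with plain cosine similarity, a common textbook choice. The compound names and peak values are hypothetical and were made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {peak position: intensity} dicts."""
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(intensity * b.get(mz, 0.0) for mz, intensity in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, library):
    """Return (name, score) for the library entry most similar to the query."""
    scored = ((name, cosine_similarity(query, spectrum))
              for name, spectrum in library.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy library: invented names and peak lists, not real reference data.
TOY_LIBRARY = {
    "compound_A": {43: 100.0, 58: 35.0, 71: 10.0},
    "compound_B": {51: 20.0, 77: 100.0, 105: 80.0},
}

# Hypothetical measurement of an unknown sample, close to compound_A.
unknown = {43: 95.0, 58: 40.0, 71: 8.0}

name, score = best_match(unknown, TOY_LIBRARY)
print(f"best match: {name} (similarity {score:.3f})")
```

In this toy case the unknown shares all of its peaks with compound_A and none with compound_B, so compound_A comes out on top. A real search has to cope with noise, instrument differences, and a far larger library, which is one reason curated reference spectra matter.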

Mass spectrometry is particularly useful for identifying organic compounds. Part of Mak’s role in this project was to decide, of the countless organic compounds out there, which ones to include in the library. To do this, he scoured the catalogs of chemical manufacturers and lists of important compounds published by private companies, government agencies, and scientific researchers. He then prioritized the compounds based on their relative importance and the cost of purchasing samples for analysis.

This update includes more than 14,000 human and plant metabolites, as well as pesticides and environmental contaminants, chemicals used in manufacturing such as lubricants and surfactants, pharmaceutical drugs, and illicit drugs. After NIST purchased samples of the compounds, chemists ran them through carefully calibrated mass spectrometers. They did this on different instruments under varying conditions, producing multiple mass spectra for each compound. A team of experts then analyzed the data to ensure high accuracy and precision.

“We carefully acquire and curate the data so users can have high confidence in their identifications,” said Sara Yang, NIST computational biologist, who worked on quality control.

Additional details are available at