Determining which substance has actually been produced in a test tube or flask is one of the central tasks of chemistry. For complex or novel compounds in particular, however, this can be extremely time-consuming, even for experienced specialists. A research team from Friedrich Schiller University Jena, Helmholtz-Zentrum Berlin for Materials and Energy, the Helmholtz Institute for Polymers in Energy Applications Jena and the Swiss software company Zakodium Sàrl has now developed an artificial intelligence (AI) system that proposes suitable molecular structures from the raw data of spectroscopic measurements and assesses their plausibility. The system is openly accessible and has been presented in the journal Nature Communications.

    Why structure elucidation is so challenging

    “Anyone who synthesises a molecule must also prove its chemical structure,” says Dr Kevin Jablonka of the University of Jena. He adds: “To do this, chemists typically use analytical techniques such as nuclear magnetic resonance (NMR) spectroscopy, infrared spectroscopy and mass spectrometry. Each of these methods provides clues about the structure, but often only to a limited extent. The many individual measurement signals therefore form a kind of chemical puzzle that must be solved correctly.”

    Structure analysis is often particularly challenging for novel molecules that have never been described before, especially because measurement data obtained in practice are frequently less than ideal. “Impurities in a substance can generate their own signals or overlap with the signals of the target compound,” explains Jablonka. “This is where our system has a particular strength: for proton NMR spectra, which are measured routinely, it can cope with impurities present in real samples.”

    How SECS works

    “The new system, SECS, combines two artificial intelligence approaches,” explains Adrian Mirza, first author of the study. “First, the model learns to translate spectra and molecular structures into a shared mathematical representation. An evolutionary algorithm then refines the results by modifying candidate molecules step by step, adding or removing atoms and bonds and repeatedly testing whether the outcome provides a better fit to the measurement data.”

    The system ultimately produces a ranked list of possible structures, together with similarity scores based on the chemical context.
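
    The two-stage idea described above can be illustrated with a deliberately simplified sketch. It is not the SECS implementation: the "encoder" here is a toy that merely counts atoms of four elements, the similarity measure is an assumed distance-based score, and all function names are hypothetical. It only shows how an evolutionary loop can mutate candidates step by step, keep those that fit the target representation best, and return a ranked list with scores.

```python
import math
import random

# Toy candidate: a tuple of atom counts for (C, H, N, O).
# A real system would work on full molecular graphs and spectra.

def embed(candidate):
    """Toy stand-in for a learned encoder (assumption): counts as a vector."""
    return [float(n) for n in candidate]

def similarity(a, b):
    """Assumed score in (0, 1]; equals 1.0 only when the vectors coincide."""
    return 1.0 / (1.0 + math.dist(a, b))

def mutate(candidate, rng):
    """Add or remove one atom of a randomly chosen element."""
    counts = list(candidate)
    i = rng.randrange(len(counts))
    counts[i] = max(0, counts[i] + rng.choice((-1, 1)))
    return tuple(counts)

def evolve(target_vec, seeds, generations=200, population=20, rng=None):
    """Mutate, score against the target, and keep the best candidates."""
    rng = rng or random.Random(0)

    def score(c):
        return similarity(embed(c), target_vec)

    pool = set(seeds)
    for _ in range(generations):
        pool |= {mutate(c, rng) for c in pool}
        pool = set(sorted(pool, key=score, reverse=True)[:population])
    ranked = sorted(pool, key=score, reverse=True)
    return [(c, score(c)) for c in ranked]

# Pretend the "spectrum" was encoded to the vector of C2H6O.
target = embed((2, 6, 0, 1))
ranked = evolve(target, seeds=[(1, 4, 0, 0)])
for candidate, s in ranked[:3]:
    print(candidate, round(s, 3))
```

    In this toy setting the search starts from a single seed and only ever keeps the best-scoring candidates, so the ranked list it returns plays the same role as the scored proposals mentioned above, albeit on a vastly simpler problem.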

    Comparable with experienced specialists

    “In a benchmark test involving different spectroscopic methods, SECS identified the correct molecular structure as its top-ranked prediction in more than 80 per cent of cases,” says Jablonka, describing the system’s performance. It also proved capable of matching human experts in direct comparison. “In a pilot study, we asked chemists to solve 20 challenging NMR structure-elucidation problems,” Jablonka explains. The result: the AI achieved a level of performance comparable to that of the participating specialists. 

    “However, we do not see SECS as a replacement for human expertise,” Mirza emphasises. “The system can provide a highly useful second opinion.” If the proposed structures are plausible and receive high scores, this reinforces confidence in the interpretation. “If, on the other hand, the suggested structures differ substantially from the expected molecular structure, it may be worth taking a closer look,” Jablonka adds.

    An open tool for research

    The source code, model data and a test version of the application are publicly available. According to the researchers, the current web version is primarily designed for the direct evaluation of one-dimensional proton NMR raw data. Support for additional spectral types and more complex raw datasets is planned for future releases.
