15 January 2015, 16:15
Ernst-Abbe-Platz 2, seminar room 3423
Automatic (bio-)chemical annotation of multistage mass spectrometry data
Dr. Lars Ridder
(Laboratory of Biochemistry, Wageningen University)
High-resolution multistage LC-MSn data contains detailed chemical information about the (unknown) compounds observed in metabolite profiling studies. To support full exploitation of this data, we have developed the MAGMa algorithm for automatic substructure-based annotation of multistage spectral trees. For each candidate molecule, the algorithm yields a hierarchical tree of substructures that explains the fragment peaks observed at consecutive MS levels. The resulting candidate score indicates how well the observed hierarchical fragmentation pattern is explained and can be used to rank extensive lists of candidate molecules, e.g. those retrieved from the PubChem database. The method is evaluated on a published benchmark dataset, and we present recent results on the spectral data from the CASMI contest for small molecule identification. Furthermore, we present applications to LC-MSn metabolite profiling data from green tea and from human urine samples obtained after green tea consumption. More than 100 compounds found in the green tea sample data were systematically converted by in silico biotransformation rules that define possible modifications in the human gut and liver before excretion in urine. This systematic virtual library of potential tea metabolites was used as the candidate set for automatic annotation of the urine LC-MSn datasets. In addition to 74 compounds previously identified in the urine samples, 26 further urinary metabolites originating from green tea consumption were putatively identified. Of the annotated metabolites, 77% were not present in the PubChem database, which indicates the importance of combining automatic structure annotation methods with in silico biotransformation to discover novel metabolites.
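The idea of ranking candidate molecules by how well their substructures explain the observed fragment peaks can be illustrated with a deliberately simplified sketch. This is not the MAGMa scoring function: it ignores the hierarchy of MS levels and simply measures the fraction of peak intensity matched by any substructure mass. The names Candidate, explained_fraction and rank_candidates, the 0.005 Da tolerance, and the assumption that substructure masses are already computed are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate molecule with hypothetical, precomputed substructure masses."""
    name: str
    fragment_masses: list


def explained_fraction(peaks, fragment_masses, tol=0.005):
    """Return the fraction of total peak intensity explained by the candidate.

    peaks is a list of (mz, intensity) pairs. A peak counts as explained if
    any substructure mass lies within tol (in Da, an assumed value) of its m/z.
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - mass) <= tol for mass in fragment_masses)
    )
    return explained / total


def rank_candidates(peaks, candidates, tol=0.005):
    """Return (score, name) pairs sorted from best to worst explained."""
    scored = [
        (explained_fraction(peaks, c.fragment_masses, tol), c.name)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy data: three peaks and two made-up candidates.
peaks = [(100.0, 50.0), (150.0, 30.0), (200.0, 20.0)]
candidates = [
    Candidate("B", [200.0]),
    Candidate("A", [100.001, 150.002]),
]
ranking = rank_candidates(peaks, candidates)
```

In this toy example candidate A explains the two most intense peaks and is therefore ranked above candidate B. A realistic implementation would generate the substructures from the candidate's molecular graph and score each MS level within the spectral tree.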