Motivation: Glycan chains are synthesized by the combined action of several kinds of glycosyltransferases (GTs). Thus, once we know the repertoire of GTs in the genome, transcriptome or proteome, it should in principle be possible to predict the repertoire of possible glycan structures in an organism, or in a cell at a specific stage. Here, we show that a repertoire of glycan structures can be predicted from the set of GTs in the transcriptome. By using knowledge of characteristic features of glycan structures, we can make such predictions even from incomplete or noisy data such as DNA microarray data.
Results: First, we constructed a reaction pattern library consisting of the bond-formation patterns of GT reactions, and we investigated the co-occurrence frequencies of all reaction patterns in the glycan database. We then predicted glycan structures using this library and a co-occurrence score, supplemented by a penalty score. Next, we evaluated prediction performance by leave-one-out cross-validation, using individual reaction pattern profiles from the KEGG GLYCAN database as virtual expression profiles. The prediction accuracy was 81%. Finally, we applied the prediction method to real expression data. Using expression profiles from human carcinoma cells, the method predicted glycan structures carrying sialic acid and the sialyl Lewis X epitope, in good agreement with experimental results.
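To make the co-occurrence scoring and the leave-one-out evaluation concrete, the following is a minimal Python sketch under stated assumptions. The abstract does not give the scoring formulas, so the score used here is a hypothetical stand-in: summed pairwise co-occurrence frequencies of the expressed patterns a candidate uses, minus a fixed penalty for each pattern the candidate requires that is not expressed. The glycan IDs and pattern names are placeholders, not KEGG GLYCAN entries. Candidates are also simply drawn from the toy set rather than assembled from reaction patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each glycan ID maps to the set of reaction-pattern
# IDs (bond-formation patterns) needed to build it. Placeholder names only.
GLYCANS: dict[str, frozenset[str]] = {
    "G1": frozenset({"a", "b", "c"}),
    "G2": frozenset({"a", "b"}),
    "G3": frozenset({"a", "c"}),
    "G4": frozenset({"d", "e"}),
}


def cooccurrence(profiles):
    """Count, for each unordered pattern pair, how many glycans contain both."""
    counts = Counter()
    for patterns in profiles:
        for a, b in combinations(sorted(patterns), 2):
            counts[(a, b)] += 1
    return counts


def score(candidate, expressed, counts, n_glycans, penalty=1.0):
    """Hypothetical score: reward co-occurring pairs among the expressed
    patterns the candidate uses; subtract `penalty` for each pattern the
    candidate needs that is absent from the expression profile."""
    used = candidate & expressed
    reward = sum(counts[pair] / n_glycans
                 for pair in combinations(sorted(used), 2))
    missing = len(candidate - expressed)
    return reward - penalty * missing


def predict(expressed, candidates, counts, n_glycans, penalty=1.0):
    """Return the candidate glycan ID with the highest score."""
    return max(candidates,
               key=lambda gid: score(candidates[gid], expressed,
                                     counts, n_glycans, penalty))


def leave_one_out_accuracy(glycans, penalty=1.0):
    """Hold out each glycan in turn, rebuild co-occurrence statistics from
    the rest, and use the held-out pattern set as a virtual expression
    profile. A hit means the prediction has the same pattern set."""
    hits = 0
    for held_out, profile in glycans.items():
        training = [p for gid, p in glycans.items() if gid != held_out]
        counts = cooccurrence(training)
        predicted = predict(profile, glycans, counts, len(training), penalty)
        hits += glycans[predicted] == profile
    return hits / len(glycans)


if __name__ == "__main__":
    # A noisy profile: "x" stands in for a spurious microarray signal.
    all_counts = cooccurrence(GLYCANS.values())
    print(predict(frozenset({"a", "b", "c", "x"}), GLYCANS,
                  all_counts, len(GLYCANS)))  # G1
    print(leave_one_out_accuracy(GLYCANS))  # 1.0
```

The penalty term is what keeps a larger structure from winning merely because it contains the expressed patterns: in this sketch, with the profile {a, b}, G2 outscores G1 because G1 also requires the unexpressed pattern c.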