2025 AOCS Annual Meeting & Expo.
Analytical
Hyukjin Kwon
Graduate student
Kansas State University, United States
Yonghui Li, PhD
Associate Professor
Kansas State University
Manhattan, Kansas, United States
Osborne fractionation is a widely used classification system for plant proteins, yet the molecular-level distinctions between the proteins in each class remain unclear. This study characterizes 1,034 seed storage proteins from 175 species by Osborne class, using protein sequences and monomeric structures predicted by AlphaFold3. Albumins and prolamins exhibited distinctive features: albumins showed a high cysteine content (5.95 ± 1.81%) and prolamins a pronounced hydrophobicity index (0.35 ± 0.27). In contrast, globulins and glutelins showed minimal variation, suggesting similar biochemical profiles.
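As an illustration of the two sequence-level features reported above, the following minimal sketch computes cysteine content and a hydrophobicity score from a one-letter amino acid sequence. The abstract does not state which hydrophobicity scale was used; the Kyte-Doolittle scale (averaged per residue, often called GRAVY) is assumed here purely for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (ASSUMED scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(sequence: str) -> list:
    """Keep only the 20 standard residues; fail loudly if none remain."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def cysteine_percent(sequence: str) -> float:
    """Cysteine content as a percentage of standard residues."""
    residues = _standard_residues(sequence)
    return 100.0 * residues.count("C") / len(residues)


def mean_hydropathy(sequence: str) -> float:
    """Average Kyte-Doolittle hydropathy per residue (GRAVY-style score)."""
    residues = _standard_residues(sequence)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

Applied to every sequence in a class, the mean and standard deviation of these values would give summaries of the kind quoted above, although the exact numbers depend on the scale actually used.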
Three machine learning algorithms, support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF), were used to classify the four protein classes, with SVM achieving 95.4% accuracy. However, its confusion matrix revealed lower accuracy (89.4%) in distinguishing globulins from glutelins. To address this, two binary graph convolutional network (GCN) classifiers were trained on protein contact maps, achieving 100.0% and 96.1% accuracy for the albumin/prolamin and globulin/glutelin classifications, respectively. Saliency map interpretation of the GCN models revealed that certain surface-exposed residues (A, V, P, and C) were key to differentiating albumins from prolamins. For globulins and glutelins, the models highlighted differences in E, Q, S, and G within exposed local structures. This pattern generalized across the entire dataset, and the highlighted regions were localized primarily to loop regions of the cupin domain, outside the beta-barrel core.
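To make the contact-map input concrete, the sketch below builds a binary contact map from residue coordinates and applies a single graph-convolution step of the standard form ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not the authors' model: the use of C-alpha coordinates, the 8 Å cutoff, the feature matrix, and all function names are assumptions made for illustration, and training, pooling, and saliency computation are omitted.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary contact map: 1.0 where two distinct residues lie within `cutoff`.

    `coords` holds one (x, y, z) point per residue (ASSUMED: C-alpha atoms,
    distances in angstroms; the 8.0 cutoff is an assumed convention).
    """
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, value) for value in row] for row in propagated]


# Toy example: residues 0 and 1 are in contact, residue 2 is isolated.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
toy_output = gcn_layer(toy_map, [[1.0], [3.0], [5.0]], [[1.0]])
```

In the toy example, the two residues in contact end up with the average of their input features, while the isolated residue keeps its own value. This neighbor mixing is what lets a GCN pick up local structural context from a contact map.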
All-atom and coarse-grained molecular dynamics simulations were performed to compare protein behavior in different Osborne solvent systems. These simulations highlighted how solvent environments, including salt concentration and hydrophobicity, influence protein dynamics and aggregation.
Overall, this work leverages the largest dataset of seed storage proteins to date, provides novel insights into the global and local features that differentiate the Osborne classes, and explores the dynamics of plant proteins under different solvents, advancing both the fundamental understanding and practical application of plant proteins.