As efforts are underway to improve diagnostic tools for cancer, microRNAs are at the forefront of biomedical research.
Cancer is one of the most devastating diseases in the world. In 2023, more than 1.9 million new cancer cases and 609,820 deaths are projected to occur in the United States alone. As efforts are underway to improve diagnostic tools, microRNAs are at the forefront of biomedical research.
MicroRNAs, or miRNAs, are a class of small non-coding ribonucleic acids (RNAs), which are essential for all biological functions. The main role of miRNAs in the human body is gene regulation. As such, they regulate a variety of biological and pathological processes, including the formation and development of cancer. In fact, many cancers are closely associated with miRNA functionality.
The association of miRNAs with cancer development has spurred interest in investigating miRNA expression profiling data as a potentially less invasive diagnostic tool for early detection. Machine learning methodologies have been used to develop high-performance pan-cancer classification models and to identify potentially novel miRNA biomarkers for clinical investigation. However, integrating these tools into clinical environments depends on understanding how the data science techniques relate to established biological processes.
To further explore the feasibility of miRNAs as biomarkers for cancer classification and to improve clinical classification applications, researchers from Florida Atlantic University’s College of Engineering and Computer Science created a multiclass cancer diagnostic model using miRNA expression profiles. Their methodology used an iterative process that applied several key techniques to a continually increasing dataset of miRNA expression quantification data.
For the study, researchers assessed how the top miRNA features selected by machine learning models relate to clinically and biologically verified miRNA biomarkers. They developed Support Vector Machine and Random Forest machine learning models for cancer classification, and iteratively added cancer classes to the multiclass models. They looked at the relationship between the relevant miRNAs identified through feature selection and the performance metrics of the classification models across 20 iterations. Each iteration added another primary sample site to the multiclass models, increasing the number of cancer types involved.
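The iterative procedure described above can be sketched in a few lines of Python with scikit-learn. This is a minimal illustration and not the researchers' actual code: the expression data is synthetic, the helper names `make_synthetic_expression` and `iterate_over_sites` are hypothetical, and the univariate ANOVA F-test used to pick a 20-feature signature is an assumption, since the article does not say which feature selection method the study used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def make_synthetic_expression(n_sites=5, n_per_site=40, n_mirnas=100, seed=0):
    """Hypothetical stand-in data: each site over-expresses three of its own miRNAs."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for site in range(n_sites):
        block = rng.normal(0.0, 1.0, size=(n_per_site, n_mirnas))
        block[:, site * 3:(site + 1) * 3] += 3.0  # site-specific signal
        X_parts.append(block)
        y_parts.append(np.full(n_per_site, site))
    return np.vstack(X_parts), np.concatenate(y_parts)


def iterate_over_sites(X, y, k=20, seed=0):
    """Add one primary site per iteration, then refit selection and both models."""
    results = []
    sites = np.unique(y)
    for n_classes in range(2, len(sites) + 1):
        mask = np.isin(y, sites[:n_classes])
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[mask], y[mask], test_size=0.3, stratify=y[mask], random_state=seed
        )
        # Assumed selection method: keep the k miRNAs with the highest F-score.
        selector = SelectKBest(f_classif, k=k).fit(X_tr, y_tr)
        signature = np.flatnonzero(selector.get_support())
        models = {
            "svm": SVC(kernel="linear"),
            "rf": RandomForestClassifier(n_estimators=50, random_state=seed),
        }
        scores = {}
        for name, model in models.items():
            model.fit(selector.transform(X_tr), y_tr)
            predictions = model.predict(selector.transform(X_te))
            scores[name] = accuracy_score(y_te, predictions)
        results.append(
            {"n_classes": n_classes, "signature": signature, "scores": scores}
        )
    return results


X, y = make_synthetic_expression()
results = iterate_over_sites(X, y)
for r in results:
    print(r["n_classes"], r["scores"])
```

Comparing the `signature` arrays from one iteration to the next shows how the selected miRNAs shift as classes are added, which is the kind of change the study tracked against verified biomarkers.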
Researchers examined the change in success metrics as more cancer types were introduced to the subset, how the 20-miRNA signature changed as more cancer types were introduced to the subset, and the characteristics of the full dataset via principal component analysis, a popular technique for analyzing large datasets containing a high number of dimensions or features.
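For readers unfamiliar with principal component analysis, the sketch below shows the basic computation with NumPy: center the data, take a singular value decomposition, and project each sample onto the leading components. The data here is randomly generated for illustration, and the function name `pca_project` and the three hypothetical tissue groups are assumptions, not part of the study.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project samples onto the top principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Hypothetical data: 3 tissue groups, 30 samples each, 50 miRNAs per sample.
groups = []
for g in range(3):
    block = rng.normal(size=(30, 50))
    block[:, g * 5:(g + 1) * 5] += 4.0  # group-specific expression shift
    groups.append(block)
X = np.vstack(groups)
labels = np.repeat(np.arange(3), 30)

coords, explained = pca_project(X)
for g in range(3):
    # Mean position of each group in the two-component space.
    print(g, coords[labels == g].mean(axis=0).round(2))
```

Plotting `coords` colored by `labels` gives the kind of two-dimensional view in which groups with distinct expression profiles appear as separate clusters.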
Unlike previous studies, which have only focused on miRNA feature signatures for a final multiclass dataset, this study tracked changes in clinical and biological relevance after each addition of a cancerous tissue type.
Results of the study, published in the Institute of Electrical and Electronics Engineers’ journal IEEE Access and available on IEEE Xplore, indicate that models with a greater number of cancer classes shift toward focusing on cancer-diverse miRNAs of greater relevance with characterized functionality. The study suggests that miRNAs may be highly unique to specific cancerous tissues and can be strong biomarkers for detection and classification; however, currently verified biomarkers skew toward miRNAs that are shared across many cancers.
The study provides insights into potential relationships between the overall clinical relevance of the feature extraction signature and the success metrics of the models, and demonstrates the feasibility of using a multi-tissue miRNA cancer signature as a generalizable signature for single-class cancer detection in a number of prominent cancers.
Findings showed that as the number of cancer classes increased, the performance metrics decreased, yet the percentage relevance of the miRNA feature selection signature slightly increased before stabilizing. In addition, after conducting principal component analysis, the non-cancer tissues from all samples had very similar expression visualizations, while all cancerous tissues had unique profiles.
“MicroRNAs have significant promise for future diagnostic tests because they can be detected directly from biological fluids such as blood, urine or saliva as well as the availability of high-quality measurement techniques for miRNAs,” said Oneeb Rehman, corresponding author and a Ph.D. candidate in the Department of Electrical Engineering and Computer Science within FAU’s College of Engineering and Computer Science. “This makes understanding and characterizing the biological basis behind potential miRNA classification tools crucial for integration into clinical environments.”
Under Rehman’s supervision, a team of senior design undergraduate students and co-authors Charles Briandi and Eyan Eubanks, led by Matthew Acs and Richard Acs, from the Department of Electrical Engineering and Computer Science, participated in the study. Hanqi Zhuang, Ph.D., co-author and chair and professor of the Department of Electrical Engineering and Computer Science, served as the team’s mentor.
“This study, which explored the relationship between the composition of microRNAs and various types of cancers, has important implications for the potential use of miRNAs as biomarkers in both research and the clinical field,” said Stella Batalama, Ph.D., dean, FAU College of Engineering and Computer Science. “What is especially impressive about this research is that it involved a number of our undergraduate students who collaborated to investigate a better way to manage a disease that impacts millions of people around the world each year.”
The research utilized data from the Genomic Data Commons Data Portal, which was sponsored by the National Cancer Institute.
Cancers originating from different primary sample sites have specific patterns of miRNA expression, as revealed by principal component analysis, a popular technique for analyzing large datasets. These patterns allow for the highly accurate classification of cancer types by machine learning models.