Online resource gives researchers easy access to gene data
A National Institutes of Health-developed online platform is making information about genes more easily accessible for biomedical researchers.
The newly accessible information comes from public gene expression data, which describe the degree to which genes are turned on or off under certain conditions. Until recently, this wealth of information has remained largely untapped because many researchers lack the expertise needed for data retrieval, processing and analysis.
NIH’s National Institute of Allergy and Infectious Diseases (NIAID) developed the platform, which takes a crowdsourcing approach to making public databases containing millions of gene expression profiles available to scientists, who can reuse the data to generate and address new research questions.
“There are problems facing researchers who want to reuse the data in an effective way,” says John Tsang, chief of the Systems Genomics and Bioinformatics Unit in NIAID’s Laboratory of Systems Biology. “One is it requires a lot of computational expertise.”
The website provides instructional videos and a step-by-step tutorial to help users navigate the free platform, called OMics Compendia Commons (OMiCC). The platform uses crowdsourcing to harness the expertise of the research community: users of OMiCC can create groups of gene expression data and annotate them by assigning parameters, such as sample type and disease, using a standardized vocabulary. The platform then saves these user-created groups and associated annotations, making them available to others for reuse.
Another challenge OMiCC addresses is that the structured meta-information critical for data reuse and cross-study analyses is not readily available, because public database entries typically contain raw study data, which need to be structured for analysis. With user-created annotations supplying that structure, the platform enables the broader biomedical research community to generate and test hypotheses through reuse and analysis of existing data sets.
According to Tsang, users can pool these groups of data within OMiCC and combine information from multiple studies, a statistical approach known as meta-analysis, to search for biological relationships. He believes that as the platform’s user community grows, it will develop into a rich resource that can transform the increasing amounts of public data into novel biological insights.
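For readers unfamiliar with meta-analysis, the idea of pooling results across studies can be illustrated with a minimal Python sketch. It is not OMiCC's code, and the article does not say which statistical method the platform applies; the sketch assumes one common textbook choice, a fixed-effect, inverse-variance weighted average. The function name and all numbers are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect sizes using inverse-variance weights.

    Illustrative only: a standard fixed-effect meta-analysis, assumed here
    for the example and not taken from OMiCC.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect size")
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log fold changes for one gene (disease vs. control)
# reported by three separate studies, with their standard errors.
log_fold_changes = [0.8, 1.1, 0.5]
std_errors = [0.3, 0.4, 0.2]

pooled, pooled_se = fixed_effect_meta(log_fold_changes, std_errors)
print(f"pooled log fold change: {pooled:.3f} (SE {pooled_se:.3f})")
```

The pooled estimate leans toward the studies with the smallest standard errors, which is why combining several modest studies can reveal a relationship that no single study shows clearly.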
“Knowing the gene expression program of a particular cell, organ or tissue, you can use that data to tell whether someone with a tumor—of one type versus another—may survive longer,” concludes Tsang. “There’s a lot of data out there. The big bottleneck is actually transforming that data into insights.”