New UCSD search engine can sort through functional genomics data

Researchers can ID patterns of health and illness, says bioengineering prof Sheng Zhong.

University of California-San Diego bioengineers have created the first online search engine for web-based functional genomics data, enabling researchers to comb through the massive amounts of data held in Internet repositories, work that might someday lead to medical breakthroughs.

Called GeNemo, the new search system is designed to help uncover functions in specific parts of genomes that are associated with normal physiology as well as diseases of certain organs and tissues. By searching functional genomics data, which record the diverse activities of every piece of an organism's genome, researchers can identify patterns directly relevant to health and illness.

Unlike genomics, which deals with the static aspects of genomic information, functional genomics focuses on the dynamic aspects, attempting to use the wealth of data produced by genome sequencing projects to describe gene and protein functions and interactions.

According to Sheng Zhong, associate professor of bioengineering at UC-San Diego, GeNemo searches user-input data against online functional genomic datasets, including the entire collection of Encyclopedia of DNA Elements (ENCODE) and mouseENCODE datasets. Unlike text-based search engines or DNA sequence searches, GeNemo matches patterns in functional genomic regions, so researchers can search inside the functional data measured across the genome rather than only the text that describes it.

“Genomics big data research converged to create a set of new data types. These are not genome sequences, not texts, but activities,” says Zhong. “It’s a new data format. If you think of functional genomic data files as video files, then the ‘text search’ is like searching by keywords in the title or the description of a video file. The ‘inside data search’ is like searching for a video clip by pattern matching within the video itself.”
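
The contrast Zhong draws can be illustrated with a minimal Python sketch. It is not GeNemo's actual method, which the article does not describe. The dataset names, descriptions, and binned signal values are hypothetical, and Pearson correlation is simply one assumed way to score how closely two signal tracks resemble each other.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length signal vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("signals must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat signal carries no pattern to match
    return cov / sqrt(var_x * var_y)


def text_search(keyword: str, descriptions: Dict[str, str]) -> List[str]:
    """'Text search': match a keyword against dataset descriptions only."""
    key = keyword.lower()
    return sorted(name for name, text in descriptions.items() if key in text.lower())


def inside_search(query: List[float],
                  signals: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """'Inside data search': rank datasets by similarity of the signal itself."""
    scored = [(name, pearson(query, signal)) for name, signal in signals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical datasets: a description plus a binned intensity track for one region.
descriptions = {"dataset_A": "H3K4me3 ChIP-seq, liver", "dataset_B": "DNase-seq, heart"}
signals = {"dataset_A": [0, 1, 5, 1, 0], "dataset_B": [5, 1, 0, 1, 5]}
user_signal = [0, 2, 10, 2, 0]

print(text_search("liver", descriptions))
print(inside_search(user_signal, signals))
```

In this toy case the keyword search only finds datasets whose description mentions the term, while the signal-based search ranks dataset_A first because its peak has the same shape as the user's data, whatever its description says.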

New data types have emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. Their results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). GeNemo reports any genomic regions, ranging from a hundred bases to a hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns with the user's data. By clicking on a search result, users can visually compare their data with the matching datasets and navigate the identified genomic regions.
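
As a rough illustration of comparing region files, the Python sketch below parses the first three columns of BED-style lines (chromosome, start, end) and ranks datasets by how many bases they share with a user's regions. The Jaccard score, the function names, and the sample coordinates are all assumptions made for illustration; the article does not say how GeNemo scores similarity.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns, skipping blank and header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on one chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count the bases covered by a set of intervals, without double counting."""
    return sum(end - start for _, start, end in merge(intervals))


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Bases shared by both sets divided by bases covered by either set."""
    union = covered_bases(a + b)
    if union == 0:
        return 0.0
    shared = covered_bases(a) + covered_bases(b) - union
    return shared / union


def rank_datasets(query: List[Interval],
                  datasets: Dict[str, List[Interval]]) -> List[Tuple[str, float]]:
    """Order hypothetical datasets from most to least similar to the query."""
    scored = [(name, jaccard(query, regions)) for name, regions in datasets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user peaks and two made-up datasets.
user_peaks = parse_bed(["chr1\t100\t200", "chr1\t300\t400"])
datasets = {
    "close_match": parse_bed(["chr1\t100\t200", "chr1\t300\t380"]),
    "weak_match": parse_bed(["chr1\t150\t250", "chr2\t0\t50"]),
}
print(rank_datasets(user_peaks, datasets))
```

Merging intervals before counting keeps overlapping peaks in the same file from inflating the score. A tool working with bigWig intensities rather than peak regions would need a different similarity measure.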

“This will enable physicians, clinicians, students and even the general public to look at the functional aspects of the genome very quickly, in a matter of minutes,” contends Zhong, who says that research teams around the world are using the publicly available tool to conduct functional genomic data searches online.

So far, he says, the response from researchers has been “way beyond our expectations.”

GeNemo is designed to help researchers keep up with the exponentially growing volume of functional genomic data, which is creating challenges for scientists. ENCODE and mouseENCODE data are incorporated in GeNemo as epigenetic data sets for search and visualization, giving researchers better insight into the searchable data, which is estimated at more than 10 terabytes.

However, Zhong says that this represents only a small piece of the functional genomics data that exists. “There’s much more data out there, probably 10 times more, held by different research institutions and hospitals, including the National Institutes of Health,” he adds.

Zhong sees the potential for marrying functional genomics data with the electronic health records of patients. “That’s the way to go,” he concludes. “But, it has to be done diligently, carefully, with total respect for issues of ethics and privacy.”

Ultimately, Zhong believes that the explosion of mobile devices used to track personal health, lifestyle habits, socio-economic status and environmental exposures, combined with personal genomics, will yield far-reaching benefits.
