University of California San Diego bioengineers have created what is arguably the first Google-like search engine for functional genomics data, according to recent work published in the journal Nucleic Acids Research. The search engine is called GeNemo, a name that combines “Ge” from the word “gene” with “Nemo” from the movie “Finding Nemo.” According to the researchers, it is the first software released for searching functional genomic data online.
GeNemo Search Engine Overview
Developed by a team led by Sheng Zhong’s bioengineering lab at UC San Diego, GeNemo addresses a pressing challenge: effectively searching functional genomic data from online data repositories. Functional genomics data record the diverse activities of every piece of an organism’s genome, and these genome functions are directly relevant to health and disease. The new search system may help researchers uncover the functional roles of specific parts of genomes that are associated with normal physiology or disease in specific organs and tissues.
How It Works
GeNemo queries user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo’s searches are based on pattern matching of functional genomic regions.
Instead of just “searching by text,” the new tool allows researchers to search inside the functional data. Searching for binding patterns that are similar to that of a novel transcription factor is just one example.
“If you think of functional genomic data files as video files, then the ‘text search’ is like searching by keywords in the title or the description of a video file. The ‘inside data search’ is like searching for a video clip by pattern matching within the video itself,” explained Zhong in the paper.
“Functional genomic assays are producing massive amounts of data, in challenging data types. We have developed an online tool that empowers users to input any complete or partial functional genomic dataset, for example, a binding intensity file like bigWig, or a peak file,” explained UC San Diego bioengineering scientist Xiaoyi Cao, a joint first author on the paper. “GeNemo reports any genomic regions, ranging from 100 bases to 100,000 bases, from any of the online ENCODE datasets that share similar functional patterns such as binding, modification and accessibility.”
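To make the contrast with text search concrete, the sketch below ranks stored datasets by how much their peak regions overlap a query peak set, using a simple base-pair Jaccard index. This is an illustrative stand-in only: the article does not describe GeNemo's actual matching algorithm, and the function names, dataset names and coordinates here are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(intervals: List[Interval]) -> int:
    """Count the bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(query: List[Interval], target: List[Interval]) -> float:
    """Bases covered by both peak sets divided by bases covered by either."""
    union = covered_bases(query + target)
    if union == 0:
        return 0.0
    intersection = covered_bases(query) + covered_bases(target) - union
    return intersection / union


def rank_datasets(
    query: List[Interval], datasets: Dict[str, List[Interval]]
) -> List[Tuple[str, float]]:
    """Return (dataset name, score) pairs, most similar first."""
    scores = {name: jaccard(query, peaks) for name, peaks in datasets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical peak sets: the match is found from the data, not from labels.
query_peaks = [(100, 200), (300, 400)]
stored = {
    "dataset_a": [(150, 250), (300, 400)],
    "dataset_b": [(1000, 1100)],
}
ranking = rank_datasets(query_peaks, stored)
```

Nothing in this ranking depends on a dataset's title or description, which is the point of the video analogy above. A production system would also need to handle multiple chromosomes, signal intensities and far larger collections.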
Functional Genomic Assay Data Opportunities
Leveraging DNA sequencing as a high-throughput readout, functional genomic assays can interrogate genome-wide distributions of transcription factor binding (ChIP-seq), epigenetic modifications (ChIP-seq), regulatory regions (DNase-seq, FAIRE-seq) and other functional outcomes. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science.
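For readers unfamiliar with peak files, the following minimal sketch reads the first three columns of BED-style lines (chromosome, start, end) into region records. BED coordinates are zero-based and half-open, so a region's length is end minus start. The `Region` type, the `parse_bed` helper and the sample lines are assumptions made for illustration and are not part of GeNemo.

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # zero-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse BED-style lines, skipping blanks, comments and header lines."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append(Region(fields[0], int(fields[1]), int(fields[2])))
    return regions


# Made-up sample content; real peak files usually carry extra columns.
sample = [
    "track name=example_peaks",
    "chr1\t100\t250\tpeak1",
    "",
    "chr2\t0\t100",
]
regions = parse_bed(sample)
```

Binary bigWig files cannot be read this way and need a dedicated reader.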
The search engine gives research teams a practical way to search and reuse functional genomic data that is already available online.
Paper: “GeNemo: a search engine for web-based functional genomic data,” published in the journal Nucleic Acids Research, doi: 10.1093/nar/gkw299