Frederick Tan

Developmental Biology & Human Health Genomics

Frederick Tan holds a unique position at Embryology in this era of high-throughput sequencing, in which determining DNA and RNA sequences has become one of the most powerful technologies in biology. DNA provides the basic code shared by all our cells to program our development. While there are about 30,000 human genes, 98% of DNA consists of repetitive and regulatory sequences within and between genes. Measuring the specific set of DNA sequences that are transcribed into RNA shows which genes are active, which helps reveal what our tissues are doing and how.

Modern sequencing platforms, such as the Illumina HiSeq 2000, generate only short, ordered sequences, usually 100 consecutive bases (adenine, guanine, cytosine, and thymine) in each reaction. But by doing this on billions of molecules in parallel, these sequencers generate between 100,000 and 1 million times more data than previous methods. That’s where Frederick Tan fits in. Departmental scientists cope with this data avalanche by combining their biological insights with complex computer algorithms and statistical methods. Tan manages the genomics and bioinformatics facilities and guides others in their use.

Tan shares his knowledge by leading workshops and group-study sessions and by teaching courses that familiarize Carnegie and Johns Hopkins researchers with bioinformatics approaches. He started a study group called Data Wranglers Anonymous, where people learn to handle large numbers of data files, conduct exploratory data analyses, and write programs to extract useful information. Tan also created a series called Nitty Gritty Workflows to discuss sequence analysis pipelines. Speakers describe the software programs they used and why they chose specific parameters, details that are important for troubleshooting and establishing best practices. He helped Embryology host the fourth annual Practical Genomics Workshop and has ambitious plans for future training.