Hi, I'm Jeff.
I'm a research scientist in computational biology at Deep Genomics in Toronto. Before that, I did my PhD work on building computational methods to reconstruct the evolutionary history of cancer in individual patients as part of Quaid Morris' lab at the University of Toronto and Memorial Sloan Kettering Cancer Center.
Cancer evolution studies often want to compare the intratumor heterogeneity of different cancers or different samples from the same cancer. This requires some means of quantifying heterogeneity. I show how two commonly used measures of heterogeneity are misleading and propose more robust alternatives.
Examining precision, recall, and specificity from a probabilistic perspective reveals interesting properties of these metrics, and can help explain why medicine prefers sensitivity-specificity to precision-recall when evaluating classifiers.
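As a rough illustration of one such property (a sketch, not necessarily the argument the post itself makes): if sensitivity is read as P(test positive | condition) and specificity as P(test negative | no condition), then precision is P(condition | test positive), which Bayes' rule ties to prevalence. The function name and the example numbers below are made up for illustration.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision, i.e. P(condition | positive test), via Bayes' rule.

    sensitivity = P(positive test | condition)
    specificity = P(negative test | no condition)
    prevalence  = P(condition)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: sensitivity and specificity stay fixed at 0.9,
# while the prevalence of the condition in the tested population varies.
for prevalence in (0.5, 0.1, 0.01):
    p = precision_from_rates(0.9, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  precision={p:.3f}")
```

With the classifier held fixed, precision falls as the condition becomes rarer, whereas sensitivity and specificity are defined conditional on the true state and so do not change with prevalence.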
How should you select a graduate program? How do you get in once you’ve chosen one? Do you even want to go? I reflect on lessons I’ve learned through six years of being a grad student.
Fixing the present working directory in tmux sessions to refer to the logical rather than physical working directory
Computers are annoying. If your home directory includes a symlink, you’re using the tmux terminal multiplexer, and you want your shell’s present working directory to reflect the logical rather than physical working directory, you must invoke magic.
ZFS is an amazing filesystem that offers a number of benefits for bioinformatics work.
Minimum spanning trees and Kruskal’s algorithm are fun!
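For readers who have not met it, here is a minimal sketch of Kruskal's algorithm: sort the edges by weight, then add each edge that joins two previously unconnected components, tracked with a small union-find structure. The edge format `(weight, u, v)` and the toy graph are assumptions chosen for illustration.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning tree (forest, if disconnected).

    `edges` is an iterable of (weight, u, v) tuples, with vertices numbered
    0 .. num_vertices - 1.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the component root, halving paths as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two separate components, so it cannot form a cycle.
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Toy graph with four vertices.
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
mst = kruskal(4, edges)
print(mst, "total weight:", sum(w for w, _, _ in mst))
```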
I developed a dynamic programming algorithm to select the best set of mutually compatible high-scoring segment pairs (HSPs).
When I first developed Kablammo, I ignored strandedness. This was a bad idea.
The full wealth of InParanoid’s output is accessible only through its human-readable results. I wrote code to parse this format.
Much of the difficulty in running Newbler on Ubuntu arises because the application requires 32-bit libraries.
I developed a cool new web-based tool for visualizing BLAST results and exporting publication-ready visualizations.
How many times can we find consecutive occurrences of GATTACA in NCBI’s BLAST database?
You might want to stop Arch Linux from entering sleep mode when the lid is closed. This is easily accomplished.
HTSeq is a wonderfully useful Python library for analyzing high-throughput sequencing data. I implemented an algorithm to perform set subtraction on its GenomicIntervals.
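To give a sense of what interval set subtraction involves, the sketch below works on plain half-open `(start, end)` tuples on a single sequence. It does not use HTSeq's API and is not necessarily the algorithm from the post; the function name is hypothetical.

```python
def subtract_intervals(minuend, subtrahends):
    """Return the parts of `minuend` not covered by any interval in `subtrahends`.

    All intervals are half-open (start, end) tuples on the same sequence.
    """
    start, end = minuend
    remaining = []
    cursor = start  # leftmost position not yet known to be covered
    for s, e in sorted(subtrahends):
        if e <= cursor:
            continue  # lies entirely to the left of what remains
        if s >= end:
            break  # this and all later intervals start past the minuend
        if s > cursor:
            remaining.append((cursor, s))  # uncovered gap before this interval
        cursor = max(cursor, e)
        if cursor >= end:
            break
    if cursor < end:
        remaining.append((cursor, end))
    return remaining


print(subtract_intervals((0, 100), [(10, 20), (15, 30), (90, 120)]))
```

Sorting the subtracted intervals first means overlapping ones are handled in a single left-to-right pass.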