Why InParanoid sucks
From a software engineer’s standpoint, InParanoid sucks. The software is widely used, with four publications (2001, 2005, 2008, 2010) to its credit that have, according to Google Scholar, together garnered 1766 citations. This recognition is not undeserved, for the software serves a useful purpose in computing orthologous protein groups. I would not be griping about it otherwise.
Despite this popularity, InParanoid’s technical quality will make any software engineer’s stomach churn. How do I hate thee? Let me count the ways.
- No download link is provided on the InParanoid web site. To obtain the program, you must fill out a form so that an automated system can e-mail you the tarball as an attachment.
- Upon untarring InParanoid’s tarball, all files are placed under a tmp/tmpBg6DE3/inparanoid_4.1/ directory. The presence of the tmp/tmpBg6DE3/ prefix is unreasonably sloppy.
- In the installation instructions, you are told that the Perl XML libraries must be installed. No mention, however, is made of the precise package required, let alone the appropriate command to install it from CPAN.
- The InParanoid script doesn’t accept command-line parameters. Instead, you customize run-time options by directly editing its source file.
- The only way to run InParanoid is seemingly to copy your FASTA data files to the application’s directory. Woe be unto you if you place the FASTA files in a proteins subdirectory. When I did, I received the error cannot create proteins/prjeb502.munged.fa-proteins/prjna205202.munged.fa: Directory nonexistent. InParanoid makes no attempt to divine the basename of your FASTA inputs so it may create its intermediate files in a reasonable fashion. Instead, it simply conjoins the provided paths with a hyphen and attempts to create a file at the resulting path, so the slash in the second input path turns everything before it into a directory that does not exist. This makes it impossible for your FASTA files to reside in any directory but that from which you run InParanoid.
- The last two points imply, naturally, that for each InParanoid run, you should create a new directory, copy a fresh InParanoid installation into it, copy your data files into it, and finally perform the run. In this scenario, your output files will reside in this directory. Consequently, you must rely on the file timestamps to determine exactly what files correspond to InParanoid’s output, given that they will be intermixed with InParanoid’s installation files, the temporary files both it and BLAST created, and your input files.
- InParanoid is unreasonably picky about FASTA headers. Those included in files from WormBase caused InParanoid to crash. Thus, I had to rewrite my FASTA files to use headers such as >prjeb506_prot_5, with the trailing integer incremented for each sequence.
- InParanoid relies on legacy BLAST, when most of the world has progressed to BLAST+. With BLAST+ released more than five years ago, this legacy requirement is unconscionable in so widely used a piece of software. Moreover, the most recent v4.1 InParanoid revision arrived fully a year after BLAST+’s release, in October 2009, giving the developers ample time to support the new BLAST revision.
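The intermediate-file complaint above is easier to see in code. What follows is a minimal Python sketch, not InParanoid's actual Perl source: the hyphen-joining behaviour is inferred only from the error message quoted above, and the function names `naive_pair_name` and `basename_pair_name` (along with the `out_dir` parameter) are hypothetical names of my own.

```python
import posixpath  # POSIX path rules, so the result is the same on any platform


def naive_pair_name(path_a, path_b):
    """Join two input paths with a hyphen (behaviour inferred from the error)."""
    return f"{path_a}-{path_b}"


def basename_pair_name(path_a, path_b, out_dir="."):
    """Hypothetical alternative: strip directories before joining the names."""
    name = f"{posixpath.basename(path_a)}-{posixpath.basename(path_b)}"
    return posixpath.join(out_dir, name)


a = "proteins/prjeb502.munged.fa"
b = "proteins/prjna205202.munged.fa"

naive = naive_pair_name(a, b)
# The slash inside the second path makes everything before it a directory.
missing_dir = posixpath.dirname(naive)

fixed = basename_pair_name(a, b, out_dir="results")

print(naive)
print(missing_dir)
print(fixed)
```

With the naive join, the file would have to live in a directory named proteins/prjeb502.munged.fa-proteins, which nobody ever created. Taking the basename of each input first yields a single flat file name that can be placed in whatever output directory you like.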
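For the FASTA header complaint above, the rewriting can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact script I used: the function name `rewrite_headers` is hypothetical, numbering is assumed to start at 1, the input headers in the example are made up, and the returned mapping from new identifiers back to original headers is an extra convenience for tracing results afterwards.

```python
def rewrite_headers(lines, prefix):
    """Replace each FASTA header with '>{prefix}_prot_{n}'.

    Returns the rewritten lines and a dict mapping each new identifier
    to the original header text (without the leading '>').
    """
    rewritten = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1  # assumption: numbering starts at 1
            new_id = f"{prefix}_prot_{count}"
            mapping[new_id] = line[1:]
            rewritten.append(">" + new_id)
        else:
            # Sequence lines pass through unchanged.
            rewritten.append(line)
    return rewritten, mapping


# Made-up input; real headers will look different.
example = [
    ">seqA some made-up description",
    "MKV",
    "LLA",
    ">seqB another made-up header",
    "MST",
]

out, ids = rewrite_headers(example, "prjeb506")
print("\n".join(out))
```

To process a real file, pass an open file handle as `lines` and write the returned lines to a new file. Keeping the mapping around matters because InParanoid's output will refer only to the rewritten identifiers.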
I can marshal no more complaints at the moment. While I’m in favour of practicality over perfection, I nevertheless retain an aesthetic sensibility about software, fashioned through a decade and a half of staring at computer screens. Bioinformatics applications violate virtually every criterion of quality software, with InParanoid serving as an egregious example. Given its numerous glaring surface flaws, I have little faith in the veracity of its results.