

The mouse ENCODE Consortium has collected multiple types of genomic and functional data in order to better understand the potential utility of the mouse as a model system for biomedical research. To study gene expression levels, the Consortium collected RNA sequencing data from multiple tissues from human and mouse. Their comparative analysis revealed that gene expression patterns tend to support clustering of the data by species, rather than by tissue (Figure 2a in the Consortium's publication). This pattern was confirmed and discussed in greater detail in a companion paper by Lin et al. [2], which also acknowledged that this observation is somewhat unexpected. Indeed, previous comparative studies reported that gene expression data from human and mouse (and across other species more generally) tend to cluster by tissues, not by species. Lin et al. proposed that previous studies might have been biased in their focus on a few ‘specialized’ tissues that tend to express the largest number of ‘tissue-specific genes’, while the overall pattern supports less tissue specificity.

The implications of the observation that human and mouse gene expression data may be clustering by species more than by tissues can be profound. To a large degree, modern biology is built upon the empirical observation that homologous gene regulatory networks establish the identities of homologous cell-types, tissues, and organs across species. The results of Lin et al., if true, challenge these observations and the biological basis of homology. From a more practical perspective, the mouse is arguably the most important animal model for biomedical research. If gene regulation in any mouse tissue is markedly more representative of a general mouse regulatory network than the regulatory network of a corresponding human tissue, this would call into question the utility of the mouse, and perhaps any other non-human animal, as a useful model system for biomedical research.

Here, we present a reanalysis of the mouse ENCODE Consortium comparative RNA sequencing data. We argue that a flaw in their study design raises doubt regarding their conclusions.

RNA-Seq data, genome and gene annotation files

In December 2014, at our request, the authors of Lin et al. [2] kindly provided the names of the sequence files used in their comparative analysis. Based on this information we obtained sequence files in FASTQ format (Supplementary Table 1) from the ENCODE project (some of the files were only available from early January 2015).

For our analysis, we used the same genome build and gene annotation files as in Lin et al. We downloaded Mus musculus GRCm38.68 together with the corresponding transcript annotation file. The Homo sapiens genome build provided by ENSEMBL [3] contains haplotypic regions that are not part of the primary assembly. Homo sapiens GRCh37 was downloaded from the Illumina iGenomes page [4]. The GENCODE Release 14 transcript annotation file for human was also downloaded. The chromosome names in the GENCODE gtf file did not match those in the genome sequence file, and were thus modified.

Based on the sequence identifiers found in the FASTQ files, we reconstructed the sequencing study design used to collect the gene expression data in Lin et al.
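The modification of chromosome names in the GENCODE gtf file, mentioned in the description of the annotation files, can be illustrated with a minimal Python sketch. The text does not say which naming scheme was changed or how; the sketch assumes, purely for illustration, that the gtf file uses `chr`-prefixed names (`chr1`, `chrM`) and that the genome sequence file uses unprefixed names (`1`, `MT`). The function names `harmonize_chrom` and `rename_gtf_line` are hypothetical.

```python
def harmonize_chrom(name: str) -> str:
    """Map a 'chr'-prefixed name to an unprefixed one (assumed direction)."""
    if name == "chrM":
        # Assumption: the genome file labels the mitochondrial genome 'MT'.
        return "MT"
    return name[3:] if name.startswith("chr") else name


def rename_gtf_line(line: str) -> str:
    """Rewrite the first (sequence name) column of one gtf line."""
    # Comment lines and blank lines carry no sequence name; keep them as-is.
    if line.startswith("#") or not line.strip():
        return line
    seqname, rest = line.split("\t", 1)
    return harmonize_chrom(seqname) + "\t" + rest


# Illustrative record, not taken from the actual annotation file.
example = 'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n'
renamed = rename_gtf_line(example)
```

Only the first column is touched, so attribute strings that happen to contain `chr` are left unchanged.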
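Reconstructing a sequencing study design from FASTQ sequence identifiers can be sketched as follows. This is not the authors' code. It assumes identifiers in the colon-separated Illumina layout `@instrument:run:flowcell:lane:tile:x:y`, optionally followed by a space-separated comment; files written by older pipelines use a different layout and would need a different parser. The names `Batch`, `parse_header` and `count_batches`, and the example identifier, are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Batch(NamedTuple):
    instrument: str
    run: str
    flowcell: str
    lane: str


def parse_header(header: str) -> Batch:
    """Extract instrument, run, flowcell and lane from one identifier line."""
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ identifier line: {header!r}")
    # Keep only the read ID; drop the optional comment after the first space.
    read_id = header[1:].split()[0]
    parts = read_id.split(":")
    if len(parts) != 7:
        # Assumption: the 7-field layout described above.
        raise ValueError(f"unexpected identifier layout: {header!r}")
    return Batch(*parts[:4])


def count_batches(lines: Iterable[str]) -> Counter:
    """Count reads per (instrument, run, flowcell, lane) in FASTQ lines."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # every fourth line, starting at 0, is an identifier
            counts[parse_header(line.strip())] += 1
    return counts


# Two made-up reads from the same lane of the same flowcell.
example_fastq = [
    "@INSTR1:276:FLOWCELLA:4:1101:1234:2045 1:N:0:ATCACG", "ACGT", "+", "IIII",
    "@INSTR1:276:FLOWCELLA:4:1101:1300:2100 1:N:0:ATCACG", "TTGA", "+", "IIII",
]
batches = count_batches(example_fastq)
```

Tabulating the resulting batches against the tissue and species of each file shows which samples shared an instrument, run, flowcell or lane.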
