TEsingle
TEsingle is a software package for including both genes and transposable elements in single-cell (or single-nucleus) RNA-seq analysis. TEsingle handles mapping, cell barcode & UMI processing, and abundance estimation. We’ve paid particular attention to differentiating unprocessed intron sequences from expressed TEs. Output is formatted as a cell count matrix suitable for downstream analysis in your favorite single-cell clustering & plotting package (like Seurat).
If you encounter any issues or have any questions about TEsingle, please check out our GitHub page.
Download instructions
You can download the software from PyPI and GitHub. The transposable element GTF files required for TEsingle are available at this location, or (for human and mouse) accessible from Zenodo.
Tool description
TEsingle focuses on accurate assignment of aligned reads to the most likely gene or TE of origin. The basic design follows our popular software package for bulk gene expression analysis, TEtranscripts, and uses an expectation-maximization algorithm to resolve ambiguous alignments of TE-derived RNA-seq reads. We added several steps to properly handle Unique Molecular Identifiers (UMIs), which label individual captured transcripts and enable resolution of PCR duplicates in snRNA-seq data. The output conforms to the Matrix Market Exchange (MEX) format, to enable easy integration into downstream single-cell analysis packages.
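To illustrate the general idea of expectation-maximization for ambiguous alignments, here is a minimal sketch in Python. It is not TEsingle's actual implementation: the function name em_assign, the input layout (one list of candidate features per deduplicated read) and the fixed iteration count are all assumptions made for this example, and the feature names are only illustrative labels.

```python
from collections import defaultdict


def em_assign(read_hits, n_iter=50):
    """Fractionally assign reads to features with a simple EM loop.

    read_hits: list of lists; each inner list holds the names of the
    features (genes or TEs) that one read aligns to. Hypothetical input
    layout, chosen for this sketch only.
    Returns a dict mapping feature name -> estimated (fractional) count.
    """
    features = sorted({f for hits in read_hits for f in hits})
    # Start from a uniform abundance estimate.
    abundance = {f: 1.0 / len(features) for f in features}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidate features in
        # proportion to the current abundance estimates.
        for hits in read_hits:
            total = sum(abundance[f] for f in hits)
            for f in hits:
                counts[f] += abundance[f] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = sum(counts.values())
        abundance = {f: counts[f] / n_reads for f in features}
    return dict(counts)


# Two reads map uniquely to L1Md_A, one uniquely to L1Md_T,
# and one read is ambiguous between the two.
example_reads = [["L1Md_A"], ["L1Md_A"], ["L1Md_A", "L1Md_T"], ["L1Md_T"]]
example_counts = em_assign(example_reads)
```

In this toy example the ambiguous read is shared about 2:1 in favor of L1Md_A, because the uniquely mapped reads suggest that element is twice as abundant. A real implementation would also need a convergence check and per-cell bookkeeping, both omitted here.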
GTF files for gene annotation can be obtained from UCSC RefSeq, Ensembl, iGenomes or other annotation databases. GTF files for TE annotations are customized versions of the annotation from UCSC RepeatMasker or other TE databases. They contain two custom attributes, class_id and family_id, corresponding to the class (e.g. LINE) and family (e.g. L1) of the corresponding transposable element. A unique ID (e.g. L1Md_Gf_dup1) is also assigned to each TE annotation in the transcript_id attribute. Pre-generated TE GTF files are available for a number of organisms, and can be downloaded here. If your organism or genome build of interest is not available, please contact us and provide a curated annotation of the transposable elements (e.g. genomic location and TE name/type). We will do our best to help you generate a suitable TE GTF file.
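As a rough illustration of the TE GTF attributes described above, the following Python sketch parses one annotation line and extracts class_id, family_id and transcript_id. The helper name parse_te_gtf_line is hypothetical, and the example line is made up for illustration: its coordinates, source column and gene_id attribute are assumptions, not values taken from a distributed file.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_te_gtf_line(line):
    """Parse one tab-separated GTF line into a small dict (sketch only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    chrom, _source, _feature, start, end, _score, strand, _frame, attr_text = fields
    attrs = dict(ATTR_RE.findall(attr_text))
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "transcript_id": attrs.get("transcript_id"),  # unique TE copy ID
        "family_id": attrs.get("family_id"),          # e.g. L1
        "class_id": attrs.get("class_id"),            # e.g. LINE
    }


# Illustrative line only; coordinates and gene_id are assumed.
example_line = (
    "chr1\trmsk\texon\t3000001\t3002128\t.\t-\t.\t"
    'gene_id "L1Md_Gf"; transcript_id "L1Md_Gf_dup1"; '
    'family_id "L1"; class_id "LINE";'
)
record = parse_te_gtf_line(example_line)
```

A check like this can help confirm that a custom TE GTF file carries the attributes TEsingle expects before you run the full pipeline.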
Citation
Available soon.