NLR-Finder: An Easy and Efficient Annotation Tool for the NLR Superfamily in Plant Genomes

Gene annotation is the essential process of identifying gene structures and assigning biological functions, and it underpins subsequent analyses such as gene cloning and the identification of genes for agricultural traits. However, current gene annotation misrepresents the full gene repertoire because gene model construction is biased. The nucleotide-binding and leucine-rich repeat (NLR) superfamily is one of the poorly annotated gene families in plants. NLR genes tend to be clustered in genomes through segmental and tandem duplications, which makes their annotation challenging. NLR-Finder was developed for unbiased genome-wide identification of the NLR superfamily in assembled plant genomes. First, NLR-Finder detects candidate NLR gene regions by extending every identified NB-ARC domain region by 30 kb on both sides. Second, evidence-based NLR genes are predicted by aligning published protein and transcriptome sequences to the candidate gene regions. Third, additional NLR genes are extracted using an ab initio prediction approach. Finally, the NLR gene models are produced by integrating the evidence-based and ab initio-based NLR genes. Re-annotation with NLR-Finder was performed on 17 different plant genomes. On average, public annotation tools identified about 310 genes per genome, whereas NLR-Finder annotated about 497. In Gossypium hirsutum and Vigna radiata, the number of re-annotated genes was triple that in the publicly available data. The re-annotated genes were validated by comparison with the high-quality annotations of Arabidopsis thaliana, Brachypodium distachyon, and Solanum lycopersicum. This study demonstrates that NLR-Finder provides an easy-to-use and efficient method for annotating the NLR superfamily in plant genomes.
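The first step of the pipeline, extending each NB-ARC domain hit by 30 kb on both sides, can be illustrated with a minimal Python sketch. This is not NLR-Finder's own code: the function name `candidate_regions`, the 1-based inclusive coordinates, the clipping to chromosome bounds, and the merging of overlapping windows are all assumptions made for illustration. Merging is included only because clustered NLR genes would otherwise yield heavily overlapping windows.

```python
from collections import defaultdict

FLANK_BP = 30_000  # 30 kb flank on each side, as stated in the abstract


def candidate_regions(domain_hits, chrom_lengths, flank=FLANK_BP):
    """Hypothetical helper: turn NB-ARC domain hits into candidate regions.

    domain_hits   -- iterable of (chrom, start, end), 1-based inclusive (assumed)
    chrom_lengths -- dict mapping chrom name to its length in bp
    Returns a list of (chrom, start, end) sorted by chromosome and position.
    """
    windows = defaultdict(list)
    for chrom, start, end in domain_hits:
        # Extend the hit by the flank and clip to the chromosome bounds.
        lo = max(1, start - flank)
        hi = min(chrom_lengths[chrom], end + flank)
        windows[chrom].append((lo, hi))

    regions = []
    for chrom in sorted(windows):
        merged = []
        for lo, hi in sorted(windows[chrom]):
            # Assumption: overlapping windows are merged into one region.
            if merged and lo <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        regions.extend((chrom, lo, hi) for lo, hi in merged)
    return regions
```

With this sketch, two domain hits 39 kb apart on the same chromosome collapse into a single candidate region, whereas an isolated hit produces a window of its own length plus up to 60 kb. Each resulting region would then be passed to the evidence-based and ab initio prediction steps.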
