Introduction
Non-canonical peptides (NCPs) are a class of peptides derived from regions previously thought to be non-coding, such as introns, 5' untranslated regions (UTRs), 3' UTRs, and intergenic regions. They have attracted significant attention as functionally important endogenous peptides in various organisms. NCPbook is a comprehensive database of NCPs supported by mass spectrometry, ribosome profiling (Ribo-seq), or molecular experiments. It incorporates data from diverse public databases and the scientific literature. The current version of NCPbook includes 180,676 NCPs across 29 species. These NCPs are distributed across kingdoms: 123,408 from 14 plant species, 56,999 from 7 animal species, and 269 from 8 microbial species. Furthermore, NCPbook includes 9,166 functionally characterized NCPs reported in prior studies. These NCPs have been ascribed diverse roles, including antifungal activity, immunogenicity, stress resistance, and growth and development. In addition, NCPbook provides a user-friendly interface that lets users search, browse, visualize, and retrieve the data they need. With its comprehensive coverage and user-friendly design, NCPbook will serve as an indispensable platform for research on NCPs across diverse plant, animal, and microbial species.
Peptides, typically composed of 2 to 100 amino acid residues, are small biological molecules with important roles in biology. Based on their genomic loci, a novel class of peptides derived from introns, 3' UTRs, 5' UTRs, junctions, and intergenic regions has been defined as non-canonical peptides (NCPs). In addition, increasing evidence in both plants and animals has revealed that NCPs can be encoded by non-coding RNAs (ncRNAs), including lncRNAs, circRNAs, miRNAs, and snRNAs (Wang et al., 2019; Wang et al., 2020).
As a novel class of peptides, NCPs have attracted significant attention as functionally important endogenous peptides in different species. The first NCP was reported in 1996, when a 10-aa peptide encoded by ENOD40 was identified (van de Sande et al., 1996); ENOD40 was later shown to play a vital role in regulating the auxin response of flowering plants (Rohrig et al., 2002). Since then, further plant NCPs have been discovered, such as PLS (36 aa), BRK1 (84 aa), ROT4 (53 aa), KOD (25 aa), OSIP108, PSEPs (40, 41, 57, or 61 aa), and miPEPs (miPEP171b, miPEP165a, vvi-miPEP171d1, miPEP858), which play essential roles in plant growth, development, and stress responses (Lauressergues et al., 2015; Plaza et al., 2017; Khitun et al., 2019; Wang et al., 2020). In animals and humans, NCPs such as APPLE, AW112010, MLN, and MP31 are known to play important roles in a diverse range of cellular processes, such as calcium transport, embryogenesis, muscle performance, translation control, immune response, and stress resistance (Anderson et al., 2015; Jackson et al., 2018; Sun et al., 2021; Huang et al., 2021). Together, these studies demonstrate that NCPs play vital roles in various biological processes.
The increasing importance of NCPs has led to emerging strategies for their discovery. The use of Ribo-seq and MS-based methods has shown that the number of NCPs is probably much larger than previously suspected. In a previous study, our lab identified NCPs in plants on a large scale using an MS-based peptidogenomic pipeline, including 1,993 and 1,860 NCPs in Zea mays and Arabidopsis thaliana, respectively (Wang et al., 2020). As the number of known NCPs has grown in recent years, several databases have been proposed to gather NCPs encoded by ncRNAs or non-coding sORFs (Huang et al., 2021; Chen et al., 2021; Luo et al., 2022). However, a comprehensive database that integrates the discovered NCPs is still lacking. NCPbook is the first database that provides comprehensive information on NCPs in plants, animals, and microbes. Overall, NCPbook will make it convenient to retrieve NCPs in various species and will enrich the knowledge of non-coding translation.
NCPbook provides a user-friendly web interface with five service modules: Browse, Search, BLAST, JBrowse, and Download. The homepage gives an introduction to NCPbook (Figure 1).
NCPbook covers 29 species, including Homo sapiens, Mus musculus, and Arabidopsis thaliana. First, in the Species panel, choose the species group (Plant, Animal, or Microbe) that contains your species of interest (Figure 2). The images and names of the 14 plant species are listed in the Species panel of the Plant submenu under the Browse menu (Figure 3). The images and names of the 7 animal species are listed in the Species panel of the Animal submenu (Figure 4). The images and names of the 8 microbial species are listed under the Microbe submenu (Figure 5).
Then, click on the species image, and all entries for the selected species are returned (Figure 6). Next, click any row of the table to see detailed information on that NCP, including the NCP ID, sequence, length, function, references, and other information (Figure 7).
On the Search page, users can easily find NCPs of interest through three types of search: 1) searching by NCP ID; 2) searching by host gene; 3) searching by genomic location. Users can enter multiple NCP IDs, host gene IDs, or genomic locations for any of the 29 species to obtain the details of all matching NCPs. The detailed information for the selected NCPs is shown in a table, which can be exported as a CSV or Excel file. The NCP sequence, NCP length, references, and other information are displayed by clicking a row of the output table (Figures 8-12).
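A genomic-location search comes down to an interval-overlap test. The minimal Python sketch below illustrates that idea only; it is not NCPbook's implementation. The location format (chromosome:start-end, as in "Chr1:1000-2000") and the names Locus, parse_locus, and overlaps are assumptions made for this example, so check the Search page for the exact input format it accepts.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """A closed genomic interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int
    end: int


# Assumed format: "<chrom>:<start>-<end>"; thousands separators are tolerated.
_LOCUS_RE = re.compile(r"^\s*([^:\s]+):([\d,]+)(?:-|\.\.)([\d,]+)\s*$")


def parse_locus(text: str) -> Locus:
    """Parse a location string such as 'Chr1:1,000-2,000' into a Locus."""
    match = _LOCUS_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized genomic location: {text!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:  # accept reversed coordinates
        start, end = end, start
    return Locus(match.group(1), start, end)


def overlaps(a: Locus, b: Locus) -> bool:
    """Return True if two closed intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


query = parse_locus("Chr1:1,000-2,000")
print(overlaps(query, parse_locus("Chr1:1500-1800")))
```

The same test can be applied to rows of an exported CSV file to keep only the NCPs that fall inside a region of interest.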
On the BLAST page, users can assess the sequence similarity of NCPs across multiple species. Users can enter sequences in FASTA format directly or load FASTA files from disk. The BLAST database contains the NCP sequences from all 29 species. The blastp program compares a peptide query against the NCPs, and blastx compares a translated nucleotide query against the NCPs. The search can be run with default or user-specified parameters. After the “Blast!” button is clicked, the BLAST alignment is performed and the results can be viewed in the Output panel (Figure 13).
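Before submitting a query, it can help to check that the input is valid FASTA and to decide whether blastp or blastx applies. The Python sketch below shows one way to do that locally. The function names parse_fasta and guess_program are hypothetical, and the alphabet check is only a heuristic: a short peptide made up solely of A, C, G, T, and N residues would be misclassified as nucleotide.

```python
from typing import Dict, Iterable

# Letters treated as nucleotide codes by this heuristic (an assumption).
NUCLEOTIDES = set("ACGTUN")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID is the first header word
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line.upper()
    return records


def guess_program(sequence: str) -> str:
    """Suggest 'blastx' for nucleotide-looking input, otherwise 'blastp'."""
    if not sequence:
        raise ValueError("empty sequence")
    return "blastx" if set(sequence) <= NUCLEOTIDES else "blastp"


fasta_text = ">pep1 example peptide\nMKTLLV\nAGR\n>nuc1\natggcc\n"
for record_id, seq in parse_fasta(fasta_text.splitlines()).items():
    print(record_id, len(seq), guess_program(seq))
```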
Once the BLAST alignment is finished, you are taken to the Output panel of the Blast menu, which displays the BLAST results in a data table (Figure 14). The complete BLAST results can be exported as a CSV or Excel file. Each row of the data table represents one BLAST hit. Clicking a row of this table displays the detailed information for the selected BLAST hit (Figure 14).
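An exported CSV file can be post-processed offline, for example to keep only the best hit for each query. The Python sketch below assumes that the export has one row per hit with columns named query, subject, and evalue. These column names are hypothetical and should be changed to match the header of the real export.

```python
import csv
import io
from typing import Dict, TextIO


def best_hits(handle: TextIO,
              query_col: str = "query",
              evalue_col: str = "evalue") -> Dict[str, dict]:
    """Return the row with the lowest E-value for each query sequence.

    The default column names are assumptions; pass the real header names
    of the exported file if they differ.
    """
    best: Dict[str, dict] = {}
    for row in csv.DictReader(handle):
        query = row[query_col]
        evalue = float(row[evalue_col])
        if query not in best or evalue < float(best[query][evalue_col]):
            best[query] = row
    return best


# Toy data standing in for an exported result table.
example = io.StringIO(
    "query,subject,evalue\n"
    "q1,NCP_A,1e-5\n"
    "q1,NCP_B,1e-20\n"
    "q2,NCP_C,0.01\n"
)
for query, row in best_hits(example).items():
    print(query, row["subject"], row["evalue"])
```

To process a real export, open the file with open(path, newline="") and pass the file handle in place of the StringIO object.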
The JBrowse page currently provides a genome browser service for eight species (Homo sapiens, Arabidopsis thaliana, Zea mays, Oryza sativa, Vitis vinifera, Drosophila melanogaster, Mus musculus, and Escherichia coli). Users can open a drop-down menu by clicking “JBrowse” in the toolbar and then select the image of the species of interest (Figure 15). Three tracks are shown: gene models, NCP information, and the reference genome sequence. Users can manually show or hide individual tracks (Figure 16). Clicking a gene model or an NCP ID displays its detailed information.
Download links for NCP information and sequences for all 29 species are available in this section. Users can download the information and sequences by species (Figure 17).
Liuji Wu
College of Agronomy
Henan Agricultural University, Zhengzhou 450046, China
Wen Yao
College of Life Sciences
Henan Agricultural University, Zhengzhou 450046, China