NCPbook: a comprehensive database of non-canonical peptides

Introduction
Non-canonical peptides (NCPs) are a class of peptides derived from regions previously thought to be non-coding, such as introns, 5' untranslated regions (UTRs), 3' UTRs, and intergenic regions. They have attracted significant attention as functionally important endogenous peptides in various organisms. NCPbook is a comprehensive database of NCPs supported by mass spectrometry, ribosome profiling (Ribo-seq), or molecular experiments. It incorporates data from diverse public databases and the scientific literature. The current version of NCPbook includes 180,676 NCPs across 29 species: 123,408 from 14 plant species, 56,999 from 7 animal species, and 269 from 8 microbial species. NCPbook also includes 9,166 functionally characterized NCPs from prior studies, with reported roles in antifungal activity, immunogenicity, stress resistance, and growth and development. In addition, NCPbook provides a user-friendly interface for searching, browsing, visualizing, and retrieving data. With its broad coverage and user-friendly design, NCPbook will serve as a valuable platform for NCP research across diverse plant, animal, and microbial species.

Browse NCPs identified in 14 Plant genomes (Click on an image to view the NCPs identified in a genome)

List of all the collected NCPs (Click on a row to check the details of the selected NCP)

Browse NCPs identified in 7 Animal genomes (Click on an image to view the NCPs identified in a genome)












List of all the collected NCPs (Click on a row to check the details of the selected NCP)

Browse NCPs identified in 8 Microbe genomes (Click on an image to view the NCPs identified in a genome)












List of all the collected NCPs (Click on a row to check the details of the selected NCP)

Search NCPs in one or multiple genomes by sequence similarity using BLAST


JBrowse of NCPs in 8 genomes


Statistics of NCPs


Source of NCPs collected for different species



Types of evidence supporting NCPs identified in different species




Why NCP and NCPbook?

Non-canonical peptides (NCPs)

Peptides, typically composed of 2-100 amino acid residues, are small biological molecules with important roles in biology. Peptides derived from introns, 3' UTRs, 5' UTRs, junctions, and intergenic regions form a novel class defined by genomic locus: non-canonical peptides (NCPs). In addition, increasing evidence in both plants and animals has revealed that NCPs can be encoded by non-coding RNAs (ncRNAs), including lncRNAs, circRNAs, miRNAs, and snRNAs (Wang et al., 2019; Wang et al., 2020).

NCPs play important biological roles

As a novel class of peptides, NCPs have attracted significant attention as functionally important endogenous peptides in different species. The first NCP was reported in 1996, when a 10-aa peptide encoded by ENOD40 was identified (van de Sande et al., 1996); ENOD40 was later shown to play a vital role in regulating the auxin response of flowering plants (Rohrig et al., 2002). Since then, additional plant NCPs such as PLS (36 aa), BRK1 (84 aa), ROT4 (53 aa), KOD (25 aa), OSIP108, PSEPs (40, 41, 57, or 61 aa), and miPEPs (miPEP171b, miPEP165a, vvi-miPEP171d1, miPEP858) have been found to play essential roles in plant growth, development, and stress responses (Lauressergues et al., 2015; Plaza et al., 2017; Khitun et al., 2019; Wang et al., 2020). In animals and humans, NCPs such as APPLE, AW112010, MLN, and MP31 are known to play important roles in a diverse range of cellular processes, such as calcium transport, embryogenesis, muscle performance, translation control, immune response, and stress resistance (Anderson et al., 2015; Jackson et al., 2018; Sun et al., 2021; Huang et al., 2021). Together, these studies demonstrate that NCPs play vital roles in various biological processes.

A comprehensive NCP database is urgently needed

The increasing importance of NCPs has led to emerging strategies for their discovery. The use of Ribo-seq and MS-based methods has shown that the number of NCPs is probably much larger than previously suspected. In a previous study, our lab performed the first large-scale identification of NCPs in plants using an MS-based peptidogenomic pipeline, identifying 1,993 and 1,860 NCPs in Zea mays and Arabidopsis thaliana, respectively (Wang et al., 2020). As the number of known NCPs has grown in recent years, several databases have been developed to collect NCPs encoded by ncRNAs or non-coding sORFs (Huang et al., 2021; Chen et al., 2021; Luo et al., 2022). However, a comprehensive database that integrates the discovered NCPs is still lacking. NCPbook is the first database that provides comprehensive information on NCPs in plants, animals, and microbes. Overall, NCPbook will make it more convenient to retrieve NCPs in various species and will enrich knowledge of non-coding translation.

References

  • Anderson et al. A micropeptide encoded by a putative long noncoding RNA regulates muscle performance. Cell. 2015; 160(4): 595-606.
  • Chen et al. PsORF: a database of small ORFs in plants. Plant Biotechnol J. 2021; 18(11): 2158-2160.
  • Huang et al. An Upstream Open Reading Frame in Phosphatase and Tensin Homolog Encodes a Circuit Breaker of Lactate Metabolism. Cell Metab. 2021; 33(2): 454.
  • Huang et al. cncRNAdb: a manually curated resource of experimentally supported RNAs with both protein-coding and noncoding function. Nucleic Acids Res. 2021; 49(D1): D65-D70.
  • Jackson et al. The translation of non-canonical open reading frames controls mucosal immunity. Nature. 2018; 564(7736): 434-438.
  • Khitun et al. Small open reading frames and cellular stress responses. Mol Omics. 2019; 15(2): 108-116.
  • Lauressergues et al. Primary transcripts of microRNAs encode regulatory peptides. Nature. 2015; 520(7545): 90-3.
  • Luo et al. SPENCER: a comprehensive database for small peptides encoded by noncoding RNAs in cancer patients. Nucleic Acids Res. 2022; 50(D1): D1373-D1381.
  • Plaza et al. In Search of Lost Small Peptides. Annu Rev Cell Dev Biol. 2017; 33: 391-416.
  • Rohrig et al. Soybean ENOD40 encodes two peptides that bind to sucrose synthase. Proc Natl Acad Sci USA. 2002; 99(4): 1915-20.
  • Sun et al. The oncomicropeptide APPLE promotes hematopoietic malignancy by enhancing translation initiation. Mol Cell. 2021; 81(21): 4493-4508.e9.
  • van de Sande et al. Modification of phytohormone response by a peptide encoded by ENOD40 of legumes and a nonlegume. Science. 1996; 273(5273): 370-3.
  • Wang et al. Large-Scale Discovery of Non-conventional Peptides in Maize and Arabidopsis through an Integrated Peptidogenomic Pipeline. Mol Plant. 2020; 13(7): 1078-1093.
  • Wang et al. Peptides encoded by noncoding genes: challenges and perspectives. Signal Transduct Target Ther. 2019; 4: 57.
Tutorial of NCPbook

NCPbook provides a user-friendly web interface that contains five service modules: Browse, Search, BLAST, JBrowse, and Download. The homepage provides an introduction to NCPbook (Figure 1).



Figure 1. The homepage of NCPbook.

Browse NCPs collected for 29 species

NCPbook covers 29 species, including Homo sapiens, Mus musculus, and Arabidopsis thaliana. First, choose the species group (Plant, Animal, or Microbe) in the Species panel that contains your species of interest (Figure 2). The images and names of the 14 plant species are listed in the Species panel of the Plant submenu under the Browse menu (Figure 3). The images and names of the 7 animal species are listed in the Species panel of the Animal submenu (Figure 4). The images and names of the 8 microbe species are listed under the Microbe submenu (Figure 5).



Figure 2. Species groups, including Plant, Animal, and Microbe in the Species panel.


Figure 3. Names and images of 14 plant species listed in the Species panel of the Plant submenu under the Browse menu.


Figure 4. Names and images of 7 animal species listed in the Species panel of the Animal submenu under the Browse menu.


Figure 5. Names and images of 8 microbe species listed in the Species panel of the Microbe submenu under the Browse menu.

Then, click on the species image to return all entries for the selected species (Figure 6). Click any row of the table to view detailed information on that NCP, including NCP ID, sequence, length, function, references, and other information (Figure 7).



Figure 6. List of all the NCPs identified for a selected species.


Figure 7. Detailed information of a selected NCP.

Search NCPs by NCP ID, host gene ID or genomic location

On the Search page, users can find NCPs of interest through three types of search: 1) by NCP ID; 2) by host gene ID; 3) by genomic location. Users can enter multiple NCP IDs, host gene IDs, or genomic locations for any of the 29 species to retrieve the details of all matching NCPs. The results are shown in a table, which can be exported as a CSV or Excel file. The NCP sequence, NCP length, references, and other information are displayed by clicking a row of the output table (Figures 8-12).


Figure 8. Search the input NCP information by multiple NCP IDs with Exact match.


Figure 9. Search the input NCP information by multiple NCP IDs without Exact match.


Figure 10. Search the input NCP information by multiple host gene IDs with Exact match.


Figure 11. Search the input NCP information by multiple host gene IDs without Exact match.


Figure 12. Search the input NCP information by genome location.
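The three search modes described above can be illustrated with a minimal Python sketch. It is not NCPbook's implementation: the record fields, the demo IDs, the `chrom:start-end` location format, and the reading of a search without Exact match as substring matching are all assumptions made for illustration.

```python
import re
from typing import NamedTuple


class NCP(NamedTuple):
    ncp_id: str
    host_gene: str
    chrom: str
    start: int
    end: int


# Hypothetical records; IDs, genes, and coordinates are invented.
RECORDS = [
    NCP("NCP_demo_0001", "GeneA", "chr1", 1000, 1090),
    NCP("NCP_demo_0002", "GeneA", "chr1", 5000, 5060),
    NCP("NCP_demo_0010", "GeneB", "chr2", 300, 420),
]


def search_by_field(records, field, queries, exact=True):
    """Return records whose `field` equals a query (exact) or contains one."""
    hits = []
    for rec in records:
        value = getattr(rec, field)
        if exact:
            matched = value in queries
        else:
            matched = any(q in value for q in queries)
        if matched:
            hits.append(rec)
    return hits


def parse_location(text):
    """Parse an assumed 'chrom:start-end' string into (chrom, start, end)."""
    m = re.fullmatch(r"(\S+):(\d+)-(\d+)", text.strip())
    if m is None:
        raise ValueError(f"cannot parse location: {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def search_by_location(records, location):
    """Return records on the same chromosome that overlap the query interval."""
    chrom, start, end = parse_location(location)
    return [
        r for r in records
        if r.chrom == chrom and r.start <= end and r.end >= start
    ]
```

For example, `search_by_field(RECORDS, "ncp_id", ["NCP_demo_000"], exact=False)` returns the two records whose IDs contain that string, while the same query with `exact=True` returns nothing.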

Search NCPs by sequence similarity using BLAST

On the BLAST page, users can search for NCPs across multiple species by sequence similarity. Users can enter sequences in FASTA format directly or upload FASTA files from disk. NCP sequences from all 29 species are included in the BLAST database. The blastp program compares a peptide query against NCPs, and blastx compares a translated nucleotide query against NCPs. Searches can be run with default or user-specified parameters. After clicking the “Blast!” button, the BLAST alignment is performed and the results can be viewed in the Output panel (Figure 13).



Figure 13. Steps to perform BLAST.
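For readers who want to prepare queries outside the web form, the following Python sketch shows how a FASTA query might be formatted and how the choice between blastp and blastx could be made. It assumes a local NCBI BLAST+ installation and a locally built database, here given the hypothetical name `ncp_db`; NCPbook's own server-side setup is not described in this tutorial. The sketch only builds the command and does not run it.

```python
def format_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def choose_program(sequence):
    """Heuristic: a query made only of nucleotide letters goes to blastx.

    A peptide composed solely of A, C, G, T, U, or N would be misclassified,
    so treat the result as a suggestion rather than a rule.
    """
    letters = set(sequence.upper())
    return "blastx" if letters <= set("ACGTUN") else "blastp"


def build_blast_command(program, query_path, db_name, evalue=10.0):
    """Build a BLAST+ command line with tabular output (-outfmt 6)."""
    if program not in ("blastp", "blastx"):
        raise ValueError("program must be 'blastp' or 'blastx'")
    return [
        program,
        "-query", str(query_path),
        "-db", db_name,          # hypothetical local database name
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]
```

The returned list can be passed to `subprocess.run` once the query file has been written and the database exists.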

Once the BLAST alignment is finished, you are taken to the Output panel of the Blast menu, which displays the BLAST results in a data table (Figure 14). The complete BLAST results can be exported as a CSV or Excel file. Each row of the data table represents a BLAST hit. Clicking a row displays the detailed information for the selected BLAST hit (Figure 14).



Figure 14. BLAST result viewed in the output panel.
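An exported CSV file can be filtered further offline. The sketch below is illustrative only: the column names (`query`, `subject`, `identity`, `evalue`, `bitscore`) and the example rows are assumptions, since the actual header of the NCPbook export is not listed here, and the thresholds are arbitrary.

```python
import csv
import io

# Hypothetical export; column names and values are invented for illustration.
EXAMPLE_CSV = """query,subject,identity,evalue,bitscore
q1,NCP_demo_0001,100.0,1e-20,85.1
q1,NCP_demo_0002,62.5,0.5,22.3
q2,NCP_demo_0010,91.3,3e-08,48.9
"""


def filter_hits(handle, max_evalue=1e-5, min_identity=80.0):
    """Keep hits passing both thresholds, sorted by ascending e-value."""
    reader = csv.DictReader(handle)
    hits = [
        row for row in reader
        if float(row["evalue"]) <= max_evalue
        and float(row["identity"]) >= min_identity
    ]
    hits.sort(key=lambda row: float(row["evalue"]))
    return hits


hits = filter_hits(io.StringIO(EXAMPLE_CSV))
for hit in hits:
    print(hit["query"], hit["subject"], hit["evalue"])
```

To use a real export, replace `io.StringIO(EXAMPLE_CSV)` with an open file handle and adjust the column names to match the file's header.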

Visualize NCPs in JBrowse

The JBrowse page currently provides a genome browser service for eight species (Homo sapiens, Arabidopsis thaliana, Zea mays, Oryza sativa, Vitis vinifera, Drosophila melanogaster, Mus musculus, Escherichia coli). Users can open a drop-down menu by clicking “JBrowse” in the toolbar and then select the image of the species of interest (Figure 15). Three tracks are shown: gene models, NCP information, and the reference genome sequence. Users can manually show or hide tracks (Figure 16). Detailed information is displayed by clicking a gene model or an NCP ID.



Figure 15. The JBrowse menu of NCPbook.


Figure 16. JBrowse of *Homo sapiens*.

Download information and sequences of NCPs

Download links for NCP information and sequences across 29 species are available in this section. Users can download the information and sequences by species (Figure 17).



Figure 17. Download links for NCP information and sequences across 29 species.
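Once sequences have been downloaded, they can be summarized with a few lines of Python. This sketch assumes the sequence files are in FASTA format, which is not stated above; the example records are invented.

```python
# Hypothetical FASTA content standing in for a downloaded sequence file.
EXAMPLE_FASTA = """>NCP_demo_0001 example header
MKLVA
GG
>NCP_demo_0002
MSTP
"""


def parse_fasta(text):
    """Return a dict mapping each record ID (first header word) to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def length_summary(records):
    """Return (shortest, longest, mean) peptide length for parsed records."""
    lengths = [len(seq) for seq in records.values()]
    if not lengths:
        raise ValueError("no sequences found")
    return min(lengths), max(lengths), sum(lengths) / len(lengths)


records = parse_fasta(EXAMPLE_FASTA)
print(length_summary(records))
```

For a real file, read its contents with `open(path).read()` and pass the text to `parse_fasta`.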


Liuji Wu

wuliuji@henau.edu.cn

College of Agronomy

Henan Agricultural University, Zhengzhou 450046, China


Wen Yao

yaowen@henau.edu.cn

College of Life Sciences

Henan Agricultural University, Zhengzhou 450046, China