Read distribution analysis of miRNA-Seq in lung tissues and plasma from patients with LUAD
To investigate the tsRNA profiles in depth, we assessed at the read distribution of various types of small RNAs in plasma and tissues, including microRNA (miRNA), Ro-associated Y RNA (yRNA), tRNA, ribosomal RNA (rRNA), and non-coding RNA (ncRNA). There was a significant variation in the distribution of small RNA components between normal people and LUAD patients, as illustrated in Fig. 1A, B (10 tissue samples were randomly selected), whether in plasma or tissue. The reads proportion of all tissues samples is provided in Additional file 8: Figure S2. In contrast to tissues (whether normal or LUAD tissues), where miRNA is always a substantial component, the amount of miRNA in plasma differed significantly between normal and LUAD patients' samples. In addition, plasma LUAD samples had a higher percentage of remaining reads than normal plasma, but normal and LUAD tissues had about the same percentage of remaining reads.
Furthermore, the amount of tRNA in plasma and tissues differed between normal people and LUAD patients. As shown in Fig. 1A, B, tRNA content varied greatly: a small number of reads may be mapped to mature tRNA in normal plasma samples, whereas the proportion of tRNA in LUAD patients' plasma is substantially greater than in normal samples. As we saw with plasma, the proportion of tRNA in normal tissues was substantially lower than that in tumor tissues. Normal plasma, on the other hand, had substantially lower levels of tRNA than normal tissues.
The expressed number of various tsRNA sub-types was shown in Fig. 1C, F. In LUAD patients, both plasma and tissue tsRNA had a higher expression range when compared to normal, which was consistent with the read distribution results. The same pattern of expression range among tsRNA sub-types was found in plasma and tissue samples. i-tRF had the greatest expressed numbers, followed by 3'tRF, 5'tRF, 5'-half, and 3'-half. Figure 1D, E, G, H showed the percentages of sub-types in greater detail. However, no differences in tsRNA sub-type distribution were found between normal people and LUAD patients, whether in plasma or tissues, or between plasma and tissues.
These findings demonstrated that the distribution of small RNAs, including tRNAs, differed significantly between normal and LUAD, plasma and tissues, but not tRNA sub-type.
Identification of differentially expressed tsRNAs in tissues and plasma from normal and LUAD patients
The expression levels of tsRNAs in normal plasma/tissues and LUAD plasma/tissues were determined using the pipeline we built. Only genes with an expression level larger than 10 counts in the total number of samples were maintained. Finally, 2798 and 13,226 tsRNA were found in plasma and tissue, respectively. Differentially expressed tsRNAs (DEtsRNAs) in LUAD plasma and tissues were obtained using the criteria of fold change > 2 and P value < 0.05. The differential expression analysis result had been shown using a volcano graphic (Fig. 2A in plasma, Fig. 2B in tissue). We discovered 523 DEtsRNAs in plasma, 391 of which were up-regulated and 132 of which were down-regulated. There were 2292 DEtsRNAs identified in tissues, with 1477 upregulated and 815 downregulated. These DEtsRNAs were listed in Additional file 3: Table S3.
These findings implied that LUAD patients' plasma and tissues contain a substantial number of differently expressed tsRNAs.
Construction of the tsRNA-mRNA regulatory network
The UpSet diagram of up and down regulated tsRNA in plasma and tissue is shown in Fig. 2C. Only 155 consistent differential expression tsRNA (Co-DEtsRNAs) were detected by intersection analysis, 135 of which were up-regulated and 20 of which were down-regulated, and these were kept for further analysis. In Additional file 4: Table S4, the sequence, fold change, and sub-type of Co-DEtsRNAs in plasma and tissue were listed. To get reliable tsRNA targets mRNA, tRFTar was used in which the target genes were verified by Argonaute CLIP-Seq datasets and CLASH-Seq datasets. In total, 2220 mRNA targets were identified. We further analyzed the differentially expressed mRNAs in LUAD from TCGA and 4,238 DEmRNAs were identified (P < 0.05 and | log twofold change|> 1) (Fig. 2D). Subsequently, we determined 406 Co-DEmRNAs by intersecting the target mRNAs and DEmRNAs.
We used 155 Co-DEtsRNAs and 406 Co-DEmRNAs to extract sub-network of LUAD tsRNA-mRNA network in tRFTar database. 77 tsRNAs have no target mRNA. Finally, a regulatory network comprised of 78 Co-DEtsRNAs, 406 Co-DEmRANs, and 1305 targeted connections was created. In Fig. 3A, the network is depicted. Nodes were colored differently to distinguish different forms of tsRNA and mRNA, and edges were colored differently to designate different sorts of action modes (binding sites in the 3′UTR [untranslated region], 5′UTR, or CDS [Coding DNA Sequence]). Larger node represented more neighbors. We found three hub tsRNAs in our network using the degree > 100 as a hub gene in our analysis. The most closely related tsRNA to target mRNAs was tRF-16-L85J3KE (degree = 269), followed by tsRNA tRF-21-RK9P4P9L0 (degree = 129), and tsRNA tRF-16-PSQP4PE (degree = 129). The majority of tsRNA targets were 6–10 mRNAs (Fig. 3B). Most mRNAs, on the other hand, only had one tsRNA target (Fig. 3C). tsRNAs, obviously, had more targets than mRNAs.
These results demonstrated that constructing tsRNA-mRNA regulatory networks can benefit in the discovery of essential tsRNAs in LUAD.
Enrichment analysis of targeted mRNAs
To investigate the probable roles of DEtsRNAs, we used GO, KEGG, Reactome, and Wikipathway enrichment analyses on these 406 Co-DEmRNAs. Figure 4A showed the top ten results from the GO biological processes. The majority of the findings were connected to the occurrence and progression of malignancies, with the majority of the findings focusing on cell adhesion. Cancer metastasis is strongly linked to cell adhesion. This might indicate a solitary metastasis that can be detected by changes in tsRNA expression. Furthermore, the KEGG pathway, Reactome, and Wikipathway enrichment analyses revealed that the cell cycle was implicated. Several cancer-related pathways, such as the p53 signaling pathway, the TGF-beta signaling pathway, and the VEGFA-VEGFR2 signaling pathway, were also included (Fig. 4B–D). Following a PPI network analysis, 349 nodes from the 406 Co-DEmRNAs were shown to be highly associated (1672 edges) (Fig. 4E). It can be classified into 14 clusters using the MCODE algorithm (Fig. 4F). Some of the clusters participated in cancer. For instance, MCODE1, which is involved in the TGF-beta signaling pathway, and MCODE3, which is involved in cell cycle and nucleotide metabolism (Additional file 5: Table S5).
These results indicated that functional enrichment analysis of tsRNA target genes can help elucidate the mechanism of tsRNAs in LUAD and identify strategies for future research.
Biomarker identification of tsRNAs in LUAD using machine learning
We employed a fourfold cross-validation strategy to uncover candidate tsRNAs that can predict LUAD and validate our model. We developed a tsRNA expression level basis model for LUAD prediction using the SVM approach. The findings revealed that tRF-16-L85J3KE had a high likelihood of correctly classifying normal and LUAD plasma samples (Fig. 5A). Its AUC score reached 0.99, which was significantly higher than that of tRF-21-RK9P4P9L0 (0.81) and tRF-16-PSQP4PE (0.56). However, all single tsRNA in tissue can't distinguish between normal and LUAD samples well (Fig. 5B). Nonetheless, the model accuracy was significantly better when all three tsRNAs were coupled than when only one was used (Fig. 5C, D). The AUC value in the tissue increased by 0.19 to 0.92.
These findings indicated that tsRNA can be utilized as a diagnostic biomarker for LUAD, but that a single tsRNA was less efficient than a combination of multiple tsRNAs.
Function analysis of tRF-21-RK9P4P9L0
We subsequently investigated the expression characteristics of the three hub tsRNAs. i-tRF tRF-16-L85J3KE was down-regulated (log2 fold change = − 4.79, P value = 0.0063) in plasma, whereas 5′-tRF tRF-21-RK9P4P9L0 (log2 fold change = 2.94, P value = 0.0027) and 5′-tRF tRF-16-PSQP4PE (log2 fold change = 4.14, P value = 0.0075) were up-regulated (Fig. 6A). The differential expression pattern was the same in tissue, although the fold change values were smaller (Fig. 6B). Using the samples we gathered, we then validated its expression level in 37 paired LUAD tissues using RT-qPCR. As shown in Fig. 6C–E, the expression level of tRF-16-L85J3KE was decreased in tumor tissues while tRF-21-RK9P4P9L0 and tRF-16-PSQP4PE was highly elevated in tumor tissues compared to normal tissues, which was consistent with results from public data.
Only tRF-21-RKP4P9L0 was discovered to have a significant link with prognosis in LUAD (Fig. 6F, data used for analyzing the prognosis of tRF-21-RKP4P9L0 was provided in Additional file 6: Table S6), hence the properties of tRF-21-RKP4P9L0 were investigated further.
With a length of 21nt (GGGGGTGTAGCTCAGTGGTAG), tRF-21-RK9P4P9L0 was cleaved at the 5′end of tRNA. Figure 6G depicts the structure of its tRNA host.
To explore the underlying mechanism of tRF-21-RK9P4P9L0, we built a target mRNA network and a PPI network to investigate its possible function. Figure 7A showed that there were 114 tsRNA-mRNA interactions and 209 PPI interactions, with the CDS region of the target mRNAs (74 CDS, 42 3′-UTR, and 5 5′-UTR) serving as the primary binding sites (Additional file 7: Table S7). We performed an enrichment analysis on the genes in the network, a distinct enriched term is presented by a different color node, Cytoscape MCC analysis was used to figure out what the core module in this network was (Fig. 7B). Figure 7C showed the extraction of a strongly linked sub-network. The network's hub was discovered to be Notch1.
Notch1 has been investigated to have roles in cancer cell proliferation, migration and invasion, we then explored these functional impacts of tRF-21-RK9P4P9L0 in LUAD cell lines. After detecting the tRF-21-RK9P4P9L0 expression in different LUAD cell lines, we selected A549 and H1299 for subsequent functional experiments due to their relative higher expression level (Additional file 8: Figure S3). The results demonstrated that tRF-21-RK9P4P9L0 inhibition increased Notch 1 expression level (Fig. 8A, B) and significantly reduced the proliferation (Fig. 8C), migration (Fig. 8D) and invasion (Fig. 8E) ability of A549 cells. The same impacts were observed in H1299 cell line (Fig. 8F–J).
These findings demonstrated that bioinformatics methods can be used to investigate the function of specific genes.