Discovery and validation of PZP as a novel serum biomarker for screening lung adenocarcinoma in type 2 diabetes mellitus patients

Patients with type 2 diabetes mellitus (T2DM) have an increased risk of suffering from various malignancies. This study aimed to identify specific biomarkers that can detect lung adenocarcinoma (LAC) in T2DM patients for the early diagnosis of LAC. The clinical information of hospitalized T2DM patients diagnosed with various cancers was collected by reviewing medical records in Wuxi People’s Hospital Affiliated to Nanjing Medical University from January 1, 2015, to June 30, 2020. To discover diagnostic biomarkers for early-stage LAC in the T2DM population, 20 samples obtained from 5 healthy controls, 5 T2DM patients, 5 LAC patients and 5 T2DM patients with LAC (T2DM + LAC) were subjected to sequential windowed acquisition of all theoretical fragment ion mass spectrum (SWATH-MS) analysis to identify specific differentially-expressed proteins (DEPs) for LAC in patients with T2DM. Then, these results were validated by parallel reaction monitoring MS (PRM-MS) and ELISA analyses. Lung cancer was the most common malignant tumor in patients with T2DM, and LAC accounted for the majority of cases. Using SWATH-MS analysis, we found 13 proteins to be unique in T2DM patients with early LAC. Two serum proteins were further validated by PRM-MS analysis, namely, pregnancy-zone protein (PZP) and insulin-like growth factor binding protein 3 (IGFBP3). Furthermore, the diagnostic values of these proteins were validated by ELISA, and PZP was validated as a novel serum biomarker for screening LAC in T2DM patients. Our findings indicated that PZP could be used as a novel serum biomarker for the identification of LAC in T2DM patients, which will enhance auxiliary diagnosis and assist in the selection of surgical treatment at an early stage.


Background
Diabetes mellitus is a group of metabolic disorders characterized by chronic hyperglycemia caused by complicated etiologies. Statistical data organized by the International Diabetes Federation revealed that there were approximately 387 million people worldwide who had diabetes mellitus in 2014, which is estimated to increase to 592 million in 2035 [1]. Diabetes mellitus occurs when the body cannot produce enough insulin or use insulin effectively. The former is defined as type 1 diabetes mellitus (T1DM), and the latter is type 2 diabetes mellitus (T2DM) [2]. Increasing evidence has revealed that T2DM is associated not only with microvascular complications (including nephropathy, retinopathy and neuropathy) and macrovascular complications (such as cardiovascular diseases) [3] but also with the oncogenesis and development of multiple types of cancer, including lung cancer, breast cancer and pancreatic cancer [4,5].
Cancer is gradually becoming the leading cause of mortality worldwide, with growing numbers of estimated new cases and deaths each year [6]. Increasing evidence supports a direct association between T2DM and cancer, with higher risks of cancer morbidity and mortality, especially for some of the most common malignancies [7]. To date, several mechanisms underlying the cancer-T2DM association have been explored, with dysregulation of the insulin-like growth factor (IGF) system emerging as the most important paradigm [7,8]. However, despite the higher risk of cancer morbidity in the T2DM population, reliable biomarkers for screening and early diagnosis of specific types of cancer in T2DM patients have not yet been discovered.
Mass spectrometry (MS)-based strategies offer novel insights for the identification and validation of disease-related biomarkers [9,10]. For example, Geyer et al. developed a plasma proteome analysis pipeline using label-free quantitative MS, which detected 284 ± 5 proteins containing > 40 FDA-approved biomarkers without removing high-abundance proteins [11]. Sequential windowed acquisition of all theoretical fragment ion mass spectrum (SWATH-MS) is a newly developed strategy using a data-independent acquisition (DIA) method with high quantitative accuracy and reproducibility [12]. Using this strategy, increasing numbers of disease biomarkers have been identified, and novel criteria for disease typing based on proteomics have been established [13-15].
In this research, we first collected clinical information of hospitalized T2DM patients diagnosed with cancer and found that lung cancer was the most common malignant tumor in patients with T2DM in our cohort, with lung adenocarcinoma (LAC) accounting for the majority of cases. Using SWATH-MS and parallel reaction monitoring MS (PRM-MS) analyses, we discovered and preliminarily validated pregnancy zone protein (PZP) and insulin-like growth factor binding protein 3 (IGFBP3) as potential biomarkers. ELISA analysis was next used to further validate these biomarkers, and PZP was determined as a novel serum biomarker for screening LAC in T2DM patients, which will enhance auxiliary diagnosis and assist in the selection of early surgical therapeutics for LAC.

Patients and sample description
The clinical information of hospitalized T2DM patients diagnosed with cancer was collected by reviewing medical records in Wuxi People's Hospital Affiliated to Nanjing Medical University from January 1, 2015, to June 30, 2020. The following two cohorts were used to discover and validate biomarkers (Fig. 1a). In the discovery set, a total of 20 serum samples from 5 healthy controls, 5 T2DM patients, 5 LAC patients at TNM stage 1 and 5 T2DM patients with LAC at TNM stage 1 (T2DM + LAC) were submitted to SWATH-MS analysis; in addition, 20 serum samples from T2DM patients and 20 serum samples from T2DM patients with LAC at TNM stage 1 were submitted for PRM-MS and ELISA analysis. In the validation set, 20 serum samples from T2DM patients and 20 serum samples from T2DM patients with LAC at TNM stage 1 were collected for ELISA analysis. The serum samples were stored at −80 °C until analysis. The study was approved by the Ethical Committee at Wuxi People's Hospital Affiliated to Nanjing Medical University and was performed according to the Declaration of Helsinki.

Sample preparation
An Agilent Multiple Affinity Removal LC Column (Human 14) (Agilent, CA, USA) was used according to the manufacturer's protocol to remove high-abundance proteins and obtain the low-abundance fraction of each serum sample. A 5 kD ultrafiltration tube was used for ultrafiltration and concentration, and one volume of SDT lysis buffer was added to the concentrate, which was incubated in a water bath at 100 °C for 10 min and centrifuged at 14,000 × g for 15 min. The supernatant was collected for protein quantification using a BCA kit, and the samples were aliquoted and stored at −80 °C.

FASP digestion
DTT was added to 200 μg of protein solution collected from each sample to reach a final concentration of 100 mM, and the samples were incubated in a water bath at 100 °C for 5 min. UA buffer (200 μL) was then added, and the samples were mixed and transferred to a 30 kD ultrafiltration centrifuge tube. The samples were centrifuged at 12,500 × g for 25 min, and the filtrate was discarded (this step was repeated twice). IAA buffer (100 μL; 100 mM IAA in UA) was then added, and the samples were shaken at 600 rpm for 1 min. The samples were allowed to react at room temperature for 30 min in the dark and then centrifuged at 12,500 × g for 25 min. UA buffer (100 μL) was then added, and the samples were centrifuged at 12,500 × g for 15 min (this step was repeated twice). Then, 40 mM NH4HCO3 (100 μL) was added, and the samples were centrifuged at 12,500 × g for 15 min (this step was repeated twice). Trypsin buffer (40 μL; 4 μg of trypsin in 40 μL of 40 mM NH4HCO3) was then added, and the samples were shaken at 600 rpm for 1 min and placed at 37 °C for 16-18 h. The collection tube was replaced, and the samples were centrifuged at 12,500 × g for 15 min, followed by the addition of 20 μL of 40 mM NH4HCO3 and centrifugation at 12,500 × g for 15 min to collect the filtrate. A C18 cartridge was used to desalt the peptides. After the peptides were dried, they were reconstituted with 40 μL of 0.1% formic acid solution.

High-pH RP fractionation
The peptide mixtures of all samples were fractionated using the Agilent 1260 Infinity II HPLC system. Buffer A consisted of 10 mM HCOONH4 and 5% ACN (pH 10), and buffer B consisted of 10 mM HCOONH4 and 85% ACN (pH 10). The chromatographic column was equilibrated with buffer A, and the sample was loaded by the autosampler onto the chromatographic column (XBridge Peptide BEH C18 Column, 130 Å, 5 µm, 4.6 mm × 100 mm; Waters, MA, USA) for separation at a flow rate of 1 mL/min. The liquid phase gradient was as follows: a linear gradient of 5% B to 45% B within 40 min, with the column temperature maintained at 30 °C. In total, 36 fractions were collected, and each fraction was dried in a vacuum concentrator. The dried fractions were reconstituted with 0.1% formic acid aqueous solution and combined into 12 fractions.

DIA-MS analysis
From each fraction, 6 μL was removed and added to 2 μL of 10 × iRT standard peptide, and 2 μL of each sample was separated with nano-LC and analyzed by online electrospray tandem MS. The experimental system was an Orbitrap Q Exactive HF mass spectrometer (Thermo Fisher Scientific, MA, USA) connected in series with a Waters Acquity UPLC (Waters, MA, USA) system. Buffer A consisted of 0.1% formic acid aqueous solution, and buffer B consisted of 0.1% formic acid acetonitrile aqueous solution (acetonitrile was 80%). The sample was separated by an analytical column (Thermo Fisher Scientific, MA, USA; Acclaim PepMap C18, 75 μm × 25 cm) at a flow rate of 200 nL/min using the following nonlinear increasing gradient: 0-5 min, 1% B; 5-95 min, 1% B to 28% B; 95-110 min, 28% B to 38% B; 110-115 min, 38% B to 100% B; and 115-120 min, 100% B. The electrospray voltage was 2.0 kV. The MS parameters were set as follows: (1) MS: scan range (m/z) = 350-1250, resolution = 120,000, AGC target = 3e6 and maximum injection time = 20 ms; and (2) DIA: resolution = 30,000, AGC target = 1e6, maximum injection time = auto and NCE = 25.5, 27, 30. The raw MS data were analyzed in Spectronaut Pulsar X using the default parameters. The criterion for protein identification was a precursor threshold of 1.0% FDR. Serum proteins that differed between two specified groups with a fold change (FC) ≥ 1.50 or ≤ 0.67 and a P value ≤ 0.05 were considered differentially-expressed proteins (DEPs).
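The DEP criterion stated above (FC ≥ 1.50 or ≤ 0.67 together with P ≤ 0.05) can be written as a simple decision rule. The following is a minimal sketch, not the Spectronaut implementation; the function name `classify_dep` and the example protein values are hypothetical and serve only to illustrate how the two thresholds combine.

```python
# Thresholds taken from the text; everything else is illustrative.
FC_UP = 1.50    # fold change at or above this value counts as upregulated
FC_DOWN = 0.67  # fold change at or below this value counts as downregulated
P_MAX = 0.05    # P value must not exceed this cutoff


def classify_dep(fold_change: float, p_value: float) -> str:
    """Return 'up', 'down', or 'ns' (not significant) for one protein."""
    if p_value > P_MAX:
        return "ns"
    if fold_change >= FC_UP:
        return "up"
    if fold_change <= FC_DOWN:
        return "down"
    return "ns"


# Hypothetical (fold change, P value) pairs, not data from this study.
examples = {
    "protein_A": (1.80, 0.01),
    "protein_B": (0.50, 0.03),
    "protein_C": (1.20, 0.001),
    "protein_D": (2.10, 0.20),
}
calls = {name: classify_dep(fc, p) for name, (fc, p) in examples.items()}
print(calls)
```

Note that a protein must pass both conditions: a large fold change with P > 0.05, or a small P value with a fold change between 0.67 and 1.50, is not called a DEP.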

PRM-MS analysis
Sample preparation and FASP digestion
The expression of DEPs was preliminarily verified by PRM, which is a targeted proteomic strategy. For PRM assays, the methods for sample preparation and FASP digestion were the same as described above for SWATH-MS analysis.

MS analysis
The same mass of peptides from each sample was extracted and mixed well, and 2 μg of each sample was separated with nano-LC and analyzed by online electrospray tandem MS. The complete liquid-mass tandem system was composed of a liquid system (Easy nLC system; Thermo Fisher Scientific, MA, USA) and an MS system (Q-Exactive; Thermo Fisher Scientific, MA, USA). Buffer A was composed of 0.1% formic acid aqueous solution, and buffer B was composed of 0.1% formic acid acetonitrile aqueous solution (acetonitrile was 80%). The sample was separated by an analytical column (Thermo Fisher Scientific, MA, USA; Acclaim PepMap RSLC 50 μm × 15 cm, nano viper, P/N164943) at a flow rate of 300 nL/min using the following nonlinear increasing gradient: 0-1 min, 2% B to 8% B; 1-46 min, 8% B to 28% B; 46-56 min, 28% B to 40% B; 56-57 min, 40% B to 90% B; and 57-60 min, 90% B.
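The multi-step gradient described above is piecewise linear, so the buffer B percentage at any time point follows from linear interpolation between the stated breakpoints. The sketch below is illustrative only; the breakpoints are those given in the text, while the names `PRM_GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, % buffer B) breakpoints of the PRM gradient given in the text.
PRM_GRADIENT = [(0, 2), (1, 8), (46, 28), (56, 40), (57, 90), (60, 90)]


def percent_b(t_min: float, gradient=PRM_GRADIENT) -> float:
    """Linearly interpolate % B at time t_min within the gradient program."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time is outside the gradient program")
    # Walk consecutive breakpoint pairs and interpolate within the matching one.
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient breakpoints must be in increasing time order")


# Midpoint of the 1-46 min segment (8% B to 28% B).
print(percent_b(23.5))
```

The same function applies to the DIA gradient if its breakpoints are supplied as the `gradient` argument.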
The samples were chromatographed and analyzed by a Q Exactive mass spectrometer with the following parameters: analysis time of 60 min; detection in positive ion mode; precursor ion scan range of 350-1500 m/z; resolution of the primary MS was 60,000; AGC target was 3e6; and primary maximum IT was 45 ms. The mass-to-charge ratios of peptides and peptide fragments were collected according to the following method: 10 fragment patterns (MS2 scan) were collected after each full scan (MS1 scan); MS2 activation type was HCD; isolation window was 2 m/z; MS/MS resolution was 15,000; AGC target was 2e5; secondary maximum IT was 45 ms; and normalized collision energy was 27 eV.

PRM precursor ion screening
Proteome Discoverer 2.1 (Thermo Fisher Scientific, MA, USA) software was used to convert the original map files (.raw files) generated by Q Exactive into .mgf files, which were submitted to the MASCOT 2.6 server for database retrieval through the built-in tools of the software. The database used was Uniprot_HomoSapiens_20386_20180905. The reliable protein screening criterion was peptide FDR ≤ 0.01.

ELISA analysis
The concentrations of PZP (Catalog No. DY8280-05; R&D Systems, MN, USA) and IGFBP3 (Catalog No. DGB300; R&D Systems, MN, USA) in serum were quantified with commercially available ELISA kits according to the manufacturer's protocol. Most samples were assayed in duplicate, and the average values were reported as pg/mL or ng/mL. The linear correlation between the PRM-MS and ELISA results was calculated using Pearson's correlation analysis.
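As an illustration of the Pearson correlation used to compare PRM-MS and ELISA measurements, the following minimal sketch computes the coefficient from paired values using only the standard library. The function name `pearson_r` and the paired values are hypothetical; the study itself reports using SPSS and GraphPad Prism for statistics.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (arbitrary units), not data from this study.
prm_intensity = [1.0, 2.0, 3.0, 4.0]
elisa_conc = [1.0, 3.0, 2.0, 4.0]
print(pearson_r(prm_intensity, elisa_conc))
```

A coefficient near 1 indicates that the ELISA readout tracks the PRM-MS signal linearly, which is the property assessed here when judging transferability of the assay.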

Analysis of public data
The data of PZP mRNA expression in the TCGA database was obtained from the Xena website. The correlations between PZP expression and immune cell infiltration were determined by the TIMER database [16]. Besides, the summary of PZP protein was consulted in the HPA database [17,18].

Statistical analysis
Statistical analysis was mainly performed in SPSS (v26.0) and GraphPad Prism (v8.0). Unless otherwise noted, data for the two groups were presented as means ± standard deviations (SDs) and were compared by Student's t-test or the Mann-Whitney test. Correlations were evaluated by Pearson's correlation analysis. Receiver operating characteristic (ROC) analysis was used to assess the specificity and sensitivity of the biomarkers, and the area under the ROC curve (AUC) was estimated for each individual protein. For all analyses, P values less than 0.05 were considered statistically significant.
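The AUC has a direct probabilistic reading: it is the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC from that pairwise definition. It is an illustration under stated assumptions, not the SPSS or GraphPad procedure used in the study; the function name `roc_auc` and the values are hypothetical.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs where the case is higher.

    Ties contribute 0.5. If the marker is lower in cases, the result falls
    below 0.5 and 1 - AUC describes the discriminative ability instead.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker values (arbitrary units), not data from this study.
cases = [2.0, 3.0]
controls = [1.0, 2.0]
print(roc_auc(cases, controls))
```

This pairwise form is equivalent to the Mann-Whitney U statistic divided by the product of the two group sizes, which is why the two analyses named above are closely related.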

Distribution of tumor location and subtype of lung cancer in T2DM patients
Previous research has indicated that lung cancer is the most common concomitant malignant tumor among patients with diabetes [1]. Thus, to further confirm the distribution of tumor location, we collected clinical information of hospitalized T2DM patients diagnosed with cancers from January 1, 2015, to June 30, 2020. After analyzing the distribution, we found that lung cancer accounted for the highest proportion of malignant tumors (20.84%), followed by digestive tract cancers (colorectum: 12.81%, stomach: 12.32%, and liver: 6.18%) (Table 1). We next analyzed the histological types of lung cancer in T2DM patients. The proportion of histological types was as follows: adenocarcinoma (60.62%), squamous carcinoma (13.86%), small cell carcinoma (3.69%), mixed carcinoma (1.47%), neuroendocrine carcinoma (0.88%), magnocellular carcinoma (0.29%) and other histological types (0.88%) (Table 2). Overall, LAC was the most common tumor type in T2DM patients and should be monitored and diagnosed early.

Patient characteristics and study design
Before screening for potential biomarkers that could detect LAC in T2DM patients, we first compared the general clinical parameters of the two main groups in the whole set, which consisted of 40 serum samples from T2DM patients and 40 serum samples from T2DM patients with LAC. In the T2DM group, there were 23 males and 17 females with an average age of 61.05 ± 9.78 years and an average fasting plasma glucose (FPG) of 7.95 ± 1.91 mmol/L. In the T2DM + LAC group, there were 19 males and 21 females with an average age of 64.68 ± 7.10 years and an average FPG of 7.41 ± 2.55 mmol/L. There were no statistically significant differences in sex, age and FPG between the two groups (P > 0.05) (Table 3). In addition, there were no significant differences in glucose-lowering therapeutic regimens between these two groups (P > 0.05) (Table 3). Moreover, we compared the concentrations of the most commonly used tumor biomarkers in the clinic between these two groups. The results showed that there were no significant differences in serum CEA, AFP, CA125 and CA199 levels between the two groups (Table 3). These results suggested that the identification of novel biomarkers is urgently needed for the detection of LAC in T2DM patients.
Considering the limited values of common tumor biomarkers in T2DM patients, we next performed SWATH-MS, PRM-MS and ELISA analyses to identify and validate novel biomarkers for the detection of LAC in T2DM patients. The overall strategy and simplified workflow are shown in Fig. 1b. Briefly, 20 samples obtained from 5 healthy controls, 5 T2DM patients, 5 LAC patients and 5 T2DM patients with LAC were submitted for SWATH-MS analysis to identify DEPs specific for LAC in patients with T2DM. These results were next validated by PRM-MS and ELISA analysis. Moreover, the validation set consisting of 20 serum samples from T2DM patients and 20 serum samples from T2DM patients with LAC were collected for ELISA analysis and further validation.

Identification of differentially expressed proteins by SWATH-MS analysis
Using SWATH-MS analysis, we analyzed global protein changes in serum samples from 20 patients (5 healthy controls, 5 T2DM patients, 5 LAC patients and 5 T2DM + LAC patients). A total of 70 proteins were identified as differentially expressed between these disease groups and the control group (Fig. 2a-c). As shown in Fig. 2d, the three protein lists from the above analysis (T2DM vs. normal, LAC vs. normal and T2DM + LAC vs. normal) were further compared to identify a small group of proteins that were differentially expressed only in the T2DM + LAC group. Overall, 13 proteins were found to be unique in patients with T2DM + LAC (Fig. 2d). Among these proteins, 7 candidates exhibited differential expression between the T2DM + LAC and T2DM groups, including 2 upregulated proteins and 5 downregulated proteins (Tables 4 and 5). To arrange the samples according to similarities in protein expression patterns, we performed a hierarchical cluster analysis of the 70 DEPs as previously described [19]. Cluster analysis indicated a clear separation of the four groups (Fig. 2e).
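The list comparison summarized in Fig. 2d amounts to set operations on the three DEP lists, followed by an intersection with the T2DM + LAC vs. T2DM comparison. The following minimal sketch shows the logic; the protein identifiers `P1` to `P7` are hypothetical placeholders, not proteins from this study, and the variable names are illustrative.

```python
# Hypothetical DEP lists for each comparison against the normal group.
t2dm_vs_normal = {"P1", "P2", "P3"}
lac_vs_normal = {"P2", "P4"}
t2dm_lac_vs_normal = {"P2", "P3", "P5", "P6"}

# Proteins differentially expressed only in the T2DM + LAC comparison:
# remove anything that also appears in the T2DM-only or LAC-only lists.
unique_to_t2dm_lac = t2dm_lac_vs_normal - t2dm_vs_normal - lac_vs_normal

# Of those, keep proteins that also differ between T2DM + LAC and T2DM.
t2dm_lac_vs_t2dm = {"P5", "P7"}
candidates = unique_to_t2dm_lac & t2dm_lac_vs_t2dm

print(sorted(unique_to_t2dm_lac))
print(sorted(candidates))
```

In the study, the analogous first step yielded 13 unique proteins and the second step yielded 7 candidates.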

Verification of selected candidate proteins by PRM-MS and ELISA analyses
Of the 13 proteins identified as DEPs in patients with T2DM + LAC by SWATH-MS analysis, 7 proteins showed significant dysregulation between T2DM + LAC and T2DM, including CCD87, FHR1, FRPD2, HBB, IGFBP3, PZP, and ZN350 (Table 5). We next used targeted PRM-MS to provide high sensitivity relative peptide quantification for validation. A total of 4 proteins were detected by PRM-MS, and significant differential expression of 2 of these candidate proteins was confirmed, namely, PZP and IGFBP3 (Fig. 3a-d, Additional file 1: Figure S1).
We next validated the protein abundance changes of PZP and IGFBP3 using commercially available antibodies and ELISA kits. The concentration-dependent standard curve is shown in Additional file 2: Figure S2. To evaluate the feasibility of developing an assay that could be more easily deployed in a clinical environment, we assessed the transferability of the PRM-MS-based results to ELISA. The levels of PZP and IGFBP3 were quantified by commercially available ELISA kits, and the correlation with the results obtained by PRM-MS was evaluated. The results showed a linear correlation for PZP but not IGFBP3 (Fig. 4a, Additional file 3: Figure S3). In addition, the level of PZP differed significantly between the T2DM + LAC and T2DM groups in the discovery set, the validation set and the whole set, and the ROC analysis indicated an AUC of 0.742 (Fig. 4b-e). However, no significant difference was observed in IGFBP3 levels between these two groups (Additional file 3: Figure S3). In summary, the serum PZP level distinguished the T2DM + LAC group from the T2DM group with an AUC of 0.742, and it merits further validation in larger cohorts.

Discussion
Lung cancer and T2DM are two common chronic non-communicable diseases, and a growing number of studies have recognized the correlation between them. In a meta-analysis, Lee et al. systematically analyzed 34 observational studies and found that after adjusting for smoking and other variables, T2DM was an independent risk factor for the occurrence of lung cancer with a relative risk of 1.11 and a 95% CI of 1.02 to 1.20 [20]. T2DM is also related to the risk of death from lung cancer. Tseng et al. conducted a prospective study of 244,920 T2DM patients with a 12-year follow-up and found that the lung cancer mortality rate of T2DM patients was significantly higher [21].
In the present research, we systematically analyzed the distribution of tumor location and subtype of lung cancer in T2DM patients. The results revealed that lung cancer was the most common malignant tumor in patients with T2DM, with LAC accounting for the majority of cases. Moreover, unlike pancreatic cancer, which has the highest increased risk in patients with T2DM, the early diagnosis and treatment of lung cancer can significantly improve prognosis [22,23]. Therefore, more strategies for the early screening of LAC in T2DM patients should be further explored.
Although cytology is the gold standard for the diagnosis of malignancies, serum biomarkers are also invaluable in the screening and auxiliary diagnosis of malignant tumors as well as monitoring curative effects [24,25]. The serum proteome holds significant interest as a potential source of biomarkers and is an easily accessible fluid for auxiliary diagnosis. Four tumor biomarkers, including AFP, CEA, CA125 and CA199, are widely used in clinical practice. An observational study presented by Chen et al. revealed the association between the levels of these biomarkers and the tumor stage of LAC. Serum AFP was not correlated with T stage, N stage or M stage, but serum CEA and serum CA125 were positively correlated with T stage, N stage and M stage. Serum CA199 was not correlated with T stage but was positively correlated with N stage and M stage [26]. However, it is unknown whether these four biomarkers help to identify LAC in patients with T2DM. In our study, the results indicated that there were no significant differences in serum CEA, AFP, CA125 and CA199 levels between the T2DM + LAC group and the T2DM group, indicating an urgent need for the identification of promising biomarkers for the detection of LAC in T2DM patients.
The MS-based identification of serum biomarkers has recently emerged [27,28]. SWATH-MS is a newly developed technology that combines the advantages and characteristics of traditional "shotgun" proteomics and selective reaction monitoring/multiple reaction monitoring (SRM/MRM) [12]. SWATH-MS technology can record the fragment information of all ions in the sample indiscriminately and without omission, while PRM technology can achieve the absolute quantification of protein expression. The combination of the two strategies can be used for the efficient, comprehensive and accurate screening of potential biomarkers [29,30]. In this study, we performed SWATH-MS analysis to identify DEPs specific for LAC in patients with T2DM, and these potential biomarkers were validated by PRM-MS and ELISA analysis in the discovery and validation cohorts.
To identify a small group of proteins that were differentially expressed in the T2DM + LAC group, we compared the three protein lists (T2DM + LAC vs. normal, T2DM vs. normal and LAC vs. normal) and identified 13 proteins that were unique in patients with T2DM + LAC. Among these proteins, 7 candidates exhibited differential expression between the T2DM + LAC and T2DM groups. To identify useful diagnostic indicators from these 7 proteins, we conducted further validation by PRM-MS. The results showed that 4 proteins were detected by PRM-MS and that significant differential expression of 2 of these candidate proteins was confirmed, namely, PZP and IGFBP3. As a first step toward clinical implementation, the diagnostic biomarker was assessed by ELISA. Immunoassays continue to be the preferred method for clinical validation and further application in clinical practice [31]. The PZP levels were significantly different between the T2DM + LAC and T2DM groups, and the ROC analysis indicated an AUC of 0.742 in the whole set. However, no significant difference in IGFBP3 levels was observed between these two groups. PZP is associated with pregnancy, and it is produced in the liver, placenta and other tissues. The blood concentration of PZP increases during pregnancy [32].
Mechanistically, elevated estrogen levels during pregnancy may regulate PZP levels [33]. Moreover, elevated PZP has been identified as an indicator associated with P. aeruginosa infection. Sputum but not serum concentrations of PZP have been significantly associated with the Bronchiectasis Severity Index, the frequency of exacerbations and symptoms [34]. Previous research has also uncovered the role of PZP in cancers. In hepatocellular carcinoma, PZP has low expression in tumor tissues, and the downregulation of PZP is correlated with poor clinical outcomes [35]. Our research identified and validated PZP as a novel serum biomarker for screening LAC in patients with T2DM by SWATH-MS, PRM-MS and ELISA analyses. In addition, we analyzed the expression of PZP and its correlations with immune cell infiltration in lung cancer. The results showed that PZP mRNA was downregulated in lung cancer tissues and significantly correlated with immune cell infiltration (Figure S4A-B). However, in the TCGA database, not all patients had T2DM before the diagnosis of lung cancer. Moreover, the TCGA database only provides gene expression data at the mRNA level. Serum biomarkers are not only derived from tumor cells but may also be released by tumor-related immune cells [36]. According to the HPA database, PZP is highly expressed in immune cells, including T cells and macrophages. In previous research, P. aeruginosa infection-induced PZP elevation was derived from neutrophils [34]. Therefore, serum PZP may be derived from tumor-related immune cells, but further large-scale studies are needed to confirm the source of PZP and its diagnostic value.

Conclusion
In conclusion, the present results revealed that PZP could be used as a novel serum biomarker for the detection of LAC in T2DM patients, which will enhance auxiliary diagnosis at an early stage. However, the present study was conducted using a small sample size at a single center. Hence, the performance of the biomarker needs to be validated in a prospective, multicenter study with a larger number of patients.