Grizzle et al

REVIEW

The Need for Review and Understanding of SELDI/MALDI Mass Spectroscopy Data Prior to Analysis

William E. Grizzle,a O. John Semmes,b William Bigbee,c Liu Zhu,a Gunjan Malik,b Denise K Oelschlager,a Barkha Manne,a Upender Mannea

a University of Alabama at Birmingham, Birmingham, AL; b Eastern Virginia Medical School, Norfolk, VA; c University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA.

Abstract: Multiple studies have reported that surface enhanced laser desorption/ionization time of flight mass spectroscopy (SELDI-TOF-MS) is useful in the early detection of disease based on the analysis of bodily fluids. Data from any multiplex, mass spectroscopy based analysis of bodily fluids to detect disease must be analyzed with great care because multiplex and mass spectroscopy methods are susceptible to biases introduced via experimental design, patient samples, and/or methodology. Specific biases include those related to experimental design, patients, samples, protein chips, chip reader and spectral analysis. Contributions to biases based on patients include demographics (e.g., age, race, ethnicity, sex), homeostasis (e.g., fasting, medications, stress, time of sampling), and site of analysis (hospital, clinic, other). Biases in samples include conditions of sampling (type of sample container, time of processing, time to storage), conditions of storage (time and temperature of storage), and prior sample manipulation (freeze thaw cycles). Also, there are many potential biases in methodology which can be avoided by careful experimental design, including ensuring that cases and controls are analyzed randomly. All the above forms of bias affect any system based on analyzing multiple analytes and especially all mass spectroscopy based methods, not just SELDI-TOF-MS. Also, all current mass spectroscopy systems have relatively low sensitivity compared with immunoassays (e.g., ELISA).

There are several problems which may be unique to the SELDI-TOF-MS system marketed by Ciphergen®. Of these, the most important is the relatively low resolution (±0.2%) of the bundled mass spectrometer, which may cause problems with analysis of data. Foremost, this low resolution makes it difficult to determine what constitutes a “peak” if a peak matching approach is used in analysis. Also, once peaks are selected, the peaks may represent multiple proteins. In addition, because peaks may vary slightly in location due to instrumental drift, long term identification of the same peaks may prove to be a challenge. Finally, the Ciphergen® system has some “noise” of the baseline which results from the accumulation of charge in the detector system. Thus, we must be very aware of the factors that may affect the use of proteomics in the early detection of disease, in determining aggressive subsets of cancers, in risk assessment and in monitoring the effectiveness of novel therapies.

Keywords: bias, specimens, specimen processing, mass spectrometry, serum, cancer detection

Introduction

Surface enhanced laser desorption/ionization time of flight mass spectroscopy (SELDI-TOF-MS) is a relatively new high throughput proteomic technique that has been reported to be useful in the early detection of disease. Specifically, SELDI-TOF-MS has been used to analyze samples of body fluids to aid in the early detection of multiple neoplastic processes. Serum has been the major bodily fluid utilized in most studies reported to date (Table 1). Before any attempt is made at analysis of data of any form, the statistician / bioinformaticist should be thoroughly familiar with the source and accuracy of the data. The obvious trite statement applies: junk to the statistician equals junk from the statistician; thus, care should be taken to understand the quality of the data prior to analysis. The purpose of this manuscript is to alert those analyzing SELDI-TOF-MS, other mass spectroscopy techniques, and other proteomic data of the potential sources of incorrect, inaccurate and/or biased data. The sources of problematic data can be subdivided into the following: experimental design, patient, sample, protein chip, chip reader, measures of the spectrum including peak identification, peak comparisons, and algorithms of spectral analysis.

Correspondence: William Grizzle, [email protected]
Cancer Informatics 2005:1(1) 86-97

Table 1: Summary of some of the SELDI/MALDI-TOF-MS Cancer Case/Control Serum Proteomic Profiling Studies

Where a study reports several analyses, the values in a cell are separated by semicolons. Superscript numbers refer to the legend below the table.

Organ Site | ProteinChip Type | Training Set Controls | Training Set Cancer | Training Set Sensitivity (%) | Training Set Specificity (%) | Test Set Controls | Test Set Cancer | Test Set Sensitivity (%) | Test Set Specificity (%) | Number of Peaks | Reference
Bladder | WCX2 weak cation exchange | 104 | 87 | 87 | 84 | 18 | 21 | 78 | 67 | 7 | 1
Breast | IMAC3 nickel activated | 66 | 42 | N/A | N/A | 66 | 103 | 85² | 91² | 3 | 2
Breast | IMAC3-Cu; SAX-2 | 89³; 30⁶ | 45; 30⁶ | N/A; 82 (84)⁴; 90 | N/A; 85 (90)⁴; 97 | 89³; 30⁶ | 45; 30⁶ | N/A; 80 (78)⁵; 90⁷ | N/A; 79 (83)⁵; 93⁷ | 4; 3; 4 | 3
Head and Neck | IMAC3 copper activated | 75 | 75 | 91 | 88 | 27 | 24 | 83 | 100 | 3 | 4
Head and Neck | MALDI | 95 | 66 | N/A | N/A | 48 | 33 | 73⁸ | 90⁸ | 45 | 5
Kidney | WCX2 weak cation exchange | 21 | 15 | N/A | N/A | 21 | 15 | 87⁹ | 85⁹ | 5 | 6
Liver | IMAC3-Cu; WCX2 | 20 | 38 | N/A | N/A | 20 | 38 | 90¹⁰ | 92¹⁰ | 6; 4 | 7
Lung | WCX2 weak cation exchange | 51 | 30 | N/A | N/A | 31 | 15 | 93 | 97 | 3 | 8
Ovary | C16 hydrophobic interaction | 50 | 50 | 100(?) | 100(?) | 66 | 50 | 100 | 95 | 8 | 9
Ovary | SAX2 strong anion exchange | 73 | 67 | 96 | 83 | 22 | 22 | 95 | 91 | 14 | 10
Pancreas | IMAC3-Cu; WCX | 120 | 60 | N/A | N/A | 120 | 60 | 78² | 97² | 9 | 11
Prostate | IMAC3 copper activated | 159 | 167 | 98 | 94 | 30 | 30 | 83 | 97 | 9 | 12
Prostate | C16 hydrophobic interaction | 25 | 31 | NR¹¹ | NR¹¹ | 228 | 38 | 95 | 78 | 7 | 13
Prostate | IMAC3-Cu; WCX2 | 30 | 44 | N/A; 89¹ | N/A; 87¹ | 26 | 62 | 66; 63; 85¹ | 38; 77; 85¹ | 5; 6; 3¹ | 14

Legend
1 Combined performance of peaks from both IMAC-Cu and WCX2 ProteinChip® arrays
2 Bootstrap cross-validation using all patients and controls
3 47 healthy controls, 42 patients with benign breast disease
4 Versus healthy controls and benign disease, respectively
5 Cross-validation versus healthy controls and benign disease, respectively
6 Randomly selected 30 cases and controls
7 Cross-validation using combined performance of peaks from both IMAC-Cu and SAX ProteinChip® arrays
8 Cross-validation using two-thirds of cases and controls as the training set, one third as the test set
9 Results of five independent simulation studies
10 Cross-validation using combined performance of peaks from both IMAC-Cu and WCX2 ProteinChip® arrays
11 Training set results not reported

Experimental Design

Careful experimental design bears on each of the potential sources of bias listed above. For example, great care must be taken in identifying the patients to be studied and especially in choosing control patients: how does one ensure that controls do not have the subclinical form of the disease being studied? As part of the experimental design, the type of sample to be analyzed needs to be considered; for example, would serum, plasma, urine, saliva, cytological specimens, or combinations of these be the best samples to study? Also, samples from cases and controls must be collected, processed and stored consistently. The use of SELDI-TOF-MS in early detection usually requires processing of samples using robotics, performing the assays in triplicate, and consistency in processing (e.g., dilution) of the sample before adding the triplicate of the sample to the chip.

The type of chip(s) should be selected before the experiment (Table 2) and enough chips from one lot should be assembled to complete the experiment, if practicable within the shelf lives of the chips.

Table 2: Types of chip

Old Designation | Current Chip | Biochemical Action of Surface Chemistry
IMAC3 | IMAC30 (with hydrophobic barrier) | Bivalent metals can be attached to the chip. Proteins that bind to these divalent metals (e.g., Cu2+) are bound by the chip.
WCX2 | Same (CM10 mimics WCX2 but does not replace) | This is a weak cation exchange chip. It contains negatively charged (anionic) carboxylate groups that will bind proteins with positively charged areas containing high numbers of lysine, arginine, and/or histidine amino acids.
H4 | Same (C16 contains 16 CH3) | The chip contains multiple chains, each composed of 16 methylene groups. This very hydrophobic microenvironment binds molecules that are hydrophobic.
SAX2 | Q10 (with hydrophobic barrier) | Strong anion exchanger which is composed of quaternary ammonium groups that are charged positively. This chip will bind proteins/peptides with regions rich in acidic groups, especially regions of peptides high in aspartic and/or glutamic amino acids.
NP1 and NP2 | NP20 | General protein binding surface which binds via serine, threonine or lysine.
PS1 and PS2 | PS10 / PS20 | Binds via attachment of capture molecules such as antibodies, binding proteins, etc.
SENDID | | Incorporates EAM into chip.

Samples should be applied to the chip in a manner selected to prevent bias and errors introduced by the methodology. Because random and consistent errors may arise due to spot, chip, and day of assay, error can be minimized by ensuring that each of the triplicates is placed on a different spot (e.g., not more than one of a triplicate on spot A), each on a different chip (i.e., no more than one of the triplicates on the same chip), and that each of the triplicates is analyzed on a different day. Given these restrictions, cases and controls should then be applied randomly and blindly to the chip based upon statistical considerations. Thus, each of the triplicates is analyzed within one of three groups, and each of these groups should be analyzed on a separate day. Spots and chips are randomly assigned except that spots and chips must be different on each of the triplicates of one specimen (see Table 3).
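The placement rules for triplicates can be sketched in code. The following is a minimal illustration, not the authors' procedure: the function name `assign_triplicates`, the assumption of eight spots per chip labelled A to H, the per-day chip numbering, and the shuffle-and-retry strategy are all choices made here for illustration. It does not reserve spots for the standard control sample, and sample identifiers are assumed to be coded so that the operator remains blinded.

```python
import random
from string import ascii_uppercase

SPOTS = ascii_uppercase[:8]  # assumption: eight spots per chip, labelled A-H


def assign_triplicates(sample_ids, n_chips, seed=None, max_tries=1000):
    """Randomly place three replicates of each sample, one per day.

    Returns {sample_id: [(day, chip, spot), ...]} such that the three
    replicates of a sample are on different days, chips and spot letters.
    Chips are numbered 0..n_chips-1 within each day (hypothetical scheme).
    """
    rng = random.Random(seed)
    if len(sample_ids) > n_chips * len(SPOTS):
        raise ValueError("not enough spots per day for every sample")
    for _ in range(max_tries):
        layout = {s: [] for s in sample_ids}
        ok = True
        for day in range(3):
            # All free positions for this day, in random order.
            positions = [(c, p) for c in range(n_chips) for p in SPOTS]
            rng.shuffle(positions)
            order = list(sample_ids)
            rng.shuffle(order)
            for s in order:
                used_chips = {c for _, c, _ in layout[s]}
                used_spots = {p for _, _, p in layout[s]}
                # First free position that repeats neither chip nor spot.
                pick = next(((c, p) for c, p in positions
                             if c not in used_chips and p not in used_spots),
                            None)
                if pick is None:
                    ok = False
                    break
                positions.remove(pick)
                layout[s].append((day, pick[0], pick[1]))
            if not ok:
                break  # dead end: start again with a fresh shuffle
        if ok:
            return layout
    raise RuntimeError("no valid layout found; add chips or raise max_tries")
```

A call such as `assign_triplicates(["S01", "S02", "S03"], n_chips=3, seed=42)` returns one layout; fixing the seed makes the layout reproducible so that it can be recorded with the study protocol.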

Similarly, a consistent aliquot of a standard control sample should be used (at least one per chip) and the standard control sample also should be loaded on chips without bias as to the spot on the chip to which it is loaded. The chip should be prepared using a robotic system within one day of analysis. The energy absorbing molecule (EAM) should be chosen with care to match the molecular weight range of interest. If a whole spectrum (1,000 Daltons to 200,000 Daltons) is of interest, then multiple runs of a sample will be necessary, each run over a specific molecular weight range with a separate set of triplicates and different adjustments, including to the laser and detector. For each range of interest (e.g., 1 kD to 15 kD) an EAM should be selected and the machine should be calibrated for this spectral area. EAMs vary in efficiency based upon the molecular weight range being analyzed. Thus, an area such as from 50 kD to 100 kD will require a different EAM and a specific set of molecular weight standards (e.g., 5 to 7 purified proteins) which bridge these molecular weights. The points at which the laser samples the spots should be selected so as to not exhaust the sample-EAM matrix (Figure 1).

If a directed (e.g., peak identification) approach to analysis is to be used, a careful, consistent method of peak identification should be utilized and the alignment of peaks should be consistent. The instrument should be calibrated periodically and the consistency of the instrumental output should be verified using the standard samples as part of the quality control program. Finally, there must be a consistent approach to the analysis of data.
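One simple way to check the consistency of instrumental output with the standard control sample is to track the coefficient of variation (CV) of each standard peak across runs. The sketch below assumes this approach; the function names, the dictionary layout of the input, and the 20% threshold are hypothetical and are not taken from the text.

```python
from statistics import mean, stdev


def peak_cv(standard_runs):
    """Percent CV of each peak across runs of the standard control sample.

    standard_runs: list of dicts mapping a peak label to its intensity,
    one dict per run; every run is assumed to report the same peaks.
    """
    cvs = {}
    for peak in standard_runs[0]:
        values = [run[peak] for run in standard_runs]
        cvs[peak] = 100.0 * stdev(values) / mean(values)
    return cvs


def drifting_peaks(standard_runs, max_cv=20.0):
    """Peaks whose CV exceeds max_cv (an illustrative threshold)."""
    return sorted(p for p, cv in peak_cv(standard_runs).items() if cv > max_cv)
```

Peaks flagged by `drifting_peaks` would prompt a review of chip lot, calibration, and laser settings before case and control spectra from the same runs are analyzed.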

Table 3: Potential Organization of Triplicates of Samples on Chips (Spots and Chips Randomly Chosen in Each Group)

Group 1 | Group 2 | Group 3
Sample 1A, Spot C, Chip 7 | Sample 1B, Spot A, Chip 5 | Sample 1C, Spot F, Chip 2
Sample 2A, Spot D, Chip 5 | Sample 2B, Spot C, Chip 3 | Sample 2C, Spot A, Chip 4
Sample 3A, Spot A, Chip 4 | Sample 3B, Spot F, Chip 1 | Sample 3C, Spot G, Chip 7
· · · | · · · | · · ·
Sample NA, Spot D, Chip M | Sample NB, Spot G, Chip M-5 | Sample NC, Spot B, Chip M+1

[Figure 1: six spectra, horizontal axis 6000 to 10000, vertical axis 0 to 8. Labelled peaks: First Read A: 5929.2+H, 7776.3+H, 9314.9+H; First Read B: 5930.1+H, 7777.3+H, 9316.3+H; First Read C: 5932.6+H, 7781.3+H, 9320.5+H; Fourth Read B: 5928.5+H, 7776.9+H, 9315.9+H; Tenth Read B: 5927.5+H, 7774.2+H, 9314.2+H; Fourteenth Read B: 5928.7+H, 7775.7+H, 9315.0+H.]

Figure 1: Spectral decreases upon multiple laser shots in the same area. The same sample was applied to spots A, B, and C and these spots were analyzed initially with 5 laser shots per spot to demonstrate consistency of spectral pattern. Following 3 additional samplings of spot B, at 5 laser shots each at the same site, the decrease in the spectrum is clear (4th 5-shot read of spot B). After a total of 10 and 14 5-shot samplings at the same site on spot B, a marked decline in sample intensity was noted (10th read of spot B and 14th read of spot B, respectively).

Patient and Controls

The general approach of proteomics is to compare one condition with another; thus, in studies involving the early detection of a disease, patients with the disease (cases) are compared with patients without the disease (controls). The first very important issue is to identify the clinical (research) and other parameters that define a good control. For example, controls for prostate cancer (PCa) should not have prostate cancer but should be males who have conditions such as benign prostatic hyperplasia (BPH) which may mimic PCa, and who are within the age range in which PCa usually occurs (> 50 years of age). Familial cases may need separate controls. The usual normal range for PSA is ≤ 4 ng/ml; however, based upon the Prostate Cancer Prevention Trial (PCPT), about 20% of patients with PSA values ≤ 4 ng/ml have PCa (Thompson et al, 2004). Similarly, about 60% of patients with PSA values from 4 to 10 ng/ml have no PCa but rather BPH. Men with PSA values much greater than 100 ng/ml are more likely to have PCa (Urban et al, 1999).

Although no control group is perfect, one might select controls for a serum based study of PCa as being males with PSA ≤ 10 ng/ml, a normal digital rectal examination, normal prostatic ultrasonography, and a recent negative biopsy (at least sextant) of the prostate taken after the sample of serum was obtained, because biopsy of the prostate may “activate” the prostate for several weeks subsequently (Urban et al, 1999). Controls above 4 ng/ml but less than 10 ng/ml are chosen because most patients in this range do not have PCa and controls should not exclude proteomic changes secondary to benign prostatic hyperplasia (BPH) (Grizzle et al, 2003, 2005). Such controls might be collected prospectively or their samples might exist in a tissue bank; however, it is critical that all conditions of cases (site, collection, storage, etc.) match controls.

[Figures 2 and 3: diagrams. Labels recovered from the figures: CEA, TAG-72, CA 125, MUC-2, Lewis Y; liver; kidney; lymph node; tumor; tissue reaction; tumor molecular features; molecular features of normal tissue; molecules and metabolites of normal tissues (actin); venous and lymphatic fluids containing tumor, immunological and tissue-reaction products and their metabolites; urine with tumor and tissue reaction products and metabolites; immunological, tumor surround, distant reaction; normal and tumor markers (PSA, alpha1-antitrypsin).]

Figure 2: Molecular markers in bodily fluids that may be used in the diagnosis of cancers may come from multiple sources. The tumor itself may produce markers such as CEA (colorectal cancer). Other tumor products may circulate and induce changes in distant tissues (e.g., liver and kidney), affecting the synthesis or metabolism of specific molecules. The stromal and inflammatory response to the tumor may also modulate proteins (e.g., cytokines) in serum.

Figure 3: Some markers such as oncofetal tumor molecules are produced directly by the tumor while other tissue specific molecules such as PSA may be produced by the uninvolved tissues in addition to the tumor. Patterns of all proteins in bodily fluids depend upon multiple factors including where the contents of dying cells are dumped as well as the rate of cellular death.

Because various metabolic states may affect proteomic assays (e.g., diurnal rhythm, chronic diseases, stress), the metabolic and other conditions under which the control samples were collected should mirror the conditions under which the samples from patients with PCa were collected. For example, either a group of males all of whom had rheumatoid arthritis or a group of males all of whom had undergone a glucose challenge 2 hours prior to obtaining samples would be a bad control group for a group of males with PCa who have the normal incidence of rheumatoid arthritis and were in a fasting state prior to sampling. It is necessary that such conditions average out so that their contributions to the analysis of spectral patterns are only noise. The patterns of eating of disease groups can result in an important bias because some medical facilities may require patients to be fasting when visiting a clinic/hospital while other locations may not.

Identifying members of the control groups for other cancers is just as demanding. Note that the age, sex and health of the control group should match the case group. One way to approach this is to require that the same number of controls as cases be collected from each site (i.e., if 30 cases of PCa are collected from site A, then site A would also supply 30 controls).

Because some of the spectral peaks in SELDI-TOF-MS analysis have been attributed to nonspecific inflammatory peaks, in some studies of neoplasia it would be useful to have additional separate groups of patients without the cancer being studied to identify non-specific proteins which may be associated with conditions such as rheumatoid arthritis, systemic lupus erythematosus, sepsis, and other tumors. Such groups should not be used in training the algorithm but rather to test the algorithm as to the specificity of the informative peaks selected by the algorithm.
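The suggestion that each site supply the same number of controls as cases lends itself to a simple check before analysis. The sketch below is one hypothetical way to do it; the function names and the `(site, group)` record format are assumptions made for illustration.

```python
from collections import Counter


def site_balance(records):
    """Count cases and controls per collection site.

    records: iterable of (site, group) pairs, where group is
    "case" or "control" (hypothetical record format).
    Returns {site: (n_cases, n_controls)}.
    """
    counts = Counter(records)
    sites = sorted({site for site, _ in counts})
    return {s: (counts[(s, "case")], counts[(s, "control")]) for s in sites}


def unbalanced_sites(records):
    """Sites whose numbers of cases and controls differ."""
    return [s for s, (n_case, n_ctrl) in site_balance(records).items()
            if n_case != n_ctrl]
```

The same tabulation could be extended to age bands, fasting status, or storage time, so that an imbalance between cases and controls is visible before any spectral analysis begins.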

Specific issues to consider in the selection of members of the case group include the type(s) of cancers studied. If the focus were to be on early detection of PCa, one would not want to evaluate samples from males who had metastatic PCa at the time of sampling. Similarly, Gleason scores might be chosen to separate more indolent PCa (Gleason ≤ 6) from more aggressive PCa (Gleason ≥ 7). Ultimately in early detection, analysis of samples of serum obtained one to several years prior to the diagnosis of a specific cancer is necessary (Pepe et al, 2001). When the cancers of an organ such as lung are of various types, e.g., squamous cell carcinoma, adenocarcinoma, bronchoalveolar, small cell (oat cell) undifferentiated and “other”, then each type of cancer should be evaluated separately. Although it is possible that all types of carcinomas of the lung could be separated from controls by SELDI-TOF-MS analysis, more specific peaks will probably be identified if each type of tumor is analyzed as a separate group. For such cases, each type of tumor should utilize a different case group although the control group could remain the same, except for bronchoalveolar carcinoma, which usually is not associated with smoking as a risk factor. Also, a separate algorithm should be developed and optimized for each type of tumor.

Figure 4: Two types of peaks observed in spectra when comparing cases with controls. The lower spectrum (control) demonstrates three peaks. These would usually be seen in any patient without disease. In the spectrum of the diseased patient, a new peak (primary peak) is present. This would probably result from a product produced because of the disease. Of interest, a peak (secondary peak) present in the spectra of most patients without disease is now absent from the spectrum of the diseased patient. Such peaks are not understood but may represent tumor-normal organ cross talk that reduces production of a protein, or the production by the diseased state of an enzyme which metabolizes the protein of the secondary peak.

When an individual has cancer, multiple changes may occur in the spectral patterns of proteins in blood. As suggested in Figures 2 and 3, these changes include spectral peaks that correlate with molecules that are released from the tumor and molecules that are produced by local and distant responses to the tumor. Similarly, regulatory molecules may be released from the tumor and/or the epireaction and may modulate the production of proteins by distant tissues (e.g., acute phase reactants produced by the liver). The immune system also will produce reactions to the products of the tumor, including cellular contents released into the circulation. Other molecular changes may be modulated by tumors, including the rate at which specific molecules are excreted or metabolized by the kidney or biliary-colorectal system. Also, specific proteolytic enzymes may be produced by a tumor or surrounding cells and these enzymes may affect patterns of proteins in serum and other bodily fluids. Finally, carrier proteins may bind smaller molecules; this may affect their analysis, for example, by concentrating them (Grizzle et al, 2005). All these potential sources of molecular changes in the proteins in serum may combine to form spectral patterns which may be characteristic of the presence of a specific disease, e.g., a specific type of tumor.

Figure 5: Figure 5A is a cartoon which suggests how the secretions of living cells and the contents of dying cells of normal prostatic glands may exit the body without being absorbed into the vascular system. In contrast, dying cells of prostate cancer dump their contents into the interstitium and these products are likely to be absorbed into the vascular system. Figure 5B demonstrates that as benign prostatic hyperplasia develops, glandular contents may be blocked from the usual pathway. Subsequently the glands may become dilated and/or inflamed and contents including PSA may leak from the lumen of the gland into the interstitial space.

[Figure 6: bar graph. Vertical axis: Protein Peak Intensity (0 to 60). Horizontal axis: Number of Freeze-Thaws (Thaw 2, Thaw 4, Thaw 8, Thaw 10, Thaw 12). Series: Protein MW 8.1 KD, Protein MW 8.6 KD, Protein MW 9.3 KD.]

Figure 6: Decline in the peaks of three unidentified proteins in serum following multiple freeze thaw cycles of samples of serum. The pattern of decline varies with specific peaks; however, most peaks do not decline greatly until after at least three freeze thaw cycles.

Figure 7: Storage of aliquots of a sample at -20°C (non-self defrost) for more than 6 months results in changes in peak amplitudes (A) and peak amplitude ratios (B vs C). Such changes were not noted on storage of an aliquot from the same original specimen for 10 months at -80°C.

In spectra from cases and controls, two different types of informative peaks may be identified. One type of informative peak is higher in the cases of cancer than in controls (Figure 4). We designate such peaks as “primary” because they are suggestive of a primary product arising because a tumor is present in a patient. The second type of informative peak is lower in the cases with cancer than in control serum (Figure 4). We designate such peaks as “secondary” because they suggest that the presence of the cancer in a patient causes a decrease in a peak that usually is present in serum of normal individuals. This suggests that the presence of the tumor increases the degradation of the secondary peak, either due to increased proteolytic activity or the production of a strong binding protein. Also, such changes in peaks may occur via the modulation of production of a molecular species and/or of excretion of molecules via the kidney or biliary-colorectal system, by binding and removal via the immune system, or via a combination of these processes.
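The distinction between “primary” peaks (higher in cases) and “secondary” peaks (lower in cases) can be illustrated with a small sketch. The function name `label_peaks`, the use of mean intensities, and the two-fold ratio threshold are assumptions made here for illustration; a real analysis would rely on replicate-aware statistical tests rather than a simple ratio of means.

```python
from statistics import mean


def label_peaks(case_intensities, control_intensities, min_ratio=2.0):
    """Label each peak as 'primary', 'secondary' or 'uninformative'.

    Both arguments map a peak position to a list of intensities, one
    per spectrum. min_ratio is an illustrative fold-change threshold.
    """
    labels = {}
    for peak in case_intensities:
        case_mean = mean(case_intensities[peak])
        ctrl_mean = mean(control_intensities[peak])
        if case_mean > min_ratio * ctrl_mean:
            labels[peak] = "primary"      # higher in cases than in controls
        elif ctrl_mean > min_ratio * case_mean:
            labels[peak] = "secondary"    # lower in cases than in controls
        else:
            labels[peak] = "uninformative"
    return labels
```

With invented intensities such as `{5929: [6.0, 5.5, 6.5]}` for cases and `{5929: [1.0, 1.2, 0.8]}` for controls, the peak at 5929 would be labelled primary under the two-fold threshold.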

Sometimes it is not the concentration of a molecular species that is produced by a tumor or that is present in the cells of tumors, but rather where the contents of tumor cells are released and the rate at which tumor cells die, that controls the levels of a protein in fluids. For example, the cells of adenocarcinoma of the prostate (PCa), in general, have a lower concentration of prostatic specific antigen (PSA) than the cells of normal prostatic tissues. How then can PSA be a sensitive marker for the early detection of PCa? First, when the cells of normal prostatic tissues die, they dump their contents into the lumina of the normal prostate glands and the contents of the glands are ultimately cleared in the ejaculate (Figure 5A). In contrast, when the cells of PCa die, they are not located in an intact ductal system and thus these cells dump their contents into the interstitial space so that their contents are absorbed by the vascular-lymphatic system (Figure 5A). Similarly, if benign ducts become blocked by changes of BPH or by the accumulation of concretions, their walls may leak products such as PSA into the interstitial space (Figure 5B).

[Figure 7 panels A, B and C; horizontal axis ticks at 5000, 7000 and 10000. Trace labels: aliquot of sample (#20) stored at -80°C; aliquot of sample (#20) transferred from -80°C and stored for 3 months at -20°C; aliquots of sample (#20) stored at -20°C for 5 months, 7 months and 8 months; second aliquot of sample (#20) stored at -20°C for 8 months; second aliquot of sample (#20) stored only at -80°C for 10 additional months.]

[Figure 8A: “Cancer markers from Serum on IMAC Protein Chips 2000 - 8000 Da”. Labelled peaks: 2950, 3486, 3940, 3960, 3963, 3970, 4071, 4079, 4283, 4300, 4469, 4475, 4579, 5064, 5074, 5297, 5382, 6099, 6430, 6541, 6797, 6949, 6990, 7024, 7057, 7480, 7819, 7820, 7844, 7884. Figure 8B: “Cancer markers from Serum on IMAC Protein Chips 8000 – 100,000 Da”. Labelled peaks: 8066, 8100, 8141, 8355, 8610, 8900, 8932, 8943, 9149, 9507, 9655, 9656, 9713, 9719, 10068, 10,266, 16163, 22,832. Legend for both panels: Prostate (references 12, 13, 14, 27, 28), Breast (references 3, 29, 30, 31), HNSCC (references 4, 32).]

Figure 8: The spectra in Figure 8A (titled “Cancer markers from Serum on IMAC Protein Chips 2000 - 8000 Da”) and 8B (titled “Cancer markers from Serum on IMAC Protein Chips 8000 – 100,000 Da”) demonstrate some of the informative spectral peaks (e.g., peak locations) reported in the literature for the early detection of prostate, breast and head and neck tumors as detected using serum samples on IMAC copper activated protein chips.

Samples

The choice of the best type of sample is controversial. Some investigators use plasma to avoid the activation of the coagulation system and the consequent release of factors from platelets and proteins associated with coagulation. Others prefer serum, in which coagulation has removed some high concentration proteins. Once the type of sample is chosen, the sampling conditions need to be standardized. The choice of sample collection and storage containers may influence spectral results. While glass sample collection tubes may activate specific proteins, plastic collection and storage containers may contaminate specimens with plastic components. Because plastics usually are poly-molecular forms (e.g., polyvinyl chloride), plastic contaminants may present as repeating spectral peaks, each separated by a standard molecular weight, usually about 100 to 200 Daltons. Similarly, additives to tubes (e.g., anticoagulants such as heparin) may influence spectral patterns.

A minimum sample size should be selected and samples should be aliquoted to minimum sizes shortly after collection to avoid repeated freeze-thaw cycles. As demonstrated in Figure 6, peak amplitudes may decline upon several freeze-thaw cycles. The conditions between the collection of samples and storage of specimens should be standardized. It also is important to have conditions of storage consistent; for example, if cases are to be collected in the next two years, one does not want to use controls from an archival serum collection stored for 10 years or more at -70°C. It is critical that samples be stored at -70°C or colder. Storing samples at -20°C or warmer, even in a non-self defrost freezer, may result in degradation of proteins as measured by spectral changes (Figure 7). We are aware of MALDI-TOF-MS data that indicate spectral peaks do not change upon storage at -70°C or colder over a 4 year period.

The proteomic systems may be very sensitive to small errors in pipetting and thus robotic processing of samples, including the addition of EAM to sampling spots, is recommended. In some cases the protocol may call for removal of proteins normally present at high concentrations (e.g., albumin, immunoglobulins). The binding of proteins/peptides to carrier proteins (e.g., albumin) may act to concentrate low molecular weight proteins that would normally be expected to be present in low concentrations or to be of a small size that can be cleared rapidly by renal excretion (Grizzle et al, 2005). One must be aware that the method of removal of carrier proteins also may remove small molecules at low concentrations that are carried by, for example, albumin. Also, whether or not samples are to be diluted must be considered. It should be noted that the results may be sensitive to the extent of dilution, so various dilutions should be tested on aliquots of the same sample. Foremost, it is critical that cases and controls be collected, processed and stored under the same general conditions.
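The remark that plastic contaminants may appear as repeating peaks separated by a constant mass of about 100 to 200 Daltons suggests a simple screen for such ladders in a peak list. The sketch below is one hypothetical approach; the function name `find_ladders`, the minimum ladder length of four peaks, and the 1 Dalton tolerance are assumptions chosen for illustration.

```python
def find_ladders(masses, min_len=4, spacing=(100.0, 200.0), tol=1.0):
    """Find runs of peaks separated by a constant mass difference.

    Returns a list of (step, [masses...]) for every run of at least
    min_len peaks whose spacing lies within `spacing` (Daltons) and
    repeats to within `tol`. A ladder of spacing d may also be reported
    at 2d if 2d falls inside the spacing range.
    """
    peaks = sorted(masses)
    seen_pairs = set()
    ladders = []
    for i, first in enumerate(peaks):
        for second in peaks[i + 1:]:
            step = second - first
            if step < spacing[0]:
                continue
            if step > spacing[1]:
                break  # peaks are sorted, so later steps are larger still
            if (first, second) in seen_pairs:
                continue  # already part of a reported ladder
            ladder = [first, second]
            while True:
                target = ladder[-1] + step
                match = next((m for m in peaks if abs(m - target) <= tol), None)
                if match is None:
                    break
                ladder.append(match)
            if len(ladder) >= min_len:
                ladders.append((step, ladder))
                seen_pairs.update(zip(ladder, ladder[1:]))
    return ladders
```

A ladder found in spectra from both cases and controls would point to the collection or storage container rather than to the disease, and the affected peaks could then be excluded from peak selection.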

Protein Chip The SELDI system is designed for relatively high throughput analysis by chromatographic separation Cancer Informatics 2005:1(1) 1-1

of specific categories of molecules from complex molecular mixtures such as serum and for efficiently analyzing these molecules via time of flight mass spectroscopy. This task is performed using “protein chips” which are metal chips with usually 8 sample spots. Each spot is an area of the metal chip to which specific chromatographic material is strongly attached. When complex samples are applied to a spot, molecules with specific biochemical properties are chromatographically bound by the spot and are retained on the spot even after the spot is washed extensively to remove unbound material. The chips currently available from Ciphergen are listed in Table 2. For a more detailed explanation of how SELDI-TOF-MS operates see references 2, 12 and 19. It is important to understand that each type of chip (e.g., IMAC3 – copper activated) will produce a different protein spectra from the same sample than that produced by other types of chips. Therefore, using the same sample of serum, data from an IMAC3 Cu chip will be different from data obtained using an H50 chip. However, when similar chips are used, IMAC3 copper activated and IMAC30 copper activated, the spectra are more likely to be similar but not necessarily identical. Although users should be aware that when Ciphergen adds the hydrophobic boundary other parameters of the binding may change such that binding to the WCX2 chip differs from binding to the CM10 chip. In analysis, data on the same peaks from different types of chips should not be combined in training sets or testing sets because one cannot be certain that identical peaks are being evaluated. Several issues related to chip characteristics may affect the results of SELDI-TOF-MS analysis. 
First, a protein present at a high concentration (e.g., 10,000 “X” of a 10 kD protein) that has the same binding characteristics as a protein present at a lower concentration (e.g., 100 “X” of a 15 kD protein) may saturate the binding capacity of the sample spot and prevent the retention of the protein present at the lower concentration. Once proteins are bound to the chip and the EAM is applied, the laser may not free/ionize equivalently all the proteins that were bound originally. As previously demonstrated in Figure 1, after 20 to 30 laser shots to the same site, there is a marked deterioration in the spectral characteristics from that site.

[Figure 9, panels A-D: reported marker peak positions (Da), labeled by sample and chip type (e.g., Serum on WCX, Serum on C16, Urine on SAX, Tissue on IMAC, Cells on H4), for each cancer type: Prostate (refs 12, 13, 14, 27, 28), Ovary (9, 10), Breast (3, 29, 30, 31, 38), Pancreas (11, 34), HNSCC (4, 32), Lung (35), Brain (36), Bladder (37), Colon (38), Lymphoblastoid cell line (33), Kidney (6), Liver (38).]

Figure 9: Figure 9A depicts cancer markers from various chip types, 0 to 4000 Da. Figure 9B depicts cancer markers from various chip types, 4000 to 8000 Da. Figure 9C depicts cancer markers from various chip types, 8000 to 20,000 Da. Figure 9D depicts cancer markers from various chip types, 20,000 to 100,000 Da. The spectra in Figure 9 A-D demonstrate the informative spectral peaks (e.g., peak locations) reported in the literature (except those of Figure 8) for the detection of neoplasias in multiple organ systems using multiple samples and various types of protein chips.

Spectra

Careful calibration of the instrument with proteins of known molecular weight and binding characteristics is extremely important because the raw data are measured as times of flight of the ions and the software must convert these “times” to molecular weights via the calibration step. Before samples are run, the molecular weight range of interest should be determined and the instrument should be calibrated using molecular weight standards that are appropriate for that range. As discussed, if various areas of the spectrum are of interest, multiple triplicates of the same sample should be run for each molecular weight range of interest. Thus a set of peaks in the range of 2,000 to 15,000 Daltons should be measured and analyzed separately from a set of peaks from 30,000 to 60,000 Daltons. This includes using a different EAM as well as a different set of calibration standards covering the range of 30 kD to 60 kD. At each range, different laser and detector settings also may be required.
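The conversion of times of flight to molecular weights can be illustrated with a minimal sketch. It assumes the idealized linear time of flight relation t = t0 + k * sqrt(m/z), which is a textbook approximation and not necessarily the calibration equation used by the Ciphergen software. The function names, the standard masses and the constants t0 and k are hypothetical values chosen only for illustration.

```python
import math


def fit_tof_calibration(standards):
    """Least-squares fit of t = t0 + k * sqrt(m/z) to (mass, time) pairs.

    `standards` is a list of (known_mass_in_Da, measured_time) tuples.
    Assumes singly charged ions, so m/z equals the mass.
    """
    xs = [math.sqrt(mass) for mass, _ in standards]
    ts = [time for _, time in standards]
    n = len(standards)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    k = sxt / sxx
    t0 = mean_t - k * mean_x
    return t0, k


def time_to_mass(time, t0, k):
    """Invert the calibration: m/z = ((t - t0) / k) ** 2."""
    return ((time - t0) / k) ** 2


# Hypothetical standards for the 2,000 to 15,000 Da range, with flight
# times simulated from assumed constants (arbitrary time units).
TRUE_T0, TRUE_K = 0.5, 0.3
standards = [
    (mass, TRUE_T0 + TRUE_K * math.sqrt(mass))
    for mass in (2000.0, 5000.0, 10000.0, 15000.0)
]

t0, k = fit_tof_calibration(standards)
unknown_time = TRUE_T0 + TRUE_K * math.sqrt(7500.0)
estimated_mass = time_to_mass(unknown_time, t0, k)
```

Because the fit is only constrained by the standards supplied, masses far outside the range of those standards are extrapolated; this is one reason to calibrate separately for each molecular weight range of interest.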

Analysis of Spectra

Once the spectral data are obtained, either a “directed” or a “non-directed” approach is taken in analysis. In the “non-directed” approach, an amplitude for each mass to charge (M/Z) position in the spectra is determined and analysis begins after this determination. This approach may be more sensitive to changes in peak location than the “directed” approach, in which peaks are identified and matched based upon the resolution (± 0.2% mass) of the Ciphergen instrument. Alternatively, a method of analysis based on determining the area under the curve (AUC) of spectral peaks can be used (Meleth et al, 2005). Subsequently, the amplitude or the area under the curve of each spectral peak is determined. For each such measurement, informative peaks and AUCs are identified that separate disease from non-disease. Before choosing the first peak for an analytical algorithm, one could make a controversial distinction among peaks, classifying them as either primary or secondary and evaluating their typical amplitudes or AUCs; peaks with smaller amplitudes or AUCs may be less reproducible and hence less reliable in algorithms. Also, even though it has been proposed that changes that are greater in magnitude cause secondary peaks to decrease in disease patients in comparison to increases in primary peaks, the choice for the first peak in a decision algorithm might be a primary peak with a high amplitude, due to its potentially more straightforward relationship with the disease process and its greater reliability in comparisons of disease with non-disease. How variable are the peak location (resolution) and peak amplitude when samples are run on the same day on the same machine? On different days on the same machine? On different machines? Our validation study of prostate cancer (Semmes et al, 2005) has demonstrated that SELDI-TOF-MS machines at widely separated sites can be calibrated and standardized and that separate sites can identify blindly the same diagnostic peaks that previously have been shown to be important in the diagnosis of prostate cancer. Not all areas of the SELDI-TOF-MS spectrum are equivalent from the standpoint of the early detection of disease.
The region of molecular weight below 1000 is not standardized, so molecular sizes in this region cannot be determined accurately. Also, in the spectral area below 2000, peaks may be secondary to components of the energy absorbing molecule and/or other contaminants from, for example, plastics and the anticoagulants in collection tubes. Instrumental noise may also be present in spectral areas below 1000 M/Z. In the spectral area above 50,000 D, many molecular species such as albumin are present in very high concentrations. Also, in our experience, the system is not very useful in detecting proteins of 50 kD or greater without adjustments (Malyarenko et al, 2005). Because the Ciphergen system is a low resolution mass spectrometer, a protein of molecular weight 10,000 D with a concentration of 1000 “X” will prevent the detection of a protein of 10,020 D with a concentration of 100 “X” that has similar binding and release characteristics for the chip utilized. Diamandis has published several criticisms of the general SELDI approach (Diamandis 2003, 2004). In each of these criticisms Diamandis has argued that for the same type of cancer, different laboratories should be identifying the same peaks. This actually should not be the case, and even for prostate cancer the identification of different peaks should be the rule rather than an unusual result (Grizzle and Meleth, 2004), because different studies have used different chips which bind different proteins and because hundreds of peaks may separate cancer from non-cancer; this, plus the algorithms chosen, may result in different peaks being selected. Similarly, Diamandis (23,24) and others have argued that SELDI may be identifying peaks that are characteristic of inflammatory aspects of neoplasia or epiphenomena of cancers in general (Malik et al, 2005). This clearly is an important issue not only with SELDI-TOF-MS but also with any current form of mass spectroscopy, all of which have sensitivities orders of magnitude lower than those necessary to detect tumor products such as PSA and CA125. Some of the peaks identified to date and their association with a specific cancer are demonstrated in Figures 8 and 9. Of interest is that peaks for various cancers have varied to date; however, based on our argument this may be serendipitous rather than an indication that the peaks are specific for a specific cancer.
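The effect of matching peaks at a resolution of ± 0.2% mass can be sketched as follows. This is an illustrative greedy grouping, not the algorithm used by the Ciphergen software; the function names and the tolerance convention (relative to the mean of the two masses) are assumptions, and the peak list is hypothetical.

```python
def same_peak(mz_a, mz_b, tol=0.002):
    """True if two M/Z values differ by no more than `tol` (0.2%)
    of their mean, i.e. they cannot be told apart at this resolution."""
    return abs(mz_a - mz_b) <= tol * (mz_a + mz_b) / 2.0


def group_peaks(mz_values, tol=0.002):
    """Greedily group sorted M/Z values; a value joins the current group
    if it matches the first (lowest) member of that group."""
    groups = []
    for mz in sorted(mz_values):
        if groups and same_peak(groups[-1][0], mz, tol):
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return groups


# Hypothetical peak locations pooled from several spectra.
pooled = [3972.0, 10000.0, 3970.0, 10020.0, 4153.0]
groups = group_peaks(pooled)
print(groups)
# [[3970.0, 3972.0], [4153.0], [10000.0, 10020.0]]
```

At 10,000 D the matching window is roughly 20 D wide on either side, so the peaks at 10,000 D and 10,020 D fall into the same group. This is consistent with the point above that a species at 10,020 D may not be distinguishable from a far more abundant species at 10,000 D.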
The types of measurements, the issues regarding what is to be measured in spectra, and the various analytical approaches are discussed extensively in the other manuscripts of this volume and are beyond the scope of this manuscript. However, it is critical that all published studies include a learning/training set of samples followed by analysis of a test set of independent samples.
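One simple way to set aside an independent test set is sketched below. It is a minimal illustration using only the Python standard library; the function name, the sample identifiers, the 30% test fraction and the fixed seed are all hypothetical choices, and the sketch does not address other requirements such as matching of cases and controls.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample IDs into training and test sets, drawing the same
    fraction from each class so both sets contain cases and controls.

    `labels` maps sample ID -> class label (e.g., "case" or "control").
    """
    rng = random.Random(seed)
    by_class = {}
    for sample_id, label in labels.items():
        by_class.setdefault(label, []).append(sample_id)

    train, test = [], []
    for ids in by_class.values():
        ids = sorted(ids)  # fixed starting order so the seed is reproducible
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Hypothetical study with 10 cases and 10 controls.
labels = {f"case_{i}": "case" for i in range(10)}
labels.update({f"control_{i}": "control" for i in range(10)})

train_ids, test_ids = stratified_split(labels)
```

The test samples should not be used at any stage of peak selection or algorithm training; they are analyzed only after the algorithm has been fixed on the training set.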


Acknowledgements

Supported in part by the Early Detection Research Network (EDRN) (CA86359-04 - WEG), (CA85067 - OJS), (CA84968 - WLB). We thank Ms. Libby Chambers for secretarial assistance.

References

Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, Semmes OJ, Schellhammer PF, Yasui Y, Feng Z, Wright GL Jr. 2002. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res, 62(13):3609-3614. (#12)
Ball G, Mian S, Holding F, Allibone RO, Lowe J, Ali S, Li G, McCardle S, Ellis IO, Creaser C, Rees RC. 2002. An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers. Bioinformatics, 18:395-404. (#36)
Banez LL, Prasanna P, Sun L, Ali A, Zou A, Adam L, McLeod DG, Moul JW, Srivastava S. 2003. Diagnostic potential of serum proteomic patterns in prostate cancer. J Urol, 170:442-446. (#14)
Carter D, Douglass JF, Cornellison CD, Retter MW, Johnson JC, Bennington AA, Fleming TP, Reed SG, Houghton RL, Diamond DL, Vedvick TS. 2002. Purification and characterization of the mammaglobin/lipophilin B complex, a promising diagnostic marker for breast cancer. Biochemistry, 41:6714-6722. (#30)
Cazares LH, Adam BL, Ward MD, Nasim S, Schellhammer PF, Semmes OJ, Wright GL Jr. 2002. Normal, benign, preneoplastic, and malignant prostate cells have distinct protein expression profiles resolved by surface enhanced laser desorption/ionization mass spectrometry. Clin Cancer Res, 8:2541-2552. (#28)
Diamandis EP. 2003. Point proteomic patterns in biological fluids: do they represent the future of cancer diagnosis? Clin Chem, 49(8):1272-1275. (#23)
Diamandis EP. 2004. Commentary: analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J Natl Cancer Inst, 96(5):353-356. (#24)
Diamond DL, Zhang Y, Gaiger A, Smithgall M, Vedvick TS, Carter D. 2003. Use of ProteinChip array surface enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) to identify thymosin beta-4, a differentially secreted protein from lymphoblastoid cell lines. J Am Soc Mass Spectrom, 14:760-765. (#33)
Grizzle WE, Meleth S. 2004. Clarification in the point/counterpoint discussion related to Surface-Enhanced Laser Desorption/Ionization Time of Flight Mass Spectrometric identification of patients with adenocarcinomas of the prostate: Proteomic pattern complexity reveals a rich and unchartered continent of biomarkers. Clin Chem, 50(8):1475-1477. (#25)
Grizzle WE, Adam BL, Bigbee WL, Conrads TP, Carroll C, Feng Z, Izbicka E, Jendoubi M, Johnsey D, Kagan J, Leach RJ, McCarthy DB, Semmes OJ, Srivastava S, Thompson IM, Thornquist MD, Verma M, Zhang Z, Zou Z. 2003-2004. Serum protein expression profiling for cancer detection: validation of a SELDI-based approach for prostate cancer. Disease Markers, 19:185-195. (#17)
Grizzle WE, Semmes OJ, Bigbee WL, Malik G, Miller E, Manne B, Oelschlager DK, Zhu L, Manne U. 2005. Use of high throughput mass spectrographic methods to identify disease processes with emphasis on SELDI-TOF-MS methods. In: Molecular Diagnostics. (Eds. G. Patrinos, W. Ansorg). (#18)
Koopmann J, Zhang Z, White N, Rosenzweig J, Fedarko N, Jagannath S, Canto MI, Yeo CJ, Chan DW, Goggins M. 2004. Serum diagnosis of pancreatic adenocarcinoma using surface-enhanced laser desorption and ionization mass spectrometry. Clin Cancer Res, 10:860-868. (#11)
Kozak KR, Amneus MW, Pusey SM, Su F, Luong MN, Luong SA, Reddy ST, Farias-Eisner R. 2003. Identification of biomarkers for ovarian cancer using strong anion-exchange ProteinChips: Potential use in diagnosis and prognosis. Proc Natl Acad Sci USA, 100:12343-12348. (#10)
Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. 2002. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem, 48:1296-1304. (#2)


Malik G, Ward MD, Gupta SK, Trosset MW, Grizzle WE, Adam B-L, Diaz JI, Semmes OJ. 2005. Serum levels of an isoform of apolipoprotein A-II as a potential marker for prostate cancer. Clin Cancer Res, 11:1073-1085. (#26)
Malyarenko DI, Cooke WE, Adam B-L, Malik G, Chen H, Tracy ER, Trosset MW, Sasinowski M, Semmes OJ, Manos DM. 2005. Enhancement of sensitivity and resolution of surface-enhanced laser desorption/ionization time-of-flight mass spectrometric records for serum peptides using time-series analysis techniques. Clin Chem, 51(1):65-74. (#22)
Meleth S, Eltoum IA, Zhu L, Oelschlager D, Piyathilake C, Grizzle WE. 2005. Novel approaches to smoothing and comparing SELDI TOF spectra. Cancer Informatics (current issue). (#20)
Paweletz CP, Trock B, Pennanen M, Tsangaris T, Magnant C, Liotta LA, Petricoin EF 3rd. 2001. Proteomic patterns of nipple aspirate fluids obtained by SELDI-TOF: potential for new biomarkers to aid in the diagnosis of breast cancer. Dis Markers, 17:301-307. (#31)
Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. 2001. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst, 93:1054-1061. (#19)
Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC, Liotta LA. 2002. Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359:572-577. (#9)
Petricoin EF 3rd, Ornstein DK, Paweletz CP, Ardekani A, Hackett PS, Hitt BA, Velassco A, Trucco C, Wiegand L, Wood K, Simone CB, Levine PJ, Linehan WM, Emmert-Buck MR, Steinberg SM, Kohn EC, Liotta LA. 2002. Serum proteomic patterns for detection of prostate cancer. J Natl Cancer Inst, 94(20):1576-1578. (#13)
Poon TCW, Yip T-T, Chan ATC, Yip C, Yip V, Mok TSK, Lee CCY, Leung TWT, Ho SKW, Johnson PJ. 2003. Comprehensive proteomic profiling identifies serum proteomic signatures for detection of hepatocellular carcinoma and its subtypes. Clin Chem, 49:752-760. (#7)
Qu Y, Adam BL, Yasui Y, Ward MD, Cazares LH, Schellhammer PF, Feng Z, Semmes OJ, Wright GL Jr. 2002. Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin Chem, 48:1835-1843. (#27)
Rosty C, Christa L, Juzdzal S, Baldwin WM, Zahurak ML, Carnot F, Chan DW, Canto M, Lillemoe KD, Cameron JL, Yeo CJ, Hruban RH, Goggins M. 2002. Identification of hepatocarcinoma-intestine-pancreas/pancreatitis-associated protein I as a biomarker for pancreatic ductal adenocarcinoma by protein biochip technology. Cancer Res, 62:1868-1875. (#34)
Semmes OJ, Feng Z, Adam B-L, Banez LL, Bigbee WL, Campos D, Cazares L, Chan DW, Grizzle WE, Izbicka E, Kagan J, Malik G, McLerran D, Moul JW, Partin A, Prasanna P, Rosenzweig J, Sokoll LJ, Srivastava S, Srivastava S, Thompson I, Welsh MJ, White N, Winget M, Yasui Y, Zhang Z, Zhu L. 2005. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: 1. Assessment of platform reproducibility. Clin Chem, 51(1):102-112. (#21)
Shiwa M, Nishimura Y, Wakatabe R, Fukawa A, Arikuni H, Ota H, Kato Y, Yamori T. 2003. Rapid discovery and identification of a tissue-specific tumor biomarker from 39 human cancer cell lines using the SELDI ProteinChip platform. Biochem Biophys Res Commun, 309:18-25. (#38)
Sidransky D, Irizarry R, Califano JA, Li X, Ren H, Benoit N, Mao L. 2003. Serum protein MALDI profiling to distinguish upper aerodigestive tract cancer patients from control subjects. J Natl Cancer Inst, 95:1711-1717. (#5)
Thompson I, Pauler DK, Goodman PJ, Tangen CM, Lucia MS, Parnes HL, Minasian LM, Ford LG, Lippman SM, Crawford ED, Crowley JJ, Coltman CA. 2004. Prevalence of prostate cancer among men with a prostate-specific antigen level ≤ 4.0 ng per milliliter. N Engl J Med, 350(22):2239-2246. (#15)
Urban D, Myers R, Manne U, Weiss H, Mohler J, Perkins D, Marklewicz M, Lieberman R, Kelloff G, Marshall M, Grizzle W. 1999. Evaluation of biomarker modulation by fenretinide in prostate cancer patients. Eur Urol, 35(5-6):429-438. (#16)
Vlahou A, Schellhammer PF, Mentrinos S, Patel K, Kondylis FI, Gong L, Nasim S, Wright GL Jr. 2001. Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine. Am J Path, 158:1491-1502. (#1)
Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D, Perry RR, Wright GL Jr, Semmes OJ. 2003. A novel approach toward development of a rapid blood test for breast cancer. Clin Breast Cancer, 4:203-209. (#3)
Von Eggeling F, Junker K, Fiedler W, Wollscheid V, Durst M, Claussen U, Ernst G. 2001. Mass spectrometry meets chip technology: a new proteomic tool in cancer research? Electrophoresis, 22:2898-2902. (#37)
Wadsworth JT, Somers KD, Stack BC Jr, Cazares L, Malik G, Adam BL, Wright GL Jr, Semmes OJ. 2004. Identification of patients with head and neck cancer using serum protein profiles. Arch Otolaryngol Head Neck Surg, 130:98-104. (#4)
Won Y, Song HJ, Kang TW, Kim JJ, Han BD, Lee SW. 2003. Pattern analysis of serum proteome distinguishes renal cell carcinoma from other urologic diseases and healthy persons. Proteomics, 3:2310-2316. (#6)


Wu W, Tang X, Hu W, Lotan R, Hong WK, Mao L. 2002. Identification and validation of metastasis-associated proteins in head and neck cancer cell lines by two-dimensional electrophoresis and mass spectrometry. Clin Exp Metastasis, 19:319-326. (#32)
Wulfkuhle JD, McLean KC, Paweletz CP, Sgroi DC, Trock BJ, Steeg PS, Petricoin EF 3rd. New approaches to proteomic analysis of breast cancer. Proteomics, I:1205-1215. (#38)
Xiao X, Liu D, Tang Y, Guo F, Xia L, Liu J, He D. 2003-2004. Development of proteomic patterns for detecting lung cancer. Disease Markers, 19:33-39. (#8)
Zhukov TA, Johanson RA, Cantor AB, Clark RA, Tockman MS. 2003. Discovery of distinct protein profiles specific for lung tumors and premalignant lung lesions by SELDI mass spectrometry. Lung Cancer, 40:267-279. (#35)
