Aquaculture 250 (2005) 70 – 81 www.elsevier.com/locate/aqua-online

Evaluation of three strategies using DNA markers for traceability in aquaculture species

Ben Hayes*, Anna K. Sonesson, Bjarne Gjerde

AKVAFORSK, Institute for Aquaculture Research, P.O. 5010, 1432 Ås, Norway

Received 14 May 2004; received in revised form 27 January 2005; accepted 2 March 2005

Abstract

Traceability schemes for aquaculture species are essential for tracing market product to farm of origin in the event of detection of disease or toxins in the market fish. DNA markers have been proposed as a tool for traceability. These markers can be used to genotype fish by taking a sample from live fish or fish product at any stage along the production chain. In this paper, we consider three alternative traceability schemes using DNA markers, taking the Norwegian farmed Atlantic salmon industry as an example. This industry, like many aquaculture industries, has three tiers: the nucleus, multiplier and commercial tiers. The nucleus individuals are the grandparents of the commercial fish, and the multiplier individuals are the parents of the commercial fish. The traceability strategies we considered were: (1) FS, assignment of market place fish to full sib families based on the marker information (this strategy assumes that all individuals from a full sib family are allocated to a single farm and that a limited number of fish, representing all full sib families on that farm, are genotyped); (2) PAR, assignment of market place fish to parents (multiplier individuals); and (3) GRAND, assignment of market place fish to grandparents (nucleus individuals). Using simulation, we determined the number of DNA markers required to achieve 95% correct assignment decisions for each strategy. The simulation included a wild population which contributed to market place fish. Wild fish were correctly assigned if they were excluded from belonging to the farmed population in each strategy; otherwise they were incorrectly assigned. Both microsatellite and single nucleotide polymorphism (SNP) markers were considered. Seventy-five, 15 and 50 microsatellites were required to achieve 95% correct assignment decisions for FS, PAR and GRAND, respectively. Four hundred, 75 and 200 SNPs were required to achieve 95% correct assignment decisions for FS, PAR and GRAND, respectively.
If the cost of genotyping a microsatellite is assumed to be five times as high as that of genotyping a SNP, GRAND using SNP markers is the cheapest strategy. The logistics of implementing each strategy are discussed. GRAND in particular, and PAR in some industries, require complicated logistics. The most suitable and cost-effective traceability strategy for a particular industry will depend heavily on the organisation of that industry, for example the degree to which transfers of fish, eggs and larvae between tiers are recorded. Even if complicated logistics prevent the adoption of marker-based schemes by some industries, traceability with DNA markers may still be important for verification of labelling-based schemes.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Traceability; SNP markers; Microsatellites; Accuracy; Aquaculture species

* Corresponding author. Tel.: +47 6494 9542; fax: +47 6494 9502. E-mail address: [email protected] (B. Hayes).
0044-8486/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.aquaculture.2005.03.008

1. Introduction

Traceability schemes allow consumers to obtain information on the origin and production chain of food products. Such schemes are essential for tracing fish back to their farm of origin, for example in the event of detection of disease or toxins in market fish. They could also be instrumental in monitoring producers to minimize the number of escapees from farms, and thus in reducing the environmental load of farming (for example, parasite problems and possible interbreeding with wild fish). Håstein et al. (2001) reviewed the methods of traceability available for aquaculture species. These included external tags, chemical marking using inorganic substances, physical marking (e.g. fin clipping), labelling of product (e.g. documents relating to movement, invoices, etc.) and DNA markers.

Using DNA markers in traceability schemes is attractive for a number of reasons. The markers can be genotyped by taking a sample from the fish or fish product at any stage along the production chain and analysing this sample in the laboratory. Only a very small sample of tissue is required for DNA analysis, so the method can also be used on live fish. Two types of DNA markers have been suggested for use in traceability schemes: microsatellites and single nucleotide polymorphisms (SNPs). Considerable numbers of microsatellite loci are now available for tilapias, rainbow trout and Atlantic salmon (e.g. Kocher et al., 1998; Sakamoto et al., 2000; Gilbey et al., 2004) and have been or are being developed for many other species, while large numbers (>1000) of SNPs have been described for both salmonids (Hayes et al., 2004; Smith et al., 2003) and catfish (He et al., 2003). Microsatellite markers are highly informative, with many alleles at each locus, while SNPs typically have only two alleles per locus. However, microsatellites are more expensive than SNPs to genotype (Glaubitz et al., 2003). DNA markers can also be used to verify that traceability schemes using other methods, such as labelling, are accurate.

One option for implementing a traceability scheme with DNA markers would be to genotype all farmed fish and store their genotypes in a database. When fish are sampled from the market place, their genotypes would be compared with those in the database to determine the farm of origin. Unfortunately, for many aquaculture industries the large number of farmed animals, and thus the enormous cost of the genotyping required, make this option infeasible. Schemes which reduce the amount of genotyping are therefore required.

One alternative scheme would be possible if entire full sib families are always allocated to a single farm. In this case, if enough individuals are genotyped from each farm that all full sib families are represented in the sample, a fish from the market place can be assigned to its farm of origin by reconstructing full sib relationships from the marker genotypes.

Other alternative traceability schemes with DNA markers could take advantage of the multi-tier structure which exists in most aquaculture industries. Generally, a relatively small nucleus population, where selective breeding takes place, supplies eggs or larvae to much larger grow-out or commercial operations. A feasible traceability scheme may therefore be to genotype only the parent individuals in the nucleus and store their genotypes. Fish sampled from the market place (offspring of nucleus individuals) could then be assigned to their parents using marker information. Highly accurate assignment of fish to parents or populations with DNA markers has been demonstrated using simulated data (e.g. SanCristobal and Chevalet, 1997) and in living populations of turbot, rainbow trout and Atlantic salmon (Estoup et al., 1998; Letcher and King, 2001; Villanueva et al., 2002). It is important to note that this strategy does not achieve full traceability directly, as market fish are traced back only to their nucleus parents. To trace fish further to their farm of origin, the allocation of eggs resulting from each mating in the nucleus to commercial operations would have to be recorded. In some industries this may be a difficult task because of complicated logistics.

Some aquaculture industries, such as the Norwegian salmon industry, have an additional tier, the multiplier. This tier is required when the nucleus is unable to supply all commercial farms with eggs or larvae. The multiplier tier takes eggs or larvae from the nucleus, grows these to broodfish, mates these broodfish and then supplies the commercial tier with the resulting eggs or larvae. In industries with this three-tier structure, another traceability strategy is possible. All the parent fish in the multipliers are genotyped, and this information is stored in a database. Fish sampled from the market place can then be assigned back to the nucleus parents (i.e. their grandparents) or to the multiplier parents (i.e. their parents). Letcher and King (2001) showed that fish could be accurately assigned to their grandparents provided that a sufficient number of markers is used. This scheme has quite complicated logistic requirements, as the destinations of both the eggs from the nucleus matings and the eggs from the multiplier matings must be recorded.

The aim of this paper was to assess the feasibility, in terms of the number of markers and the number of fish required to be genotyped, of alternative traceability schemes using DNA markers in aquaculture species. We used computer simulation to evaluate these schemes. The ultimate goal of a traceability system is to trace a fish from any point in the production chain to any other point in the production chain. None of the strategies above achieves this directly, and the logistical considerations required by each strategy are discussed.

2. Methods

2.1. Structure of the simulation

We loosely based our simulation on the example of the Norwegian salmon industry. Two closed nucleus populations, representing two breeding companies or two different sub-populations of one breeding company, were simulated. The founders of the nuclei were sampled from a large simulated wild population (of 1000 individuals). For ten generations, within each nucleus, 30 males and 60 females were selected at random and mated, in a mating ratio of one male to two females, with 10 offspring per mating. After the ten generations of breeding (random mating and no selection) in the nuclei, 30 males and 60 females were randomly selected from each nucleus population and mated to produce two multiplier populations. From these two multiplier populations, 300 males and 600 females were selected at random from each population and mated to produce a total commercial population of 12,000 fish. These fish belonged to fifty commercial operations: the offspring from the first 24 matings among the parents belonged to the first commercial operation, the offspring of the next 24 matings belonged to the next commercial operation, and so on. As there were 10 offspring per family, there were 240 individuals per commercial operation. Simultaneously, the wild population continued to breed, maintaining a constant size of 1000. Note that the wild population here could also include farmed fish originating from breeding operations in other countries (as long as their parents or grandparents did not belong to the nucleus or multiplier operations described above). Additionally, in practice the wild fish may in fact be escapees from fish farms; in this case, however, they should be identified by the traceability schemes (see Methods below and Discussion) as being of farmed origin. The commercial and wild populations together constituted the fish in the market place. A single fish was sampled from the market place for purposes of assignment. There were at least 200 replicates for each scheme. In our simulation, a fish had two marker alleles at each of a number of independently segregating loci.
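As a quick consistency check on the counts above, the short Python sketch below derives the size of the commercial tier and the allocation of full sib families to commercial operations from the stated mating numbers. The constant and function names are hypothetical illustrations, not part of the original simulation program, and only the commercial tier is modelled.

```python
# Bookkeeping for the simulated commercial tier (Section 2.1).
# Counts come from the text; names are illustrative only.
N_MULTIPLIERS = 2
FEMALES_PER_MULTIPLIER = 600      # each female mated once; one male to two females
OFFSPRING_PER_MATING = 10
MATINGS_PER_OPERATION = 24


def commercial_structure():
    """Return (total commercial fish, number of operations, fish per operation)."""
    total_matings = N_MULTIPLIERS * FEMALES_PER_MULTIPLIER            # 1200 full sib families
    total_offspring = total_matings * OFFSPRING_PER_MATING             # 12,000 fish
    n_operations = total_matings // MATINGS_PER_OPERATION              # 50 operations
    fish_per_operation = MATINGS_PER_OPERATION * OFFSPRING_PER_MATING  # 240 fish
    return total_offspring, n_operations, fish_per_operation


def operation_of_family(family_index):
    """0-based commercial operation receiving full sib family `family_index`:
    families 0-23 go to operation 0, families 24-47 to operation 1, and so on."""
    return family_index // MATINGS_PER_OPERATION


print(commercial_structure())                             # (12000, 50, 240)
print(operation_of_family(23), operation_of_family(24))   # 0 1
```

Allocating whole families by integer division mirrors the "first 24 matings, next 24 matings" rule, which is what guarantees that every full sib family ends up on exactly one farm, the property strategy FS relies on.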
Markers were either microsatellites (10 alleles per locus, average heterozygosity 0.8) or single nucleotide polymorphisms (SNPs; two alleles per locus). In the base population, the frequencies of the 10 microsatellite alleles at each locus were sampled from a Poisson distribution (Fig. 1). This distribution was chosen based on the results of Skaala et al. (2004), who reported an average of 9.9 alleles segregating within different Atlantic salmon populations for 12 microsatellites, with an average heterozygosity of up to 0.76 in some wild populations, and to reflect to some degree the observation that for many loci in fish populations there are a small number of alleles at moderate frequencies and a large number of alleles at low frequencies (e.g. Letcher and King, 2001). SNPs had two alleles with frequencies of 0.8 and 0.2 (at least 1200 human SNPs reviewed by Marth et al. (2001) had minor allele frequencies of 0.2 or greater), giving an average heterozygosity of 0.32. A progeny from the mating of two parents received one allele from each parent, with each of a parent's two alleles equally likely to be transmitted. The number of microsatellite markers simulated was between 5 and 200, and the number of SNPs between 5 and 500. For computational reasons, we were unable to simulate the full size of the Norwegian salmon industry. Additionally, the smolt rearing and grow-out operations which are part of this industry were considered as a single entity (a commercial operation), as both sets of operations use fish from the same generation. The population simulated is shown in Fig. 2.

Fig. 1. Frequency of alleles at each microsatellite locus in the simulated base population. [Plot: frequency in base population (0 to 0.25) against microsatellite allele (1 to 10).]

2.2. Traceability strategies

Three different traceability strategies were then compared for the number of markers (either microsatellites or SNPs) required to achieve a 95% level of accuracy in assigning fish back to the commercial operation, multiplier or nucleus of origin.

2.2.1. Strategy FS

In strategy FS (for full sib), either 25, 50, 100 or 150 fish were sampled at random from each commercial operation and genotyped. A fish was then sampled from the market place. The fish from the commercial operations and the fish sampled from the market place were genotyped for 25, 50, 100, 150 or 200 markers (microsatellites or SNPs). Using this marker information, the relationship between the market fish and each of the fish sampled from the commercial operations was calculated as in Eding and Meuwissen (2001). For a single locus $l$, a similarity index $S_{xy}$ between two individuals $x$ and $y$ is calculated, where $S_{xy} = 1$ when genotype $x = ii$ (i.e. both alleles at locus $l$ are identical) and genotype $y = ii$, or when $x = ij$ and $y = ij$; $S_{xy} = 0.5$ when $x = ii$ and $y = ij$, or vice versa; $S_{xy} = 0.25$ when $x = ij$ and $y = ik$; and $S_{xy} = 0$ when the two individuals have no alleles in common at the locus. The similarity expected as a result of chance alone was

$$ s = \sum_{i=1}^{a} p_i^2 $$

where $p_i$ is the frequency of allele $i$ in the (random mating) population and $a$ is the number of alleles at the locus.
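The chance similarity $s$ is also one minus the expected heterozygosity, which links it to the marker heterozygosities quoted in Section 2.1. The minimal Python sketch below (function names are illustrative assumptions) reproduces the SNP figures: with allele frequencies 0.8 and 0.2, $s = 0.68$ and the heterozygosity is 0.32, while a SNP at 0.5/0.5 reaches the maximum heterozygosity for two alleles, 0.5.

```python
def chance_similarity(freqs):
    """s = sum of p_i^2: probability that two alleles drawn at random are identical."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in freqs)


def expected_heterozygosity(freqs):
    """Expected heterozygosity under random mating, 1 - s."""
    return 1.0 - chance_similarity(freqs)


snp = [0.8, 0.2]                                  # SNP frequencies used in the simulation
print(round(chance_similarity(snp), 2))           # 0.68
print(round(expected_heterozygosity(snp), 2))     # 0.32
print(expected_heterozygosity([0.5, 0.5]))        # 0.5, the two-allele maximum
```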

Fig. 2. Population structure for simulation. There were two nucleus populations, two multipliers and fifty commercial operations. Thirty males and 60 females were selected from each nucleus to breed each multiplier tier, and 300 males and 600 females were selected from each multiplier to breed 12,000 commercial offspring, split across 50 commercial operations of 240 fish each. [Diagram labels: wild population (effective population size = 1000); 500 × 500, sampling of wild population for nuclei foundation; Nucleus 1 and Nucleus 2 (grandparents); 60 × 120; multipliers (parents); 600 × 1200; 12,000 offspring, split into 50 commercial operations of 240 fish each; breeding in the wild, 1000 offspring; market.]

Then the relationship between individuals $x$ and $y$ at locus $l$ is calculated as

$$ r_l = \frac{S_{xy} - s}{1 - s} $$

The overall $r$, which uses information from all loci, was computed as the average of $r_l$ across all loci. This index is appropriate in our case, as it accounts for the inbreeding that is expected to occur in a population of finite size (Eding and Meuwissen, 2001). The threshold value of $r$ for two individuals to be considered full sibs was set at 0.375 for microsatellites and 0.5 for SNPs, as in 100 replicate simulations with large numbers of full sibs and markers these were the minimum $r$ values for two full sibs. If the value of $r$ did not exceed the threshold for any of the comparisons, the fish sampled from the market place was assumed to be of wild origin. Two hundred replicate samples were performed, and the proportion of correct assignment decisions (fish sampled from the market place correctly allocated to the commercial operation of origin, or to the wild) was calculated.
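The FS decision rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: genotypes are 2-tuples of allele labels, allele frequencies are supplied per locus as dictionaries, and, because the text does not say what happens when several sampled fish exceed the threshold, the sketch assigns the market fish to the operation of the sampled fish with the highest $r$ (a hypothetical tie-breaking choice). All function names are hypothetical.

```python
from statistics import mean


def similarity(x, y):
    """Single-locus similarity S_xy using the rules listed in Section 2.2.1.
    Genotypes are 2-tuples of allele labels, e.g. ("A", "B")."""
    if sorted(x) == sorted(y):
        return 1.0                  # ii/ii or ij/ij
    if not set(x) & set(y):
        return 0.0                  # no allele in common
    if x[0] == x[1] or y[0] == y[1]:
        return 0.5                  # ii/ij, either way round
    return 0.25                     # ij/ik


def relationship(geno_x, geno_y, freqs_per_locus):
    """Overall r: mean over loci of r_l = (S_xy - s) / (1 - s), with s = sum of p_i^2."""
    r_values = []
    for x, y, freqs in zip(geno_x, geno_y, freqs_per_locus):
        s = sum(p * p for p in freqs.values())
        r_values.append((similarity(x, y) - s) / (1.0 - s))
    return mean(r_values)


def assign_fs(market_fish, farm_samples, freqs_per_locus, threshold):
    """Operation of the sampled fish with the highest r above `threshold`,
    or "wild" if no comparison exceeds it. `farm_samples` holds (operation, genotypes)."""
    best_operation, best_r = "wild", threshold
    for operation, genotypes in farm_samples:
        r = relationship(market_fish, genotypes, freqs_per_locus)
        if r > best_r:
            best_operation, best_r = operation, r
    return best_operation


snp_freqs = [{"A": 0.8, "B": 0.2}] * 2            # two SNP loci, frequencies as simulated
market = [("A", "B"), ("A", "B")]
farm = [(3, [("A", "A"), ("A", "A")]), (7, [("A", "B"), ("A", "B")])]
print(assign_fs(market, farm, snp_freqs, threshold=0.5))   # 7
```

Starting `best_r` at the threshold means a market fish is only assigned to a farm when at least one comparison strictly exceeds the threshold, matching the wild-origin rule in the text.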

2.2.2. Strategy PAR

For strategy PAR (for parents), for a fish sampled from the market place we calculated the probability that the fish came from each of the possible pairs of parents, following Letcher and King (2001). For each marker, the probability that an offspring with genotype $A_iA_j$ is derived from parents with genotypes $A_aA_b$ and $A_cA_d$ is

$$ \Pr\left(A_iA_j \mid A_aA_b,\ A_cA_d\right) = T(i \mid ab)\,T(j \mid cd) + T(j \mid ab)\,T(i \mid cd) $$

where $T(i \mid ab) = \tfrac{1}{2}\,\delta(a = i) + \tfrac{1}{2}\,\delta(b = i)$ is the probability that parent $A_aA_b$ transmits allele $A_i$, and $\delta(a = i)$ and $\delta(b = i)$ are Boolean indicator functions that take the value one if the allele value of $a$ (or $b$) equals the allele value of $i$, and zero otherwise. If the offspring is homozygous ($i = j$), $\Pr(A_iA_j \mid A_aA_b, A_cA_d)$ is divided by two. The global likelihood for the offspring conditional on the parental pair is the product of all single-locus likelihoods. The fish sampled from the market place is considered to be the offspring of the parental pair with the highest global likelihood. If all the global likelihoods are zero, the fish is considered to be of wild origin. A sampled fish from multiplier parents was correctly assigned if it was assigned to the correct parents; a sampled fish of wild origin was correctly assigned if it was excluded as the offspring of all multiplier parents.

2.2.3. Strategy GRAND

In strategy GRAND (for grandparents), the exclusion probabilities derived for parentage assignment (as above) were extended to the assignment of grand-offspring to grandparents, as described by Letcher and King (2001). A sampled fish from nucleus grandparents was correctly assigned if it was assigned to the correct grandparents; a sampled fish of wild origin was correctly assigned if it was excluded as a grand-offspring of the nucleus grandparents.
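The PAR likelihood translates directly into code; the GRAND part below is a hedged sketch. For GRAND we assume (our simplification, not necessarily the exact formulation of Letcher and King, 2001) that each unobserved parent bred from grandparents $g_1 \times g_2$ transmits allele $i$ with probability $\tfrac{1}{2}T(i \mid g_1) + \tfrac{1}{2}T(i \mid g_2)$, with loci unlinked as in the simulation. All function names are hypothetical.

```python
from math import prod


def transmit(allele, parent):
    """T(i|ab): probability that a parent with genotype (a, b) transmits `allele`."""
    a, b = parent
    return 0.5 * (a == allele) + 0.5 * (b == allele)


def par_locus(child, sire, dam):
    """Single-locus Pr(A_iA_j | A_aA_b, A_cA_d) as in the PAR equation."""
    i, j = child
    p = transmit(i, sire) * transmit(j, dam) + transmit(j, sire) * transmit(i, dam)
    return p / 2 if i == j else p            # homozygote: divide by two


def grand_transmit(allele, gp1, gp2):
    """Assumed extension: probability that an unobserved parent bred from
    grandparents gp1 x gp2 transmits `allele` to its offspring."""
    return 0.5 * transmit(allele, gp1) + 0.5 * transmit(allele, gp2)


def grand_locus(child, gp1, gp2, gp3, gp4):
    """Single-locus likelihood of a grand-offspring given two grandparental pairs."""
    i, j = child
    p = (grand_transmit(i, gp1, gp2) * grand_transmit(j, gp3, gp4)
         + grand_transmit(j, gp1, gp2) * grand_transmit(i, gp3, gp4))
    return p / 2 if i == j else p


def global_likelihood(locus_fn, child_genos, *relative_genos):
    """Product of single-locus likelihoods over (assumed unlinked) loci."""
    return prod(locus_fn(c, *rel) for c, *rel in zip(child_genos, *relative_genos))


def assign(locus_fn, child_genos, candidates):
    """Candidate with the highest global likelihood, or "wild" if all are zero.
    `candidates` maps a label to a tuple of genotype lists (one list per relative)."""
    best_label, best_lik = "wild", 0.0
    for label, relatives in candidates.items():
        lik = global_likelihood(locus_fn, child_genos, *relatives)
        if lik > best_lik:
            best_label, best_lik = label, lik
    return best_label


sire = [("A", "A"), ("C", "D")]
dam = [("B", "B"), ("C", "C")]
child = [("A", "B"), ("C", "D")]
print(global_likelihood(par_locus, child, sire, dam))                     # 0.5
print(assign(par_locus, child, {"pair1": (sire, dam), "pair2": (dam, dam)}))  # pair1
```

Because exclusion is expressed as a global likelihood of exactly zero, a single incompatible locus is enough to rule out a candidate pair, which is why wild fish are recognised once enough informative loci are typed.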

3. Results

3.1. Proportion of correct assignment decisions from FS

Our results suggest that, in the FS strategy, the number of fish genotyped from each commercial operation and the number of markers that need to be genotyped depend on each other if the proportion of correct assignment decisions is to exceed 0.95 (Fig. 3, microsatellites; Fig. 4, SNPs). If fewer fish are sampled from each commercial operation, a larger number of markers must be genotyped to achieve 0.95 correct assignment decisions. With microsatellites (Fig. 3), the lowest total number of genotypings required to achieve 0.95 correct assignment decisions (i.e. number of markers per fish × number of fish sampled per commercial operation × number of commercial operations) was achieved when 100 fish were sampled per commercial operation and genotyped for about 75 markers. As there were 24 full sib families per commercial operation, sampling 100 fish per farm should give on average about 4 fish per full sib family. There was no advantage (in terms of the proportion of correct assignment decisions) in sampling more than 100 fish per commercial operation, indicating that in a sample of this size all full sib families in the commercial operation are sufficiently represented. With SNPs, the 0.95 threshold was reached with about 175 SNPs (150 fish sampled), 225 SNPs (100 fish sampled) or 400 SNPs (50 fish sampled) (Fig. 4).
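The total number of genotypings defined above (markers × fish per operation × operations) can be tabulated for the approximate SNP results quoted from Fig. 4, together with the microsatellite design of 75 markers on 100 fish. On these rounded figures, the 50-fish SNP design gives the fewest genotypings of the three SNP options; this is our arithmetic on the approximate values, not a result stated in the text. The function name is illustrative.

```python
N_OPERATIONS = 50  # commercial operations in the simulation


def total_genotypings(markers, fish_per_operation, n_operations=N_OPERATIONS):
    """Markers per fish x fish sampled per operation x number of operations."""
    return markers * fish_per_operation * n_operations


# (fish sampled per operation, approximate SNPs needed for 0.95), from the text
snp_designs = [(150, 175), (100, 225), (50, 400)]

for fish, snps in snp_designs:
    print(f"{fish} fish x {snps} SNPs: {total_genotypings(snps, fish):,} genotypings")
print(f"100 fish x 75 microsatellites: {total_genotypings(75, 100):,} genotypings")
```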

Fig. 3. Proportion of correct assignment decisions from strategy FS with increasing numbers of fish sampled from fifty commercial operations and genotyped for an increasing number of microsatellite markers. [Plot: proportion of correct assignment decisions (0 to 1) against number of microsatellite markers (0 to 200); series: 150, 100, 50 and 25 fish genotyped per farm, with the 0.95 threshold marked.]

Fig. 4. Proportion of correct assignment decisions from strategy FS with increasing numbers of fish sampled from each of fifty commercial operations and genotyped for an increasing number of SNP markers. [Plot: proportion of correct assignment decisions (0 to 1) against number of SNP markers (0 to 500); series: 150, 100, 50 and 25 fish genotyped per farm, with the 0.95 threshold marked.]

3.2. Proportion of correct assignment decisions from PAR and GRAND

The proportion of correct assignment decisions in PAR increased rapidly as more markers were used (Fig. 5). Ninety-five percent correct assignment decisions were achieved with about 15 microsatellites, while about 75 SNP markers were required to achieve the same level of accuracy.

A substantially greater number of markers, either microsatellites or SNPs, was required in GRAND than in PAR to achieve 0.95 correct assignment decisions. The number of microsatellites required increased approximately 3-fold, from 15 to 50, and the number of SNPs required also increased approximately 3-fold, from 75 to 200. While the proportion of correct assignment decisions increased rapidly as more markers were used in PAR (both SNPs and microsatellites) and in GRAND with microsatellites, it increased only gradually as more markers were added in GRAND with SNPs. This was probably because of the low information content of these markers, together with the large number of potential grandparental combinations.

Fig. 5. Proportion of correct assignment decisions from strategies PAR and GRAND with increasing number of microsatellite and SNP markers. [Plot: proportion of correct assignment decisions (0 to 1) against number of markers (0 to 200); series: PAR microsatellites, PAR SNPs, GRAND microsatellites, GRAND SNPs, with the 0.95 threshold marked.]

4. Discussion

Our results suggest that, for the industry simulated, accurate assignment of a fish sampled from the market place to either the wild population or the farmed population can be achieved using either microsatellite or SNP markers. For fish assigned to the farmed population, assignment to parents, grandparents or full sib groups is possible. Strategy PAR required the fewest markers for 95% correct assignment decisions (15 microsatellites or 75 SNPs), followed by GRAND (50 microsatellites or 200 SNPs) and FS (75 microsatellites or 400 SNPs). We can compare the relative genotyping costs of the strategies by evaluating the number of markers required to achieve 0.95 accuracy in assigning the market place fish to parents, grandparents, full sib family or the wild population, and assuming that the ratio of the cost of genotyping a microsatellite to the cost of genotyping a SNP is 5:1 (Table 1). This relative cost of genotyping is based on a comparison of capillary electrophoresis microsatellite genotyping with SNP genotyping on the Sequenom MassARRAY system, and assumes that large numbers of each marker are to be genotyped (in actual currency, the costs of genotyping are currently approximately 10 NOK per microsatellite and 2 NOK per SNP). The costs are incurred each time a new set of commercial operations (FS), parents (PAR) or grandparents (GRAND) is used. As there are fewer grandparents than parents, the cheapest strategy, considering relative genotyping costs only, was GRAND with SNPs. However, these costs ignore the additional cost of the logistics required for traceability in each strategy (discussed below).

Table 1
Relative genotyping costs required to achieve >0.95 correct assignment decisions for FS, PAR and GRAND

                         Microsatellites    SNPs
FS (100 fish sampled per commercial operation)
  Number of markers      75                 400
  Cost                   9,000,000          4,800,000
PAR
  Number of markers      15                 75
  Cost                   135,000            135,000
GRAND
  Number of markers      50                 200
  Cost                   45,000             36,000

Cost of genotyping a SNP = 1 unit; cost of genotyping a microsatellite = 5 units.

4.1. Comparison of results to those from other studies

In PAR, the proportion of correct assignment decisions depends on the number of loci, the allelic diversity at these loci (number of alleles and the distribution of their frequencies), the number of offspring, and the number of parents and possible mating combinations. A number of studies, with a wide range of numbers of possible parents and offspring, indicate that between 6 and 10 microsatellite markers, with 6–10 alleles per locus, are sufficient to accurately assign offspring to the correct parental pair (e.g. Bernatchez and Duchesne, 2000; Estoup et al., 1998; SanCristobal and Chevalet, 1997; Letcher and King, 2001; Villanueva et al., 2002). Our results roughly concur with these studies: 15 microsatellites were required to assign progeny sampled from the market place to the correct parental pair (1800 possible parents), or to exclude the possibility that the sampled fish was the offspring of any of the 1800 parents (e.g. a fish originating from the wild), in 95% of replicates. When SNP markers were used, the number of markers required to achieve the same accuracy was 5-fold greater, reflecting the lower allelic diversity of these markers.
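The PAR and GRAND entries of Table 1 can be reproduced as (number of genotyped fish) × (number of markers) × (relative unit cost), assuming that the fish genotyped are the 1800 multiplier parents (PAR) and the 180 nucleus grandparents (GRAND) of the simulation. The sketch below does not attempt to recompute the FS entries; names are illustrative.

```python
UNIT_COST = {"microsatellite": 5, "SNP": 1}    # relative units, as in the Table 1 footnote
GENOTYPED_FISH = {"PAR": 1800, "GRAND": 180}   # multiplier parents / nucleus grandparents


def relative_cost(strategy, marker_type, n_markers):
    """Relative genotyping cost = fish genotyped x markers x unit cost."""
    return GENOTYPED_FISH[strategy] * n_markers * UNIT_COST[marker_type]


for strategy, marker_type, n_markers in [
    ("PAR", "microsatellite", 15), ("PAR", "SNP", 75),
    ("GRAND", "microsatellite", 50), ("GRAND", "SNP", 200),
]:
    print(strategy, marker_type, relative_cost(strategy, marker_type, n_markers))
# PAR microsatellite 135000
# PAR SNP 135000
# GRAND microsatellite 45000
# GRAND SNP 36000
```

The calculation makes the trade-off explicit: GRAND needs more markers per fish, but genotyping ten times fewer fish more than compensates.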
Of course, there is scope to pre-select the SNPs used so as to increase their informativeness (selecting those with the highest frequency of the rare allele). However, the maximum heterozygosity of a SNP is still only 0.5, much lower than the heterozygosity of a typical microsatellite. In GRAND, the proportion of correct assignment decisions depends on parameters similar to those in PAR: the number of loci, the allelic diversity at these loci, the number of offspring, and the number of grandparents and possible mating combinations among these grandparents. In our simulation scheme, the number of possible grandparents (180) was one tenth of the number of parents (1800). However, the number of possible matings among the grandparents to produce parents and then progeny is much higher than the number among the parents to produce progeny ($180^4 \approx 1.05 \times 10^9$ compared with $1800^2 = 3.24 \times 10^6$, respectively). To achieve the same proportion of correct assignment decisions (0.95), approximately 3 times as many microsatellites were required for assignment of offspring to grandparents as for assignment of offspring to parents or to the wild population. Letcher and King (2001) used simulation to assess the number of loci, and the number of alleles at these loci, required for accurate assignment of fish to parents or grandparents in an Atlantic salmon population (though in a much smaller population than considered here). In Table 1 of their paper, 4 loci with 15 alleles were required for 95% accuracy in assigning fish to parents, while 16 loci with 18 alleles were required to achieve the same accuracy when fish were assigned to grandparents. These results (a 4-fold increase in the number of markers required from PAR to GRAND at the same proportion of correct assignment decisions) are in rough agreement with ours.
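The combination counts quoted above follow simple counting ($180^4$ grandparental sets versus $1800^2$ parental pairs, ignoring sex and ordering constraints, as in the text). The short check below reproduces them and shows that, on this counting, the grandparental search space is 324 times larger than the parental one.

```python
n_grandparents, n_parents = 180, 1800

grandparent_sets = n_grandparents ** 4   # four grandparents per grand-offspring
parent_pairs = n_parents ** 2            # two parents per offspring

print(f"{grandparent_sets:.3e}")          # 1.050e+09
print(f"{parent_pairs:.3e}")              # 3.240e+06
print(grandparent_sets / parent_pairs)    # 324.0
```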

The proportion of correct assignment decisions from FS is primarily determined by two parameters. The first is the accuracy of estimating the relationship between a fish sampled from the market place and a fish genotyped from the commercial operation (Fig. 6), which depends on the number of markers used to estimate the relationship. The results of Glaubitz et al. (2003) suggest that at least 16–20 microsatellites or 100 SNP markers (frequency of the rare allele 0.2) are required to determine accurately (proportion of correct assignment decisions of 0.95) whether two individuals are full sibs or unrelated. Our results indicate that larger numbers of markers may be required in the situation we evaluated: approximately 75 microsatellites were required before the proportion of correct decisions reached 0.95. The large discrepancy may be a result of our sampling scheme, in which there is a chance that not all full sib families are represented in the sample; this is the second parameter of importance. If the number of markers is low, more individuals from each full sib family must be included in the sample, as each relationship between the fish sampled from the market place and a fish sampled from the commercial operation will not be estimated very accurately. As the number of full sib families per commercial operation decreases, the sample size taken per commercial operation can also be decreased, as the probability of including representatives of all full sib families increases.

Fig. 6. Proportion of correct assignment decisions from strategy FS, when the fish sampled in the market place was either from a commercial operation or from the wild population. [Plot: proportion of correct assignment decisions (0 to 1) against number of markers (0 to 200); series: wild, farmed.]

4.2. Limitations of our study

A 95% level of accuracy was used in this study as the threshold for 'accurate assignment'. Whether this level is sufficient will depend on the specific goals of the traceability scheme. While a 95% level may be sufficient, for example, for labelling fish products with farm of origin to inform consumer choice, a 100% level would be required where a farmer has exposed a group of fish to a toxin and all exposed fish need to be identified. The number of markers required for 100% accuracy of assignment with the different strategies (if it is attained) can be found in Figs. 3–5.

We have assumed in our simulations that all of the DNA markers were unlinked. This will not be the case, especially with the large numbers of SNPs required in the FS and GRAND strategies. When there is non-independence (i.e. linkage) between some of the markers, more markers may be required to achieve the same accuracy of assignment. Further simulations are required to determine the effects of linkage on the number of SNPs required for accurate assignment.

The size of the aquaculture industry we were able to simulate was limited by the computing time required in the GRAND strategy. We were unable to simulate the full size of a large aquaculture industry such as the Norwegian salmon industry: for example, the number of grandparents simulated (180) was approximately 1/5 of the number of grandparents used in the industry, and the number of parents simulated (1800) was approximately 1/17 of the number of parents of commercial progeny used each year in the Norwegian salmon industry. Can our results therefore be applied to an industry which is at least ten times larger than the one we have simulated? Let us consider the PAR strategy first.
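The second FS parameter discussed in Section 4.1, whether every full sib family is represented in the sample, can be illustrated with an inclusion-exclusion calculation. This is a sketch under our own assumptions, not part of the original study: fish are drawn at random without replacement from one commercial operation holding 24 full sib families of 10 fish (the simulated structure), and a family counts as represented if at least one of its members is drawn. The function name is hypothetical.

```python
from math import comb


def prob_all_families_sampled(n_sampled, n_families=24, family_size=10):
    """P(every full sib family has at least one fish in a random sample drawn
    without replacement), by inclusion-exclusion over the families that are missed."""
    total = n_families * family_size
    if n_sampled > total:
        raise ValueError("cannot sample more fish than the operation holds")
    favourable = 0
    for k in range(n_families + 1):
        # samples that avoid k specified families, with alternating signs
        favourable += (-1) ** k * comb(n_families, k) * comb(total - k * family_size, n_sampled)
    # exact integer arithmetic above; a single correctly rounded division at the end
    return favourable / comb(total, n_sampled)


for n in (25, 50, 100, 150):   # sample sizes used in strategy FS
    print(n, round(prob_all_families_sampled(n), 3))
```

Summing with Python integers avoids the cancellation error that a floating-point inclusion-exclusion sum would suffer for binomial coefficients of this size. Changing `n_families` or `family_size` lets the sketch explore other farm structures.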
Bernatchez and Duchesne (2000) derived analytical formulas to determine the number of loci required to reach a given level of assignment success with different numbers of parents. Their results indicated that increasing the number of parents from 50 to 100 required more loci to achieve 90% assignment success. By contrast, increasing the number of parents from 100 to 300 generally did not. The results of Letcher and King (2001) support this conclusion: in their study, 10 microsatellite loci with 6 alleles each were sufficient to assign progeny to 50, 110, 210, 310 or 410 parents with greater than 95% accuracy. Villanueva et al. (2002) reported that 10 loci with 6 alleles were sufficient to assign progeny from crosses among either 200 or 800 parents with close to 100% accuracy. Our result of approximately 15 microsatellite loci or 75 SNPs to achieve a 95% probability of correct assignment decisions may therefore also hold with much larger numbers of parents. For the strategy of assigning market fish to grandparents of origin (GRAND), the effect of increasing the number of grandparents on the accuracy of assigning grand offspring has not been investigated in the literature. To investigate this, we simulated a number of populations with smaller numbers of grandparents than that used above (results not shown). In general, changing the number of grandparents did not greatly alter the conclusion that approximately 50 microsatellites or 200 SNPs are required for accurate assignment of fish to grandparents. In their simulation study, Letcher and King (2001) concluded that 16 loci with 18 alleles each were sufficient to correctly assign fish to grandparents. They required fewer loci than we did, but the number of alleles per locus in their study was substantially higher. In general, therefore, we conclude that the results from our simulations should also roughly apply to aquaculture industries that use larger numbers of parents and grandparents of commercial offspring than we have simulated. This includes, for example, the number of loci required to achieve a >0.95 proportion of correct assignment decisions with each strategy.
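The claim that a much larger pool of candidate parents needs only a few more loci can be checked roughly with the classical exclusion probability. The Python sketch below is a back-of-envelope illustration, not the simulation used in this study. It rests on several assumptions:

- unlinked loci in Hardy-Weinberg equilibrium;
- error-free genotypes;
- hypothetical allele frequencies (six equifrequent alleles per microsatellite, SNP alleles at frequency 0.5);
- each unrelated candidate is excluded independently of the others.

The sketch also ignores the second parent, so it is only a rough guide for PAR. The helper names are illustrative.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple


def genotype_freqs(p: List[float]) -> Dict[Tuple[int, int], float]:
    """Hardy-Weinberg genotype frequencies for a list of allele frequencies p."""
    freqs = {}
    for i, j in combinations_with_replacement(range(len(p)), 2):
        freqs[(i, j)] = p[i] * p[j] * (1.0 if i == j else 2.0)
    return freqs


def single_locus_exclusion(p: List[float]) -> float:
    """Probability that a random unrelated candidate shares no allele with an offspring."""
    g = genotype_freqs(p)
    return sum(
        f_off * f_cand
        for off, f_off in g.items()
        for cand, f_cand in g.items()
        if not set(off) & set(cand)
    )


def loci_needed(p: List[float], n_candidates: int, target: float = 0.95,
                max_loci: int = 1000) -> Optional[int]:
    """Smallest number of identical, unlinked loci at which all n_candidates
    unrelated candidates are excluded with probability >= target."""
    e = single_locus_exclusion(p)
    for n_loci in range(1, max_loci + 1):
        combined = 1.0 - (1.0 - e) ** n_loci    # exclusion over all loci
        if combined ** n_candidates >= target:   # assumes independent candidates
            return n_loci
    return None


microsat = [1 / 6] * 6   # hypothetical: six equifrequent alleles
snp = [0.5, 0.5]         # hypothetical: biallelic SNP at frequency 0.5
for n in (180, 1800):
    print(n, loci_needed(microsat, n), loci_needed(snp, n))
```

Under these assumptions the loop prints 13 microsatellites or 62 SNPs for 180 candidates, and 16 or 79 for 1800 candidates. A tenfold increase in candidates therefore adds only a few loci. The figures for 1800 candidates are of the same order as the 15 microsatellites and 75 SNPs found for PAR. Linkage between loci is ignored here, as in the simulations, and would raise these numbers.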
In all strategies, the decision to assign a fish sampled from the market place to the wild population was based on excluding the possibility that the fish was a full sib (FS), offspring (PAR) or grand offspring (GRAND) of the genotyped fish. We used an exclusion probability of one as the criterion to assign fish to the wild. In other words, a fish was assigned to the wild only when there was zero probability that it could be an offspring, grand offspring or full sib of the genotyped fish. It could be argued that this criterion was too stringent, and that using a lower exclusion probability would allow us to correctly assign fish to the wild more frequently. However, further investigation showed that fewer markers were required to correctly assign a market fish of wild origin than one of farmed origin. For example, in strategy FS, 100 SNPs were sufficient to correctly assign wild fish to the wild population in 100% of decisions. By contrast, 175 SNPs were required to correctly assign farmed fish to their commercial operation of origin.

4.3. Application of strategies

The ultimate objective of any traceability scheme is to trace a fish from any point in the production chain to any other point. It is therefore necessary to consider how this could be achieved with the three strategies evaluated here. In the Norwegian farmed salmon industry, the logistics are considerably complicated because a multiplier unit obtains its genetic material from one breeding nucleus, whereas a commercial producer may purchase eyed eggs from different multiplier units. Although GRAND has the lowest genotyping cost of any strategy, it also has the most complicated logistical requirements. Under GRAND, a market fish is first assigned to a set of grandparents. For such a fish to be traced to any point in the production chain, GRAND requires that all the progeny of a pair or set of grandparents (i.e., the parents at the multiplier units and the offspring at the commercial units) be kept separate at both the multiplier and commercial levels. This is probably impractical. Because of the very high fecundity of fish species, it would create too many large, unmanageable groups of fish at the different levels of the production chain.
However, GRAND could still be used to discriminate between fish of different stocks, because allocating fish to grandparents implicitly identifies the nucleus of origin. It could likewise discriminate between different sub-populations (year classes) of the same nucleus. One alternative to DNA markers for implementing traceability systems is to label products with information on stock and farm of origin, date of harvest and so on (Håstein et al., 2001). In practice, traceability systems based on labelling may be cheaper and easier to implement than systems using DNA markers. DNA markers may still have a role to play in such systems, however. Periodic verification that the labelling system is accurately tracing the production chain may be required, and this could be achieved independently with DNA markers, using any of the three strategies described here.

Although we have not explicitly tackled the issue of escapees from fish farms, our results do have some bearing on this problem. All three of our strategies discriminate between fish from captive populations and the wild population. The strategy of most immediate relevance, though, is perhaps FS. Suppose fish were sampled from a particular region and DNA taken from them. The FS strategy could then be used in the first instance to determine whether any of the sampled fish were escapees from nearby fish farms. Only fish from the farms in the region of interest would need to be genotyped.

The traceability strategies using DNA markers outlined in this paper rely on the flow of marker genes through three tiers. Genes pass from a few grandparents at the nucleus level, to many parents at the multiplier level, and finally to a very high number of grow-out animals at the commercial farm level. This structure is typical of many aquaculture industries, including the Norwegian farmed Atlantic salmon industry used as an example in this paper. Modifications of this structure may exist. These arise in particular from the reproductive characteristics and capacity of the species concerned and from the size of the industry's production output. For example, some industries have a low output from highly prolific, multiple-spawning species. In such industries, a sufficient number of parent fish at the multiplier level may be recruited directly from the nucleus. In this case, the GRAND strategy is not valid, and either PAR (with the nucleus parents genotyped) or FS must be used.
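Where PAR is used, the exclusion rule applied in all three strategies can be sketched as below. Under this rule, a market fish is assigned to the wild only when every candidate is excluded. The sketch is a minimal illustration, not necessarily how assignment was implemented in the simulations. It assumes error-free genotypes and treats a mismatch at any single locus as excluding a candidate sire-dam pair. The fish identifiers, allele codes and function names are hypothetical. Under such a strict rule, a single genotyping error is enough to exclude the true parents.

```python
from typing import Dict, List, Tuple, Union

Genotype = Tuple[int, int]   # the two alleles carried at one marker locus
Profile = List[Genotype]     # one genotype per locus, same locus order for every fish


def locus_compatible(child: Genotype, sire: Genotype, dam: Genotype) -> bool:
    """True if one child allele can come from the sire and the other from the dam."""
    a, b = child
    return (a in sire and b in dam) or (b in sire and a in dam)


def pair_compatible(child: Profile, sire: Profile, dam: Profile) -> bool:
    """A sire-dam pair is excluded as soon as one locus is incompatible."""
    return all(locus_compatible(c, s, d) for c, s, d in zip(child, sire, dam))


def assign_par(child: Profile, parents: Dict[str, Profile],
               pairs: List[Tuple[str, str]]) -> Union[List[Tuple[str, str]], str]:
    """Return every compatible (sire, dam) pair, or 'WILD' if all pairs are excluded."""
    compatible = [(s, d) for s, d in pairs
                  if pair_compatible(child, parents[s], parents[d])]
    return compatible if compatible else "WILD"


# Hypothetical multiplier parents genotyped at two loci, and their known matings.
parents = {
    "S1": [(1, 2), (3, 3)], "D1": [(2, 4), (1, 3)],
    "S2": [(5, 5), (2, 2)], "D2": [(6, 6), (2, 4)],
}
pairs = [("S1", "D1"), ("S2", "D2")]

farmed_fish = [(1, 4), (3, 1)]   # compatible with S1 x D1 only
wild_fish = [(7, 8), (2, 2)]     # allele 7 is carried by no parent, so every pair is excluded
print(assign_par(farmed_fish, parents, pairs))  # [('S1', 'D1')]
print(assign_par(wild_fish, parents, pairs))    # WILD
```

Returning all compatible pairs rather than a single best pair keeps the decision strictly exclusion-based. An assignment to one commercial operation is then unambiguous only when exactly one pair survives.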
The most suitable and cost-effective traceability strategy for a particular industry will depend heavily on how that industry is organised. One example is the extent to which transfers of fish, eggs and larvae between tiers are recorded.

Acknowledgements

The authors are grateful for funding from the Norwegian Research Council (Project number 130162/140). Professor Theo Meuwissen is thanked for advice on the similarity index.

References

Bernatchez, L., Duchesne, P., 2000. Individual-based genotype analysis in studies of parentage and population assignment: how many loci, how many alleles? Can. J. Fish Aquat. Sci. 57, 1–12.
Eding, J.H., Meuwissen, T.H.E., 2001. Marker based estimates of between and within population kinships for the conservation of genetic diversity. J. Anim. Breed. Genet. 118, 141–159.
Estoup, A., Gharbi, K., SanCristobal, M., Chevalet, C., Haffray, P., Guyomard, R., 1998. Parentage assignment using microsatellites in turbot (Scophthalmus maximus) and rainbow trout (Oncorhynchus mykiss) hatchery populations. Can. J. Fish Aquat. Sci. 55, 715–725.
Gilbey, J., Verspoor, E., McLay, A., Houlihan, D., 2004. A microsatellite linkage map for Atlantic salmon (Salmo salar). Anim. Genet. 35 (2), 98–105.
Glaubitz, J.C., Rhodes, E., DeWoody, A., 2003. Prospects for inferring pairwise relationships with single nucleotide polymorphisms. Mol. Ecol. 12, 1039–1047.
Hayes, B., Lærdahl, J., Lien, S., Berg, P., Davidson, W., Koop, B., Adzhubei, A., Høyheim, B., 2004. Detection of single nucleotide polymorphisms (SNPs) from Atlantic salmon Expressed Sequence Tags (ESTs). Proc. Euro. Assoc. Anim. Prod. Bled, Slovenia.
He, C., Chen, L., Simmons, M., Li, P., Kim, S., Liu, Z.J., 2003. Putative SNP discovery in interspecific hybrids of catfish by comparative EST analysis. Anim. Genet. 34 (6), 445–448.
Håstein, T., Hill, B.J., Berthe, F., Lightner, D.V., 2001. Traceability of aquatic animals. Rev. Sci. Tech. Off. Int. Épizoot. 20, 564–583.
Kocher, T.D., Lee, W., Sobolewska, H., Penman, D., McAndrew, B., 1998. A genetic linkage map of a cichlid fish, the Tilapia (Oreochromis niloticus). Genetics 148, 1225–1232.
Letcher, B.L., King, T.L., 2001. Parentage and grand parentage assignment with known and unknown matings: application to Connecticut River Atlantic salmon restoration. Can. J. Fish Aquat. Sci. 58, 1812–1821.
Marth, G., Yeh, R., Minton, M., Donaldson, R., Li, Q., Duan, S., Davenport, R., Miller, R.D., Kwok, P.Y., 2001. Single-nucleotide polymorphisms in the public domain: how useful are they? Nat. Genet. 27 (4), 371–372.
Sakamoto, T., Danzmann, R.G., Gharbi, K., Howard, P., Ozaki, A., Khoo, S.K., Woram, R.A., Okamoto, N., Ferguson, M.M., Holm, L.E., Guyomard, R., Høyheim, B., 2000. A microsatellite linkage map of rainbow trout (Oncorhynchus mykiss) characterized by large sex-specific differences in recombination rates. Genetics 155 (3), 1331–1345.
SanCristobal, M., Chevalet, C., 1997. Error tolerant parent identification from a finite set of individuals. Genet. Res. 70, 53–62.
Skaala, Ø., Høyheim, B., Glover, K., Dahle, G., 2004. Microsatellite analysis in domesticated and wild Atlantic salmon (Salmo salar L.): allelic diversity and identification of individuals. Aquaculture 240, 131–143.
Smith, C.T., Templin, W.D., Seeb, J.E., Seeb, L.W., 2003. Nuclear and mitochondrial SNPs provide high-throughput resolution for migratory studies of Chinook salmon.
Villanueva, B., Verspoor, E., Visser, P.M., 2002. Parental assignment in fish using microsatellite markers with finite numbers of parents and offspring. Anim. Genet. 33, 33–41.