Genomic variation in European Sea bass: from SNP discovery within ESTs to genome scan
Souche, E. (2009). Genomic variation in European Sea bass: from SNP discovery within ESTs to genome scan. PhD Thesis. KU Leuven, Faculty of Science: Leuven.
Abstract
European sea bass (Dicentrarchus labrax) is an economically important marine species in European aquaculture. Although sea bass population structure is well known, aquaculture does not benefit from selection programs: sea bass production is almost entirely based on wild-caught fish reproducing under semi-controlled conditions. More knowledge of the sea bass genome would support breeding progress in this species, the study of natural populations and their evolution, and the management of fisheries. Large collections of Expressed Sequence Tags (ESTs) would provide genomic resources for discovering new genes and markers, identifying intron-exon boundaries and studying gene expression profiles. In this thesis, efforts were concentrated on the discovery of Single Nucleotide Polymorphisms (SNPs) within ESTs. SNPs are the most abundant source of variation in most eukaryotic and prokaryotic genomes and have applications in both aquaculture and natural populations. Approximately 30,000 ESTs from 14 libraries of five sea bass individuals were sequenced. This large EST collection was described and compared with a similar set of ESTs generated for gilthead sea bream (Sparus aurata), another economically important marine species. Processing the ESTs yielded 17,716 sea bass and 18,198 sea bream unique sequences, fewer than a third of which were common to both species. Automatic annotation indicated that more protein-coding sequences were generated for sea bass than for sea bream. This was further confirmed by Open Reading Frame (ORF) prediction and by the GC content of the sea bass and sea bream unique sequences. Gene Ontology (GO) annotation showed that the same categories were represented in both species. Six SNP discovery tools were applied to the sea bass ESTs, and their performance was assessed by validating around 10% of the SNP candidates.
This analysis demonstrated that selecting redundant candidate SNPs (mismatches detected at least twice in the ESTs) was a good means of improving SNP discovery performance. Selecting SNP candidates with a minimum allele frequency greater than or equal to 0.3 further enhanced performance, although it reduced the number of SNP candidates. Finally, selecting SNP candidates detected by several tools and excluding indels were also effective ways of reducing the number of false positive candidates. Transition SNP candidates appeared less reliable than transversion SNP candidates because of the presence of RNA editing sites in EST collections. A high-quality EST assembly and high-quality flanking regions around SNP candidates proved essential for efficient SNP discovery. These conclusions led to the development of a pipeline integrating the six tested SNP discovery tools. This efficient, easy-to-use pipeline detects SNPs in any EST dataset, selects SNP candidates according to redundancy and/or minimum allele frequency, and compares SNP candidates across discovery tools. It has been used successfully on EST collections of the fishes Dicentrarchus labrax, Sparus aurata and Anguilla anguilla and of the water flea Daphnia magna. Together, the six SNP discovery tools identified 1,072 unique SNP candidates, a subset of which was validated. A total of 360 SNPs were discovered in introns and ESTs, showing that resequencing the contigs predicted to be polymorphic was an efficient way of discovering SNPs. Nucleotide diversity in sea bass was estimated at one SNP every 137 bp and was higher in introns than in ESTs. Mendelian inheritance was checked for 17 SNPs that were polymorphic in the Venezia Fbis family used to produce sea bass linkage maps. Four of them did not follow Mendelian inheritance, suggesting the presence of null alleles. Finally, 22 wild sea bass populations were successfully genotyped at 49 SNPs.
This set of SNPs sufficed to confirm the established sea bass population structure, namely the differentiation between Atlantic and Mediterranean samples. Adriatic samples were shown to be genetically distinct from Western and Eastern Mediterranean samples. Selection analyses pointed to a locus that could be under natural selection in the Atlantic Ocean. In conclusion, the bioinformatic approach to SNP discovery proved very valuable. Meanwhile, SNP genotyping technologies have evolved, allowing SNP candidates to be validated directly on the samples under investigation.
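The candidate-selection criteria described in the abstract (redundancy, an allele-frequency threshold of 0.3, agreement between several tools, exclusion of indels, and separating transitions from transversions) can be illustrated as a simple post-processing step over candidate SNPs. This is a minimal Python sketch under stated assumptions, not the thesis pipeline: the `SnpCandidate` record, its fields, the function names, the tool names and the toy data are all hypothetical, and "redundant" is interpreted here as the less frequent allele being observed in at least two ESTs.

```python
from dataclasses import dataclass, field

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes;
# every other base pair is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


@dataclass
class SnpCandidate:
    """Hypothetical record for one candidate SNP reported in an EST contig."""
    contig: str
    position: int
    allele_counts: dict[str, int]                 # allele -> number of ESTs showing it; "-" marks a gap
    tools: set[str] = field(default_factory=set)  # discovery tools that reported this site

    def is_indel(self) -> bool:
        return "-" in self.allele_counts

    def minor_count(self) -> int:
        """Number of ESTs carrying the second most frequent allele."""
        counts = sorted(self.allele_counts.values(), reverse=True)
        return counts[1] if len(counts) > 1 else 0

    def minor_allele_frequency(self) -> float:
        total = sum(self.allele_counts.values())
        return self.minor_count() / total if total else 0.0

    def mutation_type(self) -> str:
        """Classify the two most frequent alleles as a transition or a transversion."""
        top_two = sorted(self.allele_counts, key=self.allele_counts.get, reverse=True)[:2]
        if self.is_indel() or len(top_two) < 2:
            return "other"
        return "transition" if frozenset(top_two) in TRANSITIONS else "transversion"


def filter_candidates(candidates, min_minor_count=2, min_maf=0.3, min_tools=2, drop_indels=True):
    """Keep candidates passing the indel, redundancy, allele-frequency and tool-agreement filters."""
    kept = []
    for c in candidates:
        if drop_indels and c.is_indel():
            continue
        if c.minor_count() < min_minor_count:      # redundancy: minor allele seen at least twice
            continue
        if c.minor_allele_frequency() < min_maf:    # allele-frequency threshold
            continue
        if len(c.tools) < min_tools:                # reported by several tools
            continue
        kept.append(c)
    return kept


# Toy input with made-up contigs and tool names, for illustration only.
cands = [
    SnpCandidate("contig_1", 120, {"A": 6, "G": 4}, {"toolA", "toolB"}),
    SnpCandidate("contig_1", 300, {"C": 9, "T": 1}, {"toolA", "toolB"}),  # singleton minor allele
    SnpCandidate("contig_2", 45, {"G": 5, "-": 3}, {"toolA", "toolC"}),   # indel
    SnpCandidate("contig_3", 88, {"A": 3, "T": 3}, {"toolB"}),            # single tool
]
for c in filter_candidates(cands):
    print(c.contig, c.position, c.mutation_type())
```

With the default thresholds only the first toy candidate passes every filter; relaxing `min_tools` to 1 would also keep the A/T site, which `mutation_type` labels a transversion. The transition/transversion class is reported as a label rather than used as a hard filter, since the abstract describes transitions as less reliable, not as necessarily false.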
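The abstract reports that 4 of 17 SNPs departed from Mendelian inheritance in the Venezia Fbis family but does not state which test was used. As a hedged illustration only, one common way to check segregation in a mapping family is a chi-square goodness-of-fit test of offspring genotype counts against the proportions expected from the parental genotypes. The function names and the offspring counts below are hypothetical, and the closed-form p-value only covers 1 or 2 degrees of freedom, which is all a biallelic SNP can produce in a single informative cross.

```python
import math
from collections import Counter
from itertools import product


def expected_ratios(parent1: str, parent2: str) -> dict[str, float]:
    """Mendelian offspring genotype proportions for two diploid parents, e.g. "AG" x "AA"."""
    ratios = Counter()
    for a, b in product(parent1, parent2):  # each parental allele pairing has probability 1/4
        ratios["".join(sorted(a + b))] += 0.25
    return dict(ratios)


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


def segregation_test(parent1: str, parent2: str, offspring: list[str]) -> tuple[float, float]:
    """Chi-square goodness-of-fit of offspring genotypes to Mendelian expectations.

    Returns (chi2, p_value). Offspring genotypes the parents cannot produce
    (a possible sign of a null allele or a genotyping error) give (inf, 0.0).
    """
    expected = expected_ratios(parent1, parent2)
    df = len(expected) - 1
    if df == 0:
        raise ValueError("both parents homozygous: the cross is uninformative")
    observed = Counter("".join(sorted(g)) for g in offspring)
    if set(observed) - set(expected):
        return math.inf, 0.0
    n = sum(observed.values())
    chi2 = sum((observed[g] - p * n) ** 2 / (p * n) for g, p in expected.items())
    return chi2, chi2_upper_tail(chi2, df)


# Made-up offspring genotypes, for illustration only.
print(segregation_test("AG", "AA", ["AA"] * 48 + ["AG"] * 52))                 # near the 1:1 ratio
print(segregation_test("AG", "AG", ["AA"] * 30 + ["AG"] * 60 + ["GG"] * 10))  # far from 1:2:1
print(segregation_test("AG", "AA", ["AA"] * 50 + ["AG"] * 45 + ["GG"] * 5))   # impossible GG class
```

The third case shows why apparent non-Mendelian segregation can point to a null allele: if the "AA" parent actually carries one undetected allele, some offspring typed as GG are really G/null heterozygotes, a class that looks impossible under the parental genotypes as scored.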