Introduction

Breast cancer is the most common form of malignancy affecting women worldwide [1, 2]. Breast cancer incidence rates have increased progressively in Arab populations over the last 10 years, probably due to more reliable data being collected from cancer registries and to easier patient access to screening and diagnostic programs [3]. In Arab populations, breast cancer represents ~13–30 % of newly diagnosed malignancies in women and occurs at a median age of 49–52 years as compared to 63 in industrialized nations [3]. It is characterized by younger age at onset, advanced stage and poor prognosis [3]. In Tunisia, breast cancer remains the most common cancer among women and is considered a major public health problem. During the period from 1993 to 2007, the Cancer Registry of the Center of Tunisia counted 2,404 new cases of breast cancer. The median age at diagnosis was 48 years and the age-standardized incidence rate (ASR) was 29.2 per 100,000 during the study period [4].

The etiology of breast cancer is extremely complex and, while not yet elucidated, appears to involve numerous genetic, endocrine, and external environmental factors. Family history represents the most prominent risk factor for the development of the disease. It is estimated that about 5–10 % of all breast cancers may arise from the inheritance of germline mutations in dominant highly penetrant susceptibility genes such as BRCA1 and BRCA2. Mutations in these genes are rare and explain only a small fraction of the familial risk for the disease [5, 6]. This has led to the suggestion that the remaining breast cancer susceptibility is likely to be explained by a polygenic model involving a combination of low-penetrance alleles, each conferring a small increase in risk [7]. Recently, genome-wide association studies (GWAS) have provided a systematic way to search for genetic variants and have successfully identified several low-penetrance susceptibility loci for breast cancer [8–19]. Most breast cancer GWAS and replication studies published to date have been conducted in Northern European populations [8–10, 12, 14–16, 18] and to a lesser extent in Asians [13, 17, 19] and Ashkenazi Jews [11]. Thus, it is important to assess whether these variants confer risk across different populations with diverse ancestry backgrounds, including women of Arab ancestry. Moreover, little is known about risk factors and molecular events associated with breast cancer in Arab populations, which differ strongly from other populations in ethnicity, lifestyle, reproductive behavior, and environmental exposure. This prompted us to analyze the previously GWAS-identified breast cancer risk variants in Tunisians using a case–control study. In this report, we focus on nine polymorphisms in the following genes/regions: FGFR2, TNRC9 (also known as TOX3), MAP3K1, LSP1, 2q35, and 8q24. We further explored other potential effects of these risk loci on disease characteristics and survival.

Materials and methods

Study population

A total of 1,011 individuals, comprising 640 breast cancer patients and 371 healthy controls, were included in this study. Controls and patients were selected from the same population living in the middle coast of Tunisia, and only unrelated subjects were included. The sporadic breast cancer patients were recruited from the Department of Radiation Oncology of Sousse Hospital (Sousse, Tunisia) between 1996 and 2011. Their disease information was obtained from their hospital medical records. All patients included in this study had primary breast cancer, with unilateral breast tumors and with no family history of the disease. The diagnosis of cancer was confirmed by histopathological analyses. The patients had a mean age of 47.9 ± 10 years. The median follow-up was 65 months (range, 1–276 months). At the time of analysis, 118 patients had relapsed (local or distant recurrence). Among them, 20 (17 %) patients died from breast cancer. A detailed description of the clinico-pathological characteristics of this cohort is given in Table 1.

Table 1 Description of the study population

Controls were healthy women having a mean age of 55 ± 14 years. They were blood donors with no evidence of any personal or family history of cancer (or other chronic illness). Samples from healthy controls were collected consecutively between 2004 and 2010 and were age matched to the cases.

Both patients and controls gave their written consent to participate in the study and to allow their biological samples to be genetically analyzed. Approval for the study was given by the Tunisian National Ethical Committee and by Weill Cornell Medical College in Qatar IRB committee.

Genomic DNA extraction

Genomic DNA was extracted from peripheral blood leukocytes by a “salting out” procedure [20]. Briefly, 10 ml of blood was mixed with Triton lysis buffer (0.32 M sucrose, 1 % Triton X-100, 5 mM MgCl2, H2O, 10 mM Tris–HCl, pH 7.5). Leukocytes were spun down and washed with H2O. The pellet was incubated with proteinase K at 56 °C and subsequently salted out at 4 °C using a saturated NaCl solution. Precipitated proteins were removed by centrifugation. The DNA in the supernatant was precipitated with ethanol. The DNA pellet was dissolved in 400 μl of sterile distilled water. DNA concentration and quality were measured with a NanoDrop 2000.

SNP selection and genotyping

We selected and genotyped 9 SNPs that had been associated with increased risk of breast cancer in GWAS. These were rs1219648 and rs2981582 in FGFR2; rs8051542, rs12443621 and rs3803662 in TNRC9; rs889312 in MAP3K1; rs3817198 in LSP1; rs13387042 in 2q35; and rs13281615 in 8q24.

Genotyping was performed using TaqMan® SNP genotyping assays. The PCR mixture was as follows: 12.5 μl of TaqMan® 2× Universal PCR Master Mix, 5–25 ng of DNA, 0.625 μl of predesigned TaqMan® SNP Genotyping Assay mix (40×) and water to bring the final reaction volume to 25 μl. The PCR thermal cycling was as follows: initial denaturing at 95 °C for 10 min; 40 cycles of 92 °C for 15 s and 60 °C for 1 min. Thermal cycling was performed using the Applied Biosystems 7500 Real-Time PCR System. No-template reactions were included as negative controls. Genotype call success rate for cases and for controls was 97.6 and 97.8 %, respectively. Genotype reproducibility was verified by DNA sequencing and repeat PCR of randomly selected samples, with a concordance rate greater than 99 %.
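As a quick check of the reaction set-up, the stated volumes can be run through the standard dilution rule (final concentration = stock concentration × volume added / total volume). The sketch below uses only the volumes and stock strengths given in the text; the variable and function names are illustrative, and the DNA volume is left unspecified because it depends on each sample's concentration.

```python
# Components as stated in the text: (volume in µl, stock strength in "x").
FINAL_VOLUME_UL = 25.0
components = {
    "TaqMan 2x Universal PCR Master Mix": (12.5, 2.0),
    "TaqMan SNP Genotyping Assay mix (40x)": (0.625, 40.0),
}


def final_concentration(volume_ul, stock_x, total_ul=FINAL_VOLUME_UL):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock_x * volume_ul / total_ul


final_x = {name: final_concentration(v, c) for name, (v, c) in components.items()}

# Volume left over for the DNA template (5-25 ng) plus water.
remaining_ul = FINAL_VOLUME_UL - sum(v for v, _ in components.values())

for name, conc in final_x.items():
    print(f"{name}: {conc:g}x final")
print(f"DNA + water: {remaining_ul:g} µl")
```

Both reagents come out at 1× in the 25 μl reaction, leaving 11.875 μl for template and water.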

Statistical analyses

The genotype and allele frequencies of the 9 SNPs were tested for Hardy–Weinberg equilibrium in both the patient and control groups using the Chi-square test. Under the general genotype model, the association of each genotype with breast cancer susceptibility and with tumor characteristics was estimated by the crude odds ratio (OR) and its 95 % confidence interval (95 % CI) using unconditional logistic regression, with the low-risk genotype as the reference category [21, 22]. A P value of less than 0.05 was required for statistical significance.
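The two calculations described above can be sketched as follows. This is a minimal illustration, not the Epi-Info procedure used in the study: the Hardy–Weinberg check is a 1-degree-of-freedom Chi-square goodness-of-fit test, and the crude OR is computed from a 2×2 table with a Wald interval on the log scale, which coincides with the estimate from an unconditional logistic regression containing a single binary predictor. Function names and all genotype counts are hypothetical, and the code assumes every cell count is non-zero.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio_ci(case_variant, case_ref, ctrl_variant, ctrl_ref, z=1.959964):
    """Crude OR versus the reference genotype, with a Wald 95 % CI."""
    odds_ratio = (case_variant * ctrl_ref) / (case_ref * ctrl_variant)
    se_log_or = math.sqrt(
        1 / case_variant + 1 / case_ref + 1 / ctrl_variant + 1 / ctrl_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical control genotype counts (AA, AB, BB), for illustration only.
chi2, p_hwe = hwe_chi_square(25, 50, 25)
print(f"HWE: chi2 = {chi2:.3f}, P = {p_hwe:.3f}")

# Hypothetical counts: homozygous variant vs reference genotype.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under the general genotype model, `odds_ratio_ci` would be called once for heterozygotes and once for homozygous variants, each against the same reference genotype.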

Clinical pathological parameters were dichotomised as follows: nodal status (≥1 vs. no positive lymph node), SBR (Scarff, Bloom and Richardson) tumor grade (1–2 vs. 3), clinical tumor size (T1–T2 vs. T3–T4) and estrogen/progesterone receptor status (positive vs. negative).

The statistical analysis was performed using the Epi-Info statistical program (version 5.01; Centers for Disease Control, Epidemiology Program Office, Atlanta, Georgia, USA). Breast cancer-specific overall survival (OVS) was defined as the time from the date of diagnosis to death if the patient died from breast cancer, or to last contact otherwise. Six-year survival rates were estimated, and survival curves were plotted according to Kaplan and Meier [23]. Differences between groups were assessed by the log-rank test [24]. Univariate analyses for each SNP were carried out by estimating Kaplan–Meier survival curves stratified by genotype using SEM-STATISTIQUES software (Centre Jean Perrin, Clermont-Ferrand, France).
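The survival analysis described above can be sketched in plain Python. This is an illustration of the product-limit (Kaplan–Meier) estimator and the two-group log-rank test, not a reproduction of the SEM-STATISTIQUES output. Function names and the follow-up data are hypothetical; events are coded 1 for death from breast cancer and 0 for censoring at last contact, and subjects censored at a given time are counted as still at risk at that time.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of the deaths in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical follow-up times in months, for illustration only.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)

chi2_lr, p_lr = log_rank([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"log-rank chi2 = {chi2_lr:.2f}, P = {p_lr:.3f}")
```

A six-year survival rate corresponds to the last value of the curve at or before 72 months. For a per-SNP analysis, the two groups would be carriers and non-carriers of the allele of interest.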

Results

GWAS-identified loci as risk factors for breast cancer in Tunisians

Minor allele frequencies and estimates for the association between the nine SNPs and overall breast cancer risk are shown in Table 2. Genotype frequencies in cases and controls appear in Supplementary Table 1. Genotype distribution of all SNPs in both patients and controls did not deviate from Hardy–Weinberg equilibrium (P > 0.05) (Supplementary Table 1).

Table 2 Association of nine SNPs identified from previous GWAS with breast cancer risk in Tunisian women

In this study, 5 of the 9 breast cancer-associated SNPs discovered in GWAS were replicated in the Tunisian population. Both polymorphisms of FGFR2 (rs1219648, rs2981582), TNRC9 rs8051542, MAP3K1 rs889312, and the SNP located on 8q24 (rs13281615) were significantly associated with breast cancer risk at P < 0.05 (Table 2). A suggestive association was observed between rs3817198 in the LSP1 gene and breast cancer risk, with an increased risk for the GG genotype and the G allele (OR = 1.45, P = 0.1; OR = 1.19, P = 0.08, respectively). However, no significant association with breast cancer risk was observed for the two other SNPs in TNRC9 or for the SNP located on 2q35 (P = 0.49, 0.3 and 0.16 for rs12443621, rs3803662, and rs13387042, respectively).

Most significant associations were of high magnitude. The strongest associations were found for rs2981582 in the FGFR2 gene and rs8051542 in the TNRC9 gene. Homozygous variant genotypes of rs2981582 and rs8051542 were associated with over a two-fold increased risk of breast cancer (OR = 2.23, P = 0.00001; OR = 2.11, P = 0.0001, respectively).

Effects of GWAS risk loci on disease characteristics of breast cancer

In this study, we also analyzed the effects of the 9 GWAS-identified SNPs on a series of disease clinico-pathological characteristics, including clinical tumor size, SBR tumor grade, lymph node metastasis, distant metastasis, and estrogen/progesterone receptor status.

Both polymorphisms of the FGFR2 gene and rs8051542 in the TNRC9 gene were found to be associated with both small and large clinical tumor size. However, slightly stronger associations were found for FGFR2 rs2981582 and TNRC9 rs8051542 with T1–T2 tumor size (P = 0.0001, P = 0.0002, respectively) (Supplementary Table 2). The variant allele of rs889312 in the MAP3K1 gene seems to be associated with increased risk of both small and large tumors (OR = 1.70, P = 0.03; OR = 1.89, P = 0.04) (Supplementary Table 2). In contrast, the homozygous variant genotype of rs13281615 on 8q24 was associated with increased risk of small tumors only (OR = 1.63, P = 0.01).

For homozygous variant genotypes of both polymorphisms in the FGFR2 gene, the associations were also stronger with low-grade than with high-grade SBR (OR = 2.01, P = 0.0008 for rs1219648; OR = 2.53, P = 0.00001 for rs2981582) (Supplementary Table 3). However, a stronger association with high-grade SBR was found for rs8051542 in the TNRC9 gene (OR = 2.54, P = 0.0002) (Supplementary Table 3).

Associations of both polymorphisms in the FGFR2 gene (rs1219648 and rs2981582) with breast cancer risk were stronger for patients with negative than with positive nodal involvement. The strongest association was found with rs2981582: the homozygous variant genotype was associated with an over threefold increased risk of lymph node negative breast cancer (OR = 3.33, P = 0.0000006) (Table 3). For both polymorphisms, increased breast cancer risk with negative nodal involvement was associated with the minor allele in a dose-dependent manner. Moreover, among cases, the rs2981582 AA genotype was more frequent in lymph node negative compared to lymph node positive breast cancers (28.5 vs. 21.9 %; OR = 0.56, P = 0.018) (Table 3). The association of rs8051542 in the TNRC9 gene with breast cancer risk tended to be slightly stronger for patients with positive nodal involvement (OR = 2.15, P = 0.0004) (Table 3). However, rs889312 in the MAP3K1 gene was equally associated with lymph node negative and positive breast cancer (Table 3).

Table 3 Associations between GWAS breast cancer loci and lymph node involvement

Stratification by ER and PR tumor status also revealed several associations. Stratification of tumors by ER status indicated that the rs2981582 FGFR2 polymorphism increased the risk of both ER+ and ER− tumors. A slightly stronger association was observed with ER+ tumors (P = 0.001, P = 0.01 for ER+ and ER−, respectively) (Supplementary Table 4). Moreover, the minor allele was associated with increased risk of ER+ tumors in a dose-dependent manner (OR = 1.57, P = 0.02; OR = 2.15, P = 0.001, for heterozygous and homozygous variant genotypes, respectively) (Supplementary Table 4). On the other hand, the homozygous variant genotype of rs8051542 in the TNRC9 gene was associated with an increased risk of both ER+ and ER− tumors to the same extent (OR = 2.38, P = 0.0005; OR = 2.35, P = 0.0007 for ER+ and ER−, respectively) (Supplementary Table 4). FGFR2 rs2981582 and TNRC9 rs8051542 polymorphisms were also associated with both PR+ and PR− tumors, while rs889312 in the MAP3K1 gene was only associated with PR+ tumors (OR = 2.07, P = 0.01) (Supplementary Table 5). The rs13281615 SNP on 8q24 was found to be associated with ER+ and PR− tumors (Supplementary Tables 4 and 5).

Regarding disease progression, FGFR2 rs2981582 and rs1219648, TNRC9 rs8051542, and MAP3K1 rs889312 polymorphisms were associated with increased risk of developing distant metastasis (Table 4). The strongest association was found with the minor allele of FGFR2 rs2981582, in a dose-dependent manner (OR = 2.30, P = 0.004; OR = 3.57, P = 0.00006) (Table 4). For rs13387042 on 2q35, the GG genotype was more frequent in patients developing distant metastasis compared to patients without distant metastasis (25.2 vs. 16 %) (OR = 1.94, P = 0.02).

Table 4 Associations between GWAS breast cancer loci and distant metastasis

Effects of GWAS risk loci on survival from breast cancer

In this study, we also assessed the effect of GWAS risk loci on the OVS of patients. A significant difference was observed between the OVS Kaplan–Meier survival curves for rs1219648 in the FGFR2 gene. As shown in Fig. 1a, the breast cancer-specific OVS rate was significantly higher among patients carrying the G variant allele. The OVS rates in the groups of patients with and without the rs1219648 G allele were 98.1 and 92 %, respectively (log-rank test, P = 0.013). However, no significant difference between the OVS Kaplan–Meier survival curves was observed for rs2981582. In addition, a significant difference between OVS Kaplan–Meier curves was observed for rs13387042 on 2q35. The OVS rate was significantly lower in the group of patients without the rs13387042 A allele compared to patients with the rs13387042 A allele (89.9 vs. 97.7 %, respectively; log-rank test, P = 0.005) (Fig. 1b).

Fig. 1 The 6-year breast cancer-specific overall survival of 640 breast cancer patients stratified by genotype. Overall survival of 640 patients according to the presence or absence of (a) the rs1219648 G allele and (b) the rs13387042 A allele (P denotes the log-rank test P value)

Discussion

GWAS have led to the identification of multiple new genetic variants associated with breast cancer risk. Most of these breast cancer GWAS and replication studies have been conducted in European populations [8–10, 12, 14–16, 18] and to a lesser extent in Asians [13, 17, 19]. However, there are significant differences in allele frequencies and the prevalence of breast cancer among different populations. It is, therefore, important to explore the effects of the GWAS-identified markers in other ethnic populations, including women of Arab ancestry. Thus, we carried out this study to estimate the allele frequencies of 9 GWAS-identified loci in the Tunisian population and to investigate, with a case–control study, the potential association of these loci with the risk of breast cancer among Tunisian women.

The 10q26 (FGFR2) locus was discovered in two GWAS among women of European descent [8, 9], and the index SNPs rs1219648 and rs2981582 have since been consistently replicated in European [12, 25] and Chinese populations [26] as well as in several ethnic groups including Hispanic and non-Hispanic white women [27] and African American women [28, 29]. In this study, it was also confirmed that FGFR2 rs1219648 and rs2981582 were significantly associated with increased breast cancer risk in the Tunisian population, which strengthens the conclusion that this locus plays an important role in the development of breast cancer. FGFR2 is a member of the receptor tyrosine kinase family, involved in mammary gland proliferation and development [30, 31]. It has been shown that FGFR2 can transform normal human mammary epithelial cells and is over-expressed in breast tumors [32]. The two polymorphisms in the FGFR2 gene were originally identified by Hunter et al. [9] and were associated with risk of sporadic post-menopausal breast cancer. Slattery et al. [27] reported similar findings for post-menopausal Hispanic women. In this study, we did not evaluate the risk of breast cancer according to menopausal status. However, although 53.1 % of our cases were premenopausal, we found strong associations between both FGFR2 polymorphisms and breast cancer risk. This may suggest that in the Tunisian population, SNPs rs1219648 and rs2981582 confer similar effects in both pre- and post-menopausal women.

Of the 3 SNPs evaluated in the TNRC9 locus, only rs8051542 was replicated as a breast cancer risk variant among Tunisian women. No associations were found with rs3803662 and rs12443621. The SNP rs3803662 was identified as a breast cancer susceptibility variant in two GWAS, both conducted in European populations [8, 10]. This SNP remained the strongest signal for the 16q12 region in further studies of European ancestry and has also been confirmed by a deep sequencing study as a key TNRC9 SNP associated with breast cancer [16, 33]. Zheng et al. [34] found that rs3803662 was associated with breast cancer risk in a Southern Chinese population. However, the association with TNRC9 rs3803662 was not confirmed in other ethnic groups including Hispanic [27] and African American women [28, 29, 33, 35]. In this study, we also showed the lack of association of rs12443621 with risk of breast cancer among Tunisian women. SNP rs12443621 was reported to increase breast cancer risk by Easton et al. [8] and was found to be in strong linkage disequilibrium (LD) with SNP rs3803662 of the TNRC9 gene. In addition, at the 16q12 locus, the LD pattern between rs3803662 and rs3104793 was also different across populations [36]. Taken together, differences in LD pattern across populations may explain the discrepancy between these studies. Thus, a fine-mapping study might be an effective approach to identify the causal variant(s) in the 16q12 locus in Arab women.

In this study, we also found the SNP rs889312 in the MAP3K1 gene and rs13281615 on 8q24 to be associated with an increased risk of breast cancer in the Tunisian population. The rs889312 SNP in the MAP3K1 gene was identified by Easton et al. and has been shown to be involved in a potential key pathway for breast cancer [8, 9, 14]. The rs13281615 variant lies in a non-genic region of chromosome 8q24. Other independent variants in the 8q24 region have been associated with the risk of prostate, colorectal and ovarian cancer [37–39].

Associations with most of the susceptibility loci identified to date are evidently stronger for ER+ than for ER− disease. The strongest evidence is for a variant in FGFR2 that was primarily associated with ER+ disease [12]. Garcia-Closas et al. [40] also confirmed a stronger association between FGFR2 rs2981582 and ER+ tumors. Similarly, FGFR2 rs1219648 and rs2981582 genotypes were significantly associated with breast cancer in European-American women, but only for ER+ and PR+ tumors [25]. Findings of Slattery et al. [27] suggest that FGFR2 polymorphisms decrease the likelihood of ER−/PR− tumors among non-Hispanic white women, while increasing the likelihood of ER+/PR+ tumors among Hispanic women. Our data showed that FGFR2 rs2981582 and TNRC9 rs8051542 were strongly associated with both receptor-positive and receptor-negative tumors for ER and PR.

MAP3K1 variants were found to be relevant in ER+ and PR+ tumors to a greater degree than in ER− or PR− tumors [40]. Moreover, the MAP3K1 rs889312 variant genotype was associated with larger tumors in Asians but not in European populations, and was less likely to be associated with lymph node positive disease at breast cancer diagnosis in a Dutch population [40, 41]. Rebbeck et al. [25] showed that the same SNP was associated with breast cancer in African-American women, but again only for ER+, PR+ tumors. Our data showed that the homozygous variant genotype of MAP3K1 rs889312 was associated with PR+ but not with PR− tumors and, while not reaching significance, was more likely to be associated with ER− than with ER+ tumors. Moreover, the rs889312 homozygous variant genotype seems to be associated with an increased risk of both small and large tumors and was equally associated with negative and positive nodal involvement. Taken together, associations between MAP3K1 rs889312 and breast cancer characteristics need to be further explored in other ethnic groups.

Regarding disease progression, we found that rs13387042 homozygous variant genotype was associated with distant metastasis. This finding suggested that rs13387042 variant allele might affect the progression of breast cancer. The rs13387042 variant lies in a non-genic region of chromosome 2q35. Thus, functional studies in this region are likely to lead to a better understanding of mechanisms of carcinogenesis and progression of breast cancer.

Two of the nine SNPs included in this study were significantly associated with OVS. A higher rate of survival was observed in patients carrying the variant allele of rs1219648 in the FGFR2 gene. Conversely, a lower OVS rate was found in patients carrying the homozygous variant genotype of rs13387042 on 2q35. In the Tunisian population, the homozygous variant genotype of rs1219648 was associated with ER+ but not with ER− tumors. The good prognosis known for ER+ tumors could explain the association found between the rs1219648 variant of the FGFR2 gene and the high rate of survival.

The present association study in the Tunisian population highlighted genetic susceptibility patterns different from those reported for other populations. These differences could stem from disparities in genetic background, including differences in allele frequencies and LD pattern, and from gene–environment interactions. Moreover, clinical and biological differences in breast cancer have been found in Arab women compared to Europeans [42]. Early disease onset and aggressive forms of breast cancer are seen more frequently in Arab populations [42]. Breast tumors in Arab populations are frequently characterized by large tumor size, high histoprognostic SBR grade and molecular luminal subtype B [42]. In addition, apart from differences in biological characteristics and genetic background, Arab populations differ greatly from European and Asian populations in lifestyle, reproductive behavior, family history status, and environmental exposure, suggesting that risk factors associated with breast cancer development and progression might differ between these populations. The high proportion of young-onset breast cancer with poor prognosis in women of Arab ancestry is probably due to a correspondingly high prevalence of pertinent genetic risk factors that may be uniquely associated with these populations. Thus, new GWAS in women of Arab ancestry may reveal new causal variants and are needed to fully uncover the genetic basis of breast cancer susceptibility in Arab populations.

In conclusion, the present association study in the Tunisian population revealed several implications of the 9 SNPs that had been associated with increased risk of breast cancer in GWAS. It reinforces the need to replicate the GWAS discovered variants across different populations and ethnicities.