Biological and Statistical Approaches for Modeling Exposure to Specific Trihalomethanes and Bladder Cancer Risk
Abstract
Lifetime exposure to trihalomethanes (THM) has been associated with increased risk of bladder cancer. We explored methods of analyzing bladder cancer risk associated with 4 THM (chloroform, bromodichloromethane, dibromochloromethane, and bromoform) as surrogates for disinfection by-product (DBP) mixtures in a case-control study in Spain (1998–2001). Lifetime average concentrations of THM in the households of 686 incident bladder cancer cases and 750 matched hospital-based controls were calculated. Several exposure metrics were modeled through conditional logistic regression, including the following analyses: total THM (μg/L), cytotoxicity-weighted sum of total THM (pmol/L), 4 THM in separate models, 4 THM in 1 model, chloroform and the sum of brominated THM in 1 model, and a principal-components analysis. THM composition, concentrations, and correlations varied between areas. The model for total THM was stable and showed increasing dose-response trends. Models for separate THM provided unstable estimates and inconsistent dose-response relationships. Risk estimation for specific THM is hampered by the varying composition of the mixture, correlation between species, and imprecision of historical estimates. Total THM (μg/L) provided a proxy measure of DBPs that yielded the strongest dose-response relationship with bladder cancer risk. A variety of metrics and statistical approaches should be used to evaluate this association in other settings.
Drinking-water disinfection systems use reactive chemicals, such as chlorine, to inactivate microbiological threats to human health. However, disinfectants react with organic matter to produce a variety of undesired chemicals, known as disinfection by-products (DBPs) (1). Approximately 600–700 DBPs have been identified, and more compounds are being identified (2). The most widely used disinfectant is chlorine. Trihalomethanes (THM) constitute the most prevalent class of DBPs, representing 10%–20% of the mixture in chlorinated water (3). Total trihalomethanes (TTHM) are defined as the sum (in µg/L) of 4 constituents: chloroform, bromodichloromethane (BDCM), dibromochloromethane (DBCM), and bromoform.
TTHM have been used as surrogates for the halogenated DBP mixture in epidemiologic studies. Several cohort and case-control studies have shown that long-term exposure (over 40 years) to the sum of the 4 THM increases cancer risks, especially risk of bladder neoplasms (4–10). Experimental in vivo and in vitro studies have tested the cytotoxicity, mutagenicity, and carcinogenicity of individual compounds (11–14). Enzyme-dependent mutagenicity has been found in vitro for brominated THM, such as bromoform and DBCM (15, 16). Liver and kidney carcinogenicity has occurred in rodents exposed to BDCM and other DBPs via drinking water (17). The International Agency for Research on Cancer has classified chloroform and BDCM as possible human carcinogens (18–22). Other DBPs have also been identified as carcinogens and mutagens—for example, halogenated hydroxyfuranones (e.g., 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone, also called mutagen X or MX), nitrogenated DBPs, and iodinated DBPs (23). Long-term exposure assessment of these substances is difficult because they are unregulated, they appear in very low concentrations, routine measurements are unavailable, and estimates of historical exposures cannot be established (24).
As a result, estimation of the risks associated with DBP exposure is a challenge for environmental epidemiologists. Much of the epidemiologic evidence relies on exposure estimates based on TTHM or on a single compound, such as chloroform, to represent the mixture. The first approach assumes that all the constituents of the mixture are equivalent, ignoring the experimental evidence of their differential toxicity. On the other hand, component-based analyses assume that the mixture's constituents are different and may be analyzed individually (25). However, this approach ignores correlations between THM and how they are related to unknown constituents of the DBP mixture. Adjustment for multiple components that are weakly correlated could address this problem.
We explored different biology-based approaches and used statistical modeling to evaluate the bladder cancer risk associated with 4 THM (chloroform, BDCM, DBCM, and bromoform) in a case-control study.
MATERIALS AND METHODS
Study design and participants
We used data from the Spanish Bladder Cancer Study, a multicenter, hospital-based case-control study conducted in Spain between June 1998 and June 2001. Cases and controls were recruited from 6 geographical areas: Alicante, Asturias, Barcelona, Tenerife, Manresa, and Sabadell (Vallès/Bages). Cases were patients aged 20–80 years with a histologically confirmed diagnosis of primary bladder cancer who were living in one of the geographic catchment areas of the participating hospitals. Cases were identified through the hospital urological services at diagnosis. Complete case ascertainment was guaranteed through regular evaluations of local cancer registries and hospital discharge and pathology records. Controls were patients admitted to the participating hospitals with diagnoses unrelated to the main risk factors for bladder cancer, such as tobacco use. Control diagnoses included: circulatory, dermatological, and ophthalmological disorders (4%, 2%, and 1%, respectively), fractures (23%), hernias (37%), hydrocele (12%), other abdominal surgery (11%), other orthopedic problems (7%), and other diseases (3%). Controls were individually matched to cases by sex, age group (5-year strata), and geographical area of residence.
The study protocol was approved by the institutional review boards of the participating institutions. All participants gave informed consent beforehand. A total of 1,457 eligible cases and 1,465 eligible controls were identified. Among them, 84% of cases (n = 1,219) and 87% of controls (n = 1,271) participated.
Personal interview
Trained interviewers administered a comprehensive computer-assisted personal questionnaire to participants during their hospitalization. Collected information included sociodemographic characteristics, smoking habits, family history of cancer, and medical, occupational, and residential histories from birth (for all residences of at least 1 year). Residential histories provided information on water exposures relevant to the present analysis. In addition, a food frequency questionnaire was self-administered. When a subject refused to answer the questionnaire, a reduced interview on critical items was administered (21% of cases, 19% of controls).
Water utility data
Current and historical information about water source, treatment, and quality was obtained from public water supplies in the study areas. Structured questionnaires were sent to approximately 200 local authorities and 150 water companies to ascertain: 1) the proportions of groundwater and surface water sources over the years back to 1920; 2) the year in which chlorination started at each utility; and 3) annual average concentration (in µg/L) of chloroform, BDCM, DBCM, and bromoform, when available. The amount of data collected differed among areas. Data on water-source history and year in which chlorination started were available for 123 municipalities, accounting for 78.5% of person-years from lifetime residential histories. In addition, 48 of these municipalities also had data on chloroform, BDCM, DBCM, and bromoform levels (58% of person-years). To augment the database of water utility measurements, we measured chloroform, BDCM, DBCM, and bromoform levels in 113 tap water samples from the study's geographical areas between September and December 1999.
Estimation of historical levels
We used data on water-source history (proportions of groundwater and surface water over the years), the year in which chlorination was initiated, and available chloroform, BDCM, DBCM, and bromoform levels to estimate annual average levels in the past. We assumed that levels remained unchanged by municipality when water source had not changed. Available measurements were averaged and imputed back to the year 1920, as long as water source remained unchanged. If the water source had changed, the proportion of surface water was used as a weight (26). THM levels before chlorination started were assumed to be zero. The year in which chlorination started varied widely among study municipalities, from 1933 in Barcelona to the 1990s in many small municipalities in Asturias. For those municipalities with a water-source history but missing data on THM levels, levels were imputed from neighboring municipalities with the same water source. Estimation of past levels in Barcelona was conducted at the postal-code level, since the city is supplied by 2 rivers (Llobregat and Ter) with dissimilar raw water characteristics. Details on the exposure assessment are available elsewhere (26, 27).
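The back-extrapolation rules described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's procedure: the function name, the inputs, and in particular the exact form of the weighting (scaling the measured mean by the ratio of the historical to the reference surface-water proportion) are hypothetical, since the text says only that the proportion of surface water was used as a weight (26).

```python
def estimate_annual_levels(measured_mean, surface_fraction_by_year,
                           reference_surface_fraction, chlorination_start):
    """Return {year: estimated THM level in µg/L} for one municipality.

    measured_mean: average of the available measurements (µg/L).
    surface_fraction_by_year: {year: proportion of surface water, 0..1}.
    reference_surface_fraction: proportion of surface water in the years
        when the measurements were taken (hypothetical input).
    chlorination_start: first year with chlorination.
    """
    levels = {}
    for year, fraction in sorted(surface_fraction_by_year.items()):
        if year < chlorination_start:
            # THM levels before chlorination started are assumed to be zero.
            levels[year] = 0.0
        elif fraction == reference_surface_fraction:
            # Unchanged water source: carry the measured average back in time.
            levels[year] = measured_mean
        else:
            if reference_surface_fraction == 0:
                raise ValueError("cannot rescale from a pure groundwater reference")
            # Changed source: weight by the surface-water proportion
            # (assumed proportional scaling, for illustration only).
            levels[year] = measured_mean * fraction / reference_surface_fraction
    return levels


# Hypothetical municipality: chlorination from 1940, half groundwater in 1950.
example_levels = estimate_annual_levels(
    measured_mean=20.0,
    surface_fraction_by_year={1930: 1.0, 1940: 1.0, 1950: 0.5, 1999: 1.0},
    reference_surface_fraction=1.0,
    chlorination_start=1940,
)
```

In this toy input, the pre-chlorination year receives zero, the years with an unchanged source receive the measured mean, and the year with a mixed source is scaled down.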
Lifetime individual exposure indices
For all residences where participants had lived for at least 1 year from birth to the time of interview, the following information was requested: year in which the participant started living in that location, year stopped, full street address, city, province, region, and country. The address was used to ascertain postal code in Barcelona. Individual and municipal databases were merged by year and municipality of residence to obtain annual average levels of chloroform, BDCM, DBCM, and bromoform for each study subject. Different exposure windows were explored, and the period from age 15 years to the time of interview was selected because it maximized the information available, since exposure data were scarce before that age. A time-weighted average level of exposure at all residences where the participant had lived during this exposure window was calculated for all subjects.
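A minimal sketch of the time-weighted average follows, assuming one annual level per municipality and year and residences recorded as inclusive year ranges. The function and variable names are hypothetical. The function also returns the share of window years with an available estimate, because the analysis later restricts to subjects with sufficient exposure data.

```python
def time_weighted_average(residences, annual_levels, birth_year,
                          interview_year, start_age=15):
    """Average annual THM level over the exposure window.

    residences: list of (municipality, first_year, last_year), years inclusive.
    annual_levels: {(municipality, year): level in µg/L}.
    Returns (mean level or None, fraction of window years with data).
    """
    window = range(birth_year + start_age, interview_year + 1)
    values = []
    for year in window:
        for municipality, first_year, last_year in residences:
            if first_year <= year <= last_year:
                level = annual_levels.get((municipality, year))
                if level is not None:
                    values.append(level)
                break  # if two residences overlap in a year, use the first listed
    coverage = len(values) / len(window)
    mean_level = sum(values) / len(values) if values else None
    return mean_level, coverage


# Hypothetical subject: born 1950, interviewed 1974, moved from A to B in 1970.
levels = {("A", y): 10.0 for y in range(1965, 1970)}
levels.update({("B", y): 30.0 for y in range(1970, 1974)})  # 1974 is missing
mean_level, coverage = time_weighted_average(
    [("A", 1950, 1969), ("B", 1970, 1974)], levels,
    birth_year=1950, interview_year=1974,
)
```

Averaging one value per year weights each residence by the number of years lived there, which is what makes the average time-weighted.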
Statistical methods
The normality of the interindividual THM levels was examined, and Spearman rank correlation coefficients were calculated overall and by area. In alternative analyses, residuals from a linear regression of the THM components using area of residence as an independent variable were calculated. Spearman correlations between these residuals (partial Spearman correlations) were calculated in order to obtain overall correlations adjusted for area.
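The area adjustment can be illustrated as follows. Because area is a categorical predictor, the residual from a linear regression on area alone equals each value minus its area mean. The sketch below follows that reading and uses only the standard library. All names are hypothetical and the data are invented to show how pooling areas can reverse the sign of a within-area correlation.

```python
from collections import defaultdict


def area_residuals(values, areas):
    """Residuals from regressing values on area indicators (value - area mean)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, area in zip(values, areas):
        totals[area] += value
        counts[area] += 1
    return [v - totals[a] / counts[a] for v, a in zip(values, areas)]


def average_ranks(xs):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def area_adjusted_spearman(x, y, areas):
    return spearman(area_residuals(x, areas), area_residuals(y, areas))


# Invented data: two areas with very different levels.
areas = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
y = [3.0, 2.0, 1.0, 13.0, 12.0, 11.0]
crude = spearman(x, y)
adjusted = area_adjusted_spearman(x, y, areas)
```

Here the crude coefficient is positive because both compounds are higher in area B, while the area-adjusted coefficient reflects the inverse relation within each area.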
Fixed-effects conditional logistic regression (CLR) was used to estimate bladder cancer risk. CLR is the standard technique used to analyze binary matched data, since the resulting coefficients are derived from within-matching-strata comparisons (28). In order to explore nonlinearity of the effects of exposure, we fitted new CLR models in which, instead of using a linear term for exposure, we used spline functions. Splines use piecewise polynomials to model the shape of the association, and their high flexibility allows capturing almost any kind of shape. We used cubic splines with knots at the 10th, 50th, and 90th percentiles, according to Harrell's recommendations in 2012 (29). Exposure coefficients from spline models do not have a direct interpretation. Instead, plots of the resulting associations were used to interpret the results. The models using a linear term and the model using splines were compared via likelihood ratio tests. If the spline model did not provide a statistically significantly better fit, this was taken as support for linearity of the effect. Model fit was examined using Akaike's Information Criterion. The lowest Akaike Information Criterion value determined the best-fitting model.
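As an illustration of the spline comparison, the sketch below builds the single nonlinear term of a 3-knot restricted cubic spline in Harrell's parameterization and compares a linear and a spline model by likelihood ratio test and Akaike's Information Criterion. It assumes that the spline in use is the restricted (natural) form, so the spline model has exactly one more parameter than the linear model. The knot values and log-likelihoods are hypothetical placeholders, not study results.

```python
import math


def rcs_nonlinear_term(x, knots):
    """Nonlinear basis term of a restricted cubic spline with 3 knots."""
    t1, t2, t3 = knots

    def cube_plus(u):
        return max(u, 0.0) ** 3

    term = (cube_plus(x - t1)
            - cube_plus(x - t2) * (t3 - t1) / (t3 - t2)
            + cube_plus(x - t3) * (t2 - t1) / (t3 - t2))
    return term / (t3 - t1) ** 2  # conventional scaling


def lr_test_p_value(loglik_linear, loglik_spline):
    """Likelihood ratio test with 1 degree of freedom (one extra spline term)."""
    statistic = max(2.0 * (loglik_spline - loglik_linear), 0.0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(statistic / 2.0))


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


KNOTS = (4.0, 27.2, 60.0)  # hypothetical 10th, 50th, 90th percentiles (µg/L)
design_row = (30.0, rcs_nonlinear_term(30.0, KNOTS))  # linear + nonlinear term
p_value = lr_test_p_value(loglik_linear=-850.0, loglik_spline=-849.2)
```

The basis term is zero below the first knot and linear above the last knot, which is the property that keeps the fitted curve from behaving erratically in the tails. A large p value from the test is what the text treats as support for linearity.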
Different models were fitted using the following exposure variables: 1) TTHM in μg/L, as the sum of the 4 constituents; 2) weighted sum of the 4 THM, obtained by multiplying the concentrations with a weight derived from a mammalian cell cytotoxicity assay (0.4116 × chloroform + 0.3443 × BDCM + 0.7388 × DBCM + bromoform) (23); 3) TTHM concentration on a molar basis as the sum of the 4 constituents in pmol/L; 4) the 4 THM constituents in separate models with single compounds; 5) the 4 THM constituents in 1 model; 6) total brominated THM (BDCM, DBCM, bromoform) and chloroform in 1 model; and 7) principal-components logistic regression. To observe the exposure response, we grouped exposure variables using quartiles as boundaries in the CLR. In models with splines, we kept exposure variables continuous (30).
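The first exposure metrics can be computed directly from the 4 concentrations. The sketch below uses the cytotoxicity weights quoted above and standard molar masses. Dividing µg/L by a molar mass in g/mol yields µmol/L, and any further unit conversion is left out. The function names, dictionary keys, and example concentrations are hypothetical. The quartile cutpoints in the example are the TTHM boundaries reported in Table 4.

```python
CYTOTOX_WEIGHT = {"chloroform": 0.4116, "bdcm": 0.3443,
                  "dbcm": 0.7388, "bromoform": 1.0}
MOLAR_MASS = {"chloroform": 119.38, "bdcm": 163.83,
              "dbcm": 208.28, "bromoform": 252.73}  # g/mol


def exposure_metrics(conc):
    """conc: {compound: concentration in µg/L} for the 4 THM."""
    return {
        "tthm": sum(conc.values()),
        "weighted": sum(CYTOTOX_WEIGHT[k] * v for k, v in conc.items()),
        "molar": sum(v / MOLAR_MASS[k] for k, v in conc.items()),  # µmol/L
        "brominated": conc["bdcm"] + conc["dbcm"] + conc["bromoform"],
    }


def quartile_group(value, cutpoints):
    """Return 1..4; a value equal to a cutpoint falls in the upper group."""
    return 1 + sum(value >= c for c in cutpoints)


# Hypothetical subject.
metrics = exposure_metrics(
    {"chloroform": 10.0, "bdcm": 5.0, "dbcm": 2.0, "bromoform": 1.0}
)
group = quartile_group(metrics["tthm"], (9.4, 27.4, 49.8))
```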
The principal-components logistic regression was preceded by a principal-components analysis (PCA) of the 4 THM. Components explaining more than 10% of the variance were retained. Component scores were computed as follows: 1) average residential levels of the 4 THM were mean-centered and standardized (converted to z scores); 2) the 4 z scores were weighted by the score coefficients (correlation coefficients of the corresponding eigenvector); and 3) the weighted scores were summed. The procedure was repeated for each retained component using its corresponding eigenvector. The resulting scores were entered as independent variables in the CLR models as quartiles and in the spline models as continuous variables.
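The three scoring steps can be sketched as below for a single component. The eigenvector coefficients would come from the PCA itself. Here they are hypothetical equal weights, and the concentrations are invented so that the arithmetic is easy to follow.

```python
def z_scores(values):
    """Mean-center and standardize using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def component_scores(columns, coefficients):
    """columns: {compound: [levels per subject]}; coefficients: {compound: weight}."""
    z = {name: z_scores(vals) for name, vals in columns.items()}  # step 1
    n_subjects = len(next(iter(columns.values())))
    return [
        sum(coefficients[name] * z[name][i] for name in columns)  # steps 2 and 3
        for i in range(n_subjects)
    ]


# Three hypothetical subjects and hypothetical score coefficients.
thm_levels = {
    "chloroform": [10.0, 20.0, 30.0],
    "bdcm": [1.0, 2.0, 3.0],
    "dbcm": [2.0, 4.0, 6.0],
    "bromoform": [0.5, 1.0, 1.5],
}
weights = {"chloroform": 0.5, "bdcm": 0.5, "dbcm": 0.5, "bromoform": 0.5}
scores = component_scores(thm_levels, weights)
```

Because each z score has mean zero, the component scores are also centered on zero, which is why the quartile boundaries of the PCA scores in Table 4 straddle zero.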
The CLR used the matching groups as fixed effects. All models adjusted for smoking status (never, former, or current cigarette smoker), employment in a high-risk occupation (occupations linked to the production of aromatic amines, rubber manufacture, exposure to dyes and printing in the textile industry, paint, aluminum, tanning and curing of hides, and the driving of motor vehicles (31)), and quartiles of fruit and vegetable consumption. Missing data in the categorical covariates were coded in a separate category and included in the analyses. We calculated 95% confidence intervals for the CLR estimations.
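For readers unfamiliar with the conditional likelihood, the sketch below computes the contribution of one matched stratum by brute-force enumeration and converts a coefficient and its standard error into an odds ratio with a 95% confidence interval. It is an illustration only: statistical packages use a recursive algorithm rather than enumeration, and the function names and inputs are hypothetical.

```python
import math
from itertools import combinations


def stratum_conditional_loglik(linear_predictors, is_case):
    """Conditional log-likelihood of one stratum.

    linear_predictors: x'beta for every subject in the stratum.
    is_case: 1 for cases, 0 for controls, in the same order.
    The denominator sums over every way of choosing as many subjects as
    there are cases, so stratum-specific intercepts cancel out.
    """
    n_cases = sum(is_case)
    observed = sum(lp for lp, case in zip(linear_predictors, is_case) if case)
    denominator = sum(
        math.exp(sum(subset))
        for subset in combinations(linear_predictors, n_cases)
    )
    return observed - math.log(denominator)


def odds_ratio_ci(beta, standard_error, z=1.96):
    """Odds ratio and Wald 95% confidence limits from a log-odds coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * standard_error),
            math.exp(beta + z * standard_error))


# With no exposure effect, every choice of 2 cases among 4 subjects is
# equally likely, so the stratum likelihood is 1 / C(4, 2).
null_loglik = stratum_conditional_loglik([0.0, 0.0, 0.0, 0.0], [1, 1, 0, 0])
```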
To determine the accuracy of risk estimates, we bootstrapped the confidence intervals of the models using 50 iterations (32). Unfitted sample matched sets were resampled, and confidence intervals of CLR models were adjusted using the bootstrap standard error correction (33). Variation above 10% in the standard error between the original CLR and the bootstrapped estimates was considered to represent instability. Statistical analyses were performed using Stata statistical software, release 12 (StataCorp LP, College Station, Texas) and the POSTRCSPLINE module developed by Maarten Buis (34).
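The resampling scheme and the 10% instability rule can be sketched as follows. Whole matched sets are drawn with replacement so that the matching structure is preserved. The estimator is passed in as a function because refitting the CLR is outside the scope of this sketch. The toy estimator, the data, and all names are hypothetical, and the specific standard error correction cited above (33) is not reproduced.

```python
import random
import statistics


def bootstrap_standard_error(matched_sets, estimator, n_iterations=50, seed=0):
    """Standard deviation of the estimator across resampled data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iterations):
        # Resample entire matched sets, never individual subjects.
        resample = [rng.choice(matched_sets) for _ in matched_sets]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


def is_unstable(model_se, bootstrap_se, threshold=0.10):
    """Flag a relative change in standard error above the threshold."""
    return abs(bootstrap_se - model_se) / model_se > threshold


# Toy illustration: each matched set is a list of hypothetical values and the
# estimator is their pooled mean (a stand-in for a refitted model coefficient).
toy_sets = [[0.2, 0.4], [0.1], [0.5, 0.3, 0.6], [0.0, 0.2], [0.4]]


def toy_estimator(sets):
    values = [v for s in sets for v in s]
    return sum(values) / len(values)


toy_se = bootstrap_standard_error(toy_sets, toy_estimator)
```

Under this rule, a 6% change in the standard error counts as stable and a 30% change as unstable.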
RESULTS
Among all cases and controls (n = 2,490), only persons with a reliable or high-quality interview as reported by the interviewer (n = 2,213; 88.9%) and those with more than 70% of modeled THM data in the exposure window (n = 1,448; 58.2%) were included in the analyses. Original individual matching was broken because of exclusions in the final data set, and subjects were grouped according to matching strata in 83 pooled $k_{1j}:k_{2j}$ groups. Ten groups (12 observations) were unmatched and were excluded, leading to 686 cases and 750 controls suitable for analysis. We compared data on case-control status, sex, age, high-risk occupation, and smoking status between the excluded and included groups. Statistically significant differences were found for age (the mean age of excluded subjects was 2.8 years higher than that of those included; P = 0.005) and smoking status (1.7% more current smokers were excluded from the analyses; P = 0.012). The median age at interview was 66 years, and 87.4% of participants were men (Table 1). Excess risks were found for former and current smokers. Subjects who reported higher fruit and vegetable intake were at lower risk of bladder cancer (Table 1).
Table 1.
Characteristic | Cases (n = 686), No. | Cases (n = 686), % | Controls (n = 750), No. | Controls (n = 750), % | Odds Ratioa | 95% Confidence Interval |
---|---|---|---|---|---|---|
Sex | ||||||
Male | 603 | 87.9 | 652 | 86.9 | ||
Female | 83 | 12.1 | 98 | 13.1 | ||
Mean age, years | 64.57 (10.2)b | 63.87 (10.0) | ||||
Geographical area | ||||||
Alicante | 58 | 8.5 | 66 | 8.8 | ||
Asturias | 295 | 43.0 | 321 | 42.8 | ||
Barcelona | 118 | 17.2 | 137 | 18.3 | ||
Manresa | 26 | 3.8 | 26 | 3.5 | ||
Sabadell | 62 | 9.0 | 55 | 7.3 | ||
Tenerife | 127 | 18.5 | 145 | 19.3 | ||
High-risk profession | ||||||
No | 558 | 81.3 | 639 | 85.2 | 1.00 | Reference |
Yes | 128 | 18.7 | 111 | 14.8 | 1.29 | 0.96, 1.74 |
Smoking status | ||||||
Never smoker | 128 | 18.7 | 272 | 36.3 | 1.00 | Reference |
Former smoker | 276 | 40.2 | 315 | 42.0 | 2.64 | 1.89, 3.69 |
Current smoker | 282 | 41.1 | 163 | 21.7 | 6.01 | 4.21, 8.58 |
Ptrend | <0.001 | |||||
Quartile of fruit and vegetable consumption, g/dayc | ||||||
0–421.8 | 166 | 29.5 | 137 | 24.9 | 1.00 | Reference |
>421.8–671.0 | 148 | 26.3 | 135 | 24.5 | 0.88 | 0.63, 1.23 |
>671.0–1,000.6 | 142 | 25.2 | 138 | 25.0 | 0.77 | 0.55, 1.08 |
>1,000.6 | 107 | 19.0 | 141 | 25.6 | 0.59 | 0.41, 0.83 |
Ptrend | 0.023 |
a Odds ratios from conditional logistic regression models stratified by matched set. Cases and controls were matched on sex, age (in 5-year groups), and geographical area.
b Numbers in parentheses, standard deviation.
c Numbers do not total 686 and 750 because of missing data (included in a separate category).
Concentrations of TTHM and specific THM in the residences of study subjects varied among study areas (Table 2). Median levels of chloroform were elevated in Manresa (37.1 µg/L; interquartile range (IQR), 16.8–44.6), Barcelona (20.6 µg/L; IQR, 18.4–25.1), and Asturias (16.1 µg/L; IQR, 9.4–22.3), while the median bromoform level was elevated in Alicante (17.8 µg/L; IQR, 10.0–20.9), Sabadell (9.3 µg/L; IQR, 7.2–10.6), and Tenerife (2.5 µg/L; IQR, 1.9–3.1).
Table 2.
Area | No. | % | TTHM, Median | TTHM, IQRa | Chloroform, Median | Chloroform, IQR | BDCM, Median | BDCM, IQR | DBCM, Median | DBCM, IQR | Bromoform, Median | Bromoform, IQR |
---|---|---|---|---|---|---|---|---|---|---|---|---|
All areas | 1,436 | 100.0 | 27.2 | 9.4, 49.8 | 15.0 | 3.4, 21.2 | 4.7 | 1.5, 16.8 | 0.9 | 0.5, 8.2 | 1.5 | 0.5, 3.9 |
Alicante | 124 | 8.6 | 72.1 | 64.8, 85 | 14.1 | 12.3, 16.1 | 23.2 | 20.2, 27.2 | 20.0 | 17.2, 23.3 | 17.8 | 10.0, 20.9 |
Asturias | 616 | 42.9 | 20.8 | 12.6, 28.3 | 16.1 | 9.4, 22.3 | 3.7 | 2.3, 4.9 | 0.6 | 0.4, 0.7 | 0.4 | 0.3, 0.8 |
Barcelona | 255 | 17.8 | 59.4 | 48.4, 71.8 | 20.6 | 18.4, 25.1 | 22.1 | 18.4, 25.1 | 9.6 | 7.6, 12.2 | 1.8 | 0.9, 11.4 |
Manresa | 52 | 3.6 | 49.3 | 35.1, 57.7 | 37.1 | 16.8, 44.6 | 8.9 | 6.3, 9.4 | 1.6 | 1.1, 1.7 | 1.9 | 1.6, 2.0 |
Sabadell | 117 | 8.2 | 44.2 | 38.8, 52.2 | 14.2 | 12.4, 18.6 | 11.2 | 9.8, 15.6 | 8.1 | 7.2, 9.6 | 9.3 | 7.2, 10.6 |
Tenerife | 272 | 18.9 | 4.0 | 3.1, 5.8 | 0.6 | 0.2, 1.0 | 0.5 | 0.4, 0.7 | 0.7 | 0.5, 1.0 | 2.5 | 1.9, 3.1 |
Abbreviations: BDCM, bromodichloromethane; DBCM, dibromochloromethane; IQR, interquartile range; TTHM, total trihalomethanes.
a 25th–75th percentiles.
Spearman rank correlation coefficients for correlations between individual THM and combinations thereof showed high variability, from negative correlations (−0.20) to high positive correlations (0.99) (Table 3). These correlations differed by area (see Web Table 1, available at http://aje.oxfordjournals.org/). For example, the TTHM-chloroform correlation ranged from 0.30 (P < 0.05) in Barcelona to 0.99 (P < 0.05) in Asturias. Partial Spearman correlations of residuals adjusted for area led to the following coefficients: 0.69 (chloroform-BDCM), 0.19 (chloroform-DBCM), −0.10 (chloroform-bromoform), 0.58 (BDCM-DBCM), 0.35 (BDCM-bromoform), and 0.70 (DBCM-bromoform).
Table 3.
Exposure Index | TTHM | TTHM (Weighted)a | Chloroform | BDCM | DBCM | Bromoform | Total Br-THMb | Cl-THM Scorec |
---|---|---|---|---|---|---|---|---|
TTHM (weighted) | 0.99* | 1 | ||||||
Chloroform | 0.75* | 0.70* | 1 | |||||
BDCM | 0.99* | 0.98* | 0.73* | 1 | ||||
DBCM | 0.76* | 0.80* | 0.33* | 0.77* | 1 | |||
Bromoform | 0.35* | 0.42* | −0.20* | 0.33* | 0.75* | 1 | ||
Total Br-THM | 0.94* | 0.96* | 0.60* | 0.93* | 0.90* | 0.58* | 1 | |
Cl-THM score | 0.50* | 0.44* | 0.90* | 0.49* | 0.03 | −0.50* | 0.33* | 1 |
Br-THM scored | 0.99* | 0.99* | 0.70* | 0.99* | 0.80* | 0.41* | 0.93* | 0.44e* |
Abbreviations: BDCM, bromodichloromethane; Br-THM, brominated trihalomethanes; Cl-THM, chlorinated trihalomethanes; DBCM, dibromochloromethane; TTHM, total trihalomethanes.
* P < 0.05.
a TTHM weighted by cytotoxicity: 0.4116 × chloroform + 0.3443 × BDCM + 0.7388 × DBCM + bromoform.
b BDCM + DBCM + bromoform.
c Cl-THM score from the principal-components analysis.
d Br-THM score from the principal-components analysis.
e Pearson correlation coefficient; shows independence between the Br-THM and Cl-THM scores from the principal-components analysis.
Results from the multivariate CLR models are shown in Table 4. The model using TTHM (in µg/L) as a surrogate measure of the mixture showed a monotonically increased risk of bladder cancer, with statistically significant associations in groups above the median (Table 4). Confidence intervals were highly stable after bootstrapping (standard error variation of 6%). The model using cytotoxicity-weighted TTHM (in µg/L) showed a similar pattern that was attenuated, but statistical significance at the P < 0.05 level was lacking (Table 4). The use of molar concentrations (pmol/L) produced odds ratios similar to those obtained using cytotoxicity weights (Table 4). The cytotoxicity-weighted model and TTHM in molar concentration were highly unstable after bootstrapping, with standard error variations between 20% and 31%. Models evaluating risks for specific compounds and the model including the 4 THM led to inconsistent and highly unstable dose-response relationships (results not shown), because of multicollinearity (variance inflation factors: chloroform, 2.18; BDCM, 13.37; DBCM, 14.80; and bromoform, 3.80). Grouping all brominated compounds and adjusting for chloroform in the same model solved multicollinearity (variance inflation factor = 1.49), and all point estimates showed a higher risk for chloroform (Table 4).
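The reported variance inflation factors (VIFs) can be read through the definition VIF = 1 / (1 - R²), where R² is the proportion of variance in one predictor explained by the other predictors in the model. The short sketch below inverts that definition for the values quoted above. The function names are hypothetical and the calculation is plain arithmetic, not a reanalysis of the data.

```python
def r_squared_from_vif(vif):
    """Share of a predictor's variance explained by the other predictors."""
    return 1.0 - 1.0 / vif


def vif_two_predictors(r):
    """VIF when a model contains only two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


reported_vif = {"chloroform": 2.18, "BDCM": 13.37, "DBCM": 14.80,
                "bromoform": 3.80, "chloroform + brominated THM": 1.49}
explained = {name: r_squared_from_vif(v) for name, v in reported_vif.items()}
```

By this arithmetic, VIFs of 13.37 and 14.80 imply that roughly 93% of the variance in BDCM and DBCM was shared with the other predictors, whereas a VIF of 1.49 implies about 33%.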
Table 4.
Model | No. of Cases | No. of Controls | Odds Ratioa | 95% Confidence Interval |
---|---|---|---|---|
TTHM, µg/L | ||||
Q1 (<9.4) | 159 | 200 | 1.00 | Reference |
Q2 (9.4–<27.4) | 166 | 193 | 1.22 | 0.8, 1.88 |
Q3 (27.4–<49.8) | 194 | 166 | 2.06 | 1.2, 3.55 |
Q4 (≥49.8) | 167 | 191 | 2.09 | 1.1, 3.98 |
Ptrend | 0.008 | |||
Cytotoxicity-weighted TTHM, µg/L | ||||
Q1 (<5) | 165 | 194 | 1.00 | Reference |
Q2 (5–<11.4) | 165 | 196 | 1.03 | 0.73, 1.44 |
Q3 (11.4–<24.8) | 192 | 165 | 1.65 | 0.89, 3.06 |
Q4 (≥24.8) | 164 | 195 | 1.56 | 0.7, 3.45 |
Ptrend | 0.083 | |||
Molar concentration of TTHM, pmol/L | ||||
Q1 (<0.06) | 162 | 198 | 1.00 | Reference |
Q2 (0.06–<0.21) | 167 | 191 | 1.17 | 0.73, 1.89 |
Q3 (0.21–<0.34) | 191 | 168 | 1.74 | 0.92, 3.29 |
Q4 (≥0.34) | 166 | 193 | 1.52 | 0.67, 3.45 |
Ptrend | 0.075 | |||
Chloroform and brominated THM, mutually adjusted | ||||
Chloroform, µg/L | ||||
Q1 (<3.4) | 160 | 199 | 1.00 | Reference |
Q2 (3.4–<15) | 176 | 183 | 1.43 | 0.83, 2.48 |
Q3 (15–<21.2) | 170 | 189 | 1.34 | 0.65, 2.75 |
Q4 (≥21.2) | 180 | 179 | 1.76 | 0.91, 3.39 |
Ptrend | 0.119 | |||
Brominated THM, µg/L | ||||
Q1 (<3.8) | 158 | 201 | 1.00 | Reference |
Q2 (3.8–<6.2) | 180 | 180 | 1.14 | 0.75, 1.75 |
Q3 (6.2–<29.1) | 181 | 177 | 1.04 | 0.68, 1.57 |
Q4 (≥29.1) | 167 | 192 | 1.05 | 0.55, 2 |
Ptrend | 0.773 | |||
Scores from principal-components analysis, mutually adjusted | ||||
Chlorinated THM score | ||||
Q1 (<−1.02) | 160 | 199 | 1.00 | Reference |
Q2 (−1.02 to <−0.09) | 176 | 183 | 0.77 | 0.53, 1.13 |
Q3 (−0.09 to <0.78) | 170 | 189 | 0.90 | 0.54, 1.48 |
Q4 (≥0.78) | 180 | 179 | 0.83 | 0.49, 1.39 |
Ptrend | 0.835 | |||
Brominated THM score | ||||
Q1 (<−1.20) | 168 | 191 | 1.00 | Reference |
Q2 (−1.20 to <−0.72) | 167 | 192 | 1.08 | 0.66, 1.79 |
Q3 (−0.72 to <0.77) | 185 | 177 | 1.67 | 0.71, 3.96 |
Q4 (≥0.77) | 166 | 190 | 1.74 | 0.69, 4.39 |
Ptrend | 0.190 |
Abbreviations: Q, quartile; THM, trihalomethanes; TTHM, total trihalomethanes.
a Odds ratios from conditional logistic regression models stratified by matched set. Cases and controls were matched on sex, age (in 5-year groups), and geographical area. Odds ratios were adjusted for smoking status (never, former, or current smoker), ever having worked in a profession with high risk for bladder cancer (yes, no), and quartile of fruit and vegetable consumption (g/day).
The PCA reduced the 4 THM into 2 components explaining 94% of the variance (component 1: 68.1%; component 2: 25.5%). We refer to the first component as “brominated THM PCA score” because of a higher correlation with the 3 brominated THM (r > 0.52) than for chloroform (r = 0.24). We call the second component score “chlorinated THM PCA score” given its high correlation with chloroform (r = 0.90), a lower correlation with BDCM (r = 0.14), and negative correlation with the other constituents. The Pearson correlation coefficient for correlation between the 2 components was zero. The partial PCA of residuals adjusted for area showed similar results. Odds ratios for bladder cancer according to PCA scores showed a monotonic increase with the brominated THM scores and a flat association for the chlorinated THM scores, with wide confidence intervals (Table 4).
The exposure-response curves obtained from models with splines showed nonlinear associations in most cases (Web Figures 1 and 2). However, our tests indicated that the fit of these models was not statistically better than the fit of models using linear terms (i.e., fitting a straight line).
DISCUSSION
We found substantial variability in THM concentration and composition between areas. Varying correlations between individual THM species were found, and these correlations differed by area. The estimation of bladder cancer risk with separate THM species was not feasible because of multicollinearity, yielding unstable results. PCA converged into 2 data-driven and independent components: One correlated with brominated THM constituents, while the other mainly correlated with chloroform. Bladder cancer risk showed increasing dose-response relationships in models based on TTHM. Bladder cancer risk for specific THM constituents differed between models, and no consistent pattern was observed.
Given the differences in THM levels by geographical area, the use of area-stratified analyses was warranted, but statistical power was insufficient. Matched analysis overcame the effects of potential nuisance parameters not related to the outcome (matching variables, including area) in the linear models, avoiding the bias from pooled unconditional analyses (28). Restricting the analysis to a subset of the original data set is unlikely to have affected the reported risks, since excluded and included subjects differed only in terms of age and smoking. Smoking was unrelated to the exposure, and the potential effect of age was minimized through the matched analysis.
Models with the single THM and models with TTHM showed different methodological limitations. The estimation of associations for separate species was not feasible because of multicollinearity, yielding invalid models with unstable results. Estimates showed wide confidence intervals, inconsistent estimator signs, and negative β estimations in the multicollinear components (35). Imprecise exposure information has probably contributed to a lack of precision in estimating exposure to each of the THM species. In our data, bladder cancer risk showed increasing dose-response relationships in models based on TTHM, and bootstrapped estimations were stable. The use of cytotoxicity weights did not modify the general trend of the results. Similarly, use of molar concentration gave similar results. Both cytotoxicity-weighted and molar models showed wider confidence intervals than TTHM. To our knowledge, these 2 approaches have not been attempted in previous studies.
Separating chlorinated from brominated compounds may offer a biological and statistical alternative to compound-specific estimation. This separation has been applied to account for differential toxicity in other studies (36). It is biologically plausible because of these compounds’ probable differential mechanisms of action (22, 23, 37, 38), and statistically it offers a solution to multicollinearity. We explored 2 different options for separating chlorinated and brominated compounds, and these gave different results. Our first approach was to model chloroform and the sum of the 3 brominated compounds. The second approach was data-driven, using a PCA, which led to 1 score primarily representing brominated THM constituents and another that was almost exclusively associated with chloroform. With the first approach, a stronger bladder cancer risk was found for chloroform than for brominated compounds, while the opposite occurred with the PCA approach. Exposure-response curves using the sum of brominated compounds showed a flat association, while the PCA approach showed a steeper slope for brominated scores. Although we expected similar results from both approaches, the results diverged because the data were treated in different ways. In the first approach, raw and area-adjusted data showed results for chloroform and brominated compounds totally separated. In the second, PCA components used predicted z scores, which do not separate the 4 compounds completely. The first component, “brominated THM PCA score,” actually included chloroform as part of the calculation. The second component was downscaled because of negative correlation for the brominated compounds (39–41). In addition, the PCA approach is sensitive to normality issues and outliers. The 4 THM were not normally distributed, with long right tails and several outliers affecting PCA calculations. None of the results from these models separating chlorinated and brominated compounds were statistically significant.
The correlation between DBPs has been previously evaluated in different settings (25, 37, 42). Other studies with multiple areas have shown divergent correlations between compounds and poor correlation of bromoform levels with the other compounds (43). Brominated exposure metrics have been used more often in studies of fetal and gestational outcomes (36). PCA regression has been used only to predict formation of THM under specific conditions for water purveyors, but to our knowledge principal-components logistic regression has not been used in formal epidemiologic analyses of DBP (12, 44, 45). In studies of other environmental exposures, such as polychlorinated biphenyls, data-driven scores including PCA and Newton-Raphson search techniques have been used to weight the relative contributions of individual compounds (46).
We followed some of the strategies proposed by Samet (47) to analyze complex mixtures with different underlying toxicological assumptions. Analysis of complex mixtures is a challenge in environmental epidemiology and has been underexplored in the field of health risks related to DBPs (38). A major challenge is to improve the precision of historical DBP exposures, which vary in space and time, increasing misclassification of exposure (26, 48). In addition, it is uncertain how THM levels are correlated with other carcinogens present in the mixture (nitrogenated DBPs, mutagen X (MX), iodinated THM, etc.). All of these exposure assessment issues limit our ability to estimate risk more precisely. Other statistical approaches offer limited help in disentangling associations of cancer risk with individual compounds in the current setting. Multivariate techniques such as factor analysis, cluster analysis, and discriminant analyses are alternatives for use in larger data sets with multiple surrogates, as seen in the PCB literature (46). Artificial neural networks and semi-Bayesian approaches are promising alternatives for dealing with highly correlated compounds that deserve to be further explored in the future (36, 49).
Overcoming these limitations requires more than statistical tools. DBPs occur in variable, complex, and dilute mixtures with an important unidentified fraction (50). Hence, improved exposure assessment is necessary, based on better surrogates or on more extensive data about other DBPs, to refine the estimates. Furthermore, the retrospective assessment of lifetime exposures is prone to important biases. Information biases, including recall bias, hamper precise estimation of risks. In our study, we used reliable, high-quality interviews in an effort to minimize these biases. Finally, the lack of information on other compounds in the mixture hindered an evaluation of how much of the mixture was due to THM or other DBPs.
In summary, the estimation of risks for specific THM is hampered by the varying composition of the mixture, correlation between species, and imprecision of historical estimates. In the absence of better information, TTHM were a better proxy of DBP exposure than individual THM in our data. However, TTHM may introduce bias because the composition of the mixture varies in time and space. Toxicity adjustment using biology-based weights for the components of the mixture requires extensive experimental data, which are not available for THM. In addition, the predominance of 1 component (usually chloroform) in many areas gives results similar to those of TTHM. The use of other methods depends heavily on the distribution and correlation of specific constituents in each area. Given that results may differ considerably depending on the methods used, we suggest that investigators analyzing water DBPs evaluate and present results from more than 1 model. The relationship of TTHM to the most toxic elements of the mixture may vary from region to region, and therefore among studies. We thus recommend that researchers in other studies explore a variety of models to select the best way to analyze their data, stating clearly the potential limitations and how the challenging statistical issues involved in exploring this question are handled.
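The point that a dominant component makes a weighted metric behave like TTHM can be checked numerically. The sketch below is illustrative only: the concentrations are simulated, and the weights in `hypothetical_weights` are arbitrary relative values assumed for the example, not the cytotoxicity weights used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Simulated household concentrations (µg/L) for an area where chloroform
# dominates the mixture. All parameters are assumptions for illustration.
conc = np.column_stack([
    rng.lognormal(mean=3.5, sigma=0.6, size=n),    # chloroform
    rng.lognormal(mean=1.0, sigma=0.5, size=n),    # BDCM
    rng.lognormal(mean=0.0, sigma=0.5, size=n),    # DBCM
    rng.lognormal(mean=-1.0, sigma=0.5, size=n),   # bromoform
])

# Hypothetical relative toxicity weights, one per compound (NOT from the paper).
hypothetical_weights = np.array([1.0, 2.0, 3.0, 1.5])

tthm = conc.sum(axis=1)                 # unweighted total THM
weighted = conc @ hypothetical_weights  # toxicity-weighted sum

# Pearson correlation between the two exposure metrics.
r = np.corrcoef(tthm, weighted)[0, 1]
print(f"correlation between TTHM and weighted sum: {r:.3f}")
```

When one compound accounts for most of the variance, the two metrics rank subjects almost identically, so the weighting adds little information. In areas where brominated species predominate, the same comparison could give a lower correlation, which is one reason to report more than 1 model.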
ACKNOWLEDGMENTS
Author affiliations: Centre for Research in Environmental Epidemiology, Barcelona, Spain (Lucas A. Salas, Manolis Kogevinas, Cristina M. Villanueva); Hospital del Mar Medical Research Institute, Barcelona, Spain (Lucas A. Salas, Manolis Kogevinas, Cristina M. Villanueva); Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona, Spain (Lucas A. Salas); Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland (Kenneth P. Cantor, Nathaniel Rothman, Debra Silverman); Department of Preventive Medicine and Public Health, Faculty of Medicine, University of Oviedo, Oviedo, Spain (Adonina Tardon); CIBER Epidemiología y Salud Pública, Barcelona, Spain (Adonina Tardon, Manolis Kogevinas, Cristina M. Villanueva); Centre of Research in Occupational Health, Universitat Pompeu Fabra, Barcelona, Spain (Consol Serra); Consorci Hospitalari del Parc Taulí, Sabadell, Spain (Consol Serra); Medical Oncology Department, Ramon y Cajal University Hospital, Madrid, Spain (Alfredo Carrato); Research Unit, Canarias University Hospital, La Laguna, Spain (Reina García-Closas); Spanish National Cancer Research Centre, Madrid, Spain (Núria Malats); and National School of Public Health, Athens, Greece (Manolis Kogevinas).
This study was supported in part by the Intramural Research Program of the US National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics (contract NCI NO2-CP-11015). The project also received funding from the Spanish Health Ministry (grants FIS/Spain 00/0745 and ISIII-GO3/174) and the European Union (grant BMH4-98-3243). The current analyses were supported by an Erasmus Columbus Master Scholarship (grant 2009-5123/001-001-ECW to L. A. Salas) and a Colciencias PhD Scholarship (grant 529/2011 to L. A. Salas).
We thank Dr. Francisco X. Real for his contribution to the study design and Dr. Xavier Basagaña for statistical advice on the data analysis.
Conflict of interest: none reported.
REFERENCES
Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
Full text links
Read article at publisher's site: https://doi.org/10.1093/aje/kwt009
Read article for free, from open access legal sources, via Unpaywall: https://academic.oup.com/aje/article-pdf/178/4/652/17341005/kwt009.pdf