## Abstract

The uncertainty associated with the determination of load parameters, which is a key step in the design of wastewater treatment plants (WWTPs), was investigated on the basis of data sets from 58 WWTPs. The variation of organic loads with sewage temperature was also analysed. Data from 26 WWTPs with a high inflow sampling frequency were used to simulate lower sampling frequencies through a Monte Carlo approach. The calculation of 85-percentile values for chemical oxygen demand (COD) loadings based on only 26 samples per year is associated with a variability of up to ±18%. Approximately 90 samples per year are necessary to reduce the uncertainty in the estimation of COD loadings to below 10%. Hence, a low sampling frequency can potentially lead to under- or overestimation of design parameters. Through an analogous approach, uncertainties of ±11% in COD loading were identified when weekly averages were calculated from four samples per week. Finally, a tendency towards lower COD input loads with increasing temperature was identified, with a reduction of about 1% of the average loading per degree Celsius.

## HIGHLIGHTS

Uncertainty of statistical measures for COD loads relating to sampling frequency.

Determination of yearly 85-percentile values of COD loads from one weekly sample results in an uncertainty of up to ±13%.

Calculation of weekly averages of COD loads with four samples per week is associated with an uncertainty of up to ±11%.

Relative influent COD loading decreases with higher temperatures.


## INTRODUCTION

The prognosis of the ongoing processes within domestic wastewater treatment plants (WWTPs) is a highly complex task that requires identifying uncertainties at several levels and from different sources (Oliveira & von Sperling 2008; Belia *et al.* 2009). With respect to the design of these facilities, the determination of organic and nutrient loadings from sewage is a fundamental step, which requires careful consideration of the associated uncertainty. In locations with pre-existing sewage systems, the design loadings are commonly derived from a pool of inflow samples, whereby volume-proportional samples should preferably be used for load calculations. The combination of average values and peak loading factors is commonly suggested as the basis for the design of WWTPs (e.g. Tchobanoglous *et al.* 2013). Alternatively, percentile values can be used to estimate design loadings, and several guidelines for the design of WWTPs recommend their use to quantify loading parameters (Henze *et al.* 2002; ATV-DVWK 2003). In an international overview of activated sludge dimensioning guidelines – comprising design models from the United States, Denmark, Germany, Austria, Switzerland, Japan, and South Africa – the 50, 60, 80, and 90% percentiles are additionally reported for the calculation of oxygen demand and nutrient loadings (Wichern 1996). However, information on the ideal sampling frequency for deriving organic and nutrient loading parameters is scarce. So far, no information is available on how the sampling frequency affects the accuracy of input parameters; for example, how much accuracy is gained when a parameter is derived from 100 instead of 50 inflow samples. This is an important question for understanding the quality of the design data, for the dimensioning of WWTPs, and for planning cost-effective measuring campaigns. For instance, the quantification of antibiotic loadings with an accuracy of ±20% might require from 10 to 160 samples per year, depending largely on the seasonality of the target substance (Marx *et al.* 2015).

Regarding simulation studies, several recommendations and strategies to incorporate stochastic influent variations and prognostic uncertainties exist (e.g. Ort *et al.* 2005; Martin & Vanrolleghem 2014; Talebizadeh *et al.* 2016). Devisscher *et al.* (2006) suggested the use of seasonal average load data calculated from at least 2 years of operation data to perform simulations for cost estimation and evaluation of control strategies. Their recommendation is to use only values that lie within the 5 and 95% percentile limits for calculating the average load. Moreover, the calibration of WWTP models depends strongly on the amount and quality of influent data. Hence, the number of samples and tests should be kept at a sufficient but minimal level to allow cost-effective use of the models (Hulsbeek *et al.* 2002). Still, recommendations for sampling campaigns focus much more strongly on wastewater characterization than on the estimation of statistical measures for design purposes.

In this work, we quantify the expected variance ranges of key inflow parameters for the design of WWTPs as a function of the sampling frequency. The chemical oxygen demand (COD) is the reference parameter used to determine influent organic loadings in this analysis. This is in line with most WWTP design guidelines (e.g. DWA 2016) and simulation models (e.g. Hauduc *et al.* 2010). The analysis concentrates exclusively on the variability of the measured data and does not consider other sources of uncertainty and potential systematic errors, such as the precision of flow measurement and of the analytical techniques or sample preservation in the automatic sampler (Belia *et al.* 2009; Rieger *et al.* 2012). The influence of the sample pool size was investigated for two statistical measures that are representative for dimensioning and modelling purposes: the yearly 85-percentile and weekly averages. Additionally, a correlation analysis between wastewater temperature and influent COD load was performed. The main hypothesis for this analysis was that the degradation rates in the sewage system increase at higher temperatures and that this correlation can be visualised in a large sample collective from different WWTPs. As discussed by Ahnert *et al.* (2005), the temperature dependency may also have important implications for the determination of degradable and non-degradable organic material fractions.

## MATERIALS AND METHODS

### Database

Inflow data from a total of 58 activated sludge WWTPs (52 in Germany and six in Switzerland) were available for this study. The data were collected by members of the work group KA 6.4 (*Deutsche Vereinigung für Wasserwirtschaft, Abwasser und Abfall e.V*., German Water Association; DWA) and anonymized prior to this study. Together, these 58 WWTPs correspond to a loading of ∼14 million population equivalents, with a total of 64,705 daily COD loading data points, which were determined on the basis of 24-h composite samples. Hence, the organic loading was obtained by multiplying the total flow rate of a day by the COD concentration of the composite sample from that day. Data series with samples at the effluent of grit chambers and of primary clarifiers were both considered for these analyses. However, only a single sampling point per WWTP was used. Thus, the number of sampling points in Figure 1 refers only to the sampling location with the largest pool of data. A threshold of 260 samples per year (or five samples per week) was defined to indicate WWTPs with quasi-complete data sets, which could be used for the stochastic reduction of the sampling pool. Data series from WWTPs connected to separate and combined sewer systems and with variable contributions of industrial wastewater were included. These sewage system characteristics were not differentiated further in the analysis.
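The load calculation described above can be illustrated with a minimal Python sketch. The function name, the units, and the example values are assumptions made for illustration only; they are not taken from the data set.

```python
def daily_cod_load_kg(flow_m3_per_day: float, cod_mg_per_l: float) -> float:
    """Daily COD load in kg/d from the daily flow (m3/d) and the
    COD concentration of the 24-h composite sample (mg/L)."""
    # 1 mg/L equals 1 g/m3, so flow * concentration gives g/d;
    # dividing by 1000 converts to kg/d.
    return flow_m3_per_day * cod_mg_per_l / 1000.0


# Hypothetical day: 20,000 m3/d inflow with 600 mg/L COD.
load = daily_cod_load_kg(20_000, 600)
print(load)  # 12000.0
```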

### Calculation platform

All calculations were performed in MATLAB (release 2017b; MathWorks, USA). The Monte Carlo simulation used the function *randsample* for randomized sample selection. Percentile values were calculated with the MATLAB function *prctile*. In short, this method sorts the data vector and assigns each data point a percentile proportional to its rank relative to the sample size. All percentiles in between are interpolated.
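The interpolation rule can be sketched in Python as follows. This is an illustrative re-implementation under the assumption that the i-th of n sorted values is treated as the 100(i − 0.5)/n percentile, with values outside this range clamped to the minimum or maximum; the function name `prctile_like` is hypothetical and the code is not the authors' MATLAB script.

```python
def prctile_like(values, p):
    """Percentile p (0-100) with linear interpolation between sorted values.

    Assumption: sorted value i (0-based) sits at the 100*(i + 0.5)/n
    percentile; requests below or above that range return min or max.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("at least one value is required")
    # Fractional 0-based position of the requested percentile.
    pos = p / 100.0 * n - 0.5
    if pos <= 0:
        return data[0]
    if pos >= n - 1:
        return data[-1]
    lower = int(pos)  # floor, since pos is positive here
    frac = pos - lower
    return data[lower] + frac * (data[lower + 1] - data[lower])


# Hypothetical COD loads in t/d.
print(prctile_like([1, 2, 3, 4], 50))  # 2.5
```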

### Quantification of the effects of the sampling frequency on determination of the COD loading

The approach follows Marx *et al.* (2015), who used a randomized sample pool reduction to determine the accuracy of antibiotic loads at WWTPs. For each annual data series and sampling frequency, 10,000 reduced data sets were randomly generated, and the 85-percentile of the daily COD loads was calculated for each of these new stochastic COD load series. The relative standard deviation (RSD) between all 85-percentile values was adopted as a direct measure of the uncertainty associated with the sampling frequency:

$$\mathrm{RSD} = \frac{\sigma}{\mu} \times 100\% \qquad (1)$$

where σ and μ are, respectively, the standard deviation and arithmetic mean of all 85-percentiles determined for the corresponding data set and sampling frequency. The upper threshold of this range, the 95-percentile (RSD_{95}), is used to assess the nominal uncertainty in the estimation of design parameters.
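A minimal Python sketch of this Monte Carlo reduction is given below. It makes several assumptions that should be read as such: the yearly series is synthetic (lognormal), the percentile is computed with `statistics.quantiles` (method "inclusive"), whose interpolation rule differs slightly from MATLAB's *prctile*, the sample standard deviation (n − 1) is used, and the number of runs is reduced for speed. The function name `rsd_of_p85` is hypothetical.

```python
import random
import statistics


def rsd_of_p85(daily_loads, samples_per_year, n_runs=10_000, seed=0):
    """RSD (%) of the 85-percentile over n_runs randomly reduced data sets."""
    rng = random.Random(seed)
    p85_values = []
    for _ in range(n_runs):
        # Draw a reduced data set without replacement.
        subset = rng.sample(daily_loads, samples_per_year)
        # quantiles(n=100) returns 99 cut points; index 84 is the 85-percentile.
        p85_values.append(statistics.quantiles(subset, n=100, method="inclusive")[84])
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return 100.0 * statistics.stdev(p85_values) / statistics.fmean(p85_values)


# Hypothetical year of 364 daily COD loads (t/d); not real plant data.
data_rng = random.Random(42)
year = [10.0 * data_rng.lognormvariate(0.0, 0.3) for _ in range(364)]

rsd_26 = rsd_of_p85(year, 26, n_runs=2_000)
rsd_104 = rsd_of_p85(year, 104, n_runs=2_000)
print(f"RSD with 26 samples per year:  {rsd_26:.1f}%")
print(f"RSD with 104 samples per year: {rsd_104:.1f}%")
```

Repeating this for every yearly data series and taking the 95-percentile of the resulting RSD values would give a quantity comparable to RSD_{95}.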

The stochastic re-sampling of the yearly data sets was performed through two different year-matrix structures: (i) 52 weeks with 7 days; and (ii) 13 pseudo-months with 28 days. In this way, the removal of sampling points was always uniformly distributed – for example, 208 samples per year result in a distribution of four samples per week or 16 samples per pseudo-month (Figure 2(b)) – and a seasonal over-representation was avoided (e.g. more samples in summer than winter). All the years started on 1 January and no distinction between weekdays was considered.
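The uniformly distributed removal of sampling points can be sketched as a stratified draw, in which the same number of days is selected at random within each week or pseudo-month. The Python function below is an illustration only: the name `stratified_day_indices` is hypothetical, and it simply rejects sampling frequencies that are not a multiple of the number of blocks, a case for which the handling is not described here.

```python
import random


def stratified_day_indices(samples_per_year, block_length=7, n_blocks=52, rng=None):
    """Pick the same number of random days in every block of the year.

    Returns sorted 0-based day indices (day 0 = 1 January).
    """
    if rng is None:
        rng = random.Random()
    per_block, remainder = divmod(samples_per_year, n_blocks)
    if remainder:
        raise ValueError("samples_per_year must be a multiple of n_blocks")
    indices = []
    for block in range(n_blocks):
        start = block * block_length
        # Random days within this block, drawn without replacement.
        days = rng.sample(range(block_length), per_block)
        indices.extend(sorted(start + d for d in days))
    return indices


# 208 samples per year: four per week or 16 per 28-day pseudo-month.
weekly = stratified_day_indices(208, block_length=7, n_blocks=52, rng=random.Random(0))
monthly = stratified_day_indices(208, block_length=28, n_blocks=13, rng=random.Random(0))
print(len(weekly), len(monthly))  # 208 208
```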

The RSD for the calculation of weekly average COD loads was determined analogously to the uncertainty analysis for the 85-percentile COD loadings. First, all weeks with measurements available for all days were selected from the data sets of all 58 WWTPs. A total of 3,179 weeks with seven data points were available for this analysis. Through a Monte Carlo simulation, these data sets were reduced to lower sampling frequencies of 1, 2, 3, 4, 5, and 6 samples per week. For each week and sampling frequency, 10,000 reduced data sets were generated and the RSD between their average values was calculated with Equation (1).
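For a single week, this reduction can be sketched in Python as shown below. The week of loads is hypothetical, the sample standard deviation (n − 1) is an assumption, and the number of runs is reduced for speed; the function name `rsd_weekly_mean` is illustrative.

```python
import random
import statistics


def rsd_weekly_mean(week_loads, samples_per_week, n_runs=10_000, seed=0):
    """RSD (%) of the weekly mean over n_runs randomly reduced weeks."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(week_loads, samples_per_week))
        for _ in range(n_runs)
    ]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return 100.0 * statistics.stdev(means) / statistics.fmean(means)


# Hypothetical complete week of daily COD loads (t/d), Monday to Sunday.
week = [9.8, 10.4, 11.1, 10.9, 10.2, 8.7, 8.1]

rsd_by_frequency = {k: rsd_weekly_mean(week, k, n_runs=2_000) for k in range(1, 7)}
for k, rsd in rsd_by_frequency.items():
    print(f"{k} samples per week: RSD = {rsd:.1f}%")
```

In the analysis above, the same calculation is repeated for every complete week, and the 95-percentile over all weeks is reported as RSD_{95}.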

### Wastewater temperature and COD loading measurements

Daily COD loading (L_{COD}) and temperature data pairs were available for this analysis, whereby 79% of them also included data for phosphorus loading. This was used as a control variable, as no net phosphorus removal within the sewer system is expected. In addition to the phosphorus loads (L_{P}), the P to COD ratios (R_{P/COD}) were analysed. In order to allow an integrated analysis of all data points, each daily loading and R_{P/COD} (L_{COD,j,i}/R_{P/COD,j,i}) was normalized (L_{COD,norm,j,i}/R_{P/COD,norm,j,i}) in relation to the corresponding average from its respective WWTP (L_{COD,mean,i}/R_{P/COD,mean,i}):

$$L_{x,\mathrm{norm},j,i} = \frac{L_{x,j,i}}{L_{x,\mathrm{mean},i}}$$

where i indicates the WWTP identifier, j each data pair of the WWTP, and x the reference variable (COD or phosphorus); R_{P/COD} was normalized analogously. In the next step, the data points of all WWTPs were aggregated and re-organized in temperature clusters that were discretised in 1 °C intervals. From these temperature-defined data sets, the arithmetic mean of the relative loadings and of the R_{P/COD} ratio for each temperature (L_{COD,mean,T}/R_{P/COD,mean,T}) and also the 85-percentile of the COD loadings (L_{COD,p85,T}) were calculated:

$$L_{x,\mathrm{mean},T} = \frac{1}{n_T} \sum_{(j,i) \in T} L_{x,\mathrm{norm},j,i}$$

where T indicates the temperature and n_{T} is the number of data points for each temperature cluster.
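The normalization and temperature clustering can be illustrated with the following Python sketch. The records are invented for illustration, the function names are hypothetical, and assigning a temperature to its 1 °C cluster by rounding down is an assumption, as the binning rule is not specified above.

```python
import math
import statistics
from collections import defaultdict


def normalise_by_plant(records):
    """records: (plant_id, temperature_C, load) tuples.

    Returns (temperature_C, load / mean load of the same plant) tuples.
    """
    by_plant = defaultdict(list)
    for plant, _, load in records:
        by_plant[plant].append(load)
    plant_mean = {p: statistics.fmean(loads) for p, loads in by_plant.items()}
    return [(temp, load / plant_mean[plant]) for plant, temp, load in records]


def cluster_by_temperature(normalised):
    """Group normalised loads into 1 degC clusters: {T: (n_T, mean)}."""
    clusters = defaultdict(list)
    for temp, value in normalised:
        # Assumption: a temperature of 10.8 degC belongs to the 10 degC cluster.
        clusters[math.floor(temp)].append(value)
    return {t: (len(v), statistics.fmean(v)) for t, v in sorted(clusters.items())}


# Hypothetical data pairs from two plants of very different size.
records = [
    ("A", 10.2, 8.0), ("A", 10.8, 12.0), ("A", 15.1, 10.0),
    ("B", 10.5, 110.0), ("B", 15.6, 90.0),
]
result = cluster_by_temperature(normalise_by_plant(records))
for temperature, (n_t, mean_load) in result.items():
    print(f"T = {temperature} degC: n_T = {n_t}, relative load = {mean_load:.3f}")
```

Because each load is divided by the mean of its own plant, large and small plants contribute equally to each temperature cluster.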

## RESULTS AND DISCUSSION

### Sampling frequency effects on calculation of COD loadings

The analysis with data from 26 WWTPs, totalling 92 individual years with at least 260 samples per year, indicated that re-sampling using weekly and monthly based data structures results in very similar RSD values (Table S1 in supplementary materials). The month-based sampling results in 0.1 to 0.6% higher RSD_{95} values. Due to the convergence of both methods for discretization of the yearly data, they were integrated in one graphic (Figure 3(a)). The lowest sampling frequency of 13 samples per year resulted in an RSD_{95} of up to 25% within the 95% confidence threshold. Increasing the sampling frequency to one sample per week almost halved the RSD_{95}, and at least 208 samples per year are necessary to obtain an RSD_{95} below 5%. To reach a similar RSD_{95} for the determination of weekly values, six samples are necessary. Using four samples per week, which is the minimum number of samples according to the German standard A198 that defines how to determine the influent parameters for WWTPs (ATV-DVWK 2003), results in an RSD_{95} of about 11%.

An increase in the catchment area and in the number of connections to the sewer system is expected to result in lower stochastic variation of the organic loadings at the influent of WWTPs (Tchobanoglous *et al.* 2013; DWA 2016). Although the three largest WWTPs of this study (COD loading above 80 ton/d) are within the lowest RSD range (Figure 4), it is not possible to confirm this trend here. The correlation between COD loading and the RSD for 52 and 104 samples per year yields coefficients of determination (R^{2}) of 0.17 and 0.16, respectively. This low R^{2} can be explained by the irregular distribution, with respect to COD loading, of the available WWTPs with high influent sampling frequency. The range from 3 to 20 ton COD per day represents more than half of the WWTPs (15 of 26). Moreover, it is important to underline that the number of yearly data series from each WWTP varied from 2 to 7 years (Figure 4(a)), and that the analysis integrates two sampling points (effluent grit chamber and influent activated sludge tanks). Thus, these factors limit the identification of potential correlations between RSD and specific characteristics of the WWTPs.

### COD loadings and temperature

In the visualization of the results for the dependency of the COD load on the wastewater temperature, a distinction was made between temperature intervals with a high and a low number of data points (Figure 5). For this purpose, a threshold value of n_{T} = 400 was defined, which helps to discard potential artefacts that might result from low data density. The same n_{T} threshold was assumed for the P data (Figure 5(b)). Because ∼20% of the daily data points lack information regarding P loads, L_{P} and R_{P/COD} have a narrower temperature interval with a high number of data points.

Overall, there is a tendency for lower loads to reach the sewage treatment plant with rising temperatures. Considering the high-density data interval from 10 to 23 °C, the average relative COD load decreases by 14%. This decline is more pronounced for the relative 85-percentile values, with a 19% lower loading at 23 °C than at 10 °C (Figure 5(a)). This is in line with the expected increased biological activity within the sewage system (Ahnert *et al.* 2005; Sun *et al.* 2018). However, there is also a similar trend for the phosphorus loads (Figure 5(b)), which decrease by 7% between 11 and 22 °C. Given that no net phosphorus losses from the wastewater should be possible during the transport to the WWTP (that is, no long-term phosphorus accumulation or release as gas), this might point to seasonal variability of the organic loadings. Still, the higher temperature dependence of the COD loads in comparison to the phosphorus loads (Figure S1), combined with the concomitant increase of R_{P/COD} at higher temperatures (4% from T = 11 to 22 °C), is an important indicator of biological degradation of organic matter.
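As a rough worked example, the reported decreases can be converted into average rates per degree Celsius. This assumes an approximately linear decline over the stated temperature intervals, which is a simplification:

$$\frac{14\%}{(23-10)\,^{\circ}\mathrm{C}} \approx 1.1\%\ \mathrm{per}\ ^{\circ}\mathrm{C}, \qquad \frac{19\%}{(23-10)\,^{\circ}\mathrm{C}} \approx 1.5\%\ \mathrm{per}\ ^{\circ}\mathrm{C}, \qquad \frac{7\%}{(22-11)\,^{\circ}\mathrm{C}} \approx 0.6\%\ \mathrm{per}\ ^{\circ}\mathrm{C}$$

for the average COD load, the 85-percentile COD load, and the phosphorus load, respectively.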

### Implication of these uncertainties estimations for the design of WWTPs

Regarding the determination of organic loadings at the influent, there are irreducible and reducible sources of uncertainty (Belia *et al.* 2009). While variations due to scenario prognostics, including weather and demographic changes, lead to an intrinsic uncertainty, the variability of an existing system can be evaluated through intensive monitoring. Indeed, according to Figure 3(a), the uncertainty range in the estimation of organic loads follows an exponential-like decline when the sampling frequency is increased from 13 to 104 samples per year, with an RSD_{95} drop from 24.9 to 8.4%. Thereafter, the correlation follows a quasi-linear behaviour and the RSD_{95} decreases to 2.9% for 312 samples per year.

Nonetheless, although this uncertainty is quantified in terms of RSD_{95} values, translating this information into practical design might not be completely straightforward. First, this analysis does not provide any further support to define what is an acceptable uncertainty level for the design of WWTPs; that is, how much uncertainty in the estimation of the 85-percentile COD loads can be afforded? Therefore, the effects of these stochastic variations on the discharge characteristics need to be further considered (Oliveira & von Sperling 2008). Second, the absolute variations depend on the calculation approach developed here. The adoption of other thresholds for the quasi-complete data sets, the differentiation of working days and weekends or of rain and dry weather conditions, or merging the yearly data series independently of the WWTP (instead of separating each WWTP, as in Figure 4) would potentially affect the RSD calculations. However, further modifications or improvements to this calculation might require an expansion of this data set to include more WWTPs with high sampling frequency. Therefore, the proposed methodology (Figure 2) was developed with the aim of using as few assumptions as possible to sort the data and of allowing an integrated analysis of all yearly data series. Indeed, it is expected that a larger pool of WWTPs with quasi-complete data sets would allow the dependence of the influent load variations on other variables to be identified, such as the catchment area size (Figure 4), the type of sewer system (separate or combined), the density of industries, or climatic conditions.

Conversely, the calculation of the RSD_{95} resulting from missing samples for weekly average values could rely on a much larger data pool – 3,179 full-data weeks – than the yearly percentile calculations. A reduction of the complete week data series down to two samples per week results in an almost linear increase of the RSD_{95} by ∼4% per missing sample. Interestingly, the RSD_{95} of ±32.1% for a single sample per week provides an estimate of the potential variation introduced when weekly measurements are interpolated. Data interpolation is often reported in simulation studies. These loading variation uncertainties can be integrated with other process-related uncertainties in model-based studies for WWTP operation and optimization (e.g. Sin *et al.* 2009).

The relatively lower COD loadings at higher temperatures highlight the importance of considering seasonality and potential degradation processes within sewer systems. The approximate decrease of 10% in COD load per 10 °C (Figure 5) shows the quantifiable impact of these temperature effects on the organic loads at WWTPs. A pre-degradation of the biodegradable wastewater fraction within the sewer system might increase the influent inert COD fractions. This would lead to important implications for the operation of WWTPs, affecting the sludge production, the oxygen demand, and denitrification processes, among other aspects (Wichern *et al.* 2002; Ahnert *et al.* 2005). Moreover, in combined sewer systems, rain and dry weather conditions can have an important influence on the transport of solids (Lange & Wichern 2013), which might affect the COD loads. Hence, these climatic conditions (that is, temperature and precipitation) have potentially overlapping effects within this analysis. Therefore, seasonal dependencies and wastewater temperature are important factors to be considered during sampling campaigns.

## CONCLUSIONS

The methodology for RSD determination presented here provides an approach to quantify the uncertainty of the influent COD loading at WWTPs with different sampling frequencies. The 95-percentile of the RSD from 92 yearly data sets with at least 260 measured points was used as an uncertainty measure. About 90 samples per year are necessary to ensure a variability below ±10% for the estimation of the 85-percentile value of the COD loading. The estimation of a weekly average COD loading with a similar precision requires four daily loadings per week. Although the implications of these estimated uncertainties are not fully addressed here, this study provides a novel reference from real data for the potential deviations of statistical measures for organic loadings that might arise with insufficient sampling. These findings can be used in simulation studies as well as to interpret data or sampling campaigns on WWTPs.

This method of stochastic reduction of sample pools can be applied to any other WWTP or sewer system, and also to other parameters, provided that data series with a sufficient sampling frequency are available. Furthermore, the scope of this analysis might be expanded in future investigations by identifying the probability distribution of the data and its corresponding parameters. The analysis, based on a large sample pool of WWTPs, also allows potential effects of increased wastewater temperatures on organic loading variations to be identified. However, further investigations including a detailed overview of the catchment area are necessary to provide a comprehensive description of the underlying degradation mechanisms and to disclose effects from precipitation events (e.g. Rodríguez *et al.* 2013).

## ACKNOWLEDGEMENTS

The described investigation is part of the revision of the German manual ATV-DVWK (2003) by the DWA technical board KA 6 and its work group KA 6.4. The authors would like to thank all the WWTP operators who provided the data, and the KA 6.4 members: M. Ahnert, K. Alt, M. Blunschi, S. Keller, M. Klingel, R.-L. Lange, K.-H. Rosenwinkel, B. Teichgräber, D. Thöle, A. Spindler, and T. Schmitt. The financing by the German Federal Ministry of Education and Research (BMBF; Project 02WA1450B) is acknowledged by T. Gehring. We also thank an anonymous reviewer for the suggestion to include the phosphorus data in these analyses.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## REFERENCES

*Internationaler Vergleich Verschiedener Bemessungsverfahren von Kläranlagen auf Biologische Stickstoffelimination (International Comparison of Different Design Methods of Wastewater Treatment Plants for Biological Nitrogen Elimination)*. Diploma thesis.