3b. Expert survey reliability: theory
In TIDE, the ES assessment is entirely based on expert surveys and inventories. A first survey, involving 27 estuarine users and stakeholders from regional working groups of all four estuaries, was used to determine the focal ecosystem services, the demand for these services, and trends in them. A second survey, involving 12 professional experts in estuarine functioning, was used to provide information on the supply of ES.
In the ecosystem service based management (EBM) approach of Granek et al. (2010), “the benefits that make services relevant to human wellbeing” are regarded as “end-use demand”. Although we do not quantify the material demand for a certain amount of service or benefit (so this cannot be an economic demand for a certain quantity of service), the assigned value can be regarded as a representation of the societal demand for a given service relative to other services. “By definition, an ecosystem service is only a service, if there is a benefit. This means, there must be a certain demand by people to use a particular service. [These demands] can be derived from statistics, modeling or interviews […] [and] transferred to a scale similar to the one used for ecosystem services supply […]” (Burkhard et al. 2012). This is an important concept, since it ultimately determines priorities for the conservation and restoration of service-providing units.
As “values may be assigned heterogeneously by people over the landscape” (Norton and Hannon, 1997 in Bryan et al 2010), it is essential to account for local variability in this demand value. We adapted the survey approach in order to account for spatial (estuaries, salinity zones) as well as temporal aspects. The surveys were performed for every estuary, along four common salinity zones, and for historical (ca. 1900), present and future (ca. 2050) times.
An essential but often overlooked aspect of using expert data is the set of scientific checks: consistency and agreement among raters (or rater groups, in this case the estuarine regional groups), argumentation of validity by comparing results with other data sources or observed patterns, and a description of the experts’ basic background. These checks are crucial before interpreting the survey results, but also for verifying whether the data can be extrapolated to other systems and whether the survey, as a tool, is reliable.
Within the TIDE approach, statistical procedures, the assessment of general patterns, and argumentative confidence are all verified.
In statistics, Cronbach's alpha (Cronbach 1951) is a coefficient of reliability. It is commonly used as a measure of the internal consistency or reliability of a test score for a sample of examinees, and is widely applied in the social sciences, business, nursing, and other disciplines. A commonly accepted rule of thumb for describing internal consistency using Cronbach's alpha is as follows (George et al. 2003; Kline 1999):
|α ≥ 0.9 | Excellent
|0.8 ≤ α < 0.9 | Good
|0.7 ≤ α < 0.8 | Acceptable
|0.6 ≤ α < 0.7 | Questionable
|0.5 ≤ α < 0.6 | Poor
|α < 0.5 | Unacceptable
Some professionals (Nunnally 1978) require, as a rule of thumb, a reliability of 0.70 or higher (obtained on a substantial sample) before they will use an instrument.
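As an illustration of the statistic itself (with hypothetical ratings, not the TIDE data), Cronbach's alpha can be computed directly from its definition, α = k/(k−1) · (1 − Σ var(itemᵢ)/var(total)), for a matrix of examinees (rows) by items or raters (columns). A minimal sketch using only numpy:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                          # number of items (raters)
    item_vars = x.var(axis=0, ddof=1)       # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 5 subjects scored by 3 raters on a 1-5 scale
ratings = [[3, 4, 3],
           [5, 5, 4],
           [1, 2, 2],
           [4, 4, 5],
           [2, 3, 2]]
print(round(cronbach_alpha(ratings), 2))  # -> 0.94
```

By the rule of thumb above, an alpha of 0.94 would indicate excellent internal consistency among the raters.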
Intraclass correlation coefficient (ICC)
Another statistic that is prominently used to assess the consistency or reproducibility of measurements made by different observers is the intraclass correlation coefficient (ICC; Koch 1982). The ICC can be applied when quantitative measurements are made on units organized into groups, and describes how strongly units within a group resemble each other. While the ICC is a correlation, unlike most other correlation measures it operates on groups rather than on paired observations. The ICC may thus be a more appropriate evaluation of the TIDE survey methods, which clearly assessed grouped data (estuaries, zones, habitats).
The test can be performed in several ways, depending on the conditions (see R package irr, version 0.82). When considering which form of ICC is appropriate for an actual set of data, several decisions have to be taken (Shrout & Fleiss 1979):
- Should only the subjects be considered as random effects ("oneway" model), or are subjects and raters both randomly chosen from a bigger pool of persons ("twoway" model)? We have chosen a twoway model.
- If differences in judges' mean ratings are of interest, interrater "agreement" instead of "consistency" should be computed. We have computed both.
- If the unit of analysis is a mean of several ratings, the unit should be changed to "average"; in most cases, however, single values (unit = "single") are considered. As each score in our survey was a consensus (debated mean) of an expert group, the former applies.
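To make these decisions concrete, the sketch below (hypothetical data, and plain numpy rather than the irr package) computes the two-way, average-measures ICC under both the "agreement" and the "consistency" definition, from the mean squares of the two-way ANOVA decomposition (Shrout & Fleiss 1979):

```python
import numpy as np

def icc_twoway_average(scores):
    """Two-way average-measures ICCs (Shrout & Fleiss 1979) for a
    (subjects x raters) matrix; returns (agreement, consistency)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Mean squares: between subjects (rows), between raters (cols), residual
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = (np.sum((x - grand) ** 2)
              - (n - 1) * ms_rows - (k - 1) * ms_cols)
    ms_err = ss_err / ((n - 1) * (k - 1))
    agreement = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    consistency = (ms_rows - ms_err) / ms_rows
    return agreement, consistency

# Hypothetical example: 5 subjects scored by 3 raters
ratings = [[3, 4, 3],
           [5, 5, 4],
           [1, 2, 2],
           [4, 4, 5],
           [2, 3, 2]]
agr, con = icc_twoway_average(ratings)
print(round(agr, 3), round(con, 3))  # -> 0.932 0.939
```

Note that the two-way, consistency, average-measures ICC is algebraically identical to Cronbach's alpha, which is why the two reliability checks complement rather than duplicate each other: the agreement form additionally penalizes systematic level differences between raters.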
Finally, it has been shown that alpha (and the ICC) can return high values even when several unrelated latent constructs are measured (e.g., Cortina 1993; Cronbach 1951; Green et al. 1977; Revelle 1979; Schmitt 1996; Zinbarg et al. 2006), as our different estuaries could be considered to be. These coefficients are only appropriately used when the items measure different areas within a single construct. When more than one construct is measured, the coefficient omega hierarchical (omegaH) is more appropriate (McDonald 1999; Zinbarg et al. 2005).
Let us clarify this for the TIDE surveys. If our estuaries are considered different (functioning) systems, alpha (and the ICC) are not appropriate. However, if they are similar instances of “the industrialized estuary”, omegaH should yield about the same result as alpha and the ICC.
OmegaH involves a much more complex procedure based on factor analysis, a field of statistics related to principal component analysis. To find omegaH it is necessary to do a factor analysis of the original data set, rotate the factors obliquely, perform a Schmid-Leiman transformation, and then compute omega from the resulting loadings. The R package psych (version 1.0-85) provides code to do this (McDonald 1999).
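In symbols, omegaH is the proportion of total-score variance attributable to the general factor g. With general-factor loadings λ_{g,i} taken from the Schmid-Leiman solution, a common formulation (following McDonald 1999) is:

```latex
\omega_h \;=\; \frac{\left(\sum_{i} \lambda_{g,i}\right)^{2}}{\sigma^{2}_{X}},
\qquad X = \sum_{i} x_i ,
```

where \(\sigma^{2}_{X}\) is the variance of the total score over all items \(x_i\). If a single construct dominates (the “industrialized estuary” case), the general factor absorbs most of the common variance and omegaH approaches alpha; if the estuaries behave as distinct constructs, omegaH falls well below alpha.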
Traceability and argumentative confidence
When surveying for societally relevant answers, a broad survey is appropriate. However, apart from the number of respondents (which should logically be as large as possible), their affinity is the key issue: it is essential to include stakeholders from different sectors. This is called segmentation. Theoretically, the number of respondents should be increased until a saturation point (no more differing ‘stakes’) is reached. In anonymous surveys, the number of representatives per sector is also important. In open group surveys, consensus scores can be obtained, and the number of respondents per sector matters far less than their authority and level of expertise.
When surveying for specialized information or scientific knowledge, to fill data gaps and obtain scientifically supported qualitative statements, the number of respondents is irrelevant. Two specialists will generate data with a much higher confidence level than a hundred laymen who are not experts in the matter concerned. However, checks on the confidence of this kind of survey are crucial (Van Crombrugge 2002). Miedema (1988) and Van Ijzendoorn (1988) distinguish technical and argumentative confidence.
Technical confidence points to the exact determination of the possibility to repeat a certain aspect of the research. Argumentative confidence is the non-quantitative indication of the repeatability of the research process whenever exact repeatability cannot be determined (Van Ijzendoorn et al. 1986). For this kind of research, “traceability” is a more adequate term for evaluating confidence (Smaling 2004). Traceability accounts for the collection as well as the analysis of the data. A well-known example of this kind of semi-quantitative evaluation of research confidence is the IPCC uncertainty approach, in which the accordance of evidence is one of the features evaluated.
For the TIDE surveys, maximal transparency on survey questions, respondents and analyses is provided, as well as cross-checks of emerging patterns with physical reality.