Case Study

The Delphi method is a form of nominal group process which originated with the Rand Corporation, as part of a defense research project in the 1950s (Linstone and Turoff 1975). Variations on the original technique were developed during the 1960s, with Delphi viewed as an alternative to other methods, such as surveys or conferences, used to tap expert opinion (Jones and Twiss 1978). Delphi quickly became a popular and widely accepted methodology, and during the 1960s and 1970s, literally hundreds of groups conducted Delphi processes, addressing a wide variety of concerns (Weingand 1980).

In a general sense, the Delphi method is defined as "a method for structuring group communication, so that the process is effective in allowing a group of individuals, as a whole, to deal with a complex problem" (Linstone and Turoff 1975). Perhaps most widely known for its application to technology forecasting, the Delphi method has also been used extensively as a policy development tool (Turoff 1975), applied to such diverse causes as forecasting future demand and supply of timber (Greene and Siegel 1994), developing policies for education (Bradley 1977), or refinement of wilderness management techniques (Merigliano 1987).

Although the Delphi method has continued to enjoy a measure of acceptance, it is not without its critics. In particular, specific Delphi processes have been criticized for inappropriate analysis techniques and exaggerated claims of accuracy when used in predictive applications (Sackman 1975). Nevertheless, the case has also been made that, if properly used, the Delphi method is quite effective, especially as a tool for the study of subjectively defined problems (Parden 1975). Extensive critical studies, testing Delphi with solvable problems and comparing Delphi to other similar methods, have in fact substantiated the method's overall soundness (Delbecq, Van de Ven, and Gustafson 1986; Van de Ven 1974; Weingand 1980), so long as the subjective nature of the technique is properly regarded as a constraint in projecting or extrapolating the acquired information.

One key aspect of all Delphi applications is that they entail an "iterative procedure of judgment making" (Jones and Twiss 1978) designed to elicit subjective expert opinion in a manner which is, in phenomenological terms, not merely "deductive," but "disclosive" as well (Jones 1989). Of the more broadly defined technique of group inquiry, David Seamon observed:

"The process of group inquiry works to establish a supportive context in which people can build upon other's insights and come to moments of discovery, in which unrelated bits of information suddenly fuse together in larger significance, revealing a pattern which was unseen before" (Seamon 1979).

In this sense, the group-orientation of Delphi is of great importance, in that the inter-disciplinary and collective aspects of Delphi are seen as providing greater opportunities for insight and innovation than would a simple polling or survey process.

Additionally, the Delphi process has been variously described as an attempt to create a "collective human intelligence" (Linstone and Turoff 1975), or a "shared reality" (Scheele 1975). To this end, the Delphi method is also meant to educate the individual participants themselves, by facilitating a level of dialogue and allowing participants to develop or refine their own personal opinions through the iterative process steps (Morrow 1971; Turoff 1975).

Numerous variations on the scale of Delphi applications exist. Although panels of as few as four and as many as several hundred individuals have been used, there is only general agreement on either minimum or optimum participant group size. Based on logistical considerations, optimum group size has been suggested at between ten and fifteen (Delbecq, Van de Ven, and Gustafson 1986; Jones and Twiss 1978), or more broadly at ten to fifty (Turoff 1975). Van de Ven's extensive studies of both nominal group and Delphi panels showed little was gained in terms of new ideas generated with a group size larger than nine (Van de Ven 1974), while Linstone and Turoff (1975) suggested a minimum acceptable group size of five participants.

More importantly, the selection of participants is considered critical to the process (Jones and Twiss 1978) with, not surprisingly, an emphasis on the advantages of a small, highly motivated group (Merigliano 1987). Additionally, participant selection must ensure both a range of participants which is as wide as possible in perspective, and a firm grounding of all participants in the pertinent subject matter (Linstone and Turoff 1975).

While the format for implementing the Delphi method may vary considerably, the basic mode of facilitation must involve some sort of "transactional mode" (Scheele 1975). In other words, a methodology must be used to structure the group communication processes, in order to obtain the most diverse and useful results (Linstone and Turoff 1975).

Most often this structuring of communication is accomplished through questionnaires; the framing of questions or "Delphi statements" (Linstone and Turoff 1975) within the questionnaires is recognized as perhaps the most difficult portion of administering a Delphi (Jones and Twiss 1978). If questionnaire wording is too concise, participants may tend to respond in too varied or lengthy a fashion; conversely, if wording is itself too lengthy, participants may find it difficult to assimilate all relevant details (Linstone 1975). It must also be kept in mind that, while questionnaires in themselves are not terribly engaging to most people, participants must remain motivated to maximize quality of their input and complete the process (Scheele 1975).

Additionally, all questionnaires following the initial one must be carefully written to reflect the basic iterative nature of the method (Jones and Twiss 1978), while still imparting the necessary structure and guidance which will ensure a smoothly run process.

With these constraints in mind, mailed questionnaires have been the more or less standard format for Delphi applications. Aside from the logistical advantages of not having to physically assemble the participants as a group, this ensures that interaction between participants is on an anonymous basis, with the intent of keeping all expressed opinions on a more equal footing (Jones and Twiss 1978), and also minimizing "the band wagon effect" and other pitfalls of face-to-face group interaction (Dalkey et al. 1972; Linstone 1975).

Many early Delphi applications utilized participant opinion to generate predictions, or for forecasting (Jones and Twiss 1978). However, the format of a "policy Delphi," which seeks as its goal to create a forum for new ideas, rather than decision making by consensus (Turoff 1975), has also developed as a frequent application. In this context, the Delphi process is used to generate contributions in terms of statements, comments, and evaluation, rather than the traditional probability statements and predictions (Turoff 1975).

The number of questionnaires used, or "rounds" of participant input, varies among Delphi processes, with as few as two (Van de Ven 1974) or as many as five questionnaires (Brockhoff 1975). In general, it appears that the generation of new ideas and the greatest convergence of opinion occur in the first five rounds (Brockhoff 1975), with little new information generally gained after the third round (Jones and Twiss 1978). On the other hand, participants limited to two rounds of input reported a low level of satisfaction with the process itself (Van de Ven 1974). The use of three or four rounds has become more or less standard for conventional Delphi processes (Linstone and Turoff 1975).

As to developing questionnaire content, of utmost importance is the sequential development of each round, based in part upon responses to the previous round or rounds. Each questionnaire should tangibly build upon the preceding one, bearing in mind that a key part of the Delphi method is the development of ideas through iteration (Jones and Twiss 1978). In effect, the group itself must dictate the content of each successive round through the sum of individual responses to preceding ones (Ludlow 1975). Likewise, it is considered paramount to focus on the process goal of stimulating new insights and "quality inputs" by steering, but not fully directing, the group process (Scheele 1975).

The case study Delphi application undertaken here, exploring the concept of naturalness in resource management, entailed a series of four questionnaires and an initial panel of forty-five volunteer participants, each of whom was recruited as an expert from various fields relevant to planning, research, or management of natural resources.

Administration of the study was carried out over an eight-month period, and involved five separate mailings: an initial contact letter, followed by four rounds of questionnaires. Participants were given three weeks to respond to each mailing, and then reminder postcards were mailed to any participant missing an assigned deadline; late responses were frequent, and preparation of a subsequent questionnaire was not possible until all responses were received, since each iterative questionnaire was based upon the results of the preceding one.

A number of books were consulted in developing the questionnaires; Alreck and Settle (1985), Dillman (1978), Labaw (1981), Linstone and Turoff (1975), Oppenheim (1966), and Delbecq, Van de Ven, and Gustafson (1986) all had a direct influence on their design and content.

Using guidelines suggested by these authors for maximizing return rates, all questionnaires were printed on standard letter-size paper, using black ink (Alreck and Settle 1985; Oppenheim 1966). Questionnaires were printed on both sides, while cover letters and other inserts were printed single-sided and left unattached (Alreck and Settle 1985). A conscious effort was made to maximize white space and provide a visually appealing form (Alreck and Settle 1985) within the limits of the letter-sized format, and room for narrative comments or questions was included on all questionnaires (Linstone and Turoff 1975). Sample pages from two of the questionnaires are displayed in appendix A.

Although the process was time-consuming, all envelopes and return envelopes were mailed using regular postage stamps instead of metering, as this has been found to be one of the more effective ways of achieving higher return rates (Alreck and Settle 1985; Oppenheim 1966). Each round of questionnaires included specific return deadlines (Alreck and Settle 1985), printed on the heading of the first page of each questionnaire. Participants who had not returned questionnaires by the indicated cut-off date were sent reminder postcards (Oppenheim 1966). Additionally, following the initial round of questionnaires, every effort was made to include handwritten notes specific to the concerns or suggestions of each participant, in part to genuinely clarify their opinions, but also to establish a level of personal rapport with each individual and thus encourage continued participation (Oppenheim 1966).

The Delphi study was initiated by compiling a list of potential participants and contacting them with a letter requesting their participation. Each person solicited for participation received a letter explaining the nature of the project and the anticipated time commitment involved in completing all four questionnaires, as well as a postage-paid reply card on which they had simply to check "yes" or "no" indicating their willingness to participate. The study design called for participants to be divided into three subgroups, and separate mailing lists were generated for each subgroup, with the hope of recruiting roughly twelve to fifteen persons for each.

For two of the three subgroups, two sets of mailings were required to recruit this level of participation. The Delphi subgroups, and the range of participant backgrounds included in each, can be summarized as follows:

The target population for the first subgroup, referred to here as the LRM subgroup, could generally be described as persons working as public land and natural resource managers. The hope was to tap a wide range of specialists within this group, but to emphasize participation by those involved in the actual management of resources. A strong emphasis was placed on recruiting personnel from the U.S. Forest Service, due to the substantial role that agency has recently taken in developing the "Ecosystem Management" approach to resource management.

The mailing list for this subgroup was generated through several sources, chiefly lists of those who had authored relevant documents and technical agency publications focusing on management of ecosystems, or had otherwise participated in the internal on-line U.S. Forest Service networks "ECONET" and "Ecosystem Management News." Additional participants were added at the suggestion of Susan Sater (U.S. Forest Service, Regional Office, Region-6) and Steven McConnell (Department of Forest Resources, University of Idaho). The common characteristic of participants in this subgroup was work focused on actual management of public lands and natural resources.

In addition to personnel from the U.S. Forest Service, those placed on the original list included personnel from the Environmental Protection Agency, the Natural Resource Conservation Service, The Nature Conservancy, the U.S. Fish & Wildlife Service, the National Park Service, and the Bureau of Land Management. Twenty-two persons were contacted from this mailing list, and fifteen persons initially agreed to participate, most of whom were with the U.S. Forest Service.

Those persons actually returning the first-round questionnaire from this subgroup listed the following when asked to describe their personal areas of expertise and training: forestry (3), ecology (3), botany (3), forest science, silviculture (2), forest ecology (2), range management, forest biometry, agriculture, biogeography, wilderness management, wildlife ecology, geography, literature, fire ecology, and business management.

The second subgroup, referred to here as the ESR subgroup, can perhaps be best described as those persons actively working in the ecological and natural resource fields from a research perspective, including those involved in more experimental approaches to resource management, such as restoration ecologists. Those solicited for participation in this group were largely from academic institutions, but also included others working as researchers, consultants or contractors in resource management-related areas. An emphasis on research-oriented work was common to many participants in this subgroup.

The mailing list for this group was mainly generated through a literature review of current work in management applications of ecological science. Previous research has found that recruitment of participants from academic institutions tends to be less successful than from industry or management (Goldstein 1975), so a slightly larger list of potential participants for this subgroup was generated than for the LRM subgroup. Thirty persons were contacted from this group, and fifteen initially agreed to participate.

For those members of this subgroup that returned the first-round questionnaire, the following were listed as areas of personal expertise and training: biology (5), botany (4), horticulture (3), plant ecology (2), forest ecology (2), zoology, range management, terrestrial community ecology, fire ecology, restoration ecology.

The third subgroup, referred to here as the PDI subgroup, is perhaps more difficult than the two preceding ones to define, in large part because it was meant to serve as a kind of catch-all group which would expand the breadth of experience and perspective brought to the process by individual participants. In general, this subgroup can be typified as including persons whose work is connected to natural resources, natural places, and the relationship between humans and the natural world, yet who would generally not fit into the two other subgroups. A common theme in the work of these participants was an emphasis on planning and design, but these fields were expansively defined, to include those with more general, although relevant, philosophical and aesthetic concerns.

As with the other two subgroups, most of those considered as potential participants for this one were authors recruited after a review of pertinent literature. A number of persons placed in this subgroup were also selected from the Environmental and Architectural Phenomenology Newsletter (EAP) membership directory. As with the ESR subgroup, a slightly larger list of candidates was compiled than for the LRM subgroup; thirty persons were contacted from the list compiled for this subgroup, and fifteen agreed to participate.

Members of this subgroup returning the first-round questionnaire listed the following areas of expertise and training: philosophy (3), art (3), architecture (2), geography, cultural geography, environmental geography, biology, ecology, political science, land-use planning, humanities, cartography, creative design, interior design, phenomenological geography, behavioral geography, restoration ecology.

The fourth subgroup was not an actual subgroup of the Delphi panel, but was used separately as a control group, to help evaluate the effectiveness of the Delphi process itself. This group participated only in the fourth, final round of the Delphi process, without benefit of the previous rounds of iterative and interactive participation. This group was recruited from the full spectrum used in assembling the other three subgroups, to represent a smaller version of the total Delphi group, rather than a distinct subgroup of its own.

Fifteen people were recruited for this subgroup, with ten agreeing to participate; of those ten, eight returned completed questionnaires, a small but acceptable number since the minimum Delphi group size is considered to be between five and seven. Of those returning questionnaires, the following areas of training or expertise were identified: biology (3), political science, forestry, hydrology, landscape architecture, wildlife biology, geography, conservation biology, environmental activism, environmental education, restoration ecology.

Additionally, table 5 summarizes participation and attrition rates for the forty-five initial participants. The "Total Delphi Group" refers to those participants in the three initially recruited subgroups. Participants in the control group are not included as part of the Delphi group proper.

Participants returning completed forms

Group                                   Questionnaire 1  Questionnaire 2  Questionnaire 3  Questionnaire 4
LRM Subgroup                            12               10               10               10
ESR Subgroup                            12               10               8                9
PDI Subgroup                            11               10               10               9
Total Delphi Group                      35               30               28               28
Control Group                           n/a              n/a              n/a              8
Return Rate for Mailed Questionnaires   78%              86%              93%              Delphi = 93%; Control = 80%
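
The return rates in table 5 can be checked as a simple ratio of forms returned to forms mailed. The following is a minimal sketch only. The table does not state how many forms were mailed in each round, so the denominators are assumptions: forty-five and ten follow from the text, while the remaining values are inferred so that the computed rates match the reported ones.

```python
# Forms returned per round, taken from table 5.
returned = {"Q1": 35, "Q2": 30, "Q3": 28, "Q4 Delphi": 28, "Q4 Control": 8}

# Forms mailed per round. ASSUMED values: 45 (initial panel) and 10 (control
# group) follow from the text; 35, 30, and 30 are inferred from the reported
# percentages and are not stated in the source.
mailed = {"Q1": 45, "Q2": 35, "Q3": 30, "Q4 Delphi": 30, "Q4 Control": 10}


def return_rate(n_returned: int, n_mailed: int) -> int:
    """Return rate as a whole-number percentage."""
    return round(100 * n_returned / n_mailed)


rates = {label: return_rate(returned[label], mailed[label]) for label in returned}
for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

Under these assumptions the computed rates agree with the last row of the table.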

Although some Delphi applications have used predetermined categories or facilitator-generated statements to "jump start" the process, policy-based applications have emphasized a "coarse filter" approach (Jones and Twiss 1978), meant to capture as widely as possible the perceptions of each individual participant (Turoff 1975). Accordingly, the first-round questionnaire was designed in an open-question format (Oppenheim 1966) to specifically elicit "personal knowledge" (Labaw 1981) and leave participants relatively unconstrained in their range of possible response.

This kind of open-question format has also been recommended for use in initial Delphi rounds in order to minimize introduction of bias by the Delphi facilitator (Linstone and Turoff 1975), and can be seen as a way of improving accuracy over precision (Labaw 1981), in that it generates a truer representation of each individual participant's position, yet is more difficult to summarize and analyze.

The first questionnaire was accompanied by a cover letter detailing the purpose of the Delphi study and thanking the group's members for their agreement to participate. The questionnaire itself consisted of a brief statement of the project's purpose and framed the question of how we might define "natural" conditions in ecosystems and landscapes, based on the participant's individual experiences and opinions. Each participant was asked to list up to seven "attributes or indicators" which he or she felt would help in defining "naturalness." The questionnaire also asked for some background information regarding the participant's education and professional work, in order to confirm the appropriateness of subgroup assignments.

The second questionnaire was designed to begin the iterative process, by asking each participant to evaluate the relative importance (Linstone and Turoff 1975) of each "Delphi statement," or suggested attribute, given as a response to the first questionnaire. The number of participants and the resulting volume of responses, however, made some condensation necessary. To this end, content analysis has been suggested as the logical tool to employ in evaluating and synthesizing Delphi participant input (Merigliano 1987). However, most forms of content analysis require pre-establishment of objective content categories (Holsti 1969), which could potentially undermine the use of an open-question format in the first round, as discussed above.

As a "middle-path" approach, responses from the first questionnaire were segregated by subgroup, and then analyzed using a form of "qualitative content analysis" (Berelson 1971), largely following methods described by Delbecq, Van de Ven, and Gustafson (1986). Basically, responses were sorted and resorted into groupings until categories emerged based upon patterns in the responses themselves. Obvious redundancies were eliminated by grouping, and then separate lists of suggested attributes, or "Delphi statements" were compiled for each of the Delphi subgroups.

A separate questionnaire was then developed for each of the three subgroups, thus utilizing the subgroups to refine ideas and develop preliminary assessments (Scheele 1975). Each Delphi statement was presented for evaluation, listed in random order, using the exact wording as submitted by participants themselves. Although this resulted in some awkward phrasing and statements which ranged from multiple sentences to single words, it was felt that preserving the original wording was crucial to minimize the introduction of bias.

To facilitate a level of quantitative analysis for Delphi, it has been argued that input solicited from participants should be obtained in as quantifiable a form as possible (Morrow 1971). To meet this goal, a five-part Likert scale of importance, defined on the cover sheet of the questionnaire, was displayed beside each statement, and the participants were asked to rate each statement's relative importance by circling the appropriate number.

The Likert scale was selected as a standard technique for attitude or opinion measurement (Alreck and Settle 1985), for the accuracy it provides as an interval scale (Delbecq, Van de Ven, and Gustafson 1986), and for the ease with which it can be used by participants (Scheibe et al. 1975). The questionnaires also included room for narrative comments beside each Delphi statement, and participants were encouraged to add comments or questions as they saw fit, thus allowing for a continuation of the more open format of the first questionnaire, despite the overall move toward closed questioning.

For the LRM subgroup, the second-round questionnaire was eleven pages in length, and listed sixty-seven Delphi statements for evaluation; for the ESR subgroup, the questionnaire was seven pages, listing thirty-eight statements; and for the PDI subgroup, the questionnaire was eleven pages, listing fifty-one statements. Questionnaires were accompanied by a cover letter, explaining that the listed statements were somewhat condensed, and asking if any particular ideas had been lost in the process. Also accompanying the questionnaire was a short summary of comments and questions submitted with return of the first questionnaire. Short handwritten notes were added to individual cover letters to address specific questions or concerns raised by participants.

Results of the second questionnaire were compiled, and mean ratings assigned to each evaluated Delphi statement. The ten top-rated statements from each subgroup were then incorporated into the third questionnaire. A few additional statements, beyond these top-rated thirty, were also retained, with the feeling that they represented a significant minority view which could prove important to the process (Turoff 1975). In compiling responses to the second questionnaire, it appeared that no participant felt that his or her ideas had been "lost" in the process of developing the second questionnaire (with several participants expressing, on the contrary, a perception of remaining redundancy), so no additional statements were added to meet this concern.
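
The selection of top-rated statements described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used to process the study data: the statement identifiers and ratings are hypothetical, and the function name `top_rated` is introduced here for convenience. The study retained ten statements per subgroup and added a few minority-view statements by judgment, a step which is not automated here.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 Likert scale, keyed by statement ID.
# These are NOT study data.
ratings = {
    "S1": [5, 4, 5, 4],
    "S2": [2, 3, 2, 1],
    "S3": [4, 4, 3, 5],
    "S4": [3, 3, 4, 3],
}


def top_rated(ratings_by_statement, n):
    """Return the n statement IDs with the highest mean rating, best first."""
    means = {sid: mean(vals) for sid, vals in ratings_by_statement.items()}
    return sorted(means, key=means.get, reverse=True)[:n]


# The study used n=10 per subgroup; n=2 keeps the example small.
selected = top_rated(ratings, n=2)
print(selected)
```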

The third questionnaire again listed the selected Delphi statements for evaluation, this time also indicating the mean rating assigned by subgroups in the previous round. The range of individual ratings was also displayed, in part responding to the concern expressed by several participants that the group process created false impressions of agreement. These kinds of information constitute standard forms of process feedback, enabling individual participants to consider the overall group stance in relation to their own (Linstone and Turoff 1975), thus facilitating the iterative and educational functions of the process (Turoff 1975).
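
As a rough illustration of this kind of process feedback, the mean and the range of ratings for a single statement might be computed as shown below. The ratings are hypothetical, and the helper name `feedback_summary` is an assumption introduced only for this sketch.

```python
from statistics import mean


def feedback_summary(scores):
    """Summarize one statement's ratings for display in the next round."""
    return {
        "mean": round(mean(scores), 1),  # group stance
        "low": min(scores),              # lowest individual rating
        "high": max(scores),             # highest individual rating
    }


# Hypothetical ratings for one statement; NOT study data.
summary = feedback_summary([5, 4, 4, 2, 5, 1])
print(summary)
```

Reporting the low and high values alongside the mean makes visible any disagreement that the mean alone would conceal.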

The third questionnaire, which was identical for all three subgroups, was eight pages in length, and listed thirty-six statements for evaluation. Again, a cover letter and a summary of comments received were mailed with the questionnaire, including responses to several content and procedural questions which had been raised. Personal handwritten notes were added to most cover letters, sometimes asking for clarification of comments or addressing specific questions, and reassuring those who had expressed frustration with their perceived "minority status" that their opinions were of continuing importance to the process.

The fourth questionnaire was in many ways similar to the third questionnaire. A further narrowing of Delphi statements was achieved by selecting only those statements with a mean group rating of 3.5 or above in the third questionnaire responses for further consideration. (A rating of "3" equated to a moderate evaluation of importance, and a rating of "5" was the highest possible score.) For feedback, mean ratings from both the second and third questionnaires were displayed, enabling participants to review not only the recent group rating, but also how that group opinion had shifted between iterations. Again, the participants were asked to rate each statement for its importance as a final evaluation.
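
The narrowing step and the between-round comparison can be sketched as below. Only the 3.5 cutoff comes from the text; the statement identifiers and mean ratings are hypothetical.

```python
# Hypothetical mean ratings per statement; NOT study data.
round2_means = {"S1": 4.1, "S2": 3.2, "S3": 3.9}
round3_means = {"S1": 4.4, "S2": 3.4, "S3": 3.5}

CUTOFF = 3.5  # statements at or above this third-round mean are retained

retained = [sid for sid, m in round3_means.items() if m >= CUTOFF]

# Shift in group opinion between iterations, for the retained statements.
shift = {
    sid: round(round3_means[sid] - round2_means[sid], 1)
    for sid in retained
}
print(retained, shift)
```

A statement sitting exactly at the cutoff is retained here, following the "3.5 or above" wording.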

Additionally, the fourth questionnaire included a series of statements written to capsulize participant comments expressed in the first three rounds. These statements were presented for group evaluation on a Likert scale, but in terms of agreement (Oppenheim 1966) instead of importance. In part this was a response to the observation of several participants that rating statements on importance could often be confused with the idea of agreement. Although this was a departure from the stringent ongoing effort to minimize facilitator bias, it was hoped that more precise wording could isolate key concepts and clarify the basis for differing participant perspectives. Moreover, this introduction of possible bias was restricted to the second half of the questionnaire, as the first half maintained the original participant-generated wording of the statements.

Another departure with the fourth questionnaire was the addition of a fourth group of participants, a control group. This group was selected from a list of potential "alternates" which would have been tapped if return rates for the other subgroups had dropped below the acceptable minimum. As such, this fourth group represented persons of similar backgrounds, selected in the same manner, as the initial Delphi group. This subgroup was given the same questionnaire as the Delphi group, but without benefit of participation in the previous iterative rounds.

The fourth questionnaire was identical for all four subgroups, and was nine pages in length, listing forty-two statements for evaluation; twenty-one statements in the first portion, carried over from the third questionnaire, and twenty-one "facilitator-generated" statements introduced for the second portion. As with the previous rounds, questionnaires were accompanied by cover letters and summaries of the narrative comments submitted with the previous questionnaire. Participants were also asked to briefly evaluate the overall process itself and invited to summarize their own final opinions in narrative form.

Although the literature offers some fairly consistent guidelines for administration of a Delphi process, analysis of results from various processes has taken a number of different approaches. For this reason, a brief discussion of available options and the reasons for settling upon particular ones is in order. For many Delphi studies, particularly those involving forecasts, analysis of participant responses has been made by focusing on the interquartile range and median scores (Helmer 1968; Morrow 1971), or percentiles and rank-order (Bradley 1977; Jillson 1975). Alternatively, some Delphi facilitators have also turned to mean scores and the distribution of raw scores for analysis (Bardecki 1984; Bradley 1977; Cherrett 1988).

Use of the interquartile range is perhaps the most common approach to analysis of responses found in the Delphi literature. Although this approach is relatively simple to compute and lends itself to graphic display very well, the interquartile range is, in a broader context, chiefly used when distributions are heavily skewed and the central tendency is of primary interest, since the interquartile range focuses on the more normal midrange (Phillips 1996). Bearing this in mind, it can be understood why use of the interquartile range in relation to Delphi data has been criticized as an approach which creates an artificial appearance of agreement, or "forced consensus" (Sackman 1975), since all scores outside of the midrange are ignored. Likewise, focusing on median scores can be seen as a way of implying false levels of agreement, since, unlike the mean, median scores are insensitive to extreme scores (Phillips 1996), which in effect represent dissent which may exist within the group (Sackman 1975).
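
A small worked example may make this criticism concrete. The ratings below are hypothetical, and the quartile convention (the "inclusive" method of Python's standard `statistics.quantiles`) is an assumption; other quartile conventions can give slightly different values.

```python
from statistics import mean, median, quantiles

# Hypothetical ratings: seven participants cluster at 4-5, two dissent at 1.
# These are NOT study data.
scores = [1, 1, 4, 4, 4, 4, 4, 5, 5]

# First and third quartiles; the middle cut point (the median) is discarded.
q1, _, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print("median:", median(scores))
print("interquartile range:", iqr)
print("mean:", round(mean(scores), 2))
print("full range:", min(scores), "to", max(scores))
```

In this example the median is 4 and the interquartile range is zero, which could be read as complete agreement, while the mean of about 3.56 and the full range of 1 to 5 still register the two dissenting ratings.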

In a similar way, the use of rank-order can be seen as problematic in that this approach simplifies the data derived from participant ratings; using rank-order effectively reduces interval data to ordinal data, thus limiting possibilities for statistical analysis (Downie and Heath 1970). This seems in particular to be an undesirable approach when working with responses given on Likert scales, since Likert scales by design lend themselves to more flexible statistical interpretation than would direct rank-ordering (Alreck and Settle 1985). Finally, much as the use of median or interquartile scores can be seen to imply artificial consensus, rank order and percentile scores can be seen as potentially affording false precision, as widely dispersed or clustered scores may be converted to evenly spaced intervals (Scheibe et al. 1975).
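
The loss of information involved in rank-ordering can be illustrated with a brief sketch. The mean ratings are hypothetical, and ties are not handled.

```python
# Hypothetical mean ratings per statement; NOT study data.
means = {"S1": 4.8, "S2": 4.7, "S3": 2.1}

# Rank 1 = highest mean. Ties are not handled in this sketch.
ordered = sorted(means, key=means.get, reverse=True)
ranks = {sid: position for position, sid in enumerate(ordered, start=1)}

print(ranks)
```

The first two statements differ by only 0.1 on the rating scale, and the second and third by 2.6, yet the resulting ranks of 1, 2, and 3 are evenly spaced.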

It may be further noted that, although each of the measures discussed above is widely and properly used in social science applications, they are more properly used in applications evaluating individual scores in the context of a larger group (Phillips 1996), as opposed to evaluating characteristics of a group as a whole. In contrast, Delphi is seen as a group process (Linstone and Turoff 1975), and it is analysis of results in a group context which assures the "retention of the pluralistic nature of responses" (Bardecki 1984) inherent in a truly group exercise. The present study in particular seeks to evaluate the range of existing opinion regarding the subject of naturalness. For these reasons, analysis in the present study has centered on consideration of mean scores and the distribution of raw scores, rather than on rank-order, percentile, median, or interquartile descriptions.

A number of different questions must be addressed in analysis of Delphi results. While key questions obviously will center on understanding how the group as a whole views the question of "naturalness," some questions also relate to the group process itself. The following outlines these questions and how each will be addressed through analysis.

The responses received for the first-round questionnaire were summarized using qualitative and quantitative content analysis. Each suggested attribute or indicator of naturalness, referred to as a "Delphi statement," was studied to determine its intended message; this analysis was then synthesized to develop a list of underlying themes, or "thematic content categories," which could describe the content of the entire body of statements. Thirty-one categories of thematic content were identified. Frequency of occurrence among the Delphi statements was then calculated for each thematic content category, and the most frequently occurring themes were identified. Additionally, a cross-referencing of individual Delphi statements with the thematic categories they encompassed was compiled, for use in discussion of the content of specific statements.
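The frequency count and cross-referencing described above can be sketched as follows. The coding itself was a qualitative judgment and is not reproduced here; the statement numbers, category assignments, and function names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coding of statements into thematic categories
# (statement id -> category numbers); NOT the study's actual coding.
coded_statements = {
    101: [1, 2],
    102: [2],
    103: [2, 5],
    104: [1],
}


def theme_frequencies(coded):
    """Count the number of statements in which each thematic category occurs."""
    counts = Counter()
    for categories in coded.values():
        counts.update(set(categories))  # count a category once per statement
    return counts


def cross_reference(coded):
    """Map each thematic category to the statements that encompass it."""
    index = defaultdict(list)
    for statement_id, categories in coded.items():
        for category in sorted(set(categories)):
            index[category].append(statement_id)
    return dict(index)


frequencies = theme_frequencies(coded_statements)
index = cross_reference(coded_statements)
print(frequencies.most_common())
print(index)
```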

The most obvious direct result of participant input in this Delphi process was the mean group rating of each submitted Delphi statement on five-part Likert scales of "importance" (Alreck and Settle 1985). Even with lack of consensus, or even with a lack of general agreement, relatively high or low ratings on the scale of importance can be taken as indicative of the overall group opinion regarding specific concepts or parameters involved in defining naturalness. However, a few limitations inherent in the data must be kept in mind.

Overall ratings of importance for specific Delphi statements were calculated and analyzed in terms of mean scores, instead of median scores, rank-order, or quartiles, for the reasons previously discussed. Even analysis of these mean scores, however, must be considered cautiously, since "equal" scores do not actually indicate "equal" attitudes (Oppenheim 1966). It must additionally be noted that not all statements were evaluated for the same number of iterations; statements which received low scores were dropped in each subsequent questionnaire, so that some statements were evaluated only in questionnaire two, some in questionnaires two and three, and some in all three of the iterative rounds.

These considerations necessarily limit the appropriateness of using many statistical methods for evaluating the mean-score ratings assigned by the group as a whole; for these reasons, the data will not be tested for significance of difference between calculated mean scores of individual statements. Since a five-part Likert rating scale was used for the participant evaluations, it was deemed most appropriate to discuss the mean scores in relation to those Likert scale based categories, rather than in terms of rank order. However, a more general discussion of relatively high- and low-ranking statements serves as a portion of analysis as well, bearing in mind that these scores are valid in relative terms only.

Although a common result of the iterative Delphi process is some degree of convergence in opinion, true consensus is typically unlikely (Sackman 1975). Particularly in regard to defining the quality of naturalness, it can be assumed that, despite varying amounts of agreement, some range of distinct opinion will remain, and identifying this range of opinion is of as much importance as is identifying areas of agreement. The range of opinion was therefore a focus of analysis, and was measured in three ways: first, by examining the actual range of scores assigned by individual participants; second, by examining the standard deviation among those individual scores; and third, by examining relative levels of agreement or disagreement, through the use of a "collapsed scale" analysis, using a modification of procedures developed by Bradley (1977).

The use of participant subgroups is seen in part as a method for improving the overall process by "previewing" agreement among those with more allied viewpoints (Scheele 1975). Additionally, however, use of subgroups allows for comparison between these subgroups, to evaluate underlying differences (Jones 1975; Ludlow 1975). The three subgroups used in this study represent three distinct aspects of natural resource management: those involved in planning, research, and management. If there is a general trend for those approaching resources from these three groups to differ in their implicit understanding of concepts of naturalness, then those differences might prove to be sources of inter-disciplinary conflict or misunderstanding.

To explore these possible differences, mean ratings were compiled for each statement on each questionnaire, both as a total Delphi group score, and as separate subgroup scores. Overall trends in differences between these subgroup scores were examined and displayed graphically. These mean scores were then compared through analysis of variance (ANOVA) and Scheffé post-hoc analysis (Kranzler and Moursund 1995). This portion of data analysis also included both an overall comparison of mean scores and a comparison of mean scores separated by the subgroup origin of particular statements, as another way to detect areas of inter-disciplinary difference.
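A minimal sketch of a one-way ANOVA followed by Scheffé pairwise comparisons is given below, using the textbook formulas rather than any particular software. The ratings, subgroup labels, function names, and the critical value (3.89, the tabled F for 2 and 12 degrees of freedom at the .05 level) are assumptions for illustration only and are not data from this study.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a dict of name -> scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def scheffe_pairs(groups, ms_within, df_between, f_critical):
    """Flag each pair whose Scheffe statistic exceeds df_between * f_critical."""
    flagged = {}
    for a, b in combinations(groups, 2):
        diff = mean(groups[a]) - mean(groups[b])
        stat = diff ** 2 / (ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        flagged[(a, b)] = stat > df_between * f_critical
    return flagged


# Hypothetical ratings of one statement by three subgroups; NOT study data.
ratings = {
    "LRM": [2, 2, 3, 2, 3],
    "ESR": [4, 4, 5, 4, 3],
    "PDI": [4, 5, 4, 4, 5],
}
F_CRITICAL = 3.89  # assumed: tabled F(2, 12) at alpha = .05

f_stat, df_b, df_w, ms_w = one_way_anova(ratings)
pairs = scheffe_pairs(ratings, ms_w, df_b, F_CRITICAL)
print(round(f_stat, 2), df_b, df_w)
print(pairs)
```

In this invented example the overall F exceeds the critical value, and the post-hoc step locates the difference between the first subgroup and each of the other two, but not between the latter two.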

One process-oriented question which arises through use of the Delphi technique concerns the iterative nature of the questionnaires and the ensuing group dynamics. In brief, this concern can be encapsulated by the following question: does the iterative process, and the resultant intermingling of divergent opinions, cause any shift in group opinion, or does the overall mix of individual opinion generally persist throughout the process?

This question is often stated as a matter of the convergence of opinion (Brockhoff 1975; Dalkey et al. 1972), that is, the degree to which divergent views move toward a more common ground. Convergence has often been defined as narrowing of the interquartile range (Helmer 1968; Jones 1975), but can also be viewed as a narrowing of standard deviations (Bardecki 1984), an approach which fits with this study's use of mean scores.
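Convergence viewed as a narrowing of standard deviations can be sketched as a simple difference between rounds. This is an illustration of the idea only, not a reproduction of Bardecki's (1984) procedure; the ratings and the function name are hypothetical.

```python
from statistics import stdev


def sd_change(earlier_round, later_round):
    """Change in sample standard deviation; a negative value means narrowing."""
    return stdev(later_round) - stdev(earlier_round)


# Hypothetical ratings of one statement in two successive rounds; NOT study data.
round_three = [1, 2, 3, 4, 5]
round_four = [2, 3, 3, 3, 4]

change = sd_change(round_three, round_four)
print(round(change, 3))
```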

In development of the Delphi technique itself, convergence was used to determine the number of iterations or rounds of input which could be seen as producing meaningful results, and it was widely recognized that convergence tends to occur within the first few rounds of questionnaires, with generally little change occurring after the third round (Jones and Twiss 1978). Convergence is also sometimes used as a measure of how effective a specific application of Delphi has been, in that, for certain applications, convergence or consensus building can be seen as a process goal in itself (Morrow 1971).

In a more applied sense, however, convergence can also be seen as a way of judging the effect of the Delphi process upon participant opinion, from which we may infer how effectively, and in what manner, expert opinion on the matter at hand can be altered through better interpersonal or interdisciplinary communication (Turoff 1975). It is mainly in this light that the present study examined convergence. Convergence was not seen as a process goal per se, but instead as a measure of what types of change have occurred and, by inference, what types of change in perception can generally be expected as experts grapple explicitly with the more-often implicitly defined concept of naturalness.

Three primary methods were used for this portion of analysis: first, an overall look at the kinds and magnitudes of change which occurred for statement mean ratings between questionnaire rounds; second, an examination for statistical significance of differences between mean scores, using the standard t-test; and third, comparison of the change in standard deviations of mean scores between rounds of questionnaires, following procedures developed for Delphi analysis by Bardecki (1984).

In addition, mean ratings assigned by a control group, which participated only in the fourth-round questionnaire, were also used, through comparison against mean ratings assigned by the Delphi group, as a way of studying the effect which the group process may have had on ratings of importance. For this comparison, the t-test was again used, for evaluating the significance of differences between these pairs of mean scores.
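A comparison of this kind can be sketched with the pooled-variance t statistic for two independent samples. The text does not specify which form of the t-test was used, so the independent-samples form is an assumption here, as are the ratings, the function name, and the critical value (2.306, the tabled two-tailed t for 8 degrees of freedom at the .05 level).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, df


# Hypothetical ratings of one statement; NOT data from this study.
delphi_group = [4, 4, 5, 4, 3]
control_group = [3, 2, 3, 4, 3]
T_CRITICAL = 2.306  # assumed: tabled two-tailed t for df = 8 at alpha = .05

t_stat, df = pooled_t(delphi_group, control_group)
significant = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, significant)
```

With these invented ratings, a one-point difference in means falls just short of significance, which shows how small samples limit what such comparisons can establish.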

Although the Delphi process attempts to be as unbiased and as quantifiable as possible, there are hazards in such an approach, chiefly in that quantifying responses in terms of group opinion risks the imposition of a "forced" consensus (Sackman 1975) and the loss of minority or dissenting individual opinions which, although they may be far from the group norm, are nevertheless equally "correct" when all offered opinions are viewed as value judgments (Morrow 1971).

This study attempted to avoid some of these problems by not making use of median scores or interquartile ranges in analysis, and by focusing as much on the range of opinion as on the group-assigned mean scores. However, analysis methods used still placed the greater emphasis on group opinion rather than on individual opinion. This is, in fact, a key aspect of Delphi, in that it is by design a group process, not a survey of individual opinion. Nevertheless, participants were also encouraged to submit narrative comments throughout the process, and the response in this regard was substantial; the second-round questionnaire alone generated over 350 substantive comments. While compilation and analysis of this volume of response was not feasible within the larger context of this study, a brief compilation of narrative comments was made. These were analyzed in a mainly qualitative fashion, and then compared with overall group process results.

Viewed in the context of group opinion, the breadth of diversity in initial participant input may be regarded as fairly moderate. In the first-round questionnaire responses, thirty-five participants submitted 156 separate Delphi statements; however, through content analysis, the message-content of these statements was reduced to thirty-one thematic content categories. This amounts to an average of less than one thematic concept, or one "new idea" per participant.

However, when regarded in the context of individual opinion, this initial participant input may be conversely regarded as having been fairly diverse; only six of the thematic content categories occurred within ten or more of the submitted Delphi statements, and the majority of content categories occurred in fewer than five statements.

In short, the majority of thematic content categories describe what were, at least initially, minority opinions, advocated in response to the first-round questionnaire by fewer than five of the thirty-five respondents.

While many of the thematic content categories identified themes which were raised by only a few individual participants, several nevertheless were found to occur with relatively great frequency. Those most frequently occurring, each arising among at least ten of the submitted Delphi statements, included:

Although some of these themes did not ultimately receive high ratings of importance from the group as a whole, they can nonetheless be taken as indicative of the concepts which initially were most widely associated with that of naturalness by the Delphi participants, and are perhaps thus indicative of relatively widely held perspectives on naturalness as well. A more detailed discussion of the content analysis results, including tables describing all thematic categories, and a cross-referencing of those categories with individual Delphi statements, is found in appendix B.

A central focus of the case study was identifying which Delphi statements, and which attributes or indicators suggested by them, were regarded with the highest levels of overall "importance" for defining naturalness by the Delphi group participants. This was determined by calculating final mean group scores, as assigned through the process of iterative evaluation, using the five-part Likert scale of relative "importance." Those statements receiving the highest overall mean ratings were then identified and analyzed, both in their entirety and through the thematic content analysis discussed previously.

In this context, several statements, and their underlying concepts, seem to have been regarded by the participants as a whole as being of relatively great importance in conceptualizing or defining naturalness. The following discussion summarizes analysis of these statements and themes; more detailed analysis and discussion is found in appendix C, including tables listing the text of all evaluated Delphi statements, their final mean group ratings, and the thematic content categories they entailed.

Thematically, several underlying concepts were prevalent among those Delphi statements receiving high mean ratings of importance. Statements emphasizing the importance of understanding ecosystems in terms of processes (content analysis thematic category 2) accounted for ten of twenty-eight statements rating 3.5 or above on the importance scale, while statements emphasizing the absence of human interference as defining naturalness (thematic category 1) arose among eight of those statements rating 3.5 or above. These were by far the most frequently occurring themes among highly rated statements.

Thematic category 3 (emphasis on ecosystem composition), category 5 (emphasis on disturbance or stochasticity in ecosystem dynamics), and category 6 (emphasis on the concepts of native or indigenous species) each occurred among four statements rated 3.5 or above.

Thematic category 7 (the idea that the influences of specific human cultures may be considered either natural or non-natural in specific areas) and category 8 (the idea that naturalness can be defined by comparison with a baseline area), each occurred in three of these most highly rated statements.

No other thematic category arose more than twice among the twenty-eight statements which received ratings of 3.5 or better, although twenty different thematic categories (out of a total of thirty-one) were encompassed by those top twenty-eight statements.

The five top-rated Delphi statements received final mean group ratings within a range of 4.3 to 4.1, with a rating of "4" defined as "Substantially Important, May Be a Determining Factor" on the five-part Likert scale. There were also five additional Delphi statements which received final mean ratings of 3.9. For ease of discussion, these ten top-rated statements are listed below; a full listing of all statements rated 3.5 or above can be found in table C-1.

"Natural landscapes" or "natural ecosystems" tend to exhibit the diversity of flora and fauna that were in existence prior to industrialization or the impact of dense human population pressures. Humans may be natural, but have tremendous potential to alter the environment. "

Although the above statements focus on the previously discussed most frequently occurring thematic categories, what is perhaps more noteworthy is that they also encompass several other themes which occurred with little frequency: the concepts of ecosystem integrity, self organization, and resilience (statement #132); the concept of human subsidy of ecosystems, or the lack thereof (statements #132 and #148); and the concept of co-evolution (statement #215).

These concepts are variously stated and restated among the top-rated statements, along with the more frequently occurring ones of: an emphasis on the role of ecosystem processes (statements #127, #216, and #160); an emphasis on ecosystem composition (statements #201, #217, #218, and #319); an emphasis on the role of native species (statements #201, #215, and #319), and an attempt to delineate some human influences as natural and others as not (statements #201 and #319). Taken as a group, these can be considered as constituting those concepts regarded by the Delphi participants as most important for defining naturalness.

Although the Delphi process focused chiefly on participant evaluation of Delphi statements which had been submitted by the participants themselves, the fourth-round questionnaire included a section in which facilitator-generated statements were also evaluated, in an attempt to clarify participant attitudes toward what seemed to be emerging as key concepts.

Twenty-one facilitator statements were evaluated by participants, using a five-part Likert scale rating levels of "agreement" with the statements: a rating of 5 meaning "strongly agree," a rating of 3 being "neutral," and a rating of 1 meaning "strongly disagree."

These twenty-one statements sought to address the following broader issues: the relationship between humans and the naturalness of ecosystems, the issue of whether naturalness may be restored or promoted through human actions, the importance of understanding ecosystems in terms of process and composition, the concept of species being native or non-native to specific systems, and the underlying apparent ambiguity encountered in attempts to define naturalness. In general, each of the issues listed above were approached through two or more facilitator-generated statements, in an attempt to isolate key aspects of the issue.

The idea that humans are at some level a "natural" part of their own environments was strongly endorsed in response to statements #420 (mean group rating of 3.8), #417 (mean rating of 4.2), #416 (mean rating of 4.0), and #407 (mean rating of 3.9), while the idea that a natural system is "pristine" in regard to any human influence was rejected (with a slightly less than neutral rating of 2.9). Moreover, the proposition that all human influences should be regarded as natural was rejected even more firmly, with a mean group rating of only 2.2. Plainly, while the importance of regarding humans as at some level being "natural" was affirmed, it is also true that the concept itself was not generally interpreted in a strict sense.

In fact, a general trend was found of participants assigning higher levels of agreement to statements which were less extreme or more qualified in their wording, while statements phrased in blanket or absolute terms received low scores. This pattern held not only in regard to defining the relationship between humans and nature, but also in regard to the issues of the restorability of naturalness, the importance of considering ecosystems in terms of process and composition, and the overall difficulties inherent in attempting to characterize systems as relatively natural or not.

Departing from this general trend were the ratings of a statement equating the concept of "native" species with naturalness, and a statement that certain levels of human technology mark the departure point at which human influence no longer constitutes a natural process. For both of these, despite overall high ratings of importance received by related Delphi statements, these facilitator-generated statements were given neutral group ratings of agreement. This would seem to indicate that, although the underlying concepts are acknowledged as important, a general ambivalence in how to apply them exists among participants.

Evaluation of facilitator-generated statements also affirmed the relatively low ratings several concepts had been given in the Delphi process evaluations. Statements linking the quality of naturalness to: concepts of ecosystem health, the use of a comparative baseline or reference area, the general history of the site or system itself, the greater qualitative context of the site or system, and the intent behind human actions influencing the site or system, all received ratings at or near the neutral point.

While the iterative character of the Delphi process is designed to progressively isolate statements which are most highly regarded by the participants, it is also worth noting certain concepts which arose in the initial participant input and received relatively low ratings, thus dropping from further consideration.

Although in many cases particular themes may have occurred in statements receiving both relatively high and low scores, some simply fared poorly in all of their occurrences. Among the thirty-one thematic categories identified for content analysis, six in particular were regarded by the group as a whole as consistently having low ratings of importance.

Thematic category 4 (naturalness as an issue of relationship or context), although having a high initial frequency of occurrence, had only 1 occurrence among statements rated 3.5 or above--and that single occurrence was in a very specific ecological context. Likewise, category 17, involving the related concept of "authenticity," although having 4 initial occurrences, did not occur at all in statements rated 3.5 or above.

Six other thematic categories, while occurring among the Delphi participant statements, failed to occur among those statements receiving final mean group ratings of 3.5 or above. These were: category 29 (the concept of ecosystem health), category 21 (an emphasis on spatial and temporal scale), category 16 (the concept of wildness), category 11 (naturalness described in terms of aesthetic or spiritual qualities), category 30 (general comments regarding the complexity of the naturalness issue), and category 31 (specific illustrative examples of natural and/or non-natural places or systems).

Two of these categories were also the subject of facilitator-generated statements. The idea of context was raised in statement #413, which basically paraphrased statements by Ian McHarg (1969) and Edward Relph (1976) on the importance of context and "fitness," and received a nearly neutral rating of 3.2. This seems to confirm the relative lack of importance with which the group as a whole regarded the concept. Likewise, statement #409, which restated the importance of the concept of ecosystem health, received a somewhat negative score of 2.4. In both of these cases, group ratings assigned to the facilitator-generated statements seem to confirm the overall low evaluations given these concepts by the whole Delphi group.

In view of the fact that all other thematic categories raised by individual participants in response to the first-round questionnaire occurred at least once among those statements rated 3.5 or above, the low ratings assigned to these eight themes seem significant. This significance, moreover, is perhaps best framed in the context that these were concepts rated poorly by the group, but were also ideas put forth by one or more participants as a key attribute of naturalness. In this respect, they represent concepts held in low regard by the group, but also likely represent a minority view of what are to some individuals nonetheless important ideas.

Three approaches were used to evaluate the range of individual participant opinion: analysis of the range of individual ratings, analysis of standard deviation among those ratings, and analysis of overall patterns of agreement or disagreement. This portion of the analysis, discussed in detail in appendix D, may be summarized as follows:

Examining individual ratings, assigned through the evaluative portions of the Delphi process, most Delphi statements received ratings across the full range possible on the five-part Likert scale. Even among those statements rating favorably enough to be carried through to the fourth-round questionnaire, seventeen of twenty-one statements received the full range of possible ratings, and the remaining four statements received ratings over a range of four of the five possible values. In sum, individual ratings indicate that the extremes of opinion generally varied over the full range.

Likewise, examination of standard deviation shows a generally wide degree of variation, with most statements having values of s > 1.0. Even among statements receiving the highest final mean group ratings of importance, only one statement had s < .75. Given that about one-third of all statements had final mean ratings in a range between 4.3 and 3.5, standard deviations consistently greater than 1.0 are obviously indicative of variation which is both substantial and widely based.

In regard to overall trends of agreement or disagreement, when the five-part Likert scale was "collapsed," or recoded into a three-part scale, it was found that true consensus did not occur in the ratings for any Delphi statement. However, mixed patterns of relative "agreement," defined as a 70% or greater plurality of opinion (as discussed in appendix D), and disagreement did arise.
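The collapsed-scale classification can be sketched as follows. The actual recoding follows a modification of Bradley's (1977) procedures and is detailed in appendix D; the mapping used here (1 and 2 as low, 3 as neutral, 4 and 5 as high) is an assumption, as are the ratings and the function name. Only the 70% threshold is taken from the text.

```python
from collections import Counter

# Assumed recoding of the five-part scale into three parts.
COLLAPSE = {1: "low", 2: "low", 3: "neutral", 4: "high", 5: "high"}


def collapsed_pattern(ratings, threshold=0.70):
    """Classify ratings as 'agreement' if one collapsed category holds >= threshold."""
    counts = Counter(COLLAPSE[r] for r in ratings)
    category, top_count = counts.most_common(1)[0]
    share = top_count / len(ratings)
    label = "agreement" if share >= threshold else "disagreement"
    return label, category, share


# Hypothetical individual ratings for two statements; NOT data from this study.
clustered = [5, 4, 4, 4, 5, 4, 3, 2, 4, 5]
dispersed = [1, 2, 3, 4, 5, 3, 4, 2, 5, 1]

clustered_result = collapsed_pattern(clustered)
dispersed_result = collapsed_pattern(dispersed)
print(clustered_result)
print(dispersed_result)
```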

Among the second-round questionnaire results, in which participants were evaluating only those statements originating from their subgroup peers, general trends of agreement were identified for thirty-two statements, while a trend of overall disagreement was identified for eighty-six statements.

For the third-round questionnaire, in which the entire Delphi group assessed those statements from all three subgroups, nine statements had a pattern of agreement, while twenty-seven were found to have a pattern of disagreement. In the final, fourth-round evaluations, ten statements had a pattern of agreement, while eleven had a pattern of disagreement. Although this analysis of trends does seem to indicate somewhat of an increasing pattern of agreement, it should be kept in mind that only those statements receiving the highest mean group ratings were carried forward into subsequent questionnaire rounds.

In essence, the collapsed scale analysis indicates that, although consensus per se did not occur, overall patterns of agreement did emerge for a number of the Delphi statements. However, even among the twenty-one Delphi statements which received the highest mean group ratings, only ten exhibited an overall pattern of agreement. In more general terms, agreement occurred for only a small portion of those statements evaluated, with a broad level of disagreement being by far the more common pattern.

In sum, all other results from the group process must be tempered by acknowledgment that the range of opinion was wide, that variation among individual ratings was substantial, and that true consensus did not emerge for any of the Delphi statements. While both relatively high mean ratings and overall trends of agreement may be indicative of an idea or concept's importance in a general or normative sense, there was in fact no definitive level of agreement for any given aspect of naturalness evaluated through the case study project.

A substantial portion of the case study analysis involved evaluating differences between the three subgroups of Delphi participants: the Land and Resource Management (LRM) subgroup, the Ecological Science and Research (ESR) subgroup, and the Planning, Design, and Innovation (PDI) subgroup. Methods used included applications of ANOVA, Scheffé post-hoc analysis, and content analysis. These aspects of analysis are detailed in appendix E, including graphic depiction of overall trends and tables summarizing results of the statistical tests used.

To identify possible general trends of difference between subgroups, mean subgroup scores were first plotted and compared graphically. And, indeed, different patterns appeared to emerge in the way each of the three study subgroups rated the importance of the Delphi statements.

For a majority of statements in the third-round questionnaire, the LRM subgroup assigned the lowest mean ratings of any subgroup, while conversely the PDI subgroup assigned a majority of the highest ratings. The LRM subgroup assigned the lowest subgroup ratings for thirteen of the evaluated statements, and the highest for only six; in contrast, the PDI subgroup assigned the lowest subgroup ratings to only six statements, and the highest to eighteen. The ESR subgroup exhibited yet a different pattern, assigning the lowest ratings to ten statements and the highest to seven.

In the fourth-round questionnaire, the LRM subgroup again assigned the lowest mean ratings of any subgroup to thirteen statements, and the highest ratings to only two. In somewhat of a shift, the ESR subgroup assigned the lowest rating to only two statements, and the highest to eight, while the PDI subgroup assigned the lowest ratings to six statements and the highest to nine.

In sum, the LRM subgroup consistently assigned a majority of the lowest subgroup ratings, while the PDI and ESR subgroups together accounted for nearly all of the highest. These trends are illustrated in appendix E, by figures E-1 and E-2.
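The tallies reported above amount to counting, for each statement, which subgroup assigned the lowest and which the highest mean rating. A minimal sketch follows; the statement numbers, subgroup means, and function name are hypothetical, and ties are resolved in favor of the first subgroup listed.

```python
from collections import Counter

# Hypothetical subgroup mean ratings per statement; NOT data from this study.
subgroup_means = {
    301: {"LRM": 3.1, "ESR": 3.6, "PDI": 3.9},
    302: {"LRM": 3.4, "ESR": 3.2, "PDI": 3.8},
    303: {"LRM": 2.9, "ESR": 3.7, "PDI": 3.5},
}


def tally_extremes(means_by_statement):
    """Count how often each subgroup gave the lowest and the highest mean rating."""
    lowest, highest = Counter(), Counter()
    for means in means_by_statement.values():
        lowest[min(means, key=means.get)] += 1
        highest[max(means, key=means.get)] += 1
    return lowest, highest


lowest, highest = tally_extremes(subgroup_means)
print(dict(lowest))
print(dict(highest))
```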

Looking separately at those statements originating from the input of LRM subgroup participants, the LRM subgroup assigned the highest subgroup ratings to five of thirteen statements. Thus, of the six statements in this questionnaire round which were assigned their highest subgroup rating by the LRM subgroup, five were those originating from its own ranks. Likewise, in the fourth-round questionnaire the LRM subgroup assigned the highest ratings to three of the eight statements originating from its own subgroup. It should be noted, however, that even in evaluation of statements originating from its own ranks, the LRM subgroup assigned a lower mean subgroup rating than one or both of the other subgroups for a majority of those statements. These trends are displayed in appendix E, figures E-3 and E-4.

Turning to the third-round questionnaire assessment of statements originating from ESR subgroup members, the LRM subgroup consistently assigned relatively low ratings, with the lowest mean subgroup ratings for ten of the twelve statements evaluated. In contrast, the PDI subgroup assigned the highest ratings for seven of the twelve, and the ESR subgroup assigned the highest ratings to the remaining three.

In regard to the LRM subgroup, this trend continues into the fourth-round questionnaire, with the LRM subgroup assigning the lowest ratings for seven of nine statements carried forward, while in regard to the ESR and PDI subgroups there is a slight shift, with the ESR subgroup assigning six and the PDI subgroup assigning four of the highest ratings.

A distinct pattern thus emerges in which the LRM subgroup is consistently the most skeptical in its assessment of statements originating from the ESR subgroup, while both the ESR and PDI subgroups make relatively comparable levels of positive assessments. These trends are illustrated in appendix E, by figures E-5 and E-6.

Among those statements originating from PDI subgroup members, in the third-round questionnaire the PDI subgroup itself assigned the highest subgroup ratings for ten of eleven statements, while the LRM and ESR subgroups both assigned similarly lower ratings for the majority of statements. This trend more or less carries into the fourth-round questionnaire, although the small number of statements--four--receiving sufficiently high mean group ratings to be brought forward to this questionnaire round makes a discussion of trends somewhat problematic. Trends for statements originating from the PDI subgroup are illustrated in appendix E, by figures E-6 and E-7.

Beyond noting general trends of difference between the participant subgroups, of interest is the identification of specific concepts around which these subgroup opinions might diverge. The subgroup ratings of nine Delphi statements showed statistically significant differences (through ANOVA and Scheffé post-hoc analysis), and these differences may best indicate the source of divergent perspectives held by the three subgroups. A detailed discussion of these statements and the differences between subgroup ratings is found in appendix E. In general, two specific areas of disagreement emerged as prominent in this context.

First, concepts of ecosystem self-regulation or self-organization seem regarded as highly important by both the ESR and PDI subgroups, but are regarded quite skeptically by most members of the LRM subgroup. This is perhaps the most interesting difference between subgroups, in that it may indicate a basic difference in the way in which resource managers understand the concept of natural ecosystems, or perhaps the ecosystem concept itself.

While the LRM subgroup did embrace the importance of a focus on system processes, much as did the ESR and PDI subgroups, they nevertheless appeared to do so short of actually adopting this related key element of systems theory.

Second, the subgroups appear to differ significantly in the way they approach the relationship between humans and nature. The ESR subgroup seems to place a relatively high level of importance on distinguishing natural ecosystems by their lack of human interference or control, while the LRM subgroup, in contrast, seems to place a great degree of importance on the idea that human influence is an inherent part of all natural systems. The PDI subgroup, unlike the other two subgroups, embraces both of these perspectives to a fairly high degree, apparently demonstrating a certain willingness to deal with this issue as a substantive yet ambiguous point of distinction.

This second focal point of subgroup differences can perhaps be linked to the first, in that the LRM subgroup's tendency to regard human influence as inherent to natural ecosystems may in part account for their tendency to discount the systems theory implications of system self-organization. An interesting distinction between these areas of disagreement, however, is that the first is specific to components of ecological theory, while the second is more generally ideological or philosophical. In either case, fundamental differences appear to set the LRM and ESR subgroups apart in their approach to the issue of naturalness.

LRM subgroup members differed substantially from the ESR subgroup on ratings of Delphi statements #346, #206, #225, #220, and #161, all showing an apparent overall skepticism on the part of the LRM subgroup toward the reality of "nature," as separate from human cultural control. Conversely, the ESR subgroup differed sharply from both the LRM and PDI subgroups on statements #161, #141, #114, and #148, indicating an overall stance which was basically nature-endorsing.

In contrast to these distinctly nature-skeptical and nature-endorsing trends exhibited by the LRM and ESR subgroups, the PDI subgroup seemed to take a position which attempted to bridge these two extremes. For example, unlike the LRM subgroup, the PDI subgroup endorsed the idea of natural systems as being "real," to the extent of recognizing a property of self-organization (Delphi statement #132), but unlike the ESR subgroup, they also rejected sharp distinctions between naturalness and human control (statement #206), and embraced a more relativist perspective on the relationship between humans and nature (statement #161). Also indicative of this attempt to reconcile these seemingly opposed perspectives, the PDI subgroup differed from the other two subgroups in assigning high ratings to statements #319 and #320, as well as statements #325, #310, and #348, all of which focus on concepts of authenticity and context, rather than a simple criterion of the presence or absence of human impact.
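The subgroup comparisons discussed above rest on a one-way analysis of variance of each statement's ratings across the three subgroups. As a rough illustration only, the following Python sketch computes the ANOVA F statistic for a single statement. The ratings shown are invented for the example and are not data from this study, and the function name `one_way_anova_f` is hypothetical; the study's actual computations are those reported in appendix E.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of ratings per subgroup.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of subgroup means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own subgroup mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented Likert ratings for one statement (illustration only).
lrm = [2, 3, 2, 3]
esr = [4, 5, 4, 5]
pdi = [4, 4, 5, 3]

f_stat, df_b, df_w = one_way_anova_f([lrm, esr, pdi])
```

The resulting F value would then be compared with a critical value for the given degrees of freedom; a post-hoc procedure such as Scheffé's is needed to determine which pairs of subgroups actually differ.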

As a general trend, the overall effect of the Delphi process was an apparent tendency toward convergence of opinion. This convergence is reflected in a narrowing of standard deviations for statements reevaluated in subsequent questionnaire rounds: of the thirty-six Delphi statements evaluated through at least the third-round questionnaire, twenty-one had a net decrease in standard deviation, and of those exhibiting an increase, only eight had a net increase which was > 0.1.
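The tally described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ratings are invented, the statement labels "A" through "C" are placeholders, and the sample standard deviation is assumed, since the text does not say which form was used.

```python
from statistics import stdev

# Invented Likert ratings (illustration only). Each statement maps to a list
# of rounds; each round is the list of individual ratings for that statement.
ratings_by_round = {
    "A": [[1, 3, 5, 2, 4], [2, 3, 4, 3, 3]],
    "B": [[3, 3, 4, 3, 3], [2, 3, 5, 1, 4]],
    "C": [[2, 4, 3, 5, 1], [2, 4, 3, 5, 1]],
}

def net_sd_change(rounds):
    """Sample SD of the last round minus sample SD of the first round."""
    return stdev(rounds[-1]) - stdev(rounds[0])

def tally_convergence(data, threshold=0.1):
    """Group statements by the direction of their net change in SD."""
    changes = {label: net_sd_change(rounds) for label, rounds in data.items()}
    return {
        "decreased": sorted(s for s, c in changes.items() if c < 0),
        "increased": sorted(s for s, c in changes.items() if c > 0),
        "increased_over_threshold": sorted(
            s for s, c in changes.items() if c > threshold
        ),
    }

summary = tally_convergence(ratings_by_round)
```

In this invented example, statement "A" narrows, statement "B" widens by more than the 0.1 threshold, and statement "C" is unchanged and so falls into neither group.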

To the degree that participant opinion shifted toward greater agreement, it is important to note that the shift in ratings was predominantly toward lower ratings on the Likert scale evaluations. Among those thirty-six Delphi statements subject to iterative evaluation, twenty-five had a net decrease in mean group scores, eight had a net increase, and three showed no change. This trend toward lower mean scores was also reflected in a comparison of the Delphi group's fourth-round mean ratings with those of a control group: the control group ratings were higher for thirteen of twenty-one statements, while the Delphi group's were higher for only three statements, and for five statements they were the same.

However, it is also important to note that, in general, the magnitude of net change was small, with only twelve statements having a net change > 0.5, and only five statements showing a statistically significant difference in mean scores between questionnaire rounds. In sum, the overall effect of the Delphi process was a trend toward a narrowing of the variation between individual ratings, as reflected in standard deviations, although the overall range of ratings did not narrow appreciably.

This narrowing of opinion was for the most part of only a modest degree, and was chiefly a result of progressively lower mean group ratings; thus, while some convergence of opinion occurred, it was not a convergence toward greater overall endorsement of specific concepts, but rather toward a greater shared skepticism toward the majority of suggested attributes and indicators.

A more detailed discussion of this portion of the analysis, including tables summarizing changes in standard deviation and mean scores, a comparison of Delphi and control group ratings of individual statements, and t-test results, is found in appendix F.
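For readers unfamiliar with the between-round comparison, the following Python sketch shows one way a t statistic for a single statement might be computed. It assumes a paired design, in which the same participants rate the statement in both rounds; the text does not specify whether the study's t-tests were paired, and the ratings and the function name `paired_t` are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, degrees_of_freedom) for paired ratings of one statement."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples of at least 2 ratings")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    # Mean difference divided by its standard error.
    return mean(diffs) / (sd / sqrt(n)), n - 1

# Invented ratings from the same eight participants (illustration only).
round3 = [4, 5, 3, 4, 4, 5, 3, 4]
round4 = [3, 4, 3, 3, 4, 4, 2, 4]

t_stat, df = paired_t(round3, round4)
net_change = mean(round4) - mean(round3)
```

A negative t value here corresponds to a lower mean rating in the later round; significance would be judged by comparing the statistic with a critical value for the given degrees of freedom.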

It was originally hoped that the narrative comments of individual participants could be used directly in evaluating the breadth of opinions held; however, the second-round questionnaire alone included over 350 distinct substantive comments, ranging from short margin notations on the questionnaires, which were related to specific Delphi statements, on up to multiple-page letters detailing a participant's personal views on the overall subject of the study.

Comments submitted along with the subsequent third- and fourth-round questionnaires were similar in volume, making detailed analysis of these comments simply impractical within the context of this project. Nevertheless, it seemed worthwhile to at least attempt to capture the essence of these comments, to illustrate the diversity of opinions expressed, and to identify the key issues around which comments seemed to coalesce.

In addition to these general comments, participants were also asked, at the end of the fourth-round questionnaire, to give a narrative statement which would summarize what they felt to be the most crucial issue that emerged through their participation in the Delphi process. A compilation of both these summary statements and the more general comments which accompanied questionnaire responses is found in appendix G, and only a very brief qualitative summation of what appeared to be key points is given here.

What was perhaps most striking about participant comments was the stark contrast between various statements. As has already been observed, although the Delphi process by design seeks to identify areas of greatest agreement, it seems obvious that there remained a high level of contrast between the opinions of certain specific individuals. This appears to hold true even for those themes which were predominant among comments received.

In the specifically biological-ecological sense of defining naturalness, there does seem to be general agreement that an emphasis on process and composition is crucial--no participant directly challenged that concept. However, even within this general area of agreement, there were differing views on the implications of such a process/composition orientation.

Some placed the greatest emphasis on process alone, implying that even large-scale changes in ecological composition, or specific processes which determine composition, do not detract from naturalness if greater ecological function persists. Others seemed to argue that compositional changes or processes triggered by human management are not natural, even if they do produce effects which closely replicate otherwise inherent functions of an ecosystem.

Additionally, it was pointed out by several participants that process and composition are simply the basic descriptors of any ecosystem, as are qualities such as diversity, resilience, and so on. Thus, while there may be some level of agreement on the need to focus on process and composition, it remains unclear exactly what aspects of these attributes help to characterize a given system or place as "natural" or not.

An apparently more divisive issue centers on the idea of "native" or "indigenous" species or processes, including humans and human cultural practices. Some clearly equated "natural" with "native," while others just as clearly rejected the distinction, for example, citing non-native weed species as natural, or characterizing all human cultures as natural in any given place. This was taken to its furthest interpretation in the view that calling one group of humans "native" to North America is "racist," an issue which has arisen recently within the ecological restoration field, both in relation to humans and other animal or plant species. Yet other participants have attempted to strike some middle ground on this issue, especially in regard to the "nativeness" of humans. In sum, a number of participants found this to be a central issue, yet taken as a whole, it would seem that opinion on this issue ranged from one extreme to another, and although some have attempted to establish a middle ground, no really concrete point of distinction emerged.

Linked to the issues of ecosystem components being native or indigenous is also the issue of defining the relationship between humans and the "natural" world. Many participants saw the difference between "natural" and non-natural systems or places as equating to the degree of influence of humans, yet for others it seemed an extremely important point to maintain that this distinction should simply not be made, exemplified by statements equating any human cultural action with a natural one, or simply stating that the term natural is "meaningless." Interestingly, the view that the distinction of natural systems or processes should not be made seems to be most frequently held by those participants who work in land and resource management, and seems absent among those participants who work in ecological research.