ASSESSING METHODS FOR
EVALUATING STATE TECHNOLOGY DEVELOPMENT PROGRAMS: RECOMMENDATIONS
FOR THE GEORGIA RESEARCH ALLIANCE
Completed April 1997.
Presented at Annual Meeting, Technology Transfer Society, Denver,
CO, July 1997.
Jan Youtie, Economic Development
Institute, Georgia Institute of Technology, Atlanta, GA
30332-0640 USA, Email: jan.youtie@edi.gatech.edu; Barry Bozeman, School of Public
Policy, Georgia Institute of Technology, Atlanta, GA 30332-0345
USA, Email: barry.bozeman@pubpolicy.gatech.edu; Philip Shapira, School of Public Policy, Georgia
Institute of Technology, Atlanta, GA 30332-0345 USA, Email:
ps25@prism.gatech.edu
Introduction
U.S. states have been increasing their
investments in technology development programs in recent years.
From 1992 to 1994, state investments in university/non-profit
centers, joint industry-university research partnerships, direct
financing grants, incubators, and near-term assistance programs
using science and technology for economic development grew by
more than 25 percent, approaching $400 million in 1994 (Coburn
and Berglund, 1995). These state investments are augmented, in
most cases, by multiple other funders including the federal
government, industry, venture capital, consortia, and private
sources.
The 1990s have also been a period in which more
attention is being paid to government program performance.
Thirty-five states have some type of performance-based budgeting
initiative, either through legislation, executive order, or
budget agency initiative. The field of technology development has
not been immune from this growing desire for performance
measurement. A recent survey of such programs found that 95
percent of states have some type of method for collecting
performance data or conducting a program evaluation. But, despite
the prevalence of some type of performance measurement or
evaluation effort among state technology development programs,
few states have well-conceived evaluation plans. For example,
activity reporting, client survey data, and informal client
contact are the most commonly used evaluation methods (Melkers
and Cozzens, 1996). More systematic evaluation approaches are
less common. Only in part is this due to lack of funding or
interest; there are also complex issues about how best to apply
evaluation methodologies to assess the often diffuse and indirect
effects of technology promotion policies.
This paper reports on a study that examined the
appropriateness of different evaluation methods in assessing the
performance of one of Georgia's major technology development
programs - the Georgia Research Alliance (GRA). The GRA is a
collaborative initiative among six research universities in the
state to use research infrastructure invested in targeted
industry areas to generate economic development results. Research
infrastructure investments in advanced telecommunications,
environmental technologies, and human genetics are administered
by three centers. A recent GRA programmatic addition funds the
university side of industry-university collaborative research
projects with significant commercial potential. GRA management
acts as a "holding company" for the program, developing
strategy and finding financial resources. In the past five years,
the state has invested approximately $126 million in
eminent scholar endowments and equipment and facilities.
Evaluability Assessment
As an effort to further its performance-based
budgeting initiatives, the Governor's Office of Planning and
Budget of the state of Georgia desired that evaluative
information about GRA's impacts be developed. However, to
what extent is this actually possible? What should be measured?
And, what actually can be measured, given the constraints of the
resources that can reasonably be allocated to evaluation? Such
questions raise the issue of the "evaluability" of
technology development programs - the degree to which the
particular characteristics of the program affect the ability to
provide effective evaluation.
The elements and factors that affect GRA's
evaluability include the following:
- Differences in stakeholder perspectives.
GRA has several distinct groups of stakeholders: program
management, university administrators, research faculty,
private sector partners, and state sponsors. Each of
these groups has somewhat different perspectives as to
what the key measures of GRA are or should be. For
example, the state, in general, aims to promote economic
development, and therefore emphasizes job-related
measures. Universities hope also to improve their
recognition (measured through discoveries, patents, and
publications), and desire facilities to support students.
Private companies usually seek market success, sales, and
profitability (not job creation, per se). Thus, although
many state technology development programs focus on
job-related measures as indicators of impacts, not all
stakeholders agree that such measures show success.
- Time-lags. Most of GRA's
investments in technological capability can be expected
to have significant results only over medium-to-long time
horizons (of perhaps 7 to 15 years). While costs and some
initial benefits are evident now, in many cases it is
still rather early to judge the full range of benefits,
spillover effects, and even the full magnitude of costs.
- Indirect links between program
intervention and desired program outcomes. GRA
"intervenes" by making additional investments
in facilities and equipment available for researchers,
including eminent scholars, at research universities. The
program's desired outcomes, with variations
according to different stakeholders, are focused on
attracting increased research funding, state economic
development and improvements in the perception of Georgia
as a location for technology-oriented companies. However,
particularly for the last two objectives, the link with
facility and equipment funding is indirect. First, making
the link between program investments and outcomes
requires information about specific GRA program
investments available through the GRA program office.
Information is also available on matching direct
investments, including those by private organizations,
and on additional grants attracted to GRA facilities.
Less information is readily available on sources and
streams of resources flowing to GRA centers and projects,
especially the flow of matching and complementary
resources. Second, making the link requires the
occurrence of a series of additional downstream steps
before changes occur. For instance, for economic
development effects to materialize, technologies have to
be transferred and commercialized, private companies
established in Georgia, production started, sales
generated, and new jobs created. While GRA staff pay
attention to these links, program funds cannot be
expended here and the program itself has, at best, rather
indirect and limited influence on the key downstream
steps. The "attribution" to GRA investments of
changes in downstream outcomes thus presents difficulties
to evaluators.
- Industry-specific effects and overall
economic and technological changes. GRA's
investments are targeted towards three main industrial
fields: advanced telecommunications, environmental
technologies, and biotechnology. The development of each
of these industrial areas in Georgia is affected by a
series of contingencies and broader forces, of which GRA
is but one. For example, regulatory changes, shifts in
federal R&D priorities, other state and local
economic development policies, business cycles, the
availability of local downstream investment capital and
entrepreneurial management skill, technological
developments elsewhere, and the growth of market demand
are among the factors which can greatly affect whether
GRA's investments yield economic development returns
to the state. Sorting out the effects of program
investment in the context of broader industry, economic,
and technological change is a complex and uncertain task.
- Difficulty of developing
counter-factual evidence and controls or benchmarks.
Ideally, program evaluation design should incorporate
elements which can consider counter-factual evidence and
arguments, i.e. what would have happened without
the investment of program funds? For some
technology-based programs, it is possible to establish
control groups which can provide a basis for
understanding what happens to those not served by the
program, attempting to hold other key variables constant
(Shapira and Youtie, 1995; Oldsman, 1997). In other
situations, alternative explanations for outcomes can be
probed through logic-driven questioning and evidence
collection designs (Cosmos, 1996; Youtie, 1997). It might
be possible to examine GRA in these ways, but not without
complications. For example, how can participants and
non-participants be reasonably distinguished and their
performance then controlled? Is the unit of participation
the six GRA research universities? In this case, there
are no other comparable institutions in Georgia. If
eminent scholars are chosen as the participants, what is
the control group? Other faculty in GRA universities?
But, is this a fair control, as eminent scholars are
pre-selected to be among the very best in their field?
Moreover, by design eminent scholars and GRA research
facilities aim to promote collaborative research teams
within specific technological fields. How can the roles
and effects of others in the team be separately
determined? It is also evident that while some program
inputs and outcomes can be measured, other effects are
hard or even impossible to gauge. Similarly, there are
likely differences in the degree of confidence which can
be expressed as to the causal associations between
program investments and particular outcomes. Thus, specific
additional research awards can be measured (in
dollars) and possibly attributed to GRA investments. But
it is likely to be much harder to separate GRA's
impact (from other factors) on any overall
increases in research funding in Georgia (although we may
again know by how many dollars research funding has
increased). Equally, while it may be possible to track
the transfer of specific new technologies developed in
GRA centers out to particular companies, the companies
themselves often find it very difficult to precisely
estimate benefits and costs, particularly if the
technology is still at an early stage of
commercialization (Roessner, 1996). Finally, such
elements as effects on overall economic development or
improvements in the "image" or "research
perception" of the state are harder both to measure
precisely and to causally associate with the program.
Recommended Methods
The set of methods available for evaluation is
considerable. Almost every methodological approach employed in
the social and behavioral sciences has, at some point, been
adapted to the purpose of evaluation. In addition, some methods
have been developed specifically for evaluation purposes. Rather
than deal comprehensively with the range of available evaluation
methods (a task more suited to text book-writing than to
evaluation design), we give more extensive treatment to those
techniques we recommend as most appropriate for evaluating GRA.
Each evaluation method has specific strengths and weaknesses,
both in general and for any specific application. In this section
some of the advantages and disadvantages of the recommended
methods are discussed with specific reference to GRA needs. In
addition, recommendations are provided for combinations of
evaluation methods, on the presumption that the weaknesses of one
method can often be offset by using another in combination.
- Case studies. Typically, case
studies appeal more to evaluation professionals and the
parties being evaluated than to decision-makers seeking
precise and objective information about program
effectiveness. The great strength of case studies is they
provide a sense of context and a richness of detail that
exceeds virtually every other approach to analysis. The
chief weaknesses of case studies are that (1) they are not
typically cumulative, (2) their results are not easily
communicated in a summary fashion, and (3) they require a
good deal of interpretation. Recently, there have been
considerable advances in the use of case studies to
determine the impact of specific state-sponsored R&D
and technology development programs (e.g. Kingsley,
Bozeman and Coker, 1996). One well known set of case
studies has examined performance of energy research
programs in national laboratories, not only "telling
a story" but also providing good explanations as to
the causes of program success and failure (Brown, Berry
& Goel, 1991). Indeed, when applied with care, case
studies can provide a good deal of causal insight, not
just measures of effectiveness. Performed well, case
studies give an indication not only of the extent of
program success or failure but the reasons for
success or failure. There are several prerequisites to a
good, analytically-oriented case study (see Yin, 1989,
for elaboration). First, a case study should, ideally, have an
explanatory framework it is either testing or exploring.
It is this interest in explanation that distinguishes a
case study from a simple description of activities.
Second, the case study should have an explicit and
identifiable boundary. Finally, while the case study
necessarily requires both objective data and
interpretation, it is important to distinguish explicitly
between the two. Many programs similar to GRA find it
useful to have "success stories" that can be
presented to stakeholders and funding agents. But these
success stories are usually poorly documented. A case
study can serve the same function but, at the same time,
can give useful information about why the success was
generated and can provide a better documentation of the
success. Case studies would be particularly useful in the
GRA context because many of the program elements are
integrated and not easily sorted out in quantitative
methods. Also, some of the impacts of GRA are as much
perceptual as tangible and case studies are useful in
analysis of perceptions. Finally, case studies are
readily understandable to virtually anyone; case studies
require a good deal of expertise to perform but very
little to read and use. Case studies are an excellent
complement to objective and quantitative techniques
because the trade-offs entailed in case studies are
usually the mirror image of such quantitative techniques
as cost-benefit analysis or citation analysis. While case
studies can be quite expensive, there is little need to
use the method with any frequency. And if the need for
strong reliance on consultants is a disadvantage, the
need for consultants to work closely with program
officials in the gathering of information and insights is
an advantage.
- Peer review/External review. Long
recognized as the most legitimate approach to evaluating
scientific results (Chubin and Hackett, 1990), peer
review has particular applicability to GRA's eminent
scholar program. It is important to distinguish among
three applications of peer review, each potentially
applicable to GRA but with great variations in validity.
The most straightforward use of peer review (and the most
conventional) is for evaluating results of scientific
research papers. The peer review system is a cornerstone
of scientific evaluation and would be the best single
method of determining the scientific significance of GRA
eminent scholars' research. A second application of
peer review - for the evaluation of research programs and
proposals - is also quite common, but is much more
problem-laden. Using peer review for such purposes can
lead to certain forms of bias and to the misuse of
scientific standards for policy analysis. However, when
used with care and with safeguards for bias, peer review
remains a helpful means of adjudicating proposals and ex
ante evaluation of research programs. The most
problematic use of peer review is in connection with the
nonscientific (usually economic development) results of
scientific and technological research and development.
The usual presumptions of peer review - that scientific
peers are the most qualified to perform evaluations of
specialists - must be set aside. Practicing scientists
have no special claims of expertise in evaluating or
projecting the applications or economic impacts of
science and technology. However, experiments using
broad-based panels with a variety of types of
"peers" have shown that it is possible to adopt
some of the usual peer review practices for broader,
economic impact evaluations of science and technology
(e.g. Bozeman and colleagues, 1991). A major advantage of
peer review in evaluating GRA is that it is a well-known,
potentially valid, and relatively inexpensive approach.
Setting up peer review panels for the evaluation of the
scientific accomplishments of eminent scholars is not
difficult. Other applications of peer review should be
used with caution and considerable professional advice.
Peer review is especially useful in conjunction with
bibliometric approaches, the other most common and valid
means of assessing the quality and scientific impact of
research. Since peer review is most often used in
connection with published work, it would not be advisable
to use this method until eminent scholars have had
several years (four or five) to develop their research
programs. One problem with peer review is that the
persons reviewed are likely to have the competence to
suggest reviewers, whereas GRA staff are not. But
"snowball sampling" can be used to develop peer
sets - first having the eminent scholars nominate peers,
and then having those nominated identify other potential
reviewers. Given the objectives of GRA, traditional
scientific peer review may be less important than a more
broad-based "external review." A somewhat
different kind of panel assessment - an "external
review" - would be useful in assessing the
management efficacy of GRA (and, indeed, has already been
performed for some GRA programs). Here, a panel of
public-sector, private-sector, and academic individuals
with expertise appropriate to research management would
assess GRA's strategic direction, operations and
methods of investment. Individuals selected to serve on
such a panel must clearly have no connection to or
beneficial interest in the program. Panel reviews of this
kind are used by agencies such as the National Science
Foundation and the National Institute of Standards and
Technology to assess the efficacy of major sponsored
programs and centers.
- Content analysis. Now used in a
variety of social sciences applications, content analysis
is perhaps an ideal means of answering the important but
always elusive question: what is the impact of GRA and
its program on perceptions and images of the Georgia
business climate? While there are several well-developed
computer models for content analysis, the recommendation
for GRA evaluation is a simple analysis of mass media
references to Georgia business. By examining the number,
direction, and favorability of such references it is
possible to chart changes in the image of the Georgia
business climate. While it is not possible to determine
the exact contribution of GRA to changes in image, such
an approach at least provides some valuable descriptive
information. Furthermore, by examining the content of
references to GRA it may be possible to make at least
some direct connections.
- Surveys. Survey research is widely
used in evaluation studies. In assessing technology-based
economic development programs, end-user surveys are
especially pervasive and often prove quite useful. In the
GRA case, the "end-users" would be companies
that have been involved in specific GRA projects and
centers and/or commercialized GRA-supported technologies.
Surveys are often used to try to develop measures of
actual program outcomes (e.g., "how many jobs
retained?"). However, particularly for
technology-based longer-term projects, companies often
find it hard to accurately provide answers to such
questions. In the most complete evaluation designs,
control groups of non-assisted companies are established.
But this would be difficult to do for GRA, except perhaps
in the context of a much larger study of state technology
program impacts. Nonetheless, surveys can be a gauge to
assess the likely presence of certain outcomes, and can
be very useful for determining private inputs,
perceptions and global satisfaction with program
activities. One problem with surveys is that high quality
work is expensive. Estimates are that each completed
survey costs about $10-15, not including billable
hours for labor. These figures vary according to
data-gathering technique (e.g. face-to-face, phone,
mail), and the more valid techniques are usually the most
expensive ones. A significant problem with most survey
research efforts, especially mailed surveys, is response
bias. Often there are systematic and relevant differences
between persons (companies) responding to a questionnaire
and those who choose not to respond. But several
techniques have been developed for documenting and
adjusting for response bias. While survey research is not
one of the techniques most valuable for GRA, we recommend
giving some consideration to the use of surveys to understand
the extent to which GRA research and technologies
disseminate to Georgia industry.
- Bibliometric analysis. Since about
the mid-1970s, there has been an explosion in the
development of various bibliometric techniques,
especially citation analysis. A number of studies have
shown that citation analysis can be extremely useful not only for
determining the value of research but also for
charting its course (see Irvin and Martin, 1983).
Citation techniques range from the simple - such as
counting citations to a researcher's work - to
sophisticated studies examining citation networks and
citation communities, and providing weights and indices
pertaining to the "quality" of citations,
citation "decay curves" and so forth (e.g. Rip,
1988). In almost every case citation analysis is a good
starting point for assessing the scientific impacts of
research - research that is not cited can rarely be
demonstrated to have been of much scientific significance
(though the affirmative claim is more difficult to
document than the negative one). It is also possible to
use reference databases to examine indicators, such as GRA
external faculty research awards (for example, from the
National Science Foundation) and patent applications and
patent issues associated with GRA projects and companies.
Data drawn from bibliometric and other reference
databases can be combined to ascertain relationships of
GRA technology investments to current or promising
application domains; identify how GRA researchers rank
among other researchers in selected technological fields
in terms of publications, citations, patents, and other
measures; and compare research activity in Georgia with
other states (see, for example, Watts, Innovation
Forecasting, 1996). We recommend simple citation
analysis as a good beginning point for assessing
scientific accomplishments, not only of GRA eminent
scholars but also of their research groups. However,
there is a lag between the time research is published and
then cited and the lag times vary across fields. Such
issues of interpretation mean that citation analysis
requires some professional expertise. Simple citation
analysis is quite inexpensive since it may involve little
more than (closely supervised) graduate students
compiling citations from the Science Citation Index.
More sophisticated techniques using citation and other
reference databases should also be explored, although
these will be more costly.
- CBA/ROI. Cost-benefit analysis
(CBA) and related return-on-investment (ROI) techniques
seek to provide a framework within which a range of
sometimes diverse benefits and costs can be arrayed and
aggregated to provide estimates of net benefits and
paybacks. The attractiveness of these techniques is that
they provide a "number." Policy-makers are
accustomed, especially in the field of economic
development, to findings such as "five dollars of
economic benefit resulted from each dollar invested in
the program." The chief problem with CBA is that it
is highly sensitive to the particular assumptions of the
model being used and simple changes in the model can lead
to drastically different results. Some factors that
differ from one analysis to the next include: the measure
of the opportunity costs for investment (discount rates),
the degree to which and ways in which overhead
investments and equipment costs are internalized, and,
particularly, calculation of multiplier effects (which
can take on the characteristics of "numerical
fiction"). None of this means that cost benefit
analysis is invalid, only that it is of little use
without in-depth understanding of the assumptions upon
which the analysis is based. A danger of cost benefit
analysis, especially for the unsophisticated user, is
that it appears extremely precise and
"scientific," but, in fact, is subject to a
number of judgments in the adoption of assumptions. If
used with caution, cost-benefit analysis can be useful
for a wide range of GRA programs and, particularly, can
provide a convenient index of the economic impacts of GRA
programs. One inexorable problem is that benefits and
costs for GRA occur in different streams: investments are
being made now for benefits that may take many years to
fully materialize. If models do not properly address lag
effects and time horizons, the program's effects can
easily be under- or over-valued.
- Benchmarking. Currently a
fashionable method of assessing impacts of
technology-based economic development programs,
benchmarking can be quite useful if its two core
requirements are met: (1) identifying the appropriate
benchmark programs and (2) developing valid benchmark
measures. Neither of these requirements is easily met. In
one sense, all programs are unique. Program
elements, geographic settings, and implementation
structures are never identical. Thus, the question
becomes "how similar is similar enough?" In
examining programs from other states, there are literally
hundreds with some significant similarities to GRA but
only a handful are really comparable. Indeed, one
prerequisite for benchmarking of GRA programs would be
investigative work beyond that undertaken for this
project. But it does appear that a few adequate
benchmarks can be identified. In short, while we include
benchmarking as a "recommended method," it is
with the assumption that sufficiently comparable programs
can be identified. Given the rate at which benchmarking
is being adopted as an evaluative technique (e.g.
Tornatzky, et al., 1995), this seems a likely
eventuality. Despite the generation of numbers in
benchmarking analysis, it is important to underscore the
strong interpretive element involved in the method.
Benchmarking is actually closer to case studies in its
methodological character than it is to more objective
approaches (the measures generated can easily disguise
this fact).
- Input-output analysis. Input-output
models estimate the spending patterns that a company or
industry produces. They are often used to predict the
additional household income, business revenues, state tax
revenues, local tax revenues, and employment associated
with program-induced expenditures and business investment
decisions. Input-output models have several advantages: they have relatively few
additional data requirements (once a model has been
built); they can provide results quickly; they are
flexible in terms of their ability to analyze specific
industries (at the four-digit SIC level); and their outputs
are easy to understand and communicate. Input-output
models also have several disadvantages. Input-output
models cannot address issues such as attribution (whether
the additional economic activity is or is not causally
related to GRA investments), strategic advantage (such as
the strategic value of clusters of high tech companies in
certain industries), business cycles (such as the impact
of recessions on spending and respending), and changes in
technology. Further, the technology matrix which
contributes to input-output models' ability to link
purchasers and suppliers was developed in 1987 (although
a 1992 version is expected), which decreases the validity
of results for highly technology-driven, dynamic
industries such as the GRA industries. The results of
these models do not include an analysis of the impacts of
the additional economic activity on the costs of providing
state or local services, which increases the risk that
interpretations will overstate impacts. Likewise, results
are highly influenced by payroll and salary levels: the
higher the wages, the greater the estimates of additional
economic activity in these models (Riall, 1991).
- Systems and flow analysis. The term
systems analysis has come to mean many different things
and there is no generally agreed-upon use of the term.
The approach we feel is most useful for GRA, because it
is easily implemented and has value for management, is
"flow analysis." What we mean by this is quite
simple: charting the intended (and actual) course of GRA
program activities and expected consequences. This is
inevitably a useful exercise even in cases where it
provides little direct insight into outcomes and
causality. By having a chart of "how things are
supposed to work" it is easier to understand how
expectations and reality diverge. Unlike other methods
which require considerable professional expertise,
charting the flow of activities is actually an activity
better performed by program staff because, in the first
place, they learn by doing and, in the second place, they
have first-hand knowledge of the program elements and
intended consequences. While this is a useful management
analysis tool and is helpful for formative evaluation, it
is not much help in measuring impacts. Despite its
limits, it is strongly recommended as an approach that is
useful, inexpensive, and has immediate management
applications.
- Performance indicator systems. This
term refers, quite simply, to the development of a set of
critical indicators that can be used for program
monitoring. There is little analysis beyond the
development of indicators but they should be revealing of
progress if not causality.
- Diffusion and Network Studies. There
is a long history of diffusion studies in technology
policy and, particularly recently, powerful techniques of
network analysis are becoming available as a result of
developments in various social sciences outposts. But the
approach recommended here is elementary. By charting
networks of users and providers it is possible to provide
a model of GRA programs' reach. This can be
accomplished very easily by simply interviewing
individuals as to persons encountered and sources of
information. Elementary network analysis and charting of
diffusion requires very little professional expertise and
can be performed relatively inexpensively.
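As a worked illustration of the caution raised under CBA/ROI above, the following minimal sketch shows how the choice of discount rate alone can shift a benefit-cost ratio when costs come early and benefits arrive only after a lag. All figures are hypothetical and chosen purely for illustration (five years of investment, benefits beginning in year 7 of a 15-year horizon); they are not GRA data, and the function names are our own.

```python
def present_value(cash_flows, rate):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(amount / (1.0 + rate) ** year
               for year, amount in enumerate(cash_flows))


def benefit_cost_ratio(costs, benefits, rate):
    """Ratio of discounted benefits to discounted costs."""
    return present_value(benefits, rate) / present_value(costs, rate)


# Hypothetical streams (arbitrary money units, NOT program data):
# costs of 10 per year in years 0-4, benefits of 15 per year in years 7-14.
HORIZON = 15
costs = [10.0 if year < 5 else 0.0 for year in range(HORIZON)]
benefits = [15.0 if year >= 7 else 0.0 for year in range(HORIZON)]

# The same streams evaluated under three assumed discount rates.
for rate in (0.03, 0.07, 0.10):
    ratio = benefit_cost_ratio(costs, benefits, rate)
    print(f"discount rate {rate:.0%}: benefit-cost ratio {ratio:.2f}")
```

Because the benefits in this sketch arrive later than the costs, a higher discount rate reduces the benefits proportionally more than the costs, so the ratio falls as the rate rises. Shortening the horizon or lengthening the lag would change the result further, which is why the assumptions behind any reported ratio need to be stated alongside it.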
Recommended Evaluation Approaches
We recommend two different evaluation regimes
for assessing GRA and its activities. We refer to these as
"routine evaluation" and "comprehensive
evaluation." Routine evaluation implies the investment of
modest resources and does not require expertise of external
consultants. Comprehensive evaluation is more thorough-going but
requires greater resources and the use of external evaluation
consultants. We recommend that each be pursued, but at different
intervals. Routine evaluation should be performed annually or
biennially; comprehensive evaluation should be performed on a
three to four year cycle.
"Routine" Evaluation
Typically, valid evaluation requires
considerable technical expertise and commitment of substantial
resources. But often it is possible to engage in useful
evaluation activity even when evaluation is performed on a modest
budget and by persons who are not highly trained evaluators. In
the GRA context, two types of useful evaluation activities can be
performed at very little expense and without the need for great
evaluation expertise. We recommend GRA be evaluated every year or
two on the basis of (1) performance indicators and (2) flow
analysis.
- Performance indicators. Performance
indicators provide baseline data invaluable for
management and evaluation. Not only will performance
indicators (of the type provided in Table 3) help
managers and other stakeholders track and monitor
progress, but these same performance indicators will
prove useful as part of more comprehensive evaluations
employing rigorous analysis of data. While the list of
performance indicators given in Table 5 is not complete
or final, it could provide a useful starting point for
discussion. The set of performance indicators adopted
should have widespread support among stakeholders and
should be amenable to routine collection. Ideally,
agreement should be sought on a manageable number of
priority performance indicators.
- Flow analysis. As a management
tool, GRA researchers and managers may wish to adopt some
form of flow analysis or logic models, models requiring
the specification of means and ends and showing paths
from particular activities to particular outcomes. A wide
variety of techniques are available for such analysis
(see Vaupel and Behn, 1986 for an overview). If there is
an intention that research or program outputs diffuse to
a user community, it is generally a good idea to have in
mind specific targets and expected paths between the
production of output and the dissemination to targeted
users. It is all too often assumed that good, technically
viable work will, as a matter of course, be used.
Evidence shows that this is sometimes not the case. The
use of any evaluation or management decision tool entails
some costs. But the direct outlays required to
adopt these two approaches to routine evaluation should
prove minimal.
"Comprehensive"
Evaluation
OPB and GRA may wish to consider setting aside
a percentage of program money to devote to evaluation. This is a
common practice and has led to the production of high quality,
highly usable evaluations. Two familiar examples are the
resources invested by NIST and state manufacturing assistance
programs and for the Department of Energy's Energy-Related
Inventions Program (Brown, Curlee and Elliott, 1995). With a sum
set aside, resources would be available every three to four years
for a comprehensive evaluation. While each of the recommended
methods presented in Table 2 is worthy of consideration, we feel
that a good balance is provided by using (in addition to the
methods employed in the routine evaluation): (1) a survey-based
cost-benefit analysis; (2) case studies; (3) content
analysis; and (4) external and peer review. By using these
approaches in combination for a comprehensive evaluation, one
could obtain, with case studies, an in-depth portrait of
program activities with attention to the details that contribute
to success; provide objective, monetary estimates of impacts by using
cost-benefit analysis; and give some insight into the
crucial issue of changes in perceptions (e.g., business climate)
by using content analysis. To properly conduct a
cost-benefit analysis, it would be necessary to conduct
interviews and surveys with firms to define, identify, and
attribute appropriate treatments for costs and benefits. A full
set of public benefits and costs would also need to be
identified.
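Because program costs are typically incurred early while benefits arrive later, a cost-benefit comparison usually discounts both streams to a common year. The following is a minimal sketch of that arithmetic, not a prescribed method: the yearly figures, the 5 percent discount rate, and the function name `present_value` are all hypothetical and chosen only for illustration.

```python
def present_value(stream, rate):
    """Discount a list of yearly amounts (year 0 first) to present value."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(stream))


# Hypothetical yearly figures (thousands of dollars), year 0 first.
costs = [100.0, 100.0, 100.0, 0.0, 0.0, 0.0]      # program outlays come early
benefits = [0.0, 0.0, 50.0, 150.0, 200.0, 200.0]  # firm-reported benefits lag
rate = 0.05                                       # assumed discount rate

pv_costs = present_value(costs, rate)
pv_benefits = present_value(benefits, rate)
net_present_value = pv_benefits - pv_costs
benefit_cost_ratio = pv_benefits / pv_costs

print(f"Present value of costs:    {pv_costs:.1f}")
print(f"Present value of benefits: {pv_benefits:.1f}")
print(f"Net present value:         {net_present_value:.1f}")
print(f"Benefit-cost ratio:        {benefit_cost_ratio:.2f}")
```

In practice the benefit figures would come from the firm interviews and surveys described above, and the results should be checked for sensitivity to the discount rate chosen.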
Peer review remains the best approach to
providing valid, credible evaluation of scientific research. We
recommend that each major GRA program be submitted to peer review
every four to five years. The primary use of peer review should be
evaluation of the quality of the scientific work, and the peer
panels should be composed of scientific experts with intimate
knowledge of the scientific fields addressed by GRA researchers.
Peer review is much less useful in determining the economic
potential of scientific work. Thus, external reviews
should accompany peer reviews. Panels of industry advisors are
more appropriate for assessing the economic utility of GRA work.
The periodic use of peer review and external review will
provide an assessment of scientific quality and the relevance of
quality scientific and technical outcomes to the long-term
economic development goals of GRA.
The balance provided by this combination of
methods is further illustrated by considering the extent to which
the use of these methods addresses the issues identified in the
interviews of GRA stakeholders.
Relevance to Evaluability Assessment
In this section, we re-visit some of the issues
raised in the evaluability assessment. The section examines the
evaluation design recommendations in light of the ease of and
ability to conduct a measured assessment of impacts.
- Differences in stakeholder
perspectives. The need for a range of indicators to
address differences in stakeholder perspectives about
what constitutes success is accommodated by the adoption
of a set of performance indicators. Since there
is no consensus about which particular indicators to use,
stakeholders should be involved in assessing indicators
before adoption. At the same time we recommend that the
evaluation be as unobtrusive as possible and mindful of
other demands on program participants' time. We
recommend that comprehensive evaluations be performed
with minimal data-gathering responsibilities placed on program
participants. With respect to the routine evaluations, we
recommend that program participants be actively involved
in the development of performance indicators, but that,
in the interest of efficiency, those indicators serve
multiple purposes (i.e., budgeting, planning, evaluation)
and, once established, be changed only for good cause.
While peer review can require a good deal of time, the
time will chiefly be contributed time on the part of the
external scientific community.
- Time-lags. Many GRA program
effects will be realized only in the long term, 5-20
years. The routine evaluation will not be adequate to
capture long-term effects, but the establishment of a
baseline set of performance indicators, measured
yearly, will make it possible to measure such change
in the long run. Also, cost-benefit analysis incorporates
assumptions and approaches for dealing with time lags
and streams of benefits. Nevertheless, the length of time
required for benefits to appear will always be an evaluation
constraint and should be given considerable attention.
- Indirect links between program
intervention and desired program outcomes
("attribution"). GRA program outputs
occur within a broad environment and many other factors
in that environment can be expected to have much greater
impacts on such outcomes as the creation of new businesses.
Thus, changes in the interest rate, availability of
capital and labor, and other such factors should have a
much greater impact than GRA. While the attribution of
outcomes is always difficult, case studies are
particularly useful for understanding chains of
causality. It is widely recognized that changes in
perceptions (e.g., state image, business climate) are
among the more important contributions of
technology-based economic development programs, but this vital aspect
is rarely evaluated. While the case study approach can be
quite useful for analysis of perceptual change (for
example, by determining reasons for business start-up or
relocation in case studies performed), the use of content
analysis for these purposes could be an important
evaluation innovation. Carefully implemented content
analysis can reveal changes that occur over time
in media perceptions. While such changes probably occur
relatively slowly, the four- to five-year evaluation period
suggested for the comprehensive evaluation should provide
sufficient time for measurable change.
- Industry-specific effects and overall
economic and technological changes. While job
creation and retention data are the most familiar and
easily understood measures of economic change, we caution
against the use of job creation data as the sole
indicator of economic change; such data are best used, if at all,
as a minor component of an indicator system. If there is an
interest in using jobs as an indicator, then evaluation
should focus on audited jobs rather than
unsubstantiated self-reports.
- Difficulty of developing
counter-factual evidence and controls or benchmarks.
Many stakeholders mentioned benchmarking as a means
of determining GRA success or as a means of developing
counter-factual evidence. While comparative assessment is
beneficial, we do not at this time recommend a formal
benchmarking process. In the first place, we are not able
to identify programs that are entirely satisfactory
benchmarks for GRA. In the second place, the primary
objectives of benchmarking can be achieved with the
methods we have recommended. Case studies can be
used for in-depth (but non-statistical) comparisons
between GRA and comparable programs. The set of performance
indicators can be used as a de facto benchmark,
assuming the availability of comparable information from
other programs.
Evaluation Expertise Requirements
We do not feel that a highly expert evaluation
team is needed for the routine evaluations. It may be useful to
employ a professional evaluation team for the one-shot review and
establishment of performance indicators.
The comprehensive evaluation should not be
undertaken unless professional evaluation research personnel are
employed. The skills required for more comprehensive evaluation
reside in a number of institutional contexts including consulting
firms, universities, and professional associations. Sometimes
government agencies (more common in federal than state
government) have their own highly expert evaluation units. There
are familiar trade-offs involved in the choice of one or another
source of evaluation expertise (see Bozeman, 1979) including, for
example, contextual knowledge vs. perceived disinterestedness,
rigidity vs. over-eagerness to please, and expertise vs.
availability. But regardless of the institutional provider and
particular actors chosen for the evaluation, the evaluators
should have experience in a wide variety of methods. Too often
evaluators have an evaluation "hammer" used on every
policy "nail." The mix of techniques and methods
recommended here suggests a "one-tool" evaluation team
is inappropriate. Similarly, the evaluators should have
considerable knowledge of technology-based economic development
programs. The complexities of GRA are such that evaluators not
well-versed in such programs are much less likely to provide
valid information.
References
Bozeman, B. (1979). Public Management and
Policy Analysis. New York: St. Martin's Press.
Bozeman, B. (1993). "Peer Review and the
Evaluation of R&D Impacts," in B. Bozeman and J.
Melkers, (eds.), Evaluating R&D Impacts, New York:
Kluwer Publishing, pp. 36-49.
Bozeman, B. and D. Coursey (1992).
"Benefits and Problems in Technology Transfer: A National
Survey of U.S. University and Government Laboratories," IEEE
Transactions on Engineering Management, 132-141.
Brown, M.A., T.R. Curlee, and S.R. Elliott.
(1995). Evaluating Technology Innovation Programs: The Use of
Comparison Groups to Identify Impacts. Research Policy
24(5):669-685.
Brown, M.A., L.G. Berry and R. Goel (1991).
"Guidelines for Successfully Transferring
Government-Sponsored Innovations," Research Policy,
20, 121-143.
Chubin, D. and E. Hackett (1990). Peerless
Science: Peer Review and U.S. Science Policy. Albany, NY:
State University of New York Press.
Coburn, C. and Berglund, D. (1995). Partnerships:
A Compendium of State and Federal Cooperative Technology
Programs. Columbus, OH: Battelle Memorial Institute.
Cook, T. and D. Campbell (1979). Quasi-experimentation.
Boston: Houghton Mifflin.
Cosmos Corporation (1996). A Day in the Life
of the Manufacturing Partnerships: Case Studies of Exemplary
Engagements with Clients by MEP Centers. Gaithersburg, MD:
National Institute of Standards and Technology.
Dunn, W. (1994) Public Policy Analysis,
second edition. Englewood Cliffs, NJ: Prentice-Hall.
Irvine, J., and B.R. Martin (1983) Assessing
Basic Research: Some Partial Indicators of Scientific Progress in
Radio Astronomy. Research Policy 12(2):61-90.
Melkers, J. and S. Cozzens (1996) Performance
Measurement in State Science and Technology Programs. Paper
prepared for the Annual Meeting of the American Evaluation
Association, Atlanta, Georgia.
Oldsman, E. (1997) "The Impact of the New
York Manufacturing Extension Program: A Quasi-Experiment,"
In Shapira, P. and J. Youtie (Eds.) Manufacturing
Modernization: Learning from Evaluation Practices and Results,
Atlanta, Georgia: Georgia Tech Research Corporation.
Riall, B. William (1991). "Local Economic
Impact: Costs and Benefits of Development." Atlanta,
Georgia: Georgia Tech Research Corporation.
Rip, A. (1988) "Mapping of Science,"
in A. Van Raan (ed.) Handbook of Quantitative Studies of
Science and Technology (New York: North Holland).
Roessner, J. D., Y. Lee, P. Shapira, and B.
Bozeman (1996) "Evaluation of Iowa State University's
Center for Advanced Technology Development." Atlanta,
Georgia: Georgia Tech Research Corporation.
Shapira, P. and J. Youtie (1995). "Georgia
Manufacturing Extension Alliance: Overview of the Evaluation
Plan," In Shapira, P. and J. Youtie, Evaluating Industrial
Modernization. Atlanta, GA: Georgia Tech Research Corporation.
Stokey, E. and R. Zeckhauser (1978). A
Primer for Policy Analysis. New York: W.W. Norton and
Company.
Tornatzky, L., P. Waugaman, L. Casson, S.
Crowell, C. Spahr, and F. Wong (1995). Benchmarking Best
Practices for University-Industry Technology Transfer,
Southern Technology Council.
Weimer, D. and A. Vining (1989). Policy
Analysis. Englewood Cliffs, NJ: Prentice Hall.
Yin, R. (1989) Case Study Research.
Newbury Park: Sage Publications.
Youtie, J. (1997) "Toward a Cross-Case
Analysis of Outcomes of Exemplary Engagements by MEP
Centers." In Shapira, P. and J. Youtie (Eds.) Manufacturing
Modernization: Learning from Evaluation Practices and Results,
Atlanta, Georgia: Georgia Tech Research Corporation.