Evaluating Distance Learning
Generally, evaluation is used to determine the degree to which program objectives are met through the procedures used by the program. The evaluation determines whether the outcomes or results predicted by the program occurred and whether their occurrence was due to the program.
Unfortunately, program evaluation is often viewed as an isolated activity that functions apart from the actual project or program. It should instead be part of the overall administrative process, with the purpose of answering the pragmatic questions of decision makers who want to know whether to continue a program, extend it to other sites, or modify it. If the program is found to be only partly effective in achieving its goals, the evaluation research is expected to identify the aspects that have been unsuccessful and recommend the kinds of changes that are needed.
It is essential that evaluation and feedback be part of all distance learning programs. In most instances the evaluation will include learner and program performance information. The learner performance will be based on standardized and curriculum mastery measures.
Many adult education programs are remiss in utilizing even informal program evaluation techniques to monitor and modify their programs. While most administrators intuitively know how their programs are working, they lack the systematic data to verify their instincts.
This section is quite detailed and may provide more detail on evaluation than is generally needed. The annual Innovation Program reports cited in the next section provide a good model of how to use data in describing and defining overall program performance and outcomes.
How Effective Is Distance Learning?
The Distance Education Clearinghouse provides a listing of distance learning evaluation studies and topics. Possibly the best-known effort is the work of Thomas L. Russell, who has examined research studies going back to 1928. This research shows that there is "no significant difference" between distance and classroom instruction.
Descriptive statewide California data on ESL, ABE, and adult secondary education / GED learners participating in adult school distance learning programs are quite positive. These annual reviews are reported at the California Distance Learning Project Web site (Innovation Program Reports). While the pre-post testing data are not representative, they show that the Innovation Programs for the most part perform better than historical norms. The 2003–2005 report concludes that "...When comparing classroom data with the Innovation Programs, it is clear that the distance learning programs are particularly successful in providing ESL learning opportunities. Local research data on student persistence and retention support these findings.
The Innovation Programs meet the three crucial benefit–cost criteria necessary to be accepted by adult education providers and the California Department of Education. These programs are effective, efficient, and equitable. This is the fifth year that these summary conclusions have been supported. They indicate the continued success of the Innovation Program initiative."
The data used in the reports provide a general guide for describing program participation and outcomes.
Evaluations are conducted in two stages:
Formative. The formative evaluation is conducted during project implementation. Its purpose is to determine the level and efficiency of the project activities and to identify problems that need correction. The formative evaluation informs the project administrators of successes and problems in the project to date so that corrections can be made; it contributes to the improvement of the project. Methods include data collection, documentation, site visits, interviews, focus groups, program viewing, and student and teacher observation, as well as other methods that may be developed based upon the project.
Summative. The summative evaluation is performed at the end of the project and refers to the impact of the project on students, staff, or elements of the program addressed in the project's objectives. Its purpose is to assess the overall success and impact of the project, measure learner achievement, and determine how well the project objectives were met. Summative approaches are concerned primarily with measuring a project's predicted outcomes in an effort to determine whether the program or project intervention produced an independent effect or impact on those outcomes. The evaluation report guides decisions at the end of the project to modify, expand, replicate, or discontinue the program, as well as informing others who may conduct similar programs.
Summative evaluation most commonly uses quantitative product or outcome indicators and data sources, such as performance or knowledge assessment instruments, portfolio assessment, or structured interviews.
Formative and summative evaluations are not considered separately; the formative evaluation contributes to and informs the summative evaluation. There are two basic ways to conduct summative evaluations: criterion-based studies and comparison studies.
The criterion-based evaluation design determines how well the project met its predicted objectives. The objectives specified in the project proposal are used as the standard for determining effectiveness. Usually this is the performance of the participants, which indicates impact (improved test scores, the ability to demonstrate a new skill, etc.). The conditions of performance and the level of proficiency are also noted and measured. Instruments can then be developed that measure performance against the standard, or criterion, established from the project's goals and objectives.
Comparison studies determine whether one program is more effective than another. A regular program can be compared with a pilot (experimental) program to contrast their outcomes. If the experimental program produces the desired effect, this design will show that the project, rather than some other variable, produced the outcome. Comparison, or norm, groups are students who are matched to the target group (by random selection or on some predetermined list of attributes and characteristics) and are pre-tested on the same measures, but are excluded from the intervention activities.
Comparison groups that have been matched are called control groups because theoretically they control for other variables that might account for differences in performance between the groups. Evaluations using comparison groups are usually considered more valuable in determining whether or not the project would be successful for adoption or adaptation by others.
A time-sample design provides for continuous and periodic collection of student work over time. Analysis of performance on this work looks for trends or patterns of student change that could be inferred to result from the project intervention.
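The time-sample idea can be sketched as a simple trend check over periodically collected scores. This is a minimal illustration, not part of the original text, and all scores are invented:

```python
# A minimal sketch of a time-sample trend check: one score per
# collection period on sampled student work, with a least-squares
# slope as the trend. All numbers are invented for illustration.
scores = [62, 64, 63, 67, 70, 69, 73, 75]
periods = list(range(len(scores)))

n = len(scores)
mean_t = sum(periods) / n
mean_s = sum(scores) / n

# Ordinary least-squares slope: average score change per period.
slope = (
    sum((t - mean_t) * (s - mean_s) for t, s in zip(periods, scores))
    / sum((t - mean_t) ** 2 for t in periods)
)
print(f"trend: {slope:.2f} points per period")
# A clearly positive slope suggests improvement over time, though it
# cannot by itself attribute the change to the intervention.
```

A positive slope is only descriptive; as the surrounding text notes, attributing the trend to the intervention is an inference, not a proof.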
Authentic or Alternative Assessment: One trend in education is to use assessment procedures that do not rely on standardized tests. The advantage is that alternative assessment can provide a more authentic description of the area being measured. These tend to be more "qualitative" and include "portfolio assessment" or the collection of work samples that can be analyzed against a set of predetermined criteria. Educational technology projects provide an opportunity to develop and use alternative assessment techniques.
The evaluation design that will usually provide the most information is the pre-post comparison group design. The post-test only design does not control for preexisting conditions, or variations, in performance or knowledge. The control group design produces credible results that in the past have convinced skeptics of the worth of many programs. Unfortunately, the use of control groups is often impossible for technology-based projects.
Qualitative approaches that do not require control groups may work in these cases; they are described later in this section.
Most projects use the criterion-based design for groups or individuals, or the pre-post test design. This design lacks the controls that separate out extraneous variables, making it difficult to attribute the desired or predicted outcome to the intervention. However, the criterion-based design does make it possible to assess the degree to which the predicted outcomes were attained. In a pre-post test design, the test norms, in effect, become the criterion. This design is much more useful if some external standard is available (such as national or state norm scores) to be used as the standard for comparison.
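The pre-post criterion comparison described above can be sketched in a few lines. The scores and the norm value below are invented for illustration; they are not from any actual program:

```python
import statistics

# Hypothetical pre- and post-test scores for the same learners,
# paired by position. All numbers are invented for illustration.
pre_scores  = [210, 215, 208, 220, 212, 218, 205, 214]
post_scores = [221, 224, 215, 231, 219, 230, 212, 225]

# An assumed external criterion, e.g. a historical norm for mean gain.
NORM_MEAN_GAIN = 5.0

gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
mean_gain = statistics.mean(gains)

# Criterion-based judgment: did the observed mean gain meet the standard?
met_criterion = mean_gain >= NORM_MEAN_GAIN
print(f"mean gain: {mean_gain:.1f} (criterion: {NORM_MEAN_GAIN}); "
      f"met: {met_criterion}")
```

The external norm stands in for the "standard for comparison" the text mentions; without such a standard, the mean gain alone says little about effectiveness.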
Summative evaluations tend to use quantitative measures such as standardized or criterion-referenced tests. Qualitative techniques can be used as well, such as portfolio assessment.
The strengths of quantitative and experimental designs are that:
- when appropriate, they minimize evaluator bias by defining data collection and analysis procedures simply and concretely; and
- the procedures lend themselves well to replication and cross-comparison with other program locations.
Weaknesses of quantitative and experimental designs are that:
- It is easy to be misled in data analysis: to assume falsely that the program is causing the outcomes when those outcomes are really being caused by unidentified intervening factors, or that the results can be generalized to a population when the students studied, surveyed, or tested do not represent a random sample of that population; and
- the evaluation design is so structured and rigid that valuable, but unanticipated outcomes may be missed because those outcomes have not been expressed as variables.
Comparison/control group designs are used to determine the specific effects of the intervention on outcomes. These approaches are limited in the breadth of information they provide, and they are difficult to implement outside the laboratory environment. Even though experimental design and control groups traditionally have been advocated in evaluation studies, qualitative methods have been given increasing attention in recent years. Qualitative designs seek to describe and explain the program within the larger context of the educational setting.
Rather than entering the study with a pre-specified classification system for measuring program outcomes, the evaluator tries to understand the program and its outcomes from a more qualitative or participant perspective. The emphasis is on detailed description and in-depth understanding as it emerges from direct contact and experience with the program and its participants. Using more ethnographic methods of gathering data, qualitative techniques rely on observations, interviews, case studies, and other means of fieldwork.
There are a number of reasons to use qualitative design:
- The program emphasizes individual outcomes.
- There is an interest in the dynamics of program processes and program implementation.
- Program staff wants detailed descriptive information to assist in program improvement.
- Unobtrusive observation is needed.
- Unanticipated outcomes or unexpected side effects are a concern.
- There is a need to add depth, detail and meaning to empirical findings.
The greatest dangers to qualitative evaluation are an inexperienced evaluator and the loss of objectivity. If an evaluator is a participant in the project, or has a stake in its outcomes, there is a threat to objective observation. This can be offset by having a recognized expert in the subject, one who is independent of the project's success or failure, perform the evaluation.
Strengths of Qualitative Measures
- These models and their variations lend themselves to assessing divergent responses and the performance of higher-order and critical thinking skills.
- By being relatively untied to pre-established objectives, criteria, and outcome measures, these methods lend themselves well to the detection and interpretation of unanticipated factors and results that may shed new light on program strengths and weaknesses.
- They provide good documentation on lessons learned for others who are contemplating developing or installing similar programs.
Weaknesses of Qualitative Measures
- Results are not easily compared or aggregated with other studies.
- There is no standard means to control for evaluator bias or lack of suitability for the evaluation task.
- Replicating the evaluation at other sites is highly problematic, given the reliance on evaluator subjectivity at the expense of standard fixed evaluation criteria.
- Lack of systematic random sampling and statistically analyzable data makes generalization difficult.
- Validity is greatly dependent on evaluator expertise and independence.
- Measuring quality can be labor intensive and therefore expensive.
For the purposes of evaluating educational technology projects, a combination of both approaches is desirable.
Evaluation is the act of making judgments about a program’s worth. Evaluation is different from research although both may use the same methods. Learning outcomes and learner progress are very important to all useful evaluation strategies.
Research design isolates the variables being studied. This is best accomplished in a laboratory where the researcher can exercise control over the conditions of the experiment.
In conducting field research, the control of possible confounding variables occurs by establishing identical conditions between two groups, and then randomly assigning students to participate in either the experimental group or the control group. The only difference in the groups' experience is the educational technology application being studied. Extraordinary efforts are made to prevent any other differences from occurring that might contaminate the results. The measurements of the variables being studied are precise and specific. Mathematical procedures are applied to test the results statistically to determine if the differences noted between the two groups could have occurred by chance alone. The process is designed to enable the researchers to accept or reject their hypothesis concerning the effects of educational technology on the learning outcomes.
The certainty of the relationship between the cause and effect of results is usually expressed as a percentage of probability. This is often described as the significance level. For example, "The results were determined to be significant at the .05 level." This means the likelihood that the differences between the two groups occurred by chance alone is only five percent.
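The "could this difference have occurred by chance alone" question can be checked directly with a simple permutation test. This is a hedged sketch, not part of the original text; the group scores are invented for illustration:

```python
import random
import statistics

random.seed(42)

# Hypothetical post-test scores for an experimental (distance learning)
# group and a control group; all numbers are invented for illustration.
experimental = [74, 81, 69, 77, 85, 72, 79, 83, 76, 80]
control      = [70, 68, 75, 66, 72, 71, 69, 74, 67, 73]

observed_diff = statistics.mean(experimental) - statistics.mean(control)

# Permutation test: shuffle the pooled scores many times and count how
# often a group difference at least this large arises by chance alone.
pooled = experimental + control
n_exp = len(experimental)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_exp]) - statistics.mean(pooled[n_exp:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed_diff:.2f}, p = {p_value:.4f}")
# A p-value below .05 corresponds to "significant at the .05 level":
# fewer than 5% of random shuffles produce a difference this large.
```

The permutation approach makes the significance-level idea concrete without any distributional assumptions: the p-value is literally the fraction of chance arrangements that match or exceed the observed difference.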
Evaluation, on the other hand, is very practical. Its purpose is to help people make decisions about a distance learning intervention. These decisions may involve whether to continue or end the program, change it to improve its application, or expand the technology to other classrooms or other disciplines. The use of the evaluation results is not theoretical; the results are practical and specific to a particular program.
Because of this, the evaluation approach is much different from research. While research tries to reduce the number of variables being studied, evaluation examines as many factors as possible. The idea is to describe, as fully as possible, everything that could have affected the program.
Many people have the mistaken impression that all evaluation is done after a project is completed. Actually, formative evaluation will improve a project during development and implementation phases. Formative evaluation provides feedback during the program development and implementation. The progress the learner is making can be monitored to see what works and what doesn't, allowing the administrators to fine-tune the project with midcourse corrections. Formative evaluation involves data collection, analysis and documentation from initiation of a project to its completion.
Summative evaluation determines the overall effectiveness of a project. The data from the formative evaluation is helpful in analyzing the final results and making recommendations. The evaluation approach must be designed to fit the technology project.
The literature clearly suggests that distance education is in an expansion phase, with new institutions joining the ranks of those currently offering telecourses and other distance learning programs through a variety of media. The role of the Internet and World Wide Web and the hybrid models being used in the California Innovation Programs is not yet covered substantively in the research literature. However, statewide descriptive data are positive.
The term "performance measurement" refers to the regular, ongoing measurement and reporting of important performance aspects of programs. The primary focus is to track the outcomes (or results) of programs. For example, feedback on problems found during a formative evaluation could lead to program efficiencies by learning from the experiences that others had early in a project. It can also identify success stories worthy of local or national coverage.
The outcome information derived from a performance measurement process has several more specific uses including the following:
- To help identify where problems exist and where action is needed to improve program outcomes.
- To help focus programs on the mission of achieving results.
- To help motivate employees to continually seek to improve services to their customers.
- To assist in budget development and justification.
- To help track whether actions taken in the past have led to improved outcomes.
- To better communicate with elected officials and the public.
Limitations of Outcome Measurement Information
The major limitations of outcome measurements are the following:
- Outcome data obtained will usually not tell the impact of the program on the measured outcomes. Data on outcomes do not tell why the results are as they are. Usually, other factors outside the control of the program (and probably the organization as a whole) contribute to the results. This particularly applies to indicators that attempt to measure desired end outcomes. This means that worse-than-expected, or better-than-expected, performance should not be the occasion for automatic blame-setting or, alternatively, praising of the program. Additional examination is needed to assess the causes of shortfalls, or of better-than-expected outcomes.
- Managers of publicly-supported programs need to know whether they are winning or losing. The score does not indicate why the score is as it is, but provides vital information for program decision making. Projects should provide explanatory information about unexpected or unusual outcomes, along with outcome data.
- The state-of-the-art of outcome measurement is limited. Perfect measurement and complete coverage of all relevant program outcomes should not be expected. The objective of practical outcome measurement is to provide the information on program quality and outcomes, not perfect information.
- Outcome measurement requires time and effort to develop the process, to collect the information each year, to tabulate and analyze it, and to report it. The key question is whether, over the long haul, the process will yield the information needed.
To determine the extent to which the program itself has affected the outcomes, sometimes called program "impact," more in-depth analysis is needed. Ad hoc, special program evaluations can be undertaken to estimate program impacts and help determine why programs fall short of, or exceed, performance expectations. Formal program evaluations are done infrequently on most programs, especially small programs, and, thus, do not provide the regular feedback on program progress needed to help managers manage.
As outcome measurement data become available, the information should be highly useful to those undertaking program evaluation studies. Regularly collected outcome data should also help the Department and its program offices determine their future evaluation needs, i.e., identify areas that the outcome measurement data indicate need attention.
Categories of Outcome Information
It is useful to distinguish between various categories of outcome information. Outcome indicators can usually be classified as one of the following:
Inputs. These indicate the amount of resources, such as the amount of funds and the number of employees, involved in a particular distance learning program.
Outputs. These indicate the products and services produced by a program. Outputs are important for measuring internal work performance, but do not indicate the extent to which progress has occurred toward achieving a program’s purpose. For example, an output indicator might be "the number of math instructional modules produced." Outputs will generally measure the activities of the program and individual projects, rather than the activities of students or teachers who participate.
Outcomes. These provide information on events, occurrences, or conditions that indicate progress toward achievement of the mission and objectives of the program. It is usually useful to distinguish between intermediate outcomes and end outcomes.
Intermediate Outcomes are outcomes that are expected to lead to the ends desired but are not themselves "ends." Intermediate outcomes generally indicate the extent of progress toward an ultimate, end result (such as higher student achievement). The distinction between outputs and intermediate outcomes is not always clear. For example, the indicator "number of courses, by type, provided to participants by educational institutions" can be classified as an intermediate outcome because it measures the extent to which schools are actually participating in the program, rather than measuring, more passively, the availability of the courses.
End Outcomes are the desired results of a program. For example, a key end outcome might be "the percentage of students whose test scores improved significantly in courses in which distance learning technologies had been introduced and were a significant part of the instruction."
People will likely disagree in some cases as to whether an indicator is an end or an intermediate outcome. Context is important, and since program missions may change over time, the classification of particular indicators may also change. When in doubt, it is often helpful to refer back to the mission/objective statement to make such determinations.
A standard set of outcome indicators for distance learning programs includes:
- the overall project mission statement;
- the general objectives that relate to the mission;
- more specific, but still general, outcomes sought that relate to each objective;
- the specific performance indicator(s) for which data need to be collected to track progress; and
- the likely data source(s) of each indicator.
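The hierarchy above (mission, objectives, outcomes sought, indicators, data sources) can be captured in a simple data structure. This is a minimal sketch; the class names and all example content are hypothetical illustrations, not from the source:

```python
from dataclasses import dataclass, field

# A minimal sketch of the indicator hierarchy described above.
# All names and example content are hypothetical illustrations.

@dataclass
class Indicator:
    description: str         # the specific performance indicator
    data_sources: list       # likely data source(s) for the indicator

@dataclass
class Objective:
    statement: str           # a general objective tied to the mission
    outcomes_sought: list    # more specific outcomes for this objective
    indicators: list = field(default_factory=list)

@dataclass
class ProgramPlan:
    mission: str             # the overall project mission statement
    objectives: list = field(default_factory=list)

plan = ProgramPlan(
    mission="Expand access to adult ESL instruction through distance learning",
    objectives=[
        Objective(
            statement="Improve learner reading skills",
            outcomes_sought=["Measurable reading gains for enrolled learners"],
            indicators=[
                Indicator(
                    description="Percentage of learners with significant "
                                "pre-post reading score gains",
                    data_sources=["standardized pre/post testing"],
                )
            ],
        )
    ],
)

for obj in plan.objectives:
    for ind in obj.indicators:
        print(f"{obj.statement}: {ind.description} "
              f"(data: {', '.join(ind.data_sources)})")
```

Keeping each indicator linked to its objective and data source makes the feasibility test in the next section concrete: an indicator with no realistic data source is flagged immediately.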
Data Sources and Recommended Data Collection Procedures
The outcome indicators that are finally selected must pass the test of feasibility and practicality. That is, the distance learning project should be able to obtain reasonably accurate data on each indicator, in a reasonably timely manner, and at an affordable cost in staff time and dollars. This section addresses the data collection procedures suggested for each outcome indicator.
It is important to determine who will be responsible for collecting which items of information. In addition, it must be clear where, and by whom, the data will be processed.
This discussion focuses only on collecting data on outcomes. Program impacts, which indicate the extent to which the project/program affected the observed outcomes, are not discussed. All outcome indicators are affected by outside factors, that is, factors not fully controllable by the program or its individual projects. Obtaining information on project and program impacts requires in-depth ad hoc evaluations. To undertake such evaluations effectively, comparison groups will usually be needed.
A Distance Learning Program Evaluation Model
A simple distance learning program self-evaluation contains these features. It assumes that the target audience is the program administrator, superintendent, board, and state agency.
- Purpose -- the program goal and specific objectives
- Target -- description of the targeted user
- Intervention -- a description of the distance learning intervention(s) and activities to support the intervention(s)
- Participants -- descriptive statistics on the learners -- see the TOPSpro summary data used for the California report (Innovation Program Reports)
- What did participants learn -- curriculum content and learning gain data drawn from standardized testing. CASAS is used in California to measure gains in reading and listening skills.
- How did participants apply their learning -- learning mastery data drawn from authentic or alternative assessments
- Participant satisfaction -- learner evaluations and comments on the intervention
- Staff self-evaluation -- instructor evaluations and comments on the intervention
- Summary and Recommendations -- an examination of the intervention's strengths, weaknesses, learner participation and outcomes, and meeting the distance learning objectives
When possible, a third party evaluator should design and conduct the program evaluation. Emphasizing the practical aspects is important, especially when incorporating user evaluation comments and making program recommendations.