Report on Student Evaluations of Instruction
Ad-hoc Committee on the SEI
State University of New York
I. Introduction
II. The History of the Current SEI at New Paltz, 1990-2000
III. Overview of the 2000 Report
IV. Current Use of the SEI on Campus
V. Summary of Student Survey, Fall 2004
VI. Summary of Faculty Survey, Fall 2004
VII. Statistical Analyses Examining Current SEI Form
VIII. Recommendations
This report provides an overview of information gathered about Student Evaluations of Instruction (SEIs) at SUNY New Paltz by a committee charged with this task by the central faculty committee, Academic Affairs. We have examined perceptions about the SEIs, past studies of the instrument, and its current uses at the College in order to provide information and recommendations that can be used for its revision. This revision is motivated by ongoing, widespread criticisms of the SEI by faculty and the possibility of adopting changes that have been successfully implemented at other universities.
There has been a great deal of work in developing, evaluating, and criticizing the SEIs by other committees in the past: in particular, the College Task Force on Teaching (1990-1992), which produced the current SEI form (along with other recommendations), and a subcommittee of the Academic Affairs Committee on the SEI, which produced a thorough and carefully written report in the spring of 2000. This report draws on the insights of that past work and integrates it with information that we have gathered in the last year to make our recommendations. It provides a brief overview of two surveys about the SEI that were conducted by the Office of Institutional Research on behalf of the Academic Affairs Committee: a survey distributed to students and a survey distributed to faculty. It also offers an updated discussion of the ways that the results of the SEI are currently being used on campus and a further discussion of the statistical validity of the SEI.
In the early 1990s, a “Task Force on Teaching” was formed to examine the evaluation of teaching and revise the “Student Opinion of Instruction” or SOI form, which was in use at the time. Under the leadership of Hadi Salavitabar from the School of Business, the Task Force on Teaching met on a weekly basis for two years to develop, test, and adopt a new student-evaluation-of-teaching instrument. The task force also consulted with faculty members through a survey and tested recommended instruments in a number of classes. The Task Force presented its recommendations to the Academic Senate and the College Faculty. It recommended a Student Evaluation of Instruction Form A with sixty questions, grouped thematically, from which the instructor could select in order to construct and revise the instrument. These questions were not meant to be included in personnel folders, so that the instructor would be free to honestly assess his or her teaching. The task force also recommended a Student Evaluation of Instruction Form B with 22 questions; this is the form that the campus is currently using. In addition, it developed forms for peer evaluation and department chair/dean evaluations. Yet Form B is the only formal instrument for assessing teaching that is currently in use.
In 1997, David Blankenship, Chair of the Academic Affairs Committee, wanted to assess faculty perceptions of the Student Evaluation of Instruction (SEI) prior to a committee discussion of whether the form should be revised. He distributed a survey about the SEI that garnered suggestions for revising particular questions on the instrument and raised questions about its overall validity. Several respondents cautioned that the instrument should not be revised without a more comprehensive analysis of its effectiveness. A subcommittee of Academic Affairs undertook such an examination, with particular attention to scholarship about teaching evaluations and their statistical validity. In 2000, this subcommittee submitted a report that is discussed in the following section.
This report provides a broad and careful examination of the SEIs and their use within the university. The committee defines the purpose of the report as primarily educative, but it also makes the following recommendations:
The report discusses a survey of department chairs regarding the SEI and provides a detailed review of literature about the reliability and validity of the SEI. (We refer to both of these issues in the context of our report below.) The report points out that “numerous studies demonstrating the reliability, validity, and lack of bias of [student] ratings (Abrami, d’Apollonia, & Cohen, 1990; Feldman, 1988; Hinton, 1993; Lowman, 1984, 1994, 1995; Marsh 1984, Marsh & Roche, 1997; McKeachie, 1997) have not diminished faculty skepticism about their meaning and value” (p. 2). Yet the report also highlights literature that questions the validity of the SEI, such as Greenwald and Gillmore’s study (1997), which found that course workload adversely influenced student ratings (p. 10). The 2000 report also raises questions about whether the SEI measures “effective teaching” as opposed to some other construct. It argues that the SEI requires “correlations of student ratings with measures of student learning including multi-section and multitrait multimethod studies” in order to meet a higher standard of validity. Furthermore, the report raises the concern that the SEI might be biased in favor of a particular model (transmission) of teaching (Kolitch & Dean, 1999).
The 2000 report raises important questions about the SEI, its meaning, and its use. As a result, the report recommended further investigation and a thorough re-examination of how the SEI is used at the College. Our committee lacks the time and resources to address all of the concerns raised by the committee in 2000, but we provide several responses below in the section entitled “Statistical Analyses Examining Current SEI Form.” Despite the important critical questions that have been raised, we proceed with more specific recommendations for revising the SEI, assuming a meaningful degree of validity to the instrument. Although there is disagreement within the scholarship discussed above about questions such as the influence of grades on ratings, there is widespread agreement that student evaluations measure something meaningful about the practice of teaching (see Greenwald & Gillmore, 1997, who generally argue that high-quality SEI instruments are valid indices of teaching).
We contacted representatives of the groups involved in the assessment of faculty teaching on campus to ask how each of them uses the SEI: members of the central faculty committee on tenure and reappointment, department chairs, deans, and the Provost.
Three members of the Tenure and Reappointment Committee agreed that SEIs were used alongside the examination of syllabi and peer observations to assess the quality of teaching. Two of the respondents valued peer observations more highly than the SEIs. Some of the members emphasized the use of syllabi, assignments, and handouts alongside the SEI. Two of the respondents stressed the need to provide contextual information, such as class size and average grade given, in order to help the committee interpret SEI scores. One member explained that he did not pay much attention to the written comments on the back of the SEI, while the other members did not mention these comments.
The Tenure and Reappointment Committee does not use the SEI as a means of improving teaching, given that the committee’s purpose is to provide a one-time evaluation of teaching ability. Yet the respondents who discussed the use of the SEI within their departments mentioned that peer observations, rather than the SEI, were used to improve faculty teaching. It seems that the SEI is only used as a “red flag” when abnormally low scores suggest a problem with teaching. In this way, the SEI may help to identify certain kinds of problems that the department or the school can help address, but it does not usually help faculty improve teaching when their scores fall within a normal range.
The Dean of the School of Education also explained his use of the SEI, along with peer observations of teaching, to assess the teaching ability of faculty. He places a higher value on peer observation. He also indicated that he uses low SEI scores to address problems with individual faculty members through their department chairs. If there is no improvement, he then meets with the individual faculty member to discuss aspects of teaching that appear problematic based on the SEIs.
The Dean of Liberal Arts and Sciences keeps track of norms on the SEI for each department and examines individual scores relative to these norms. He also considers norms for the course and the level at which the course is taught. He pays attention to all of the questions and looks for areas of exceptional strength or weakness, especially with regard to fairness, responsiveness, and accessibility. He then reviews the instructors’ discussion of their own teaching, and provides a response in his letter reviewing their performance for DSI, tenure, reappointment and promotion. He primarily relies on summaries of the qualitative responses from the department level.
The Provost explained that he examines individual scores in relation to general patterns for courses at a similar level and discipline. He pays particular attention to SEIs where there are disparities in the scores among different questions. He does not put much weight on qualitative responses unless there are written comments on nearly all of the SEIs for a class.
Although there is general skepticism about the value of the SEI and a strong sense that it must be accompanied by other measures of teaching, there is a lack of consistency among respondents about what these other measures might be and how they ought to be used. Some respondents emphasized the careful examination of syllabi, while others emphasized peer observations. One respondent from the tenure committee pointed out that peer observations often lack a critical appraisal of teaching, making them less useful for evaluating teaching. There was also a question about whether peer observations are required for tenure review. Given the ambiguity about the use of other measures of teaching, the SEI scores seem to receive more attention overall, despite the skepticism about the instrument’s value among those who are assessing teaching on campus. This suggests that changes in the SEI and its use must be considered alongside changes in the use of other measures of teaching effectiveness.
The student survey was conducted online through Blackboard and collected 277 responses. It used a five-point scale where 1 = Strongly Agree, 5 = Strongly Disagree, and 3 = Neutral. We examined the mean of all student responses on the survey. We considered mean responses in the 2.5-3.5 range to be neutral and therefore not representing a strong preference on the part of the student body as a whole (see Appendix I for a complete summary of the survey). Students’ responses were neutral on six of the nine questions.
The faculty survey produced 104 responses. It also used a five-point scale, where 1 = Strongly Agree, 5 = Strongly Disagree, and 3 = Undecided. We examined the mean of all faculty responses on the survey. We considered mean responses in the 2.5-3.5 range to be neutral/undecided, and therefore not representing a strong preference on the part of the faculty as a whole (see Appendix II for a complete summary of the survey).
Many written comments were also garnered on the faculty survey. Comments that appeared a number of times were used to inform the recommendations in Section VIII below.
In addition to the varied pieces of information gathered to help inform possible changes to our SEI form, we conducted analyses to examine some basic psychometric issues regarding the current SEI.
The 2000 report from a prior SEI Subcommittee of the Academic Affairs Committee addressed the issues of reliability and validity of the current SEI in a relatively general sense. That is, that report delineated issues that need to be addressed in establishing test reliability and validity without actually conducting such analyses. According to that report (pp. 12-13), such a statistical undertaking should include analyses such as:
“• task analysis defining the construct of ‘effective’ teaching, including classroom observations and interviews with subject matter experts (e.g., faculty, students)
• development of an item pool based upon extensive interviews with faculty
• administration of pilot surveys in different academic departments
• use of factor analysis to identify the dimensions underlying the student ratings
• computation of item reliabilities and test-retest reliability
• correlations of student ratings with measures of student learning, including multi-section and multitrait-multimethod studies
• research on suspected sources of bias”
The work presented here focuses on a subset of these possible analyses designed with the intention of examining basic reliability and validity of our SEI. Specifically, the analyses here focus on the issues of internal reliability, potential multi-factorial structure underlying the SEI items, and criterion validity.
The data analyzed in this section are drawn from the Fall 2002 SEI data (courtesy of the Office of Institutional Research).
First, to address ‘internal reliability,’ Cronbach’s alpha analyses were computed for five randomly selected sections (one representing each of the five schools). Alpha is an index of the degree to which the different items within a scale empirically inter-relate and, thus, seem to tap the same underlying construct. With SEI data, we can think of teaching effectiveness as the latent, underlying construct tapped by each SEI item. In other words, each SEI item is designed to measure some facet of teaching effectiveness.
Alpha typically ranges from 0 to +1. As alpha approaches 1, a scale is thought to increase in its internal reliability. Traditionally, alpha coefficients of .7 or greater are considered to demonstrate sufficient reliability.
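The computation behind alpha is straightforward: it compares the variance of respondents' total scores with the sum of the individual item variances. A minimal sketch, using hypothetical ratings rather than actual SEI data:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for rows of respondent ratings (n_respondents x n_items).

    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items[0])
    cols = list(zip(*items))                          # one column per item
    item_var_sum = sum(variance(col) for col in cols)
    total_var = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: six respondents rating three items on a 1-5 scale
ratings = [
    [1, 2, 1],
    [2, 2, 2],
    [4, 5, 4],
    [5, 4, 5],
    [3, 3, 3],
    [2, 1, 2],
]
print(round(cronbach_alpha(ratings), 2))  # 0.96
```

When respondents who rate one item high tend to rate the others high as well, the total-score variance dwarfs the summed item variances and alpha approaches 1; uncorrelated items drive it toward 0.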
The alpha coefficients for five randomly selected courses from that semester (Fall 2002) are as follows:
Business Administration .92
Educational Studies .56
These numbers indicate that, for this random sample of classes, the SEI items are generally inter-related, suggesting that the scale has sufficient reliability. Note that for the Educational Studies course, the low alpha is at least partially due to the fact that there was no variance on multiple items.
In addition to these alpha analyses conducted at the level of specific sections of classes, an overall alpha was computed across all sections. The overall alpha was very strong at .97. The results also provided information regarding the ‘item-to-total’ correlation for each SEI item. This index provides information regarding how well each particular item correlates with the other items. The only two items that had item-to-total correlations of less than .7 are: “Contributed to making me an informed and educated person” and “Returned students’ work in a reasonable timeframe.” While these items clearly tap important elements of teaching, this analysis suggests that scores on these items are less strongly related to the rest of the scale than the other items are.
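The item-to-total index can be sketched in the same spirit. The version below is the 'corrected' variant, which correlates each item with the total of the remaining items so that an item is never correlated with itself; the data are again hypothetical, not actual SEI responses:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def item_total_correlations(items):
    """Corrected item-to-total correlation for each item: the item's scores
    versus each respondent's total over the remaining items."""
    cols = list(zip(*items))
    return [
        pearson(col, [sum(row) - row[i] for row in items])
        for i, col in enumerate(cols)
    ]

# Hypothetical data: six respondents rating three items on a 1-5 scale
ratings = [
    [1, 2, 1],
    [2, 2, 2],
    [4, 5, 4],
    [5, 4, 5],
    [3, 3, 3],
    [2, 1, 2],
]
print([round(r, 2) for r in item_total_correlations(ratings)])  # [0.95, 0.82, 0.95]
```

An item whose corrected correlation falls well below the others (as with the two SEI items noted above) is measuring something partly distinct from the rest of the scale.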
An additional analysis was conducted to examine a potential multi-factorial structure underlying the SEI. Put simply, this means that we looked to see if the SEI scores empirically reflect one single dimension of teaching effectiveness (compared with multiple discrete dimensions). To examine this issue, we conducted a principal-axis factor analysis across all sections. All 20 SEI items were included. This analysis revealed, quite strongly, that there is really only a single dimension underlying our SEI scores; only one factor emerged as an important dimension underlying teaching effectiveness, and, importantly, each of the 20 items was positively related to this dimension.
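The logic of this single-factor check can be illustrated with a simulation (this is a sketch of the general idea, not the principal-axis analysis run on the actual Fall 2002 data): if items are generated from one latent factor plus noise, the correlation matrix of the items will have a single dominant eigenvalue, with the remaining eigenvalues small.

```python
import numpy as np

# Hypothetical illustration: 200 simulated respondents on 5 items, all driven
# by a single latent "teaching effectiveness" factor plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + 0.5 * rng.normal(size=(200, 5))

corr = np.corrcoef(items, rowvar=False)            # 5 x 5 item correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # eigenvalues, descending

# One eigenvalue far above 1 and the rest well below 1 is the signature
# of a one-factor structure.
print(eigvals.round(2))
```

If the SEI instead tapped several discrete dimensions, two or more eigenvalues would stand well above the rest, and items would split into clusters loading on different factors.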
An additional kind of analysis conducted to examine the SEI pertains to whether this instrument discriminates effectively among courses representing different academic fields. In other words, if this SEI is a valid index of teaching-relevant outcomes, it should be sensitive to differences across disciplines. This particular kind of validity may be thought of as ‘criterion validity.’ Criterion validity is generally the ability of a test to covary with other variables that are conceptually related to the construct thought to underlie the test. Thus, in the analyses described here, academic discipline (operationally defined in terms of different schools within the college) was examined in terms of whether SEI scores varied across disciplines. Such an outcome would demonstrate some basic level of criterion validity.
To examine whether the SEI does in fact discriminate among courses representing different fields, 20 one-way ANOVAs were computed using School as the between-subjects factor and each SEI item as the dependent variable. In lay terms, a one-way ANOVA (i.e., analysis of variance) examines whether the means for some item vary significantly across different groups (in this case, the groups are the different schools in the university). Each ANOVA yielded a significant overall effect, indicating that there was significant between-school variability on each SEI item (see Appendix for a table that includes the actual raw data). For instance, for the item “Was well prepared for class,” the ANOVA yielded a significant between-group effect (F(4, 16,188) = 14.06, p < .05). In other words, the means for this item across the five schools were, in some combination, statistically different from one another. Follow-up analyses revealed that this between-group effect was largely attributable to teachers from Liberal Arts and Sciences (M = 1.41) as well as teachers from Education (M = 1.41) scoring, on average, significantly lower (i.e., more favorably, given that 1 = Strongly Agree) than teachers from Fine and Performing Arts, Science and Engineering, and Business (Ms = 1.46, 1.48, and 1.50, respectively). General trends across the disciplines can be seen in the Appendix.
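The F statistic behind each of these tests is simply the ratio of between-group to within-group variability. A minimal sketch, using made-up item scores for three hypothetical schools rather than the actual Fall 2002 data:

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group mean square divided
    by within-group mean square."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical scores on one SEI item (1 = Strongly Agree) for three schools
schools = [
    [1, 1, 2, 1, 2],   # School A
    [1, 2, 2, 2, 1],   # School B
    [2, 2, 3, 2, 3],   # School C
]
print(round(one_way_anova_f(schools), 2))  # 4.67
```

A large F means the school means differ by more than the scatter of scores within each school would predict; the resulting statistic is compared against the F distribution with (k-1, n-k) degrees of freedom to obtain the p-value.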
Note that this analysis is not designed to provide judgments regarding whose teaching is best. Rather, the fact that the means for each SEI item demonstrated significant between-school variability is presented here to highlight that the instrument is sensitive to important structural factors (such as general academic area). This implies that a particular instructor’s SEI data may be better understood at a within-school level rather than in a university-wide manner: given the significant differences in item means in this analysis, school-level baselines may make more sense for interpreting a particular instructor’s scores than university-wide baselines. For instance, courses offered in the School of Science and Engineering may yield, on average, less positive scores than courses offered in Education; this may well pertain to the nature of the subject matter. Judgments of instructors’ teaching could take this point into account through the use of localized norms.
In any event, while the statistics included here are not comprehensive, the analyses presented suggest that while the SEI is likely far from perfect, it seems to possess some basic qualities speaking to reliability and validity.
See Appendix with Descriptive Statistics for each SEI item across the schools of the University.
This report assumes that the college will continue using some kind of evaluation, similar to the current SEI, and we are suggesting relatively modest changes in the instrument and its use. The recommendations are based on the information discussed above: The College Task Force on Teaching of 1992, Faculty Survey on the SEI in 1997, the report on the SEI by a subcommittee in 2000, the 2004 Student Survey, the 2004 Faculty Survey, the report on current uses of the SEI above in section IV, and the statistical analyses provided above in section VII.
1. Faculty need to be informed more clearly about how the SEIs are used by the college to evaluate them for reappointment, promotion, and salary increase. A clear explanation of the SEI’s use by the administration should be provided (perhaps online, as in the example of Syracuse University). In particular, such a statement should explain for whom the SEI is mandatory and how much access administrators (including department chairs) have to SEI data. There also remains a great deal of confusion over how candidates should include SEI information in their dossiers (especially written responses), given the concern with reducing the length and size of dossiers. (The 2004 Faculty Survey shows confusion about how the SEIs are used, as does the 2000 SEI Report.)
2. The role of the SEI in evaluating faculty for reappointment, promotion, and tenure (evaluation) should be distinguished from the use of the SEI to improve the teaching practices of the individual faculty member (assessment). The latter function would be served better by alternative assessments of teaching (see A.3) and a specialized portion of the SEI that is not part of the faculty member’s official dossier. (In current educational discourse ‘assessment’ refers to feedback that one can use to achieve better performance, whereas evaluation refers to a fixed decision about performance that does not necessarily allow for future improvements.) (Task Force on Teaching, 2000 Report, 2004 Faculty Survey).
3. There should be alternative formal assessments of faculty teaching that accompany the SEI in decisions about reappointment, promotion, and discretionary salary increase, such as formal peer evaluations that address some general questions about teaching. The informal nature of some peer evaluations currently leads to an overreliance on the SEI (Task Force on Teaching, 2000 Report, Report on Current Use).
4. Questions about the validity of the SEI (and similar surveys at other campuses) suggest that it should not be used to discriminate among faculty members with summative scores in the same range (namely, good to excellent teachers). Yet it may help identify instructors who have an ongoing pattern of scores that are markedly lower than their department or school. The current use of SEI seems to fit this recommendation (that summative scores should not be used to rank faculty), but there is significant uncertainty among the faculty about whether the SEI scores are being used in this way (2000 SEI Report, 2004 Faculty Survey, Report on Current Use).
1. Faculty should have the option to select some evaluation questions that are not revealed to the relevant committees involved in central faculty governance or the administration. This would greatly improve the likelihood that individuals and departments could use the SEI to critically assess and improve teaching (2004 Faculty Survey, Task Force on Teaching).
2. In addition to a list of common questions used across campus, there should be options for SEI questions that can be tailored to different schools, departments, instructors or courses on campus (2004 Faculty Survey, Task Force on Teaching, Statistical Analyses).
3. Qualitative responses should be preserved and perhaps developed further in any revision of the SEI. Qualitative responses should be prompted by one or more open-ended questions (e.g., at Purdue University students are asked “What is the most valuable thing you learned in this course?”) (2004 Faculty Survey [written comments]).
4. As a whole, the faculty (and the students) appeared to be undecided about the value of administering SEIs online. Yet neither this committee nor the faculty as a whole has been exposed to a specific model for online SEIs. We recommend that SEIs not be administered online until a specific proposal (or set of proposals) for how this would be undertaken can be vetted by the faculty (2004 Faculty Survey, 2004 Student Survey).
5. Given that the questions on the current SEI seem to be measuring a single underlying construct, it may not be necessary for there to be so many questions on the common SEI form that is used across campus (Statistical Analyses).
6. Students should not be required to sign their names on the SEI forms that they complete (2004 Faculty Survey).
7. Information about the individual student’s commitment and/or performance should be included on the form (e.g., Purdue and SUNY Geneseo ask students, “What grade do you expect to receive in this course?”; and Geneseo also asks, “Rate your level of involvement in the activities in this course”). This information would likely be included on the optional assessment form rather than the universal (evaluative) form (2004 Faculty Survey [written comments]).
8. The following questions have been the target of multiple criticisms and should be revised or eliminated. Many of the criticisms were articulated in written responses on the 2004 Faculty Survey.
#3 “Contributed toward making me a more educated informed person.” (This question seems ambiguous to many faculty members.)
#9 “Was confident and competent in the subject matter” (These are two different questions and the students in some courses may not be qualified to speak to the faculty member’s competence).
#12 “Treated students with fairness and concern.” (Like #9, this question seems to refer to two different characteristics of the instructor).
#14 “Was easy to approach outside of class” and #15 “Was available for meeting with students during office hours.” (These questions do not take into account the constraints on adjunct faculty; perhaps these questions should be given for full time faculty only).
# 16 “Adjusted his/her teaching to reflect the students’ level of comprehension” (This question seems too ambiguous to many faculty.)
#17 and #18 can be combined, e.g., “Gave assignments/exams that were appropriately related to the course.”
9. Several faculty members recommended a new question that probes the academic climate of the classroom, e.g., “Students in this course could ask questions and/or felt free to do so.”
10. Finally, in order to develop a set of instruments that meet the above guidelines, we propose that a task force be created that would include one faculty member from each of the five schools (elected by members of the schools) and the director of institutional research as an ex officio member. In particular, this task force would: 1) develop a statement regarding the use of the SEI by all bodies involved in personnel decisions; 2) specify evaluative SEI questions to be used across campus, as well as questions that faculty members could select for their own assessment on a separate form; and 3) recommend whether the SEI should be administered online, and if so, how. The results of this task force would be presented to the Academic Senate in a timely manner (we hope prior to June 2006).
ABRAMI, P. C., D'APOLLONIA, S., & COHEN, P. (1990) Validity of student ratings of instruction: what we know and what we do not know, Journal of Educational Psychology, 82, pp. 219-231.
CHURCH, A. H. & WACLAWSKI, J. (2000, April) Is there a method to our madness? Survey and feedback method effects across five different settings. In M. Sederbur & S. Rogelberg (Chairs), Improving the Survey Effort: Methodological Questions and Answers. Symposium conducted at the 15th annual meeting of the Society of Industrial-Organizational Psychology, New Orleans, LA.
CLEVELAND, J. N., MURPHY, K. R., & WILLIAMS, R. E. (1989) Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, pp. 130-135.
D'APOLLONIA, S. & ABRAMI, P. C. (1997) Navigating student ratings of instruction, American Psychologist, 52, pp. 1198-1208.
FELDMAN, K. (1978) Course characteristics and college students' ratings of their teachers: what we know and what we don't, Research in Higher Education, 9, pp. 199-242.
FELDMAN, K. (1988) Effective college teaching from the students' and faculty's view: matched or mismatched priorities? Research in Higher Education, 28, pp. 291-344.
GREENWALD, A. G. (1997) Validity concerns and usefulness of student ratings of instruction. American Psychologist, 52, pp. 1182-1186.
GREENWALD, A. & GILLMORE, G. (1997) Grading leniency is a removable contaminant of student ratings, American Psychologist, 52, pp. 1209-1217.
HINTON, H. (1993) Reliability and validity of student evaluations: testing models versus survey research models, PS: Political Science & Politics, 26, pp. 562-569.
KOLITCH, E. & DEAN, A. V. (1999) Student ratings of instruction in the USA: hidden assumptions and missing conceptions about “good” teaching, Studies in Higher Education, 24, pp. 27-42.
LOWMAN, J. (1984) Mastering the Techniques of Teaching (San Francisco, Jossey-Bass).
LOWMAN, J. (1994) Professors as performers and motivators, College Teaching, 42, pp. 137-141.
MARSH, H. & ROCHE, L. (1997) Making students' evaluations of teaching effectiveness effective, American Psychologist, 52, pp. 1187-1197.
MCKEACHIE, W. (1997) Student ratings: the validity of use, American Psychologist, 52, pp. 1218-1225.
I. Quantitative summary of Fall 2004 SEI Student Survey
II. Quantitative summary of Fall 2004 SEI Faculty Survey
III. Descriptive Statistics for each SEI item across the schools of the University