University of Auckland
Scott W. VanBenschoten
Charles R. Robinson
Christopher T. Lowenkamp
Administrative Office of the United States Courts
Background and Research Question
THE PREDICTION OF RISK is ubiquitous in modern society (Beck, 1992). Physicians consider risk when treating patients, financiers consider risk when making investments, and psychologists consider risk when working with clients. Within the criminal justice system, predictions of risk guide discretion at all points (Gottfredson & Tonry, 1987). When police officers choose between formal citations and verbal warnings, they evaluate risk; when judges impose sentences upon defendants, they evaluate risk; and when community corrections officers monitor the conditions of pretrial defendants, parolees, and probationers, they, too, evaluate risk.
Over time, research suggests, professionals within the justice community develop the ability to distinguish high-risk offenders from those who present little risk of reoffending (Fong, et al., 1990; Mossman, 1994). They do so by drawing upon their own personal experiences, using heuristics and other mental shortcuts to simplify complex calculations (Nisbett & Ross, 1980). But this kind of professional (or clinical) judgment is limited to the experience of the decision maker and is subject to a host of faults: unreliable evaluations, discretionary decisions based upon biases and stereotypes, and politicized administration (Walker, 1993). An alternative approach is to use statistically-derived instruments to predict actuarial risks of violence, dangerousness, reoffending, rearrest, or reconviction.
The statistical prediction of recidivism risk has an 80-year history, and can be traced at least as far back as the 1928 parole prediction instrument developed by Ernest Burgess (Burgess, 1928). Early attempts to use actuarial risk assessment in the justice system were often controversial, particularly given high rates of false positives (Selective Incapacitation, 1982). Evaluators identifying subjects as dangerous were wrong twice as often as they were right (Monahan, 1981). Nevertheless, despite these flaws, research suggested that actuarial prediction outperformed the clinical judgment of even trained professionals across an array of disciplines (e.g., Meehl, 1954). The superiority of actuarial assessment over unstructured clinical judgment is a finding that has been replicated by many researchers (Grove & Meehl, 1996; Harris, 2006). One meta-analysis of 136 studies concluded that statistical predictions were 10 percent more accurate than clinical judgments and were dramatically more accurate one third of the time (Grove, et al., 2000). The accuracy of assessment instruments also appears to have improved (Hilton, et al., 2006). A more recent meta-analysis of 67 studies concluded that actuarial assessment generally is 13 percent more accurate than clinical judgment and is 17 percent more accurate in predictions of future violent or criminal behavior (Ægisdóttir, et al., 2006).
Today, the academic debate is no longer about whether actuarial assessments out-predict clinical judgments; that debate is long since over (Monahan, et al., 2001). Even the skeptics of actuarial risk prediction now acknowledge a consensus that actuarial judgments consistently outperform clinical ones (Harcourt, 2007; Litwack, 2001). Instead, the current debate is about whether there is any place in risk assessment for clinical judgment (Hanson, 2009). Some researchers argue for a synthetic approach, combining actuarial and clinical techniques (e.g., Gottfredson & Moriarty, 2006; Sjöstedt & Grann, 2002; Sreenivasan, et al., 2000). After all, for all its strengths, actuarial prediction is not particularly good at accounting for exceptional circumstances, predicting rare events, or predicting risk for young people (for whom there is less historical information available) (Bullock, 2011). Other researchers, however, argue for an actuarial-only approach (e.g., Grove & Meehl, 1996; Quinsey, et al., 1998). They claim that the introduction of clinical judgment only reduces the accuracy of the instrument. And after all, “[e]ven if actuarial methods merely equal the accuracy of clinical approaches, they may save considerable time and expense” (Dawes, et al., 1989: 1673).
Numerous commercial risk assessment instruments are available, all of which predict recidivism about equally well and all of which are more accurate than unstructured clinical judgment (Yang, et al., 2010). Many jurisdictions use commercial instruments such as the PCL-R, CAIS, COMPAS, or the LSI-R. Other jurisdictions have adapted off-the-shelf instruments to fit their specific needs or have developed their own in-house assessment tools. Used effectively, these assessment tools allow probation officers to accurately assess risk, a requisite first step in employing evidence-based practices (Harris, 2006; VanBenschoten, 2008).
Yet despite the lengthy history of statistical risk assessment and despite a substantial body of research demonstrating that actuarial predictions outperform unstructured clinical judgment, probation officers—both in the United States and abroad—have exhibited skepticism, ambivalence, and outright hostility toward actuarial assessment devices. Irish probation officers have cultivated an attitude of “resistance” to assessment instruments (Fitzgibbon, et al., 2010). In England, Horsfield suggested that, using their clinical judgment, “it is not difficult for probation service staff to identify who is likely to commit further offences” (2003: 377), and argued that the real value of using actuarial risk instruments lies in justifying the operations within the probation service, competing for resources, and regulating staff behavior. In the United States, Schneider and her colleagues (1996) reported similar attitudes among Oklahoma probation officers. Officers held negative-to-neutral views about risk instruments (e.g., only 15 percent thought risk instruments are more accurate than officer judgment) but thought actuarial tools were useful in justifying supervision levels to the public and legislature. Lynch (1998) reported that California parole officers deliberately subverted directives issued by their actuarial risk managers. But even managers appear to express reservations about the value of risk assessment instruments. In a 2003 national survey of community corrections agencies, 61 percent of respondents described themselves as satisfied or very satisfied with the risk instruments used in their departments, but a full 39 percent described themselves as neutral, uncertain, or dissatisfied (Clem, 2003: 22).
The tension between professional judgment and actuarial risk assessment affects the federal probation and pretrial services system as well. Risk assessment is not new to the federal courts. The district court for the District of Columbia began using a risk prediction scale, the “U.S.D.C. 75,” in 1970 (Hemple, et al., 1976). This instrument was renamed the Risk Prediction Scale 80 (RPS 80) and adopted for use throughout the probation system in January of 1981 (Eaglin & Lombard, 1981). In September of 1997, the RPS 80 was replaced by the Risk Prediction Index (RPI), an eight-question, second-generation risk assessment tool (Lombard & Hooper, 1998). But many probation officers did not use the RPI scores they calculated (VanBenschoten, 2008), and did not always link supervision practices to risk levels (Lowenkamp, et al., 2006).
Responding to the Criminal Law Committee’s endorsement of evidence-based practices in the supervision of defendants and offenders (Judicial Conference, 2006), probation staff at the Administrative Office of the U.S. Courts have developed a new, fourth-generation risk assessment instrument, the Federal Post Conviction Risk Assessment (PCRA). The PCRA was validated on a large sample of federal probation cases (see article in this issue by James Johnson et al.).
We were interested in whether use of the PCRA would improve the ability of federal probation officers to accurately assess risk. On the one hand, 50 years of research suggests that actuarial prediction consistently outperforms unstructured professional judgment (e.g., Ægisdóttir, et al., 2006; Grove & Meehl, 1996; Grove, et al., 2000; Monahan, et al., 2001); on the other hand, federal probation officers are considered to be the “crème de la crème” of community corrections officers (Buddress, 1997: 6). They are well educated, well trained, and often come to the federal system with substantial practical experience. Would the use of the PCRA allow even federal officers to improve their ability to assess risk?
The question of whether the use of the PCRA would improve the risk assessment skills of federal probation officers was investigated during four regional training meetings convened during 2010 and 2011. Federal probation officers from districts in the greater Washington, DC metropolitan region gathered in Washington, DC to participate in PCRA training; officers from districts in the eastern United States gathered in Charlotte, NC; officers from districts in the middle of the country gathered in Detroit, MI; and officers from districts in the western United States, including Pacific islands, gathered in Salt Lake City, UT. Approximately 150–350 officers attended each of the training meetings.
Prior to the training session, each officer was asked to complete an eight-hour online training program that reviewed the fundamentals of risk, need, and responsivity (Andrews, et al., 1990). At each session, trainers explained to the participating officers that they would be asked to assess an offender’s risk based on a videotaped mock intake interview and supplementary written documentation. Specifically, they were told that they would be asked to place the offender in the case vignette in one of four risk categories (low, low/moderate, moderate, or high) and to identify the offender’s three most important criminogenic needs (in rank order). The risk levels themselves were not defined; each officer had to decide what each level meant. Although the probation officers were in a large group setting, the trainers emphasized that officers were not to discuss their rankings of risk or identification of criminogenic needs until they submitted their data collection forms.
The case vignette consisted of a 24-minute mock intake interview (based upon an actual case, with identifiers and key case details modified in order to protect the offender’s anonymity). The probation officer in the vignette asked the offender—a man in his fifties with a long history of methamphetamine addiction and firearms charges—a series of questions about the offender’s criminal behavior, employment, social networks, cognitions, substance abuse, time in custody, and current accommodations. Supplemental written materials included a presentence report and release paperwork from the Federal Bureau of Prisons.
The offender in the vignette was working and lived in a stable residence. He participated in treatment, remained free of drug use, and could articulate a relapse prevention plan. He did not associate with anti-social peers and was in the process of developing a pro-social network. The correct score, according to the PCRA, was low/moderate risk. Specifically, the numerical score was 6 (PCRA scores range between 0 and 18).
After the video concluded, officers were given as much time as needed to identify the risk level and three top criminogenic needs. Officers typically took between five and ten minutes to review the supplementary material and submit a complete data collection form. They were not provided with the correct score after this first exercise.
On the second day of the training, after learning the scoring rules of the PCRA and practicing on several scenarios, probation officers viewed the training vignette for a second time. Instead of using their professional judgment to identify the offender’s risk level and criminogenic needs, they were asked to use the PCRA and identify a risk score. The officers were shown the same video and were provided with the same written supplementary materials. Once again, they were asked to score the case independently and to provide their answers to the trainers. These actuarial (PCRA) risk assessments were collected and compared with the risk assessments made with clinical judgments.
Risk category (low, low/moderate, moderate, or high) is an ordinal variable. As such, typical measures of central tendency and dispersion do not apply. We were, however, interested in whether officers could accurately assign the offender to the correct risk category unaided by actuarial risk assessment and whether the PCRA increased the reliability of the assessment of risk and thereby of risk classification. To evaluate the effect of the PCRA on the reliability of risk assessments, we used the consensus measure (Tastle, et al., 2005) to measure dispersion. The consensus measure ranges in value from 0 to 1, with 1 representing complete agreement among those who ranked an item (in our case, risk category), regardless of the category chosen. A value of 0 (complete dissension) results when two equal groups of participants rank a case at the opposite ends of the scale. This characteristic of the consensus measure is important, as it allowed us to determine whether the officers’ categorization of risk was consistent, regardless of whether their assessments agreed with the results of the PCRA.
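The consensus measure is straightforward to compute. The minimal sketch below implements the Tastle et al. formula, Cns = 1 + Σ pᵢ log₂(1 − |Xᵢ − μ|/d), where pᵢ is the proportion of raters choosing category Xᵢ, μ is the mean rating, and d is the width of the scale; the function name and category coding (1 through 4) are illustrative choices, not part of the original study. The two checks reproduce the boundary behavior described above: complete agreement yields 1, and two equal groups at opposite ends of the scale yield 0.

```python
import math

def consensus(probs, values):
    """Consensus measure of Tastle et al. (2005) for an ordinal scale.

    probs  -- proportion of raters choosing each category (sums to 1)
    values -- numeric codes for the ordered categories, e.g. 1..4
    Returns 1 for complete agreement, 0 for complete dissension.
    """
    d = max(values) - min(values)                    # width of the scale
    mu = sum(p * x for p, x in zip(probs, values))   # mean rating
    return 1 + sum(p * math.log2(1 - abs(x - mu) / d)
                   for p, x in zip(probs, values) if p > 0)

scale = [1, 2, 3, 4]  # low, low/moderate, moderate, high

# Every rater picks the same category: complete agreement.
print(consensus([0.0, 1.0, 0.0, 0.0], scale))   # 1.0

# Two equal groups at opposite ends of the scale: complete dissension.
print(consensus([0.5, 0.0, 0.0, 0.5], scale))   # 0.0
```

Unlike a standard deviation, this dispersion measure respects the ordinal character of the scale: disagreement between adjacent categories is penalized far less than disagreement between the extremes.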
A total of 1,087 officers identified a risk category for the case vignette when asked to do so without administering the PCRA. A total of 1,049 officers provided a risk categorization for the case vignette using the PCRA. The distributions of these ratings are presented in the following two figures.
Figure 1 displays the frequencies (percentages in parentheses) of risk categories identified by the officers using clinical judgment (without the use of the PCRA). The largest category identified for the case vignette was moderate risk: just over 50 percent of the officers indicated, based on the information provided, that the offender was moderate risk. Thirty percent of the officers identified the offender’s risk level as low/moderate, 17 percent identified it as high, and a much smaller percentage (2 percent) identified it as low. Given this distribution of scores, a calculation of the consensus measure (Cns) yielded a value of 0.66.
Figure 2 displays the distribution of risk categories assigned by officers when using the PCRA to guide their determination of risk. Note that in Figure 2, only three bars indicate the estimation of risk: no officers identified the offender’s risk level as high when using the PCRA. A second noteworthy feature of Figure 2 is that the largest risk category identified by the officers accounts for the ratings of 954 officers, or 91 percent. The consensus measure based on the distribution of these ratings yielded a Cns value of .93, about 1.4 times as great as the Cns value yielded by the distribution of ratings in Figure 1. In addition, the officers selected the proper risk category, according to the PCRA, 91 percent of the time. Given that this was only the first or second time these probation officers had administered the PCRA, these results are encouraging.
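As a rough check on the two Cns values, the sketch below re-applies the Tastle et al. consensus formula to the rounded percentages reported above. Because the exact split of the 9 percent of PCRA-guided ratings that fell outside low/moderate is not reported, an even low/moderate split is assumed here purely for illustration; both computed values therefore land near, rather than exactly on, the published 0.66 and .93.

```python
import math

def consensus(probs, values):
    """Consensus measure of Tastle et al. (2005): 1 = unanimity, 0 = polarized."""
    d = max(values) - min(values)
    mu = sum(p * x for p, x in zip(probs, values))
    return 1 + sum(p * math.log2(1 - abs(x - mu) / d)
                   for p, x in zip(probs, values) if p > 0)

scale = [1, 2, 3, 4]  # low, low/moderate, moderate, high

# Clinical judgment (Figure 1), rounded shares: 2/30/51/17 percent.
clinical = [0.02, 0.30, 0.51, 0.17]
print(round(consensus(clinical, scale), 2))   # ≈ 0.67 (0.66 from exact counts)

# PCRA-guided (Figure 2): 91% low/moderate, no high ratings; the remaining
# 9% is split evenly between low and moderate here as an assumption.
pcra = [0.045, 0.91, 0.045, 0.0]
print(round(consensus(pcra, scale), 2))       # ≈ 0.95 (near the reported .93)
```

Whatever split is assumed for the unreported 9 percent, the PCRA distribution yields a substantially higher consensus value than the clinical-judgment distribution, consistent with the roughly 1.4-fold difference reported above.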
Federal probation officers made more consistent and more accurate assessments of offender risk when using the PCRA than when using unstructured clinical judgment. Assessments made with the PCRA were more accurate (more officers correctly identified the risk level) and had greater consensus (even officers who did not correctly identify the risk level selected categories adjacent to the actual risk level). These findings support the view that, in assigning offenders to the correct risk category, actuarial prediction outperforms unstructured clinical judgment. Our findings are consistent with a robust body of work, collected over many decades (e.g., Ægisdóttir, et al., 2006; Grove & Meehl, 1996; Grove, et al., 2000; Meehl, 1954; Monahan, et al., 2001). But they are still remarkable. Federal probation officers must satisfy very high standards: they must meet medical standards, pass regular background investigations, possess at least a bachelor’s degree from an accredited university, and complete the six-week training program at the Federal Probation and Pretrial Services Training Academy. Typically these officers have prior probation experience from other jurisdictions. Additionally, these highly skilled professionals are part of a single system with one set of national policies (with local variation) and a uniform training academy. Even so, the federal probation officers produced more consistent risk level assignments with the use of an actuarial tool.
The research also indicates that clinical judgments tended to overestimate risk. It is not difficult to understand why. Ansbro notes that probation officers “face the mutually-exclusive targets of high accuracy and high throughput, and exist in a climate where failings in practice will be hunted for if an offender commits a serious offence whilst on supervision” (2010: 266). A signal detection analysis lies beyond the scope of this article, but in a situation where there are dire consequences to missing a true positive (i.e., not identifying a high-risk offender as such) and few direct costs to officers when making false positives (i.e., wrongly identifying a low-risk offender as high-risk), it is easy to see why officers would yield to the so-called precautionary principle identified by Kemshall (1998). Of course, over-supervising low-risk offenders is expensive, and diverts resources away from the high-risk offenders who need them. Austin analogizes this to a “hospital that decides to provide intensive care for patients who have a cold—the treatment is not only unnecessary but expensive” (2006: 63). There is also research suggesting that over-supervising low-risk offenders can make them worse, affirmatively increasing their likelihood of recidivism (Lowenkamp & Latessa, 2004). Actuarial tools can serve as a valuable check against the precautionary principle. They can provide a means of engaging in professional triage, ensuring that resources are allocated where they should be, maximizing community safety while allowing for offender rehabilitation (Flores, et al., 2006).
It is also important to note that officers were not given specific descriptions of the risk terms, and this omission may account for some of the variation in risk category assignments. What “low risk” means may vary from one officer to another. Some of the variation in risk assignment, then, may reflect differences in how officers perceived and understood the case; but an equally concerning possibility is that it reflects inconsistent definitions of the risk-related language used by officers and national policy.
In a landmark article, Feeley and Simon suggested that the rise of risk assessment was symptomatic of a shift to a new penology: “[T]he new penology is markedly less concerned with responsibility, fault, moral sensibility, diagnosis or intervention and treatment of the individual offender. Rather, it is concerned with techniques to identify, classify, and manage groupings sorted by dangerousness. The task is managerial, not transformative” (1992: 452). Without question, the use of risk assessment instruments in community corrections has exploded since Feeley and Simon published their article, and its ascendance has been criticized by many thoughtful critics (e.g., Hannah-Moffat, et al., 2009; O’Malley, 2004; Wandall, 2006). Indeed, Harcourt (2007) demonstrates that risk-based justice may actually increase the overall amount of crime in society. In jurisdictions around the world, probation and parole officers have resisted the tyranny of risk and rejected managers’ instructions to manage offenders under their supervision by risk score (Fitzgibbon, et al., 2010; Lynch, 1998). But this view of risk assessment may be too dystopian. Other commentators have realized that the consequences of risk assessment are far more nuanced than its critics suggest. For example, Robinson (2002) notes that actuarialism’s focus on outcomes actually underlies the new rehabilitation of “what works” (see Petersilia, 2004; Taxman, et al., 2004). To be sure, this is a form of rehabilitation that takes public safety as its ultimate object—not the transformation of every individual offender (Robinson, 2002). But instead of contributing to an inexorable increase in prison populations and persons under supervision—a population that exceeded five million, or 1 in 31 U.S. adults, during 2009 (Pew Center, 2009)—risk assessment can reduce prison and community corrections populations (Bonta, 2008).
By operating as a check against the precautionary principle and reducing over-classification, actuarial risk assessment can reduce recidivism among low-risk offenders by ensuring that they are not over-supervised. It can simultaneously reduce recidivism among high-risk offenders by ensuring that these individuals are carefully supervised and provided with interventions that correspond to their criminogenic needs. Instead of stripping the humanity from probation work (Wandall, 2006), actuarial risk assessment with the PCRA can allow federal probation officers to be far more effective in facilitating real transformative change in the lives of offenders.