February 21, 2024
A regular part of my job involves responding to questions from educators about our Renaissance assessment suite. Often, these questions involve other products in the market. How are Renaissance’s assessments similar to or different from these products? How do they compare in terms of performance? Do they use similar methods to compute specific scores?
Answering deep questions often requires that I consult the technical manuals of the assessments in question. For Star Assessments and FastBridge, getting my hands on the necessary information is easy—and not just because I work at Renaissance. It’s been our long-held belief that making technical manuals readily available is a step toward transparency that all assessment providers should take.
This is why a web search combining the name of any of our assessments and the words “technical manual” results in multiple hits.
We post these documents publicly, and we allow schools and districts to repost them as well. This practice of full transparency is common with high-stakes assessments. The National Center for Education Statistics, for example, posts the full technical documentation of the National Assessment of Educational Progress (NAEP)—also known as “The Nation’s Report Card”—online for everyone to view.
But this transparency is, sadly, not the norm among many other providers in the K–12 assessment space, and that should worry school and district administrators.
Understanding an assessment’s accuracy and reliability
This lack of transparency became painfully apparent recently when I received a rather deep question. A district was deciding between Star Assessments and an assessment from another provider. The district wanted to know how domain-level scores are calculated in Star and how this compared to the other provider’s approach. The provider was claiming—without evidence—that its approach would deliver more accurate information for instructional planning.
At first, I was perplexed. The provider had given the district very little information about how its differing approach actually works, and online searches for the assessment’s technical manual were fruitless.
Without a technical manual, trying to understand how an assessment was designed and how it performs is nearly impossible. It’s like asking a structural engineer to evaluate a building’s stability without seeing the blueprints.
For days, I searched for any documentation, and then a colleague struck pay dirt. She found the provider’s technical manual buried in the appendix of an RFP—although the manual’s cover had been modified and its title changed, presumably to protect it from scrutiny or from being discovered through online searches.
Armed with the technical manual, I could finally address the question and verify that the competitor’s approach to domain score calculations was, indeed, different from the approach most of the industry uses.
And, more importantly, that the approach produced far less reliable results.
NCII: Offering rigorous, unbiased ratings of K–12 assessments
Granted, few people must regularly go as deep into such questions as I do. For most educators, the inner workings of their assessment tools can be something of a “black box.” And they’re often fine with this, so long as someone has assured them that the scores are reliable and valid through an objective review.
This is where the ratings provided by the National Center on Intensive Intervention (NCII) are so valuable.
As its website explains, the NCII is housed at the American Institutes for Research and is funded by the US Department of Education’s Office of Special Education Programs. The NCII’s mission is “to build knowledge and capacity of state and local leaders, faculty and professional development providers, educators, and other stakeholders to support implementation of intensive intervention” within a multi-tiered system of support (MTSS) or response to intervention (RTI) framework.
In fact, educators can think of the NCII as a sort of Consumer Reports for assessments and tools related to MTSS and RTI. Of particular relevance to this conversation are the ratings shown on their Academic Screening Tools Chart and Academic Progress Monitoring Tools Chart. These charts can quickly provide school and district leaders with objective evaluations of assessment tools.
6 tips for making the most of the NCII assessment ratings
If you haven’t used the NCII charts before, I’d suggest keeping the following tips in mind.
#1: Explore all of the tabs
For each tools chart, there are three tabs of information listed across the top:
- For screening tools, the tabs are labeled Classification Accuracy, Technical Standards, and Usability Features.
- For progress monitoring tools, the tabs are labeled Performance Level Standards, Growth Standards, and Usability.
The first tab in each chart is selected by default, so be sure to also click the second and third tabs to see more information about each assessment.
#2: Click on terms you want to learn more about
We’re so used to seeing hyperlinks that we often take them for granted. But as you’re exploring the charts, I encourage you to click on terms and read the NCII’s descriptions.
All of the column headers are clickable, for example, so you can see how the NCII is determining reliability, validity, seasonal classification accuracy, etc. You can also click on each assessment’s name to access detailed information about it.
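For instance, “classification accuracy” summarizes how well a screener separates students who are truly at risk from those who are not. As a general illustration, here are the standard definitions of two quantities commonly used to evaluate classification accuracy (the NCII’s own pages explain exactly how it evaluates each tool):

```latex
% Standard definitions behind a screener's classification accuracy
% (general psychometric formulas, not NCII-specific notation).
\[
  \text{sensitivity} = \frac{TP}{TP + FN}
  \qquad
  \text{specificity} = \frac{TN}{TN + FP}
\]
% TP (true positives):  at-risk students the screener flags
% FN (false negatives): at-risk students the screener misses
% TN (true negatives):  not-at-risk students the screener passes
% FP (false positives): not-at-risk students the screener flags
```

A tool can post a high overall “hit rate” while still missing many at-risk students, which is why it pays to read how each metric is defined rather than relying on a single number.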
#3: Look across both tools charts
While screening and progress monitoring abilities are evaluated separately, both are essential features of MTSS. Some assessment tools can perform one task but not the other, as is the case with one of the top three assessment tools on the market. Surprisingly, it includes no progress monitoring component.
You have a choice. You can contract with one vendor for screening and another vendor for progress monitoring, and then require your teachers to run and compare reports from different assessments. Or you can choose one tool that is capable of doing both.
#4: Consider all grade levels
Initially, the NCII issued a single overall rating for each assessment. In an effort to winnow out underperforming assessments, it has continually raised its criteria. Now, ratings are reported for each tool at each grade level.
The high school grades, in particular, have very few rated tools, especially for progress monitoring. Again, you have a choice between:
- Purchasing one tool for grades K–8 and another for 9–12; or
- Selecting one of the few tools with ratings across multiple grade spans, thereby creating consistencies and efficiencies in your district.
#5: Use the “compare” feature
The NCII’s tools charts are long. This is because each tool is rated for each grade and, in the case of curriculum-based measures (CBMs), each probe is rated individually.
You can narrow things down for easier comparisons by clicking the box next to each assessment you’re considering and then clicking the “Compare Tools” button. This will filter out everything else and make direct comparisons much easier.
#6: Check back several times each year
The NCII allows vendors to re-submit their assessments for review as they improve or expand their tools, or simply as they collect the necessary data. In general, ratings change once, or at most twice, each year.
NCII assessment ratings: Key questions to ask
As you’re exploring the charts, you may find yourself wondering what specifically to look for. Here are my suggestions:
- Reliability and validity: These are the foundational qualities of any assessment, and you should avoid tools with anything less than a “Partially convincing evidence” rating (a half circle) on either of these criteria.
- Norming: In terms of “sample representativeness,” which refers to the norming population, look for National or Regional, preferably “with cross-validation.” Samples that are only Local in scope may not include enough students to produce robust, fully generalizable norms.
- Bias review: Having a bias review is imperative to ensure that the tool serves all student groups.
- Screening and progress monitoring capabilities: Look for tools that appear on both charts. Also, a case can be made that progress monitoring is the true “edge of performance” for an assessment, given that this function asks the tool to detect small increments of growth across relatively short periods of time.
Tools with high ratings for progress monitoring must be reliable and precise. It’s quite ironic that one of the most widely used interim assessment tools—one that touts its superiority and precision—contains no progress monitoring ability.
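To see why, consider a standard psychometric relationship (a general illustration, not any vendor’s specific method): the error in a measured growth score combines the error of both test administrations, so imprecise scores can easily swamp a small true gain.

```latex
% Standard error of measurement (SEM) of a single score, where SD is
% the score's standard deviation and r its reliability coefficient:
\[
  \mathrm{SEM} = SD\,\sqrt{1 - r}
\]
% With independent errors across two administrations, the standard
% error of the growth (difference) score combines both:
\[
  SE_{\text{growth}} = \sqrt{\mathrm{SEM}_1^{\,2} + \mathrm{SEM}_2^{\,2}}
\]
% Illustrative numbers (hypothetical): SD = 100 and r = 0.95 give
% SEM of about 22.4, so SE_growth is about 31.6 -- potentially larger
% than the true growth expected over a few weeks of instruction.
```

In other words, the shorter the window between tests, the more a tool’s reliability and precision determine whether it can distinguish real growth from measurement noise.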
What not to look for in the NCII assessment ratings
As you review the charts, I’d suggest not expecting to find perfection.
As noted earlier, the NCII has continually raised the bar for assessments to meet. In some instances, it has added criteria requiring data that assessment providers have not historically collected, so it takes time for providers to gather that data and submit it for review.
As a result, you may see fields where the reviews note “Data unavailable” (represented by a dash) for a specific criterion. When you see this in the ratings of a given assessment, be sure to also look at that field for other tools. Quite often, you’ll find that a low or empty rating for even some of the best tools falls on a criterion where virtually no tools are rated.
Recognizing “red flags” in the NCII assessment ratings
Having said this, I need to acknowledge that it’s alarming to find tools in the NCII ratings with “Data unavailable” or “Unconvincing evidence” in relation to their reliability and validity.
Also, a shocking number of tools have not gone through a sufficient bias analysis to ensure that their items are not biased for or against particular demographic groups, and some tools have a sample representativeness (norming population) that is only Local in scale.
These are things those vendors are not going to point out to you, but they are things you clearly need to know.
Transparency in assessment: Seeing inside the black box
In essence, the NCII ratings allow us to peer into the “black box” that many assessments represent for educators who are not specialists in psychometrics. In most cases, consulting these reviews can quickly provide the insight you need to be confident you have chosen a good tool.
Once, an educator contacted me explaining that her district used one of our Star assessments as well as a competitor’s assessment, and the results were not lining up. She wasn’t sure which one to trust. When I took her to the NCII ratings, she saw “Convincing evidence” ratings for her Star assessment on nearly every criterion. For the other tool, there were many instances of “Data unavailable,” “Unconvincing evidence,” or “Partially convincing evidence.”
Once we reviewed the ratings, I asked her if she wanted to take a deeper look at specific Star reports. Her answer: “No, you’ve answered my question. When these tools are reporting conflicting information, I’m going with what Star says.”
While the NCII ratings can answer many questions, they cannot answer all of them. As I noted, they allow us to peer into the black boxes, but sometimes we also need the technical manuals, which show us what’s inside.
If you’re not currently using Star Assessments or FastBridge, ask yourself whether your assessment provider posts their technical manuals publicly. Renaissance’s two largest competitors currently don’t. And I think this should change.
Assessment Masterclass: Assess your assessment
If you haven’t already, explore my new Assessment Masterclass video series, where I take a closer look at key concepts in K–12 assessment and address some common misconceptions:
- Session 1: The Perfect Screening Tool, which explores the true connection between an assessment’s length and its reliability.
- Session 2: Learning Progressions, which discusses the critical alignment between an assessment and a state’s learning standards.
- Session 3: Focus Skills and Trip Steps, which shows you how to focus instruction and practice on the most essential skills at each grade level.
Learn more
Connect with an expert to discover how Star Assessments and FastBridge will help you improve learning outcomes.