NewsTrust Review Tool — Test Results

To evaluate the feasibility of its proposed news rating service, the NewsTrust team designed and tested a prototype of a news review tool in May 2005. Our test results and findings for this research project are included below.

Research Goals

The purpose of our online review tool is to enable citizen reviewers to evaluate the quality of a news story by answering a series of questions about how well it supports widely accepted journalistic principles such as accuracy, fairness, and integrity. Besides designing and testing this news review tool, the goals for this research were to answer the following questions:

  • Can citizen reviewers rate the quality of news content as reliably as practicing journalists?
  • How useful is the public feedback collected by this tool to practicing journalists? to the public?
  • How, if at all, does the quality of a story affect agreement between the two groups?

Prototype

The first prototype of our NewsTrust review tool is presented within a web page and consists of two panels: the left panel features a survey wizard; the right panel shows the full text of the story that is being reviewed. The left survey panel prompts reviewers through a series of questions about the story under review, with two to four questions per page, in a multiple-page wizard format. Most of the questions have multiple-choice answers about how well the story supported key journalistic principles, on a scale of 1 to 5 (radio buttons). Other questions invite reviewers to select multiple options (check boxes), or to write comments in a text box.
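
As a rough illustration, the survey wizard's content could be represented as a simple data structure like the sketch below (hypothetical Python; the field names and page grouping are our assumptions, not the actual implementation):

    # Hypothetical model of one page of the survey wizard. Each question
    # records its prompt, input type, and answer options.
    survey_page = [
        {
            "prompt": "How accurate is this story?",
            "type": "scale",      # radio buttons, rated 1 (low) to 5 (high)
            "options": [1, 2, 3, 4, 5],
        },
        {
            "prompt": "Does this story favor one political viewpoint over another?",
            "type": "checkbox",   # reviewer may select multiple options
            "options": ["conservative", "liberal", "other", "none"],
        },
        {
            "prompt": "Additional comments about this story:",
            "type": "text",       # free-form comment box
        },
    ]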

Story review questions were designed to help reviewers evaluate how well the story under review supports important principles of journalism, based on codes of ethics and editorial guidelines we collected from respected news organizations around the world (including the BBC, Gannett Newspapers, the International Federation of Journalists, the New York Times, the Poynter Institute, the Radio-Television News Directors Association, the Washington Post, the Hutchins Commission Report, and the Project for Excellence in Journalism).

We typically asked at least one question for each of these selected principles: Accuracy, Credibility, Fairness, Impartiality, Importance, and Originality.

Key questions asked during each story review included:

Quality Questions

  • How accurate is this story?
  • How well does this story support its main points with factual evidence?
  • How well does this story identify its sources?
  • How impartial is this story?
  • How well does the journalist keep his/her personal opinion out of the story?
  • How well does this story represent all important viewpoints on this topic?
  • How well does the journalist seek out diverse sources?
  • How important is the topic of this story?
  • How well does this story inform you about this topic?
  • How much new information did you get from this story?
  • How would you rate the overall quality of this story?

Note: the above questions are answered on a 1-5 scale, and are referred to below as "performance" questions.

Research "Counting" Questions

  • How many unique sources are cited in this story?
  • How many of those are unnamed sources?
  • Does this story favor one political viewpoint over another? If so, which viewpoint? +
  • How many times does the journalist present opinions as facts?
  • How many times does the journalist use derogatory or complimentary words?
  • How many viewpoints are presented in this story?
  • How many interested groups or stakeholders are mentioned in this story?
  • How many of those interested groups were consulted for this story?

Note: the question marked with a plus sign (+) is qualitative, not quantitative.

Story Selection

Our editors selected two recent newspaper stories of different quality levels (one average, one high-quality) and in different categories (one breaking news story, one news analysis). We then created a lower-quality version of each story by introducing errors targeting many of the dimensions measured by our review tool, as well as political bias favoring either conservative or liberal viewpoints.

We ended up with four stories, which varied greatly in quality and political bias, as shown below:

Story #  Survey    Actual Story                            Description
1        Survey 1  Refinery Story (degraded - low)         low quality, conservative bias
2        Survey 2  Filibuster Story (degraded - average)   average quality, liberal bias
3        Survey 3  Refinery Story (original - average)     average quality, original breaking news story
4        Survey 4  Filibuster Story (original - high)      high quality, original news analysis

Note: in the online version of this report, the survey names above link to the individual online surveys and stories for each selection.

Methodology

We recruited both practicing journalists and volunteer citizen reviewers to test our review tool. Email invitations to participate in our online test were sent in early May 2005 to about 200 practicing journalists and 4,000 citizen reviewers. The journalists had extensive track records in print, broadcast, and online news, with at least five years of experience as news journalists. The citizen reviewers had participated in previous surveys and had signed up to volunteer for new surveys.

Each respondent was randomly assigned one of our four stories, dividing respondents into four groups, one for each selected story. On average, each story was reviewed by about 14 journalists, and by up to 180 citizen reviewers.
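
As an illustration, this kind of balanced random assignment can be done in a few lines (a hypothetical sketch in Python; the names are assumptions):

    import random

    # Hypothetical sketch: randomly assign one of the four stories to each
    # respondent, keeping the four groups roughly the same size.
    def assign_stories(respondents, stories):
        assignments = {}
        shuffled = random.sample(respondents, len(respondents))
        for i, respondent in enumerate(shuffled):
            # Cycling through the shuffled list yields near-equal group sizes.
            assignments[respondent] = stories[i % len(stories)]
        return assignments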

We collected complete responses from 935 reviewers by May 22, 2005:

  • 54 Practicing Journalists (5+ years of news experience)
  • 881 Citizen Reviewers (volunteers with little or no news experience)

Data Analysis

At the end of the test period, we analyzed responses to key questions for each story.

We averaged ratings for each question within each group, then compared the group averages to determine how closely the volunteer group matched the practicing-journalist benchmark for each story version. We then calculated the difference between the journalist and volunteer group averages for each question and each version class (high- and low-quality versions).
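
A minimal sketch of that comparison, assuming ratings are stored per group and per question (the data layout below is hypothetical):

    from statistics import mean

    # Hypothetical sketch: compare citizen and journalist averages per
    # question. `ratings` maps group -> question -> list of 1-5 ratings
    # for one story version.
    def group_differences(ratings):
        diffs = {}
        for question in ratings["journalists"]:
            j_avg = mean(ratings["journalists"][question])
            c_avg = mean(ratings["citizens"][question])
            # Difference of group means; smaller means the citizen group
            # came closer to the journalist benchmark on this question.
            diffs[question] = abs(j_avg - c_avg)
        return diffs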

We also looked at the standard deviation for each question, to determine which of our proposed questions offered the closest match between volunteer and journalist ratings for each story and version class. We compared those results with how respondents rated the effectiveness of each question, to produce a composite score for each question.

Findings

Here are our findings, based on this research data. A full set of graphs is available upon request.

Overall quality of story - Journalists vs. Citizens

Citizens rated the quality of news content with about the same reliability as practicing journalists.
The average scores for journalists and non-journalists were remarkably close for each story. On a 1-5 scale, two average scores can differ by at most 4 points. The average difference between the groups was 0.2, or 6% of the maximum possible difference. The largest difference in overall quality assessment between the two groups was 0.5, or about 12.5% of the maximum (0.5 / 4 = 12.5%).

Reviewers using the review tool discriminated reliably between high-quality stories and low-quality stories.
Quality scores generally matched our expectations. Stories known to be high quality tended to get higher ratings (3.0 to 3.8) than stories known to be low quality (2.3). The average rating difference between a high-quality and a low-quality story was about 0.5. Within each group, ratings varied widely for each story, with an average standard deviation of 52% of the maximum (that deviation was about the same for journalists and citizens). However, average ratings from all reviewers generally matched the quality differences between stories. Note that while the spread of individual ratings reflects genuine disagreement among reviewers, the reliability of each group's average rating can be improved by increasing the number of reviewers in each group.
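
To make that last point concrete: the uncertainty of a group's average rating shrinks with the square root of the group size, a standard statistical relationship (the figures below are illustrative; a standard deviation of 1.0 roughly matches the reported 52% of the 2.0 maximum on a 1-5 scale):

    import math

    # Standard error of a group's average rating: sigma / sqrt(n).
    # Assuming individual ratings spread with a standard deviation of about
    # 1.0 on the 1-5 scale, larger groups yield much steadier averages.
    sigma = 1.0
    for n in (14, 180):
        print(n, round(sigma / math.sqrt(n), 2))  # 14 -> 0.27, 180 -> 0.07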

Overall quality of story - by political viewpoint

Political viewpoints of reviewers have minor impact on average group ratings.
Differences in ratings by political orientation of the reviewer occurred in some cases (up to 0.8 maximum difference, or 20% of the maximum possible difference), but their overall impact on average group ratings was minor. Given human nature, political viewpoints are likely to influence citizen ratings from time to time. To address that issue, the service can strive to ensure that representative panels of reviewers from different viewpoints are invited to review any given story. NewsTrust can also make those viewpoints transparent to the public by including, for each story, a graph of ratings broken down by reviewer viewpoint, alongside the average rating from all reviewers. Lastly, the service might offer a filtering tool allowing readers to correct for results that may be affected by the reviewers' political viewpoints.
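
The report does not specify how such a filter would work; one simple possibility is a balanced average that weights each viewpoint group equally, as in this hypothetical sketch:

    from statistics import mean

    # Hypothetical balanced average: compute each viewpoint group's mean
    # first, then average the group means, so no viewpoint dominates.
    def balanced_rating(ratings_by_viewpoint):
        # ratings_by_viewpoint maps a viewpoint label to a list of 1-5 ratings.
        return mean(mean(r) for r in ratings_by_viewpoint.values() if r)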

Most respondents found the review tool effective in evaluating news quality.
Most reviewers found this tool helpful in evaluating the quality of news stories (average rating of 3.7 on a 1-5 scale).

Respondents were somewhat confident in this volunteer review process.
Confidence in the reliability of this volunteer review process (3.5 average across all respondents) was not as high as the ratings for the tool's effectiveness or the usefulness of its feedback. This confidence level may increase over time, as respondents get a chance to verify the reliability of the service.

Usefulness of feedback from tool

Feedback from this tool is considered useful to journalists and to the public.
Most respondents thought the feedback from this review tool would be useful to journalists (4.0), as well as to the public (4.1).

Most questions selected for the tool seemed generally effective.
Beyond that qualitative feedback, there were no significant variations among the overall scores of our questions (3% deviation between average scores). The overall score for each of the 20 questions was calculated from these three criteria:
- Difference between the average ratings of the journalist and citizen groups for each question.
- Standard deviation (average distance from the mean) of ratings for each question.
- Perceived effectiveness of each question (% of all respondents who found it useful).
The final score is a weighted average that expresses the overall value of each question as a percentage.
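
A minimal sketch of that calculation follows; the report does not state the actual weights or normalization, so both are illustrative assumptions here:

    # Hypothetical composite score for one question, combining the three
    # criteria above. Each criterion is normalized to [0, 1] so that higher
    # is better; the weights are illustrative, not the actual ones used.
    def question_score(group_diff, std_dev, pct_effective,
                       weights=(0.3, 0.3, 0.4)):
        closeness = 1 - group_diff / 4    # group means differ by at most 4
        consistency = 1 - std_dev / 2     # max std dev on a 1-5 scale is 2
        w1, w2, w3 = weights
        return 100 * (w1 * closeness + w2 * consistency + w3 * pct_effective)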
Based on that score, NewsTrust might consider the 10 top-rated questions below for the next version of the tool ("counting" questions were excluded because they take longer to answer):

Top 10 Questions                                                              Score
How much new information did you get from this story?                          88%
How important is the topic of this story?                                       86%
How would you rate the overall quality of this story?                          85%
How well does this story support its main points with factual evidence?        83%
How well does this story inform you about this topic?                          82%
How impartial is this story?                                                    82%
How well does this story identify its sources?                                  81%
How well does the journalist seek out diverse sources?                         81%
How well does the journalist keep his/her personal opinion out of the story?   80%
How well does this story represent all important viewpoints on this topic?     79%

Each reviewer's overall rating matches the average of all their answers.
An automatic calculation of overall story quality closely matched reviewers' subjective overall ratings (3% average difference). This automatic rating is the average of the 10 performance questions shown above (e.g., "How impartial is this story?").
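
In code form, that automatic rating is just the mean of a reviewer's ten performance answers, as in this minimal sketch (the answer keys are hypothetical):

    from statistics import mean

    # Compute a reviewer's automatic story rating as the mean of their
    # answers to the 10 performance questions (each rated 1-5).
    def automatic_rating(answers, performance_questions):
        return mean(answers[q] for q in performance_questions)

    # In this test, the automatic rating differed from the reviewer's own
    # subjective overall rating by about 3% on average.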

Next Steps

Overall, these test results are encouraging.

Our findings suggest that citizen reviewers can rate the quality of a news story as reliably as experienced journalists, and that our review tool is an effective evaluation method and feedback mechanism for identifying quality journalism.

We are aware that more tests are needed to determine the general public’s response. Larger test samples will help estimate the likely reliability of citizen ratings with a broader population, as well as determine whether results change significantly based on the make-up of the reviewers.

We plan to conduct our next round of tests in fall 2005. If you are a professional researcher and would like to contribute to the analysis of this or future tests, or if you are interested in participating as a test reviewer, please contact us.

Acknowledgments

The following individuals contributed to the design of the NewsTrust review tool, as well as to the planning of this research and the analysis of its results:

Authors:
Fabrice Florin – Executive Director, NewsTrust
David Fox – Technology Director, NewsTrust
John McManus – Grade the News Director

Advisors:
Wes Boyd – MoveOn.org
Krista Bradford – Bradford Executive Research
Robert Cox – Media Bloggers Association
Bill Densmore – Media Giraffe
Kelly Garrett – University of Michigan
Dan Gillmor – Grassroots Media
Bill Mitchell – Poynter Institute
Rory O'Connor – Media Channels
Howard Rheingold – Author of Smart Mobs
Mark Tapscott – Heritage Foundation

For more info, contact us.

UPDATED 09/08/05