Tuesday, October 14, 2014

Call for Papers, Special Issue of Information Retrieval (Springer)

Information Retrieval Evaluation Using Test Collections

Important Dates
Initial submissions due: 30 April 2015
Initial reviewer feedback: 18 June 2015
Revised submissions due: 23 July 2015
Final decisions: 27 August 2015
Information retrieval has a strong history of experimental evaluation.  Test collections -- consisting of a set of queries, a collection of documents to be searched, and relevance judgments indicating which documents are relevant for which queries -- are perhaps the most widely used tool for evaluating the effectiveness of search systems.

Based on pioneering work carried out by Cyril Cleverdon and colleagues at Cranfield University in the 1960s, the popularity of test collections has flourished in large part thanks to evaluation campaigns such as the Text Retrieval Conference (TREC), the Cross-Language Evaluation Forum (CLEF), the NII Testbeds and Community for Information Access Research project (NTCIR), and the Forum for Information Retrieval Evaluation (FIRE).

Test collections have played a vital role in providing a basis for measuring and comparing the effectiveness of different information retrieval algorithms and techniques. However, test collections also raise a number of issues: they are expensive and complex to construct, and they instantiate a particular abstraction of the retrieval process.
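The components described above (queries, documents, and relevance judgments) and a simple effectiveness measure can be sketched in a few lines. This is purely illustrative: the query and document identifiers are made up, and precision at k stands in for the many measures the issue invites work on.

```python
# Toy qrels (hypothetical data): query id -> set of relevant document ids.
qrels = {
    "q1": {"d1", "d3"},
    "q2": {"d2"},
}

# Toy system output (hypothetical): query id -> ranked list of document ids.
run = {
    "q1": ["d3", "d2", "d1"],
    "q2": ["d4", "d2", "d5"],
}

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

# Mean P@2 across the two toy queries.
scores = [precision_at_k(run[q], qrels[q], 2) for q in qrels]
mean_p2 = sum(scores) / len(scores)
```

With the toy data above, both queries retrieve one relevant document in the top two, giving a mean P@2 of 0.5; real evaluations average such per-query scores over dozens of topics.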

Topics of interest for this special issue include but are not limited to:
Approaches for constructing new test collections

- choosing representative topics and documents

- minimizing effort

Test collection stability

- number of topics and documents required

- completeness of judgments (pooling, stratified sampling, ...)

Evaluation measures

- choosing measures

- relationship with higher-level search tasks and user goals

- relationship with collection features (assumptions regarding incomplete judgments, ...)

Relevance judgments

- approaches for gathering judgments (crowdsourcing, dedicated judges, interfaces and support systems for relevance assessment, ...)

- types of judgments (single or multiple assessments, binary or multi-level, ...)

- human factors (topic creators versus assigned topics, assessor consistency, instructions to assessors, expertise, potential biases, ...)

Test collections as representations of the search process

- assumptions about search behaviour

- user simulation

Relationship between evaluation using test collections and other approaches

- test collections in comparison with user studies and log analyses

Reflections on existing and past initiatives

Special Issue Editors

Ben Carterette, University of Delaware

Diane Kelly, University of North Carolina

Falk Scholer, RMIT University
