This document (V20151015) provides guidelines for reviewing artifacts. It evolves gradually to define common evaluation criteria based on our past Artifact Evaluations and your feedback (see this presentation with the outcomes of the past PPoPP/CGO'15 AE).

Reviewing process

After the artifact submission deadline for a given event, AE reviewers will bid on the artifacts they would like to review based on each artifact's abstract and check-list, their own competencies, and their access to specific hardware and software, while trying to avoid any conflict of interest. Within a few days, AE chairs will make the final reviewer selection to ensure at least two reviewers per artifact. Reviewers will then have approximately two weeks to evaluate the artifacts and provide a report using this template.

During the rebuttal, authors will be able to address the raised issues and respond to the reviewers. Finally, reviewers will check whether the raised issues have been fixed and will provide their final report. Based on all reviews, AE chairs will make the following final assessment of the submitted artifact:

where "met expectations" score or above means that reivewer managed to evaluate a given artifact possibly with minor problems that reviewer still managed to solve without authors' assistance. Such artifact passes evaluation and receives a stamp of approval.

Note that our goal is not to fail problematic artifacts but to promote reproducible research via artifact validation and sharing. Therefore, we allow light communication between reviewers and authors whenever there are installation or usage problems. In such cases, AE chairs serve as a proxy to avoid revealing the reviewers' identities.

Artifact evaluation

The reviewers will need to go thoroughly through the authors' step-by-step guide to evaluate a given artifact, describe their experience at each stage (success or failure, encountered problems and how they were solved, and questions or suggestions to the authors), and then give a score on a scale of -2 to +2 for each of the following criteria:

Documentation: Is there enough documentation to understand and evaluate the artifact?
Packaging: Is anything missing?
Installation procedure: Is it enough to install and use the artifact?
Use case: Is it enough to validate the artifact?
Expected behavior: Was there any unexpected artifact behavior (depending on the type of artifact: unexpected output, scalability issues, crashes, performance variation, etc.)?
Relevance to paper: How well does the submitted artifact support the work described in the paper?
Customization and reusability: Optional and should not be used for the overall assessment; it is mainly used to select the distinguished artifact. We encourage reviewers to check whether a given artifact can be easily reused and customized, for example in a different environment, with different parameters, under different conditions, or with a different and possibly larger data set (particularly useful to validate whether machine-learning-based techniques are meaningful).
Overall score: Provide an explanation of your score and what the authors should improve during the rebuttal.

Methodology archive

To help readers understand which submission/reviewing methodology was used for papers with evaluated artifacts, we keep track of all past versions: