Artifact Evaluation for Computer Systems' Research
We work with the community and ACM to improve methodology and tools for reproducible experimentation, artifact submission / reviewing and open challenges!

This document (V20161020) provides guidelines for reviewing artifacts. It evolves gradually to define common evaluation criteria based on our past Artifact Evaluations and your feedback (see this presentation with the outcome of the past PPoPP/CGO'15 AE).

Reviewing process

After the artifact submission deadline specific to a given event, AE reviewers will bid on the artifacts they would like to review based on each artifact's abstract and check-list, their own competencies, and their access to specific hardware and software, while trying to avoid any conflict of interest. Within a few days, AE chairs will make the final reviewer selection to ensure at least two reviewers per artifact (we strongly suggest three or more). Reviewers will then have approximately two weeks to evaluate the artifacts and provide a report using an AE template via a dedicated submission website (see example).
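The selection constraint above (at least two reviewers per artifact, no conflicts of interest) can be checked mechanically once bids are in. The following is a minimal sketch, not the procedure AE chairs actually use: the function names, the dictionary layout for bids and conflicts, and the artifact and reviewer identifiers are all hypothetical.

```python
from typing import Dict, List, Set


def eligible_reviewers(
    artifacts: List[str],
    bids: Dict[str, Set[str]],
    conflicts: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each artifact to the reviewers who bid on it and have no conflict.

    `bids` and `conflicts` are hypothetical structures: reviewer -> set of artifact ids.
    """
    eligible: Dict[str, List[str]] = {a: [] for a in artifacts}
    for reviewer, wanted in sorted(bids.items()):
        for artifact in wanted:
            conflicted = artifact in conflicts.get(reviewer, set())
            if artifact in eligible and not conflicted:
                eligible[artifact].append(reviewer)
    return eligible


def needs_more_reviewers(
    artifacts: List[str],
    bids: Dict[str, Set[str]],
    conflicts: Dict[str, Set[str]],
    min_reviewers: int = 2,  # the guide requires at least two and suggests three or more
) -> List[str]:
    """Return artifacts that do not yet have enough conflict-free bidders."""
    eligible = eligible_reviewers(artifacts, bids, conflicts)
    return sorted(a for a, revs in eligible.items() if len(revs) < min_reviewers)


if __name__ == "__main__":
    artifacts = ["A1", "A2"]
    bids = {"r1": {"A1", "A2"}, "r2": {"A1"}, "r3": {"A2"}}
    conflicts = {"r3": {"A2"}}
    print(needs_more_reviewers(artifacts, bids, conflicts))
```

In this example, artifact A2 has only one conflict-free bidder, so it is reported as needing another reviewer before the final selection.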

During the rebuttal (technical clarification phase), authors will be able to address the issues raised and respond to reviewers. Finally, reviewers will check whether the issues have been fixed and will provide a final report. Based on all reviewers' reports, AE chairs will make the following final assessment of the submitted artifact:

where a score of "met expectations" or above means that a reviewer managed to evaluate the artifact, possibly with minor problems that the reviewer was able to solve without the authors' assistance. Such an artifact passes evaluation and receives a stamp of approval.

Note that our goal is not to fail problematic artifacts but to promote reproducible research via artifact validation and sharing. Therefore, we allow light communication between reviewers and authors whenever there are installation or usage problems. In such cases, AE chairs serve as a proxy to avoid revealing the reviewers' identities.

Artifact evaluation

Reviewers need to go through the authors' guide step by step to evaluate a given artifact, describe their experience at each stage (success or failure, problems encountered and how they were solved, and questions or suggestions for the authors), and then give a score on a scale from -2 to +2.

Criteria and scores

Documentation: Is it enough to understand and evaluate the artifact?
Packaging: Is anything missing?
Installation procedure: Is it enough to install and use the artifact?
Use case: Is it enough to validate the artifact?
Expected behavior: Is there any unexpected artifact behavior? This depends on the type of artifact (for example unexpected output, scalability issues, crashes, or performance variation).
Relevance to paper: How well does the submitted artifact support the work described in the paper?
Customization and reusability: Optional. It should not be used for the overall assessment and is mainly used to select distinguished artifacts. We encourage reviewers to check whether a given artifact can be easily reused and customized. For example, can it be used in a different environment, with different parameters, under different conditions, or with a different and possibly larger data set (particularly useful for validating whether machine-learning-based techniques are meaningful)? Note that we also give a prize to the highest-ranked reusable artifacts and customizable workflows with a unified JSON API and meta information implemented using the CK framework.
Overall score: Explain your score and what should be improved during the rebuttal.
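As an illustration of how a report following the criteria above could be checked before submission, here is a minimal sketch. It is not part of the AE template: the criterion keys, the function names, and the `below` threshold are hypothetical, and the sketch only assumes the -2 to +2 scale and the optional status of customization and reusability stated in this guide. The overall score is left to the reviewer's judgement and is not computed.

```python
from typing import Dict, List

SCALE = range(-2, 3)  # integer scores -2, -1, 0, +1, +2

# Hypothetical keys for the criteria listed in this guide.
REQUIRED = (
    "documentation",
    "packaging",
    "installation_procedure",
    "use_case",
    "expected_behavior",
    "relevance_to_paper",
)
OPTIONAL = ("customization_and_reusability",)  # not used for the overall assessment


def validate_report(scores: Dict[str, int]) -> None:
    """Raise ValueError if a required criterion is missing or a score is off the scale."""
    missing = [c for c in REQUIRED if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion, score in scores.items():
        if criterion not in REQUIRED + OPTIONAL:
            raise ValueError(f"unknown criterion: {criterion}")
        if score not in SCALE:
            raise ValueError(f"score for {criterion} must be an integer in -2..+2")


def criteria_to_discuss(scores: Dict[str, int], below: int = 0) -> List[str]:
    """List required criteria scoring under `below` (an assumed cut-off) for the rebuttal."""
    validate_report(scores)
    return [c for c in REQUIRED if scores[c] < below]


if __name__ == "__main__":
    report = {
        "documentation": 2,
        "packaging": 1,
        "installation_procedure": -1,
        "use_case": 0,
        "expected_behavior": 1,
        "relevance_to_paper": 2,
        "customization_and_reusability": -2,
    }
    print(criteria_to_discuss(report))
```

The optional criterion is validated but deliberately excluded from the list of issues, mirroring the rule that it should not influence the overall assessment.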

Methodology archive

To help readers understand which submission and reviewing methodology was used in papers with evaluated artifacts, we keep track of all past versions:
Maintained by cTuning foundation (non-profit R&D organization) and volunteers!
          