This document (V20151015) provides guidelines for reviewing artifacts.
It evolves gradually to define common evaluation criteria based
on our past Artifact Evaluations
and your feedback (see this presentation
with the outcomes of the past PPoPP/CGO'15 AE).
After the artifact submission deadline specific to a given
event, AE reviewers will bid on the artifacts they would
like to review, based on the artifact abstract and
check-list, their own competencies, and their access to specific
hardware and software, while trying to avoid
any conflict of interest.
Within a few days, AE chairs will
make the final reviewer selection to ensure at least two
reviewers per artifact.
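The coverage rule above (at least two reviewers per artifact) can be checked mechanically once bids are collected. The following is only a minimal sketch: the `assignments` mapping, the artifact and reviewer labels, and the function name `under_reviewed` are hypothetical and are not part of the AE process described here.

```python
# Minimum number of reviewers per artifact, as stated in the guidelines.
MIN_REVIEWERS = 2


def under_reviewed(assignments: dict[str, set[str]],
                   minimum: int = MIN_REVIEWERS) -> list[str]:
    """Return the artifacts that have fewer than `minimum` reviewers."""
    return sorted(
        artifact
        for artifact, reviewers in assignments.items()
        if len(reviewers) < minimum
    )


# Hypothetical example data: artifact -> set of assigned reviewers.
assignments = {
    "artifact-A": {"reviewer-1", "reviewer-2"},
    "artifact-B": {"reviewer-3"},
}
print(under_reviewed(assignments))  # ['artifact-B']
```

Artifacts returned by such a check are the ones for which the chairs would still need to find additional reviewers.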
Reviewers will then have
approximately two weeks to evaluate the artifacts
and provide a report using this template.
During the rebuttal, authors will be able to address the raised issues
and respond to the reviewers.
Finally, reviewers will check whether the raised issues have been fixed
and will provide a final report.
Based on all reviewers' reports, AE chairs will make the following final
assessment of the submitted artifact:
+2) significantly exceeded expectations
+1) exceeded expectations
0) met expectations
-1) fell below expectations
-2) significantly fell below expectations
where a score of "met expectations" or above means that the reviewer
managed to evaluate a given artifact, possibly with minor problems
that the reviewer still managed to solve without the authors' assistance.
Such an artifact passes evaluation and receives a stamp of approval.
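The scale and the pass rule above can be written down as a small sketch. The names `Score` and `passes_evaluation` are hypothetical. The guidelines do not say how AE chairs combine individual reviewer reports, so the sketch takes the chairs' final assessment as given and only encodes the threshold.

```python
from enum import IntEnum


class Score(IntEnum):
    """The five-point assessment scale listed above."""
    SIGNIFICANTLY_EXCEEDED = 2
    EXCEEDED = 1
    MET = 0
    FELL_BELOW = -1
    SIGNIFICANTLY_FELL_BELOW = -2


def passes_evaluation(final_score: Score) -> bool:
    """An artifact passes if its final score is 'met expectations' or above."""
    return final_score >= Score.MET


print(passes_evaluation(Score.MET))         # True
print(passes_evaluation(Score.FELL_BELOW))  # False
```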
Note that our goal is not to fail problematic artifacts
but to promote reproducible research via artifact validation and sharing.
Therefore, we allow light communication between reviewers and authors
whenever there are installation/usage problems.
In such cases, AE chairs serve as a proxy to avoid
revealing reviewers' identity.
The reviewers will need to go through the authors' guide thoroughly, step by step,
to evaluate a given artifact, describe their experience at each stage
(success or failure, problems encountered and how they were solved, if at all,
and questions or suggestions to the authors), and then give a score
on a scale from -2 to +2.
Enough to understand and evaluate the artifact?
Enough to install and use the artifact?
Enough to validate the artifact?
Any unexpected artifact behavior (depending
on the type of artifact: unexpected output,
scalability issues, crashes, performance variation, etc.)?
Relevance to paper
How well does the submitted artifact support the work described in the paper?
Customization and reusability
Optional; it should not be used for the overall assessment and is mainly used to select a distinguished artifact.
We encourage reviewers to check whether a given
artifact can be easily reused and customized. For
example, can it be used in a different environment,
with different parameters, under different conditions,
or with a different and possibly larger data set
(particularly useful to validate whether machine learning based
techniques are meaningful)?
Provide an explanation of your score and of what should be improved during the rebuttal.
To help readers understand which submission/reviewing methodology was used in papers
with evaluated artifacts, we keep track of all past versions: