This document (V20161020) provides guidelines for reviewing artifacts.
It gradually evolves to define common evaluation criteria based
on our past Artifact Evaluations
and your feedback (see this presentation
with an outcome of the past PPoPP/CGO'15 AE).
After the artifact submission deadline specific to a given
event, AE reviewers will bid on the artifacts they would
like to review based on the artifact abstract and
check-list, their competencies, and access to specific
hardware and software, while trying to avoid
any conflict of interest.
Within a few days, AE chairs will
make a final reviewer selection to ensure at least
two reviewers per artifact (we strongly suggest three reviewers
or even more).
Reviewers will then have approximately two weeks to evaluate artifacts
and provide a report using an AE template via the dedicated submission website.
During rebuttal (technical clarification phase),
authors will be able to address raised issues
and respond to reviewers.
Finally, reviewers will check if raised issues have been fixed
and will provide the final report.
Based on all reviews, AE chairs will make the following final
assessment of the submitted artifact:
+2) significantly exceeded expectations
+1) exceeded expectations
0) met expectations
-1) fell below expectations
-2) significantly fell below expectations
where a score of "met expectations" or above means that a reviewer
managed to evaluate a given artifact, possibly with minor problems
that the reviewer still managed to solve without the authors' assistance.
Such an artifact passes evaluation and receives a stamp of approval.
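The scale and the pass rule above can be sketched in a few lines of Python. This is only an illustration: the names Assessment, LABELS and passes_evaluation are hypothetical and not part of any AE tooling, and the sketch does not model how AE chairs combine individual reviews into the final assessment, since that is not specified here.

```python
from enum import IntEnum


class Assessment(IntEnum):
    """Final assessment scale from the guidelines (-2 .. +2)."""
    SIGNIFICANTLY_FELL_BELOW = -2
    FELL_BELOW = -1
    MET = 0
    EXCEEDED = 1
    SIGNIFICANTLY_EXCEEDED = 2


# Human-readable labels, copied from the scale above.
LABELS = {
    Assessment.SIGNIFICANTLY_FELL_BELOW: "significantly fell below expectations",
    Assessment.FELL_BELOW: "fell below expectations",
    Assessment.MET: "met expectations",
    Assessment.EXCEEDED: "exceeded expectations",
    Assessment.SIGNIFICANTLY_EXCEEDED: "significantly exceeded expectations",
}


def passes_evaluation(score: int) -> bool:
    """Return True if the score is "met expectations" or above.

    Raises ValueError for any value outside the -2 .. +2 scale.
    """
    return Assessment(score) >= Assessment.MET
```

For example, passes_evaluation(0) and passes_evaluation(2) both return True, while passes_evaluation(-1) returns False.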
Note that our goal is not to fail problematic artifacts
but to promote reproducible research via artifact validation and sharing.
Therefore, we allow light communication between reviewers and authors
whenever there are installation/usage problems.
In such cases, AE chairs serve as a proxy to avoid
revealing reviewers' identity.
Reviewers will need to go thoroughly through the authors' guide step by step
to evaluate a given artifact and then describe their experience at each stage
(success or failure, encountered problems and how they were possibly solved,
and questions or suggestions to the authors), and then give a score
on a scale from -2 to +2.
Enough to understand and evaluate artifact?
Enough to install and use artifact?
Enough to validate artifact?
Any unexpected artifact behavior (depends
on the type of artifact; for example, unexpected output,
scalability issues, crashes, performance variation, etc.)?
Relevance to paper
How well does the submitted artifact support the work described in the paper?
Customization and reusability
Optional and should not be used for the overall assessment; mainly used to select a distinguished artifact.
We encourage reviewers to check whether a given
artifact can be easily reused and customized. For
example, can it be used in a different environment,
with different parameters, under different conditions,
or with a different and possibly larger data set
(particularly useful to validate whether machine learning based
techniques are meaningful)? Note that we also give
a prize to the highest-ranked reusable artifacts
and customizable workflows with a unified JSON API
and meta information implemented using
Provide an explanation of your score and of what should be improved during rebuttal.
To help readers understand which submission/reviewing methodology was used in papers
with evaluated artifacts, we keep track of all past versions: