[ Back to the ACM/IEEE MICRO 2023 Artifact Evaluation website ]

Reviewing Guidelines

This document provides the reviewer guidelines to evaluate artifacts for MICRO 2023.

Reviewing process

Shortly after the artifact submission deadline, the AE committee members will bid on artifacts they would like to review based on their competencies and the information provided in the artifact abstract such as software and hardware dependencies while avoiding possible conflicts of interest. Within a few days, AE chairs will make the final selection of reviewers to ensure at least two or more reviewers per artifact.

Reviewers will then have approximately 1 week to perform an initial 'smoke-test' to check that the artifact runs OK. Once the artifact passes this first test, reviewers will continue with the full evaluation. Reviewers are strongly encouraged to communicate with the authors about encountered issues immediately (and anonymously) via the HotCRP submission website to give the authors time to resolve all problems! Note that our philosophy of artifact evaluation is not to fail problematic artifacts but to help the authors improve their artifacts (at least publicly available ones) and pass the evaluation!

In the end, AE chairs will decide on a set of the standard ACM reproducibility badges (see below) to award to a given artifact based on all reviews as well as the authors' responses. Such badges can be printed on the first page of the paper and can be made available as meta information in some Digital Libraries such as the ACM DL.

Authors and reviewers are encouraged to check the AE FAQ, the dedicated AE google group and the Discord server (MLCommons Task Force on Automation and Reproducibility) in case of questions or suggestions.

Milestone 1: Kick-the-Tires

The goal of this initial phase is to ensure that the artifact works as expected, e.g. no important files are missing and the artifact can be installed and run as expected. At this stage, reviewers are expected to read the paper and the Artifact Appendix, install the artifact and check that they can answer the following questions:

What are the main contributions of the paper and how are these supported by the artifact?
What are the specific experimental claims of the paper (e.g. figures, tables) that the artifact should support?
Are you able to install and test the artifact as described in the documentation?
Do you know how to reproduce each experimental claim (from 2nd question) using the artifact?

On HotCRP, the reviewers will then add a brief note of success or ask clarifying questions. The authors will have a chance to fix bugs or provide further clarifications.

Milestone 2: Full Evaluation

After the initial smoke test, reviewers will proceed with the full artifact evaluation and write a complete review. The review should address the following questions:

Q1: Are all artifacts related to this paper publicly available?
The link can be provided up to two days after the full reviews are due.
Q2: Is the artifact complete, i.e. are all components relevant to evaluation included?

Note that proprietary artifacts need not be included. If they are required to exercise the package then this should be documented, along with instructions on how to obtain them. Proxies for proprietary data should be included so as to demonstrate the analysis.

Q3: Is the artifact well documented, i.e. enough to understand, install and evaluate artifact?
Q4: Is the artifact exercisable, i.e. it includes scripts and/or software to perform appropriate experiments and generate results?
Q5: Is the artifact consistent, i.e. is it relevant to the associated paper and contributes in some inherent way to the generation of its main results?
Q6: Was it possible to validate the key results from the paper using provided artifacts?
Report any unexpected artifact behavior (depends on the type of artifact such as unexpected output, scalability issues, crashes, performance variation, etc).

Where a question is not applicable to a particular artifact, it should be disregarded and the artifact evaluated as appropriate. Once reviews are submitted, they will be made available to the other reviewers in order to have a discussion towards a decision. Reviews can be revised based on the discussions or feedback from the authors.

Badge recommendation

The reviews should further include a specific recommendation for which badge(s) to award:

	The author-created artifacts relevant to this paper will receive an ACM Artifact Available badge only if they have been placed on a publicly accessible archival repository such as Zenodo, FigShare, and Dryad. A DOI will be then assigned to their artifacts and must be provided in the Artifact Appendix! ACM does not mandate the use of above repositories. However, publisher repositories, institutional repositories, or open commercial repositories are acceptable only if they have a declared plan to enable permanent accessibility! Personal web pages, GitHub, GitLab and BitBucket are not acceptable for this purpose. Artifacts do not need to have been formally evaluated in order for an article to receive this badge. In addition, they need not be complete in the sense described above. They simply need to be relevant to the study and add value beyond the text in the article. Such artifacts could be something as simple as the data from which the figures are drawn, or as complex as a complete software system under study. The authors can provide the DOI at the very end of the AE process and use GitHub or any other convenient way to access their artifacts during AE.
	The artifacts associated with the paper will receive an Artifacts Evaluated - Functional badge only if they are found to be documented, consistent, complete, exercisable, and include appropriate evidence of verification and validation. We usually ask the authors to provide a small/sample data set to validate at least some results from the paper to make sure that their artifact is functional.
	The artifacts associated with the paper will receive a Results Reproduced badge only if the main results of the paper have been obtained in a subsequent study by a person or team other than the authors, using, in part, artifacts provided by the author. Note that since it may take weeks and even months to rerun some complex experiments such as deep learning model training, it may not be possible to award this badge to every artifact.
	[Pilot project] To help digest the criteria for an "Artifacts Evaluated – Reusable" badge, we have partnered with MLCommons to add their unified automation interface (MLCommons CM) to the shared artifacts. We opine that MLCommons CM captures the core tenets of ACM "Artifacts Evaluated – Reusable" badge. Hence, we have added it as a possible criteria to obtain the ACM "Artifacts Evaluated – Reusable" badge. The authors will get free help from MLCommons and the community via the public Discord server or can try to add the MLCommons CM interface to their artifacts themselves using this tutorial.