Artifact Evaluation for Computer Systems' Research
We work with the community and the ACM to improve the methodology and tools for reproducible experimentation, artifact submission and reviewing, and open challenges!

Frequently Asked Questions

This page evolves continuously and can be modified directly on GitHub!

Should my software artifacts be open-source?

No, it is not strictly necessary; you can provide your software artifact as a binary. However, if anything goes wrong, the evaluators will not be able to inspect or fix it and will likely give you a negative score.

Is artifact evaluation single-blind or double-blind?

AE is a single-blind process, i.e. the authors' names are known to the evaluators (there is no need to hide them since the papers are already accepted), but the evaluators' names are not known to the authors. The AE chairs usually act as a proxy between the authors and the evaluators in case of questions or problems.

In the future, we would like to move to a fully open, community-driven evaluation, which was successfully validated at ADAPT'16. Your comments and ideas are welcome!

How to pack artifacts?

We do not have strict requirements at this stage. You can pack your artifacts simply in a tarball, zip file, virtual machine image or Docker image. You can also share artifacts via public services such as GitHub or Bitbucket. Please see our submission guide for more details.
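For instance, here is a minimal sketch (in Python, using only the standard library) of packing an artifact directory into a compressed tarball; the directory name, archive name and expected README file are purely illustrative, so adapt them to your own layout:

```python
#!/usr/bin/env python3
# Minimal sketch: pack an artifact directory into a compressed tarball.
# The layout (README.md, scripts, data, etc.) is purely illustrative.

import os
import tarfile

ARTIFACT_DIR = "my-artifact"        # hypothetical directory with code, data and scripts
ARCHIVE_NAME = "my-artifact.tar.gz"

def pack(artifact_dir=ARTIFACT_DIR, archive_name=ARCHIVE_NAME):
    # Remind yourself to ship instructions together with the code.
    readme = os.path.join(artifact_dir, "README.md")
    if not os.path.exists(readme):
        raise FileNotFoundError("Please add a README with installation and usage instructions")

    with tarfile.open(archive_name, "w:gz") as tar:
        # Store everything under a single top-level directory so that
        # evaluators do not pollute their working directory when unpacking.
        tar.add(artifact_dir, arcname=os.path.basename(artifact_dir))

    print(f"Created {archive_name} ({os.path.getsize(archive_name)} bytes)")

if __name__ == "__main__":
    pack()
```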

However, based on our past Artifact Evaluation experience, the most challenging part is automating and customizing experimental workflows. It is even harder if you need to validate experiments using the latest software environments and hardware (rather than quickly outdated VM and Docker images). Currently, ad-hoc scripts are typically used to implement such workflows. They are difficult to change and customize, particularly when an evaluator would like to try other compilers, libraries and data sets.

Therefore, we decided to develop the Collective Knowledge framework (CK) - a small, portable and customizable framework that helps researchers share their artifacts as reusable Python components with a unified JSON API. This approach should help researchers quickly prototype experimental workflows (such as multi-objective autotuning) from such components while automatically detecting and resolving all required software and hardware dependencies. CK is also intended to reduce the evaluators' burden by unifying statistical analysis and predictive analytics (via scikit-learn, R, DNN) and by enabling interactive reports. Please see examples of a live repository, an interactive article and the PLDI'15 CLSmith artifact shared in the CK format. Feel free to contact us if you would like to use CK but need help converting your artifacts into the CK format.
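To give a flavour of what such a reusable component looks like, here is a minimal sketch of a CK-style action: a plain Python function that consumes a dictionary (typically parsed from JSON) and returns a dictionary with a 'return' code. The action name, input keys and result fields below are illustrative rather than CK's official API; please see the CK documentation for the exact conventions.

```python
# Minimal sketch of a CK-style reusable component: an action is just a
# Python function that consumes a dictionary (typically parsed from JSON)
# and returns a dictionary with a 'return' code (0 = success).
# The action name and the keys other than 'return'/'error' are illustrative.

import json

def benchmark(i):
    """Hypothetical action: 'run' a benchmark described by the input dictionary."""
    dataset = i.get('dataset', 'default-dataset')
    repetitions = int(i.get('repetitions', 3))

    if repetitions < 1:
        return {'return': 1, 'error': 'repetitions must be >= 1'}

    # ... real work (compile, run, measure) would go here ...
    results = {'dataset': dataset, 'repetitions': repetitions, 'time_ms': [10.2] * repetitions}

    return {'return': 0, 'results': results}

if __name__ == "__main__":
    # The same component can be driven from the command line, a web service
    # or another workflow simply by exchanging JSON dictionaries.
    out = benchmark({'dataset': 'image-jpeg-0001', 'repetitions': 5})
    print(json.dumps(out, indent=2))
```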

Is it possible to provide remote access to a machine with pre-installed artifacts?

Only in exceptional cases, i.e. when rare hardware or proprietary software is required, when the VM image is too large, or when you are not authorized to move artifacts outside your organization. In such cases, you will need to send the access information to the AE chairs via private email or SMS. They will then pass this information on to the evaluators.

Can I share commercial benchmarks or software with evaluators?

Please check the licenses of your benchmarks, data sets and software. If in any doubt, try to find a free alternative. Note that we have a preliminary agreement with the EEMBC consortium that lets authors share their EEMBC benchmarks with the evaluators for Artifact Evaluation purposes.

Should I make my artifacts customizable? How can I plug in benchmarks and datasets from others?

It is encouraged but not strictly necessary. For example, you can check how this is done in this artifact (a distinguished award winner) from CGO'17 using the open-source Collective Knowledge framework (CK). This framework allows you to assemble experimental workflows from a growing number of artifacts shared in a customizable and reusable CK format with a simple JSON API and meta information. You can also share your own artifacts (benchmarks, data sets, models, tools) in the CK format.
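As a rough illustration of the underlying idea (not CK's actual API), the following sketch scans a hypothetical directory layout for datasets described by meta.json files and selects those matching a set of tags; CK automates exactly this kind of tag-based discovery for you:

```python
# Generic sketch (not CK's actual API): discover plug-in datasets by scanning
# for meta-description files and matching tags. This only illustrates the idea
# of customizable artifacts with meta information.

import json
from pathlib import Path

def find_datasets(root, required_tags):
    """Return dataset directories whose meta.json contains all required tags."""
    matches = []
    for meta_file in Path(root).glob("*/meta.json"):   # hypothetical layout: <root>/<dataset>/meta.json
        meta = json.loads(meta_file.read_text())
        if set(required_tags).issubset(set(meta.get("tags", []))):
            matches.append(meta_file.parent)
    return matches

if __name__ == "__main__":
    for d in find_datasets("datasets", ["image", "jpeg"]):
        print("Found compatible dataset:", d)
```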

Do I have to make my artifacts public if they pass evaluation?

You are not obliged to make your artifacts public (particularly in the case of commercial artifacts). Nevertheless, we encourage you to make your artifacts publicly available upon publication of the proceedings (for example, by including them as "source materials" in the Digital Library) as part of our vision for collaborative and reproducible computer engineering.

Furthermore, if your artifacts are already publicly available at the time of submission, you may benefit from the "public review" option, where you engage directly with the community to discuss, evaluate and use your software. See such examples here (search for "example of public evaluation").

How to report and compare empirical results?

You should undoubtedly run empirical experiments more than once! There is no universal recipe for how many times you should repeat an empirical experiment, since it heavily depends on the type of experiment, the machine and the environment.

From our practical experience with collaborative and empirical autotuning (example), we usually perform as many repetitions as needed to "stabilize" the expected value (by analyzing a histogram of the results). Even just reporting the variation of your results (for example, the standard deviation) is already a good start.
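As a rough illustration, the following sketch repeats a measurement until its running mean stabilizes within a small relative tolerance and then reports the mean and standard deviation; the measured command, the tolerance and the repetition bounds are illustrative assumptions:

```python
# Minimal sketch: repeat a measurement until its mean "stabilizes"
# (changes by less than a small relative tolerance), then report the
# mean and standard deviation. The command and thresholds are illustrative.

import statistics
import subprocess
import time

CMD = ["./run_benchmark.sh"]   # hypothetical experiment command

def measure_once():
    start = time.perf_counter()
    subprocess.run(CMD, check=True)
    return time.perf_counter() - start

def measure(min_reps=5, max_reps=30, rel_tol=0.01):
    times, prev_mean = [], None
    for _ in range(max_reps):
        times.append(measure_once())
        mean = statistics.mean(times)
        if len(times) >= min_reps and prev_mean is not None:
            if abs(mean - prev_mean) / prev_mean < rel_tol:
                break
        prev_mean = mean
    return statistics.mean(times), statistics.stdev(times), len(times)

if __name__ == "__main__":
    mean, std, reps = measure()
    print(f"{reps} repetitions: mean = {mean:.4f}s, std = {std:.4f}s")
```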

Furthermore, we strongly suggest that you pre-record results from your platform and provide a script that automatically compares new results against the pre-recorded ones, preferably using expected values. This will save evaluators from wasting time digging results out of stdout and validating them by hand. For example, see how new results are visualized and compared against the pre-recorded ones using the Collective Knowledge dashboard in this CGO'17 distinguished artifact.
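A minimal sketch of such a comparison script is shown below; the file name expected_results.json, the result keys and the 10% tolerance are illustrative assumptions:

```python
# Minimal sketch: compare freshly measured results against pre-recorded
# ("expected") values stored in a JSON file, flagging deviations above a
# relative tolerance. File name, keys and the 10% tolerance are illustrative.

import json
import sys

def compare(new_results, expected_file="expected_results.json", rel_tol=0.10):
    with open(expected_file) as f:
        expected = json.load(f)
    failures = []
    for key, exp_value in expected.items():
        new_value = new_results.get(key)
        if new_value is None:
            failures.append(f"{key}: missing in new results")
        elif abs(new_value - exp_value) > rel_tol * abs(exp_value):
            failures.append(f"{key}: expected ~{exp_value}, got {new_value}")
    return failures

if __name__ == "__main__":
    new = {"kernel_time_s": 1.27, "speedup": 2.9}   # would normally be parsed from your logs
    problems = compare(new)
    if problems:
        print("Results differ from pre-recorded ones:\n  " + "\n  ".join(problems))
        sys.exit(1)
    print("New results match pre-recorded ones within tolerance.")
```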

How to deal with numerical accuracy and instability?

If the accuracy of your results depends on a given machine, environment and optimizations (for example, when optimizing BLAS, DNN, etc.), you should provide a script or plugin that automatically reports any unexpected loss in accuracy (above a provided threshold) as well as any numerical instability.
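For example, a simple accuracy check might look like the following sketch, which compares an optimized routine against a reference implementation and fails above a given threshold; the use of NumPy, the maximum-relative-error metric and the chosen thresholds are assumptions:

```python
# Minimal sketch: check the numerical accuracy of an "optimized" routine
# against a reference implementation and report any loss above a
# user-provided threshold, as well as NaN/Inf instability.

import numpy as np

def check_accuracy(reference, optimized, threshold=1e-5):
    ref = np.asarray(reference, dtype=np.float64)
    opt = np.asarray(optimized, dtype=np.float64)

    if not np.all(np.isfinite(opt)):
        raise RuntimeError("Numerical instability: NaN or Inf in optimized output")

    # Maximum relative error (small epsilon avoids division by zero).
    rel_err = np.max(np.abs(opt - ref) / (np.abs(ref) + 1e-12))
    if rel_err > threshold:
        raise RuntimeError(f"Accuracy loss: max relative error {rel_err:.3e} exceeds threshold {threshold:.1e}")
    return rel_err

if __name__ == "__main__":
    a, b = np.random.rand(256, 256), np.random.rand(256, 256)
    ref = a @ b                                   # double-precision reference result
    opt = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float64)  # lower-precision variant
    print("max relative error:", check_accuracy(ref, opt, threshold=1e-3))
```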

How to validate models or algorithm scalability?

If you present a novel parallel algorithm or a predictive model that should scale across a number of cores/processors/nodes, we suggest that you provide an experimental workflow that can automatically detect the topology of the user's machine (or at least be configurable), validate your models or your algorithm's scalability, and report any unexpected behavior. In the future, we expect to use public repositories of knowledge where results will be automatically validated against the ones continuously shared by the community (1, 2).
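A minimal sketch of such a workflow is shown below: it detects the number of available cores, runs a toy workload on 1, 2, 4, ... cores, and flags any point that falls far below the ideal linear speedup; the toy kernel and the 50%-of-ideal threshold are illustrative assumptions rather than a recommendation:

```python
# Minimal sketch: detect the number of available cores, run the workload on
# 1, 2, 4, ... cores and flag any point where the measured speedup drops far
# below the ideal (linear) model.

import os
import time
from multiprocessing import Pool

def work(chunk):
    # Toy CPU-bound kernel standing in for the real parallel workload.
    return sum(i * i for i in range(chunk))

def run_on(n_cores, total=8_000_000):
    chunks = [total // n_cores] * n_cores
    start = time.perf_counter()
    with Pool(n_cores) as pool:
        pool.map(work, chunks)
    return time.perf_counter() - start

if __name__ == "__main__":
    max_cores = os.cpu_count() or 1           # detected topology (core count only)
    baseline = run_on(1)
    cores = 1
    while cores <= max_cores:
        t = run_on(cores)
        speedup, ideal = baseline / t, cores
        status = "OK" if speedup >= 0.5 * ideal else "UNEXPECTED (below 50% of ideal)"
        print(f"{cores:3d} cores: {t:.3f}s, speedup {speedup:.2f}x (ideal {ideal}x) -> {status}")
        cores *= 2
```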

Is there any page limit for my Artifact Evaluation Appendix?

There is no page limit for the AE Appendix at the time of submission for Artifact Evaluation, but there is a 2-page limit for the final AE Appendix in the camera-ready conference paper. We expect a less strict limit in journals willing to participate in our AE initiative.
Maintained by the cTuning foundation (a non-profit R&D organization) and volunteers!
Powered by Collective Knowledge