Based on community feedback, we added an extra option of open evaluation, which lets the community validate artifacts that are publicly available on GitHub, GitLab, Bitbucket, etc., report issues, and help authors fix them. Note that, in the end, these artifacts still go through the traditional evaluation process via the AE committee. We successfully validated this approach at the ADAPT'16 and CGO/PPoPP'17 AE!
That is why we collaborate with ACM to unify the packaging and sharing of artifacts as reusable and customizable components using the Collective Knowledge framework (see the ACM announcement). CK helps you automate and unify your experiments, plug different compilers, benchmarks, data sets, tools, and predictive models into your workflows, and unify the aggregation and visualization of results. Please check out this CGO'17 article from the University of Cambridge ("Software Prefetching for Indirect Memory Accesses"), whose CK-based experimental workflow won a distinguished artifact award.
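For illustration, here is a minimal sketch of driving CK from Python rather than from the command-line "ck" tool; it only assumes that the CK framework is installed (pip install ck) and that some "program" entries are registered in local CK repositories:

```python
# Minimal sketch of calling CK from Python via its ck.access entry point;
# assumes the Collective Knowledge framework is installed (pip install ck).
import ck.kernel as ck

# List all registered "program" entries across local CK repositories.
r = ck.access({'action': 'search', 'module_uoa': 'program'})
if r['return'] > 0:
    raise SystemExit(r.get('error', 'CK access failed'))
for entry in r['lst']:
    print(entry['data_uoa'])
```

The same dictionary-based interface can then be used to compile and run such programs with different compilers and data sets, which is what makes CK workflows easy to customize and reuse.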
Furthermore, if your artifacts are already publicly available at the time of submission, you may benefit from the "public review" option, where you engage directly with the community to discuss, evaluate, and use your software. See such examples here (search for "example of public evaluation").
If you have more than one expected value (b), it means that your machine has several run-time states which may switch during your experiments (such as adaptive frequency scaling), and you cannot reliably compare empirical results. However, if there is only one expected value for a given experiment (a), then you can use it to compare multiple experiments (for example, during autotuning as described here).
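As a rough sketch of this idea (not the CK implementation), one can bin repeated measurements into a histogram and treat well-populated bins as candidate expected values; the bin width and threshold below are arbitrary choices for illustration:

```python
# Sketch: detect whether repeated measurements have one or several
# "expected values" (modes). Bin width of 5% of the median and a 20%
# population threshold are illustrative assumptions, not CK defaults.
from collections import Counter
from statistics import median

def expected_values(times, rel_bin_width=0.05, min_share=0.2):
    """Return centers of histogram bins holding at least min_share
    of all samples - the candidate expected values."""
    bin_width = rel_bin_width * median(times)
    counts = Counter(round(t / bin_width) for t in times)
    total = len(times)
    return sorted(b * bin_width for b, n in counts.items()
                  if n / total >= min_share)

times = [1.02, 1.01, 1.03, 1.32, 1.30, 1.02, 1.31, 1.01]  # toy data
modes = expected_values(times)
if len(modes) > 1:
    print('WARNING: multiple expected values -> unstable state:', modes)
else:
    print('single expected value:', modes[0])
```

On the toy data above, two modes are reported, signaling that the platform was switching between run-time states during the runs.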
You should also report the variation of empirical results together with their expected values. Furthermore, we strongly suggest that you pre-record results from your platform and provide a script that automatically compares new results with the pre-recorded ones, preferably using expected values. This will save evaluators from wasting time trying to dig out and validate results from "stdout". For example, see how new results are visualized and compared against pre-recorded ones using the CK dashboard in the CGO'17 distinguished artifact.
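Such a comparison script can be very small. Below is a hypothetical sketch (not the CK dashboard); the file name, JSON schema, and 10% tolerance are assumptions for illustration:

```python
# Sketch: compare a fresh measurement against a pre-recorded expected
# value with a relative tolerance. File name and JSON layout are
# illustrative assumptions.
import json

def validate(new_time, recorded_file='expected_results.json', rel_tol=0.10):
    with open(recorded_file) as f:
        expected = json.load(f)['expected_time']  # assumed schema
    deviation = abs(new_time - expected) / expected
    ok = deviation <= rel_tol
    print(f'expected {expected:.4f}s, got {new_time:.4f}s, '
          f'deviation {deviation:.1%} -> {"OK" if ok else "MISMATCH"}')
    return ok
```

An evaluator then only needs to run the experiment and call validate() on the new result instead of manually inspecting raw output.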
There is a two-page limit for the AE Appendix in camera-ready CGO, PPoPP, and PACT papers. There is no page limit for the AE Appendix in camera-ready SC papers. We also expect that there will be no page limit for AE Appendices in journals willing to participate in our AE initiative.