Abstract: There is a growing desire to create computer systems that can communicate
effectively to collaborate with humans on complex, open-ended activities.
Assessing these systems presents significant challenges. We describe a
framework for evaluating systems engaged in open-ended, complex scenarios where
evaluators do not have the luxury of comparing performance to a single right
answer. This framework has been used to evaluate human-machine creative
collaborations across story and music generation, interactive block building,
and exploration of molecular mechanisms in cancer. These activities differ
fundamentally from the more constrained tasks performed by most contemporary
personal assistants: they are generally open-ended, with no single correct
solution and often no obvious completion criteria.
We identified the Key Properties that successful systems must exhibit. From
there, we identified "Hallmarks" of success: capabilities and features that
evaluators can observe and that indicate progress toward achieving a Key
Property. Beyond providing a framework for assessment, the Key Properties and
Hallmarks are intended to serve as goals that guide research direction.