Proposed Standard: Human-AI Systems Test & Evaluation (HAISTE)
Posted August 04, 2025
Purpose and Background:
HFES proposes to develop an HFES/ANSI standard on the topic of Human-AI Systems Test & Evaluation (HAISTE). A core working group has been working to establish an initial methodology and metrics for test and evaluation (T&E) of AI systems when working in conjunction with humans, particularly in high-risk applications. The ability of AI implementations to meet their objectives when working in realistic conditions with actual users is critical. An effective capability for the co-development and testing of the AI system as part of a joint human-AI system is essential to its success. It is envisioned that a lifecycle approach to testing will be needed, beginning at the early stages of system development, continuing through initial deployment, and extending over time as changes to the AI system occur.
The T&E of joint human-AI systems must consider (a) the behavior of the AI system and the ability of people to interact with it successfully to accomplish their goals, (b) the ability of people to detect and react appropriately in situations in which the AI is inaccurate (i.e., AI blind spots), and (c) changes to human performance when AI is present that may create unintended consequences. The purpose of the process is to ensure resilient performance of the human-AI system in the face of both normal and off-normal situations that may be beyond the boundary conditions of the AI system.
Potential Canvassees
HFES seeks to identify potential canvassees consisting of those organizations, companies, government agencies, standards developers, individuals, etc., known to be, or who have indicated that they are, directly and materially interested in the proposed standard. We abide by the ANSI Essential Requirements and will ensure:
- Openness - Participation shall be open to all persons who are directly and materially affected by the activity in question.
- Lack of Dominance - The standards development process shall not be dominated by any single interest category, individual, or organization.
- Balance - The standards development process should have a balance of interests. Participants from diverse interest categories shall be sought with the objective of achieving balance.
If you are interested in serving as a canvassee for this proposed standard, please complete the form no later than September 4, 2025.
Pre-Canvass Survey Form
Read the information below for more about the proposed standard.
Scope:
The target audience for the standard is AI developers, testers, researchers, and certification bodies involved in the oversight of AI system deployment in different contexts. The HAISTE standard is anticipated to address T&E of AI systems working in conjunction with humans, including:
- T&E of effects of AI system on human performance, cognition and perception
- T&E of performance and output quality of joint human-AI system
- Humans interacting with fully autonomous AI
- Humans (and others) potentially impacted by fully autonomous AI
- High risk and low risk situations
- Intentional misinformation/fraud
- Occupational settings and non-occupational settings
AI systems based on any type of approach (symbolic, machine learning, or hybrid) will be considered, as applied to parts of a task or to a combination of tasks. Humans may include those who interact with the AI to perform tasks, who rely on the output of AI systems, or who are affected by the behavior of the AI system.
A number of issues, while important, are viewed as out of scope for this standard, including:
- Verification and validation of the AI system alone
- Software security
- Legal compliance
- Privacy
While the T&E of AI systems to support joint human-AI system performance is viewed as critical for managing and mitigating risks, and as a key component of ethical AI, other aspects of AI ethics (e.g., employment and social impacts) will not be considered in scope for this standard.
Process:
The process for the proposed standard is being managed through the HAISTE working group, which was established in October 2024. The group currently consists of 12 participants from the DOD, industry, and academia, including international participants. The HAISTE working group has expertise in the development and testing of human-AI systems across many domains, including military, aviation, maritime, road transportation, and rail transportation, as well as general applications. To date, the working group has focused on developing an initial outline and approach for human-AI system T&E. This outline will serve as the foundation for development of the HAISTE Standard. Once approved, the HAISTE Standard development group will follow the processes outlined by ANSI.
Content:
The document is intended as an industry guidance standard. Key sections of the document are anticipated to include:
- Introduction, definitions, goals and scope
- Challenges for human-AI systems, risk framework & human-AI error classes
- Operational design domains (ODD) and evaluation approaches
- T&E life-cycle approach across the Human Readiness Levels (HRL)
- Establishing human use requirements for AI systems
- Human-AI systems testbeds
- Scenario development
- Methods and metrics for evaluation
- Training
- Design guidelines