Beyond Verification — What Responsible AI Really Demands of Human Experts

by Elizabeth M. Renieris, David Kiron, Steven Mills, and Anne Kleppe. MIT Sloan Management Review

Tuesday, 12 May 2026

For the fifth year in a row, MIT Sloan Management Review and Boston Consulting Group (BCG) have assembled an international panel of AI experts that includes academics and practitioners to help us understand how responsible artificial intelligence is being implemented across organizations worldwide. In our first post this year, we explored how organizations should think about AI’s impact on the workforce, with our experts stressing that responsible AI means looking beyond the safety of AI systems to address real-world consequences for workers and economic stability.

This time, we asked our panel to react to the following provocation: Responsible AI efforts fail if they don’t cultivate human experts who can verify AI solutions. On the surface, there is broad consensus, with a clear majority (84%) of our panelists agreeing or strongly agreeing with the statement. But a deeper dive reveals that panelists define verification far more expansively than the provocation implies. Rather than treating it as a narrow, output-by-output check, they describe verification as the work of applying human judgment across an AI system’s life cycle, interpreting context, designing tests, auditing workflows, setting thresholds, weighing when AI should not be relied on at all, and carrying the accountability that machines cannot. Understood this way, verification is not a final checkpoint but the connective tissue of responsible AI, encompassing the design, oversight, and accountability that organizations need to scale alongside the systems themselves. Below, we share panelist insights and offer our practical recommendations for organizations seeking to cultivate the human expertise their responsible AI governance efforts depend on.

Humans provide the context for verifying AI outputs. ForHumanity founder Ryan Carrier backed the consensus that responsible AI efforts must cultivate human expertise to verify AI outputs because, as he puts it, “context matters.” Similarly, TÜV AI.Lab CEO Franziska Weindauer notes, “AI solutions operate within complex real-world contexts, and human experts are essential to interpret results, detect failures, and ensure that systems function as intended.” As GovLab chief research and development officer Stefaan Verhulst explains, “Many of the most significant risks of AI are societal rather than technical, such as misalignment with public values, harmful impacts on vulnerable groups, or inappropriate deployment contexts.” Those risks, many experts contend, are precisely the ones hardest to address with a wholly technical solution.

For some, context is irreducibly human and cannot be captured in machine-readable form alone. As OdeseIA president Idoia Salazar explains, “Not everything is translated into data, such as context in a specific situation.” Yasadora Cordova, a distinguished member of the investments committee of the Co-Develop fund, agrees that responsible AI requires “contextual sensitivity,” a quality that, in her view, “cannot be automated.” Jai Ganesh, Ph.D., vice president of technology, connected services, engineering, at Wipro Ltd., adds, “Situational awareness is another area of concern for AI systems where an output that is correct may be culturally insensitive or legally problematic in a specific country or situation.” Automation Anywhere’s Yan Chow similarly observes that “humans identify sociopolitical nuances and shifts that data cannot capture.” For these reasons, National University of Singapore provost Simon Chesterman concludes that “however sophisticated the model or elaborate the governance framework, someone must still be capable of asking whether a system is reliable, lawful, and appropriate in context,” a responsibility that, in his view, requires human expertise.

If context cannot be fully captured by machines, the practical consequences are significant. Carrier argues that “domain experts are necessary to provide feedback and risk assessments that result in well-tailored controls, treatments, and mitigations designed to tackle the specific and unique risks presented by context-dependent AI deployment and usage.” Salazar goes further, contending that “no matter how advanced a tool is, it cannot be the one to guarantee that its outputs are fair, safe, or appropriate to the context.” For Ganesh, the risks are heightened with “edge cases, rare scenarios, and new contexts where AI systems tend to break down,” and he believes “catching these failures requires human judgment and deep domain expertise.” Chow agrees that human expertise is critical for building “expert-validated guardrails for the edge cases where AI is most fragile.” Moreover, he argues that “responsible AI frameworks collapse into compliance theater without human experts because AI cannot perceive dynamic context.”

Losing human expertise poses an existential threat to organizations. The concern is not only that AI systems will fail without human expertise to verify outcomes but that organizations may lose human expert capacity over time. Cordova argues that “organizations that delegate verification only to AI erode the institutional capacity to audit it as expertise atrophies and junior staff never develop independence.” Likewise, consultant Linda Leopold cautions, “If we always let AI do the work for us, we gradually lose the expertise needed to oversee it,” and “we need to keep human judgment sharp enough to challenge it.” EnBW chief data officer Rainer Hoffmann says, “Responsible AI efforts fail not because humans cannot verify every AI decision but because organizations lack the expertise to govern how AI systems should be evaluated, monitored, and deployed responsibly.”

The business stakes, through this lens, are fundamentally human. As Australian National University’s Belona Sonna contends, “The core objective of responsible AI is not only to design systems that align with ethical principles but also to ensure that humans remain capable of intervening when misalignment occurs.” Put differently, Salazar says that responsible AI “needs people who are prepared not to delegate to machines what remains a fundamentally human responsibility.” Without this capacity, the question of whether responsible AI requires human verification of AI outputs becomes moot, because no one is left with the expertise to do it.

Human verification alone does not scale. Despite broad support for the importance of cultivating human expertise, many experts cite concerns about the scale and scope of human verification. Wharton School professor Kartik Hosanagar explains: “There are many settings where it’s helpful to have human verification. But there are many others where human verification is infeasible because of the scale of verification needed.” Hoffmann agrees that for “applications that process large volumes of data or detect patterns beyond human capability, output-by-output human verification is neither feasible nor meaningful.” For some experts, requiring human verif
