The concept of explainable AI (XAI) first emerged in parallel with the concept of artificial intelligence, dating back to the 1960s. Early expert systems, for instance, were designed to replicate reasoning processes that were transparent to the user. It is only relatively recently that the inner workings and logic of artificial intelligence have become so opaque as to be unknowable to human operators. The overall architecture may be known, but the inner workings remain a mystery.
With the advent of machine and deep learning, and a subsequent push by the likes of DARPA for explainable AI in response, it became clear that for AI-based systems to be safely deployed in domains like medicine and defence, it was crucial to have visibility into their decision pathways and for the systems themselves to be able to explain their reasoning. Being able to understand and probe AI reasoning was tabled as necessary for oversight and improvement of burgeoning autonomous agents with AI-based cognitive models.
With the advent of large language models (LLMs) there was a brief lull in awareness of explainable AI, as excitement over the potential of near-human intelligence overshadowed the sound drive for research into XAI. Early XAI methods, like generating logic traces and confidence metrics for expert systems to increase AI transparency and accountability, gave way temporarily to black-box models that rely on effective inference. The vast parameter spaces of LLMs made interpreting decisions and reasoning a post-hoc exercise rather than something architected innately into systems. But with a new generation of AI proliferating, the importance of understandable AI is resurfacing, with leaders in the field calling for laws, regulations, oversight and even a moratorium on AI development, so that people can believe they may control, or at least trust, the autonomous agents that will soon be ubiquitous.
The Renewed Need for Explainability
It is foreseeable that autonomous agents powered by artificial intelligence, those making their own decisions and acting on them, will shortly proliferate across high-impact domains such as law, medicine and defence, and be offered as products for our homes. The way I see it, there will be a commensurate renewal of the need for explainability. Who will trust autonomous agents whose cognitive architecture has inner workings that are unknowable?
While large language models and their ilk have been very successful in view of their impact and applicability, what I envisage is that when their impressive capabilities are looked back on, it will be in the light that they detracted from sound cognitive-model discipline. Their lack of transparency has the potential to become deeply concerning when such models are deployed in sensitive, consequential settings.
Oversight demands explainability in order to probe the reasoning behind decisions and ensure alignment with human values. But the vast parameter spaces and statistical nature of modern AI render post-hoc explanations of LLM-based decision making inadequate for engendering trust or enabling accountability. There is inherent opacity in LLM reasoning, such that deciphering, after the fact, why an autonomous agent made a pivotal, potentially risky decision is virtually impossible.
The scale and intricacy of these modern systems confound explainability needs even further. That is tolerable while the results of decisions are benign, but as autonomous agents become more ubiquitous across healthcare, transportation, criminal justice and beyond, the lack of architectural transparency threatens dangerous misalignment and an erosion of public trust. Tackling this requires focusing explainability at the AI design level, not just post-deployment. The path forward necessitates fresh approaches, and my work with Perceptible focuses on bringing a level of transparency back to the cognitive architecture of AI-based systems when they are employed in autonomous agents.
That 'autonomous agents' isn't an everyday term is evidence that resistance to turning on the switch to full AI autonomy is high. It isn't so much that people don't want autonomous agents working among us, or that we can't conceive of helpful serving robots and the like; it is more that without transparency over decision making we will never wisely flick that switch.
A Fresh Approach with Humanistic Architecture
Perceptible holds the view that explainable AI requires architecting transparency directly into the cognitive model itself, so that post-hoc analysis of decision making can be performed over the logs of transactions that lead up to a decision. What Perceptible proposes with the architecture we are developing is a humanistic architecture designed holistically for understandability. Even if poor decisions are made, at least there will be an audit trail of the inputs leading to each decision.
To approach this, Perceptible is designing a cognitive architecture based on Transactional Analysis (TA), a theory of psychology that analyses transactions based on their effect and intent. Internalising this in an AI or artificial general intelligence (AGI), and storing a record of the inner dialog, provides a foundational mechanism both for reaching autonomous decisions and for their retrospective analysis after the fact. Moreover, the architecture necessitates that decisions are made only after considering the input of Parent, Adult and Child role-playing agents that are internal and integral to the cognitive architecture of the autonomous agent.
Central to TA are the roles of Parent, Adult and Child (P-A-C), with human transactions analysed as coming from any one of those three roles. An autonomous agent wouldn't be limited to just those archetypes in its thinking, further including an organiser, a worrier, a procrastinator, an action taker and so on, but the transactions made as each role communicates with the others to assert its will provide the mechanism for decision making. In the context of autonomous agents, the AGI plays the role of adjudicator, reflecting over the wants and needs of the different roles but making the final decision. Even if made by an LLM acting as the AGI, that decision may be extrapolated and stored as an adjudicatory narrative over the decision made.
Which is to say, Perceptible's P-A-C framework incorporates relatable modules modelled on facets of human personality, all of which modern AI is capable of emulating. Storing the transactions leading to a decision is the key. This allows tracing how cooperative decisions emerge from the controlled collaboration between archetypes playing specialised roles, as in the sketch below.
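To make this concrete, here is a minimal sketch, in Python, of how such archetype modules and an adjudicating coordinator might be composed. The names (`Archetype`, `Adjudicator`, `Transaction`) and the injected `llm` callable are illustrative assumptions rather than Perceptible's actual implementation; the point is simply that each role's contribution is captured as a discrete, timestamped transaction before the adjudicator synthesises a decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List

# Stand-in type for whatever language model backs each role. In practice this
# would be a call to an LLM; here it is injected as a plain function so the
# sketch stays self-contained.
LLMFn = Callable[[str], str]


@dataclass
class Transaction:
    """One utterance in the inner dialog: which role spoke, and what it said."""
    role: str       # e.g. "Parent", "Adult", "Child", "Adjudicator"
    content: str    # the proposal, observation or judgment itself
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class Archetype:
    """A role-playing module: Parent, Adult, Child, or any further archetype."""
    name: str
    stance_prompt: str   # how this role is asked to view a situation
    llm: LLMFn

    def respond(self, situation: str) -> Transaction:
        proposal = self.llm(f"As the {self.name} ({self.stance_prompt}): {situation}")
        return Transaction(role=self.name, content=proposal)


class Adjudicator:
    """The coordinating AGI module: gathers each archetype's transaction,
    synthesises a final decision, and logs everything for later audit."""

    def __init__(self, archetypes: List[Archetype], llm: LLMFn):
        self.archetypes = archetypes
        self.llm = llm
        self.log: List[Transaction] = []   # the audit trail of inner dialog

    def decide(self, situation: str) -> Transaction:
        round_log = [a.respond(situation) for a in self.archetypes]
        summary = "\n".join(f"{t.role}: {t.content}" for t in round_log)
        narrative = self.llm(
            "Weigh the following perspectives and give a final, justified "
            "decision:\n" + summary
        )
        decision = Transaction(role="Adjudicator", content=narrative)
        self.log.extend(round_log + [decision])
        return decision
```

The design choice worth noting is that the log is a first-class part of the architecture: the adjudicator cannot reach a decision without the role transactions that precede it being recorded.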
Perceptible wrote earlier on how coupled language-model modules working in concert, as in frameworks such as AutoGen, will invariably lead to artificial general intelligence. Rather than seeing each module as a separate part, together they become the whole, with each part playing its role in the inner dialog.
By logging transactions between the Parent, Adult, Child and the meta-coordinating AGI module, we enable tracing of decision pathways within autonomous agents. Decisions that do not meet the standards required by humans may be analysed and corrective action taken.
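Continuing the earlier sketch, and again only as an assumption about how such logging might surface to a reviewer, the transaction log itself becomes the explainability artefact: an audit trail that can be read as an inner dialog, plus a hook for flagging decisions that fall outside human-supplied standards. The `standards_check` predicate here is hypothetical.

```python
from typing import Callable, List


def trace_decision(log: List[Transaction]) -> str:
    """Render the logged transactions as a human-readable audit trail."""
    return "\n".join(f"[{t.timestamp}] {t.role}: {t.content}" for t in log)


def flag_for_review(
    log: List[Transaction],
    standards_check: Callable[[Transaction], bool],
) -> List[Transaction]:
    """Return adjudicated decisions that fail a human-supplied standards check."""
    return [t for t in log if t.role == "Adjudicator" and not standards_check(t)]
```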
Rather than combing through millions of opaque model parameters, people can analyse agent choices based on simpler archetype interactions. I believe this transparency-by-design approach offers the best chance of aligning autonomous reasoning with human values.
Envisaging the Future
Imagine that, in the near future, AI assistants are common in homes. The Smith family has a household assistant named Alex. One day, the parents are debating whether their teenagers should have a later curfew on weekends.
The Parent archetype in Alex wants to protect the teens, advising an earlier curfew. But the Child archetype empathizes with the teens' desire for more freedom, pushing for a later curfew. The Adult examines factors like maturity levels and peer norms logically.
Alex's AGI coordinator makes the final judgment, synthesizing the best parts of each perspective. It decides on a modestly later curfew, while also recommending the parents have a thoughtful discussion with the teens on responsibility. This balanced approach aligns with human values.
The Smiths review Alex's adjudication process and find the integrative decision reasonable. Having visibility into Alex's recorded archetype deliberations provides accountability and confidence in its judgments.
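Purely for illustration, the hypothetical sketch from earlier could be exercised with canned responses to show what the Smiths' review of Alex's recorded deliberations might look like; `toy_llm` and its responses are invented for the example and stand in for real language-model calls.

```python
# Canned stand-in for the language model behind each role; a real deployment
# would generate these responses rather than hard-code them.
def toy_llm(prompt: str) -> str:
    if prompt.startswith("Weigh"):
        return ("Extend the curfew by one hour on weekends, paired with a "
                "conversation about responsibility.")
    if "Parent" in prompt:
        return "Keep the earlier curfew; the teens' safety comes first."
    if "Child" in prompt:
        return "A later curfew feels fair; the teens want more freedom."
    return "Their track record and peer norms support a modest extension."


alex = Adjudicator(
    archetypes=[
        Archetype("Parent", "protective and cautious", toy_llm),
        Archetype("Adult", "weighs evidence logically", toy_llm),
        Archetype("Child", "empathizes with wants and feelings", toy_llm),
    ],
    llm=toy_llm,
)
alex.decide("Should the teenagers have a later weekend curfew?")
print(trace_decision(alex.log))  # the record the Smiths would review
```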
Or let's take an example in the business world.
Acme Inc. has an AI business assistant named Jenny to help with operations. The company is choosing between two plans for expansion - opening a new office overseas vs acquiring a competitor. The executives consult Jenny for advice.
Jenny's Parent archetype considers employee workload, advising gradual expansion may be more prudent. The Child archetype empathizes with employees nervous about rapid change, recommending the acquisition to maintain culture. The Adult analyses financials, projecting the new office will maximize long-term revenue.
As the AGI coordinator, Jenny synthesizes these perspectives. She recommends a phased approach - first opening the new office to fuel growth, then acquiring the competitor once operations are stable. The executives review Jenny's archetype deliberations and find the rationale balanced and sensible.
By examining Jenny's recorded architectural transactions, Acme's leaders gain useful insight into her strategic thinking. Architectural transparency enables the oversight needed to build the trust required for autonomous agents advising on business decisions.
Cultivating Transparent and Cooperative AI
Perceptible believes focusing explainability at the architecture level will cultivate cooperative autonomous agents that can evolve responsively under human oversight. A P-A-C/TA-based framework aims to produce systems whose behaviour can be analysed by tracing transactions between comprehensible components representing human archetypes. While still an emerging capability, embedding explainability into the agent's structure itself offers potential to build confidence in AI advancement by at least allowing humans to inspect and intervene in the loop when required.
Of course, explainable architecture alone does not guarantee ethical alignment, and rigorous real-world testing will remain essential. But in my view, responsible development of AI demands explainable design in addition to rigorous training and auditing. Opaque systems operating autonomously, regardless of performance, will struggle to earn public trust or opportunities for widespread integration. On the path ahead, AI will only realise its immense possibilities in an acceptable manner if technologists refocus on constructing AI capable of explaining its actions clearly, truthfully and accountably from the ground up.
The rebirth of explainable AI is here and Perceptible aims to play its part.