EPISODE #42

Your AI Agents Are Built Wrong

Guest: Chris Butler

Chris Butler is a product leader at GitHub, based in Oakland, California. He’s building and guiding human-centered products at the intersection of AI, decision-making, and team alignment, with prior product leadership experience across major platforms including Microsoft, Facebook, KAYAK, and Waze. Chris is a writer and speaker whose work focuses on practical methods for reducing bias and improving outcomes in product development.

Episode Summary

Chris Butler of GitHub makes a bold case that most enterprises are designing AI agents the wrong way. Instead of building agents that mimic job titles, he says leaders should build agents that create specific artifacts inside governed workflows. In this episode, he explains how that shift changes automation, trust, and cross-functional execution, and why the future of agentic AI belongs to teams that focus on outputs, not impersonation.

Resources

Articles & Documents

"Agents Should Produce Artifacts, Not Impersonate Roles" by Chris Butler, published in the AI Realized Now newsletter
Safe Outputs documentation
Blog post
Substack

Agents and Advice

Agentics Beyond Code — Open source collection of GitHub Agentic Workflows for non-engineering roles (launch readiness, compliance review, status reporting, transcript parsing, process review, and more).
GitHub Agentic Workflows — Framework by GitHub Next with security architecture including token-scoped permissions and safe outputs that restrict agent write actions to narrow, declared output channels
Liminal Practice — Chris's consulting and learning-and-development practice focused on AI; offers workshops and help implementing agentic workflows in organizations

Research and Data

McKinsey 2026 AI Trust Survey — Nearly two-thirds of respondents cite security and risk concerns as the top barrier to fully scaling agentic AI; only about one-third of organizations have reached a governance maturity level adequate for autonomous agents
- McKinsey: Trust in the Age of Agents
- McKinsey: State of AI Trust in 2026
HBR IT Project Overrun Study — "Why Your IT Project May Be Riskier Than You Think" by Bent Flyvbjerg and Alexander Budzier, Harvard Business Review, September 2011: average IT project cost overrun of 27%; one in six projects is a "Black Swan" exceeding 200% over budget
- Why Your IT Project May Be Riskier Than You Think, HBR article
Gary Klein's Pre-Mortem Research — Prospective hindsight technique for identifying project risks; improves failure identification by approximately 30%. Grounding for the Adversarial PM concept.
- Pre-Mortem on The Uncertainty Project
Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents (referenced as "VenBench" in transcript) — Benchmark for testing long-term coherence of autonomous agents; demonstrates how agents experience "decoherence" over time on longer tasks

Books and Publications

- Red Teaming: How Your Business Can Conquer the Competition by Challenging Everything by Bryce G. Hoffman (Crown Business, 2017) — Referenced in the interview guide research notes as grounding for the Adversarial PM concept

FAQs

Artifact-scoped agent design is a framework for enterprise AI where each agent is tied to a single, specific output rather than a broad job title. According to Chris Butler, Director of Product Operations at GitHub, agents should produce bounded artifacts like status reports, compliance documents, or decision records. The key principle is that an agent's value is measured by the artifact a human can review and act on, not by how closely it mimics a coworker.
The "read wide, write narrow" model is a security architecture for enterprise AI agents developed by GitHub Next. Agents receive permissive read access to gather context from multiple data sources, but their write permissions are restricted to a single, declared output channel, such as one discussion category or one report type. Chris Butler of GitHub recommends that leaders ask three questions about any agent: What can it read? What can it write? Under what conditions?
An adversarial PM is an AI agent that monitors team decisions and posts structured counterarguments to automate healthy dissent. Created by Chris Butler at GitHub, the agent identifies the most consequential decisions being made and challenges them, drawing on pre-mortem research by Gary Klein that shows prospective hindsight improves failure identification by approximately 30%. Butler also describes variants including a "chaos monkey for process" that detects groupthink or stasis and introduces deliberate disruption.
AI agents can continuously annotate and update organizational documents so they stay current rather than going stale. Chris Butler of GitHub describes agents that review transcripts, automation logs, and "how we work" documents on a weekly basis, then flag outdated processes or suggest new automations. This replaces the manual auditing work traditionally done by program managers and ensures that strategy documents, decision records, and operational guides reflect actual team behavior.
Agentics Beyond Code is an open-source GitHub repository created by Chris Butler that contains ready-to-use agentic workflows for non-engineering roles. It includes workflows for status reporting, transcript parsing, process review, adversarial PM, launch readiness, compliance review, and living document maintenance. The workflows use GitHub's agentic workflow framework with token-scoped permissions and safe outputs, and Butler provides skills that help teams select and implement the workflows most relevant to their problems.
Chris Butler of GitHub argues that personifying AI agents as full coworkers "loses the plot" because agents are combinations of deterministic and non-deterministic instructions, not individualized decision-makers. Instead, enterprises should focus on what agents produce. Butler's experience building internal tools at GitHub showed that the value of an agent exists only when a human interacts with its output artifact, whether that is a draft document, a notification, or a decision record. Even an agent that only occasionally produces an artifact is valuable if that artifact drives human action.

Full Transcript

Welcome to AI Realized, the podcast for enterprise executives leading AI adoption. From tackling security, data, and operational challenges to navigating organizational transformation, AI deployments offer a unique opportunity to redesign our organizations from the inside out.

I'm Christina Ellwood, your host for today's episode, and today we are talking with Chris Butler, the director of Product Operations at GitHub. Chris, welcome back to AI Realized.

Chris has been with us once before, so we're going to be catching up with him on what he's been working on lately.

He leads a team called Synapse that builds AI-powered tools to automate how cross-functional teams work. This is internal uses of AI, and he's always got very interesting cases of implementation, adoption, and friction that he will be sharing with us today.

And he's pushing a kind of provocative argument right now, and that's that enterprises are designing their AI agents wrong.

Which I think is going to be really fun to talk about. So his premise is that instead of building agents that impersonate roles, like, oh, you know, here's an agent that's like a product manager, they should be building agents that produce specific artifacts.

And he's recently written an article for us about this for AI Realized Now, so you can follow up and read a little bit more about it in that article.

But today we're going to dig into what that means in practice for enterprise leaders deploying agentic AI.

And I want to take a minute to just welcome you, Chris.

@3:56 - Chris Butler

Thanks for having me again.

@4:02 - Christina Ellwood (christina.morelandassociates@gmail.com)

I'm excited to chat with you. It's always fun to have you on. You're full of interesting information and experiences because you're a very prolific user of AI and you are working with people who are prolific users of AI.

So you've got really two perspectives. So you recently wrote about how the most common mistake in enterprise agentic AI is scoping agents to titles instead of to outputs.

Let's dig in on that a little bit. Talk a bit about that and why you think that's an important distinction.

@4:30 - Chris Butler

Yeah, I think it's really important that when we talk about the way agents work, like I think at first there was a lot of speculative kind of thought that they would join the team as another teammate, that that would be someone that you would like kind of invite into meetings as you would like another co-worker.

And I think that loses the plot a lot on the way that actually, you know, people are, you know, individualized experiences.

We have value. We use, we make decisions, but the reality is, is agents are really just kind of a combination of deterministic and non-deterministic instructions that are kind of operating.

And so to me, I think the idea of like personifying these agents so much to be like a full coworker is the wrong thing.

And I think this is related to the way that jobs will change within organizations based on AI in the sense that, you know, like the product manager, for example, right, like is a bunch of different tasks, decision making, like alignment, like conversations.

You know, those are all things that a product manager would do, but if we try to decompose the things that it does, that the person does actually, there are some things that actually can be automated.

So for example, you know, one of the early things we did with Synapse is this project called Bloom, which was really about giving early feedback to product managers about the initiatives that they're building.

And so that early feedback would be in the form of a bunch of different types of, we would call them skills now, because it's really more about the idea of like, we want to give feedback on how aligned with the strategy is this? Or are there gaps in this plan in some way?

And the other part of it was that we wanted to start to give them a way to automatically generate drafts of the 10 to 15 different documents that they have to create when they're actually bringing something to market, right?

And that includes go-to-market, kind of like public roadmap item content or blog post announcements, but also like compliance.

So how is this when it comes to kind of responsible AI or privacy or security or something like that?

@6:25 - Christina Ellwood (christina.morelandassociates@gmail.com)

Or governance.

@6:26 - Chris Butler

Yeah, or governance, any of these things really, when it comes to all of the different teams that a product manager has to work with.

And what I started to see when we would use these different drafts of documents was that it was actually this like combination of the draft and then the human review to then turn in or push to other teams so that they could then do their work.

And so from that perspective, agents for me really started to become that they were almost like an embodiment of an artifact to a certain extent.

And what we really wanted to do is we wanted to be able to provide that this document is actually a draft, right?

It's not actually an official thing until a human has reviewed it. And honestly, like when an agent is going off and doing all of its research, it's checking a bunch of different kind of data sources, and then it's generating something.

It really doesn't matter that much until that artifact actually exists. So a human can interact with it. And in fact, if, if like a human is not interacting with something that comes out of the agent, that agent is actually probably not that valuable, right?

Even if it's only occasionally that the agent will actually push an artifact, it's that artifact, even something as simple as a notification, that is actually the thing that matters in this case. And so it's, how am I using this information that was generated in part of my workflow?

And so that is kind of the future of the way that I think we should consider agents and we should not consider them full coworkers. I think that is a really important aspect of this.

@7:37 - Christina Ellwood (christina.morelandassociates@gmail.com)

Well, one thing that, I mean, I used to be a product manager, so I'm always interested in your use cases because I, I sort of lived the pain.

@7:49 - Chris Butler

It's been a long time since I did it.

@7:51 - Christina Ellwood (christina.morelandassociates@gmail.com)

I know a lot of the models in terms of how we go about doing it have changed, but one of the things that hasn't changed is that what is in and what is out changes all the time.

And one of your early cases, as I recall, when we were at the executive roundtable was related to triaging feature requests.

So if I've got an asset that's been drafted and I have a feature request that comes in that gets approved to be included in that particular, having a way in which the agent is smart enough to update the right documents with the right information and create some kind of indicator of when it was most recently updated is super valuable, because the recipient of it needs to identify whether or not there's a big gap or a little gap between the draft and the version that I'm looking at.

Is that a fair...?

@8:45 - Chris Butler

Yeah, I think that's very fair. And I would also say that what we're actually trying to do, what we're using agents to do is to not only kind of help us build these artifacts in some way, because that helps us do our job, but also to build a better network between the different artifacts.

And so, you know, I think a good example of maybe where I would draw the line... It's like transcripts coming from meetings actually hold a lot of context for what is being decided within the team, right?

And it's not always, it doesn't always make it into the issues or the other artifacts that are durable in some way, right?

And so some of the automations, and so I created this open source repo called Agentics Beyond Code, which is really like a combination of all of the different non-coder workflows that I've been building internally and some that are more speculative that I think are kind of interesting.

We can talk about those, but like that example of the transcript parser, it really isn't the transcript that matters so much.

It's more of the fact that the transcript when it's parsed, it can actually update a particular issue with a comment about what was discussed or decided at that point, or there should be a PR, you know, there's a lot of people when they end up using GitHub as kind of their basis for all of this content, that they may create a bunch of different markdown files that are different ADRs or like basically decision documents.

And so being able to auto generate drafts of those things, because that ADR is actually used not only by humans later, so like, why did we make this decision? What were the other options we could have considered? But when you start to like include coding agents in here, that decision set is actually very important to provide context to the coding agent to then go and do the right thing.

And I would argue that we're talking about documents like a strategy document, things like how does this team do its work, like the how we work type of documents.

Those are really, really valuable. And to your point, like they get out of date very quickly. And so how do we actually then use these agents to automatically annotate these different documents?

So some of the workflows that I think are really interesting is like the how we work kind of, there's a retrospective that an agent can do.

It looks at all the transcripts. It looks at all the automations. It looks at the how we work document. And it says, hey, actually, you know, here's one opportunity of something we could automate, right?

And then two, you actually don't do this process anymore, even though you say you do. And so we should remove it, right?

Like we should say that this is now deprecated in some way. And so I kind of call these like living documents.

And I think agents really help us do this in a way that before it required like a program manager or TPM to be really on top of it to constantly be editing those things and auditing them themselves, which is a huge time suck for those people.
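As a rough illustration of the retrospective described here, the sketch below compares the processes a "how we work" document claims against what was actually observed in recent transcripts and automation logs. Everything in it is hypothetical: the function name `audit_how_we_work`, the `AuditResult` type, and the process labels are invented for illustration and are not taken from the Agentics Beyond Code repo. A real workflow would presumably have an LLM extract these lists rather than receive them as clean strings.

```python
from dataclasses import dataclass, field


@dataclass
class AuditResult:
    # Documented processes with no recent evidence: candidates to mark deprecated.
    stale: list[str] = field(default_factory=list)
    # Observed manual activities that are neither documented nor automated yet.
    automation_candidates: list[str] = field(default_factory=list)


def audit_how_we_work(
    documented: set[str], observed: set[str], automated: set[str]
) -> AuditResult:
    """Compare what the document says against what the team was seen doing."""
    stale = sorted(documented - observed)
    candidates = sorted(observed - documented - automated)
    return AuditResult(stale=stale, automation_candidates=candidates)


# Hypothetical inputs standing in for parsed transcripts and automation logs.
result = audit_how_we_work(
    documented={"weekly demo", "design review", "release checklist"},
    observed={"design review", "release checklist", "manual changelog"},
    automated={"release checklist"},
)
print(result.stale)                  # ['weekly demo']
print(result.automation_candidates)  # ['manual changelog']
```

The output is itself a draft artifact: a human still decides whether "weekly demo" is really deprecated or was simply skipped that week.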

Right. And so anyways, that's where I think it gets really interesting for agents, is that if we consider them as like, these agents are helping upkeep the how we work document, then actually the how we work document is the most important part.

We may add and remove agents that are doing different things on that, but it's the how we work document that we really care about.

@11:00 - Christina Ellwood (christina.morelandassociates@gmail.com)

Yeah, and I'm really glad you brought up this sort of interstitial element, right, there is this kind of fabric that exists between all of the systems that connects them together, but also acts as the place where we do the decision making about updates.

And so I'm really glad you brought that up. And so do you have an agent in your Agentics Beyond Code GitHub repo that is specifically for that interstitial component?

@11:39 - Chris Butler

Yeah, and so being able to, there's a couple different ones that are in there, you know, using things like transcripts to then update the how we work document, or looking at decision documents that are generated from a meeting, to then say, like, how many decisions are we making that are in alignment with our strategy document, and which ones are in misalignment with our strategy document.

And I don't mean misalignment in a bad way, necessarily, because a lot of the times teams that are on the ground, like they're seeing much more than their executives are because they're so close to the work.

They may be making decisions that show that there's a gap in our actual strategic understanding of the world at that point.

It's almost like the idea of, you know, when we do escalations up in a hierarchy to like leaders within the team, if you start to look at the themes of those escalations, which is another thing that agents could start to pull out of these like different escalation documents that come out, we can start to say, hey, actually, there's a gap in the way that people do execution, because there's nothing in the strategy that says how they should make this decision.

And so that's the reason why this theme of escalation keeps on coming up is because we need to like either better communicate the strategy or refine the strategy in some way.
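The escalation-theme idea can be sketched in a few lines, assuming an agent has already tagged each escalation document with a theme. The function name `recurring_gaps`, the threshold, and the sample themes below are all made up for illustration. The sketch only shows the counting step, which flags themes that keep recurring and that the strategy does not address.

```python
from collections import Counter


def recurring_gaps(
    escalation_themes: list[str], strategy_topics: set[str], min_count: int = 2
) -> list[str]:
    """Themes escalated at least `min_count` times that the strategy does not cover."""
    counts = Counter(escalation_themes)
    return sorted(
        theme
        for theme, n in counts.items()
        if n >= min_count and theme not in strategy_topics
    )


# Hypothetical theme tags, one per escalation document.
themes = [
    "pricing exceptions",
    "data residency",
    "pricing exceptions",
    "hiring",
    "data residency",
    "pricing exceptions",
]
print(recurring_gaps(themes, strategy_topics={"data residency"}))
# ['pricing exceptions']
```

A recurring theme that is absent from the strategy is a prompt for a human to either communicate the strategy better or refine it, not a verdict.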

And so I think those are really important. And so I would point to like, yeah, any of these ones that are kind of like the transcript to issues is like a really important one for kind of sewing all that stuff together, the things where it's doing kind of the process review and kind of like how we work reviews on a weekly basis.

These are all things that are on different cases. So I think that's really, really valuable. I mean, I think one that I just kind of made some updates earlier this week on was really this like kind of how we do status reporting is something that requires a lot of legwork by the agents.

So like you want to go and look at all of the different issues and PRs that have been updated in the last week.

And we want to then match that to our strategy to say, hey, here's a summary draft of what happened in the last week so we can send it to a leader.

But there's a human shaping component. So the idea of like, is our status red, yellow, green, right? Like, is it really red or is it just yellow because I'm shaping this for the leader and whether I want them to actually get involved or not.

And so what this does is this actually triggers the creation of a Google Doc that is the draft so that humans can collaborate because that is a better place to do that than maybe in an issue.

And it cross-links those two different documents so that when you resolve the comment on the Google Doc, it tells the agent that now it can publish the actual status that has been kind of drafted by the agent, and reviewed by, and kind of formed by, the human. And then it automatically updates the people that need to be updated on Slack.

And so that's an example of where, kind of like what we're doing is we're trying to make sure that the agents are doing the parts they should, and we want the humans to do the parts that they're doing.

@14:13 - Chris Butler

And the human gets the final kind of sign off of, yes, this is ready to now talk to everybody.
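The human sign-off described here can be pictured as a small state machine: the agent drafts, a human shapes the red/yellow/green call and resolves the review comment, and only then can anything be published. The sketch below is a minimal illustration under that assumption. The class and method names (`StatusReport`, `resolve_review_comment`, `publish`) are hypothetical, and the Google Doc and Slack integrations are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    PUBLISHED = "published"


@dataclass
class StatusReport:
    summary: str  # agent-drafted text
    rag: str      # agent's proposed color; the human makes the final call
    state: State = State.DRAFT

    def resolve_review_comment(self, rag: str, summary: Optional[str] = None) -> None:
        """Human sign-off: the reviewer sets the final color and may edit the text."""
        if rag not in {"red", "yellow", "green"}:
            raise ValueError(f"unknown status color: {rag}")
        self.rag = rag
        if summary is not None:
            self.summary = summary
        self.state = State.APPROVED

    def publish(self) -> str:
        """Refuse to publish anything a human has not signed off on."""
        if self.state is not State.APPROVED:
            raise PermissionError("a human must resolve the review comment before publishing")
        self.state = State.PUBLISHED
        return f"[{self.rag.upper()}] {self.summary}"


report = StatusReport(summary="3 PRs merged, 1 issue blocked on legal review.", rag="red")
try:
    report.publish()  # still a draft, so this is refused
except PermissionError as err:
    print("blocked:", err)

report.resolve_review_comment(rag="yellow")  # the human decides it is yellow, not red
print(report.publish())  # [YELLOW] 3 PRs merged, 1 issue blocked on legal review.
```

The design point is that the publish step checks state rather than trusting the agent's own judgment about when a draft is ready.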

And we just take away the toil of them having to like copy and paste stuff to different channels and do all that work.

@14:17 - Christina Ellwood (christina.morelandassociates@gmail.com)

Wow, that sounds amazing and super valuable. And I'm excited to mention to the audience that you've open-sourced this Agentics Beyond Code on GitHub, right?

So how would someone like myself take advantage of the Agentics Beyond Code repo?

@14:42 - Chris Butler

Well, I think the first thing is that this is fairly GitHub centric, at least right now. And so it does kind of look at the fact that you're keeping key documents mostly in GitHub repos, that you're using things like projects and issues, that you're using kind of PRs.

And so that is one thing that it does kind of assume right now. But I actually ended up creating skills for almost any one of these, like IDE or like, you know, Claude Cowork or Codex, where you can basically have a conversation with the skill to say, here's the type of problem I'm trying to solve right now, or here's the type of team I have, and it will help you basically pull all of those workflows over, compile them, and put them in the repo that you need them to be in.

And so I've tried to make it as simple as possible so that you can like piecemeal pull the different processes that are most helpful.

And actually, the skill will also tell you like, probably first thing to do is these three skills for the problems you have, and then here's the next like, I'm sorry, agents for the problems you have, and then here's like five more agents that once you start to have data running through it, here's the next ones to do.

And so and also just by the way, I'm always happy to help people if they do need help implementing this inside of their organizations.

It's something that I do through like both a consulting and a learning and development practice.

@15:49 - Christina Ellwood (christina.morelandassociates@gmail.com)

Well, that's actually really good to know. And we should add that to the show notes of how someone would get in touch with you if they wanted to do such a thing.

You also make a design argument, actually, that the agent's trust boundary should be that it can read widely, but write only through a narrow declared output channel.

Now, governance is a huge topic right now, and in fact, we're doing an executive roundtable on the 17th of June on this very topic of governing agents at scale.

And this strikes me as one of the pieces of advice that would be helpful for the attendees and the participants to discuss.

@16:29 - Chris Butler

Yeah, and I'll be there on that as well, so I'm excited to be part of that conversation. Yeah, and I think like the hypothesis here is that when we start allowing agents to use tools or to edit artifacts inside the enterprise environment, there's a real possibility that it will do something wrong, and that's because it's a very non-deterministic system, right?

Like even though it's following instructions, because of context and because of also like security concerns. So if you have, for example, an open source project that allows people to like submit issues, there is actually a possibility of like prompt injection for any of the agents you have on that particular repo, right?

So what I would argue is that we want agents to be able to go out and like look at things and actually this is all based off of kind of a security model that GitHub Next, they created this agentic workflow framework.

And part of it is that it is scoped by tokens. And so that is something that you kind of say, yes, you probably want to have access to these different repos in this different way.

And you want it to be more permissive because more context in this case from those artifacts is helpful a lot of the time.

And then they have something called safe outputs, which really like restricts down the type of thing that it can take action on inside the environment.

And I think that's really, really important because if I have this like status reporting agent, for example, there's really never a time that the status reporting agent should change the status of different items because it's not appropriate.

Like even if the transcript one will update these with comments, that is helpful, but this status reporting one should not do that.

And so in this particular case, that agentic workflow is actually scoped down to creating one issue, sorry, one discussion in one category that starts with this like title of like, you know, basically weekly status.

And the reason why that it makes me feel much safer is that even though I have many, many different agents that are out there doing stuff inside of this enterprise environment, I know that the likelihood of them stepping on each other is much less because they're all scoped down to the one artifact that they should be dealing with.
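To make the "read wide, write narrow" idea concrete, here is a small framework-agnostic sketch. It is not the GitHub Agentic Workflows configuration syntax. The `AgentPolicy` and `SafeOutput` classes, the repo names, and the category and title prefix are all hypothetical stand-ins. It only shows the shape of the check: many readable sources, exactly one declared write channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeOutput:
    kind: str          # the one kind of artifact the agent may create
    category: str      # the one category it may post into
    title_prefix: str  # every title must start with this


@dataclass(frozen=True)
class AgentPolicy:
    readable_repos: frozenset[str]  # permissive: wide read access for context
    output: SafeOutput              # restrictive: a single write channel

    def can_read(self, repo: str) -> bool:
        return repo in self.readable_repos

    def can_write(self, kind: str, category: str, title: str) -> bool:
        o = self.output
        return (
            kind == o.kind
            and category == o.category
            and title.startswith(o.title_prefix)
        )


# Hypothetical policy for a weekly status reporting agent.
status_agent = AgentPolicy(
    readable_repos=frozenset({"org/roadmap", "org/product", "org/strategy"}),
    output=SafeOutput(kind="discussion", category="Status", title_prefix="Weekly Status"),
)

print(status_agent.can_read("org/roadmap"))                                      # True
print(status_agent.can_write("discussion", "Status", "Weekly Status: week 12"))  # True
print(status_agent.can_write("issue_update", "Status", "Weekly Status"))         # False
```

Because the write side is a single declared channel, the three questions from the conversation (what can it read, what can it write, under what conditions) can be answered by reading one small policy object per agent.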

Now, there are possibilities for agents that pick up off of that discussion that was just posted that then do other work, for example.

And so this is another job for another type of agent, which is like trying to look for the way that agents may interact and look for actually bad interactions.

Like, I would argue that, you know, this comes from like the GitHub Next team, that it is actually less about evals, especially when we're talking about internal tooling.

And it's more about this idea of kind of A/B testing your way towards the best set of instructions, the best set of models, for your environment, because your environment also changes over time.

Your information environment is constantly evolving inside of an enterprise organization. So anyways, I think it is really, really powerful to say that agents can kind of read a lot of context, but then only take one action.

And that's one of the reasons why I think the model that agentic workflows from GitHub Next is using is going to be a model that a lot of people use in the future.

Now, it's still really hard to deal with like tokens. That is probably the biggest pain to actually deal with for these things.

And I think there is the possibility for people being too permissive with the tokens that they actually provide to these agents.

I think there's a lot of work over the next couple of years where we're going to dial that in more.

But for now, I think as long as we're restricting the impact on the artifact environment, I think it's much safer than it would have been just having like random tool use for an agent.

@19:44 - Christina Ellwood (christina.morelandassociates@gmail.com)

Okay, good. Thank you. That's a very clear description, too. So one of your other provocative ideas is something you call the adversarial PM.

@19:53 - Chris Butler

Yeah.

@19:55 - Christina Ellwood (christina.morelandassociates@gmail.com)

And this is an agent that finds two or three of the most... and then posts structured counterarguments, so basically it is automating dissent.

@20:10 - Chris Butler

Yes.

@20:11 - Christina Ellwood (christina.morelandassociates@gmail.com)

So talk about the inspiration behind building that and the impact it's had.

@20:16 - Chris Butler

Well, so I've been doing a lot of workshops with different teams, both inside, you know, basically like helping them understand how PMs should utilize AI.

And I think there's this loop that I've started to discover, which is that I will start with maybe an idea, very messy collection of thoughts that I'll maybe like just, you know, talk out through and then put into a, you know, some type of tool.

That beginning requires actually the LLM is basically like almost like rubber ducking 2.0 is what I'm calling it. So this idea that like I'm talking to something and then it's giving, because I'm externalizing my knowledge, it can also then do intelligent critique in some ways.

And so adversarial stances are really, really helpful for people to like hone their ideas. And so I say something, I get critique, and then based on all of the kind of things that I respond to or ignore, right, like it really matters most what the human is doing there, I then have the system create an artifact, and that could be a document, that could be a prototype, and then I critique that.

And so this idea of like back and forth creates this like cycle of context collection that I think is really helpful for PMs.

And so the adversarial PM is one, I'm a huge proponent of like adversarial thinking and red teaming and stuff like that within the world of product management, because there's a lot of the time that even ourselves or our team may be not thinking through or thinking about the possibilities that we should be.

And so the adversarial PM is really to push people's thinking and make them think harder about this, and I have lots of different versions of this.

I have one which is like the grouchy mean senior engineer and like telling you your idea is a bad idea.

I have another workflow that's in that Agentics Beyond Code, which I call kind of a chaos monkey for process.

And what it does is it looks at all of your kind of things that are happening inside the information environment, and then it starts to try to detect if there's too much like stasis or groupthink happening, and then tries to stir up the process in some random way.

And so this comes from Chaos Monkey from Netflix and Chaos Engineering, which is like Google SREs really talk about this stuff a lot.

It really is the fact that there are plenty of times where we start to fall into this kind of like groove, and that's helpful when we have like good thinking or thinking that is ideal.

But there are times where we don't know that we're in this groove, and it's actually not helpful to us.

And so all of these different types of like processes, I think, are just trying to enrich the environment by saying, hey, you know, your idea is not always perfect.

And so let's try to help you explore what are the possibilities. An adversarial PM as like an agent in your environment, I think, is really interesting and pretty helpful.

Like even the rude like Q&A one, which is the kind of mean engineer that we had inside of GitHub, people would like laugh at it because it is really mean.

But at the same time, they would say, like, actually, these are probably things that people are thinking. And so I do need to have a good answer for this.

And the thing that's so interesting about this is that there's no reputational risk for these agents to be mean, because I don't respect the agent.

The agent is just a process automaton. So, but if I get this feedback, it really does help me a lot.

@23:16 - Christina Ellwood (christina.morelandassociates@gmail.com)

Yeah, it makes good sense. And is the adversarial PM part of Agentics Beyond Code?

@23:22 - Chris Butler

Yeah, that's right. It's another one of the workflows that's in there.

@23:24 - Christina Ellwood (christina.morelandassociates@gmail.com)

Yeah, I think that's very cool, and really one that's super helpful. The adversarial model has, I think, been one of the most valuable unlocks for teams.

So in your Team of Tomorrow talk, which people can find on YouTube (I'll put it in the show notes as well), you showed a future where agents sit in the kickoff meeting alongside product managers.

This harkens back to the beginning of our discussion today, where you were basically saying, let's not personify the agents, let's focus on the outputs.

But in this case, you actually have an agent that participates in meetings. Talk about that: what value is it bringing, and what advice do you have for people who are thinking about using such a thing?

@24:08 - Chris Butler

So I have a different side project called Room Clarity, which is really about experimenting with this. And what it means to bring an agent into the meeting is not that the agent is sitting there as another attendee that is automatically part of the conversation.

It's listening in on the conversation, and I think what's really interesting here is that there are actually multiple types of agents you might want to have in the meeting.

You might want a facilitator that's helping keep the meeting on track. You might want an assumptions agent that says, hey, there's an assumption here that we should probably challenge.

Or maybe even the adversarial thinking one, like a premortem type of thing. And the way that I think is the ideal model right now is that you actually want the humans to be able to have the conversation that they need to have.

But if they notice something from one of these agents, the way it works is as a queue of things that are popping up as it's monitoring the meeting.

And this is something that you would use as a dashboard for your meeting, basically.

It starts to collect decisions that are starting to form. It starts to look for action items and risks and things like that.

But the agents can also then say, hey, I think this is a potential problem. What's really interesting is that if the humans then talk about it, then it is actually valuable.

But if it's not talked about, then it should just be discarded. So what I'm saying is that, in the end, the artifact in this case is probably the set of decisions and what the humans said about those decisions.

But they're provoked in some ways to talk about things more broadly, more directly, or in more detail, based on these agents trying to push them in a different direction. A skill is basically the way to think about it.

And these skills get to exist within this meeting. And so it's a provocational agent as opposed to an adversarial agent.
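The queue-and-dashboard model described here can be sketched in a few lines of Python. This is an illustrative sketch only, not how Room Clarity is implemented. The `Provocation` and `MeetingDashboard` names, their fields, and the skill names in the comments are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Provocation:
    agent: str            # hypothetical skill name, e.g. "assumptions" or "premortem"
    text: str
    discussed: bool = False


@dataclass
class MeetingDashboard:
    queue: list = field(default_factory=list)      # items agents have surfaced
    decisions: list = field(default_factory=list)  # decisions the humans made

    def raise_item(self, agent, text):
        """An agent adds an item to the queue; it never interrupts the meeting."""
        item = Provocation(agent, text)
        self.queue.append(item)
        return item

    def mark_discussed(self, item, decision=None):
        """Humans chose to talk about this item, optionally reaching a decision."""
        item.discussed = True
        if decision:
            self.decisions.append(decision)

    def close_meeting(self):
        """Keep only what the humans engaged with and discard the rest."""
        kept = [p for p in self.queue if p.discussed]
        self.queue = []
        return {"decisions": list(self.decisions), "discussed": kept}
```

The design choice worth noting is that the record returned by `close_meeting` contains only human-discussed items and human decisions, so the artifact is the decision record rather than anything the agents said on their own.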

Yeah, I would say provocation and adversarial, I think it just depends on the tone a little bit.

@25:59 - Christina Ellwood (christina.morelandassociates@gmail.com)

Well, it also depends on the context.

In this case, if you're asking it to argue with you, you're obviously looking for adversarial. But in the context of the meeting, it's provoking the opportunity to have a discussion about something it's discerning that may or may not be significant.

@26:14 - Chris Butler

Well, and again, the end goal is not necessarily what the agent says. It's how the humans react to it.

And then the log of what decisions are made, or what next steps need to take place, that the humans identify as valuable.

And so, again, I think that's why it's trying to join these two ideas. The artifact is what matters.

In this case, it's the decision record and the conversation on the decision. But then it's also trying to bring the agents in as a kind of provocational or adversarial voice.

And I think that different types of meetings require different stances from these agents. One of the things I'm starting to think about more is that a user interview, for example, is a very different thing.

You might want to have agents that are helping you not gloss over something that you should have asked a better question about. Or in a sales meeting, could you ask this question to get more information?

Or if you're in an architecture meeting, are we considering the actual architecture implications?

So I think there's something there about the type of meeting and the types of provocation, adversarialness, or help in general that these agents could start to provide.

But I think the model is going to be really key, the user experience model. Because, again, they should not be speaking up randomly in the meeting.

But if someone notices it and they say they want to talk about that, that actually is a good signal that the agent brought up something that was valuable.

@27:21 - Christina Ellwood (christina.morelandassociates@gmail.com)

Yeah, it seems like there's probably more refinement that will happen in your thinking as it gets used in more contexts, too.

So you've also talked about how frontier AI models diffuse very quickly, but task libraries, rubric sets, and decision archives don't.

And you're saying, I think, that the compounding advantage that we gain with AI is about that institutional memory and the artifacts that they create, rather than about the model. You want to talk a little bit about that?

@27:55 - Chris Butler

Yeah, I mean, I guess with all of the churn of new model releases, even the Mythos model that just came out recently from Anthropic, there's constant improvement in the way that these models can be interacted with.

And I think a lot of people are talking about this as three components: the model itself, the harness around it, which includes context, and then the user experience, which they call the app.

And so I think what needs to go into that harness is actually the context of the enterprise. And that is really the part that differentiates your use of the model from anybody else's use of that model.

Now, there are definitely aspects of the harness that are about how we are prompting it, what skills are being used, or how these agents are being configured and instructed.

And so I see that as part of the harness. And then the app part of it is really, how do I then engage with this artifact?

And it should ideally be in a way that makes sense to me. So in that example from the status reporting, the Google Doc is a much better way to do collaborative editing across multiple people than, say, a PR or something else.

And so that's why I think those context artifacts within that harness are really your differentiator when it comes to the way you do your business.

And so those durable artifacts become more and more important over time, especially because they're mostly caught in people's heads right now, like a leader's head. Usually there is a strategy document, but it hasn't been updated for six months.

It's highly polished, so it doesn't actually help you make hard choices. But every single time that product leader goes in and does a product town hall, a prioritization meeting, or an escalation meeting, they are actually conveying their model for that.

And so, for agents to actually do the right work inside these enterprise environments, we need that context written down somewhere, not just ephemeral in someone's head or stuck in a transcript recap somewhere.

And so that to me is why these core artifacts are just so important.

@29:42 - Christina Ellwood (christina.morelandassociates@gmail.com)

Well, I mean, it goes back to this idea that there's corporate intelligence in the way in which we operate, interact, make decisions, and communicate.

And that's part of the reason that companies are increasingly using private models, multiple models, and models behind the firewall as a part of that sort of corporate intelligence.

And you're just referencing that in the context of the harness section. And I think there's a lot of refinement that we're going to see.

And in fact, I'm wondering what you think: for the leaders who are listening today, especially since we just had some new models come out from Anthropic yesterday, what should they start building today so that they have that compounding advantage a year from now?

@30:28 - Chris Butler

I think one is identifying where different types of artifacts exist today. And IT probably knows this, because they have to pay the bill for those platforms, right?

So inside of a place like GitHub, we have a huge graph of information inside of GitHub. We also have a huge graph of information that's in M365.

And there are probably a couple of other places, for example, any type of sales tools or customer support tools.

And so the question becomes, how do you start to map these things to understand where it is best for a human to actually do the work?

And where is it best for an agent to take action and create those artifacts in some way? With status reporting, we constantly see this weirdness between M365, which is where we want to do our document drafting and shaping, and GitHub, which is where all the content is.

And so being able to offer automation capabilities between those two different graphs is actually a huge problem for enterprises.

And the reason is that there are security and privacy issues that will come up with that.

Now, choosing a platform that allows you to guardrail the agents, I think, helps in a lot of ways and makes it a lot less likely that they actually do something bad.

But that to me is the most important thing: where are these key documents that we think are going to help inform the context of everything?

And that could even be the employee manual, potentially. So we need to identify what we think these durable things are.

Now, I am also working on another weird side project, which is about how you map, say, your Google Drive or your M365 to understand which documents are actually starting to rise in importance, which ones are hub documents that everybody references, which documents are stale, and, from an ontology standpoint, what the difference is between all these documents that are out there.

Are they in alignment or are they not in alignment? And so I think providers like Google Drive or M365, and maybe a lot of homegrown solutions like the one I'm talking about right now, are going to start trying to understand the actual ecosystem.

Because I would say that unless you highly curate your SharePoint or your Google Drive folder, it's not very helpful to just look at it, because you don't see the activity on these documents and you don't see how these documents relate to each other.
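To make the idea of mapping a document ecosystem concrete, here is a small Python sketch that flags hub documents and stale documents from a list of references and last-modified dates. It is a hypothetical illustration, not the side project described here. The function names, the input shapes, the two-reference minimum, and the 180-day staleness cutoff are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def hub_documents(links, min_refs=2):
    """Return documents referenced by at least min_refs distinct sources.

    links is an iterable of (source_doc, target_doc) pairs.
    """
    # set() drops repeated references from the same source document.
    counts = Counter(target for _source, target in set(links))
    return sorted(doc for doc, n in counts.items() if n >= min_refs)


def stale_documents(last_modified, today, max_age_days=180):
    """Return documents whose last edit is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(doc for doc, modified in last_modified.items() if modified < cutoff)
```

A document that shows up in both lists, heavily referenced but not updated in months, would be a natural first candidate for an agent to annotate or for a human to refresh.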

And so I think that's something that's going to be worked on a lot. And I think if enterprises start to focus on what the most important documents are, track them in a way that is actually helpful, and have agents updating or annotating them, that gets them much further along than a lot of people that just have scattered content everywhere inside their ecosystems.

@32:58 - Christina Ellwood (christina.morelandassociates@gmail.com)

Yeah, well, I mean, you might remember Randy Friedman, who's with Cognizer, and that's one of the problems that they solve on the legal side: where are all the documents in the enterprise?

And categorizing them and doing the various work associated with making them available for proper use by an agent. So, moving on to our closing section here, I'm very interested in what resources you would recommend today for listeners who want to explore artifact-scoped agents, your Agentics Beyond Code project, or other things that you are finding really valuable today in your work implementing AI.

@33:46 - Chris Butler

Well, I would say definitely go to the Agentics Beyond Code repo. There are a lot of examples in there of ways that I've tried to create actually valuable agents.

So I would say go check that out. I think it is really interesting to be able to adapt some of these patterns to other places as well.

So if you are building with a different platform than, say, GitHub for where you store a lot of your context, I think that these are patterns that people will use.

They just have to figure it out. And then they can use these instructions and the way that we're splitting out instruction and policy in helpful ways.

I think those are patterns that they should be able to reuse in almost any kind of technology environment. So my main pointer is basically to go and check that out.

And then if they do need help, again, like I said, I'm always happy to help out if they need to get some guidance on how to do this inside their organization.

@34:42 - Christina Ellwood (christina.morelandassociates@gmail.com)

Sounds like that could be a really good workshop for us to offer to our community as well. So we should talk about that.

And in this phase of the AI revolution, I mean, you've been a leader a long time. I'm curious, what leadership skill do you find most valuable today?

@34:57 - Chris Butler

I mean, I think right now there's a huge rush towards being able to just build a bunch of stuff.

I think the thing that we are missing a lot of the time is the ability to ask really good questions.

And I mean that both in the sense of asking good questions of your teams, and also, when we talk about understanding whether this thing is actually more valuable or not, what questions we should be asking the people that are using these tools.

And I am a big believer in user research, especially from a qualitative standpoint, to understand what's going on.

And so I think being really good at pulling out a person's unbiased, true opinion about something, from that person's point of view, is really, really important, because there's going to be lots and lots of change inside the environment over the next couple of years.

And that just means that people are going to be way more heads down, just trying to do execution, but we're not actually going to understand each other more.

And so that to me is a really important leadership skill.

@35:54 - Christina Ellwood (christina.morelandassociates@gmail.com)

I like that, yes, and it's not one I've heard mentioned before, but I think it's absolutely true. If our listeners remember just one thing from today's conversation, what would you like that to be?

@36:00 - Chris Butler

I mean, that agents should not be co-workers, and that the artifacts they create and the way that the humans actually work with those artifacts are really the most important things.

And so you should always be thinking about that first, rather than, hey, let's try to create this random thing that we think might be good, without really thinking about how the human deals with it. That's what I would say.

@36:23 - Christina Ellwood (christina.morelandassociates@gmail.com)

Okay, great. Well, Chris Butler, Director of Product Operations at GitHub. Thank you so much for joining us once again on AI Realized.

We really appreciate your time.

@36:33 - Chris Butler

Thank you so much for having me and great talking with you.


FAQs

What is artifact-scoped agent design?

What is the "read wide, write narrow" trust boundary for AI agents?

What is an adversarial PM agent and how does it work?

How can AI agents maintain living documents in an enterprise?

What is the Agentics Beyond Code open-source repository?

Why should AI agents not be treated as coworkers?

Join AI Realized