Observability Day San Francisco: The Future of AI and Observability Is Bright
AI and observability are no longer separate conversations—they’re deeply intertwined. Across keynotes, panels, and demos, speakers at Honeycomb's Observability Day San Francisco unpacked what that means for engineering teams today: faster insights, smarter tools, and new challenges to solve.

By: Ken Rimple

From deep technical dives to real-life customer stories, the event showcased how teams are navigating this new landscape, including answering really hard questions, like “Is AI going to take my job?” and “What skills do I need to learn in order to effectively leverage AI?”
Below, you’ll find a recap of each session.
Keynote: Observability Engineering: scenes on teams, from the second edition
Charity Majors, CTO and Co-founder
Honeycomb
A Charity talk is always filled with frank, no-BS information and observations. This one was no different. Charity is in a unique position to look back at the observability culture she helped define: working on a new edition of Observability Engineering and serving as Honeycomb’s CTO. She noted how observability teams now control 20-25% of infrastructure spend, and while most organizations are paying observability prices, they’re only getting monitoring results: “There is no other team in the engineering ecosystem that can have as much impact to your business and your bottom line as your observability team. Not even close.”

AI now amplifies the need for fast feedback loops. Observability should be managed as an investment rather than a cost center: it is the critical sensemaking capability for complex distributed systems, and it directly affects both customer happiness and developer velocity.
“The prediction that I was too chicken to make or put down in print in 2022,” Charity shared, “was that the unified storage model is inevitable, and the multi-pillars model is doomed.” The key is still to shed the three-pillars model of storing each telemetry type in its own store, and to focus instead on wide events, fast querying, and feature flags that enable small, constant deployments to production.
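To ground the "wide events" idea: rather than scattering context across separate metrics, logs, and traces, each request produces one structured event with as many fields as you can attach, including high-cardinality ones. Here's a minimal sketch in Python; the field names are illustrative, not a prescribed Honeycomb schema.

```python
import json
import time

def handle_request(request: dict, user: dict, feature_flags: dict) -> dict:
    # One wide event per request: many fields in a single structured record,
    # rather than context scattered across separate pillars.
    event = {
        "timestamp": time.time(),
        "service": "checkout",
        "endpoint": request["path"],
        "user_id": user["id"],           # high-cardinality fields are welcome
        "plan": user["plan"],
        "feature_flags": feature_flags,  # e.g., {"new_pricing": True}
        "deploy_sha": "abc1234",
        "status_code": 200,
    }
    start = time.time()
    try:
        # ... do the actual work of the request here ...
        return {"ok": True}
    finally:
        event["duration_ms"] = round((time.time() - start) * 1000, 2)
        print(json.dumps(event))  # in practice, ship this to your observability backend
```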
Human-centric observability in AI systems: the story of Fin.ai
Kesha Mykhailov, Product Engineer
Intercom
Kesha, Product Engineer at Intercom, brought down the house with his informative and entertaining tale of how the Fin.ai teams initially struggled with performance problems. If you’re unfamiliar with Fin.ai, it’s an AI agent that fulfills customer service needs, so performance is crucial.
Initially, 19 separate teams were tasked with making improvements to their services, but when Intercom connected all services to Honeycomb to enable distributed traces, they began to see the big picture.

Once Fin.ai started using more than half a dozen LLMs to get its work done, Kesha applied observability to the problem, tracking LLM token usage and cost. The team was able to improve the speed of their agent, making customers happy, and also reduce costs, making finance happy as well.
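Kesha didn't share Intercom's exact instrumentation, but the idea of tracking tokens and cost looks roughly like the sketch below, which records them as span attributes with the OpenTelemetry Python SDK. The attribute names, pricing constants, and LLM client are hypothetical.

```python
from opentelemetry import trace

tracer = trace.get_tracer("fin-agent")

# Hypothetical per-token pricing for the model being called.
PROMPT_COST_PER_TOKEN = 0.000003
COMPLETION_COST_PER_TOKEN = 0.000015

def call_llm_with_telemetry(client, model: str, prompt: str):
    # Wrap the LLM call in a span so token usage and cost land in the same
    # distributed trace as everything else the agent does.
    with tracer.start_as_current_span("llm.call") as span:
        span.set_attribute("llm.model", model)
        response = client.complete(model=model, prompt=prompt)  # hypothetical client

        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.completion_tokens
        cost = (prompt_tokens * PROMPT_COST_PER_TOKEN
                + completion_tokens * COMPLETION_COST_PER_TOKEN)

        span.set_attribute("llm.tokens.prompt", prompt_tokens)
        span.set_attribute("llm.tokens.completion", completion_tokens)
        span.set_attribute("llm.cost_usd", cost)
        return response
```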
Kesha pointed out that the key to understanding your systems holistically is to empathize with the users: unless you can express your telemetry in ways that match their journey, you can't really understand their pain. “The art of observability is to surface real operational boundaries,” Kesha explained. “This is our job as observability engineers.”
What's next for the Honeycomb platform
Jessica Kerr, Engineering Manager of Developer Relations
Honeycomb
Observability does more than say whether your software is working. It can say how your business is working.
Jessica Kerr
No session would be complete without a product demonstration, and Jessica Kerr (Jessitron, to those in the know) delivered. Starting with Honeycomb Canvas, a chat interface built right into the Honeycomb UI, she asked it to look into the Honeycomb environments and datasets she'd have access to as a new engineer.
With a single question, Canvas reported what datasets were available, representative traces, performance and throughput, error rates, API callers, service architecture, and more. Next, she used Claude Code and the Honeycomb MCP to review an error found in Canvas, and asked it to recommend a fix.

"We have a mantra in our product of no dead ends, so you can always go somewhere." From there, Jessitron showed how to quickly search traces, logs, and metrics, using Honeycomb's fast columnar data store, and how to use BubbleUp to quickly detect outliers.
Then, she showed flexible dashboards, stressing that they are built using Honeycomb queries. She discussed another new feature, Anomaly Detection, which leverages machine learning to detect problems in your data without forcing you to define triggers.
Other releases discussed included tags for SLOs and boards, our upcoming enhanced metrics release, subqueries in your Honeycomb queries, role-based access control (RBAC) for those organizations that need it, telemetry pipeline health, and the brand-new telemetry pipeline builder. Keep your eyes peeled for more info soon!
From DevOps to AI
Emily Nakashima, SVP of Engineering, Honeycomb (moderator)
David Williams, Staff Site Reliability Engineer, LaunchDarkly
James Bland, WW Tech Lead, Data and AI, AWS
This DevOps panel focused on how AI is transforming the job of a site reliability engineer, bringing both opportunities and challenges for engineering teams. While AI excels at generating code for small tasks, the panel stressed that the real bottleneck has now shifted from code creation to verification and testing, making test-driven development more critical than ever. And because AI systems function as black boxes, they require new observability approaches.

AI right now is making computer science a legitimate science. We're treating AI as this black box and we're putting inputs in, then trying to look at the outputs, then changing the inputs again to see if we can get different outputs. But there's no way to introspect what's happening inside that black box.
James Bland
One of the liveliest parts of the discussion focused on the concept of an AI SRE. David Williams was skeptical: “I think language is an imperfect output of intelligence. And I think taking that output, and using it as the input into a system that tracks the statistical relationship between those words... it didn't stop to ask me a question, because it doesn't know and it doesn't think.”
The panel concurred that highly contextual knowledge and human expertise are still needed for solving complex problems, and that the human must be at the center of any AI loop.
AI is like chocolate, and observability is like strawberries
Liz Fong-Jones, Field CTO
Honeycomb
Liz, a former AI skeptic, shared how she has reframed herself as an AI realist. She argued that AI, like chocolate, is powerful when applied thoughtfully but problematic when overused: it works brilliantly with complementary ingredients like observability (her "strawberries"), but fails when forced into inappropriate contexts. “AI without observability is a liability, because it means you're creating code that no one understands, and no one can understand it because they can't figure out what's going on.”
Drawing from her recent experience spending $713 on Anthropic tokens to build a Honeycomb SLO feature prototype, Liz demonstrated that while AI adds valuable throughput to engineering teams, it doesn't necessarily increase speed and still requires human expertise and attention.

But you cannot say, we're going to lay off all the engineers, and AI is going to do all of it for code that is running in production. You cannot outsource that to someone else. A computer cannot be responsible for a business decision.
Liz Fong-Jones
How to thrive in an AI-native world
Adam Jacob, CEO
System Initiative
Adam Jacob, creator of Chef, Co-founder of Chef Software, and now CEO of System Initiative, has been at the heart of infrastructure-as-code tooling. In his talk, which started with a dive into how LLMs "think" using token prediction, he made the point that they are creative, but not intelligent. Without some degree of hallucination, the creativity wouldn't exist, which is why agents should pair LLMs with deterministic tools, such as those exposed through MCP servers.
From there, Adam stressed a common theme of the day: context rot makes models dumber as context gets bigger. He explained that the way you train your model should mirror how you ask it questions, that exposing your own tools with MCPs allows many different agents and workflows to emerge, and that surfacing policy and compliance failures at the tool, rather than inside the LLM itself, helps the LLM adjust and try something different.
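To make that last point concrete, here's a minimal, hypothetical sketch of a tool an agent might call: the compliance rule lives in deterministic code, and a violation comes back to the LLM as a structured tool error it can react to. The tool name, policy, and response format are illustrative, not System Initiative's actual API.

```python
# Hypothetical tool handler an agent framework might expose (for example, via MCP).
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}  # illustrative compliance policy

def create_instance_tool(region: str, instance_type: str) -> dict:
    """Provision a compute instance, or refuse with a structured policy error."""
    if region not in ALLOWED_REGIONS:
        # Fail at the tool boundary: the LLM sees this message in the tool
        # result and can change its plan, e.g., by picking a compliant region.
        return {
            "ok": False,
            "error": "policy_violation",
            "detail": f"Region {region!r} is not approved; use one of {sorted(ALLOWED_REGIONS)}.",
        }
    # ... the actual provisioning call would go here ...
    return {"ok": True, "detail": f"Created {instance_type} in {region}."}
```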
Above all, Adam recommended keeping a human in the loop. Combined with deterministic tools and guardrails against autonomy for crucial actions, that keeps your LLM interactions under control for critical systems.
MCPs and o11y = the perfect production engineer
Austin Parker, Director of Open Source
Honeycomb
Austin Parker delivered an enlightening talk on the Honeycomb MCP, framing it as an evolution of the familiar investigative observability feedback loop for solving application problems, now driven by LLM-powered agents.
Just as developers can use the Playwright MCP to automate a browser and the GitHub MCP to manage their repositories, they can now use Honeycomb's MCP: developers in their IDEs asking questions about application performance, SREs running complex investigations with natural language queries, and observability teams who need to accelerate migrations, modify alerts and SLOs, and handle many other tasks.

I think we often judge LLM agents unfairly because we drop them into a zero environment or zero data environment and expect them to do better than a human would do with zero data.
Austin Parker
Austin showed how MCPs shape the data they hand to the model. Using a trace as the example, he demonstrated that sending it as an image, a table, JSON, or CSV changes both the number of tokens consumed and how the data is processed. All in all, it was a great in-depth talk on how MCP servers work and interact with LLMs, with practical examples drawn from building the Honeycomb MCP.
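Austin's format comparison is easy to reproduce yourself. The sketch below (my own illustration, not code from the talk) serializes the same few hypothetical spans as JSON and as CSV and counts tokens with the tiktoken library; exact numbers vary by model, but the representation clearly changes how much the model has to read.

```python
import csv
import io
import json

import tiktoken  # pip install tiktoken

# A few hypothetical spans from a trace, heavily simplified.
spans = [
    {"name": "GET /checkout", "duration_ms": 412, "status": "error"},
    {"name": "db.query", "duration_ms": 187, "status": "ok"},
    {"name": "cache.get", "duration_ms": 3, "status": "ok"},
]

# The same data in two representations.
as_json = json.dumps(spans, indent=2)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "duration_ms", "status"])
writer.writeheader()
writer.writerows(spans)
as_csv = buf.getvalue()

# Count tokens with a common encoding; exact counts differ per model.
enc = tiktoken.get_encoding("cl100k_base")
print("JSON tokens:", len(enc.encode(as_json)))
print("CSV tokens: ", len(enc.encode(as_csv)))
```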
The effects of AI on modern software development
Austin Parker, Director of Open Source, Honeycomb (moderator)
Charity Majors, CTO, Honeycomb
Rob Zuber, CTO, CircleCI
In this end-of-day panel about the changing landscape of software development brought on by AI, LLMs, agents, MCPs, and the many tools they enable, the general consensus on whether AI will replace every developer was that "Mo code equals Mo problems" (a standout Charity Majors quote right there).

We've been at this [software development] game for 70ish years. And now we're asking LLMs to do the job the way that a human would do it.
Rob Zuber
Answering an audience member's question about how hard it will be to understand AI-generated code five years from now, the panel felt the problem is just as hard whether the code was generated then or now: "I really don't trust my code. You should never trust anyone's code. First rule of software: Don't trust the code."
Charity Majors
A key struggle for organizations now is to reckon with disposable, AI-generated code. AI can help with analysis, assist in writing tests, and instrument for observability. But another big concern is whether this mass of generated code should, or even could, be promoted to production code, or instead be replaced, either by newer hand-written or AI-assisted code or by freshly generated code.
Time cycles keep shrinking, but it's going to be this sort of permeable boundary between when software is disposable, but then gets promoted to be permanent. Hopefully that's a conscious choice and not just, oh well, we committed it and it's there.
Rob Zuber
The panel agreed that crucial code will always require engineers who are focused on good design, architecture, and sound code and tests: in short, a human-driven, careful process. AI may help engineers arrive at changes quicker, but it shouldn't always be used to generate that code. On the other hand, experimental, throwaway code is being generated by non-programmers and software engineers alike to research, prototype, build temporary applications, and even land applications in production. That code will require just as much, if not more, instrumentation and observability.
Wrap-up
And that's a wrap on a great day of talks! Product demonstrations showcased new capabilities, while panel discussions delved into the evolving role of SREs and the crucial need for human expertise amidst AI-generated code.
Ultimately, the consensus was clear: while AI offers immense potential, it is observability—and the human intelligence behind it—that remains critical for truly understanding, verifying, and delivering reliable software in this AI-native world.