
Behind-the-Scenes: Semiotic 3 with real-time streaming data

I’m releasing a new version of Semiotic, Semiotic 3, with first-class support for real-time streaming data and AI collaboration. You might think the timing odd, and suspect that maybe I used AI for this. You’d be right. I’m not the best programmer. I was never the best programmer. I know that. I tell people that, all the time. I know it because I have worked with the best programmers, either directly or, more often, through their libraries. I owe much of my career to Mike Bostock, the creator of D3. I can see how they approach problems and research solutions. I’ve always built things that need their support, so I have always tried to excite them and enlist their help on the parts of a feature that I know need improvement but that I’m no expert in. Things like build systems or web workers. I hope that’s no surprise. In my experience, real products are built by teams.

The original Semiotic started out at Netflix as the Abacus Viz Framework — Abacus was the internal UI package we used to create apps for reporting and AB testing. Without James Womack working on it, I never would have gotten it off the ground. Without Susie Lu’s work on annotations and design, it would have been a tech demo. Still, I remember someone asking during a Reddit AMA at its release, “Is this really ready for prime time?” Later, Tom MacWright helpfully improved the build system, and Oleksii Raspopov dramatically improved its performance so we could use it in the DEX and Data Prism at Noteable. Every one of those people is a better programmer than I am.

Semiotic was conceptually popular (>2000 stars) but never a widely adopted library. I think that’s because I ran afoul of Conway’s Law and shipped the org, and the org was just me and my obsessions. I remember demoing it to Miles McCronklin over at Facebook, and he asked why have the frames at all: wasn’t everything, at its core, marks and glyphs and channels? But I thought there needed to be a middle layer of abstraction between the Grammar of Graphics and a charting library that exposes <BarChart>. That’s what the frames are, but what I did not think about was how everyday devs who were not experts in D3 would want to use it. I loved all the escape hatches in Semiotic. I’m sure it drove people nuts. It also suffered in performance and was a hefty library. Making things super performant is something I pride myself on, but not on the library side. That’s where AI really helps. It crafted the higher-order components that I knew would make more sense to folks. It optimized the data pipelines in ways that I knew were necessary but that I didn’t know how to do.

Semiotic shipped with a lot that isn’t there anymore. Crazy stuff like radial violin plots, sketchy rendering and word clouds using force-directed algorithms. Its development allowed me to explore data visualization in different directions, sometimes emphasizing the primacy of annotations, other times seeing just how far I could push categorical data visualization (really far, it turned out: a bar chart really is parallel coordinates). But it couldn’t compete with the focused simplicity of other charting libraries and was more like an ecosystem with one prolific user.

And so, like many open source projects, it settled down. I still used it because it was easier and more effective than charts in D3, and I didn’t like the limited scope of the other charting libraries. And I hate XML composition for data visualization, so I couldn’t use visx (this is a personal preference; I otherwise love visx and the folks who made it, as they well know). As I settled into my role as principal engineer of a data streaming company, I had ideas for how to improve Semiotic, but they required a team of engineers with specializations in technologies that I didn’t have, so I let them sit.

But with AI maturing so fast, I realized I had an opportunity to finally leverage an unlimited supply of better programmers to clean up and modernize Semiotic. Give it some tests and improve its performance, that sort of thing. And while I was doing it, I remembered the talk I gave at Current a year and a half ago about realtime data visualization encodings. And I remembered a concept I had learned while working on Flink data visualization: Flink treats streaming data as a superset of static (or batch) data. And I thought, well, why not, why not make Semiotic a realtime-first library. And with that concept in play, another realization hit me: if I’m using AI to do this, other folks will use AI, and the library should treat AI and vibe coders as first-class citizens alongside traditional UI developers.

My experience updating Semiotic was, in that way, very different from most of the AI-assisted coding I have followed. Much of that coding is from the ground up: most products being built this way are not deeply understood at their core technical level, so the AI and the vibe coder are learning (and often stumbling) together. And much of the criticism comes when those best programmers try to see whether Claude can one-shot their performance architecture better than they could, and it fails.

But that’s not what happened here. I wasn’t asking Claude to figure out what a data visualization library should be. I’ve spent a decade figuring that out. I know where the bottlenecks are, I know which abstractions are load-bearing and which ones are vanity. What I didn’t have was the ability to execute on all of it myself at the level of quality it deserved. I knew Semiotic needed higher-order components that would make more sense to everyday developers. I knew the data pipelines needed optimization. I knew we needed proper test coverage. I could describe, in detail, what “better” looked like for each of these things. I just couldn’t always write the code that got there.

That turns out to be the Goldilocks Zone. AI is not great at making architectural decisions for you, and it’s not great at knowing the right abstraction for a domain it doesn’t understand. But if you can say “this component needs to handle streaming updates by diffing incoming data against a rolling window and only re-rendering affected marks”, if you can be that specific about the what and the why, it can absolutely land the how. AI didn’t replace the best programmers I’ve worked with. It replaced the best programmers I couldn’t recruit.
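To make that prompt concrete, here’s a hedged TypeScript sketch of the pattern it describes; the names and shapes are my own illustration, not Semiotic’s real internals:

```typescript
// Hypothetical sketch: diff incoming points against a rolling time
// window and report only the marks that changed, so a renderer can
// skip everything that didn't.

interface Point {
  id: string;
  value: number;
  t: number; // timestamp in ms
}

interface Diff {
  entered: Point[]; // new marks to draw
  exited: Point[];  // marks that fell out of the window
}

class RollingWindow {
  private points: Point[] = [];
  constructor(private windowMs: number) {}

  // Ingest a batch at time `now`; return only what changed.
  ingest(batch: Point[], now: number): Diff {
    const cutoff = now - this.windowMs;
    const exited = this.points.filter((p) => p.t < cutoff);
    this.points = this.points.filter((p) => p.t >= cutoff).concat(batch);
    return { entered: batch, exited };
  }

  get size(): number {
    return this.points.length;
  }
}
```

The renderer then touches only `entered` and `exited` marks on each tick, rather than re-rendering the whole frame, which is the difference between a chart that survives a high-throughput stream and one that doesn’t.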

The result, I think, is astounding. Semiotic 3 has server-side rendering. It is full of tests. Every chart has a streaming mode except the hierarchical ones, because of the vicissitudes of hierarchical datasets. The old me would have made those work in a hacky way that people would not have been able to follow, and I would have had a cool tech demo that ultimately dragged down the maintainability of the library (trust me, I’ve been there).

Of course, it will have immediate utility to improve the Flink streaming charts we deploy at Confluent. It will empower my stakeholders to imagine greater UIs that put streaming front-and-center visually. And, I hope, it will let others experiment and develop novel modes of communicating with data, which is something I’ve always wanted. But also… there are sankeys with frickin’ particles. Do you know how long I’ve wanted to give people the ability to make particle sankeys?

Principal Engineer at Confluent. Formerly Noteable, Apple, Netflix, Stanford. Wrote D3.js in Action, Semiotic. Data Visualization Society Board Member.