Bringing Apache Kafka to the enterprise

In January 2018, IBM made a small investment to build an event streaming platform from the ground up.

Design team

3 designers

Date

2018 – 2020

My Role

Design and front-end prototyping

Background

The challenge

In today's world there is a copious amount of data: billions upon billions of daily transactions. Businesses often struggle to take advantage of this abundance of data, resulting in lost revenue and missed opportunities. This has prompted a rise in event-driven architectures.

The business opportunity

IBM Event Streams was the result of a small investment by IBM. It enables developers to easily connect their legacy and modern infrastructures so they can see and access these events from one centralised location. Rather than connecting hundreds of systems to hundreds of other systems, each system now only has to connect to one. Companies can then use this data to create responsive and intelligent applications.

Apache Kafka is the open source technology that powers this. As founding members of the design team, we were tasked with adopting this open source technology and designing an experience that would allow users of all experience levels to harness its capabilities and unlock their business's data.
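To make the hub model above concrete, here is a minimal sketch using the open source kafkajs client (not part of Event Streams itself); the 'flight-delays' topic, broker address and payload are all hypothetical. Any number of systems publish events to the central topic, and any number of applications consume them from the same place.

```typescript
// Minimal sketch of the hub model: systems publish to one central topic,
// applications read from that same topic, instead of wiring systems to each other.
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'zoom-air-demo', brokers: ['broker:9092'] })

async function publishAndConsume() {
  // Any upstream system publishes its events to the central topic...
  const producer = kafka.producer()
  await producer.connect()
  await producer.send({
    topic: 'flight-delays', // hypothetical topic name
    messages: [{ key: 'ZA123', value: JSON.stringify({ delayMinutes: 45 }) }],
  })

  // ...and any downstream application reads them from the same place.
  const consumer = kafka.consumer({ groupId: 'predictive-analytics' })
  await consumer.connect()
  await consumer.subscribe({ topics: ['flight-delays'], fromBeginning: true })
  await consumer.run({
    eachMessage: async ({ message }) => {
      console.log(message.key?.toString(), message.value?.toString())
    },
  })
}
```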

The as-is of Kafka

In January 2018, we ran a large number of workshops with stakeholders, interviewed external users of Apache Kafka and held internal design workshops to explore the current 'as-is' scenario. We also created a stakeholder map in order to understand the scope of existing business hierarchies.

This as-is scenario map is a typical IBM Design Thinking task, identifying the current scenario we'd be designing into.
Stakeholder map to explore all the different parties that we may need to consider.

This research phase identified a few areas of opportunity, largely around the provision of education for the relatively new Kafka technology and the onboarding of new users.

We also discovered that partners from well-known companies were managing their own Kafka deployments and were screaming out for an enterprise partner such as IBM. A few were using SaaS offerings, and IBM already had a primitive offering in this space (which Event Streams was to replace).

Our personas

From the as-is scenario map and the stakeholder map, we identified a few key personas to help guide us through the design process whilst we navigated this rather complex territory.

My focus
Image of the persona Lesley the Full stack developer

Lesley

Full stack developer

Works for 'Zoom Air' (a fictional airline). She has increasing amounts of real-time data to be consumed to fuel predictive analytics. The new application needs to react to events in near real-time as they occur.

Developer
Constant Innovator
Productivity follows her
Ability to understand depth of technology
Creates applications
Understands the business needs

Image of the persona Kevin, the Event Streams Administrator

Kevin

The Event Streams Administrator

Works for Zoom Air. Manages the infrastructure of their private cloud (IBM Cloud Private).

Configure, deploy and run Kafka in IBM Cloud Private (and know it has installed successfully).

He knows there is an IBM Kafka service in ICP which he wants to find, configure, deploy and get running successfully.

Basic Kafka knowledge
Knowledge of ICP

Once we understood the roles and responsibilities of each persona, and after numerous user interviews (conducted by our researcher), we made a first attempt at crafting hills/user stories, an IBM Design Thinking method for keeping large teams aligned.

Hills/user stories

User story 1

Kevin, the Event Streams admin, can deploy Event Streams into ICP (IBM Cloud Private) and be confident it's fully functioning within 15 fully occupied minutes.

User story 2

Starting from a sample application, Lesley, the full stack developer, can use Event Streams to write events to, and visualise data in Kafka, enabling incremental app creation.

User story 3

While using Event Streams, Lesley can intuitively give feedback, receive assistance and raise issues, getting a response in one working day, regardless of whether she’s using the free or paid version.

Designs

The story

The hills/user stories above were how we kept alignment with the wider offering team. My main responsibility was hill 2. For this I created a story which formed a thread throughout our design work: Zoom Air is a commercial airline with a new initiative around schedule delays, the goal being to reduce the expense of reaccommodating disrupted customers by 10%.

Lesley organically hears of Kafka and discovers IBM's offering through IBM's 'Kafka basics' education, which she's able to find easily online. This helps her gain familiarity with the core Kafka concepts and the benefits of using it.

She decides it'd be a good fit for the new initiative the airline is launching around reducing schedule delays while also continuing to increase NPS. Kafka seems a good fit because of this initiative's reliance on near real-time events.

Lo-fi exploration of the key concepts

We first started to document the product using post-its. Our researcher took these and validated some of the concepts we were exploring with novice Kafka users. The flow explored the basics of what we thought our product needed to do, and took our sponsor users around the key areas.

It was becoming increasingly clear that our sponsor users were even less familiar with Kafka than we had previously thought. A few iterations later, we were focussed on the install experience, testing whether a staggered install experience would help guide users through the process while also educating them along the way.

This prompted us to design a mid-fidelity flow for the 'topic create' experience. This allowed novice Kafka users to create a basic topic (the crux of Kafka) without any additional documentation.
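For context, this is roughly what the guided flow does on the user's behalf, sketched here with the kafkajs admin client (purely illustrative; topic name, broker address and values are assumptions). It creates a topic and sets the three concepts the experience teaches: partitions, replicas and retention.

```typescript
// Illustrative sketch of a basic 'topic create' using the kafkajs admin client.
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'topic-create-demo', brokers: ['broker:9092'] })

async function createBasicTopic() {
  const admin = kafka.admin()
  await admin.connect()
  await admin.createTopics({
    topics: [
      {
        topic: 'flight-delays',  // hypothetical topic name
        numPartitions: 3,        // partitions: how the topic is split for parallelism
        replicationFactor: 3,    // replicas: copies kept for fault tolerance
        configEntries: [
          // retention: how long messages are kept (one week here)
          { name: 'retention.ms', value: String(7 * 24 * 60 * 60 * 1000) },
        ],
      },
    ],
  })
  await admin.disconnect()
}
```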

Building out the prototype...

In addition to the above screens, here are a few of the additional wires that we strung together into a flow for testing.

What we learned
  1. The majority of our sponsor users loved the 'hand holding' approach to the install, which gave us the opportunity to explain some of the key concepts of Kafka, such as partitions, replicas and retention. We ended up building something similar to this in the first release. (For those that didn't like the approach, we designed a compromise to reduce the hand holding, which actually became the basis of a Carbon pattern for the rest of IBM to use.)
  2. We also found that users really hated the idea of our product having logging as a first-class citizen. All of them already had existing logging solutions that they wanted us to integrate with.

The 'Think' conference

This was an opportunity to test a more end-to-end flow. We tested the whole getting started experience, including the concept of a starter app.

What we learned

This largely just validated the business need for Kafka and confirmed the concepts we were planning to build. One thing that gained huge popularity after Think was the concept of a 'message browser': a way of navigating the messages was extremely well received and, as such, actually became a large focus.

We received lots of opinions on what our 'monitoring' focus should be; these were largely about which technologies we should plug into, and what we should do natively.

Users were keen on the simulated topic idea as a way of learning Kafka and Event Streams. Many were also keen to see the proposed "Kafka Basics" education, as nothing like this (something that targets novice Kafka users) currently existed.

Monitoring

Monitoring was quite a tricky piece of work. Whilst our research suggested users wished to see all manner of information about their Kafka system, often this wasn't technically feasible. We had many disheartening conversations where our proposed monitoring designs were torn apart after the development teams had done a spike to test feasibility. Below is an array of the different designs that were explored.

During this process, IBM Design also released a major update to its design language, Carbon. We now had the task of weaving these new redesigns into our plan. Below are some explorations:

What we learned
Whilst the other designers and I were extremely fond of the large figures in these designs, our users were less than keen due to the inefficient use of real estate! This informed the chosen direction.

Message Browser

As mentioned previously, the intent of the message browser was to be one of our sponsor-user-backed differentiators. It was, however, also one of the pieces that required the most engineering effort.

As an aside

This also happened to be one of the pieces of this project that I was most proud of, as the developers and I had created what I'd consider to be the perfect design/development working relationship. They let me into the code, I designed with them, and everyone felt like they owned the project, which was great once the technology started to throw us curveballs!

Lo-fi

TO DO: MAKE A PROP

Final

TO DO: MAKE A PROP

Introducing Elasticsearch

Our earliest ideas for the message browser were all reliant on the Elasticsearch 'buckets aggregation' concept. Elasticsearch was therefore introduced into the product stack to handle this for us. We began to take the bucket concept further, to explore some of the interactions that we might need to get right in order to make a successful experience.

The intention was to give a huge amount of control to the user in terms of viewing their data. We made a real effort to make this as transparent as possible; this is not IBM's data, this is the customer's data and allowing visibility to a potentially huge amount of data was key. However, we needed the performance of Elasticsearch to bring this to life.

Elasticsearch has this concept of 'buckets': these are effectively groups of data, devised by an algorithm. For our purposes we grouped our data by time, recording every millisecond.
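As a rough illustration of the bucket idea (using the official @elastic/elasticsearch client with its v8-style API; the index and field names here are hypothetical), a date_histogram aggregation groups indexed messages into time buckets, which is broadly how the graph data was derived:

```typescript
// Illustrative sketch: count messages per time bucket with a date_histogram aggregation.
import { Client } from '@elastic/elasticsearch'

const client = new Client({ node: 'http://localhost:9200' })

async function messageCountsOverTime() {
  const result = await client.search({
    index: 'kafka-messages', // hypothetical index of indexed Kafka messages
    size: 0,                 // we only want the buckets, not the documents themselves
    aggs: {
      messages_over_time: {
        date_histogram: {
          field: 'timestamp',
          fixed_interval: '1s', // one bucket per second; intervals can go down to '1ms'
        },
      },
    },
  })
  // Each returned bucket is a time slice with a document count, ready to plot.
  return result.aggregations
}
```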

We even explored 'live view visualisations', which would use some kind of particle simulation to indicate current message throughput in a more visual way.

I was then tasked with taking this to hi-fi. This involved working hand-in-hand with the UI squad to iron out the technicalities from the mid-fis. The direction we chose was a large, 'almost-fullscreen', interactive experience. The idea here was to give the user visibility into their potentially enormous amount of data (could be millions of data points per second).

What we got wrong

Cracks in Elasticsearch began to show after our first release. To be performant, we needed to group data less reliably than we had first hoped. We also had issues with rendering, in the front end, the amount of data Elasticsearch was capable of dealing with.

When we spoke to users, they consistently liked the idea of a highly visual 'message browser' centred around a graph. We'd user-tested all of the previous designs and they were met with positivity and "I needed this yesterday" kind of responses. However, what we didn't think to check was how much they'd value this against the additional weight it would bring to their installation.

Eventually it turned out that the heavyweight feature was routinely removed by clients installing Event Streams, because its heavy footprint wasn't offset by the benefits. Clients were able to use other monitoring tools to achieve similar results (just not quite as tailored).

So we removed Elasticsearch...

We had to scale back our vision for this experience considerably and resort to using more standard REST API technology... not ideal, but this was a compromise our customers were willing to make to reduce the overall footprint. This all happened at the same time as Carbon X, so we used the Carbon X delivery pipeline to accommodate these 'updates'... downgrades?

We went back to step 1, and I worked with one of the lead UI developers to iron out issues and understand the development effort required, so that we could design something both useful and realistic given the constraints of the technology.

This suddenly meant we didn't have the ability to see and plot the graph across the whole message retention period. We also couldn't view message content; however, in testing it seemed this would have been a 'nice to have' feature, and definitely not something every user would have access to (privacy laws etc.). So the challenge was to work with the APIs we had and create a slick user interface that still provided the same level of message visibility as before.

What we learned
Before these curveballs, the design centred around seeing messages live and being able to browse them freely. With hindsight this wasn't particularly useful, and the new version prioritised allowing users to search by specific parameters, rather than browsing.

Understanding the API

By backtracking to lower fidelity wires, I was able to work with the developers to really understand and design the interactions against the constraints of the API, e.g. loading old messages vs live messages, pagination, persistent select behaviours etc. This was a challenging but enjoyable experience: getting a deep understanding of the API in order to achieve the best UX possible.

After many hours with the developers to understand the technology, I felt able to increase the fidelity. Eventually I ironed out every state the developers needed in order to build. This involved:

  • Initial load
  • Seeking by offset vs time
  • All partitions vs individual partition
  • Scroll behaviours
  • Individual API loading states and error states (incl. pagination loading states)
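To illustrate the offset-vs-time distinction above, here is a hedged sketch using kafkajs directly (illustrative only; the product's UI talked to its own REST APIs rather than to Kafka like this, and the topic, broker and values are hypothetical):

```typescript
// Two ways to position a consumer when browsing messages: by offset, or by time.
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'message-browser-demo', brokers: ['broker:9092'] })

async function browseFromOffsetOrTime(useTimestamp: boolean) {
  const consumer = kafka.consumer({ groupId: 'message-browser' })
  await consumer.connect()
  await consumer.subscribe({ topics: ['flight-delays'], fromBeginning: false })

  // Start consuming; seek() can then reposition the consumer.
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(partition, message.offset, message.value?.toString())
    },
  })

  if (useTimestamp) {
    // Seek by time: resolve a timestamp to per-partition offsets first...
    const admin = kafka.admin()
    await admin.connect()
    const offsets = await admin.fetchTopicOffsetsByTimestamp('flight-delays', Date.now() - 60_000)
    for (const { partition, offset } of offsets) {
      consumer.seek({ topic: 'flight-delays', partition, offset })
    }
    await admin.disconnect()
  } else {
    // Seek by offset: jump straight to a known position on one partition.
    consumer.seek({ topic: 'flight-delays', partition: 0, offset: '100' })
  }
}
```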

I also created a React demo that showcased some of the key patterns, and built some microinteractions which ended up in the final code.

Duo concept

Concept piece/sponsor users for the new design language (now Carbon X)

After our first release, Event Streams had a pretty good reputation (both design and development) for being a product that released quickly, in an agile fashion, and because of this we were able to circumvent some of the usual hurdles. As a result, I was approached to test-run some early iterations of the new design language. (Note this actually pre-dated Carbon X.)

These never made it past the concept stage, but I played them back to the Carbon team along with my thoughts and opinions. It was a shame they never made it beyond basic concepts as I think they could have been great! (It did convince the team to implement a dark mode later on though!)

Working with development

One of my strengths is being able to work with developers. One of the ways that I do this is to work with them at the code level where possible/convenient. I was one of very few designers that had access to their product's code base, and I was eventually made a mandatory reviewer (for any front-end work) in our Git repo. I was often the one doing the tidying and polishing. The developers also really appreciated that I could make the literally thousands of CSS tweaks I routinely made without needing to go through them.

Before

TO DO: MAKE A PROP

After

TO DO: MAKE A PROP

Extras

Whilst I am not an illustrator, I was quite proud of these, which were accepted onto the marketing website.

I also did a few explorations around how we could use interactions to improve our geo-replication features.

Some motion

Consider these mid-fidelity wireframes that happen to move. Whilst we all loved the concept of this, the development costs were simply too high for such a small team:

Create scroll prototype:

Hi-fis