Designing the future of human-AI collaboration through transparent and easy-to-use interfaces


Gradient

Most applications of AI today are based on using data to train a machine learning model to predict an outcome. While this interaction model works well where a task can be easily defined (and therefore automated), it breaks down when the task is not clearly defined. Music creation is one of many examples where the current black-box-driven models don’t work, and existing state-of-the-art algorithms are unable to perform well. For example, a musician looking to compose a new piece of music with existing machine learning tools has little control over the output and no way to steer it toward their intention.

Approach

I started the project by mapping out the space of AI research, evaluating available tools (apps, plugins, frameworks), their usability problems, and in general gathering insights about their designs. In addition, I worked with 6 composers throughout the design and ideation process to validate ideas and understand their needs when interacting with generative systems.

Problem

While machine learning works well in applications where a task can be easily generalized from training data, it fails at augmenting users in tasks where the goal is diverse, personal and not clearly defined. For example, current tools fail at: delivering novelty (new combinations not seen in the data) and offering granular control to users. This in turn, makes it difficult for musicians to find desired musical outcomes, overwhelming them with an infinite number of ideas, all very unrelated to the artist’s initial intention. In summary, the current design paradigm that machine learning applications are based on needs to be challenged if AI is to be a tool that can augment human capabilities beyond what either a human or machine can achieve in isolation.

Achievements

Gradient introduces an alternative approach to designing with AI by making its algorithms transparent and easy for users to understand. As a result, the interface allows composers to express their ideas by interacting with the algorithm and steering it towards novel outcomes. Instead of trying to create a system that composes for them, Gradient builds a tool that musicians can compose with. By mixing simple components in a node-graph interface, patterns can be reconfigured to create infinite new musical permutations. While Gradient focuses on music generation, the learnings acquired here can be applied to other domains such as text generation or scientific discovery.

Services

Product Design
User Research

Capabilities

Data Visualization Design
Conception and Design
User-centered Design
Iterative Prototyping & Wireframing
User Research & Insights

Location

Berlin

Duration

4 months

01 User Research

Evaluating existing tools

I began my research by evaluating existing tools, looking at their interaction models (how they work and what metaphors they are based on) and their information architecture. I used these tools myself to compose and explore musical ideas and see how they behave in practice. I followed a process similar to heuristic evaluation and task analysis, with the exception of not engaging a group of specialists. This step helped me understand how these applications differ, the spectrum of possibilities I could focus on, and which problems I was interested in solving moving forward.

My domain-specific knowledge in music composition and production allowed me to simulate the path a user might take from exploring websites, comparing value propositions, to installing and using tools.

I was interested in mapping out:

  • Interaction models: how is the system designed and how does the user interact with it? What are the options and outcomes?
  • Underlying technology: what machine learning technologies are they using (deep learning vs GANs)?
  • Value proposition: how do they present themselves to artists? What features do they claim and what promises do they make?
  • User base: what user groups are they targeting?
  • Musical output: what does the combination of interface and technology create? What kinds of musical output do certain systems prioritize?

List of tools investigated

Key insights

  • Most tools are in the experimental stage: these tools work well but are not necessarily useful for professional use. They express ideas and show what’s possible.
  • Deep personalized music: many projects in this category focus on generating music that reacts to user inputs (biofeedback, sensors, etc.). This group of projects focuses on ambient music generation or custom-made tracks that match users’ intentions for contexts such as meditation, workouts or driving.
  • License-free AI-generated music: the majority of the projects I investigated focus on videographers and vloggers as their user base. The musical output isn’t anything unique, but that doesn’t defeat their purpose, which is generating easily accessible music at a low cost.
  • Frameworks for developers: these tools make programming easier, but have a high learning curve and are often designed for creative coders. These frameworks feature certain model architectures and training algorithms.

Defining which users to focus on and why

The evaluation enabled me to understand the whole gamut of users that different tools were targeting, and how certain user groups are currently being served by the tools available. By defining what each group needs and the tasks they are trying to solve, I was able to see whether the available solutions could be improved upon. Specifically, I found that composers were the least served by existing tools. Their needs are very specific and the most challenging to design for: composers need tools that offer fine-grained control and deep customization while remaining easy to use. This seems likely to be a central focus of machine learning design over the next years: how do we make machine learning easily accessible not only to data scientists and programmers, but to everyone who wants to be augmented by what the technology can offer?

Another major insight I had was how the concept of augmentation was being used as a buzzword without considering what it might entail from an interaction design perspective. A lot of applications that claimed to augment composers in music creation tasks were in fact automating tasks for them.

User Research Overview

  1. User interviews & contextual inquiry
  2. Co-design exercises and sessions
  3. Unfocus group: panel discussion at CTM

My interview process focused mainly on a qualitative understanding of composers’ workflows, tasks and emotions regarding the creative process.

User interviews

Once I defined my user group, I set out to understand how different composers approach the task of music composition. I visited them at their studios instead of meeting them remotely to get a better sense of their tools and their environment. I was also interested in understanding how they thought about music conceptually, and how concepts were translated into how they approached their tools (software, hardware, instrument).

I centered the interviews on a few open-ended questions to guide the discussion:

  • User flow: How do they navigate from having an idea to finishing a composition? How do things begin and how do they end?
  • Entry points: What are the main entry points to their creative process? What are the structures that might guide the process?
  • Conceptual models: What are the most prominent conceptual models that different composers have in common?
  • Previous experience with ML: What are their frustrations, expectations and fascination with the technology?

Key insights

Entry points and axioms

  • Exploration vs. intention: a tension between knowing what they want to hear and trying different things out until they find something
  • Errors & glitches: exploring how tools break as a way of finding interesting ideas and unexplored paths
  • Process-based: musicians set up different processes ahead of time and plan their sessions before they engage in composing something
  • Copying other ideas as starting points: using an existing idea to initiate the composition process, and eventually abandoning the original idea as the new idea takes shape
  • Seeds and prototypes: ideas that can center the composition around one anchor point to inspire the development of new ideas

Conceptual models: How do they think about musical events

  • Time-based notation: notes on a page followed by their intrinsic relationship (composers and technically trained musicians)
  • Frequency spectrum: instruments are spread across frequency spectrum (low to high)
  • Gesture-based: instruments and notes are connected by chain of events and are bound together (improvisation ensembles, complex chain of events)
  • Networks: musical events are the result of interaction between agents in a network (modular synth, Max/MSP)
  • Agents, stage & storytelling: each agent is a sound that interacts with other agents to tell the story, entering the stage at different times to express different ideas
  • Rhythm-centric: a rhythmic idea that centers the music piece (techno, percussive tracks)
  • Harmony and melody: melody moves on top of a narrative in different rhythms (counterpoint)

Unfocus group at CTM

In addition to the user interviews, I also led a panel discussion with 5 different musicians about their experience working with machine learning in their creative process. The full recording of the panel can be seen here:

Summarizing insights

User problems with pre-trained models in artistic contexts

  • Expectation vs. reality
    All artists seem eager to try out ML in the creative process, but get frustrated after trying existing tools
  • Different backgrounds
    Music education isn’t equal amongst artists, and not all artists are knowledgeable about music theory
  • Augmentation instead of automation
    All artists were opposed to having the model generate fully fledged compositions
  • Problems with curation
    All users complained about the inability to sort through the model output, since the possibilities are endless
  • Lack of originality
    Musicians complained about not being able to get unique results from apps that used generic training data
  • Lack of transparency
    There’s no way for artists to communicate with the model, which makes for a very frustrating creative experience

02 Problem Definition

How might we?

  • Augment artists in their creative process with a high level of granularity?
  • Design interfaces that mediate the communication with the algorithm to enhance artists’ workflow?
  • Create ways that people can shape and train their algorithms to their own needs through transparent interfaces?
  • Shift away from the random output model to something you can actually understand and manage?
  • Give artists full control of the algorithms without having to code their models and become programmers?
  • Design an expressive language that is natural and intuitive and doesn’t exclude self-taught musicians?

Shallow learning: a new approach to machine learning design

If deep learning is defined by large datasets, narrow tasks, and large computing power, “shallow learning” is the exact opposite: simple models that need human guidance to work, are transparent, run well locally, and work well with small datasets.
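To make the contrast concrete, here is a minimal, hypothetical sketch of what a "shallow" model could look like: a first-order Markov chain over pitches, trained on a single short seed. Every parameter is a visible count that the musician can inspect and edit, it runs instantly on a laptop, and it needs no more data than the user provides. This is an illustration of the idea only, not code from the project.

```python
import random
from collections import defaultdict

# Hypothetical illustration of "shallow learning": a first-order Markov
# chain over MIDI pitches. The whole "model" is a small, human-readable
# table of transition counts that the musician can inspect and edit.

def train(seed_pitches):
    """Count pitch-to-pitch transitions in a short user-provided seed."""
    transitions = defaultdict(lambda: defaultdict(int))
    for a, b in zip(seed_pitches, seed_pitches[1:]):
        transitions[a][b] += 1
    return transitions

def generate(transitions, start, length=16):
    """Walk the transition table to produce a new pitch sequence."""
    out = [start]
    for _ in range(length - 1):
        options = transitions.get(out[-1])
        if not options:                  # dead end: restart from the seed's start
            out.append(start)
            continue
        pitches = list(options.keys())
        weights = list(options.values())
        out.append(random.choices(pitches, weights=weights)[0])
    return out

# A short melodic seed (MIDI note numbers) is enough "training data".
seed = [60, 62, 64, 62, 60, 67, 65, 64, 62, 60]
model = train(seed)

# The model is fully transparent: the user can read or bias any entry.
model[60][67] += 3    # e.g. nudge the model toward the C -> G leap

print(generate(model, start=60))
```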

Design principles

Design principles describe the qualities that the design solutions should capture. They work as safeguards for quickly validating the different ideas that might come up along the way. Good design principles describe the most important elements of the solution without being prescriptive, and align future solutions with themes found earlier in the ideation phase.

  1. Example-based interaction: musical examples are in themselves a way to communicate intent with the model
  2. Simple vs complex workflows: the experience offers simple modes with advanced controls when necessary
  3. Open-ended and diverse: the language structure can be further refined by users themselves through learning
  4. It grows with users: knowledge accumulates, and the system is biased towards user concepts and patterns learned from previous interactions
  5. Errors don’t exist: regardless of user input, the system should always produce sounds and musical events
  6. Expressive, yet abstract interfaces: balance between granular control and abstract forms of expression

03 Ideation & Solution


Music Machine Language: a no-code visual programming language

MML is a visual language that enables the model to express the concepts it has learned from training data. By exposing the model through a visual interface, users can further manipulate the output by changing weights, operators, connections, and parameters. MML works in two ways: it tells users what the model sees, and offers musicians a way to steer the algorithm towards novel outcomes the model has never seen. Instead of exposing the neural network as-is to users, Gradient abstracts its parameters into simpler components to achieve the best balance between granularity, usability and control.

  1. Visual abstraction to mediate how musicians think about musical ideas vs. how the algorithm understands music
  2. Works in two ways: you can use the language to visualize musical concepts, as well as use it to express new ideas
  3. You can program musical relationships without having to code them by hand (a rough sketch of such a graph follows below)
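The working prototype of MML was built in Max/MSP; the snippet below is only a rough illustration of the underlying idea. It models a seed as a small graph of nodes (simple operators with editable parameters) connected by weighted edges, and evaluates that graph into new note events. All node names and operators here are hypothetical, chosen to show the shape of the data structure rather than the project's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an MML-style node graph: nodes are simple,
# named operators with user-editable parameters; edges carry weights.
# Evaluating the graph turns a seed pattern into new note events.

@dataclass
class Node:
    name: str
    operator: str                  # e.g. "transpose", "repeat", "invert"
    params: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)    # name -> Node
    edges: list = field(default_factory=list)    # (src, dst, weight)

    def add(self, node):
        self.nodes[node.name] = node

    def connect(self, src, dst, weight=1.0):
        self.edges.append((src, dst, weight))

    def evaluate(self, pattern):
        """Apply each downstream node's operator to the pattern in edge order."""
        for src, dst, weight in self.edges:
            op = self.nodes[dst]
            if op.operator == "transpose":
                shift = int(op.params.get("semitones", 0) * weight)
                pattern = [p + shift for p in pattern]
            elif op.operator == "repeat":
                pattern = pattern * int(op.params.get("times", 2))
            elif op.operator == "invert":
                pivot = op.params.get("pivot", pattern[0])
                pattern = [pivot - (p - pivot) for p in pattern]
        return pattern

# Build a tiny graph: seed -> transpose -> repeat
g = Graph()
g.add(Node("seed", "source"))
g.add(Node("up", "transpose", {"semitones": 5}))
g.add(Node("loop", "repeat", {"times": 2}))
g.connect("seed", "up")
g.connect("up", "loop")

print(g.evaluate([60, 62, 64, 65]))   # the user edits nodes and edges, not code
```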

Abstract scores and non-conventional music notation

MML draws heavily on the abstract musical score experiments done in the 60s and 70s by electronic musicians and avant-garde composers. In these scores, artists explored ways to encode musical ideas through visual abstraction as opposed to composing notes on a page (music notation). I was interested in exploring ideas that lived in-between music notation and abstract visual representation, but that could also be used to generate ideas instead of only representing them visually (a two-way connection). I spent a month trying to find a solution that achieves such a balance and that could be robust enough to encode a variety of ideas. Once I found a potential way to solve the problem, I set out to test the solution by implementing a proof of concept in Max/MSP. The proof of concept featured very few operators, but was able to work seamlessly as a no-code programming language to assist the composition process.

UPIC, a system designed by Iannis Xenakis, translates graphic notation to sound in a 1:1 relationship.

MML: a no-code visual programming language


Defining the flows


I designed the app to match musicians’ approach to the composition process, and as such it has four different entry points:

  1. Import MIDI seeds into Gradient to start a new composition process: musicians first import MIDI files. MIDI is a protocol for encoding musical events, similar to music notation, and a MIDI file can contain one to many instruments on a timeline.
  2. Manipulate seeds: once the musical ideas (seeds) have been imported and analyzed by the algorithm, users can change the visual node-graph generated by the algorithm to change the musical output. By changing the visual graph, it’s possible to compose new pieces of music or generate different variations of existing ideas.
  3. Combine different seeds in different ways: multiple seeds (musical ideas) can be imported into Gradient and mixed in different ways. Each seed features an input and an output, which lets composers combine ideas in infinite ways; one idea can be used to modulate another to generate new outputs (see the sketch after this list). For instance, you can mix a drum pattern with a piano so that they play closely together.
  4. Record and finish ideas: Gradient works well with existing music software, so the output (musical events) can easily be recorded into software such as Ableton Live, Pro Tools or Logic. Gradient isn’t designed to compose fully fledged pieces; letting musicians edit and finish compositions is therefore an essential part of the workflow.
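The combination step in point 3 can be pictured as routing one seed's output into another seed's input. The snippet below is a hypothetical illustration of that idea (none of these names come from the actual prototype): a drum seed's onsets are used to re-time a piano seed so the two lock together.

```python
# Hypothetical sketch of combining two seeds: each seed exposes an output
# (its events) and an input (a modulation hook). Here a drum seed's onset
# times re-time a piano seed so the two play closely together.

def drum_onsets(drum_events):
    """Output of the drum seed: the beats on which a hit occurs."""
    return [time for time, pitch in drum_events]

def modulate_timing(piano_events, onsets):
    """Input of the piano seed: snap each note onto the nearest drum onset."""
    snapped = []
    for time, pitch in piano_events:
        nearest = min(onsets, key=lambda t: abs(t - time))
        snapped.append((nearest, pitch))
    return snapped

# Events are (time_in_beats, midi_pitch) pairs.
drums = [(0.0, 36), (1.0, 38), (2.0, 36), (3.0, 38)]
piano = [(0.1, 60), (0.9, 64), (2.2, 67), (3.4, 72)]

combined = modulate_timing(piano, drum_onsets(drums))
print(combined)   # piano notes now land exactly on the drum pattern
```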

Building a high-fidelity prototype


Importing MIDI files

Once a MIDI file is imported into Gradient, the software builds a map of the relationships between note events and how instruments relate to each other. The system looks at all the instruments at once and tries to find groups and relationships between them. For instance, two instruments could be linked together or in opposition to each other, either by events (one only happens when the other is mute) or by interval and harmonic relationships. Once analyzed, these relationships are plotted onto the visual language.
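As an illustration of the kind of pairwise analysis described above (the actual system is more involved and also considers interval and harmonic relationships), the sketch below compares two instrument tracks and labels them as "linked" when they tend to sound together, or "in opposition" when one mostly plays while the other is silent. The function name and thresholds are hypothetical.

```python
# Hypothetical sketch of pairwise instrument analysis: each track is a set
# of beats on which the instrument has note events. Two tracks are "linked"
# if they mostly sound together and "in opposition" if one plays while the
# other is silent.

def relationship(track_a, track_b, linked_threshold=0.6, opposed_threshold=0.2):
    beats_a, beats_b = set(track_a), set(track_b)
    overlap = len(beats_a & beats_b) / max(len(beats_a | beats_b), 1)
    if overlap >= linked_threshold:
        return "linked"
    if overlap <= opposed_threshold:
        return "in opposition"
    return "independent"

# Quantized beat positions on which each instrument plays.
kick = [0, 2, 4, 6, 8, 10, 12, 14]
bass = [0, 2, 4, 6, 8, 10, 12, 14]    # plays with the kick -> linked
pads = [1, 3, 5, 7, 9, 11, 13, 15]    # fills the gaps      -> in opposition

print(relationship(kick, bass))   # linked
print(relationship(kick, pads))   # in opposition

# These labels become the edges of the visual node graph shown to the user.
```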

Deleting nodes and changing connections

Using MML to compose new ideas

Connecting seeds to create new ideas

04 Conclusion

Validation and next steps

I was able to validate the main concepts regarding the interface design and user flows, and managed to build a working prototype in Max/MSP to validate the technical viability of the system. I’m currently looking for research partners, funding, and ML engineers to join the research exploration and to continue developing the project into a working prototype, so I can further iterate on its most challenging aspects. Please get in touch if you’d like to hear more about the project, or if your interests are aligned with the research outlined here.

Similar projects


XaiPient

Building a machine learning platform aimed at increasing transparency in AI systems through human-friendly explanations

I worked with the founding team to help them turn years of technical research into a tangible working product. I worked on UX research, produced prototypes, gathered insights from real users, and designed and tested the beta release.

→ Read case study

Get in touch

  • ricardoasaavedra@gmail.com