Case Study Prime: MERLIN

Data Exploration

Adrian Galvin

Published in Thesis Modules, Apr 9, 2019

Context

The MISR instrument team at the NASA Jet Propulsion Lab is a group of scientists, engineers, and software specialists who collect, format, host, and distribute a key data product from the TERRA satellite. This data product contains the location and properties of 56,000 wildfire smoke plumes occurring around the globe starting in 2008. The team believes that its data is critical for a variety of scientific endeavors, such as comprehending climate change, decoding atmospheric dynamics, and mitigating disasters.

Problem

This team faces a challenge: their data set is mostly unused by the worldwide community of climate and atmosphere scientists, who would benefit from it enormously, because the data is stored as individual text files on a difficult-to-access archive. The interface for exploring this data is old and does not provide sufficient visual feedback, filter control, abstract high-level visualization, or download functionality. I worked with a team of designers and computer scientists over the summer to build an appropriate exploratory research interface for this data product, which improves on the previous interface in all of these dimensions. The MISR Exploratory Research and Lookup Interface (MERLIN) joins a long and illustrious line of software named with nested acronyms, a NASA specialty. I had the privilege to continue working with one of the JPL science leads, Dr. Mika Tosca, for the entire thesis research process.

Research Approach

Since this case study was longer than the micro-studies, and is intended to extend the capacities of an existing system in the service of researchers, I applied a multi-modal course of discovery tactics in order to orient myself and hear from scientists about where they thought the opportunities were. This also gave me a chance to reveal information about some of the investigative superstructure of this thesis: coming to understand how scientists construct knowledge about the phenomena which they study. My investigative research had three components: a series of insight-focused usability studies, an in-situ longitudinal workflow analysis, and a descriptive insight process questionnaire.

Research Precedent and Conceptual Framing

The three studies were chosen specifically to address the aims and concerns which I discovered in my literature review. In particular, Saraiya, North, Lam, and Duca’s classic insight-based methodologies for assessing the capability of visualization interfaces for exploratory research tasks, although specifically focused on the field of bioinformatics, are nonetheless highly useful models or prototypes for the studies which I built and conducted. Indeed, in their conclusion and discussion they synthesize and describe abstracted concepts, definitions, and assessment criteria which are specifically made to be generalized to other fields of science. They also directly call for other researchers to take up, modify, and adapt their insight-focused methodologies for use in other fields, which I have done. {CITATION}

“This longitudinal study is just the beginning of this line of work, and there is much more research to be done. More studies need to be conducted with different subjects and tools in diverse domains, in order to extract broader abstractions and patterns of the visual analytics process.”{CITATION}

My first study, an observational investigation of what insights could be revealed through the MERLIN interface and how scientists were able to connect those insights into a coherent story or hypothesis, was deliberately designed to assess the construct validity of both the interface and my model of insight within climate science. The second study was a follow-on: a modified diary study which the scientists wrote in over a two-week period, and which aimed to establish the environmental validity of my insight model. The final study asked the researchers to describe in writing how they understood their own process of insight and discovery in the context of their self-selected ‘most impactful paper’. I posit that the combination of historical, longitudinal, and observational studies provides a multi-faceted look at how climate and atmospheric scientists construct knowledge through the complex non-linear process of insight and discovery.

Participants

I worked with 7 researchers working at 4 different institutions, all climate or atmospheric scientists who are actively publishing as part of their regular employment, and who have a PhD in climate or atmospheric science or are currently earning one. These scientists were recruited informally through the personal network of my primary research partner Dr. Mika Tosca. One of the main challenges of my research is revealed here: although all of these researchers are working in a related field, the actual topics and modes of study employed by each is significantly different and varies even from paper to paper. This variation adds to the already significant challenge of working with experts described earlier.

Study 01: Insight-Based Methodology for Assessing Wildfire Visualizations

There are several instruments on Earth-orbiting satellites which catalog and measure the properties of wildfires and their attendant smoke plumes. The Multi-angle Imaging SpectroRadiometer (MISR) aboard NASA’s EOS flagship TERRA is unique because of the length of its mission, but more importantly because of its ability to geometrically capture the height and spatial arrangement of smoke plumes in addition to their radiative power. It is generally believed that there is a correlation between fire radiative power and smoke plume height, and the MISR data product provides a unique context for investigating this hypothesis. The instrument itself returns image strips, which are searched for smoke plumes, and the characteristics of each smoke plume are measured using the MISR INteractive eXplorer (MINX) software. Each plume and its attendant values are then stored in text format. Atmosphere and climate scientists use this large data set by statistically analyzing or modeling the whole set, or by using subsets to perform further comparison or analysis. Much like the data in the bioinformatics studies on which this research is based, the magnitude and complexity of the MISR dataset make it prohibitively difficult to extract insight from without computational methods.

Most usability studies assess the ability of an interface to enable a user to accomplish a known task or outcome. Because the end state is known ahead of time, each user’s workflow can be cataloged, timed, and described, revealing a direct rating or measure of how usable the software is. However, we face an additional problem in the characterization of visualization interfaces for exploratory research because their purpose is specifically to reveal previously unknown or unstudied insights. It is therefore necessary to design and facilitate a study which addresses the open-ended and non-linear nature of exploratory research in wildfire smoke plumes. To this end, rather than measuring user performance and accuracy on known tasks, we aim to recognize, quantify, and describe moments of novel insight.

Previous studies of exploratory research interfaces focused on a cross-comparison of the efficacy of several competing interfaces designed for the same task. Because MERLIN is the first software designed for this task, it was not possible to compare its performance to a control or another piece of software. Therefore this study focuses on characterizing the thinking of the scientists themselves: what do they see, how do they identify relevant anomalies, and how do they make connections to previously seen images or their own knowledge? In other words, this study aims to describe the way that scientists encounter data, extract insight, and connect those moments of insight into a hypothesis for further study. From my perspective, this is a window onto how scientists comprehend phenomena.

Participant describing what they see in the data distribution

Participant describing what they see in the geographic distribution

Scientists were asked to explore for 1–2 hours. No specific task was given for them to accomplish; they were simply asked to see what in the data set they wanted to study. We followed a standard think-aloud protocol, audio recording each scientist’s monologue and transcribing it for analysis. We use the Saraiya, North, Duca, and Lam definition of insight: “An individual observation about the data by the participant, a unit of discovery.” Transcriptions were broken down into individual grains of meaning, and categorized according to a modification of the system developed by Liu and Heer.

Observation is a piece of information about the data which can be obtained from a single state of the visualization system. An observation can be made at the visual level, or at the data level.

Recall is prior knowledge or personal experience brought into working memory to help reason about the visualization.

Question is an indication of a desire to examine an aspect of the data. A question does not necessarily have to be phrased to end with a question mark; what matters is that it expresses a desire to know more.

Hypothesis is a well structured novel direction for future research, or a desire to know something that is outside of the bounds of this data set.

Paper Topic is a complete research direction which the scientist affirms could lead to a novel contribution to their field.

These coded segments were reconstructed into insight maps which visualize and elucidate the scientists’ thinking. Although there is significant variation in how the scientists proceeded, higher level patterns were extracted which will be discussed in the implications section.
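To make the coding scheme concrete, here is a minimal Python sketch of how coded transcript segments might be represented and tallied before being laid out as an insight map. It is an illustration only, not the tooling used in the study: the `Code`, `Segment`, `tally_codes`, and `insight_path` names are hypothetical, and the example segments are invented rather than taken from a participant transcript.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Code(Enum):
    """The five categories from the modified Liu and Heer scheme described above."""
    OBSERVATION = "observation"
    RECALL = "recall"
    QUESTION = "question"
    HYPOTHESIS = "hypothesis"
    PAPER_TOPIC = "paper topic"


@dataclass(frozen=True)
class Segment:
    """One 'grain of meaning' cut from a think-aloud transcript (hypothetical structure)."""
    order: int   # position of the segment within the session
    text: str    # what the participant said
    code: Code   # category assigned by the coder


def tally_codes(segments):
    """Count how many segments fall under each category."""
    return Counter(seg.code for seg in segments)


def insight_path(segments):
    """Return the categories in session order: the backbone of an insight map."""
    return [seg.code for seg in sorted(segments, key=lambda s: s.order)]


# Invented example segments, for illustration only.
example = [
    Segment(2, "Why are the plumes taller here?", Code.QUESTION),
    Segment(1, "There is a cluster of plumes in this region.", Code.OBSERVATION),
    Segment(3, "This reminds me of an earlier field campaign.", Code.RECALL),
]
print(tally_codes(example))
print([code.value for code in insight_path(example)])
```

Keeping the session order alongside each code is what allows the segments to be reassembled into a map rather than only a frequency table.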

Study 02: Insight-Based Longitudinal Study

In the third study we present, we found that atmosphere and climate scientists can take up to 7 years to publish a paper from conception to publication, the average time to publication being 3.35 years. It is important to note that not all of this time is spent on exploration and insight, but the process can extend up to multiple years in some cases, with a minimum time on the order of months. If we are to validly study how scientists understand phenomena which they successfully take to publication, we must address the issue of context and time frame. The previous study placed researchers in a room with the test proctor exploring the data continuously and describing their actions at the same time. This is quite different from the way that those scientists would work in their lab over months and years. To explore this more complex and sprawling process, we developed a log booklet with ‘insight reports’ which were filled out by the scientists by hand over a period of two weeks and returned. The advantage of this study over the previous controlled environment study is that it allows the subjects to communicate to us from inside, or just after, the moment of insight. This study has the corresponding disadvantage of being significantly lower resolution than the controlled environment study, because the subjects will be limited to the medium of writing, and will write only small amounts. This study aims to disturb the scientist’s normal method of analysis as little as possible, aside from the time taken to fill out the probe and return it.

Study 03: Insight Process Questionnaire

The previous two studies discussed here focused on the process of exploratory research through visualization. However, the final goal of exploration is not simply to gain insight, but to connect a series of insights into a coherent story which constitutes new knowledge. The previous studies necessarily are not connected to published works, because the insights addressed were novel to the scientists at the time of testing. The questionnaire collected written accounts from scientists of how they gained and formulated insights into a publication quality narrative on their self-selected ‘most impactful’ paper. Seven questions were sent digitally; scientists took two weeks to craft written responses and returned them for analysis.

For this study I employed a self-authored coding system, because this study is meant to address some of the differences between the scientific method and the design process in order to look for effective places of intervention. The coding is therefore more of a high level categorization or classification.

Object is a single thing which a scientist will engage with in some manner, most frequently by manipulating, refining, observing, manufacturing, or fitting. Common objects in this study were: data, numerical models, scientific laws, simulations, and maps.

Action is what the scientist does to the object in question. Most frequently this would be observing, correlating, refining, generating, weighting, or aggregating. Actions are noted as uni-directional or bi-directional.

Indicators are world-states or principles which the scientist is looking for to know when a process is complete. For example, researchers frequently looked for repeatability, novelty, literature gaps, narrative completeness, or causality.

Conclusion is the purpose of the paper, the singular thing which the scientist views as their contribution to knowledge expressed in each paper.

Each researcher’s process was mapped out using these four elements. Elements were not prescribed beforehand, but were synthesized by aggregating similar types of objects and actions. The maps provide an exploratory look into how different scientific research processes are, yet they can be seen to be made of similar components.
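As a rough illustration of how the four elements fit together, the sketch below models one researcher's process map in Python. It is a hypothetical data structure, not the format actually used to draw the maps. In particular, treating an action as linking a source object to a target object is an assumption made here so that the uni-directional and bi-directional distinction has something to act on, and the class and field names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """What the scientist does to an object (hypothetical representation)."""
    verb: str                    # e.g. "correlating", "refining"
    source: str                  # object the action starts from (assumption)
    target: str                  # object the action is applied to
    bidirectional: bool = False  # actions are noted as uni- or bi-directional


@dataclass
class ProcessMap:
    """One researcher's process: objects, actions, indicators, and a conclusion."""
    objects: set = field(default_factory=set)
    actions: list = field(default_factory=list)
    indicators: list = field(default_factory=list)
    conclusion: str = ""

    def add_action(self, verb, source, target, bidirectional=False):
        # Objects are not prescribed beforehand; they accumulate as actions are coded.
        self.objects.update({source, target})
        self.actions.append(Action(verb, source, target, bidirectional))

    def edges(self):
        """Expand actions into directed edges, doubling the bi-directional ones."""
        out = []
        for a in self.actions:
            out.append((a.source, a.verb, a.target))
            if a.bidirectional:
                out.append((a.target, a.verb, a.source))
        return out


# Invented example, for illustration only.
pm = ProcessMap(indicators=["novelty", "repeatability"], conclusion="(paper contribution)")
pm.add_action("correlating", "data", "numerical model", bidirectional=True)
pm.add_action("aggregating", "data", "map")
print(sorted(pm.objects))
print(pm.edges())
```

Building the object set from the coded actions, rather than fixing it in advance, mirrors the way the elements were synthesized from the responses instead of being prescribed.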

Results

The insight map opposite comes from study 01, the controlled environment study. It is a demonstration of the Liu and Heer principle that “visualization resonant with the pace of human thought … leads to greater data set coverage … and better questions.” {CITATION} The map is a visualization of each of the interface actions, visualizations, observations, questions, insights, and the final hypothesis which the researcher explores in this 51-minute session. Although it is difficult to characterize the complexity of the researcher’s visual thinking, certain critical high level patterns begin to emerge. There is a cycle, which starts with an interface adjustment, leading to a period of sequential observations about the new visualization. After a variable number of these stated observations, the researcher arrives at a new question that she wishes to examine. Since the interface allows her to explore the data quickly and fluidly enough, she is able to find the answer to her question in enough time to keep a coherent exploration moving. After 1–3 of these cycles of adjustment, visualization, observation, and question she is able to extract a higher order hypothesis, which is indicated by an ellipse above the main action line on the map. After five of these higher order cycles, she is able to assemble a paper topic which is built up from the five previous hypotheses. This study participant expressed that she was “entirely confident” that this hypothesis would be a publication-caliber research question. The next step would be to correlate this subset of MISR data with external data sets such as temperature, humidity, and cloud fraction. For this diagram, the repeated observations would be the potential moments of insight, seeing new things or seeing old things in a new way. After a variable number of these insights, a question emerges which guides the next cycle of observation. Along the way, intermittently but generally toward the end of the cycle, a hypothesis may occur to the researcher.
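The cycle described above can be sketched as a simple segmentation over a coded event sequence. The Python below is a hedged illustration rather than the analysis actually performed: the event labels and function names are hypothetical, and the example sequence is invented to show the shape of the pattern, not drawn from the 51-minute session.

```python
def split_into_cycles(events):
    """Group a session's events into cycles, each starting at an interface adjustment."""
    cycles = []
    for ev in events:
        # A new cycle begins at every adjustment (or at the very first event).
        if ev == "adjustment" or not cycles:
            cycles.append([])
        cycles[-1].append(ev)
    return cycles


def cycles_per_hypothesis(events):
    """Count how many adjustment cycles precede each stated hypothesis."""
    counts, n = [], 0
    for ev in events:
        if ev == "adjustment":
            n += 1
        elif ev == "hypothesis":
            counts.append(n)
            n = 0  # start counting again toward the next hypothesis
    return counts


# Invented session, for illustration only.
session = [
    "adjustment", "observation", "observation", "question",
    "adjustment", "observation", "question", "hypothesis",
    "adjustment", "observation", "hypothesis",
]
print(len(split_into_cycles(session)))
print(cycles_per_hypothesis(session))
```

A summary like the cycles-per-hypothesis count is one way the "1–3 cycles per hypothesis" observation could be checked across participants, assuming their transcripts were coded with comparable event labels.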

The three following insight process maps were synthesized from study 02, the insight-based longitudinal booklet study. They represent shorter, cohesive explorations which the researchers performed at different times during the two-week booklet study. Where the previous study showed details on how insights were assembled into a single research hypothesis, these smaller maps show how a single moment of insight is gleaned from a sequence of observations and questions about several individual visualizations. Note that there is variation in the length and number of observations that a researcher makes before making an interface adjustment. There is also variation in the number of sequential visualizations viewed before a moment of insight strikes. This captures the unpredictable, non-linear thinking pathway described in the literature. {CITATION} But the cyclical pattern of adjustment, visualization, observation, question, and insight is quite stable across all data returned from this study.

This set of figures examines the relationship between what visualizations the scientist views, the questions they ask, and the hypothesis which they eventually come to.

The following maps come from study 03, the insight process questionnaire. The five flow diagrams represent the objects, actions, indicators, and conclusions of five scientists on their self-identified “most impactful” publication. The indicators are further subdivided into initial drivers and completion indicators. On a high level, all scientists’ descriptions of their process followed a similar format: initial drivers launch a research process, which continues until the scientist sees strong enough completion indicators, at which point the deliverable is ready.

Indicators of both kinds display coherence with pragmatic outliers. All initial indicators involve anomaly detection or knowledge gaps in previous work, except for one participant who reported that their research was ‘always meant to be part of my dissertation’. I hypothesize that there is likely a driver relating to anomaly detection or knowledge gaps earlier in this researcher’s career which motivated the direction of inquiry that led them to begin their PhD. Completion indicators are less cohesive, but all researchers describe a clear novelty to their work, except for one pragmatist who put an end to their work due to funding. The novelty in a researcher’s work might have been a method, a correlation, a causation, or a description, but the researcher always emphasizes the novelty and repeatability of their contribution.

The research process contains two subgroups: objects and actions. Objects of engagement include data sets, models, maps, and literature. Actions were less coherent, including: matching, correlating, refining, comparing, aggregating, and generating.

Implications and Discussion

In this set of studies, we extend previous insight methodologies to the realm of climate and atmosphere science. The data suggest that visualizations of appropriate speed and flexibility enable researchers to explore in an iterative and connected flow, assembling scientifically useful hypotheses on timescales which are greatly reduced from previous methods. This suggests that there is an opportunity for improved visualization systems to increase the pace and quality of scientific research in the field of atmosphere and climate science. Our research, although limited in scope, also supports the notion that the principle of supporting insight and discovery through reactive visualization is generalizable, with appropriate adaptation, across the branches of science. With further research in this field and others, universal principles and descriptions of the process of insight generation and discovery in science as a whole may one day be understood and described.

Our study reveals and describes a stable cyclical process in which researchers adjust a visualization state and make sequential observations, which build to investigative questions that inspire a new adjustment to the visualization state. As this process of insight sparking proceeds, higher order hypotheses are assembled within the researcher’s mind. We suggest that further research should be conducted to move beyond our pilot study and describe this cycle fully.

The insight process questionnaire data suggests that although individual scientists’ research processes are highly variable, there are higher order similarities to be observed. The heavy reliance on numerical data and models is at once the strength of science, and also a limitation under particular circumstances. Especially in the exploratory phase, which we focus on here, intuitive and careful visual representations may be able to help scientists define and refine their pathways of inquiry with increased speed and success.

There is a potential homology between the scientific process as described by researchers in the insight questionnaire and the design process as described by Donald Schoen. It may be a canard, but it is interesting to note that many scientists described an iterative process of correlation, matching, and harmonization as being central to their research process. In Schoen’s description, the designer converses with an externalized form, harmonizing each additional line of a sketch to the essential quality of the emerging design. It is entirely possible that scientists also progressively harmonize the fit of their models to the structure of observed data in a way that is similar on a high level to the process of design. This may indicate a further potential area of collaboration and investigation.

Opportunity: Public Communication of Complex Ideas

In previous micro case studies, I suggest ways that a designer can function as a useful component of a research team, recursively feeding visual forms into the process which potentially spark novel or unexpected moments of insight. In this case study, I would like to explore the possibility that effective visualizations can also be of use in the context of education, media, and public understanding of science. In the context of research, visualizations of appropriate clarity and flexibility allow scientists to offload some of the cognitive work of understanding complex data sets which helps them to examine more complicated phenomena than pure abstract or numeric data analysis. In other words, the clarity of the visualization allows researchers to access understanding which they might not otherwise have been able to. It is worth asking, does this clarity also have utility in furthering public or student understanding of science? Although this is not the main thrust of my research, it is nonetheless a potentially valuable second order benefit.

North American wildfires in the year 2018

Seasonal crop burning on the continent of Africa 2018

These images, and accompanying animations, convey their stories with dynamic and evocative color, and clear cognitive connection to fire. In the case of North American wildfires, compelling imagery can help to convey the magnitude and anomalous power of the phenomenon involved. In this case, a dark basemap without country and state labels was chosen in order to clear the scene and foreground the fire data. The data is hexbinned in 25 kilometer sections, with radiative power and smoke plume height averaged for each hexbin. A dark red to bright yellow color scheme was chosen in order to maintain a close cognitive connection to fire, but is reversed from the expected coloration of fire. Fire generally displays a gradient coloration from bright yellow at the base to dark red at the tip. This visualization inverts this color scheme in order to provide maximum contrast between the hottest fires and the basemap. In this way, the most devastating and powerful fires are visually highlighted. Motion adds an additional level of clarity to the digital version of these visualizations. Hexpillar motion closely mimics the jumping and quavering of live flames, which evokes the sense that an entire continent is burning.
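For readers unfamiliar with hexbinning, the following is a minimal Python sketch of the aggregation step: each plume is assigned to a hexagonal cell and radiative power and plume height are averaged per cell. It is not the MERLIN implementation. It assumes plume locations have already been projected to planar coordinates in kilometers, it treats 25 km as the center-to-corner size of a pointy-top hexagon (the text does not say how the 25 km is measured), and the field names `x_km`, `y_km`, `frp_mw`, and `height_m` are hypothetical.

```python
import math
from collections import defaultdict

HEX_SIZE_KM = 25.0  # assumption: center-to-corner size of each hexagon


def _axial_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so x + y + z == 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return int(rx), int(rz)


def hex_cell(x_km, y_km, size=HEX_SIZE_KM):
    """Return the axial (q, r) index of the pointy-top hexagon containing a point."""
    q = (math.sqrt(3) / 3 * x_km - y_km / 3) / size
    r = (2 / 3 * y_km) / size
    return _axial_round(q, r)


def hexbin_mean(plumes, size=HEX_SIZE_KM):
    """Average radiative power and plume height over each occupied hexbin."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [power sum, height sum, count]
    for p in plumes:
        acc = sums[hex_cell(p["x_km"], p["y_km"], size)]
        acc[0] += p["frp_mw"]
        acc[1] += p["height_m"]
        acc[2] += 1
    return {cell: (s[0] / s[2], s[1] / s[2]) for cell, s in sums.items()}


# Invented plume records, for illustration only.
sample = [
    {"x_km": 0.0, "y_km": 0.0, "frp_mw": 10.0, "height_m": 1000.0},
    {"x_km": 1.0, "y_km": 1.0, "frp_mw": 30.0, "height_m": 3000.0},
    {"x_km": 100.0, "y_km": 0.0, "frp_mw": 5.0, "height_m": 500.0},
]
print(hexbin_mean(sample))
```

Each resulting cell carries two averages, which is what lets a single hexbin drive both a color (radiative power) and an extruded height (plume height) in a hexpillar rendering.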

Role of Design

In this case, the careful choices of research quality visualization design help translate the complex data of research into a readily comprehensible form for education or public communication. The massive, slow moving, and abstract nature of planet level phenomena makes them difficult to grasp for many people. Most scientific visualization, although effective for research purposes, is not affective within the larger community of humanity. Visualizations which convey phenomena with more dynamism and visual sophistication might prove to be useful in helping to further the understanding of global climate phenomena.