Global Fisheries Visualization
PROCESSING.PY DATA VIS EXPLORATION
Global fisheries are experiencing unprecedented pressure. In the United States, export proportions and processing methodologies bear examination. This semester of work is a collaboration at Carnegie Mellon between advisor Arthi Krishnaswami and design master’s student Adrian Galvin.
Week 1: Getting Into The Data
Some details about the data: we want to see them ordered by magnitude of catch, not separated by environment. The export amount is already included in the domestic catch number.
An open question is: how does more than 100% of a domestic catch get exported?
For a first assignment, we want to visualize the magnitude of tonnage caught as simple bars, including explorations of the following factors:
- Compare two years
- Midline baseline and distal baseline
- Arrayed by total tonnage, or highest export
Sketches are below. I will build out simple processing.py visualizations for each of these bar arrangements for 2006 and 2007 as a first-round prototype.
After examining the format, I wrote a simple script which parses the data into a dictionary object mapping each fish to its export, domestic, and percentage values. Currently the script exports just one CSV per year. I can easily generalize this script to produce a list object which contains each year dictionary in its indices, in order from 2006 to 2014. With the addition of some sorting logic in Processing, this should allow me to render each of the desired visualizations for each year.
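As a rough illustration of that dictionary, here is a minimal sketch in standard Python 3 (Processing.py runs on Jython, so small changes may be needed there). The column names `species`, `export`, and `domestic`, the function name `parse_year`, and the sample rows are all assumptions for illustration; the real file layout is not shown in this log.

```python
import csv
import io

# Hypothetical sample standing in for one year's CSV; values are placeholders.
SAMPLE_CSV = """species,export,domestic
fishA,50,200
fishB,120,100
"""

def parse_year(csv_text):
    """Map each species to [exportValue, domesticValue, percentage]."""
    fish = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        export = float(row["export"])
        domestic = float(row["domestic"])
        # Share of the domestic catch that was exported; it can exceed 100.
        percentage = 100.0 * export / domestic if domestic else None
        fish[row["species"]] = [export, domestic, percentage]
    return fish

year_2006 = parse_year(SAMPLE_CSV)
```

Calling `parse_year` once per file and appending each result to a list would give the year-indexed list described above.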
Now, it’s time to plan all the necessary visualization components: how data will be stored and called, how the screen geometry will work, and some quick pseudocode structure. This planning will make the coding simpler and smoother as I switch to Processing.
Week 2: Data Parsing Bugs
The first script that I wrote seemed sufficient, but when I switched into Processing it turned out to be unusable: it did not sort the data correctly into export order and domestic order, nor did it account for an edge case in which two fish species had identical export or domestic tonnage numbers for the same year. This case still seems incredibly unlikely to my intuition, but it nonetheless exists in the first year that I chose to work with, 2007. It’s a good thing that this bug came up early, since otherwise I would have had to deal with it later, once I got into visual form exploration.
The first challenge that I had to overcome was sorting by export or domestic numbers. I have been using a dictionary which maps:
{'fishSpecies' : [exportValue, domesticValue, percentage] …}
However, this becomes tricky, because you cannot effectively sort list objects while carrying their full data set along with them. In other words, you can sort a list of int values, which works fine, but the rest of the values are then decoupled, which makes the sorting pointless. My solution was to generate two separate dictionaries with a simplified form:
{exportValue : [fishSpecies] …}
{domesticValue : [fishSpecies0, fishSpecies1] …}
This makes the ordering operation much simpler. Subsequent bugs had to do with storing multiple fish species per export or domestic value, due to the rare edge case in which two species had identical export or domestic values. The final data output is a two-dimensional list in the following form:
[ ['fishSpecies0', exportValue, domesticValue] … ]
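The inverted-dictionary approach can be sketched roughly as follows. This is a minimal illustration, not the original script: the function name `sorted_rows`, the alphabetical tie-break, and the placeholder data are all assumptions.

```python
from collections import defaultdict

def sorted_rows(fish, index):
    """Return [[species, exportValue, domesticValue], ...], largest first.

    index 0 sorts by export value, index 1 by domestic value.
    """
    # Invert the dictionary: value -> list of species sharing that value.
    by_value = defaultdict(list)
    for species, values in fish.items():
        by_value[values[index]].append(species)

    rows = []
    for value in sorted(by_value, reverse=True):
        # Species tied on the same value stay together, in alphabetical order.
        for species in sorted(by_value[value]):
            export, domestic = fish[species][0], fish[species][1]
            rows.append([species, export, domestic])
    return rows

# Hypothetical placeholder data with a tie on the export value.
fish = {
    "fishA": [50, 200, 25.0],
    "fishB": [50, 90, 55.6],
    "fishC": [120, 100, 120.0],
}
by_export = sorted_rows(fish, 0)
by_domestic = sorted_rows(fish, 1)
```

A possible alternative that avoids the inversion is `sorted(fish.items(), key=lambda item: item[1][0], reverse=True)`, which keeps each species attached to its values while sorting.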
Week 2: First Visualizations
Processing visualizations are complete for 2006 and 2007, with midline bisection arrangement, sorted by export and domestic catch for both years. There are some issues with visualizing the species for which we export more than we catch. Because the export is supposed to be a percentage of the total catch, it is difficult to know where to place the split. However, this is an effective first glance at the real data.
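One way to reason about the split is sketched below. It assumes each bar is divided into an exported segment and a retained segment, and it simply flags the species whose export exceeds the domestic catch instead of resolving the open question. The function name `split_bar` and the clamping choice are assumptions, not the layout actually used in the sketches.

```python
def split_bar(export, domestic, pixels_per_ton):
    """Return (exported_px, retained_px, over_exported) for one species."""
    over_exported = export > domestic
    # Clamp so the retained segment never gets a negative length.
    retained = max(domestic - export, 0)
    return export * pixels_per_ton, retained * pixels_per_ton, over_exported
```

The `over_exported` flag could then drive a distinct color or marker for the species that need explanation.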
[Figures: yearly visualizations, 2006 to 2014]
The massive numbers for catfish domestic catch and salmon export make it hard to see the whole field effectively. This is rendered at 2.5% pixels per ton. It may be necessary to employ a logarithmic scale to make the visualization more readable, although a log scale would also make the relative magnitudes less clear.
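To make the trade-off concrete, here is a minimal sketch of a linear and a logarithmic mapping from tonnage to bar length. The function name `bar_length`, its parameters, and the choice of `log10(1 + x)` are assumptions for illustration.

```python
import math

def bar_length(tons, max_tons, max_px, log_scale=False):
    """Map a tonnage to a bar length in pixels, at most max_px."""
    if tons <= 0 or max_tons <= 0:
        return 0.0
    if log_scale:
        # log10(1 + x) keeps small values positive and compresses outliers.
        return max_px * (math.log10(1 + tons) / math.log10(1 + max_tons))
    return max_px * (float(tons) / max_tons)
```

On the linear scale a species at one percent of the maximum gets one percent of the width; on the log scale it gets far more, which helps small bars stay visible but hides how large the outliers really are.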
There is a clear trend toward exporting more and catching less domestically. Sweeping up and down all of the years smoothly demonstrates this trend quite well. Next steps could include:
- Color exploration
- Labelling
- Log scale implementation
- Viewing all years
- Domestic on the left, export on the right
- Perform the study in dollar values
- Perform the study for wild-caught fish
Week 2: Feedback
We should build briefs which include a description of what you are seeing and what we cannot account for. Potentially: summary, data stories, and unaccounted-for issues. Two observations so far: over time we are exporting more, and catfish and salmon are massive outliers.
NOTE: there are two data sets. The first data set is aquaculture (FUS numbers); we will need to perform the study on the second data set.
Week 3: New Data Set and Cleaning
I received US national wild-caught fish data, and there were some issues. Inconsistencies, such as numbers that sometimes use comma separators and sometimes do not, caused problems for my parsing and sorting algorithm. In addition, because I am working from CSVs, fish names that contain an internal comma cause unnecessary issues.
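Both issues can be handled during parsing, as in the minimal sketch below. It assumes that fields containing commas are quoted in the CSV, which lets Python's `csv` module keep them intact; if the source file is unquoted, the rows would need manual repair first. The helper name `parse_number` and the sample rows are hypothetical.

```python
import csv
import io

def parse_number(text):
    """Convert '1,234,567' or '1234567' to a float."""
    return float(text.replace(",", "").strip())

# Hypothetical rows: one name contains a comma, and number formats differ.
SAMPLE = '"Tuna, albacore","12,345"\nCod,6789\n'

rows = []
for name, tons in csv.reader(io.StringIO(SAMPLE)):
    rows.append([name, parse_number(tons)])
```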
For the next iteration, I am focusing on interactive capability. The goal is to parse the entire set of years and generate a dictionary with the following structure:
{ 'year' : [aquaCultureValues, wildCaughtValues] … }
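A minimal sketch of assembling that structure, assuming each data set has already been parsed into its own mapping from year to values. The function name `build_years` and both argument names are hypothetical.

```python
def build_years(aquaculture_by_year, wild_by_year):
    """Combine two per-year mappings into {year: [aquaculture, wild]}."""
    years = {}
    for year in sorted(set(aquaculture_by_year) | set(wild_by_year)):
        years[year] = [
            # A year missing from one data set gets an empty list.
            aquaculture_by_year.get(year, []),
            wild_by_year.get(year, []),
        ]
    return years
```

Keeping both data sets under one year key means an interactive year button only needs a single lookup to redraw either view.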
[Figures: yearly visualizations, 2006 to 2015]
Design Round 1
Light gray gutters and color implementation. Uneven gutter width that matches bar height on alternate lines helps to guide the eye horizontally to the data, but minimizes visual noise on the far side of the mid-line. This helps visual cohesion and legibility. All type and formerly black elements are 95% gray which reinforces visual cohesion.
Sketch for Region Breakdown
Design Round 2
To boost readability, I implemented custom typefaces in Processing, matching the headers to their respective bar color. I experimented with three different layouts: data close to the center, spaced further out, and all right aligned. The spaced-out right alignment seems to make the most sense, especially since it maintains the vertical alignment of numeric place values. Due to the lighter weight of the copy type, I used a slightly darkened shade compared to the bar. Additionally, the copy type adaptively darkens if it is on top of a bar, so that sufficient contrast is maintained irrespective of whether the type sits on the lighter gray or the darker blue.
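The adaptive darkening could be sketched along these lines. The function names, the overlap test on a single x position, and the darkening factors of 0.5 and 0.8 are all assumptions for illustration, not the values used in the actual sketch.

```python
def darken(rgb, factor):
    """Scale an (r, g, b) color toward black; 1.0 leaves it unchanged."""
    return tuple(int(round(channel * factor)) for channel in rgb)

def label_color(bar_rgb, label_x, bar_start_x, bar_end_x):
    """Pick a label shade based on whether the label sits over the bar."""
    over_bar = bar_start_x <= label_x <= bar_end_x
    # Hypothetical factors: a little darker than the bar on the gray gutter,
    # much darker when the label is drawn on top of the bar itself.
    return darken(bar_rgb, 0.5 if over_bar else 0.8)
```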
I chose to right align the data headers, but the species headers are offset to the left. The year is an additional column offset to the left. This was done so that the eye can sweep up and down the sets of top ten species names without visual interruption. Year buttons are aligned with the species names on the left, and with the right-aligned domestic numerical data on the right. This helps the visualization feel more cohesive and logical. The truncate button is aligned along the left margin in the same column as the current year, which helps to differentiate it from the year buttons and establishes a clear left edge to the visualization. All-caps titles, as well as three type weights, further differentiate type elements, and an addition to the sorting algorithm standardizes unreliable capitalization in the data sets during parsing.
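The capitalization cleanup could look something like this minimal sketch. The helper name `normalize_name` is hypothetical, and title case is an assumed target style.

```python
def normalize_name(name):
    """Collapse stray whitespace and apply consistent title case."""
    return " ".join(name.split()).title()
```

Note that `str.title()` capitalizes the letter after an apostrophe, so names containing one would need a special case.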
Comparison Over Time
Previous versions of this visualization system focused on comparing aquaculture and wild caught numbers in a single year. All years could be accessed through UI components, but this does not support looking for larger trends across years. An expanded system was built to enable this use case. All years are aligned on the Y axis, with toggles to switch between aquaculture and wild caught. Not only does this enable vertical visual scanning across all years, but it also provides a foundation for print graphics, which this visualization will be used for.
Now, macro-trends become visible which were obscured in previous work. For example, the total amount of aquaculture production, particularly in catfish, steadily declines between 2006 and 2015. However, the outlier salmon increases steadily during the period of study. It is still unknown how much of this tonnage is truly aquaculture from the year in which it is recorded. Team research suggests that some wild-caught, foreign-processed fish is erroneously recorded as aquaculture. In wild-caught fish, there is a slight decrease in catch from 2006 to 2008, but from 2008 to 2015 there is a clear upward trend toward a higher total catch.