Slide 1: This talk will describe some recent work that I have done on accessible statistical graphics and colour. I am going to emphasise the journey rather than just the destination, reflect upon some of the joys of working in the field of Statistical Computing and Graphics, and attempt to draw out some general wisdom about creating software, which is mostly what I do.
Slide 2: The hist() function can draw a histogram AND it returns information about the histogram that it drew.
Slide 3: The VI() function from the 'BrailleR' package takes the information about a histogram and turns it into a text description of the histogram. In combination with a screen reader, this provides some information about the histogram for blind or visually impaired R users.
Slide 4: The 'ggplot2' package is a very popular package for generating plots in R.
Slide 5: Debra Warren, in a Masters Project, added support for 'ggplot2' plots in 'BrailleR'.
Slide 6: The text description generated from a 'ggplot2' plot includes information about colour scales used in the plot.
Slide 7: The text description of colour reports colours in the #RRGGBB format that R uses for colours, which is not very easy for a human audience to interpret. #RRGGBB gives the amounts of red, green, and blue as pairs of hexadecimal digits. Each pair ranges from 00 to FF. For example: #F8766D is lots of red and medium amounts of green and blue (a light reddish something or other); #00BFC4 is no red, but quite a lot of both green and blue (some sort of turquoise).
Slide 8: The 'roloc' package was created to convert #RRGGBB colour specifications into colour names. If all we were going to talk about was the ending, that would be it; job done. But the more interesting part is how we got here ...
Slide 9: One issue that 'roloc' faced was what colour names to use. R contains a long list of colour names. This list is very similar to the colour names that we can use in CSS and SVG.
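To make the #RRGGBB format from Slide 7 concrete, here is a minimal sketch in base R. The helper decodeHex() is a hypothetical name introduced only for illustration; col2rgb() is the standard 'grDevices' function that does the same job.

```r
# Hypothetical helper: split "#RRGGBB" into three hexadecimal pairs
# and convert each pair to an integer between 0 and 255.
decodeHex <- function(spec) {
  pairs <- substring(spec, c(2, 4, 6), c(3, 5, 7))   # "RR", "GG", "BB"
  setNames(strtoi(pairs, base = 16L), c("red", "green", "blue"))
}

decodeHex("#F8766D")   # red 248, green 118, blue 109
decodeHex("#00BFC4")   # red 0, green 191, blue 196

# The built-in equivalent; returns one column per colour.
col2rgb(c("#F8766D", "#00BFC4"))
```

The numbers match the verbal description: lots of red with medium green and blue for the first colour, and no red with plenty of green and blue for the second.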
Slide 10: There is also a list of "simple" HTML colours.
Slide 11: XKCD also has a list of colour names based on a large online survey. Put simply, the list of colour names that people use is not completely obvious and there are several ways to do it.
Slide 12: Because it was not clear which list of colour names was best (for all purposes), this became a parameter of the 'roloc' functions; the user can select the list of colour names (with the R colours as the default). The general software creation wisdom from this situation is: if you see multiple ways to do something and you cannot see a clear winner, leave your options open.
Slide 13: A list of colour names has an #RRGGBB colour associated with each name. But there are many #RRGGBB colour specifications with no corresponding colour name.
Slide 14: For a particular #RRGGBB colour specification, we need to find the "closest" colour name, but we need a metric that we can use to measure "distance" between colour specifications.
Slide 15: All #RRGGBB colour specifications can be visualised as a 3D cube, with the amount of red on one dimension, the amount of green on another, and blue on the third dimension.
Slide 16: The distance between #RRGGBB colours can be calculated as just the length of the straight line between the colours in RGB space.
Slide 17: But there are problems with RGB space. For example, RGB is not perceptually uniform; a distance of 1 in one part of RGB space does not appear the same size as a distance of 1 in another part of RGB space. This means that RGB is not a very good space to perform distance calculations within. There are other colour spaces, like CIE XYZ, and it is possible to convert between these colour spaces.
Slide 18: All these formulas are designed to show is that there is a mathematical relationship between RGB and XYZ. This means that we can take any RGB colour and calculate an XYZ specification for the colour.
Slide 19: Another colour space is CIE Luv.
The value of this colour space is that it is (more) perceptually uniform. This means that it is a better space to perform calculations of distance within.
Slide 20: These formulas are designed just to show that there is a mathematical relationship between XYZ and Luv. So we can take any RGB colour and generate an Luv colour.
Slide 21: However, there are still more colour spaces and there may be different contexts within which we wish to determine "closeness", so again we allow the user to select how to measure distances between colours. So now we have the ability to choose a colour list and a colour metric to perform the conversion from colour specification to colour name.
Slide 22: Notice that 'roloc' presents the conversion in two ways: a simple character vector and a colour swatch graphic.
Slide 23: There is actually a function underneath called colourMatch() that does the actual conversion; both colourName() and colourSwatch() use that function to get the result and just present the result in different ways.
Slide 24: We can also see that some colour names correspond to exactly the same colour specification. Here, both "red" and "red1" correspond to the colour specification "#FF0000".
Slide 25: The 'roloc' package has two more functions, colourNames() and colourSwatches(), that allow for multiple colour names matching a colour specification. We have functions that report the closest colour name match AND we have functions that report the "N" closest colour name matches. Both of these functions are also built upon the colourMatch() function.
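The matching idea in Slides 14 to 25 can be sketched with base R alone. This is only an illustration under stated assumptions, not the 'roloc' implementation: closestNames() is a hypothetical helper, the colour list is fixed to R's colors(), and the only metrics offered are straight-line distance in sRGB or in CIE Luv (via convertColor() from 'grDevices').

```r
# Hypothetical helper for illustration; this is NOT the 'roloc' API.
closestNames <- function(spec, n = 1, space = c("Luv", "sRGB")) {
  space <- match.arg(space)
  candidates <- colors()                       # R's built-in colour names
  # One row per colour, channel values scaled to [0, 1];
  # row 1 is the target specification, the rest are the named colours.
  srgb <- t(col2rgb(c(spec[1], candidates))) / 255
  coords <- if (space == "sRGB") {
    srgb
  } else {
    convertColor(srgb, from = "sRGB", to = "Luv")
  }
  target <- coords[1, ]
  coords <- coords[-1, , drop = FALSE]
  # Straight-line (Euclidean) distance from the target to every named colour.
  d <- sqrt(rowSums(sweep(coords, 2, target)^2))
  candidates[order(d)][seq_len(n)]
}

closestNames("#F8766D")                        # single closest name
closestNames("#F8766D", n = 3, space = "sRGB") # three closest, RGB metric
closestNames("#FF0000", n = 2)                 # exact matches come first
```

Keeping the metric as an argument mirrors the "leave your options open" point: the same call can be repeated with a different space to see how much the choice of metric changes the answer.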
Slide 26: The design of the functions within the 'roloc' package is satisfying because: there is a single function that calculates all of the information needed to determine a matching colour name; there are separate functions for a single match and for multiple matches, because those are different types of results; although both of the "swatch" functions produce the same type of result, there are two different functions to match the two functions that produce character results. The general software wisdom from this situation is: the design of functions within a package often requires thought and refactoring, but it is very satisfying when you get it right. You can tell that you have not got it right yet when it feels awkward and inelegant.
Slide 27: This is the half-way point in the story. We have the 'roloc' package and it is elegant and heart-warming. But how well does it solve the original problem? Here are 6 different shades of orange; how well does 'roloc' do at converting these to colour names?
Slide 28: The original problem was that I could not easily understand the #RRGGBB colour specifications, but now I have a new problem: I cannot easily understand the colour names (for some colour lists)! What colour does the name "burlywood1" conjure up for you?
Slide 29: Another problem is that some colour lists are not detailed enough; everything just comes out "Karaka".
Slide 30: Fortunately, because I designed 'roloc' so well, this is not a disaster. I can blame it all on the limitations of the colour lists. The general software wisdom in this situation is: it is much more fun to create general software tools that solve sets of problems than it is to create specific solutions.
Slide 31: On the other hand, it is good to actually produce a useful solution. This is the second half of our story - finding a colour list that produces understandable and accurate colour names.
Slide 32: The definition of the ISCC-NBS System of Colour Designation certainly sounds like it should fit the bill.
Slide 33: The ISCC-NBS system contains understandable colour names that can still discern between quite similar-looking colours.
Slide 34: But making use of the ISCC-NBS system is not going to be straightforward. The first obstacle is that ISCC-NBS is defined in terms of the Munsell colour space; this is a very beautiful and well-structured colour space, but it does NOT have a very simple relationship with RGB.
Slide 35: Furthermore, ISCC-NBS colour names correspond to regions of Munsell colour space (not single points in colour space). This means that "distance" has to be calculated differently.
Slide 36: In order to get an ISCC-NBS colour list for use with 'roloc', we need to be able to convert RGB colour specifications into Munsell colours. Note that this conversion goes via yet another colour space called CIE xyY.
Slide 37: If a colour specification lies within a region, its distance to that region is zero, otherwise its distance is infinity. This is another reason for allowing multiple colour name matches; it is possible for more than one colour name to lie within the same ISCC-NBS colour block (as the colour specification).
Slide 38: Unfortunately, there is no mathematical equation to transform from RGB to Munsell. We only have a known conversion for a finite set of Munsell colours.
Slide 39: The conversion as far as CIE xyY is pretty easy - it is the next step, from xyY to Munsell that is hard.
Slide 40: Fortunately, there is a Python package called 'colour' that can perform the conversion from xyY to Munsell. There is also an R package that lets us call Python code from R.
Slide 41: And the final step from Munsell to ISCC-NBS is also pretty easy. So all we have to do is connect these packages together.
The software wisdom from this situation is: make use of existing solutions where possible; it not only saves you time, but code that has been out there and used by lots of other people is going to be much more robust than code you write for yourself.
Slide 42: The bad news is that this jigsaw solution does not fit nicely into an R package because the Python part places a large burden on the end user to install Python and the 'colour' package. And that is not straightforward on Windows. The software wisdom from this situation is: for a solution to be useful, it must be portable.
Slide 43: One way around this problem is to use the Python package to do all possible conversions and just include the pre-calculated conversions with the package.
Slide 44: This is a lot of conversions, but it is a finite number.
Slide 45: Unfortunately it takes a long time to run all of the xyY to Munsell conversions ...
Slide 46: ... pretty much an entire day in fact.
Slide 47: Fortunately, each conversion is independent of every other conversion, so we can run them in parallel and bring the time down considerably. The software lesson from this situation is: modern computers have multiple CPUs and it is very useful to know how to get all of them working at once. This is pretty easy to do in R.
Slide 48: Unfortunately, the Munsell to ISCC-NBS conversion is even slower. The important feature of this graph is that it is exponential; it gets steeper and steeper.
Slide 49: It is slow because it has a loop that incrementally grows an R data frame (so there is a LOT of copying of objects in memory).
Slide 50: We can write a vectorised (non-loop) version of the ColorBlockFromMunsell() function and that makes it MUCH faster. The shallow straight line is the new, faster version.
Slide 51: This sort of fix is possible because R packages tend to be open source, so we can see the code, copy the code, modify the code ...
Slide 52: ... and share our new code.
Using version control systems like GitHub allows those changes to be tracked against the original code and supports ongoing copying, modification, and sharing with others. The software wisdom from this situation is: an open source community generates an environment where sharing is the norm. This leads to better communication and resolution of problems. It also leads to a friendly, supportive, and generous environment that is very pleasant to work in.
Slide 53: Unfortunately, the final set of RGB colour specifications plus ISCC-NBS colour names is very large - way too large to distribute as an R package.
Slide 54: But we do not actually need to store all of the RGB specifications; we can just remember the order in which we generated the specifications (and the colour names).
Slide 55: If we do this, and if we store the colour names as an R object (rather than a text file), then we only really store every unique colour name, and the size of the pre-calculated colour name object comes WAY down.
Slide 56: So we now have another R package, 'rolocISCCNBS', which contains an ISCC-NBS colour list and an ISCC-NBS colour metric. How does it do on the set of orange colour specifications? Not bad (?)
Slide 57: So this is the somewhat disappointing ending. We may have created a colour specification to colour name conversion that performs reasonably well, and this will provide a little bit of help to a small community of R users with accessibility issues. But it was a lot of fun getting to this point. And it was still very rewarding to produce a solution that is elegant, flexible, and extensible.
Slide 58:
Slide 59:
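As an aside on Slides 54 and 55, the storage trick can be sketched as follows. Everything here is an assumption made for illustration: the coarse 6-level grid, the toyName() stand-in for the slow conversion, and the lookupName() helper are hypothetical and are not taken from 'rolocISCCNBS'. The point is only that, if colours are generated in a known order, a colour's position can be computed from its channel values, so only the names need to be stored, and a factor stores each unique name once.

```r
# Coarse grid so the sketch runs instantly (an assumption, not the real grid).
steps <- seq(0, 255, by = 51)                         # 6 levels per channel
grid <- expand.grid(b = steps, g = steps, r = steps)  # b varies fastest

# Hypothetical stand-in for the expensive colour-name conversion:
# name each colour after its largest channel.
toyName <- function(r, g, b) {
  labels <- c("reddish", "greenish", "bluish")
  labels[max.col(cbind(r, g, b), ties.method = "first")]
}

# Store only the names, in generation order, as a factor
# (integer codes plus one copy of each unique name).
nameTable <- factor(toyName(grid$r, grid$g, grid$b))

# Recompute a colour's position from its channels instead of storing them.
lookupName <- function(hex) {
  idx <- match(unname(col2rgb(hex)[, 1]), steps)      # grid position per channel
  if (anyNA(idx)) stop("colour is not on the pre-calculated grid")
  n <- length(steps)
  pos <- (idx[1] - 1) * n^2 + (idx[2] - 1) * n + idx[3]
  as.character(nameTable[pos])
}

lookupName("#FF3300")   # "reddish"
lookupName("#0000FF")   # "bluish"
```

Comparing object.size(nameTable) with the size of a data frame that also holds the three channel columns gives a feel for the saving on a larger grid.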