The Add to Plot window lets you customise your graph, explore patterns in the data, and highlight important features. The options available to you depend on the type of variables you have selected and the plot drawn.
At the top of the Add to Plot window is a drop-down that lets you select one of the available panels:
This panel allows you to control most of the visual aspects of the graph, including size and colour.
This section is common across all graphs, with the exception of the options listed in the "Plot type" dropdown.
Plot type: allows overriding the default plot chosen by iNZight, which is based on dataset size.
Background colour: you can customise the background colour of your graphs to suit your preference, or make certain features easier to distinguish.
You can specify colours using names or HEX codes.
Overall size scale: this lets you adjust the overall size of everything drawn on the screen.
This works as a baseline, so all other size settings will be multiplied by this value.
Control point sizes on scatter plots and dot plots, hexagons on hexbin plots, and bar-widths on histograms.
Size by variable: On scatter plots, you can choose a numeric variable to be used to resize the points.
Change the colour of points, hexagons, and bars. Colours can be chosen from the list, or you can type your own. See choosing colours for more information.
Colour by variable:
(scatter plots / dot plots / hexbin plots / one-way barplots).
Select a variable to code colours. Several choices for colour palettes will be offered.
Categorical variable: Each category will be assigned a colour (depending on your chosen palette)
Scatter plots and dot plots: Points are coloured according to the level of the variable.
Hexbin plots: hexagons are coloured depending on the proportion of points in each category that fall within that hexagon.
One-way bar plots: the bars are segmented depending on the proportions of each category and coloured accordingly.
Numeric variable: a linear scale is set up from the lowest to the highest value, and each point is coloured based on its value.
If you click the Use rank checkbox, the colours will be based on quantiles, rather than absolute values. This can be helpful if you have a few very large values of the chosen variable.
NOTE: if you choose a numeric variable on a hexbin plot, iNZight will convert it to a 4-level categorical variable.
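The difference between the linear scale and the Use rank option can be illustrated with a minimal sketch. This is not iNZight's code: the function names and the sample data are hypothetical, and the sketch only computes each point's position along the colour scale (0 for one end of the palette, 1 for the other).

```python
def linear_scale(values):
    """Position in [0, 1] in proportion to the absolute value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_scale(values):
    """Position in [0, 1] based on rank; tied values share their average rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [r / (n - 1) for r in ranks]


# Hypothetical data with one very large value
incomes = [10, 12, 15, 18, 1000]
linear_positions = linear_scale(incomes)
rank_positions = rank_scale(incomes)
```

On the linear scale the four small values all sit below 0.01, so they would receive almost the same colour, while the rank-based positions are spread evenly between 0 and 1.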
Cycle levels: You can cycle through the levels (or quantiles for numeric variables) by clicking the left and right arrows, making each one in turn stand out from the others. You can change the number of quantiles by adjusting the number in the box.
Select different symbols to use on scatter plots and dot plots.
Symbol by variable: You can also choose a categorical variable with 5 or fewer levels, and iNZight will give each level its own symbol. This is particularly useful when creating colour-blind friendly plots, or for black-and-white printed graphs.
Diamonds and triangles can take a long time to draw if you have a big data set (at least on Windows). Circles (the default) and squares seem to be OK.
iNZight will turn off transparency if you change plotting symbol to one of these; you can put transparency back, but iNZight may become unresponsive while it tries to draw the graph.
On scatter plots, hexbin plots, and grid density plots, you can add various types of lines to the graph.
Trend curves are fitted using linear regression models, which minimise the sum of the squared vertical distances between the points and the line.
The formula for the line (which can be found by clicking the Get Summary button after adding a trend line) depends on the type of curve fitted.
In these equations,
y is "Variable 1", the primary variable of interest,
x is "Variable 2".
The Greek letter \(\beta\) ("beta") represents an unknown value, and iNZight picks the best value to make the line fit the data as well as possible.
The subscripts (\(\beta_0, \beta_1, \ldots\)) simply indicate different unknown values.
Linear: a straight line fitted through the points: $$ y = \beta_0 + \beta_1 x $$ It has an intercept, \( \beta_0 \), which represents the value of \(y\) when \(x=0\), and a slope, \(\beta_1\), which describes the change in \(y\) for a unit change in \(x\).
Quadratic: a curved line with at most one bend: $$ y = \beta_0 + \beta_1 x + \beta_2 x^2 $$
For more details, see this page of the Data to Insight course.
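As a rough illustration of how the \(\beta\) values in the linear and quadratic equations can be estimated, the sketch below solves the least-squares normal equations directly. It is a minimal example under stated assumptions, not the routine iNZight uses: the function name `fit_polynomial` and the sample data are hypothetical, and the function assumes there are at least `degree + 1` distinct x values.

```python
def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x + b2*x^2 + ..."""
    m = degree + 1
    # Normal equations (X'X) b = X'y, where X has columns 1, x, x^2, ...
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (b[r] - s) / a[r][r]
    return coef


# Hypothetical data that follow each curve exactly
x = [0, 1, 2, 3, 4]
y_line = [1, 3, 5, 7, 9]    # y = 1 + 2x
y_quad = [2, 3, 6, 11, 18]  # y = 2 + x^2
linear = fit_polynomial(x, y_line, 1)     # [b0, b1]
quadratic = fit_polynomial(x, y_quad, 2)  # [b0, b1, b2]
```

With real data the points will not fall exactly on the curve, and the returned coefficients are those that make the sum of squared vertical distances as small as possible.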
These curves are fitted without the restrictions of the regression models shown above. iNZight uses a loess smoother, and you can control the degree of "smoothness" using the slider.
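The idea behind a loess-style smoother can be sketched as a series of small weighted straight-line fits, one around each point. The code below is a simplified illustration only: a full loess implementation has further options (such as robustness iterations), and the `span` parameter here is a hypothetical stand-in for the smoothness slider, not iNZight's actual setting.

```python
def local_linear_smooth(x, y, span=0.75):
    """Smoothed y at each x, using tricube-weighted local straight-line fits."""
    n = len(x)
    k = max(2, min(n, int(round(span * n))))  # points in each neighbourhood
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # distance to the k-th nearest point
        # Tricube weights: nearby points count most, distant points count zero
        w = [(1 - min(abs(xi - x0) / h, 1) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical data lying exactly on a straight line
x = list(range(10))
y = [2 * xi + 1 for xi in x]
smooth = local_linear_smooth(x, y, span=0.75)
```

A larger `span` uses more of the data for each local fit and gives a smoother curve; a smaller one follows local wiggles more closely.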
If you have a large data set, you can add quantile smoothers, which draw lines not only at the middle (median) of the data, but also at the quartiles (25% and 75%) and, if the sample size is large enough, at the 10% and 90% quantiles.
In some cases, you may have ordered data for which it makes sense to connect the points, for example a time series. If you specified a categorical colour by variable, you can optionally connect points within each level of that variable.
If you specified a categorical colour by variable, you can fit a trend curve through each level of that variable. By default, the lines will have the same slope but different intercepts, so they will be parallel. If you want the lines to be completely independent, you can uncheck the parallel trend lines box and each level will also be given its own slope.
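As an illustration, consider linear trends and a colour by variable with two levels, and let \(g\) be an indicator (an assumption introduced here, not iNZight notation) equal to 0 for the first level and 1 for the second. One standard way to write the parallel lines is $$ y = \beta_0 + \beta_1 x + \beta_2 g $$ so both levels share the slope \(\beta_1\), and the intercepts are \(\beta_0\) and \(\beta_0 + \beta_2\). Lines that are not constrained to be parallel add one more unknown: $$ y = \beta_0 + \beta_1 x + \beta_2 g + \beta_3 (x \times g) $$ so the second level has slope \(\beta_1 + \beta_3\) as well as its own intercept. The output from Get Summary may be parameterised differently.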
Line Width Multiplier will adjust the thickness of lines.
The line of equality is useful when the units of the two variables are the same (for example, "before" and "after" measurements).
Type your own labels to override those automatically created by iNZight.
To remove labels, just enter one or more spaces into the box and press enter.
For scatter plots only.
Jitter: This adds a small amount of noise to each value, which helps to separate out discrete variables (for example, age).
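A minimal sketch of the idea, assuming uniform noise of a fixed maximum size (iNZight's actual noise distribution and amount are not stated here; the function name, `amount`, and the sample data are hypothetical):

```python
import random


def jitter(values, amount=0.2, seed=None):
    """Return a copy of values with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical discrete data with many tied values
ages = [18, 18, 18, 19, 19, 20]
jittered_ages = jitter(ages, amount=0.2, seed=1)
```

Points that would otherwise be drawn on top of each other are nudged apart, while each stays close to its true value. The noise affects only how the points are drawn; the original `ages` list is unchanged.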
This allows you to adjust the limits of the axes, effectively allowing you to "zoom in" on specific regions.
On bar charts, you can adjust the total number of bars shown at a time. This is useful if a categorical factor has too many levels to display all at once.
This allows users to label points in one of three ways (text labels, colour labels, and related points), using one of three methods (clicking, selecting values, or labelling extreme points).
How do you want to label points?
The "With the same level of" option allows you to easily locate points related to the selected points.
The main use of this is to locate points with the same level of a chosen factor (e.g., if you are exploring survey data, you may wish to identify observations in the same cluster). You may also want to retain selected points over multiple graphs: for example, the Gap Minder data set, included in the Data folder, has several observations of countries over multiple years, so you can label certain countries and track them over time.
Note: when you do this, the point you clicked will be highlighted for easier reference.
How do you want to select points?
Clicking with the mouse: After clicking "Click to Locate ...", you can then click a point on the plot. iNZight will then label it using the options you defined above. You can select multiple points by clicking the button again and locating new points without losing the current selection.
Note: due to the way the software works, you may see a "Busy" cursor after you click the "Click to Locate ..." button. This will not go away until you click a point, so go ahead and click points ignoring the cursor.
Select by value of ...: this lets you choose a variable whose values determine which points are selected.
If the variable is a factor (or a numeric variable with fewer than 20 unique values), a slider will appear allowing you to quickly select the levels you wish to label. If the variable is numeric, or you want to select multiple levels, you can click the "Select levels ..." button to do so.
Extreme values: This will use a very simple algorithm that will select the points "farthest away" from the bulk of the data.
This will display a slider, allowing you to select more or fewer points as desired. Once you have selected the points, you can optionally save the selection (e.g., if you want to track the same observations over multiple plots) by clicking the "Save these points ..." button. From there you can use the "With the same level of" option to select related points, etc.
The statistic used is Mahalanobis' distance.
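For two numeric variables, this distance measures how far each point lies from the centre of the data after allowing for the spread of, and correlation between, the variables. The sketch below shows one way to compute it and pick the most extreme points. It is an illustration under stated assumptions, not iNZight's code: the function names and sample data are hypothetical, `k` is a stand-in for the slider value, and the data are assumed not to lie exactly on a straight line.

```python
def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the centre of the data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample variances and covariance
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # zero if the points are perfectly collinear
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^{-1} (dx, dy)' using the closed-form 2x2 inverse
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2 ** 0.5)
    return distances


def most_extreme(points, k):
    """Indices of the k points with the largest Mahalanobis distance."""
    d = mahalanobis_2d(points)
    return sorted(range(len(points)), key=lambda i: d[i], reverse=True)[:k]


# Hypothetical data: the last point sits well off the general trend
points = [(1, 2), (2, 3), (3, 3), (4, 5), (5, 6), (3, 9)]
extreme = most_extreme(points, 1)
```

Here the point (3, 9) is selected first. Its x value is typical, but its y value is unusually large for that x, which a simple distance from the centre would weight less heavily.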