# Pretty Pictures: Creating More Appealing Graphs

## Colors Available for R

Creating more appealing graphs may require a greater variety of colors. A list of R's named colors can be found here:

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

R also supports RGB and hex color values. Using col2rgb() (built into R) and col2hex() (available in the gplots library), one can find starting points for tweaking custom colors, as described here: http://en.wikibooks.org/wiki/R_Programming/Graphics#Colors
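As a rough sketch of that workflow, the following starts from a named color, converts it to RGB, and nudges one channel to get a custom hex value. It uses base R's rgb() instead of gplots' col2hex() so that it runs without extra packages; the color name and the amount of tweaking are arbitrary choices for illustration.

```r
# Start from a named color and convert it to RGB and hex (base R only).
base_name <- "steelblue"            # any name returned by colors()
base_rgb  <- col2rgb(base_name)     # 3 x 1 matrix with rows red, green, blue

# rgb() accepts a three-column matrix, so transpose the col2rgb() result.
base_hex <- rgb(t(base_rgb), maxColorValue = 255)

# Tweak the starting point: halve the blue channel to get a custom color.
custom_rgb <- base_rgb
custom_rgb["blue", 1] <- custom_rgb["blue", 1] %/% 2
custom_hex <- rgb(t(custom_rgb), maxColorValue = 255)

print(base_hex)
print(custom_hex)
```

Either hex string can then be passed anywhere R expects a color, for example as a `col` argument.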

## About Graphs Available for R

There is a wide variety of possibilities for visualizing data in R. There are many examples here. The GGPlot2 library, which will be discussed later, is especially helpful.

## Not all Respondents are Created Equal / Correlation Graphs

The graphs we have looked at before are simple in some ways. They do little to illustrate relationships. Nagging questions remain: Do individuals who ask more questions learn more? Do individuals who answer more questions learn more? These relationships have not been apparent in our graphs before now.

## Basic Scatterplot

In statistics and research, it is common to use a scatter plot to show relationships among variables. For example, a scatter plot might graph minutes of exercise per day on the x-axis (horizontal axis) and graph the measure of BMI (Body Mass Index) on the y-axis. There would be a lot of variation (if using unrounded numbers) and patterns would emerge.

To learn about graphing simple scatter plots in R, you may want to look here. Additionally, you may want to use the command help(plot) in R.

For our study, we want to see if there is a relationship between members' participation in GNU/Linux and FOSS community tasks and activities and their perception of the benefit of involvement in GNU/Linux and FOSS communities. Question sections 3a and 3b were used to determine the frequency with which respondents participate in software, communication, and other activities related to GNU/Linux and FOSS communities. Question 4 component 2 asked respondents to identify to what extent they felt involvement in the FOSS community is educationally beneficial.

For this graph we will use the plot() function and a few familiar parameters. The syntax is plot(x,y) where x is considered the independent variable and y the dependent variable.

**main** - "main" refers to the main title centered above the graph.

**xlab** - xlab indicates a label for the horizontal (x) axis.

**ylab** - ylab indicates the label for the vertical (y) axis.

To create a simple scatter plot illustrating the relationship between perception of benefit and overall participation enter:

`plot(Q3a3bConsolidated, Q4.2, xlab="Overall Participation", ylab="Perceived Benefit", main="Perceive Involvement with\n the FOSS Community as\n Educationally Beneficial" )`

Which produces the following graph:

Though this graph does provide some additional insight, it is a bit difficult to decipher. One problem is the overlap of values. 4603 respondents completed the survey; many of them had similar, if not identical, answers. In order to create a more insightful graph we will use the ggplot2 library to create heat graphs. The color coding of the heat graphs will help illustrate the frequency of response combinations.
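To get a feel for how much overlap hides in a scatter plot like this, here is a small sketch on simulated data. The data frame `demo` and its values are made up (the survey data are not reproduced here); only the value ranges, 0 to 60 for participation and -2 to 2 for perceived benefit, are taken from this tutorial.

```r
# Simulated stand-in for the survey data; 'demo' and its values are made up.
set.seed(42)
n <- 500
participation <- sample(0:60, n, replace = TRUE)   # plays the role of Q3a3bConsolidated
benefit <- sample(-2:2, n, replace = TRUE)         # plays the role of Q4.2
demo <- data.frame(participation, benefit)

# Basic scatter plot, using the same kind of arguments as above.
plot(demo$participation, demo$benefit,
     xlab = "Overall Participation",
     ylab = "Perceived Benefit",
     main = "Simulated responses")

# Count responses that land exactly on an earlier response's coordinates.
hidden <- sum(duplicated(demo))
cat(hidden, "of", n, "points are drawn on top of an earlier point\n")
```

With only 61 x 5 possible coordinate pairs, a large share of the points are invisible in the plot, which is the problem the heat graphs address.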

## GGPlot2

GGPlot2 is a large graphics library that offers a plethora of ways to visualize data. Have a look here.

### Installing GGPlot2

We will be using the ggplot2 library which is full of many wonderful tools to visualize data in R. It is certainly worth taking a look at the website maintained by Hadley Wickham at http://had.co.nz/ggplot2. (After completing (some of) this tutorial, you should be able to generate many of the graphs available in ggplot2.) You may also want to look here: http://www.statmethods.net/

To install ggplot2, enter:

`install.packages("ggplot2")`

We will also need to install the hexbin package:

`install.packages("hexbin")`

### Loading Libraries

Now, we need to specify that we will be using the ggplot2 library.

Enter the following text:

`library(ggplot2)`

`library(hexbin)`

### A Basic Heat Graph / Better than a Scatter Plot

As in the previous graph, we will compare participation (as measured by sections 3a and 3b) with respondents' perceptions of GNU/Linux as being educationally beneficial. (Remember from Algebra that when graphing coordinates the format is (x,y) where x is the horizontal axis (independent variable) and y is the vertical axis (dependent variable); whereby, the independent variable (x) affects the dependent variable (y).)

We will compare “Q3a3bConsolidated” (the sum of respondents' 'participatory' activities) with “Q4.2” ('Do you feel that involvement in the FOSS community is educationally beneficial?').

We will first create a ggplot object 'd' using the following parameters:

**newlpp** - the data set we want ggplot2 to use.

**aes()** - the aes function describes to ggplot how "the data are mapped to visual properties (aesthetics) of geoms." See also `help(aes)`.

To create the object 'd', enter:

`d <- ggplot(newlpp, aes(Q3a3bConsolidated, Q4.2))`

If you were to enter 'd' you would receive a blank graph as we have not actually defined any layers.
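The idea that 'd' holds data and aesthetics but no layers can be checked directly. The sketch below is only illustrative: `demo` is a made-up stand-in for `newlpp`, and inspecting `d$layers` is an assumption about how ggplot objects expose their layers, which may vary between ggplot2 versions.

```r
library(ggplot2)

# Made-up stand-in for 'newlpp' so the sketch runs on its own.
set.seed(1)
demo <- data.frame(
  Q3a3bConsolidated = sample(0:60, 200, replace = TRUE),
  Q4.2 = sample(-2:2, 200, replace = TRUE)
)

d <- ggplot(demo, aes(Q3a3bConsolidated, Q4.2))
n_before <- length(d$layers)     # no layers have been added yet

d_hex <- d + geom_hex()          # adding a geom adds one layer
n_after <- length(d_hex$layers)

cat("layers before:", n_before, " after:", n_after, "\n")
```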

Now, we create a 'heat map' using hex tiles. We use a heat map because other graphic visualizations become obscured through overlap. (For example, when many responses fall on the same point, a point with one response looks the same as a point with 200 responses.)

To view the heat graph illustrating the relationship between overall participation and perceived benefit of involvement with FOSS, enter:

`d + geom_hex()`

**Wait a second!** A blank screen will initially appear; it may take a few seconds for the graph to be generated.

Then you should see the following graph:

### Adjusting the Graph

#### 1. Adjust the size

The graph may look better with larger hex tiles and labels. We will now try varying sizes of hex tiles.

(For further information please look here: http://had.co.nz/ggplot2/stat_binhex.html)

We can create larger hex tiles with the commands:

`d + geom_hex(bins=5)`

or

`d + stat_binhex(bins = 5)`

Both commands produce the same graph.

These may be too large. We can try:

`d + stat_binhex(bins = 25)`

These are a bit too small.

This seems a good value:

`d + stat_binhex(bins = 10)`

This can also be generated with qplot:

`qplot(Q3a3bConsolidated, Q4.2, data=newlpp, geom="hex", xlim = c(0, 60), ylim = c(-2, 2), bins=10)`
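To see what the `bins` values tried above actually change, one can count how many hex tiles each setting draws. This is a sketch on simulated data (`demo` is made up, and `count_tiles` is a hypothetical helper); it relies on ggplot2's layer_data() to pull out the computed tiles.

```r
library(ggplot2)   # geom_hex() also needs the hexbin package installed

# Made-up stand-in for 'newlpp'.
set.seed(2)
demo <- data.frame(
  Q3a3bConsolidated = sample(0:60, 2000, replace = TRUE),
  Q4.2 = sample(-2:2, 2000, replace = TRUE)
)
d <- ggplot(demo, aes(Q3a3bConsolidated, Q4.2))

# Hypothetical helper: number of non-empty hex tiles for a given 'bins' value.
count_tiles <- function(bins) {
  nrow(layer_data(d + geom_hex(bins = bins)))
}

tiles <- sapply(c(5, 10, 25), count_tiles)
names(tiles) <- c("bins=5", "bins=10", "bins=25")
print(tiles)
```

A larger `bins` value splits the same area into more, smaller tiles, so each tile summarizes fewer responses.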

#### 2. Add labels

The variable names are a bit awkward. We may want to label the axes with something more human readable. (See also: http://www.statmethods.net/advgraphs/axes.html)

Let's label the X axis 'Level of Participation' and the Y axis 'Perception of Involvement in FOSS Community as Educationally Beneficial'.

First enter:

`d <-ggplot(newlpp, aes(Q3a3bConsolidated, Q4.2)) + geom_point() + xlab("Level of Participation") + ylab("Perception of\n Involvement in FOSS Community as\n Educationally Beneficial")`

(Be sure to close all your parentheses correctly.)

Then enter:

`d + stat_binhex(bins = 10)`

or

`d + geom_hex(bins = 10)`

Which produces the following graph:

The red hex tiles indicate where the greatest number of respondents fall; here they cluster where respondents perceive involvement with FOSS communities as educationally beneficial. Also, there appears to be a diagonal trend suggesting a slight positive correlation between participation and perceived educational benefit. (A line going up and to the right indicates a positive slope; a line going down and to the right indicates a negative slope.)
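A visual impression of a "slight correlation" can be backed up with a number. The sketch below computes a Pearson correlation with base R's cor() on simulated data; the data and the strength of the relationship are made up for illustration and say nothing about the actual survey results.

```r
# Made-up data with a weak positive relationship, standing in for 'newlpp'.
set.seed(3)
n <- 1000
participation <- sample(0:60, n, replace = TRUE)

# Benefit rises slightly with participation, plus noise, clamped to -2..2.
benefit <- round(0.02 * participation + rnorm(n, mean = 0.5, sd = 1))
benefit <- pmin(pmax(benefit, -2), 2)

# Pearson correlation: +1 is a perfect rising line, 0 is no linear trend.
r <- cor(participation, benefit, use = "complete.obs")
print(round(r, 3))
```

With the real data, the equivalent call would presumably be `cor(newlpp$Q3a3bConsolidated, newlpp$Q4.2, use = "complete.obs")`.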

#### 3. Add a line of best fit.

A line of best fit helps us to visualize the overall trend a bit better.

First we will want to determine the slope and the intercept of our line of best fit.

Notice that the dependent variable is listed first here:

`coef(lm(Q4.2 ~ Q3a3bConsolidated, data=newlpp))`

We obtain our intercept and slope:

`1.09016456 0.02313583`

To graph this line of best fit with the previous graph, enter:

`d + geom_hex(bins=10) + geom_abline(intercept = 1.09, slope = .023, colour = "red", size = 1)`

Which produces the following graph:

Later, we will look at central tendency which may be more informative.
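Rather than retyping rounded coefficients by hand, the fitted values can be passed straight to geom_abline(). This is a sketch under assumptions: `demo` is simulated data standing in for `newlpp`, and the names `fit`, `b0`, and `b1` are chosen here for illustration.

```r
library(ggplot2)   # geom_hex() also needs the hexbin package installed

# Made-up stand-in for 'newlpp' with a weak positive trend.
set.seed(4)
n <- 1000
demo <- data.frame(Q3a3bConsolidated = sample(0:60, n, replace = TRUE))
demo$Q4.2 <- pmin(pmax(round(0.02 * demo$Q3a3bConsolidated +
                             rnorm(n, mean = 0.5)), -2), 2)

# Fit once and keep the coefficients instead of retyping rounded values.
fit <- coef(lm(Q4.2 ~ Q3a3bConsolidated, data = demo))
b0 <- fit[["(Intercept)"]]
b1 <- fit[["Q3a3bConsolidated"]]

p <- ggplot(demo, aes(Q3a3bConsolidated, Q4.2)) +
  geom_hex(bins = 10) +
  geom_abline(intercept = b0, slope = b1, colour = "red")
print(p)
```

This keeps the line in sync with the data if the data set changes, at the cost of a slightly longer command.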

#### 4. Save the graph to a file.

We will want to save our graph to a file. In order to do this we:

1. Open an image file (as a device)

2. Run the graphing command

3. Close the device (to keep the file from being written to again).

We open the image file:

`png(filename="participation_perceived_benefit_involvement_abline.png" ,width=800, height=600)`

R can produce multiple image formats. PNG was chosen because it is lossless and widely supported.

To create the graph without specifying the size (which may look better), we enter:

`png(filename="participation_perceived_benefit_involvement_abline.png")`

Then enter the command to create an image:

`d + geom_hex(bins = 10) + geom_abline(intercept = 1.09, slope = .023, colour = "red", size = 1)`

Now we close the file:

`dev.off()`

This creates the following image:

You should be able to view the image in an image viewer of your choice.
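The three steps can also be put together in a script. One detail worth knowing: at the interactive prompt a ggplot graph is drawn automatically, but inside a script run with source(), a loop, or a function it needs an explicit print(). The sketch below uses simulated data and a temporary file name, both made up for illustration.

```r
library(ggplot2)   # geom_hex() also needs the hexbin package installed

# Made-up stand-in for 'newlpp'.
set.seed(5)
demo <- data.frame(
  Q3a3bConsolidated = sample(0:60, 500, replace = TRUE),
  Q4.2 = sample(-2:2, 500, replace = TRUE)
)
p <- ggplot(demo, aes(Q3a3bConsolidated, Q4.2)) + geom_hex(bins = 10)

out_file <- file.path(tempdir(), "demo_hex.png")   # hypothetical file name

png(filename = out_file, width = 800, height = 600)  # 1. open the device
print(p)                                             # 2. draw the graph
dev.off()                                            # 3. close the device

file.exists(out_file)
```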
