# Learning About Packages

There is a plethora of packages available for R. A good place to begin looking is the CRAN website.

## The Rseek toolbar search plugins

You may consider adding the Rseek search engine plugin to search for R packages. It is available here: http://www.rseek.org/

## The Hmisc Package

The Hmisc package is a more advanced statistics package containing a greater variety of functions. To learn more about the Hmisc package you can go to http://crantastic.org/packages/Hmisc.

### Installing the Hmisc Package

You can install the Hmisc package within R.

To do so you will enter:

`install.packages("Hmisc")`

There may be an error message that /usr/local/lib/R/site-library is not writable. In that case the package can be installed to a personal library in your home directory instead; if you are the only user on your machine, this will make little difference. If other users also need the package, you may want to run R as root (`sudo R`) in order to install packages system-wide.

You will be asked to choose a CRAN mirror. You may want to choose the location closest to you.
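If the "not writable" message appears, it can help to see where R is trying to install packages. The following is a minimal sketch using only base R functions; the mirror URL in the commented-out lines is one possible choice, not a requirement.

```r
# Library directories R searches, in order.
# install.packages() installs into the first one by default.
lib_paths <- .libPaths()
print(lib_paths)

# Check which directories the current user can write to.
# file.access() returns 0 when the requested access (mode 2 = write) is allowed.
writable <- file.access(lib_paths, mode = 2) == 0
print(data.frame(path = lib_paths, writable = writable))

# Location R uses for a personal library (the directory may not exist yet).
print(Sys.getenv("R_LIBS_USER"))

# To install Hmisc only when it is missing, naming a mirror explicitly,
# uncomment the following lines:
# if (!requireNamespace("Hmisc", quietly = TRUE)) {
#   install.packages("Hmisc", repos = "https://cloud.r-project.org")
# }
```

Naming the mirror with the 'repos' argument skips the prompt to choose a CRAN mirror.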

### Load the Hmisc Library

We will be using the 'rcorr' function from the Hmisc library. If we try to use the 'rcorr' function without loading the Hmisc library we will receive the following error message:

`Error: could not find function "rcorr"`

To load a library, type 'library' followed by the library name in parentheses. To load the Hmisc library, enter:

`library(Hmisc)`

### Use rcorr

Similar to the use of 'cor' above, we can use 'rcorr' to determine the correlation between two variables. For example, we may want to see how total frequency of participation (software + communication + miscellaneous, as seen in Q3a3bConsolidated) correlates with overall learning (Q6Consolidated).

We would enter:

`rcorr(Q3a3bConsolidated, Q6Consolidated)`

In the output, the correlation matrix shows 1.00 where x is compared with x and where y is compared with y, since each variable correlates perfectly with itself.

The correlation between x and y is .47 (it has been rounded), which is one of our strongest correlation values.

'n =' tells us the number of available responses; here it is 'n=4603' for the 4603 respondents who completed the survey.

It would also be possible to use:

`cor(Q3a3bConsolidated, Q6Consolidated)`
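To compare the two functions without the survey data, here is a minimal sketch on simulated data. The variables 'participation' and 'learning' are hypothetical stand-ins for Q3a3bConsolidated and Q6Consolidated, and the 'rcorr' part only runs if Hmisc is installed.

```r
set.seed(42)

# Simulated stand-ins for the two survey variables (not the real data).
n <- 200
participation <- rnorm(n)
learning <- 0.5 * participation + rnorm(n)

# Base R: a single correlation coefficient.
r_base <- cor(participation, learning)
print(round(r_base, 2))

# Hmisc: correlation matrix, number of observations, and p-values.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  res <- Hmisc::rcorr(participation, learning)
  print(res$r)  # correlations (1 on the diagonal)
  print(res$n)  # number of observations used
  print(res$P)  # p-values for each pair
}
```

Both functions report the same correlation here; 'rcorr' additionally reports n and the p-values. Note that 'cor' returns NA when either variable has missing values unless an option such as 'use = "complete.obs"' is given.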

## Finding the R² Value

Many statisticians and mathematicians consider the r² value (the r value multiplied by itself) to be a more important measure than the r value. The r² value is referred to as the coefficient of determination. To learn more about the coefficient of determination look here: http://en.wikipedia.org/wiki/Coefficient_of_determination

To view the coefficient of determination (the r squared value) for overall learning (Q6Consolidated) in relation to overall participation (Q3a3bConsolidated) we would use:

`summary(lm(Q6Consolidated ~ Q3a3bConsolidated))$r.squared`

which gives us a value of:

`[1] 0.2238651`
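For a regression with a single predictor, the r squared value reported by 'lm' is the same as the correlation multiplied by itself. A minimal sketch on simulated data (the variables 'x' and 'y' are made up for illustration):

```r
set.seed(7)

# Simulated data with a linear relationship plus noise.
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# r squared computed two ways.
r <- cor(x, y)
r2_lm <- summary(lm(y ~ x))$r.squared

print(c(from_cor = r^2, from_lm = r2_lm))

# The two values agree up to floating-point tolerance.
print(isTRUE(all.equal(r^2, r2_lm)))
```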

## Finding the adjusted R² Value

The adjusted r squared value takes the number of variables into account. A good explanation of this and other statistical terms is available here: http://www-personal.umich.edu/~sdcamp/up504/module+regression.html

To view the adjusted r squared value for overall learning (Q6Consolidated) in relation to overall participation (Q3a3bConsolidated) we would use:

`summary(lm(Q6Consolidated ~ Q3a3bConsolidated))$adj.r.squared`

which returns:

`[1] 0.2236964`

As we are only comparing two variables and had 4603 respondents, the r² value (0.2238651) is very close to the adjusted r² value (0.2236964).
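The adjustment can be checked by hand with the standard formula, adjusted r² = 1 - (1 - r²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below assumes all 4603 respondents were used in the model and that there is one predictor.

```r
# Values taken from the output above.
r2 <- 0.2238651  # r squared
n  <- 4603       # respondents (assumed to all be used by lm)
p  <- 1          # number of predictors

# Standard adjusted r squared formula.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 7))  # 0.2236964
```

With n this large, the factor (n - 1)/(n - p - 1) is almost exactly 1, which is why the two values barely differ.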

## Is this less random than a coin toss? Determining Significance

How do we know that the findings are not random, that they are 'significant'? (Significant in this context does not mean meaningful but rather non-random.) (If at this point you would like to debate the meaning and purpose of p-values and their interpretation within the context of research, feel free to send me your interpretation and analysis of p-values within the context of this study.) P-values are almost always listed by researchers to justify the 'significance' of their research. Be aware, though, that the importance and interpretation of p-values is not a simple given: http://en.wikipedia.org/wiki/P-value#Frequent_misunderstandings.

In summary, p-values < .05 indicate a sort of bare minimum of statistical significance (often for social sciences), those less than .01 are considered tolerable in the sciences, and values less than .001 and .0001 are considered highly significant. This has a lot to do with sample size; as our survey had 4603 respondents, the results reported here have a p-value < 2.2e-16. R prints '< 2.2e-16' for any p-value below its machine precision (it is not the smallest number R can store), so p is far below .001. This means that, if the null hypothesis were true, results at least this extreme would be expected less than .1% of the time, and at p<.001 the results are considered highly significant. (This does not correct for other problems, for example if the questions asked had little value...)
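To see how strongly sample size drives the p-value, the following sketch computes the p-value of a correlation from r and n using the usual t statistic, t = r·sqrt(n - 2)/sqrt(1 - r²). The helper 'p_from_r' is a hypothetical name introduced here for illustration; it is not part of R or Hmisc.

```r
# Two-sided p-value for a Pearson correlation r from n observations.
# (Hypothetical helper for illustration.)
p_from_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# A correlation like ours with 4603 respondents: far below 2.2e-16.
print(p_from_r(0.47, 4603))

# A very weak correlation is still 'significant' with 4603 respondents...
print(p_from_r(0.05, 4603))

# ...but not with only 100 respondents.
print(p_from_r(0.05, 100))

# The threshold R uses when printing '< 2.2e-16'.
print(.Machine$double.eps)
```

This is one reason a small p-value alone says little about how strong or meaningful a relationship is.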

### Using a t-test

To check whether our results could be due to chance, we can use a t-test. A t-test assesses whether the data are consistent with the null hypothesis. The null hypothesis is an assertion that there is no real effect, that is, that any pattern in the data arose by chance. See here: http://en.wikipedia.org/wiki/Null_hypothesis. When the null hypothesis is rejected, it indicates that it may be worth further examining the data. For more information please see: http://en.wikipedia.org/wiki/Student%27s_t-test#Independent_one-sample_t-test

For information within R you can use the help function. For information about the t-test, you can enter:

`help(t.test)`

The study consists of one sample group. There were no separate test groups, and no control group alongside a test group. We might want to test respondents' answers to question 4 part 1, which asked whether they felt that the use of FOSS is educationally beneficial.

We would enter:

`t.test(Q4.1)`

This gives us the p-value, which indicates that the results are unlikely to be due to chance:

`p-value < 2.2e-16`

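R prints '< 2.2e-16' for any p-value below its machine precision (it is not the smallest number R can store), so p is far below .001. (If the null hypothesis were true, results at least this extreme would be expected less than .1% of the time.) At p<.001 the results are considered highly significant.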

This test compares the sample mean against 0, the default null value used by 't.test'. It also tells us that the mean of x is 1.54. Possible values were between -2 and 2, so for a group with equal preference the expected value would have been 0. With a value of 1.54, the trend is for respondents to feel that the use of FOSS is educationally beneficial.
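The individual parts of the t-test output can also be extracted directly. The sketch below uses simulated responses on the same -2 to 2 scale; the vector 'responses' and its probabilities are invented for illustration and are not the survey data.

```r
set.seed(1)

# Simulated answers on the -2..2 scale, skewed toward agreement.
responses <- sample(-2:2, size = 500, replace = TRUE,
                    prob = c(0.02, 0.03, 0.05, 0.25, 0.65))

# One-sample t-test; the null hypothesis is a true mean of 0 (mu = 0 by default).
result <- t.test(responses)

print(result$estimate)  # sample mean
print(result$conf.int)  # 95% confidence interval for the mean
print(result$p.value)   # the actual p-value, even when below 2.2e-16
```

Printing 'result' itself shows the familiar summary, including 'p-value < 2.2e-16' when the value is below that threshold.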
