My First R Stuff
‘A basic R tutorial for psychologists’
My First R Stuff
- My first attempt to use R
- R is a calculator first of all
- Create Variables
- What we can do with Variables?
- Dataframes or Matrices
- Functions
- Logics
- Read data in R
- A practical example
- An applied example of indexing
- Visualising data in ggplot first.
- Group your data and prepare them for the analyses
- Do a 2 x 2 x 2 factorial ANOVA
- A real barplot with means and errorbars
- Do pairwise comparisons with a simple t test?
- Transform your matrix from long to wide and make it SPSS or other crap software ready.
- Transform matrix from wide to long
- Thanks a lot for listening/reading and hope this will be useful for you.
My first attempt to use R
R is a programming language for statistical computing, similar to other programming languages (like matlab). What are its advantages? Honestly the most obvious one for me is that I can go from a computer to the other without panicking about software licenses, drivers, spending two hours installing toolboxes. Moreover, there is a broad community out there (beyond psychologists, notably people that do not share much their resources) ready to answer more or less dumb questions and responses to them. I believe also is a powerful tool to teach for future jobs but also to have a better grasp on the “power” of data analysis. It forces to better understand the methods that we use.
R is a calculator first of all
[1] 4
[1] 45.80952
Guess you are thinking, very cool but this is what everyone teaches you in any course of programming and it is boring. And my answer is yes, is damn boring, I tried hard to get rid of these basics that sometimes are just confusing if you want to use R for your data but I can’t!
But let’s try to go on quickly
Create Variables
Data can be stored into variables, variables have arbitrary names (also funny ones). In R you can create variables using the “<-” also called assignment operator. You can also use “=” but since the “=” sign has a use in some other bits where the assignment operator does not work I would advice you to use the “<-”
Here an example:
What we can do with Variables?
Well first of all we can use them to calculate stuff for instance
[1] 20
[1] 26 30
[1] 660
Dataframes or Matrices
In R and for data analysis we will have to have notions of what vectors are but also of matrices. in R these are called dataframes and differently from the matlab matrixes they can have nice colnames and rownames. A Dataframe is a two-dimensional vector which can have names for rows and columns. How do you create a dataframe?
data.frame(x = seq(10,1), y = seq(1,10))
The code says: Make a dataframe with two column variables, x and y. X contains a sequence from 10 to 1 and a sequence from 1 to 10. We can assign this dataframe a name and recall it as much as we do with variables. Another crucial symbol for R base is the $ (cash matters, a lot) Anytime you have dataframes you have basically few variables stored into columns, you can call them by attaching to the name of the df a dollar (see below)
df <- data.frame(x = seq(10,1), y = seq(1,10))
#When you assign it you need to call it to see its result normally
df# A tibble: 10 x 2
x y
<int> <int>
1 10 1
2 9 2
3 8 3
4 7 4
5 6 5
6 5 6
7 4 7
8 3 8
9 2 9
10 1 10
[1] 10 9 8 7 6 5 4 3 2 1
[1] 1 2 3 4 5 6 7 8 9 10
Okay cool, but what we can do now with that?
I know you might are thinking, I could do this in excel in 5 minutes, but what’s the advantage? First of all here we can easily Index stuff, call elements of vector or a dataframe using the “[]” brackets. The first element indicates rows, the second columns
[1] "Capricciosa"
[1] 7
[1] 1 2 3 4 5 6 7 8 9 10
#You can also be cool and embed a vector in the indexes
df[c(1:5),] #I want only the rows from 1 to 5 of my df and of course both columns# A tibble: 5 x 2
x y
<int> <int>
1 10 1
2 9 2
3 8 3
4 7 4
5 6 5
Indexing will become crucial if you will ever start using for loops and all that fancier computer stuff but I will show you later how squared brackets are useful to call single elements or elements that correspond to certain features
Functions
Of course in R there are functions you can apply to your nice variables and numbers. Many of them are embedded into specific packages (all for free, no hidden costs(yet)) but just to show you quickly a few:
[1] 15
[1] 11
[1] 30
[1] 5.5
[1] 5.5
Logics
A logic is anything that returns a TRUE of FALSE. Is important that you know that logics are called this way because the word will come in any googled thing about R The most common logic operators, at least those that I know, are:
- “==” means equal to
- “&” means AND, returns TRUE when both conditions are met at the same time
- “|” means OR, returns TRUE when one condition or the other are met
- “>” greater than
- “<” smaller than
- “!=” not equal to
Logics will be very important for If statements.
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
Read data in R
The functions for reading in data in R depend on what kind of file you have. The most common options are .csv, .txt, and an Excel file. Let’s look at each one of those.
These functions have many arguments that help you defining some parameters of the text you are about to read.
read.table("~/myexperiments/myexperiment/mydata/data.txt", sep = "\t", header = TRUE)
With this command I am reading a text file in a specific path and with a specific name “data.txt”. I am telling R that this file uses tab separator “sep =” " and to read the first column as name of the variables of the data frame (set this to false if your columns do not have a name by default). The command above uses a function to read the file but reading isn’t enough if we mean to work with these data in R. We need to store it into a variable!
Rawdata <- read.table("~/myexperiments/myexperiment/mydata/data.txt", sep = "\t", header = TRUE)
If the file you are aiming to read is a .csv then the function “read.csv” will do the job.
A practical example
Datafile available to download: menu.csv
Note that this dataframe has been downloaded from Joey Stanley Website who has a site full of great tutorials.
Make sure you change the path below so that it matches the path where you downloaded the datafiles locally.
#If you are on windows instead of / you can use double \\
#or just / (copying the path and inverting their orientation)
#dataset taken from the fantastic website of Joey Stanley credits to him! (www.joeystanley.com)
menu <- read.csv("https://marco2gandolfo.github.io/rforpsychologists/menu.csv", sep = ",")Now I will introduce you some functions and few ways to look and slice/filter this nice dataset
#what is the structure of the dataset, this gives you few crucial informations about the levels of your #factors and the number of your variables
str(menu)'data.frame': 260 obs. of 6 variables:
$ Category: Factor w/ 9 levels "Beef & Pork",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Item : Factor w/ 260 levels "1% Low Fat Milk Jug",..: 76 77 228 229 230 245 12 11 14 13 ...
$ Oz : num 4.8 4.8 3.9 5.7 5.7 6.5 5.3 5.8 5.4 5.9 ...
$ Calories: int 300 250 370 450 400 430 460 520 410 470 ...
$ Fat : num 13 8 23 28 23 23 26 30 20 25 ...
$ Sugars : int 3 3 2 2 2 3 3 4 3 4 ...
# A tibble: 6 x 6
Category Item Oz Calories Fat Sugars
<fct> <fct> <dbl> <int> <dbl> <int>
1 Breakfast Egg McMuffin 4.8 300 13 3
2 Breakfast Egg White Delight 4.8 250 8 3
3 Breakfast Sausage McMuffin 3.9 370 23 2
4 Breakfast Sausage McMuffin with Egg 5.7 450 28 2
5 Breakfast Sausage McMuffin with Egg Whites 5.7 400 23 2
6 Breakfast Steak & Egg McMuffin 6.5 430 23 3
# A tibble: 6 x 6
Category Item Oz Calories Fat Sugars
<fct> <fct> <dbl> <int> <dbl> <int>
1 Smoothies & S… "McFlurry with M&M\x89\xdb\xa… 7.3 430 15 59
2 Smoothies & S… McFlurry with Oreo Cookies (S… 10.1 510 17 64
3 Smoothies & S… McFlurry with Oreo Cookies (M… 13.4 690 23 85
4 Smoothies & S… McFlurry with Oreo Cookies (S… 6.7 340 11 43
5 Smoothies & S… McFlurry with Reese's Peanut … 14.2 810 32 103
6 Smoothies & S… McFlurry with Reese's Peanut … 7.1 410 16 51
# A tibble: 10 x 6
Category Item Oz Calories Fat Sugars
<fct> <fct> <dbl> <int> <dbl> <int>
1 Breakfast Egg McMuffin 4.8 300 13 3
2 Breakfast Egg White Delight 4.8 250 8 3
3 Breakfast Sausage McMuffin 3.9 370 23 2
4 Breakfast Sausage McMuffin with Egg 5.7 450 28 2
5 Breakfast Sausage McMuffin with Egg Whites 5.7 400 23 2
6 Breakfast Steak & Egg McMuffin 6.5 430 23 3
7 Breakfast Bacon, Egg & Cheese Biscuit (Regu… 5.3 460 26 3
8 Breakfast Bacon, Egg & Cheese Biscuit (Larg… 5.8 520 30 4
9 Breakfast Bacon, Egg & Cheese Biscuit with … 5.4 410 20 3
10 Breakfast Bacon, Egg & Cheese Biscuit with … 5.9 470 25 4
An applied example of indexing
[1] "data.frame"
# A tibble: 10 x 4
Category Item Oz Calories
<fct> <fct> <dbl> <int>
1 Breakfast Egg McMuffin 4.8 300
2 Breakfast Egg White Delight 4.8 250
3 Breakfast Sausage McMuffin 3.9 370
4 Breakfast Sausage McMuffin with Egg 5.7 450
5 Breakfast Sausage McMuffin with Egg Whites 5.7 400
6 Breakfast Steak & Egg McMuffin 6.5 430
7 Breakfast Bacon, Egg & Cheese Biscuit (Regular Biscuit) 5.3 460
8 Breakfast Bacon, Egg & Cheese Biscuit (Large Biscuit) 5.8 520
9 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (R… 5.4 410
10 Breakfast Bacon, Egg & Cheese Biscuit with Egg Whites (L… 5.9 470
#I want to take the first 10 observations of all variables excluding the first three columns -c (see negative sign)
menu[1:10, -c(1,2,3)]# A tibble: 10 x 3
Calories Fat Sugars
<int> <dbl> <int>
1 300 13 3
2 250 8 3
3 370 23 2
4 450 28 2
5 400 23 2
6 430 23 3
7 460 26 3
8 520 30 4
9 410 20 3
10 470 25 4
#I want to take the first 30 observation of column 4 and 6. For this we need the syntax c() for #vectors
menu[1:10,c(4,6)]# A tibble: 10 x 2
Calories Sugars
<int> <int>
1 300 3
2 250 3
3 370 2
4 450 2
5 400 2
6 430 3
7 460 3
8 520 4
9 410 3
10 470 4
#once we know what we want to select from the main dataframe we can assign it to a variable
selmenu <- menu[1:30,c(1,3,4,6)]Very cool, but sometimes, especially with bigger dataframes, to slice your dataframe you don’t want to rely only on its position in the matrix. Could be quite confusing. Let’s explore some other ways to do that in base R and let’s start to have some more fun!
#Hi base R, show me the first 6 lines in which the variable fat is == 0 excluding column one to 3
head(menu[menu$Fat == 0,-c(1:3)])# A tibble: 6 x 3
Calories Fat Sugars
<int> <dbl> <int>
1 20 0 2
2 15 0 3
3 140 0 39
4 200 0 55
5 280 0 76
6 100 0 28
#show me those lines where fat is 0 and they are not beverages or coffees and teas and store them in a #variable
zerofat <- menu[menu$Fat == 0 & menu$Category != "Beverages" & menu$Category != "Coffee & Tea",]Congratulations! You arrived this far and you won an undressed side salad! :)
What have we learnt so far?
Some very basic R language and syntax that will help us to understand further
To read a file into R
To view it and to slice it or filter it.
Not bad but there is a long way to go still.
Process and visualise data with the Tidyverse
As said a particular power of R is to be able to choose from many open-source packages to do stuff. One of the most famous ensemble of packages for datascience is what is called the Tidyverse. This macro-package is not for analysis directly but for data wrangling data cleaning and visualisation.
Let’s install it! The sintax is always the same what changes is the name of the packages, remember that installing a package is not enough. To use it you need to load it as well. See commands below:
install.packages("tidyverse") #to install it
What is great about it (besides that is widely used and that it has a lot of support online) is that it is very coherent and likes to have the data always organised in the same way for all the functions that it uses. A tidy dataframe is simply a dataframe in which every observation is in a Row and every variable is in a column. In a psychology example is a dataframe in which in each row there is a trial and in each column all the independent and dependent variables for instance:
tidyme <- data.frame(Subj = rep(1:20, each = 250), condition = c(rep("Cong",each = 125), rep("Incong", each = 125)), rt = rep(rnorm(250), each = 20 ))
str(tidyme)'data.frame': 5000 obs. of 3 variables:
$ Subj : int 1 1 1 1 1 1 1 1 1 1 ...
$ condition: Factor w/ 2 levels "Cong","Incong": 1 1 1 1 1 1 1 1 1 1 ...
$ rt : num 0.565 0.565 0.565 0.565 0.565 ...
Subj condition rt
Min. : 1.00 Cong :2500 Min. :-2.480240
1st Qu.: 5.75 Incong:2500 1st Qu.:-0.675953
Median :10.50 Median :-0.005626
Mean :10.50 Mean : 0.041263
3rd Qu.:15.25 3rd Qu.: 0.736367
Max. :20.00 Max. : 2.688866
| Subj | condition | rt |
|---|---|---|
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
| 1 | Cong | 0.5654544 |
We will start with some very basic examples of data visualisation and some correlations and ttests on our fancy menu dataset. But, most importantly, we will learn how to wrap the data, create new columns and use all this tidyverse functions (from its dplyr package) that allow you to group the data and to average stuff. One thing at a time!!
Remember at this point of the tutorial you absolutely need to have the tidyverse ensemble of packages installed and loaded if you haven’t done it yet.
library(tidyverse)
Visualising data in ggplot first.
ggplot2 is a very complex and powerful tool for data visualisation. Even if it is quite hard to use it can give you instant satisfaction by doing pretty graphs with few lines of code. Let’s get all the reward that it gives for now. First of all its philosophy is relatively straightforward. the “ggplot()” function wants to know what is the dataframe from which we wanna make the graph and an “aes()” argument in which we define the x variable and the y (dependent measure usually) axes of the graph. The “+” sign is crucial for ggplot2 language and is honestly very handy. You can add elements to your graph by simply adding a plus sign. Each element is expressed in a function and what I usually do is to add to the “ggplot(df,aes(x,y))” what is the geometry, the type of graph that I want to do: “ggplot(df,aes(x,y)) + geom_point()” The function geom point will do a nice scatterplot. I am done with the talking, remember the structure is pretty much always like this one, the rest is about making it prettier with many many functions as it can be 100% flexible in each feature you want to modify.
# A tibble: 6 x 6
Category Item Oz Calories Fat Sugars
<fct> <fct> <dbl> <int> <dbl> <int>
1 Breakfast Egg McMuffin 4.8 300 13 3
2 Breakfast Egg White Delight 4.8 250 8 3
3 Breakfast Sausage McMuffin 3.9 370 23 2
4 Breakfast Sausage McMuffin with Egg 5.7 450 28 2
5 Breakfast Sausage McMuffin with Egg Whites 5.7 400 23 2
6 Breakfast Steak & Egg McMuffin 6.5 430 23 3
##Let's say I want to have a look at the linear relationship (I guess!) between calories and ##fat
calories_vs_fat <- ggplot(menu, aes(Calories,Fat)) + geom_point() #super basic scatterplot
#but to see it you have to call it
calories_vs_fat#now let's play a little bit to add elements(regression line for instance).Do it one at a time
caloriesPretty <- calories_vs_fat + #call the previous plot we did and start adding stuff
geom_smooth(method = lm, se = TRUE) + # add line
ggtitle("The higher the calories the more the fat") + #add a title
theme_classic(base_size = 18) + #add a nicer theme
coord_cartesian(ylim=c(0,60)) + #let me see only values from 0 to 60 on y
coord_cartesian(xlim = c(0,1200)) + #let me see only values until 1200 kcal
theme(legend.title = element_blank()) #remove legend title
caloriesPretty #now let's check the pearson's R with the cor function. This takes argument x(Calories), #y(Fat) and method = "pearson" (in this case)
cor(menu$Calories, menu$Fat, method = "pearson") #return only the R value! pretty high![1] 0.9044092
Pearson's product-moment correlation
data: menu$Calories and menu$Fat
t = 34.048, df = 258, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8795250 0.9243604
sample estimates:
cor
0.9044092
Nice, now let’s try other geoms other types of graphs that might be useful to you.
“geom_histogram” Histogram
“geom_point” Scatterplot
“geom_boxplot” Boxplot
“geom_line” LinePlot
“geom_bar” A Barplot
#boxplot
ggplot(menu, aes(Category,Calories)) + geom_boxplot() +
theme_classic(base_size = 12, base_family = "Times")#line plot, not sure this is super appropriate but to have an idea of the graph type.
ggplot(menu, aes(Fat, Calories, color = Category )) +
geom_line() +
coord_cartesian(xlim=c(0,30))Group your data and prepare them for the analyses
Now is time to play with realistic data
We want to start doing more serious stuff now. And for that we will need “real” data What we will do?
Average a matrix by participant by condition to make it ready for an ANOVA
Compute new variables from the existing ones
Filter Participants or certain observations out
Perform an ANOVA with the “ez” package
We will take FAKE (randomly generated) data of an imaginary experiment which represent a nice mixed anova design.
We compare two different image categorisation experiments (1 with face pictures the other one with objects). Different participants participated to the two experiments.
In each experiments participants were stimulated either to a FaceArea or a ObjectArea Moreover in both experiments the image to judge was either congruent or incongruent with another image in the background.
So this is a simple 2 (face/object experiment(Between)) x 2 (TMS site, facearea/objectarea) x 2 (congruent/incongruent irrelevant image)
What do the experimenters expect expect? In the face experiment effects on RT (and maybe also on the congruency) when stimulating the face area (site of interest) but not for the object area (control site). In the Object experiment the opposite is effected , some effect on RT for the object area but not the face area.
To show you how to filter data we will exclude 4 people to exclude over 46, and we will have thus 21 people per experiment which went through two stimulation sites and 2 rt conditions (congruent, incongruent).
It smells like the experimenters are hypothesising for at least an interaction Experiment x Site where there are some specific effects on Reaction times in each task depending on which site has received the stimulation. Also we might expect these effects to be modulated by the congruency between the relevant image judged and the image in the background.
Remember that these data are not real so what we hope or hypothesised will probably not appear!
Link to the dataframe to download randomdata
Remember to change the path if you have downloaded the file locally.
data <- read.csv("https://marco2gandolfo.github.io/rforpsychologists/randomdata.csv", sep = ",", header = TRUE)
#quick inspection to check if things are right
head(data, 10) X SubjID Site Task Congruency RT Acc
1 11247 p64 ffa object cong 906.8283 0
2 9908 p59 lo object incong 737.1026 0
3 13723 p72 lo object cong 634.5864 0
4 4440 P16 ffa face incong 528.5841 1
5 4874 P17 lo face cong 1011.2421 0
6 12645 p68 lo object cong 1124.3055 0
7 5755 P20 ffa face cong 882.6555 1
8 4104 P15 ffa face cong 731.4093 0
9 13762 p73 ffa object incong 918.9980 1
10 13461 p72 ffa object cong 989.9528 0
Observations: 14,352
Variables: 7
$ X <int> 11247, 9908, 13723, 4440, 4874, 12645, 5755, 4104, 13…
$ SubjID <fct> p64, p59, p72, P16, P17, p68, P20, P15, p73, p72, P09…
$ Site <fct> ffa, lo, lo, ffa, lo, lo, ffa, ffa, ffa, ffa, lo, lo,…
$ Task <fct> object, object, object, face, face, object, face, fac…
$ Congruency <fct> cong, incong, cong, incong, cong, cong, cong, cong, i…
$ RT <dbl> 906.8283, 737.1026, 634.5864, 528.5841, 1011.2421, 11…
$ Acc <int> 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
#Check with xtabs how many trials by subject by site participants went through
#Ideally this is all matched
head(xtabs(~data$SubjID + data$Site)) data$Site
data$SubjID ffa lo
P01 156 156
P02 156 156
P03 156 156
P04 156 156
P05 156 156
P06 156 156
Two simple basic things already bother me. The first is that we have a column x whic we do not need, the second one is that our experiment between factor is called “Task” and, for clarity, we will rename it.
From now on I will use some dplyr functions
“%>%” Not directly a dplyr function also known as PIPE (c’est ne pas un pipe) this function works like the plus in ggplot 2. Allows you to keep going with the operations on your data without the need of useless code typing. You will love it. You can type it with a shortcut in R studio “cmd + shift + m”
“select” regards COLUMNS and you can pick those that you need
“filter” regards ROWS and is useful to exclude rows (conditions of a variable or participants)
“mutate” computes a new column based on operations on other columns
“rename” renames a specific column, equivalent to base r colnames(data$Task) <- “Experiment”
“group_by” genius fantastic function to wrap up your dataframe, for instance by participant by condition and make it ready to compute things on each member of the group. So is a pre-requisite to be able to collapse by participant by condition
“summarise” goes with the group_by and once a sumple is grouped you are able to summarise those groups using any function, as mean by participant by condition for RT, mean accuracy etc..
Let’s start one by one and will be simple! Remember, I said that I am annoyed by a useless X first column and the name of the between factor which I wannt to be called Experiment
dataclean <- data %>%
select(-X) %>% #unselect the column X mind about the - sign before
rename(Experiment = Task) %>% #rename as Experiment the variable Task
filter(!SubjID %in% c("P19","P01","p58","p63"))Let’s take a short break before averaging for the analyses. Select and Rename are pretty clear. I created a new variable called dataclean in which I removed the X dirty useless variable and renamed as I wanted the Task variable into Experiment. Then I filtered and the syntax might be slightly more obscure. The “%in%” sign means “Include”. The Exclamation mark means “!” NO!!!! So I am telling the function to keep All the Rows which Have a subject ID value that DO NOT contains “P19”, “P01”, “p58”, “p63”.
Now I think we are ready to generate a matrix in which we have mean Accurate RT mean accuracy by participant by condition. We will also use the mutate function to create a new column which in cognitive psychology is defined as “Efficiency”. ReactionTimes/MeanAccuracy. So we give a cost in RT to those people that have been really accurate and a little discount to those that have been less accurate but very fast.
Let’s do IT!
AveMat <- dataclean %>%
group_by(SubjID, Experiment, Site, Congruency) %>% #conditions to average for..
summarise(meanRT = mean(RT[Acc==1]), meanAcc = mean(Acc)) %>% #apply function to ave
mutate(Efficiency = meanRT/meanAcc)Done!
We grouped by the conditions we wanted to do operations independently. In Each participant, in each experiment in each site in each RT condition we summarised to apply the mean function on RT (only accurate Trials) and on accuracy. After that we made a new column that computes the efficiency by dividing these new variables. I explicitly wanted to include mutate in this flow to show you its power. We can compute new variables based on columns we have just created and not yet seen.
Do a 2 x 2 x 2 factorial ANOVA
For the ANOVA I will use the commonly and nice used package EZ. To make more friendly its results to look at I will use the package psychReport.
Install them please
install.packages("ez")
install.packages("psychReport")
Now let’s load them
The ez package has many functions and they all pretty much have the same structure. The ez functions want as argument:
- the dataframe you want to do the ANOVA from (remember this dataframe must be always in the long, not in the wide format, as our AveMat is)
- the wid or case identifier, a unique value per case or Subject (in our field is normally the subject variable)
- within factor
- between factor if any.
These are only the main arguments you can give to ezANOVA but there are more (where you specify, if needed, covariates, the type of sum of square). If you want to explore this you can refer to the help of ez which are pretty well done I believe. To open the ezAnova help or any help for any function please type:
?ezANOVA
Another handy function I use from this package is ezStats. This returns a matrix with mean and standard deviation of each condition and it is useful to see what is the direction of our effects.
Let’s start our analysis by studying the Efficiency variable. This will be our dependent variable, our wid argument will be the SubjID, our within arguments the Site and Congruency (Within factors) and our between factor will be Experiment.
I will also try in the code to use the psychReport package to show you a pretty table of your results. I will use the nice aovTable function that wants the name of the variable where you store the model as an argument. Remember that to make this your ez model must have the argument “return_aov” and “detailed” equal to TRUE
#store my anova into a variable that I call EFF_Model
EFF_Model <- ezANOVA(AveMat, dv =.(Efficiency), wid = .(SubjID), within = .(Site,Congruency), between = .(Experiment), return_aov = TRUE, detailed = TRUE)
#use aovetable function to make a nice table stored in EFF_Modelpretty variable
EFF_Modelpretty <- aovTable(EFF_Model, numDigits = 4, dispAovTable = TRUE)════════════════════════ ANOVA:EFF_Model ═══════════════════════
Effect DFn DFd F p p<.05 pes
Experiment 1 40 0.0136 0.9079 .91 0.0003
Site 1 40 0.5397 0.4669 .47 0.0133
Congruency 1 40 11.4802 0.0016 ** 0.2230
Experiment:Site 1 40 0.1066 0.7457 .75 0.0027
Experiment:Congruency 1 40 0.0039 0.9505 .95 0.0001
Site:Congruency 1 40 0.4903 0.4878 .49 0.0121
Experiment:Site:Congruency 1 40 0.0330 0.8567 .86 0.0008
────────────────────────────────────────────────────────────────
#ezStats to show the means and sd of each condition
ezStats(AveMat, dv =.(Efficiency), wid = .(SubjID), within = .(Site,Congruency), between = .(Experiment))# A tibble: 8 x 7
Experiment Site Congruency N Mean SD FLSD
<fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 face ffa cong 21 1682. 131. 178.
2 face ffa incong 21 1890. 375. 178.
3 face lo cong 21 1704. 149. 178.
4 face lo incong 21 1834. 355. 178.
5 object ffa cong 21 1700. 178. 178.
6 object ffa incong 21 1886. 364. 178.
7 object lo cong 21 1679. 227. 178.
8 object lo incong 21 1820. 481. 178.
Nice eh!
A real barplot with means and errorbars
Now we want to do a proper plot and this time we will tell ggplot to do some statistics on the matrix that we have to make a nice barplot that represent the mean and error bars that represent the standard error from the mean. Stat_summary wants to know which geometry the stats will return, which function and on which axis this is applied. Also we add position “dodge” (relatice to the distance of the bars) and color “black” (to color the border of the bars in black) arguments to make it prettier. Note the labs argument useful to give a name to the labels.
Effbar <- ggplot(AveMat, aes(Site, Efficiency, fill = Congruency )) +
stat_summary(geom = "bar", fun.y = mean, position = "dodge", color = "black") +
stat_summary(geom = "errorbar", fun.data = mean_se, position = position_dodge(.9),width = 0.2) +
labs(y = "Efficiency ms/Acc", x= "Stimulation site")
Effbar Nice Plot, but we are not quite there, we have error bars we have colors we have cool stuff but not yet the experiment factor in. To show an additional factor I like to use the argument “facet_wrap(~factor)”. Let’s try to add it together with a nice theme to make the graph look prettier. Remember, now that our graph is stored we can straight add a plus to add elements to our graph, very handy.
Effbar + facet_grid(~Experiment) +
theme_classic(base_size = 18, base_family = "Times") +
scale_fill_grey() +
coord_cartesian(ylim = c(1250,2000))I think this is a pretty acceptable paper friendly graph. There are many other personalisations and themes to explore. I created sort of my own theme for graphs and has been a pain but you can fully personalise it. Another good way to try stuff is to install packages that have expanded the amount of themes you could find. One of these is ggtheme.
Do pairwise comparisons with a simple t test?
#ttest on efficiency as a function of congruency effect in the ffa site in the face experiment
Effttest <- t.test(Efficiency ~ Congruency, data= AveMat, paired = TRUE, subset = Site =="ffa" & Experiment == "face")
#Call it
Effttest
Paired t-test
data: Efficiency by Congruency
t = -2.3259, df = 20, p-value = 0.03065
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-394.73237 -21.46785
sample estimates:
mean of the differences
-208.1001
# A tibble: 1 x 8
estimate statistic p.value parameter conf.low conf.high method
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 -208. -2.33 0.0307 20 -395. -21.5 Paire…
# … with 1 more variable: alternative <chr>
#ttest of congruency as afunction of congruency only in the object experiment
#note that here I used the "subset" argument to filter the data
Effttest <- t.test(Efficiency ~ Congruency, data= AveMat, paired = TRUE, subset = Experiment == "object")Now friends we are at the end of this tutorial and what I want to do is a thing that I believe is useful.
Transform your matrix from long to wide and make it SPSS or other crap software ready.
I think this is useful, you can export a matrix and make it user friendly for other software. For some obscure reason someone does not want you to analyse your data in R? Do you want to double check in another software if you have done things right?
While I think this is useful I strongly encourage you to abandon licensed software for analyses. Especially SPSS. If you feel you want to have a friendlier interface than R please use softwares like jamovi or jasp. I still believe that learning this eventually will give you more chances in the angry job market or anyway to expand your knowledge about the analyses that you run. They are open source, new and well done. Here we will make matrices that would be happily accepted by these softwares.
We will try to use the unite function from tidyr (part of the tidyverse) to join in one column all the condition names into a column (for instance “ffa_congruent”). After that we will use the spread function to spread the efficiency values by condition. Another package that does that beautifully is the reshape2 package but I believe is best to keep consistency among packages.
After that we will use the “write.csv” function to write a csv and store it in our working directory. write.csv gets two argumens here
write.csv(dataframe, "path we want to write the file to")
wide_eff <- AveMat %>% #take the averaged matrix
select(-meanRT, -meanAcc) %>% #remove for now the meanrt and mean Acc
unite(Condition, Site, Congruency) %>% #unite all the within factors in a column called "condition"
spread(Condition, Efficiency)
head(wide_eff, 10) #check if it looks right# A tibble: 10 x 6
# Groups: SubjID, Experiment [10]
SubjID Experiment ffa_cong ffa_incong lo_cong lo_incong
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 P02 face 1591. 1723. 1799. 2701.
2 P03 face 1834. 1951. 1600. 1915.
3 P04 face 1988. 1591. 1837. 2300.
4 P05 face 1836. 1794. 1649. 1669.
5 P06 face 1649. 1806. 1772. 1525.
6 P07 face 1729. 1400. 1615. 1385.
7 P08 face 1867. 1923. 1777. 2390.
8 P09 face 1570. 1968. 1590. 2099.
9 P10 face 1557. 2176. 1666. 1729.
10 P12 face 1529. 1404. 1629. 1181.
Transform matrix from wide to long
Here we use the “gather” function to go back into a long format matrix. I give it 3 arguments:
- The name of the collapsed column that indicate the conditions “here” conditions
- The name of my dependent measure that has been gathered.
- The name of the columns, preceded by a minus, which I do not want to gather.
After that I use the handy “separate” function where I give it the name of the column which I want to separate, the names (as characters) of the new columns I want to create and the name of the separator.
long_eff <- wide_eff %>%
gather(condition, Efficiency, -SubjID, -Experiment) %>%
separate(condition, c("Site", "Congruency"), sep = "_")
head(long_eff)# A tibble: 6 x 5
SubjID Experiment Site Congruency Efficiency
<fct> <fct> <chr> <chr> <dbl>
1 P02 face ffa cong 1591.
2 P03 face ffa cong 1834.
3 P04 face ffa cong 1988.
4 P05 face ffa cong 1836.
5 P06 face ffa cong 1649.
6 P07 face ffa cong 1729.
Thanks a lot for listening/reading and hope this will be useful for you.
My aim is to make you become like this guy in the image below chasing people to talk about how cool “R” is.