Project
R Project
Throughout this book, I will use the ‘R Project’ as the core workspace unit. It serves as a centralized location where all relevant materials, such as R scripts (.R
) and data files, are consolidated. There are various ways to organize your project, but I prefer creating a single ‘R Project’ for a set of scripts and data that contribute to a specific publication (see example here). To set up an ‘R Project,’ you’ll need to have both RStudio and the base R software installed. While R can be used independently, I highly recommend using it in conjunction with RStudio due to the latter’s numerous features that facilitate data analysis. You can download R and RStudio from the following websites:
- R (you can choose any CRAN mirror for downloading)
- RStudio
Upon launching RStudio, you will encounter the interface depicted in Figure 10.2. Initially, the interface comprises three primary panels: the Console
, Environment
, and Files
. The Console
is where you write code and execute calculations, data manipulation, and analysis. The Environment
panel lists the saved objects, while the Files
panel displays files in a designated location on your computer.
After pasting the script into the console, you will notice that the variable x
appears in the environment panel.
x <- c(1, 2)
x
is an object that stores information. In this particular case, we have stored a sequence of 1
and 2
in the object x
. Once you have stored information in x
, you can access it by simply typing x
.
x
## [1] 1 2
Excellent! While it is possible to work with your data using this approach, it is important to highlight that it is generally regarded as a poor practice. As your project evolves, you will accumulate a substantial amount of material, such as writing more than 2000 lines of code for a single project. Consequently, it becomes essential to implement effective code management strategies. How do you presently handle code management?
Script Editor
It is highly recommended to manage your scripts in the Editor
instead. The Editor
is where you draft and fine-tune your code before executing it in the Console
. To create space for the Editor
, press Ctrl + Shift + N
. A new panel will appear in the top left corner. Let’s type the following script in the Editor
. Please note that the key combination Ctrl + Shift + N
assumes a Windows or Linux operating system. If you’re using a Mac, you can use Command + Shift + N
instead.
y <- c(3, 4)
Then, hit Ctr + S
to save the Editor
file. RStudio will prompt you to enter the file name of the Editor
7.
File Name
It is also crucial to establish consistent naming rules for your files. As your project progresses, the number of files within each sub-directory may increase significantly. Without clear and consistent naming rules for your files, navigating through the project can become challenging, not only for yourself but also for others involved. To alleviate this issue, consider implementing the following recommendations for file naming:
-
NO SPACE. Use underscore.
- Do:
script_week1.R
- Don’t:
script week1.R
- Do:
-
NO UPPERCASE. Use lowercase for file names.
- Do:
script_week1.R
- Don’t:
Script_week1.R
- Do:
-
BE CONSISTENT. Apply consistent naming rules within a project.
- Do: R scripts for figures always start with a common prefix, e.g.,
figure_XXX.R
figure_YYY.R
(XXX
andYYY
specifies further details). - Don’t: R scripts for figures start with random text, e.g.,
XXX_fig.R
,Figure_Y2.R
,plotB.R
.
- Do: R scripts for figures always start with a common prefix, e.g.,
Structure Your Project
If you fail to save or haphazardly store your code files on your computer, the risk of losing essential items becomes inevitable sooner or later. To mitigate this risk, I highly recommend gathering all the relevant materials within a single R Project
. To create a new R Project
, follow the procedure outlined below:
- Go to
File > New Project
on the top menu - Select
New Directory
- Select
New Project
A new window will appear, prompting you to name a directory and select a location on your computer. To choose a location for the directory, click on the ‘Browse’ button. When organizing your project directories on your computer, I highly recommend creating a dedicated space. For instance, on my computer, I have a folder named github/
where I store all my R Project
directories.
The internal structure of an R Project
is crucial for effective navigation and ensures clarity for both yourself and others when it is published. An R Project
typically consists of various file types, such as .R
, .csv
, .rds
, .Rmd
, and others. Without an organized arrangement of these files, there is a high probability of encountering significant coding errors. Therefore, I place great importance on maintaining a well-structured project. In Table 10.1, I present my recommended subdirectory structure.
Name | Content |
---|---|
README.md |
Markdown file explaining contents in the R Project . Can be derived from README.Rmd . |
code/ |
Sub-directory for R scripts (.R ). |
data_raw/ |
Sub-directory for raw data before data manipulation (.csv or other formats). Files in this sub-directory MUST NOT be modified unless there are changes to raw data entries. |
data_fmt/ |
Sub-directory for formatted data (.csv , .rds , or other formats). |
output/ |
Sub-directory for result outputs (.csv , .rds , or other formats). This may include statistical estimates from linear regression models etc. |
rmd/ |
(Optional) Sub-directory for Rmarkdown files (.Rmd ). Rmarkdown allows seamless integration of R scripts and text. |
Robust coding
While it is not mandatory, I highly recommend using RStudio in conjunction with Git and GitHub. Coding is inherently prone to errors, and even the most skilled programmers make mistakes—without exception. However, the crucial difference between beginner and advanced programmers lies in their ability to develop robust coding practices accompanied by a self-error-detection system. Git plays a vital role in this process. Throughout this book, I will occasionally delve into the importance of Git and its usage.