10 Steps to Build an R Package

Packages in R have made developing analytical solutions much easier by providing ready-to-use, pre-built functionality, saving analysts a great deal of time and effort. More importantly, packages let people apply far broader and deeper knowledge and skills with less know-how of their own. Many people today can work with linear regression, clustering, decision trees and time series models without having fully mastered the mathematics that works in the background.

The thought of building an R package can feel daunting and raise lots of questions, but it is in fact fairly simple to create your own packages and share them with the open source community. By the time you finish this article, you will have created your first package. Congratulations!

Let's take the dplyr package as an example and see how it is listed in the CRAN repository. This is the standard structure in which R packages are documented, and every package must be developed in the same structure if it is to be listed on CRAN. Alternatively, you can publish your package on GitHub, which imposes far fewer formal requirements.

[Screenshot: the dplyr package listing on CRAN]

10 Steps to Build an R Package

1. The Idea behind the package

The foundation of every package is its purpose: why the library is being built and what functionality it will provide. This is the toughest, yet most important, part of the entire process. One way to evaluate the idea is to ask four fundamental questions:

  1. What value will my package add to the community?
  2. What would be the functionalities of my package?
  3. Are there any packages with similar functionalities to the one being thought of?
  4. Will my package offer enhanced functionality compared to the existing ones?

If, after the fourth question, you still feel positive about the idea, then it's time to get hands-on and roll the package out to the community.

2. Build the functions

Code the functions that realize the idea and provide all the planned functionality. Apart from the usual coding practices, keep the following principles in mind while developing the code:

Hard rules (ignore these and the function will fail)

  1. Account for variations in the input data. We might develop the code against a sample dataset at hand, but we need to ensure that the code can handle other data formats and dimensions
  2. Consider the possibility of poor-quality input data (missing values, blank rows, etc.)
  3. Don’t use deprecated functions in the code

Soft rules (good to have)

  1. Use proper naming conventions, since the code is no longer private and will be available in open source repositories for people to use and modify
  2. Minimize the number of operations so that the code runs faster
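A minimal sketch of these rules follows. It assumes a hypothetical internal helper named clean_input() (not part of the original package) that a package function would call before computing anything. It accepts either a bare vector or a data frame, checks that the requested columns exist and are numeric, and drops blank rows with a single vectorised rowSums() call rather than a loop.

```r
# Hypothetical internal helper (not exported): validates and tidies the input
# before any statistics are computed.
clean_input <- function(data, cols = NULL) {
  # Variation in input format: accept a plain vector as well as a data frame
  if (is.atomic(data) && is.null(dim(data))) {
    data <- data.frame(value = data)
    cols <- "value"
  }
  if (!is.data.frame(data)) {
    stop("`data` must be a data frame or an atomic vector", call. = FALSE)
  }
  if (is.null(cols)) cols <- names(data)

  # Variation in dimensions: fail early with a clear message if columns are absent
  absent <- setdiff(cols, names(data))
  if (length(absent) > 0) {
    stop("Columns not found: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  data <- data[, cols, drop = FALSE]

  # Variation in data types: only numeric columns are accepted
  is_num <- vapply(data, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric columns: ", paste(cols[!is_num], collapse = ", "),
         call. = FALSE)
  }

  # Poor-quality input: drop blank rows (every selected value missing),
  # using one vectorised rowSums() call instead of a row-by-row loop
  blank <- rowSums(!is.na(data)) == 0
  data[!blank, , drop = FALSE]
}

df <- data.frame(a = c(1, NA, 3), b = c(4, NA, NA))
clean_input(df)   # keeps rows 1 and 3; row 2 is entirely missing
```

Failing early with a specific message is a deliberate choice here: users of a public package get a clear explanation instead of an obscure error from deep inside the computation.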

3. Rigorous Testing

Developing the code is often much faster than signing it off through the testing process. Testing should cover:

  1. Evaluating the function with all possible variations of the input data (input data dimensions, missing/additional columns, different data types, etc.)
  2. Evaluating how the function reacts to exceptions (missing or additional arguments, a different data type than expected, etc.)
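Both kinds of checks can be written as small automated tests. The sketch below uses only base R (stopifnot() and tryCatch()) so it runs without extra packages; in a real package such tests usually live under tests/testthat and use the testthat package instead. The function safe_mean() is a hypothetical stand-in for one of your package's functions.

```r
# Hypothetical function under test, standing in for one of your package's functions
safe_mean <- function(x, na.rm = TRUE) {
  if (!is.numeric(x)) stop("`x` must be numeric", call. = FALSE)
  if (length(x) == 0) stop("`x` must not be empty", call. = FALSE)
  mean(x, na.rm = na.rm)
}

# Helper: TRUE if evaluating `expr` raises an error
raises_error <- function(expr) {
  inherits(tryCatch(expr, error = function(e) e), "error")
}

# 1. Variations in the input data
stopifnot(
  safe_mean(c(1, 2, 3)) == 2,
  safe_mean(c(1, NA, 3)) == 2,       # missing values are dropped
  safe_mean(1:4) == 2.5,             # integer input is accepted
  safe_mean(matrix(1:4, 2)) == 2.5   # so is a numeric matrix
)

# 2. Reaction to exceptions
stopifnot(
  raises_error(safe_mean("a")),                 # wrong data type
  raises_error(safe_mean(numeric(0))),          # empty input
  raises_error(safe_mean()),                    # missing argument
  raises_error(safe_mean(1, TRUE, "extra"))     # unexpected extra argument
)
```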

4. Create a project directory

Follow the steps below to create a package directory. The setup generates all the important folders and files, in a standard format, required for package development. The files then need to be customized for the package being developed.

[Screenshot] Step 1: Create a new directory
[Screenshot] Step 2: R Package
[Screenshot] Step 3: Select the package name and directory path
[Screenshot] Package project directory setup is complete

The folders listed in the bottom-right pane form the standard directory structure of an R package and should not be altered. Next, we will edit some of the files to customize them to our package's requirements.
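If you prefer to work from the console, usethis::create_package() sets up a comparable project. To show what the essential pieces are, the base-R sketch below creates only a bare skeleton: R/ and man/ folders, a DESCRIPTION file and a NAMESPACE file. The function name make_minimal_package(), the package name and all DESCRIPTION values are placeholders; RStudio's template also adds files such as hello.R and an .Rproj file, which this sketch omits.

```r
# Hypothetical helper: writes a bare-bones package skeleton under `path`.
make_minimal_package <- function(name, path = tempdir()) {
  root <- file.path(path, name)
  # R/ holds the source code, man/ holds the generated .Rd help files
  dir.create(file.path(root, "R"), recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(root, "man"), showWarnings = FALSE)

  # Placeholder metadata; edit these fields as described in step 5
  writeLines(c(
    paste0("Package: ", name),
    "Title: What the Package Does",
    "Version: 0.1.0",
    'Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))',
    "Description: One paragraph describing what the package does.",
    "License: GPL-2",
    "Encoding: UTF-8"
  ), file.path(root, "DESCRIPTION"))

  # Export every object whose name starts with a letter. roxygen2 only
  # rewrites a NAMESPACE it generated itself, so delete this placeholder
  # before the first devtools::document() run.
  writeLines('exportPattern("^[[:alpha:]]+")', file.path(root, "NAMESPACE"))

  invisible(root)
}

root <- make_minimal_package("descstats")
list.files(root, recursive = TRUE, include.dirs = TRUE)
```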

Install required libraries:

1. roxygen2: R requires documentation for every exported object in a package, describing how it works. Without this object documentation, R CMD check flags the undocumented objects and the package does not meet the requirements for CRAN.

The documentation follows a defined structure and is accessed using ? or help(). The underlying .Rd files are written in a custom syntax, loosely based on LaTeX. Instead of writing them by hand, roxygen2 provides a framework that generates the documentation in the required structure automatically. The documentation is generated from the roxygen tags in the roxygen comment block placed directly above each function in the R source code. The rendered .Rd files are stored in the man/ folder of the package directory.

[http://r-pkgs.had.co.nz/man.html]

2. devtools: The package provides a set of functions that make package development much easier and faster. It includes functions to check whether the package meets the required standards, render roxygen tags into .Rd files, and verify that all the required files are present.

[https://www.rstudio.com/products/rpackages/devtools/]

3. Rtools: Provides the set of tools required to build R packages on Microsoft Windows. The Rtools version must be compatible with the installed R version.

[https://cran.r-project.org/bin/windows/Rtools/].

To install Rtools:

Option 1: Install Rtools using the command below (requires the installr package):

installr::install.Rtools(choose_version = FALSE, check = TRUE, GUI = TRUE, page_with_download_url = "https://cran.r-project.org/bin/windows/Rtools/")

The command opens a pop-up dialog asking for the required version. Make sure the version matching your R version is selected.

Option 2: Download the required version of Rtools from the link above and run the .exe installer.

With either option:

1. An installation setup dialog opens. Continue with all the default selections, make sure "Add Rtools to the system path" is checked, and finish the installation.
2. Once the installation is complete, navigate to the Rtools folder, copy the path to mingw_64\bin or mingw_32\bin (depending on whether your R is 64-bit or 32-bit) and add that location to PATH in the environment variables.
3. Check that Rtools is compatible and ready to run. TRUE indicates that Rtools has been successfully installed.

[Screenshot: Rtools check command returning TRUE]

4. Installing a version that is not compatible with your R version will throw an error when building and running the package. The command below checks whether the correct version exists.

installr::install.Rtools(choose_version = FALSE, check = TRUE, GUI = TRUE, page_with_download_url = "https://cran.r-project.org/bin/windows/Rtools/")

[Screenshot: Rtools version check output]
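The libraries can also be installed and checked from the R console. The sketch below installs roxygen2 and devtools only if they are missing (and only in an interactive session), then, on Windows, asks pkgbuild (a package devtools depends on) whether the build tools are usable. The helper missing_pkgs() is hypothetical; pkgbuild::has_build_tools() is an existing function, but treat its result as a quick check rather than a substitute for the steps above.

```r
# Packages used throughout this walkthrough
needed <- c("roxygen2", "devtools")

# Hypothetical helper: returns the packages in `pkgs` that cannot be loaded
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_pkgs(needed)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}

# Windows only: TRUE means R can find a usable Rtools installation
if (.Platform$OS.type == "windows" &&
    requireNamespace("pkgbuild", quietly = TRUE)) {
  print(pkgbuild::has_build_tools(debug = TRUE))
}
```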

5. Edit the DESCRIPTION file

The DESCRIPTION file holds the package metadata (title, description, version, etc.), author information and package requirements (such as external libraries). It is a key element of the package development process given how much information it holds, including which libraries need to be loaded for the functions to run and which author to contact about any issues.

The dplyr page on CRAN gives a glimpse of the kind of information rendered from the DESCRIPTION file.

[Screenshot: the dplyr package listing on CRAN]

A few important description fields to include:

1. Title: The header of the package. It must be in title case (i.e. the principal words start with a capital letter), short enough to avoid truncation, and must not end with a period (.).

2. Version: Follows the format Major.Minor.Patch. Start with version 0.1.0 and update the version number whenever the package's functionality changes.

3. Description: Explain the purpose and functionality of the package in a paragraph. Include only as much information as helps users understand the package in brief. More detailed information about the functionality, backend processes and algorithms goes in the README file.

4. License: GPL-2 (GNU General Public License v2.0) is a common open source choice. Under this licence, people are free to use the source code, modify it, and use or redistribute it.
[https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html]

5. Imports: Lists all the packages your package needs in order to work. When your package is installed from a repository such as CRAN, these packages are installed automatically (as if by install.packages("package name")) and their namespaces are loaded with your package, but they are not attached to the search path; refer to their functions as pkg::fun() or import them through NAMESPACE.

6. Suggests: Lists packages whose absence would not halt the execution of your package but which are needed for running examples and tests (e.g. datasets) or building vignettes (e.g. knitr). Packages listed under Suggests are not installed automatically and need to be installed manually.

7. LazyData: Controls when bundled data are loaded. When true, datasets included in the package are loaded only when a statement first uses them, rather than when the package is loaded.

More information: [http://r-pkgs.had.co.nz/description.html]
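Some of these rules are easy to check mechanically. The sketch below is a hypothetical helper, check_description_fields(), that flags a trailing period in the Title, a Title longer than 65 characters (the length beyond which some package listings may truncate it), a Title that tools::toTitleCase() would change, and a Version not in the Major.Minor.Patch form recommended above. The title-case test is only a heuristic; R CMD check remains the authority.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
# `fields` is a named list or named character vector of DESCRIPTION fields.
check_description_fields <- function(fields) {
  problems <- character(0)
  title <- fields[["Title"]]
  version <- fields[["Version"]]

  if (grepl("\\.$", title)) {
    problems <- c(problems, "Title ends with a period")
  }
  if (nchar(title) > 65) {
    problems <- c(problems, "Title is longer than 65 characters")
  }
  # Heuristic: toTitleCase() capitalises principal words and leaves
  # short words such as "for" or "and" in lower case
  if (!identical(tools::toTitleCase(title), title)) {
    problems <- c(problems, "Title may not be in title case")
  }
  if (!grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)) {
    problems <- c(problems, "Version is not in Major.Minor.Patch form")
  }
  problems
}

# Usage on an existing package (run from its root directory):
# check_description_fields(read.dcf("DESCRIPTION")[1, ])
check_description_fields(list(Title = "descriptive statistics.", Version = "0.1"))
```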

Open the DESCRIPTION file in the package directory and edit it as shown below.

[Screenshot: the edited DESCRIPTION file]
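For reference, an edited DESCRIPTION file might look like the sketch below. The package name descstats, the author details and the field values are placeholders, not the author's actual file; Authors@R is the structured way to declare authors and the maintainer (role "cre"). LazyData: true is left out because it belongs only in packages that ship a data/ folder; recent versions of R CMD check raise a NOTE otherwise.

```
Package: descstats
Type: Package
Title: Descriptive Statistics for Numeric Data
Version: 0.1.0
Authors@R: person("First", "Last", email = "first.last@example.com",
    role = c("aut", "cre"))
Description: Computes common descriptive statistics (mean, median,
    standard deviation, minimum and maximum) for numeric vectors,
    handling missing values consistently.
License: GPL-2
Encoding: UTF-8
Imports:
    stats
Suggests:
    testthat
```

Continuation lines must start with whitespace, which is how a long field such as Description spans several lines.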

6. Write the R file (source file of the package)

Open the hello.R file in the R folder of the project directory and replace its contents with the roxygen comment block below. Rename the file and save it.

#' @title
#'
#' @description
#'
#' @param
#'
#' @return
#'
#' @examples
#'
#' @export

[Screenshot: the completed source file]

1. @param: Documents the function's arguments. Several arguments sharing one description can be listed comma-separated without spaces (e.g. @param x,y), or each argument can be written on a separate line for better formatting.

2. @return: Optional, yet specifying it lets package users know what type of object the function returns.

3. @examples: Provides examples of how the function can be used. Running example() on a function runs all of its examples and produces the sample output for each of them.

4. @export: Makes the function available to users of the package. Internal functions that should stay hidden from users need not carry this tag.

5. @import: Imports all the functions from the specified package.

6. @importFrom: Imports only the specified functions from the specified package (e.g. @importFrom stats median sd).

roxygen2 supports several other tags that can be used as required:

[http://r-pkgs.had.co.nz/man.html]

Add your function below the roxygen comment block.
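Putting the template and the tags together, a documented source file could look like the sketch below. desc_stats is the function name this walkthrough uses for its help page and examples, but the body, arguments and returned statistics here are assumptions made for illustration, not the author's implementation.

```r
#' @title Descriptive Statistics for a Numeric Vector
#'
#' @description Computes common summary statistics for a numeric vector,
#'   ignoring missing values.
#'
#' @param x A numeric vector.
#' @param digits Number of decimal places used to round the results.
#'
#' @return A named numeric vector with elements mean, median, sd, min and max.
#'
#' @examples
#' desc_stats(c(2, 4, 4, 5, NA))
#' desc_stats(mtcars$mpg, digits = 1)
#'
#' @importFrom stats median sd
#' @export
desc_stats <- function(x, digits = 2) {
  # Reject unexpected input types with a clear message
  if (!is.numeric(x)) stop("`x` must be a numeric vector", call. = FALSE)
  # Drop missing values before computing anything
  x <- x[!is.na(x)]
  if (length(x) == 0) stop("`x` has no non-missing values", call. = FALSE)
  out <- c(
    mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    min    = min(x),
    max    = max(x)
  )
  round(out, digits)
}
```

Calling desc_stats(c(2, 4, 4, 5, NA)) from the console should return mean 3.75, median 4, sd 1.26, min 2 and max 5.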

7. Generate NAMESPACE file

The NAMESPACE file contains the package's imports and exports. It is made up of directives; each directive names an R object and specifies whether it is exported from this package for package users, or imported from another package for use while this package runs.

More information: [http://r-pkgs.had.co.nz/namespace.html#namespace-NAMESPACE]

The NAMESPACE file is generated automatically from the roxygen comments and shouldn't be modified by hand. Run the command below to generate it from the roxygen export and import tags in the source code. The same run also generates the .Rd files (help files), discussed in the next step.

devtools::document()

[Screenshot: the generated NAMESPACE file]
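As an illustration, for a hypothetical function desc_stats whose roxygen block contains @export and @importFrom stats median sd, running devtools::document() should produce a NAMESPACE file along these lines:

```
# Generated by roxygen2: do not edit by hand

export(desc_stats)
importFrom(stats,median)
importFrom(stats,sd)
```

The first line is how roxygen2 marks the file as its own; it will not overwrite a NAMESPACE that lacks it.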

8. Create help file

Run the command below to generate the help file. The same run also generates the NAMESPACE file.

devtools::document()

To view the generated object documentation (help file):

?desc_stats
or
help(desc_stats)

[Screenshot: the rendered help page for desc_stats]

9. Build Package

Once all the files have been created, it's time to build the package. From the main menu bar, select Build -> Configure Build Tools.

Check the options "Generate documentation using Roxygen" and "Use devtools package if available".

In the R CMD check additional options box (under Check Package), enter --as-cran.

[Screenshot: the Configure Build Tools dialog]

Once configured, select Build -> Clean and Rebuild to build the package. If Clean and Rebuild runs without errors, the package has been successfully built, installed and loaded.

10. Test the package

Run the example to see whether the package works as expected. This runs the example provided in the roxygen comment block.

example("desc_stats")

However, to pass the CRAN checks, the package has to do more than just run. It should produce no errors, warnings or notes, and it should contain all the required files with all fields properly populated. This can be checked using the Check Package option in the Build menu.

Read through the check results and fix the code wherever there are errors, warnings or notes. Keep iterating until zero errors, warnings and notes are produced.

Upload the package to public repositories

Generate the source package by selecting Build -> Build Source Package.
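The Build menu steps also have console equivalents in devtools. The sketch below wraps them in a hypothetical helper, build_and_check(); devtools::document(), install(), check() and build() are existing devtools functions, and check(cran = TRUE) applies the same --as-cran option configured earlier. install() is only a rough counterpart of Clean and Rebuild. Run it from, or point it at, the package root (the folder containing DESCRIPTION).

```r
# Hypothetical helper: console equivalent of the Build menu steps, assuming
# devtools is installed and `pkg` is the package root directory.
build_and_check <- function(pkg = ".") {
  if (!file.exists(file.path(pkg, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in ", normalizePath(pkg, mustWork = FALSE),
         call. = FALSE)
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("Please install devtools first", call. = FALSE)
  }
  devtools::document(pkg)                       # NAMESPACE and man/*.Rd
  devtools::install(pkg)                        # roughly Build -> Clean and Rebuild
  results <- devtools::check(pkg, cran = TRUE)  # Build -> Check Package (--as-cran)
  tarball <- devtools::build(pkg)               # Build -> Build Source Package
  invisible(list(check = results, tarball = tarball))
}

# Example (run from the package root):
# out <- build_and_check(".")
# out$tarball   # path to the .tar.gz file to upload to CRAN
```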

Upload the generated file to CRAN [https://cran.r-project.org/submit.html]. The package will be scrutinized for possible issues, which are reported back by email. The issues need to be fixed and the package re-uploaded. The process continues until CRAN confirms acceptance.

So that is all there is to R package development, and by now you will have built your first package. Hope to see it on CRAN soon!
