1. What Is the National Agricultural Statistics Service?
The National Agricultural Statistics Service (NASS) is part of the United States Department of Agriculture. NASS administers, manages, analyzes, and shares “timely, accurate, and useful statistics in service to United States agriculture” (NASS 2020).
The NASS helps carry out numerous surveys of U.S. farmers and ranchers. You can view the timing of these NASS surveys on the NASS calendar and in a summary of these reports. In this publication we focus on two large NASS surveys: (1) the Agricultural Resource Management Survey (ARMS) and (2) the Census of Agriculture (CoA). The ARMS is conducted every year and includes data on agricultural production practices, agricultural resource use, and the economic well-being of farmers and ranchers (ARMS 2020). The CoA is conducted every five years and includes demographic data on farms and ranches (CoA 2020). The CoA also includes farm- and ranch-level data on land use, land ownership, agricultural production practices, income, and expenses. The most recent statistical summary of the Census of Agriculture is the United States Summary and State Data report (PDF, 27.9 MB).
To improve data accessibility and sharing, the NASS developed the “Quick Stats” website, where you can select and download data from two of the agency’s surveys. The NASS also allows programmatic access to these data through an application programming interface, as described in Section 2. Programmatic access refers to the process of using computer code to select and download data. Accessing data with computer code comes in handy when you want to view data from multiple states, years, crops, and other categories. If you download NASS data without using computer code, you may find that it takes a long time to manually select each dataset you want from the Quick Stats website. Plus, when you manually select and download data on the Quick Stats website, you could introduce human error by accidentally clicking the wrong buttons and selecting data that you do not actually want. We summarize these benefits in Section 5.
2. What Is an API?
An application programming interface, or API for short, helps coders access one software program from another. In this case, the NASS Quick Stats API works as the interface between the NASS data servers (that is, computers with the NASS survey data on them) and the software installed on your computer. An API request occurs when you programmatically send a data query from software on your computer (for example, R, Section 4) to the API for some NASS survey data that you want. The API will then check the NASS data servers for the data you requested and send your requested information back. This reply is called an API response.
To use a restaurant analogy, you can think of the NASS Quick Stats API as the waitstaff at your favorite restaurant, the NASS data servers as the kitchen, the software on your computer as the waitstaff’s order notepad, and the coder as the customer (you) as shown in Figure 1. The API request is the customer’s (your) food order, which the waitstaff wrote down on the order notepad. The API response is the food made by the kitchen based on the written order from the customer to the waitstaff.
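To make the idea of an API request more concrete, here is a minimal sketch of what a request to the Quick Stats API can look like when it is written out as a web address. It assumes the api_GET address and the key, commodity_desc, state_alpha, and format parameters described on the Quick Stats API website; “YOUR_KEY” is a placeholder, not a real key. The code only builds the text of the request (the written order); it does not send it.

```r
# Base address of the Quick Stats API (assumed; check the Quick Stats API website)
base_url <- "https://quickstats.nass.usda.gov/api/api_GET/"

# The "food order": each name is an API parameter, each value is what you ask for
query <- list(
  key            = "YOUR_KEY",        # placeholder for your personal API key
  commodity_desc = "SWEET POTATOES",
  state_alpha    = "NC",
  format         = "JSON"
)

# Encode each value so spaces and symbols are safe in a web address,
# then join everything into name=value pairs separated by "&"
encoded_values <- vapply(query, URLencode, character(1), reserved = TRUE)
request_url <- paste0(base_url, "?",
                      paste(names(query), encoded_values, sep = "=", collapse = "&"))

print(request_url)
```

R packages such as the ones in Section 6 build and send requests like this for you, so you will rarely need to write one by hand.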
3. Which Software Programs Can Be Used to Programmatically Access NASS Survey Data?
You can use many software programs to programmatically access the NASS survey data, including R, Python, and many more. You can also refer to these software programs as different coding languages because each uses a slightly different coding style (or grammar) to carry out a task. In this case, the task is to request NASS survey data. Here, code refers to the text, made up of individual characters, that you write in a coding language. A script is a collection of code that, taken together, defines a series of steps the coder wants his or her computer to carry out. Scripts allow coders to easily repeat tasks on their computers. For example, you can write a script to access the NASS Quick Stats API and download data. Then you can use the script (coders would say “run” the script) each time you want to download NASS survey data. You can also make small changes to the script to download new types of data. To run the script, you click a button in the software program or use a keyboard shortcut that tells your computer to start going through the script step by step. In Section 7, we describe how you can use the software program R to write and run a script that will download NASS survey data.
You can think of a coding language as a natural language like English, Spanish, or Japanese. Code is similar to the characters of the natural language, which can be combined to make a sentence. Each language has its own unique way of representing meaning, using these characters and its own grammatical rules for combining these characters. A script is like a collection of sentences that defines each step of a task. To use a baking analogy, you can think of the script as a recipe for your favorite dessert. Running the script is similar to your pulling out the recipe and working through the steps when you want to make this dessert. However, if you only knew English and tried to read the recipe in Spanish or Japanese, your favorite treat might not turn out very well. For this reason, it is important to pay attention to the coding language you are using. As an example, you cannot “run” a non-R script using the R software program.
4. What Is R and Why Use It?
R is an open-source coding language that was first developed in 1991, primarily for conducting statistical analyses, and has since been applied to data visualization, website creation, and much more (Peng 2020; Chambers 2020). Open source means that the R source code (the computer code that makes R work) can be viewed and edited by the public. R is also free to download and use. The latest version of R is available on The Comprehensive R Archive Network website. Many coders who use R also download and install RStudio along with it. RStudio is a separate open-source program that makes it easier to code in R. The latest version of RStudio is available at the RStudio website. As an analogy, you can think of R as a plain text editor (such as Notepad), while RStudio is more like Microsoft Word with additional tools and options.
Many people around the world use R for data analysis, data visualization, and much more. Because R is accessible to so many people, there is a great deal of collaboration and sharing of R resources, scripts, and knowledge. As a result, R coders have developed collections of user-friendly R code that accomplish themed tasks. These collections are known as R packages. For example, we discuss R packages for downloading datasets from the NASS Quick Stats API in Section 6. There are R packages to do linear modeling (such as the stats R package, which comes with R and provides the lm( ) function), make pretty plots (such as the ggplot2 R package), and much more. There are thousands of R packages available online (CRAN 2020). One of the main missions of organizations like the Comprehensive R Archive Network is to curate R packages and make sure their creators have met user-friendly documentation standards.
A function is another important concept that is helpful to understand while using R and many other coding languages. A function in R takes an input (or many inputs) and gives an output. For example, if you wanted to calculate the sum of 2 and 10, you could write the code 2 + 10, or you could use the sum( ) function (that is, sum(2, 10)). The inputs to this function are 2 and 10, and the output is 12. In this example, the sum( ) function does a task that you can easily code by using the + sign, but it might not always be easy for you to code up the calculations and analyses done by a function. This is why functions are an important part of R packages; they make coding easier for you.
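The sum example can be run directly in R, and you can also write small functions of your own. In the sketch below, add_two_numbers is a hypothetical function made up for illustration; it is not part of any R package.

```r
# Two ways to add 2 and 10; both give 12
2 + 10
sum(2, 10)

# A hypothetical function of your own:
# the inputs are a and b, and the output is their sum
add_two_numbers <- function(a, b) {
  a + b
}

add_two_numbers(2, 10)   # also 12
```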
5. Why Is it Beneficial to Access NASS Data Programmatically?
As mentioned in Section 1, you can visit the NASS Quick Stats website, click through the options, and download the data. However, there are three main reasons that it’s helpful to use a software program like R to download these data:
- It reduces workload and potential for error — If you wanted to download data on the corn harvest acreage for each of the 100 counties in North Carolina, it would take a long time to click through all the options on the NASS Quick Stats website. You might also mistakenly click the same county twice or mistakenly overwrite a previously downloaded file when you save new data. In other words, this approach introduces the risk of extra human error. This can all be avoided by using a software program like R. To do this task in R, you could write a script to step through a list of all 100 county names. For each of the NC counties, you can then tell R to create a NASS Quick Stats API query, make the query to the NASS Quick Stats API (that is, make an API request), get some data back (that is, get an API response), and save that data to a file. In this scenario, your computer keeps track of county names, datasets, and data file names. Also, it would likely take your computer much less time to finish this task, assuming you have a stable internet connection.
- It creates a reproducible workflow — When coding in R, you create a script (see Section 3). A key benefit of having this script is that it serves as an instruction manual for you or your colleagues if you ever have to redo your analysis. Let’s return to the example above where you wanted to download data on the corn harvest acreage for each of the 100 counties in North Carolina. Say you originally did this analysis for 2017, but later realized you wanted to repeat it for 2012. When you have a script, this adjustment is very simple. You can open your script, change the year to 2012, and then rerun it. If your colleagues wanted to know about sweetpotato acreage in 2017, you could send them your script and they would be able to recreate your analysis for sweetpotatoes, rather than corn. To do this, they would just need to change the crop to sweetpotato and “run” the script.
- It benefits from the online community — The R coding community spans the globe and includes numerous disciplines. It’s very likely that someone has already written code for something you are trying to accomplish. In this case, several coders have developed R code that can access the NASS Quick Stats API. We introduce these different approaches in Section 6.
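The first two benefits above can be sketched in R as a loop over county names. This is a minimal sketch under several assumptions: only three county names are listed (a full script would list all 100), the parameter values are modeled on the sweetpotato query in Section 7, and the lines that would actually call the API are commented out so the sketch runs without an API key.

```r
# A short list of North Carolina county names (a full script would list all 100)
counties <- c("SAMPSON", "JOHNSTON", "NASH")
year_of_interest <- 2017   # change this one value to rerun for another year

file_names <- character(0)  # R keeps track of every file name for you

for (county in counties) {
  # build the API query for this county (assumed parameter values)
  params <- list(commodity_desc = "CORN", state_alpha = "NC",
                 county_name = county, agg_level_desc = "COUNTY",
                 statisticcat_desc = "AREA HARVESTED",
                 year = year_of_interest)

  # one file name per county and year, so no file is overwritten
  file_name <- paste0("nc_corn_", tolower(county), "_", year_of_interest, ".csv")
  file_names <- c(file_names, file_name)

  # with rnassqs loaded and your key set (Section 7.3), you could then run:
  # county_data <- nassqs(params)
  # readr::write_csv(county_data, file_name)
}

file_names
```

Changing year_of_interest to 2012, or commodity_desc to "SWEET POTATOES", is the kind of one-line edit that makes the workflow reproducible.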
6. What R Tools Are Available for Getting NASS Data?
Currently, there are four R packages available to help access the NASS Quick Stats API (see Section 4).
rnassqs*: The rnassqs R package was developed in 2015 by Dr. Nicholas Potter at Washington State University and is supported by rOpenSci. This package is not supported by NASS.
usdanassr: The usdanassr R package was developed in 2018 by Dr. Robert Dinterman at Ohio State University. This package is not supported by NASS.
rnass: The rnass R package was developed in 2015 by Emrah Er at Ankara University. This package is not supported by NASS.
tidyUSDA: The tidyUSDA R package was developed in 2019 and is supported by NASS.
*In this Extension publication, we will only cover how to use the rnassqs R package.
7. How Can You Get NASS Data Using R?
Before coding, you have to request an API access key from the NASS. Going back to the restaurant analogy, the API key is akin to your table number at the restaurant. The waitstaff and restaurant use that number to keep track of your order and bill (Figure 1). You can register for a NASS Quick Stats API key at the Quick Stats API website (click on “Request API Key”). Be sure to keep this key in a safe place because it is your personal key to the NASS Quick Stats API.
Besides requesting a NASS Quick Stats API key, you will also need to make sure you have an up-to-date version of R. If not, you can download R from The Comprehensive R Archive Network. We also recommend that you download RStudio from the RStudio website. As mentioned in Section 4, RStudio provides a user-friendly way to interact with R.
7.1 Let’s Code! Installing R Packages
If this is your first time using a particular R package or if you have forgotten whether you installed an R package, you first need to install it on your computer by downloading it from the Comprehensive R Archive Network (Section 4). To install packages, use the code below. If you have already installed the R package, you can skip to the next step (Section 7.2).
Note: When a line of R code starts with a #, R reads the # symbol as the start of a comment and skips over the rest of that line when you run your code. When you are coding, it’s helpful to add comments so you will remember, or so someone you share your script with will know, what you were trying to do and why.
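Here is a minimal sketch of the installation step for the two packages used in this publication, rnassqs and tidyverse. It installs only the packages that are missing and names a CRAN mirror explicitly (the cloud.r-project.org address is one common choice; any CRAN mirror works).

```r
# Packages used in this publication
packages_needed <- c("rnassqs", "tidyverse")

# Find which of them are not installed on this computer yet
packages_missing <- setdiff(packages_needed, rownames(installed.packages()))

# Install only the missing ones from CRAN (this needs an internet connection)
if (length(packages_missing) > 0) {
  install.packages(packages_missing, repos = "https://cloud.r-project.org")
}
```

You only need to run this once per computer, or again after you update R.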
Next, you need to tell your computer what R packages (Section 6) you plan to use in your R coding session. (R coders say you need to “load your R packages.”) You can do that by running the code below (Section 7.2).
7.2 Loading R Packages
Once you’ve installed the R packages, you can load them.
Let’s say you are going to use the rnassqs package, as mentioned in Section 6. You are also going to use the tidyverse package, which is called a “meta-package” because it is a “package of packages” that helps you work with your datasets easily and keep them “tidy.”
Here, “tidy” has a specific meaning: all observations are represented as rows, and all the data categories associated with that observation are represented as columns. As an example, one year of corn harvest data for a particular county in the United States would represent one row, and a second year would represent another row. Columns for this particular dataset would include the year harvested, county identification number, crop type, harvested amount, the units of the harvested amount, and other categories. You can read more about tidy data and its benefits in the Tidy Data Illustrated Series.
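Loading the two packages takes one line each. Unlike installing, you need to run these lines every time you start a new R session.

```r
# Load the packages for this R session
library(rnassqs)    # functions for the NASS Quick Stats API
library(tidyverse)  # select(), filter(), mutate(), ggplot(), write_csv(), and more
```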
7.3 Setting up the API
Once your R packages are loaded, you can tell R what your NASS Quick Stats API key is. You will need this to make an API request later.
NASS_API_KEY <- "ADD YOUR NASS API KEY HERE"
nassqs_auth(key = NASS_API_KEY)
The first line of the code above defines a variable called NASS_API_KEY and assigns it the string of letters and numbers that makes up the NASS Quick Stats API key you received from the NASS. In this publication, the word variable refers to whatever is on the left side of the <- character combination. If you think back to algebra class, you might remember writing x = 1. In R, you would write x <- 1. Here, the <- character combination plays the role of the = (that is, equals) character: it assigns the value on the right to the variable on the left, and R will recognize this. If you replace x with NASS_API_KEY and 1 with the string of letters and numbers that makes up your unique NASS Quick Stats API key, you get another way to think about the first line of code. Please note that you will need to fill in your NASS Quick Stats API key surrounded by quotation marks.
Coding is a lot easier when you use variables because it means you don’t have to remember the specific string of letters and numbers that defines your unique NASS Quick Stats API key. Instead, you only have to remember that this information is stored inside the variable that you are calling NASS_API_KEY.
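The algebra comparison can be tried directly in R. The sketch below uses a placeholder string in place of a real key.

```r
# Assign the number 1 to a variable called x (like x = 1 in algebra)
x <- 1
x + 2   # R looks up x and finds 1, so this gives 3

# Assign a string (text in quotation marks) to a variable
NASS_API_KEY <- "ADD YOUR NASS API KEY HERE"   # placeholder, not a real key
class(NASS_API_KEY)   # "character": R stores the key as text
```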
The second line of code above uses the nassqs_auth( ) function (Section 4) and takes your NASS_API_KEY variable as the input for the parameter “key.” In this publication, the word parameter refers to a variable that is defined within a function. Some parameters, like “key,” are required if the function is to run properly without errors. Other parameters are optional. Use the = character rather than the <- character combination when you are defining parameters (that is, variables inside functions): = matches the value to the parameter by name, while <- would instead create a new variable in your R session. After you run this code, you will not see any output. Instead, the string of letters and numbers that represents your NASS Quick Stats API key is now saved to your R session, and you can use it with other rnassqs R package functions.
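The difference between required and optional parameters, and between = and <- inside a function, can be seen with base R’s round( ) function, used here only as a small stand-in for nassqs_auth( ).

```r
# x is required; digits is optional and defaults to 0
round(x = 3.14159)               # 3
round(x = 3.14159, digits = 2)   # 3.14

# Using <- inside the call also creates a variable called digits in your
# R session and passes the value by position, not by name
round(3.14159, digits <- 2)      # still 3.14, but now digits exists
exists("digits")                 # TRUE
```

The last two lines show why = is the safer choice: with <-, the function only gets the right value because of the argument's position, and a stray variable is left behind.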
You can also write the two steps above as one step, which is shown below.
nassqs_auth(key = "ADD YOUR NASS API KEY HERE")
In this case, you can use the string of letters and numbers that represents your NASS Quick Stats API key to directly define the key parameter that the function needs to work.
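Typing your key into a script means anyone you share the script with also gets your key. One common alternative is to keep the key in an environment variable, set once (for example, in your .Renviron file) outside the script. The sketch below assumes the variable is named NASSQS_TOKEN, the name used in the rnassqs documentation; check the documentation for your installed version.

```r
library(rnassqs)

# Read the key from an environment variable instead of typing it in the script
# (the name NASSQS_TOKEN is an assumption based on the rnassqs documentation)
nass_key <- Sys.getenv("NASSQS_TOKEN")

if (nzchar(nass_key)) {
  # same call as above, but the key never appears in the script itself
  nassqs_auth(key = nass_key)
} else {
  message("NASSQS_TOKEN is not set; add it to your .Renviron file and restart R.")
}
```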
You can see a full list of the NASS parameters that are available, and their exact names, by running the following line of code. For example, “commodity_desc” refers to the commodity description information available in the NASS Quick Stats API, and “agg_level_desc” refers to the aggregation level description of NASS Quick Stats API data. More specifically, “agg_level_desc” defines whether NASS data are aggregated at the national, state, or county scale. You can read more about the available NASS Quick Stats API parameters and their definitions by checking out the help page on this topic.
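A minimal sketch of that step, assuming the rnassqs helper nassqs_params( ), which returns the names of the parameters the Quick Stats API accepts (check the rnassqs documentation for your installed version):

```r
library(rnassqs)

# Names of the parameters the Quick Stats API accepts
all_params <- nassqs_params()
print(all_params)

# Check that the two parameters mentioned above are in the list
c("commodity_desc", "agg_level_desc") %in% all_params
```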
7.4 Seeing What Parameters Are Available in the API
Before you make a specific API query, it’s best to see whether the data are even available for a particular combination of parameters. For example, say you want to know which states have sweetpotato data available at the county level. You can check by using the nassqs_param_values( ) function. In this case, you’re wondering about the states with data, so set param = "state_alpha". You know you want commodity_desc = "SWEET POTATOES", agg_level_desc = "COUNTY", unit_desc = "ACRES", domain_desc = "TOTAL", statisticcat_desc = "AREA HARVESTED", and prodn_practice_desc = "ALL PRODUCTION PRACTICES". By setting domain_desc = "TOTAL", you will get the total acreage of sweetpotatoes in the county, as opposed to the acreage grown by specific groups of operators or producers (such as particular demographic groups) that each make up part of that county total. By setting statisticcat_desc = "AREA HARVESTED", you will get results for harvested acreage rather than planted acreage. By setting prodn_practice_desc = "ALL PRODUCTION PRACTICES", you will get results for all production practices rather than, for example, only those that use irrigation.
Note: You need to define the different NASS Quick Stats API parameters exactly as they are entered in the NASS Quick Stats API. Otherwise, the NASS Quick Stats API will not know what you are asking for. For example, you will get an error if you write commodity_desc = "SWEET POTATO" (that is, dropping the “ES”) or commodity_desc = "sweetpotatoes" (that is, with no space and all lowercase letters). If you’re not sure what spelling and case the NASS Quick Stats API uses, you can always check by clicking through the NASS Quick Stats website. Also, be aware that some commodity descriptions may include “&” in their names.
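Because the API compares your text exactly, it can help to see how R itself treats case and spelling. The sketch below uses base R’s toupper( ) to fix capitalization; note that it cannot fix a misspelling or a missing space.

```r
# R compares text exactly, including capitalization and spaces
"sweet potatoes" == "SWEET POTATOES"            # FALSE: different case
toupper("sweet potatoes") == "SWEET POTATOES"   # TRUE: case fixed
toupper("sweetpotatoes") == "SWEET POTATOES"    # FALSE: the space is still missing
toupper("sweet potato") == "SWEET POTATOES"     # FALSE: the "ES" is still missing
```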
7.5 Querying Available Parameters in the API
Next, you can define parameters of interest.
# list the states (state_alpha) with county-level harvested sweetpotato acreage
nassqs_param_values(param = "state_alpha",
                    commodity_desc = "SWEET POTATOES",
                    agg_level_desc = "COUNTY",
                    unit_desc = "ACRES",
                    domain_desc = "TOTAL",
                    statisticcat_desc = "AREA HARVESTED",
                    prodn_practice_desc = "ALL PRODUCTION PRACTICES")
After running this code, R will output a result. Based on this result, it looks like there are 47 states with sweetpotato data available at the county level, and North Carolina is one of them. Once you know North Carolina has data available, you can make an API query specific to sweetpotatoes in North Carolina. It’s easiest if you separate this search into two steps. First, you will define each of the specifics of your query as nc_sweetpotato_params. Second, you will use the specific information you defined in nc_sweetpotato_params to make the API query. To make this query, you will use the nassqs( ) function with the parameters as an input. You can define the query output as nc_sweetpotato_data_raw.
7.6 Querying the API to Get Data
# define the NC sweetpotato parameters
nc_sweetpotato_params <- list(commodity_desc = "SWEET POTATOES",
                              state_alpha = "NC",
                              agg_level_desc = "COUNTY",
                              unit_desc = "ACRES",
                              domain_desc = "TOTAL",
                              statisticcat_desc = "AREA HARVESTED",
                              prodn_practice_desc = "ALL PRODUCTION PRACTICES")
# query the NC sweetpotato data
nc_sweetpotato_data_raw <- nassqs(nc_sweetpotato_params)
# look at the first few lines
head(nc_sweetpotato_data_raw, n = 3)
After running these lines of code, you will get raw data output that has over 1500 rows and close to 40 columns. You don’t need all of these columns, and some of the rows need to be cleaned up a little bit. You can use the select( ) function to keep the following columns: “Value” (acres of sweetpotatoes harvested), “county_name” (the name of the county), “source_desc” (whether data come from the NASS census or the NASS survey), and “year” (the year of the data). You can define this selected data as nc_sweetpotato_data_sel.
Next, you can use the filter( ) function to keep only data that come from the NASS survey, as opposed to the census, and that represent a single county. You can do this by including the logic statement source_desc == "SURVEY" & county_name != "OTHER (COMBINED) COUNTIES" inside the filter( ) function. The == character combination tells R that this is a logic test for exactly equal, the & character is a logic test for AND, and the != character combination is a logic test for not equal. Taken together, R reads this statement as: keep only the rows in the dataset where the source description column is exactly equal to “SURVEY” and the county name is not equal to “OTHER (COMBINED) COUNTIES,” and drop all other rows. You can then define this filtered data as nc_sweetpotato_data_survey.
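The logic statement can be tested on a tiny table before you run it on real data. The three rows below are made up for illustration and are not NASS values; base R’s subset( ) is used so the sketch runs without any packages, and it keeps rows on the same TRUE/FALSE test that filter( ) uses.

```r
# A made-up table with the same column names as the NASS data (not real values)
toy_data <- data.frame(
  county_name = c("SAMPSON", "SAMPSON", "OTHER (COMBINED) COUNTIES"),
  source_desc = c("SURVEY", "CENSUS", "SURVEY"),
  Value       = c("1,000", "900", "50")
)

# Keep rows that are survey data AND are not the combined-counties group
toy_survey <- subset(toy_data,
                     source_desc == "SURVEY" & county_name != "OTHER (COMBINED) COUNTIES")

toy_survey   # only the first row is kept
```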
The last step in cleaning up the data involves the “Value” column. First, you will rename the column so it has more meaning to you. Second, you will change entries in each row of the Value column so they are represented as a number, rather than a character. This number versus character representation is important because R cannot add, subtract, multiply, or divide characters. For example, if someone asked you to add A and B, you would be confused. On the other hand, if that person asked you to add 1 and 2, you would know exactly what to do. There are times when your data look like a 1, but R is really seeing it as an A. You can see whether a column is a character by using the class( ) function on that column (that is, class(nc_sweetpotato_data_survey$Value), where the $ helps you access the “Value” column in the nc_sweetpotato_data_survey variable). If you use this function on the Value column of nc_sweetpotato_data_survey, R will return “character,” but you want R to return “numeric.”
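Here is the character-versus-number problem in miniature. The values below are made up; they only mimic how a comma-formatted number arrives as text. Base R’s gsub( ) is used here to remove the comma, doing the same job as str_replace_all( ).

```r
acres_text <- c("950", "1,250")   # made-up values stored as text (character)
class(acres_text)                  # "character"

# as.numeric() alone cannot read the comma, so "1,250" becomes NA (with a warning)
suppressWarnings(as.numeric(acres_text))   # 950 NA

# remove the comma first, then convert
acres_number <- as.numeric(gsub(",", "", acres_text))
class(acres_number)                # "numeric"
sum(acres_number)                  # 2200
```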
You can first use the mutate( ) function to create a new column, harvested_sweetpotatoes_acres, from the “Value” column. Within the mutate( ) function, you need to remove the commas from entries in the “Value” column that are 1000 acres or more (that is, you want 1000, not 1,000). You do this by using the str_replace_all( ) function. Then use the as.numeric( ) function to tell R each entry is a number, not a character. Next, you can use the select( ) function again to drop the old “Value” column. Finally, you can define your last dataset as nc_sweetpotato_data.
Note: In some cases, the “Value” column will have letter codes instead of numbers. These codes explain why data are missing. For example, a “(D)” value denotes data that are being “withheld to avoid disclosing data for individual operations,” according to the creators of the NASS Quick Stats API. You can find the full list of codes in the Quick Stats Glossary. If these codes are left in place, as.numeric( ) turns them into NA (missing) values and R shows a warning, so you might need to do extra cleaning to handle these data before you can plot.
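One way to do that extra cleaning is sketched below, assuming you want letter codes such as “(D)” to become missing values (NA) rather than numbers. The example values are made up, and the rule used (any entry wrapped in parentheses is a code) is an assumption; check the Quick Stats Glossary for the codes in your data.

```r
values <- c("1,250", "(D)", "300")   # made-up Value column with one withheld entry

# Treat any entry wrapped in parentheses, such as "(D)", as missing
is_code <- grepl("^\\(.*\\)$", values)
values[is_code] <- NA

# Remove commas and convert to numbers; NA stays NA without a warning
acres <- as.numeric(gsub(",", "", values))
acres                       # 1250 NA 300
sum(acres, na.rm = TRUE)    # 1550, ignoring the withheld entry
```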
7.7 Cleaning the Data
Before you can plot these data, it is best to check and fix their formatting.
# select the columns of interest
nc_sweetpotato_data_sel <- select(nc_sweetpotato_data_raw, county_name, year, source_desc, Value)
# keep survey data only (drop census data) and drop the combined "other counties" rows
nc_sweetpotato_data_survey <- filter(nc_sweetpotato_data_sel, source_desc == "SURVEY" & county_name != "OTHER (COMBINED) COUNTIES")
# check the class of Value column
class(nc_sweetpotato_data_survey$Value)
# fix Value column: remove commas, then convert from character to numeric
nc_sweetpotato_data_survey_mutate <- mutate(nc_sweetpotato_data_survey,
  harvested_sweetpotatoes_acres = as.numeric(str_replace_all(string = Value, pattern = ",", replacement = "")))
# check the class of new value column
class(nc_sweetpotato_data_survey_mutate$harvested_sweetpotatoes_acres)
nc_sweetpotato_data <- select(nc_sweetpotato_data_survey_mutate, -Value)
# look at the first few lines
head(nc_sweetpotato_data, n = 3)
Now you have a dataset that is easier to work with. The next thing you might want to do is plot the results. Say you want to plot the acres of sweetpotatoes harvested by year for each county in North Carolina. You can use the ggplot( ) function along with your nc_sweetpotato_data variable to do this.
The resulting plot is a bit busy because it shows you all 96 counties that have sweetpotato data. If you are interested in just looking at data from Sampson County, you can use the filter( ) function and define these data as sampson_sweetpotato_data. Then you can plot this information by itself.
7.8 Plotting the Data
Now that you’ve cleaned the data, you can display them in a plot.
# plot the data
ggplot(data = nc_sweetpotato_data) + geom_line(aes(x = year, y = harvested_sweetpotatoes_acres)) + facet_wrap(~ county_name)
# filter out Sampson county data
sampson_sweetpotato_data <- filter(nc_sweetpotato_data, county_name == "SAMPSON")
# plot Sampson county data
ggplot(data = sampson_sweetpotato_data) + geom_line(aes(x = year, y = harvested_sweetpotatoes_acres))
The last thing you might want to do is save the cleaned-up data that you queried from the NASS Quick Stats API. It’s very easy to export data stored in nc_sweetpotato_data or sampson_sweetpotato_data as a comma-separated values (CSV) file in R. To do this, you can use the write_csv( ) function. In this case, the NC sweetpotato data will be saved to a file called “nc_sweetpotato_data_query_on_20201001.csv” on your desktop. You can change the export path to any other location on your computer that you prefer, but pay attention to the formatting of the path name. The example in Section 7.9 uses a path name for a Mac computer; a PC path to the desktop might look more like "C:/Users/your/Desktop/nc_sweetpotato_data_query_on_20201001.csv". In R, use forward slashes (/) in PC paths, or double each backslash (\\), because R reads a single backslash inside quotation marks as the start of a special character.
7.9 Exporting the Data
Now that you’ve cleaned and plotted the data, you can save them for future use or to share with others.
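# save the cleaned data as a CSV file (in readr versions before 1.4.0, the file argument was called path)
write_csv(x = nc_sweetpotato_data, file = "/Users/your/Desktop/nc_sweetpotato_data_query_on_20201001.csv")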
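Because Mac and PC paths look different, you can let R join the folder and file names with file.path( ), which uses forward slashes that R accepts on both. The sketch below writes a small made-up table (not NASS data) to a temporary folder so you can check the result safely, then reads the file back to confirm it was saved.

```r
library(readr)

# A small made-up table standing in for nc_sweetpotato_data (not real values)
example_data <- data.frame(county_name = c("SAMPSON", "NASH"),
                           year = c(2017, 2017),
                           harvested_sweetpotatoes_acres = c(100, 200))

# file.path() joins folder and file names with "/"
output_file <- file.path(tempdir(), "nc_sweetpotato_data_query_on_20201001.csv")

write_csv(x = example_data, file = output_file)

# Read the file back to confirm it was saved correctly
check <- read_csv(output_file)
nrow(check)   # 2
```

To save to your desktop instead, replace tempdir() with your desktop folder, for example "/Users/your/Desktop" on a Mac or "C:/Users/your/Desktop" on a PC.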
You can also export the plots from RStudio by going to the toolbar > Plots > Save as Image.
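You can also save a plot with code, which keeps the step in your script. A minimal sketch using ggplot2’s ggsave( ) follows; the three data points are made up to stand in for sampson_sweetpotato_data, and the image is written to a temporary folder.

```r
library(ggplot2)

# Made-up data standing in for sampson_sweetpotato_data (not real values)
example_data <- data.frame(year = c(2015, 2016, 2017),
                           harvested_sweetpotatoes_acres = c(10, 20, 15))

sampson_plot <- ggplot(data = example_data) +
  geom_line(aes(x = year, y = harvested_sweetpotatoes_acres)) +
  labs(x = "Year", y = "Sweetpotatoes harvested (acres)")

# Save the plot as a PNG image; width and height are in inches
plot_file <- file.path(tempdir(), "sampson_sweetpotato_plot.png")
ggsave(filename = plot_file, plot = sampson_plot, width = 6, height = 4, units = "in")
```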
8. Additional Resources
- rnassqs R package documentation
- R for Data Science e-book by H. Wickham and G. Grolemund
- Software Carpentry Programming with R tutorial
- Openscapes.org open data science resources page
- “Working for Peanuts: Acquiring, Analyzing, and Visualizing Publicly Available Data” (Ward et al. 2020; see Section 9, References) and associated code by Griffin and Ward (2019), using the usdanassr R package.
9. References
Agricultural Resource Management Survey (ARMS). 2020. United States Department of Agriculture. Accessed online: 01 October 2020.
Census of Agriculture (CoA). 2020. United States Department of Agriculture. Accessed online: 01 October 2020.
Chambers, J. M. 2020. “S, R, and Data Science.” Proceedings of the ACM on Programming Languages 4: 84.
Griffin, T. W., and J. K. Ward. 2019. “DSFW_Peanuts: Analysis of peanut DSFW from USDA-NASS databases.” Accessed online: 01 October 2020.
Peng, R. D. 2020. R Programming for Data Science. Accessed online: 01 October 2020.
The Comprehensive R Archive Network (CRAN). 2020. Accessed online: 01 October 2020.
U.S. Department of Agriculture, National Agricultural Statistics Service (NASS). 2020. About NASS.
Ward, J. K., T. W. Griffin, D. L. Jordan, and G. T. Roberson. 2020. “Working for Peanuts: Acquiring, Analyzing, and Visualizing Publicly Available Data.” Journal of the American Society of Farm Managers and Rural Appraisers: 156-166.
This work is supported by grant no. 2019-67021-29936 from the USDA National Institute of Food and Agriculture. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the U.S. Department of Agriculture.
Publication date: May 27, 2021
N.C. Cooperative Extension prohibits discrimination and harassment regardless of age, color, disability, family and marital status, gender identity, national origin, political beliefs, race, religion, sex (including pregnancy), sexual orientation and veteran status.