How to Upload a Folder to R Studio

How to Work With Data Frames and CSV Files in R — A Detailed Introduction with Examples

Welcome! If you want to start diving into data science and statistics, then data frames, CSV files, and R will be essential tools for you. Let's see how you can use their amazing capabilities.

In this article, you will learn:

  • What CSV files are and what they are used for.
  • How to create CSV files using Google Sheets.
  • How to read CSV files in R.
  • What data frames are and what they are used for.
  • How to access the elements of a data frame.
  • How to change a data frame.
  • How to add and delete rows and columns.

We will use RStudio, an open-source IDE (Integrated Development Environment), to run the examples.

Let's begin! ✨

🔹 Introduction to CSV Files

CSV (Comma-separated Values) files can be considered one of the building blocks of data analysis because they are used to store data represented in the form of a table.

In this type of file, values are separated by commas to represent the different columns of the table, like in this example:

image-153
CSV File
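
To make the layout concrete, here is a minimal sketch that prints a few CSV lines from R and splits them on commas. The two sample rows are taken from the table used later in this article; the variable names (`csv_lines`, `header`, `first_row`) are just illustrative.

```r
# The first line holds the column titles; every other line is one row.
csv_lines <- c(
  "first_name,last_name,age,num_siblings,num_pets,eye_color",
  "Emily,Dawson,15,2,5,BLUE",
  "Rose,Patterson,14,5,0,GREEN"
)

# Print the lines exactly as they would appear in the file.
cat(csv_lines, sep = "\n")

# Splitting a line on commas recovers the individual values.
header <- strsplit(csv_lines[1], ",")[[1]]
first_row <- strsplit(csv_lines[2], ",")[[1]]
print(header)
print(first_row)
```

Each line has the same number of comma-separated values, which is what lets R line the values up into columns.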

We will generate this file using Google Sheets.

🔸 How to Create a CSV File Using Google Sheets

Let's create your first CSV file using Google Sheets.

Step 1: Go to the Google Sheets Website and click on "Go to Google Sheets":

image-227

💡 Tip: You can access Google Sheets by clicking on the button located at the top-right corner of Google's Home Page:

image-228

If we zoom in, we see the "Sheets" button:

image-156

💡 Tip: To use Google Sheets, you need to have a Gmail account. Alternatively, you can create a CSV file using MS Excel or another spreadsheet editor.

You will see this panel:

image-157

Step 2: Create a blank spreadsheet by clicking on the "+" button.

image-158

Now you have a new empty spreadsheet:

image-159

Step 3: Change the name of the spreadsheet to students_data. We will need to use the name of the file to work with data frames. Write the new name and press Enter to confirm the change.

image-162

Step 4: In the first row of the spreadsheet, write the titles of the columns.

image-160

When you import a CSV file in R, the titles of the columns are called variables. We will define six variables: first_name, last_name, age, num_siblings, num_pets, and eye_color, as you can see below:

image-163

💡 Tip: Notice that the names are written in lowercase and words are separated with an underscore. This is not mandatory, but since you will need to access these names in R, it's very common to use this format.

Step 5: Enter the data for each one of the columns.

When you read the file in R, each row is called an observation, and it corresponds to data taken from an individual, animal, object, or entity that we collected data from.

In this example, each row corresponds to the data of a student:

image-164

Step 6: Download the CSV file by clicking on File -> Download -> Comma-separated values, as you can see below:

image-165

Step 7: Rename the CSV file. You will need to remove "Sheet1" from the default name because Google Sheets will automatically add this to the name of the file.

image-169

Great work! Now you have your CSV file and it's time to start working with it in R.

🔹 How to Read a CSV file in R

In RStudio, the first step before reading a CSV file is making sure that your current working directory is the directory where the CSV file is located.

💡 Tip: If this is not the case, you will need to use the full path to the file.

Change Current Working Directory

You can change your current working directory in this panel:

image-172

If we zoom in, you can see the current path (1) and select the new one by clicking on the ellipsis (...) button to the right (2):

image-171

💡 Tip: You can also check your current working directory with getwd() in the interactive console.

Then, click "More" and "Set As Working Directory".

image-175
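
If you prefer typing to clicking, the same change can be sketched from the console with getwd() and setwd(). In this example, tempdir() is only a stand-in for the folder that holds your CSV file (an assumption so that the code runs anywhere), and `old_dir` and `data_dir` are illustrative names.

```r
# Remember where we started so we can go back afterwards.
old_dir <- getwd()

# tempdir() stands in for the folder that contains your CSV file (assumption).
data_dir <- tempdir()
setwd(data_dir)

# getwd() now reports the new working directory.
print(getwd())

# list.files() shows which CSV files R can reach without a full path.
print(list.files(pattern = "\\.csv$"))

# Restore the original working directory.
setwd(old_dir)
```

In your own session you would replace tempdir() with the path to your folder and usually skip the final setwd() call.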

Read the CSV File

Once you have your current working directory set, you can read the CSV file with this command:

image-176

In R code, we have this:

                > students_data <- read.csv("students_data.csv")              

💡 Tip: We assign it to the variable students_data to access the data of the CSV file with this variable. In R, we can separate words using dots ., underscores _, UpperCamelCase, or lowerCamelCase.

After running this command, you will see this in the top right panel:

image-177

Now you have a variable defined in the environment! Let's see what data frames are and how they are closely related to CSV files.
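
As mentioned in the tip about full paths, you can also read the file without changing the working directory. Here is a minimal sketch, assuming a small stand-in file written to a temporary folder so the example runs anywhere; `csv_path` is an illustrative name, and your real file would come from the Google Sheets download.

```r
# Create a small stand-in file (assumption: two sample rows only).
csv_path <- file.path(tempdir(), "students_data.csv")
writeLines(c(
  "first_name,last_name,age,num_siblings,num_pets,eye_color",
  "Emily,Dawson,15,2,5,BLUE",
  "Rose,Patterson,14,5,0,GREEN"
), csv_path)

# Passing the full path works regardless of the current working directory.
students_data <- read.csv(csv_path)

# head() prints the first rows (up to six) as a quick sanity check.
print(head(students_data))
```

file.path() builds the path with the right separator for your operating system, which avoids typing slashes by hand.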

🔸 Introduction to Data Frames

Data frames are the standard digital format used to store statistical data in the form of a table. When you read a CSV file in R, a data frame is generated.

We can confirm this by checking the type of the variable with the class function:

                > class(students_data)
                [1] "data.frame"

It makes sense, right? CSV files contain data represented in the form of a table and data frames represent that tabular data in your code, so they are deeply connected.

If you enter this variable in the interactive console, you will see the content of the CSV file:

                > students_data
                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE

More Information About the Data Frame

You have several different alternatives to see the number of variables and observations of the data frame:

  • Your first option is to look at the top right panel that shows the variables that are currently defined in the environment. This data frame has 5 observations (rows) and 6 variables (columns):
image-178
  • Another alternative is to use the functions nrow and ncol in the interactive console or in your program, passing the data frame as argument. We get the same results: 5 rows and 6 columns.
                > nrow(students_data)
                [1] 5
                > ncol(students_data)
                [1] 6
  • You can also see more information about the data frame using the str function:
                > str(students_data)
                'data.frame':	5 obs. of  6 variables:
                 $ first_name  : Factor w/ 5 levels "Alexander","Emily",..: 2 5 1 4 3
                 $ last_name   : Factor w/ 5 levels "Dawson","Navona",..: 1 3 5 2 4
                 $ age         : int  15 14 16 16 17
                 $ num_siblings: int  2 5 0 4 3
                 $ num_pets    : int  5 0 2 10 8
                 $ eye_color   : Factor w/ 3 levels "BLUE","BROWN",..: 1 3 2 3 1

This function (applied to a data frame) tells you:

  • The number of observations (rows).
  • The number of variables (columns).
  • The names of the variables.
  • The data types of the variables.
  • More information about the variables.

You can see that this function is really great when you want to know more about the data that you are working with.
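
A few other base R functions give similar information in smaller pieces. The sketch below is self-contained: it rebuilds the example data with read.csv(text = ...) so that no file on disk is needed, and it sets stringsAsFactors = FALSE explicitly (both are choices made for this example, not part of the original steps).

```r
# Rebuild the example data from text so the snippet runs on its own.
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# dim() returns the number of rows and columns together: 5 and 6 here.
print(dim(students_data))

# names() lists the variables (column titles).
print(names(students_data))

# sapply() applies class() to every column to show each data type.
column_types <- sapply(students_data, class)
print(column_types)

# summary() gives basic statistics for a numeric column.
print(summary(students_data$age))
```

Because stringsAsFactors is FALSE in this sketch, the text columns show up as "character" instead of "factor".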

💡 Tip: In R, a "Factor" is a qualitative variable, which is a variable whose values represent categories. For example, eye_color has the values "BLUE", "BROWN", "GREEN" which are categories, so as you can see in the output of str above, this variable is automatically defined as a "factor" when the CSV file is read in R. This automatic conversion is the default in R versions before 4.0.0. From R 4.0.0 onward, read.csv keeps text columns as character vectors unless you pass stringsAsFactors = TRUE.
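
To see what a factor looks like on its own, here is a small sketch that builds one by hand from the eye colors in the example data. The variable names `eye_color` and `eye_factor` are illustrative.

```r
# The eye colors of the five students, as plain text.
eye_color <- c("BLUE", "GREEN", "BROWN", "GREEN", "BLUE")

# factor() turns the text into a categorical variable.
eye_factor <- factor(eye_color)

# levels() lists the distinct categories, sorted alphabetically.
print(levels(eye_factor))      # "BLUE" "BROWN" "GREEN"

# Internally, each value is stored as the position of its level.
print(as.integer(eye_factor))  # 1 3 2 3 1

# table() counts how many observations fall in each category.
print(table(eye_factor))
```

The integer codes match the numbers printed at the end of the eye_color line in the str output above.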

🔹 Data Frames: Key Operations and Functions

Now you know how to see more information about the data frame. But the magic of data frames lies in the amazing capabilities and functionality that they offer, so let's see this in more detail.

How to Access a Value of a Data Frame

Data frames are like matrices, so you can access individual values using two indices surrounded by square brackets and separated by a comma to indicate which rows and which columns you would like to include in the result, like this:

image-181

For example, if we want to access the value of eye_color (column 6) of the fourth student in the data (row 4):

image-182

We need to use this command:

                > students_data[4, 6]

💡 Tip: In R, indices start at 1 and the first row with the names of the variables is not counted.

This is the output:

                [1] GREEN
                Levels: BLUE BROWN GREEN

You can see that the value is "GREEN". Variables of type "factor" have "levels" that represent the different categories or values that they can take. This output tells us the levels of the variable eye_color.

How to Access Rows and Columns of a Data Frame

We can also use this syntax to access a range of rows and columns to get a portion of the original matrix, like this:

image-179

For example, if we want to get the age and number of siblings of the third, fourth, and fifth student in the list, we would use:

                > students_data[3:5, 3:4]
                  age num_siblings
                3  16            0
                4  16            4
                5  17            3

💡 Tip: The basic syntax to define an interval in R is <start>:<end>. Note that these indices are inclusive, so the third and fifth elements are included in the example above when we write 3:5.

If we want to get all the rows or columns, we just omit the interval and include the comma, like this:

                > students_data[3:5,]
                  first_name last_name age num_siblings num_pets eye_color
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE

We did not include an interval for the columns after the comma in students_data[3:5,], so we get all the columns of the data frame for the three rows that we specified.

Similarly, we can get all the rows for a specific range of columns if we omit the rows:

                > students_data[, 1:3]
                  first_name last_name age
                1      Emily    Dawson  15
                2       Rose Patterson  14
                3  Alexander     Smith  16
                4       Nora    Navona  16
                5       Gino      Sand  17

💡 Tip: Notice that you still need to include the comma in both cases.
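
Ranges are not the only option: a vector built with c() lets you pick rows or columns that are not next to each other, and column names work in place of positions. A minimal self-contained sketch follows (the data is rebuilt from text and the result names are illustrative).

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# Rows 3 to 5, columns 3 to 4, selected by position.
by_range <- students_data[3:5, 3:4]

# The same portion, selecting the columns by name.
by_names <- students_data[3:5, c("age", "num_siblings")]

# Non-adjacent rows (1 and 5) and columns (1 and 6).
picked <- students_data[c(1, 5), c(1, 6)]

print(by_range)
print(by_names)
print(picked)
```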

How to Access a Column

There are three ways to access an entire column:

  • Option #1: to access a column and return it as a data frame, you can use this syntax:
image-184

For example:

                > students_data["first_name"]
                  first_name
                1      Emily
                2       Rose
                3  Alexander
                4       Nora
                5       Gino
  • Option #2: to get a column as a vector (sequence), you can use this syntax:
image-185

💡 Tip: Notice the use of the $ symbol.

For example:

                > students_data$first_name
                [1] Emily     Rose      Alexander Nora      Gino
                Levels: Alexander Emily Gino Nora Rose
  • Option #3: You can also use this syntax to get the column as a vector (see below). This is equivalent to the previous syntax:
                > students_data[["first_name"]]
                [1] Emily     Rose      Alexander Nora      Gino
                Levels: Alexander Emily Gino Nora Rose
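
The difference between the three options is easiest to see by checking what each one returns. This is a small self-contained sketch (data rebuilt from text with stringsAsFactors = FALSE, so the vectors are character vectors rather than factors).

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# Option #1: single brackets keep the data frame structure.
as_data_frame <- students_data["first_name"]

# Options #2 and #3: $ and double brackets return the column itself.
with_dollar <- students_data$first_name
with_double <- students_data[["first_name"]]

print(class(as_data_frame))                # "data.frame"
print(class(with_dollar))                  # "character"
print(identical(with_dollar, with_double)) # TRUE
```

Double brackets are handy when the column name is stored in a variable, since $ expects the name to be typed literally.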

How to Filter Rows of a Data Frame

You can filter the rows of a data frame to get a portion of the matrix that meets certain conditions.

For this, we use this syntax, passing the condition as the first element inside square brackets, then a comma, and finally leaving the second element empty.

image-190

For instance, to get all rows for which students_data$age > 16, we would use:

                > students_data[students_data$age > 16,]
                  first_name last_name age num_siblings num_pets eye_color
                5       Gino      Sand  17            3        8      BLUE

We get a data frame with the rows that meet this condition.

Filter Rows and Choose Columns

You can combine this condition with a range of columns:

                > students_data[students_data$age > 16, 3:6]
                  age num_siblings num_pets eye_color
                5  17            3        8      BLUE

We get the rows that meet the condition and the columns in the range 3:6.
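
Under the hood, the condition is just a vector of TRUE and FALSE values, one per row, and R keeps the rows marked TRUE. The sketch below makes that visible and, as a small extension beyond the examples above, combines two conditions with the & operator. The data is rebuilt from text and the variable names are illustrative.

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# The condition produces one TRUE or FALSE per row.
is_older <- students_data$age > 16
print(is_older)  # FALSE FALSE FALSE FALSE TRUE

# Rows marked TRUE are kept.
older_students <- students_data[is_older, ]
print(older_students)

# Two conditions joined with & must both be TRUE for a row to be kept.
green_and_16 <- students_data[students_data$age == 16 &
                              students_data$eye_color == "GREEN", ]
print(green_and_16)
```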

🔸 How to Modify Data Frames

You can modify individual values of a data frame, add columns, add rows, and remove them. Let's see how you can do this!

How to Change a Value

To change an individual value of the data frame, you need to use this syntax:

image-191

For example, if we want to change the value that is currently at row 4 and column 6, denoted in blue right here:

image-182

We need to use this line of code:

                students_data[4, 6] <- "BROWN"              

💡 Tip: You can also use = as the assignment operator.

This is the output. The value was changed successfully.

image-193

💡 Tip: Remember that the first row of the CSV file is not counted as the first row because it has the names of the variables.
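
The change can be checked from code as well. This is a minimal sketch that rebuilds the data from text (with stringsAsFactors = FALSE, an assumption made for this example), changes the cell, and reads it back.

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# Value before the change.
print(students_data[4, 6])

# Assign a new value to row 4, column 6.
students_data[4, 6] <- "BROWN"

# Value after the change; the other rows are untouched.
print(students_data[4, 6])
print(students_data$eye_color)
```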

How to Add Rows to a Data Frame

To add a row to a data frame, you need to use the rbind function:

image-194

This function takes 2 arguments:

  • The data frame that you want to modify.
  • A list with the data of the new row. To create the list, you can use the list() function with each value separated by a comma.

This is an example:

                > rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))

The output is:

                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     BROWN
                5       Gino      Sand  17            3        8      BLUE
                6       <NA>     Smith  14            7        3     BROWN

But wait! A warning message was displayed:

                Warning message:
                In `[<-.factor`(`*tmp*`, ri, value = "William") :
                  invalid factor level, NA generated

And notice the first value of the sixth row, it is <NA>:

                6       <NA>     Smith  14            7        3     BROWN              

This occurred because the variable first_name was defined automatically as a factor when we read the CSV file and factors have fixed "categories" (levels).

You cannot add a new level (value - "William") to this variable unless you read the CSV file with the value FALSE for the parameter stringsAsFactors, as shown below:

                > students_data <- read.csv("students_data.csv", stringsAsFactors = FALSE)              
image-196

Now, if we try to add this row, the data frame is modified successfully.

                > students_data <- rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))
                > students_data
                  first_name last_name age num_siblings num_pets eye_color
                1      Emily    Dawson  15            2        5      BLUE
                2       Rose Patterson  14            5        0     GREEN
                3  Alexander     Smith  16            0        2     BROWN
                4       Nora    Navona  16            4       10     GREEN
                5       Gino      Sand  17            3        8      BLUE
                6    William     Smith  14            7        3     BROWN

💡 Tip: Note that if you read the CSV file again and assign it to the same variable, all the changes made previously will be removed and you will see the original data frame. You need to add this argument to the first line of code that reads the CSV file and then make changes to it.
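
One detail worth noticing is that rbind returns a new data frame and leaves the original untouched until you assign the result. The sketch below illustrates this with the same new row; the data is rebuilt from text with stringsAsFactors = FALSE, and `extended` is an illustrative name.

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# rbind() builds a new data frame with the extra row at the end.
extended <- rbind(students_data, list("William", "Smith", 14, 7, 3, "BROWN"))

# The original still has 5 rows; the new data frame has 6.
print(nrow(students_data))
print(nrow(extended))

# The new row is complete because the text columns are not factors.
print(extended[6, ])
```

Assigning the result back to students_data, as in the example above, is what makes the change stick.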

How to Add Columns to a Data Frame

Adding columns to a data frame is much simpler. You need to use this syntax:

image-197

For example:

                > students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9, 3.0)

💡 Tip: The number of elements has to be equal to the number of rows of the data frame.

The output shows the data frame with the new GPA column:

                > students_data
                  first_name last_name age num_siblings num_pets eye_color  GPA
                1      Emily    Dawson  15            2        5      BLUE 4.00
                2       Rose Patterson  14            5        0     GREEN 3.50
                3  Alexander     Smith  16            0        2     BROWN 3.20
                4       Nora    Navona  16            4       10     GREEN 3.15
                5       Gino      Sand  17            3        8      BLUE 2.90
                6    William     Smith  14            7        3     BROWN 3.00
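
Here is a self-contained sketch of the same idea on the original five rows, so it uses only the first five GPA values. It also adds a column computed from an existing one (the `has_pets` name is made up for this example) and shows what happens when the number of elements does not fit the number of rows.

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# One value per row: five rows, five GPA values.
students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9)

# A new column can also be computed from an existing column.
students_data$has_pets <- students_data$num_pets > 0

print(students_data)

# Two values cannot be spread over five rows, so R raises an error.
mismatch <- tryCatch({
  students_data$too_short <- c(1, 2)
  "no error"
}, error = function(e) "error")
print(mismatch)
```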

How to Remove Columns

To remove columns from a data frame, you need to use this syntax:

image-198

When you assign the value NULL to a column, that column is removed from the data frame automatically.

For instance, to remove the age column, we use:

                > students_data$age <- NULL

The output is:

                > students_data
                  first_name last_name num_siblings num_pets eye_color  GPA
                1      Emily    Dawson            2        5      BLUE 4.00
                2       Rose Patterson            5        0     GREEN 3.50
                3  Alexander     Smith            0        2     BROWN 3.20
                4       Nora    Navona            4       10     GREEN 3.15
                5       Gino      Sand            3        8      BLUE 2.90
                6    William     Smith            7        3     BROWN 3.00
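
The sketch below repeats the NULL assignment on a fresh copy of the data and then shows one possible way to drop several columns at once by keeping only the names you want. The second step is an alternative added for illustration, and `to_drop` is a made-up name.

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# Assigning NULL removes a single column.
students_data$age <- NULL
print(names(students_data))

# To drop several columns, keep every column whose name is not in to_drop.
to_drop <- c("num_siblings", "num_pets")
students_data <- students_data[, !(names(students_data) %in% to_drop)]
print(names(students_data))
```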

How to Remove Rows

To remove rows from a data frame, you can use indices and ranges. For example, to remove the first row of a data frame:

image-200

The [-1,] takes a portion of the data frame that doesn't include the first row. Then, this portion is assigned to the same variable.

If we have this data frame and we want to delete the first row:

image-230

The output is a data frame that doesn't include the first row:

image-231

In general, to remove a specific row, you need to use this syntax where <row_num> is the row that you want to remove:

image-229

💡 Tip: Notice the - sign before the row number.

For example, if we want to remove row 4 from this data frame:

image-232

The output is:

image-233

As you can see, row 4 was successfully removed.
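
Since the screenshots are not reproduced here, this is a minimal sketch of the row removals described above, run on the original five-row data (rebuilt from text). The result names are illustrative, and each removal is stored in a separate variable so the steps can be compared.

```r
students_data <- read.csv(text = "first_name,last_name,age,num_siblings,num_pets,eye_color
Emily,Dawson,15,2,5,BLUE
Rose,Patterson,14,5,0,GREEN
Alexander,Smith,16,0,2,BROWN
Nora,Navona,16,4,10,GREEN
Gino,Sand,17,3,8,BLUE", stringsAsFactors = FALSE)

# A negative index keeps every row except the one indicated.
without_first <- students_data[-1, ]
without_fourth <- students_data[-4, ]

# A negative vector removes several rows in one step.
without_several <- students_data[-c(1, 4), ]

print(without_first)
print(without_fourth)
print(without_several)

# The remaining rows keep their original labels ("2" to "5" here).
print(rownames(without_first))
```

If you would rather have the labels start again at 1, assigning NULL to the row names with rownames(without_first) <- NULL resets them.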

🔹 In Summary

  • CSV files are Comma-Separated Values files used to represent data in the form of a table. These files can be read using R and RStudio.
  • Data frames are used in R to represent tabular data. When you read a CSV file, a data frame is created to store the data.
  • You can access and modify the values, rows, and columns of a data frame.

I really hope that you liked my article and found it helpful. Now you can work with data frames and CSV files in R.

If you liked this article, consider enrolling in my new online course "Introduction to Statistics in R - A Practical Approach".




Source: https://www.freecodecamp.org/news/how-to-work-with-data-frames-and-csv-files-in-r/
