Reshaping data.frame from wide to long format

In R, data are often stored in a wide format, where each row holds one subject (for example, a country) and repeated measurements of the same variable are spread across several columns (for example, one column per year). Many modeling and plotting functions instead expect a long format, where each individual measurement has its own row. In this article, we will explore how to reshape a data.frame from wide to long format.

Understanding the Problem

Let's first understand the problem. You have a data.frame that looks like this:


                Code    Country        1950    1951    1952    1953    1954
                AFG     Afghanistan    20,249  21,352  22,532  23,557  24,555
                ALB     Albania        8,097   8,986   10,058  11,123  12,246
            

In this wide format, each row represents a country and each column represents a year. The values in the cells represent some variable of interest. The goal is to reshape this data.frame into a long format, where each observation has its own row. The desired format is:


                Code    Country        Year    Value
                AFG     Afghanistan    1950    20,249
                AFG     Afghanistan    1951    21,352
                AFG     Afghanistan    1952    22,532
                AFG     Afghanistan    1953    23,557
                AFG     Afghanistan    1954    24,555
                ALB     Albania        1950    8,097
                ALB     Albania        1951    8,986
                ALB     Albania        1952    10,058
                ALB     Albania        1953    11,123
                ALB     Albania        1954    12,246
            

Reshaping the Data using melt()

One possible solution to reshape the data.frame from wide to long format is to use the melt() function from the reshape2 package. Let's see how you can do that:


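                # Install reshape2 once if it is not already available, then load it
                # install.packages("reshape2")
                library(reshape2)
                
                # Create a sample data.frame
                # check.names = FALSE keeps the year columns named "1950", "1951", ...
                # (by default data.frame() would rename them to X1950, X1951, ...)
                data <- data.frame(Code = c("AFG", "ALB"),
                                   Country = c("Afghanistan", "Albania"),
                                   "1950" = c(20249, 8097),
                                   "1951" = c(21352, 8986),
                                   "1952" = c(22532, 10058),
                                   "1953" = c(23557, 11123),
                                   "1954" = c(24555, 12246),
                                   check.names = FALSE)
                
                # Reshape the data.frame
                reshaped_data <- melt(data, id.vars = c("Code", "Country"),
                                      variable.name = "Year",
                                      value.name = "Value")
                
                # melt() returns Year as a factor; convert it to integer years
                reshaped_data$Year <- as.integer(as.character(reshaped_data$Year))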
            

In this code snippet, we first make sure the reshape2 package is installed and loaded. Then, we create a sample data.frame called data with the same structure as the table shown above. Finally, we use the melt() function to reshape the data.frame into the long format. The id.vars argument, c("Code", "Country"), names the identifier columns, which are kept as they are and repeated for every measurement. All remaining columns are treated as measured variables and stacked: variable.name = "Year" names the new column that holds the original column names, and value.name = "Value" names the new column that holds the cell values.
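To make the transformation itself more concrete, the following is a minimal base R sketch of the same stacking operation. It is not how reshape2 implements melt() internally. The names wide, id_cols, year_cols, and long are illustrative only, and the sample is cut down to two year columns to keep it short.

```r
# Wide sample data; check.names = FALSE keeps the year columns named "1950", "1951"
wide <- data.frame(Code = c("AFG", "ALB"),
                   Country = c("Afghanistan", "Albania"),
                   "1950" = c(20249, 8097),
                   "1951" = c(21352, 8986),
                   check.names = FALSE)

id_cols   <- c("Code", "Country")
year_cols <- setdiff(names(wide), id_cols)

# Repeat the identifier rows once per year column, then stack the values
# column by column so that each value lines up with its identifiers and year.
long <- data.frame(
  wide[rep(seq_len(nrow(wide)), times = length(year_cols)), id_cols],
  Year  = rep(as.integer(year_cols), each = nrow(wide)),
  Value = unlist(wide[year_cols], use.names = FALSE)
)

# Sort by country, then year, and reset the row names
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL
print(long)
```

The sort at the end is optional. It only arranges the rows country by country, as in the desired layout shown earlier.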

Reshaping the Data using reshape()

Another possible solution is to use the reshape() function from the stats package, which ships with R, so no additional installation is required. Here's how you can do that:


                # Create a sample data.frame
                # check.names = FALSE keeps the year columns named "1950", "1951", ...
                # (otherwise they become X1950, X1951, ... and varying would not match)
                data <- data.frame(Code = c("AFG", "ALB"),
                                   Country = c("Afghanistan", "Albania"),
                                   "1950" = c(20249, 8097),
                                   "1951" = c(21352, 8986),
                                   "1952" = c(22532, 10058),
                                   "1953" = c(23557, 11123),
                                   "1954" = c(24555, 12246),
                                   check.names = FALSE)
                
                # Reshape the data.frame
                reshaped_data <- reshape(data, idvar = c("Code", "Country"),
                                         varying = c("1950", "1951", "1952", "1953", "1954"),
                                         v.names = "Value",
                                         timevar = "Year",
                                         times = 1950:1954,  # integer years for the Year column
                                         direction = "long")
            

In this code snippet, we create the sample data.frame data with the same structure as before. Then, we use the reshape() function to reshape the data.frame. The idvar parameter, c("Code", "Country"), names the columns that identify each subject; their values are repeated in the reshaped data. The varying parameter lists the wide-format columns that hold the repeated measurements, here the year columns. The v.names parameter specifies the name of the single long-format column that will store those values. The timevar parameter specifies the name of the new column that records which measurement each row came from. The times parameter supplies the values that this column takes, one for each column listed in varying and in the same order. Finally, the direction parameter is set to "long" to indicate that we want the long format.
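The output of reshape() usually needs a little tidying before it matches the desired layout: the rows come back grouped by year rather than by country, and the row names are built from the identifier and time values. The sketch below shows one way to handle this, using a reduced sample with two year columns; the names wide, year_cols, and long are illustrative only.

```r
# Wide sample data; check.names = FALSE keeps the year columns named "1950", "1951"
wide <- data.frame(Code = c("AFG", "ALB"),
                   Country = c("Afghanistan", "Albania"),
                   "1950" = c(20249, 8097),
                   "1951" = c(21352, 8986),
                   check.names = FALSE)

year_cols <- c("1950", "1951")

long <- reshape(wide,
                idvar     = c("Code", "Country"),
                varying   = year_cols,
                v.names   = "Value",
                timevar   = "Year",
                times     = as.integer(year_cols),
                direction = "long")

# Sort by country, then year, and replace the composite row names
# with the default 1, 2, 3, ... sequence.
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL
print(long)
```

Deriving times from year_cols keeps the two arguments in step, which avoids a mismatch if the list of year columns changes.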

Conclusion

In this article, we discussed how to reshape a data.frame from wide to long format in R. We explored two possible solutions: using the melt() function from the reshape2 package and using the reshape() function that ships with R. The melt() call is more concise, while reshape() requires no additional package but takes more arguments. Reshaping data is an essential skill in data analysis, because many analysis and visualization tools expect data in long format.