Sometimes you want to convert variable names or text strings to, say, snake_case or CamelCase. This is a common issue when you’re importing spreadsheets or mixed data sources into R.
Variable names
I have a dataset about apprenticeship disputes in early modern London which I’d like to do something with. It’s an Excel spreadsheet from the UK Data Archive.
The column names have been written in ordinary language with spaces between words (“sentence case”), apostrophes and possibly other punctuation. A few columns don’t seem to have names at all.
[1] "ID"
[2] "number of petition"
[3] "catalogue order number"
[4] "forename of apprentice"
[5] "surname of apprentice"
[6] "forename of father"
[7] "father's occupation or status"
[8] "whether the father is a citizen of london"
[9] "parish"
[10] "county"
[11] "whether the father is dead"
[12] "forename of master"
[13] "surname of master"
[14] "whether the master is a citizen of london"
[15] "company or trade of master"
[16] "whether the master is dead"
[17] "other notes about the master"
[18] "the defendant if not the master"
[19] "date of the apprentice's binding"
[20] "...20"
These can be inconvenient to work with in R because you have to remember to wrap them in `backticks` whenever you refer to them, and there are a lot of columns in this spreadsheet. (Also, the spacing is inconsistent, which is hard to spot.) And they just bug me.
apprentices_xlsx |> select(`number of petition`, `catalogue order number`) |> head()
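One way to tidy names like these is clean_names() from the {janitor} package, which converts column names to snake_case by default. Here is a minimal sketch, assuming {janitor} and {dplyr} are installed. The small `apprentices_demo` data frame is a made-up stand-in for the real spreadsheet, not the actual data.

```r
library(janitor)

# Hypothetical stand-in for the imported spreadsheet: a few columns with
# the same kind of names (spaces, mixed case) as the real one.
apprentices_demo <- data.frame(
  "ID" = 1:2,
  "number of petition" = c(101, 102),
  "catalogue order number" = c("A1", "A2"),
  check.names = FALSE  # keep the awkward names exactly as written
)

# clean_names() rewrites the column names (snake_case by default)
apprentices_clean <- clean_names(apprentices_demo)

names(apprentices_clean)

# The cleaned names can be used without backticks
apprentices_clean |>
  dplyr::select(number_of_petition, catalogue_order_number) |>
  head()
```

The contents of the columns are untouched; only the names change.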
The clean_names() function from {janitor} is only intended for variable names, but sometimes you want to convert the contents of a variable instead. This might be so that you can use it to construct an ID, or because you’re joining data from different sources that use slightly different formats and you want a quick way to ensure both sides of the join match.
Solution: the {snakecase} package. Again, its to_any_case() function converts to snake_case by default, but other variations are available.
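A quick sketch of how this might look, assuming {snakecase} is installed. The `trades` vector is invented for illustration: three spellings of the same value, as you might find them in different sources.

```r
library(snakecase)

# Hypothetical values: the same trade written three different ways
trades <- c("Merchant Taylor", "merchant taylor", "MerchantTaylor")

# With no `case` argument, to_any_case() converts to snake_case
trades_snake <- to_any_case(trades)
trades_snake

# All three spellings now collapse to a single value,
# which is what you want for a join key or an ID
unique(trades_snake)

# Other cases are selected with the `case` argument
to_any_case(trades, case = "upper_camel")
```

Applying the same conversion to the key column on both sides of a join is one way to make slightly different formats line up.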
In one project I needed to convert text strings to what’s sometimes called kebab-case (like snake_case but hyphens instead of underscores). {snakecase} doesn’t have this as a ready-made function, but it does have a sep_out argument so it’s easy to change the separator.
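For example, something along these lines should produce kebab-case. It is a sketch assuming {snakecase} is installed, and the `titles` vector is made up for illustration.

```r
library(snakecase)

# Hypothetical strings to convert
titles <- c("Forename of Apprentice", "company or trade of master")

# Keep the default snake_case parsing, but join the words
# with a hyphen instead of an underscore
titles_kebab <- to_any_case(titles, sep_out = "-")
titles_kebab
```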