LibreOffice Calc – splitting contents of cells into multiple columns (e.g., splitting commas)

I periodically have to take a column of text in LibreOffice Calc that has names like “Lastname, Firstname” and split them into two columns. I figure it out every time, but then I forget how I did it. So, here’s a quick tutorial on how it’s done.

Open your spreadsheet with the cells that need to be split, like this:

Select the column that you want to split:

Then go up to Data -> Text to Columns:

You’ll get the following window:

This window gives you several options for splitting the cells: tabs, commas, semicolons, spaces, or a character of your choosing under “Other.” I selected just “Tab” and “Comma,” but I could also have selected “Space” to get rid of the extra space after the comma. However, I’m going to leave the extra space and show you one more function that can be useful in more complex situations. Once you’re done, hit “OK” and you’ll see your single column split into two:

If you want to get rid of the extra space, there is a LibreOffice Calc function for that. Click in cell C1 and then go up to Insert -> Function. You’ll see this window:

The function you want is in the Text category (use the dropdown menu) and is called TRIM. Simply TRIM the text in B1 (the formula in C1 is =TRIM(B1)) and it will get rid of the extra spaces:

When you’re done with your function, select “OK” and the extra spaces are gone from that cell. Drag the formula down the column and it will remove the extra spaces from every row:

You can then copy the new column without the spaces and use “Paste Special” to paste it over the old column, replacing the text that has the spaces. Just make sure you uncheck “Formulas” in the Paste Special dialog so that you get just the new text rather than the formulas:

Delete the column with formulas and you’re good to go.
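For comparison, the same split-and-trim can be done in R, which I cover in other posts on this blog. This is a minimal base-R sketch, not a LibreOffice feature. strsplit() plays the role of Text to Columns with “Comma” selected, and trimws() plays the role of TRIM. The function name split_names and the sample names are only for illustration. One difference is that trimws() strips only leading and trailing whitespace, while Calc’s TRIM also collapses runs of spaces between words.

```r
# Split "Lastname, Firstname" strings into two columns, the way
# Data -> Text to Columns does with "Comma" selected, then strip the
# leftover leading/trailing spaces the way TRIM() does.
split_names <- function(full_names) {
  parts <- strsplit(full_names, ",", fixed = TRUE)
  data.frame(
    last  = trimws(vapply(parts, `[`, character(1), 1)),
    first = trimws(vapply(parts, `[`, character(1), 2)),  # NA if no comma
    stringsAsFactors = FALSE
  )
}

people <- split_names(c("Cragun, Ryan", "Triplett, Tom"))
print(people)
```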

 

NY-Mount Marcy

The three of us at the summit marker just below the actual summit.

Summit Date

August 12, 2017 (around 11:00 am)

Party

Ryan Cragun, Mark Woolley, Tom Triplett

Trip Report

In my big swing across the US that allowed me to complete most of the highpoints in the Northeast in 2013, I didn’t manage to fit in Mount Marcy. It’s a solid day hike, and I just didn’t have the time. I ended up arranging a trip to Lake Placid, NY specifically to hike Mount Marcy, with my two hiking buddies.

We all flew into Newark on Friday, August 11th, picked up a rental car, then headed to Lake Placid, stopping in Albany for dinner and food to take up on our hike the next day. We arrived kind of late (close to 11:00 pm) and planned an early start the next morning (on the mountain at 7:00) in order to hopefully avoid the impending rain storm that was forecast for the next day.

The trip reports we read about the hike varied quite a bit. Some suggested it was really challenging, with a lot of uphill and rugged terrain. Others suggested it wasn’t that challenging and was a pleasant hike. Time estimates varied too: some trip reports suggested it would take as little as 4 hours, while others suggested as many as 15 (that’s a pretty big range). Mileage estimates also varied, though within a smaller range, hovering between 12 and 17 miles. Because of all the varied estimates, we planned for a 10- to 12-hour, 17-mile hike, just to be safe. As it turns out, thanks to my GPS-enabled watch, I now have much more accurate information on the hike.

We stayed at a B&B in Lake Placid, got up at 6:00 am, and drove straight to the Adirondack Loj. There is a parking fee there ($5.00), and by the time we arrived just before 7:00 am, the lot was getting pretty full. This is obviously a popular destination for hikers. We got our boots and gear on, did some stretching (a requirement once you hit 40), signed the register, and hit the trail.

We made good time for the first three miles or so, covering them in about an hour. The first three miles of the trail are fairly level and it is mostly a well-maintained dirt trail, with a few roots, rocks, and other small objects in the way. But around the 3-mile mark, there was a noticeable shift in the trail and terrain. Not only was there substantially more uphill terrain, but it became rocky to the point that at times you are literally boulder hopping.

Me on a nice patch of the more rugged terrain.

I’ve climbed a lot of mountains and was impressed with how rugged this trail got. This is not a trail you’d want to attempt in light tennis shoes (unless you’re an experienced trail runner). Sturdy boots are a very good idea for this hike, ideally with good ankle support. We didn’t make as good time on the remaining 4 miles to the summit, but we still did fairly well.

We reached the summit in just under 4 hours. When we arrived, the summit was completely enshrouded in clouds. We had no view whatsoever. We spent about 40 minutes on the summit, eating a little food and chatting with the forest ranger at the top, who was reminding people to stay off the vegetation, which they are trying to get to grow back.

The three of us at the summit marker just below the actual summit.

Alas, about 20 minutes after we dropped off the summit, the clouds broke and we finally had some nice views. It was at this point I took a photosphere:

We got better photos at this point, but we were still worried about the impending rain storm. The top of the mountain is largely exposed rock that wouldn’t be all that fun to ascend or descend in the rain, so we opted not to return to the summit and continued our descent. We stopped a few times on the way down to take advantage of the toilets along the trail and took a quick detour to the waterfall, which is also fairly close to the trail. With our detours and stops, we returned to the parking lot in just under 8 hours.

The distance on my watch indicated exactly 15 miles. So, there you have it – it is a 15-mile hike. Our average moving pace was 26 minutes per mile. If you know how quickly you can move on fairly rugged terrain, you should be able to estimate how long the hike will take you. We were passed by a couple who were clearly trail runners. They were the only ones moving more quickly than we were, and they probably did the entire hike in 6 1/2 hours. I can see how this hike could easily take 12 hours if you’re not an avid hiker in good shape. It is genuinely rugged terrain, particularly after the 3-mile mark, and you should be prepared for it.
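Here is a worked example using our numbers. Moving time is simply distance multiplied by pace:

```latex
t_{\text{moving}} = 15\ \text{mi} \times 26\ \frac{\text{min}}{\text{mi}} = 390\ \text{min} = 6.5\ \text{h}
```

The rest of our just-under-8-hour day, a little under an hour and a half, went to the summit break, the toilet stops, and the waterfall detour. Substitute your own pace on rugged terrain, then add your breaks, to get your own estimate.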

Obviously, if you can, try to go on a nice day. The views from the top are supposed to be quite nice. But even hiking in cloudy conditions, the terrain was pretty. We passed through multiple types of forest – pine and maple – and really enjoyed ourselves.

Panorama

Directions

R (Linux) – creating a wordcloud from PDF

On my professional website, I use wordclouds from the text of my publications as the featured images for the posts where I share the publications. I have used a website to generate those wordclouds for quite a while, but I’m trying to learn how to use the R statistical environment and knew that R can generate wordclouds. So, I thought I’d give it a try.

Here are the steps to generating a wordcloud from the text of a PDF using R.

First, in R, install the following four packages: “tm”, “SnowballC”, “wordcloud”, and “readtext”. This is done by typing the following into the R terminal:

install.packages("tm")
install.packages("SnowballC")
install.packages("wordcloud")
install.packages("readtext")

(NOTE: You may need to install the following packages on your Linux system using Synaptic or the command line before you can install the above packages: r-cran-slam, r-cran-rcurl, r-cran-xml, r-cran-curl, r-cran-rcpp, r-cran-xml2, r-cran-littler, python-pdftools, python-sip, python-qt4, libpoppler-dev, libpoppler-cpp-dev, libapparmor-dev.)

Next, you need to load those packages into the R environment. This is done by typing the following in the R terminal:

library(tm)
library(SnowballC)
library(wordcloud)
library(readtext)
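If you come back to this later on a machine where some of the packages are already installed, a small helper can install only the missing ones and then load all four. This is my own sketch, and install_and_load is a hypothetical name that is not part of any package. It uses only base R functions: installed.packages(), install.packages(), and library(). You still need the Linux system packages from the NOTE above for any package that has to be compiled.

```r
# Install whichever of the requested CRAN packages are not installed yet,
# then attach all of them. Returns (invisibly) the names it had to install.
install_and_load <- function(pkgs) {
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing)
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(missing)
}

# For this tutorial:
# install_and_load(c("tm", "SnowballC", "wordcloud", "readtext"))
```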

Before we begin creating the wordcloud, we have to get the text out of the PDF file. To do this, first find out where your “working directory” is. The working directory is where the R environment will be looking for and storing files as it runs. To determine your “working directory,” use the following function:

getwd()

There are no arguments for this function. It will simply return where the R environment is currently looking for and storing files.

You’ll need to put the PDF from which you want to extract data into your working directory or change your working directory to the location of your PDF (technically, you could just include the path, but putting it in your working directory is easier). To change the working directory, use the “setwd()” function. Like this:

setwd("/home/ryan/RWD")
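Before extracting anything, it can help to confirm that R actually sees the PDF where you think it is. Here is a minimal sketch using base R’s file.exists() and list.files(). pdf_in_wd is a hypothetical helper name, and “paper.pdf” is the example file name used below.

```r
# TRUE if the named file exists in `dir` (by default the working directory
# reported by getwd()), FALSE otherwise.
pdf_in_wd <- function(name, dir = getwd()) {
  file.exists(file.path(dir, name))
}

# Every PDF in the working directory (matches .pdf and .PDF)
list.files(getwd(), pattern = "\\.pdf$", ignore.case = TRUE)
pdf_in_wd("paper.pdf")
```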

Once you have your PDF in your working directory, you can use the readtext package to extract the text and put it into a variable. You can do that using the following command:

wordbase <- readtext("paper.pdf")

“wordbase” is a variable I’m creating to hold the text from the PDF. The variable is actually a data frame (data.frame) with two columns and one row. The first column is the document ID (e.g., “paper.pdf”); the second column is the extracted text. You can see what kind of variable it is using the command:

print(wordbase)

This gives you the following information:

readtext object consisting of 1 document and 0 docvars.
# data.frame [1 × 2]
  doc_id    text
  <chr>     <chr>
1 paper.pdf "\"      \"..."

R won’t show you all of the text in the text column as it is likely quite a bit of text. If you want to display all the text (WARNING: It may be a lot of text), you can do so by telling R to display the contents of that cell of the data frame, which is row 1, column 2:

wordbase[1, 2]
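A gentler alternative to printing everything is to look at just the first few hundred characters. This is a small base-R sketch. preview_text and its 500-character default are my own choices and are not part of readtext.

```r
# Return the first `n` characters of a string, with "..." appended when
# the text was cut short.
preview_text <- function(text, n = 500) {
  if (nchar(text) <= n) return(text)
  paste0(substr(text, 1, n), "...")
}

# With the readtext result from above:
# cat(preview_text(wordbase$text[1]))
```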

“readtext” is the package that extracts the text from the PDF. The readtext package is robust enough to extract text from numerous document types (see here) and can even determine what kind of document it is from the file extension. In this case, it recognizes that it’s a PDF.

The text can now be converted into a corpus, which is tm’s term for a collection of text documents (see here for the different data types in R). To do this, we use the following function:

corp <- Corpus(VectorSource(wordbase$text))

In essence, we’re creating a new variable, “corp,” by using the Corpus function on the output of the VectorSource function. VectorSource treats each element of the text column of “wordbase” (here, the text of our one PDF) as a separate document.
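It matters what you hand to VectorSource(). It makes one document per element of its input, and in R a data frame is a list of its columns. Here is a quick base-R check with a stand-in for the readtext result that shows the difference between the whole data frame and its text column. The sample text is made up.

```r
# Stand-in for the readtext() result: one row, doc_id and text columns.
wordbase <- data.frame(doc_id = "paper.pdf",
                       text = "Text extracted from the PDF would go here.",
                       stringsAsFactors = FALSE)

length(wordbase)       # 2: one element per column (doc_id and text)
length(wordbase$text)  # 1: one string per document
```

So VectorSource(wordbase) would see two elements, the file-name column and the text column. VectorSource(wordbase$text) sees one document per PDF.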

We’re close to having the words ready to create the wordcloud, but it’s a good idea to clean up the corpus with several commands from the “tm” package. First, we want to make sure the corpus consists of plain text documents:

corp <- tm_map(corp, PlainTextDocument)

Next, since we don’t want any of the punctuation included in the wordcloud, we remove the punctuation with this function from “tm”:

corp <- tm_map(corp, removePunctuation)

For my wordclouds, I don’t want numbers included. So, use this function to remove the numbers from the corpus:

corp <- tm_map(corp, removeNumbers)

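I also want all of my words in lowercase. Base R’s tolower() does this, but because it isn’t a tm function, it’s safest to wrap it in tm’s content_transformer(). With some versions of tm, passing tolower directly replaces the documents with plain character strings and breaks the later steps:

corp <- tm_map(corp, content_transformer(tolower))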

Finally, I’m not interested in words like “the” or “a”, so I removed all of those words using this function:

corp <- tm_map(corp, removeWords, stopwords('english'))
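To see what these cleaning steps do to a piece of text without loading tm, here is a rough base-R imitation. It is an approximation, not tm’s code. The regular expressions and the short stopword vector in the example are my assumptions, and tm’s own stopwords('english') list is much longer (you could pass it in as stop_words when tm is loaded). The function, word_freq (a hypothetical name), also counts the words and returns a data frame of words and frequencies.

```r
# Rough base-R imitation of the tm cleaning steps above, followed by a
# word count. The regular expressions approximate removePunctuation and
# removeNumbers; they are not tm's exact implementation.
word_freq <- function(text, stop_words = character(0)) {
  text <- gsub("[[:punct:]]+", "", text)   # like removePunctuation
  text <- gsub("[[:digit:]]+", "", text)   # like removeNumbers
  text <- tolower(text)                    # like content_transformer(tolower)
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[words != "" & !(words %in% stop_words)]  # like removeWords
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

sample_text <- "The 2 cats and the dog: cats, dogs, and more cats."
res <- word_freq(sample_text, stop_words = c("the", "and", "a"))
print(res)
```

The word and freq columns can also be passed directly to the wordcloud function as wordcloud(words = res$word, freq = res$freq).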

At this point, you’re ready to generate the wordcloud. The basic command below draws the wordcloud in a plotting window, so you’d have to take a screen capture to turn it into an image. Saving it straight to a file is covered further down. Here is the basic command:

wordcloud(corp, max.words = 100, random.order = FALSE)

To explain the command, “wordcloud” is the package and function. “corp” is the corpus containing all the words. The other components of the command are parameters that can, of course, be adjusted. “max.words” can be increased or decreased to reflect the number of words you want to include in your wordcloud. “random.order” should be set to FALSE if you want the more frequently occurring words to be in the center with the less frequently occurring words surrounding them. If you set that parameter to TRUE, the words will be in random order, like this:

There are additional parameters that can be added to the wordcloud command. These include a scale parameter (scale) that sets the size range from the most to the least frequent words, a minimum frequency parameter (min.freq) that limits the plotted words to those that occur at least a certain number of times, and a parameter for the proportion of words that should be rotated 90 degrees (rot.per). Other parameters are detailed in the wordcloud documentation here.

One of the more important parameters that can be added is color (colors). By default, wordclouds are black letters on a white background. If you want the word color to vary with frequency, you need to create a variable that tells the wordcloud function which colors to use: a set number of colors drawn from a color palette. A number of color palettes are pre-defined in R (see here). Here’s a sample command that uses brewer.pal() from the RColorBrewer package, which is loaded along with wordcloud, to create a color variable:

color <- brewer.pal(8,"Spectral")

The arguments in the parentheses are, first, the number of colors desired (8 in the example above) and, second, the name of the palette from the list noted above. Generating the wordcloud with the color palette applied involves adding one more argument to the command:

wordcloud(corp, max.words = 100, min.freq=15, random.order = FALSE, colors = color, scale=c(8, .3))

Finally, if you want to output the wordcloud as an image file, you can adjust the command to generate the wordcloud as, for instance, a PNG file. First, tell R to create the PNG file:

png("wordcloud.png", width=1280,height=800)

The text in quotes is the name of the PNG file to be created, and the width and height arguments set its size in pixels. Then create the wordcloud with the parameters you want:

wordcloud(corp, max.words = 100, random.order = FALSE, colors = color, scale=c(8, .3))

And, finally, close the PNG device with this function, which finishes writing the wordcloud to the file:


dev.off()
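If the wordcloud call fails partway through the png()/wordcloud()/dev.off() sequence, the PNG device stays open, and later plots end up in that file. One way to guard against this is a small wrapper that always calls dev.off() through on.exit(). This is a sketch. save_png is a hypothetical name, while png(), dev.off(), and on.exit() are base R.

```r
# Open a PNG device, run the plotting function, and close the device even
# if the plotting code throws an error.
save_png <- function(file, draw, width = 1280, height = 800) {
  png(file, width = width, height = height)
  on.exit(dev.off(), add = TRUE)
  draw()
  invisible(file)
}

# For the wordcloud from this post:
# save_png("wordcloud.png", function() {
#   wordcloud(corp, max.words = 100, random.order = FALSE,
#             colors = color, scale = c(8, .3))
# })
```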

If all goes according to plan, you will have created a PNG file with a wordcloud of your cleaned up corpus of text:

 

 

NOTE:

To remove specific words, use the following command. Make sure you have converted all your text to lowercase first, because the matching is case-sensitive:

corp <- tm_map(corp, removeWords, c("hello","is","it"))

Or, to replace specific characters (here “/”, “@”, and “|”) with spaces, use this series of functions:

toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
corp <- tm_map(corp, toSpace, "/")
corp <- tm_map(corp, toSpace, "@")
corp <- tm_map(corp, toSpace, "\\|")
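The double backslash in the last toSpace call above is needed because gsub() treats its pattern as a regular expression. In a regular expression, “|” means “or,” so it has to be escaped to match a literal pipe. Here is the same substitution applied to a plain string in base R. to_space and the sample string are hypothetical.

```r
# Replace every match of a regular-expression pattern with a space,
# which is what toSpace does inside each tm document.
to_space <- function(x, pattern) gsub(pattern, " ", x)

s <- "name@example.com|notes/2017"
s <- to_space(s, "/")    # "/" is an ordinary character in a regex
s <- to_space(s, "@")    # so is "@"
s <- to_space(s, "\\|")  # "|" must be escaped to mean a literal pipe
print(s)
```

Another option is gsub("|", " ", x, fixed = TRUE), which turns off regular-expression matching altogether.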

Source Information:

Reading PDF files into R for text mining
Building Wordclouds in R
Word cloud in R
Removing specific words
Text Mining and Word Cloud Fundamentals in R
Basics of Text Mining in R

 

For Advanced Wordcloud Creation:

There is another package that allows for some more advanced wordcloud creations called “wordcloud2.” It allows for the creation of wordclouds that use images as masks. Currently, the package has problems if you install it from the CRAN servers, but it works if you install it directly from the GitHub source. Here’s how to do that:

install.packages("devtools")
library(devtools)
devtools::install_github("lchiffon/wordcloud2")
library(wordcloud2)
# demoFreq is sample word-frequency data that ships with wordcloud2
letterCloud(demoFreq, "R")

You can then use the “wordcloud2” package to create all sorts of nifty wordclouds (see directions here and here). See also here for converting your corpus into a data frame of words and their frequencies, which is what you need for these fancy wordclouds. The result looks like this:

NOTE: Color options for wordcloud2 are any CSS colors. See here for a complete list.
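For the conversion step, wordcloud2 needs a data frame with the words in the first column and their frequencies in the second (its demoFreq sample data has word and freq columns). Below is a sketch of one way to build that from a term-document matrix. tdm_to_freq is a hypothetical name. The small matrix is made-up data standing in for as.matrix(TermDocumentMatrix(corp)) from tm.

```r
# Collapse a term-document matrix (terms in rows, documents in columns)
# into a word/frequency data frame sorted from most to least frequent.
tdm_to_freq <- function(m) {
  freqs <- sort(rowSums(m), decreasing = TRUE)
  data.frame(word = names(freqs), freq = unname(freqs),
             stringsAsFactors = FALSE)
}

# Made-up counts: one row per term, one column per document
m <- matrix(c(3, 2,
              1, 1,
              0, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("summit", "trail", "forest"),
                            c("doc1", "doc2")))
freq_df <- tdm_to_freq(m)
print(freq_df)

# With tm and wordcloud2 loaded:
# wordcloud2(tdm_to_freq(as.matrix(TermDocumentMatrix(corp))))
```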

R (Linux) – basic installation

Installing the R programming environment on Linux is pretty straightforward, but it does require a little know-how to find the correct packages. As is typically the case with Linux, there are multiple ways to get things done. I like to use Synaptic for installing and removing software, but you can also use the software manager that comes with your Linux distribution (in Linux Mint it’s called Software Manager) or the command line (in KDE-based distributions, Konsole).

For the most up-to-date installation of R, it’s actually best to install directly from the R repository. A list of Linux repositories for the R environment is located here. In order to install from the repository, you need to update your list of repositories in Synaptic. To access your repository list in Synaptic, click on Settings -> Repositories.

In the new Software Sources window, click on “Additional repositories” and you’ll get this window:

Click on Add a new repository. You’ll get this window:

The exact information you put into that window will vary based on which mirror you chose. Here is what I added in mine:

deb https://cran.cnr.berkeley.edu/bin/linux/ubuntu/ xenial/

In order to ensure you have the right files and to follow best security practices, you should install the signing key as well. Directions for installing the signing key are found here, but it can be done with a simple command from a terminal:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

Once you have done all of that, you can install R from Synaptic.

First, open Synaptic, which will require your password. You’ll get the basic Synaptic Package Manager window:

Next, in the search box, search for “r-base”. Right-click it and select “Mark for installation” to install “r-base”:

In the above screenshot, I have already installed r-base, so the option “Mark for installation” is greyed out. When you select it, Synaptic will automatically select all the other packages R needs to run (more than a dozen: r-cran-class, r-cran-lattice, r-cran-spatial, r-cran-survival, r-cran-codetools, r-cran-nnet, r-cran-mass, r-cran-boot, r-cran-nlme, r-cran-rpart, r-cran-cluster, r-cran-kernsmooth, r-cran-foreign, r-cran-mgcv, r-cran-matrix, r-recommended, r-base-core).

If you plan on installing any other R packages, it’s not a bad idea to also install “r-base-dev,” as it helps fill in dependencies for other packages.

Once you’ve selected r-base, hit Apply in Synaptic and all the software will be installed.

You now have the base software for R installed.

To open the R environment in a terminal, launch a terminal and simply type “R” at the prompt, like this:

Here’s where things can get a little complicated. To do different things in R requires various libraries or packages. Some of these can be installed using the R terminal while others need to be installed from your Linux distribution’s repositories. To install a library or package using the R terminal, you use the following command once you have opened the R environment:

install.packages("PACKAGENAME")

The first time you run this, the R environment will ask you to select a mirror.

Choose one close to your location. R will then install the package, assuming you type everything correctly.
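If you’d rather not be asked for a mirror each time, you can set a default CRAN repository with options(repos = ...). A common place for this is a ~/.Rprofile file, which R runs at startup. This is a sketch. https://cloud.r-project.org is CRAN’s own cloud mirror, and you can substitute any mirror URL from CRAN’s mirror list.

```r
# Use a fixed CRAN mirror so install.packages() doesn't prompt for one.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the setting took effect
getOption("repos")
```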

NOTES:

Before you start trying to install additional R packages, it’s a very good idea to install the following Linux packages:

r-base-dev build-essential

If you run into an error message, there are several possibilities. First, check to make sure you typed everything correctly. R is not forgiving on spelling mistakes. Second, if the error is something like:

installation of package ‘PACKAGENAME’ had non-zero exit status

Or

dependency ‘PACKAGENAME’ is not available

There is a good chance that you need to install a package or library using Synaptic (or from a terminal using apt). For instance, installing the “tm” package can fail because of an unsatisfied dependency, meaning a library or package that needs to be installed but that the R installer can’t install. The dependency is the ‘slam’ package. This can be installed using Synaptic (or, from a terminal, with the command “sudo apt-get install r-cran-slam”). Once you’ve installed the dependency, try re-installing the package and the error messages should go away.

 

NOTES:

I also have found that I like RStudio as an IDE for working with R. It’s a little bit friendlier to use than a straight command line interface as it keeps track of variables and loaded libraries. The personal version for your desktop can be downloaded here.

And a note on RStudio on Linux: I regularly find that the cursor is displayed offset from where it actually is in the command window. It turns out this is a font issue. If you go to Tools -> Global Options -> Appearance and change the font to any other font, the problem goes away.

Switzerland – remaining adventures

I was attending my conference July 4th through the 6th, but skipped out on the last day of the conference (July 7th) to go see CERN (the home of the Large Hadron Collider). Debi, Toren, and Rosemary, meanwhile, had a number of adventures. They took the chocolate train through various parts of Switzerland, visiting the Gruyere cheese factory, the Gruyere castle, and the Maison Cailler chocolate factory.

Here’s a video Debi shot of the chocolate extruding and packaging process at Maison Cailler:

Amazingly, they took a picture in front of the Giger Museum, but didn’t know what it was and didn’t go in (I’ve got to go back just for that).

Toren in front of the Giger Museum.

They also took a boat ride from Montreux to Lausanne one day while I was at my conference:

I did sneak in a visit to the Chillon Castle before my conference started one day:

Debi, Rosemary, and Toren at Chillon Castle.

I didn’t get to see the whole castle as I had to make it to my conference in time for the first session that day, but I got to see some of the castle. Again, I’ll have to go back.

The one day of the conference I did skip was so we could go to CERN. Getting tickets was a bit of a nightmare: they have to be reserved in advance, go on sale at 8:00 am Swiss time, and are usually gone within minutes. Debi and I spent a few days getting up just before 2:00 am so we could get tickets and eventually got four for the last day of my conference.

You obviously don’t get to go down into the actual collider, which is about 90 meters below ground, but they do give you a tour of a control center and show you some old colliders, like this one, where Toren was pushing the self-destruct button:

Toren pushing the "self-destruct" button on an old collider. It was a red button with no label, so I told him it was the self-destruct button and he immediately proceeded to push it.

The tour starts at the welcome center, where they have a nice museum, and then works its way around the campus. We went into a control center, watched a video about particle accelerators, and then got to go into where the original collider is at CERN (from the 1950s; very cool presentation there). There is another museum across the street from the main welcome center, as well as numerous monuments. Here’s a photo in front of one of those monuments:

Rosemary, Toren, and Debi by a monument at CERN.

We also found a little time to stop by the Reformation Wall in Geneva, which is a monument to the Protestant Reformation. We didn’t stay long as we had to get to CERN on time and this happened to be kind of on the way. Here’s a photosphere of the Wall:

I also had Toren pose as though he was each of the individuals remembered by the monument. Here’s one of those photos:

Toren posing as figures in the Reformation Wall.

While Toren played at the park near the Reformation Wall and Rosemary watched him, Debi and I jogged up to a nearby church where John Calvin used to preach, where she got a picture of me trying to gain entry:

No one answered. I guess no one is home?!? ;)

Before heading back for our last night in Switzerland, we stopped for a brief walk around downtown Geneva and got to see the Jet d’Eau and try out some more Swiss chocolate.

Toren and Rosemary at the Jet d'Eau in Geneva.

We then caught a train back to Lausanne to pack up for our flight home the next day.