Linux Server – tar files and directories

I recently had to move all my websites from one virtual private server (VPS) to another. When I only had a few such websites, I was okay with using SFTP (via Filezilla) to download all of the files and then upload them to the VPS. It took a while but I was okay with that. With about a dozen websites I host on my VPS, that was just not an option. It was time to finally try to figure out how to use tar more effectively on my server.

Why tar? Tar is similar to zip in that it combines lots of files and/or directories into a single file, a tarball or archive, which makes it much easier and way faster to move. Computers can move one large file faster than they can move lots of small files that, combined, make up the same size. When you have to move roughly 8 gigabytes and hundreds of thousands of files, it is far easier to do so by putting all of those files into a single tarball than moving the files individually. That’s why tar is way better for what I was trying to accomplish.

Sidenote: Tar is just a format for packaging all of the files/directories together. Typically, the files are also compressed using something like gzip, leading to a tarball with the extension “tar.gz.” It is possible to just combine the files without compressing them as will be detailed below.

Why was I so reticent to use tar to move my files? Because my prior experience with tar had resulted in several tarbombs, which are a nightmare. Basically, I had unpacked a tarball into the wrong directory, which resulted in thousands of files being in the wrong place, necessitating me having to figure out which files I should keep and which I should get rid of individually. That took more time than I saved by using tar. And since I was doing everything via SSH on a remote VPS, there was no easy way to clean up the mess. Even so, it was time to use tar. So, I bit the bullet and figured out how to do this. This post is my guide on how to carefully use tar but avoid tarbombs, which suck. Note, GNU has an entire manual on tar that goes into much greater detail than my post.

What do all those letters mean?

Since we’re using a console or terminal to package all these files and not a GUI, we have to specify what we want to do with the files using some letters. That’s what the letters do. Not to worry, though – I’m not going to go through all of them. There are literally dozens of options. I’ll keep this simple. The basic structure of the tar command is as follows:

tar [letters to specify what to do] [output tarball] [files or directories to add]

I’ll give some specific examples of tar commands below, but, first, let’s cover what the letters we’ll be using mean.

  • c – “create”: This tells tar to create an archive, in contrast to modifying a tarball. (Note: This letter alone cannot create an archive.)
  • f – “file”: This tells tar to modify files relative to the archive. (Note: Just c and f can work to create a tarball as will be shown below.)
  • v – “verbose”: This tells tar to show the progress of the archiving process by showing which files have been added to the archive.
  • z – “gzip”: This tells tar to compress the files in the tarball using gzip.
  • j – “bzip2”: This tells tar to compress the files in the tarball using bzip2.
  • x – “extract”: This tells tar to extract the files in a tarball.

A couple of important notes here regarding the letters. First, the order of the letters does not matter. Second, letters that do the same things should not be included in the command (e.g., “z” and “j” should not both be used).

Basic Examples:

Creating Tarballs

I put a couple of my papers and some images into a folder to use to demonstrate basic uses of tar. Here’s a screenshot so you can see what we’re working with:

First up, I’ll create a tarball of the entire test folder but with no compression and use that to show a couple of important elements of the process. Here’s the code:

cd /home/ryan/Desktop
tar cf test.tar tar.test.directory/

The code above changes my directory (“cd”) to the parent folder (Desktop). Then it calls the “tar” software and tells it to create an archive called “test.tar” and file inside it (“cf”) the entire test directory “tar.test.directory/”. What this code doesn’t do is compress the files. This can be seen when comparing the size of the folder – 5.4 mb – with the size of the newly created tarball – 5.4 mb.

Quickly, try switching the order of the letters, from “cf” to “fc” and you’ll see that the outcome is the same. Also, if that is all you switch in the code, you’ll notice if you re-run the same command that tar will not warn you that it is going to overwrite the previous tarball, it simply does it.

One more item to note that is actually really important, particularly when thinking about extracting an archive, is the folder structure inside the archive. If I open the archive using Ark, you’ll see that, because I navigated to the parent directory in my terminal before creating the tarball, the folder structure inside the archive is from the folder where I created the tarball (in this case, the Desktop directory).

I’m going to create the same tarball but I’m not going to navigate to the parent folder and instead will tell tar which folder to compress and where to store the new tarball:

tar cf /home/ryan/Desktop/test.tar /home/ryan/Desktop/tar.test.directory/

Functionally, this is the exact same command and archives all the same content. However, look at the directory structure inside the tarball when I open it with Ark:

That tar creates a different directory structure depending on the code you use is important, particularly if you want to avoid tarbombing. Why is this important? When you extract the tarball, the same directory structure that is in the archive will be created. If you don’t know what the directory structure is inside the tarball and extract it, that can result in all sorts of problems, some of which I’ll address at the end of this post.

The lesson here: the folder structure is based on what you enter into the command to create the tarball. So, if you don’t want lots of folders in your tarball, navigate to the folder above it and create it there. Otherwise, you’ll get the same folder structure.

Let’s add two additional letters: “v” for verbose so we can see what is added to the tarball and “z” to compress the files that are being added to the tarball. Here’s the code (remember, the order of the letters doesn’t matter):

tar cfvz test.compressed.tar.gz tar.test.directory/

This command calls the tar software, tells it to create (“c”) a tarball and add files (“f”), show the progress (“v”) and compress the files (“z”), saving the resulting tarball as “test.compressed.tar.gz” and the last piece is what should be put into the tarball. Note the modification to the file name to indicate that the tarball has been compressed – “.gz”. This extension is usually a reflection of the compression format. So, if you go with bzip2, it would be “.bz2” instead of “.gz”. Here’s how it looks in the terminal.

The resulting tarball is only 3.4 mb, illustrating that the contents were compressed as they were added to the archive.

Variations on the above command might include replacing the “z” with “j” to compress using bzip2 or “J” to compress using “xz.” Additionally, appending a “p” to the letters will preserve the permissions of the files and directories that are added to the tarball (though that is done by default, so it isn’t necessary to include it).

Modifying a Tarball

After creating a tarball, it’s possible you may need to change the tarball by either adding files to it or deleting files inside it. Here’s how to do each of those.

To add a file to a tarball, use “r” and, of course, “f”, like this:

tar rf test.tar file-to-add.odt

This command calls the tar software and tells it to append (“r”) a file to the archive (the “f” is necessary to tell the software to make changes). The tarball that is modified is next “test.tar.” And what is to be added is last “file-to-add.odt.”

Assuming the tarball you want to add a file to is compressed, files cannot be added. Attempting this will likely give you the error, “tar: Cannot update compressed archives.” Instead, you would need to extract the archive, make whatever changes you want, then create a new compressed tarball.

It is also possible to delete a file from a tarball. This doesn’t involve a letter but a word, “–delete.” The structure of the command is a bit different. Here’s what your code might look like:

tar --delete --file=test.tar file-to-delete.odt

This command calls the tar software and tells it you want to delete a file “–delete.” You then have to tell it which tarball and which file, which is done with the “–file=” option, which specifies the tarball and then the file is added at the end.

If you aren’t sure what files you have in your tarball, you can always list those using the “–list” command:

tar --list --file=test.tar

This is particularly helpful if you are looking for a specific file to remove from a tarball as it will also tell you if the file is in a subfolder inside the archive. If so, you would need to modify the code to take that into account:

tar --delete --file=test.tar "folder1/folder2/folder with space/file-to-delete.odt"

Do note that, just like adding files to a compressed tarball, deleting files from a compressed archive isn’t possible.

Extracting a Tarball

Here’s how to extract the files inside a tarball. The basic structure is the same, though, there are some things to consider. First, to extract a tarball, replace the “c” from above when creating it with an “x” which means “extract.” So, with our uncompressed tarball, we would extract it by using the following command:

tar xf test.tar

This will call the tar software, the “x” tells it to “extract” the files (“f”) and “test.tar” is the name of the tarball that is being extracted. Note that this code doesn’t specify where to extract the tarball. That can be added as an option that requires adding “-C”. If it’s left blank, the tarball will be extracted in whichever directory the console/terminal is in. Extracting to a specific directory would like like this:

tar xf test.tar -C /extract/into/this/directory

Of course, if your tarball is compressed and/or you want to see the progress of extracting the tarball, add the corresponding letters. If the tarball is a “tar.gz” you’ll need to add the “z” to decompress the files and “v” to see the progress, like this:

tar xfzv test.tar.gz

Lastly, keep in mind that the folder structure inside the tarball will be replicated when the files are extracted. What does that mean? If you extract a tarball into a folder called “home” but inside the tarball the files you want to extract are stored inside nested folders like “home/ryan/Desktop/archive”, when you extract the tarball, the files will end up in “home/home/ryan/Desktop/archive.” See below for what to do in such a situation.

Rules for Using tar:

Rule #1: Pay attention to the folder or directory structure when creating and extracting a tarball.

Rule #2: You cannot add or delete files inside a compressed tarball, only in an uncompressed tarball.

Bonus:

Let me give a specific situation that will illustrate this for a Linux server (I may be speaking from experience here). Imagine you just downloaded the latest version of WordPress and want to extract it into the following directory: /var/www/example.com/public/. First, check the directory structure of the tarball. In this case, the fine folks who package WordPress archive the files inside a folder called “wordpress.” As a result, when you extract the tarball, it is going to create a folder inside the “public” folder called “wordpress” when, in fact, you want the files to be directly stored into the folder “public.” That’s a problem. How do you move those files? The Linux move command, “mv” can do it, but it’s kind of tricky how:

mv /var/www/example.com/public/wordpress/* .

This tells the OS to move all the files in the wordpress directory (the asterisk “*” does this) up one level (the period “.”). Once you do this, you should then remove the wordpress directory:

rm -R wordpress/

 591 total views,  3 views today

Long Term Storage of Gmail/Email using Mozilla’s Thunderbird (on Linux)

I have email going back to 2003. There have been times when I have actually benefited from my email archive. Several times, I have gone back 5 or more years to look for a specific email and my archive has saved me. However, my current approach to backing up my email is, let’s say, a borderline violation of Google’s Terms of Service. Basically, I created a second email account that I use almost exclusively (not exclusively – I use it for a couple of other things) for storing the email from my primary Gmail account. However, Google has recently changed its storage policies for their email accounts, which has made me a little nervous. And, of course, I’m always wary about storing my data with Google or other large software companies.

Since I have already worked out a storage solution for old files that is quite reliable, moving my old email to that storage solution makes sense. (FYI, my storage solution is to run a local file server in my house with a RAID array so I have plenty of space and local backups. Certain folders on the file server are backed up in real-time to an online service so I also have a real-time offsite backup of all my important files. In other words, I have local and offsite redundancy for all my important files.)

I’m also a fan of Open Source Software (OSS) and would prefer an OSS solution to any software needs I have. Enter Mozilla’s Thunderbird email client. I have used Thunderbird for various tasks in the past and like its interface. I was wondering if there was a way to have Mozilla archive my email in a way that I can easily retrieve the email should I need to. Easily might be a bit of a stretch here, but it is retrievable using this solution. And, it’s free, doesn’t violate any terms of service, and works with my existing data backup approach.

So, how does it work? And how did I set it up?

First, install Thunderbird and set up whatever online email account you want to backup. I’m not going to go through those steps as there are plenty of tutorials for both of them. I’m using a Gmail account for this.

Once you have your email account set up in Thunderbird, click on the All Mail folder (assuming it’s a Gmail account) and let Thunderbird take the time it needs to synchronize all of your email locally. With the over one hundred thousand emails I had in my online archive, it took the better part of a day to do all the synchronizing.

I had over 167,000 emails in my online archive account.

Once you’ve got your email synchronized, right-click on “Local Folders” and select “New Folder.” I called my new folder “Archive.”

Technically, you could store whatever emails you want to store in that folder. However, you’ll probably want to create a subfolder in that folder with a memorable name (e.g., “2015 Work”). Trust me, it will be beneficial later. I like to organize things by year. So, I created subfolders under the Archive folder for each year of emails I wanted to back up. You can see that I have a folder in the above image for the year 2003, which is the oldest email I have (that was back when I had a Hotmail address… I feel so dirty admitting that!).

The next step is weird, but it’s the only way I’ve been able to get Thunderbird to play “nice” with Gmail. Open a browser, log in to the Gmail account you’re using, and empty your trash. Trust me, you’ll want to have the trash empty for this next step.

Now, returning to Thunderbird, select the emails you want to transfer to your local folder and drag them over the “Trash” folder in your Gmail account in Thunderbird. This won’t delete them but it will assign them the “Trash” tag in Gmail. Once they have all been moved into the Trash, select them again (CTRL+A) and drag them into the folder you created to store your archived emails. In the screenshot below, I’m moving just a few emails from 2003 to the Trash to test this process:

Once the transfer is complete, click on the Local Folder where you transferred the emails to make sure they are there:

And there are the emails. Just where I want them. This also means that you have a local copy of all your emails in a folder exactly where you want them. At this point, you have technically made a backup of all the email you wanted to backup.

To remove them from your Gmail account, you need to do one additional thing. Go back to the browser where you have your Gmail account open, click on the Trash, and empty the Trash. When you do, the emails will no longer be on Gmail’s server. The only copy is on your local computer.

Now, the next tricky part to this (I didn’t say this was perfectly easy, but it’s pretty easy). Thunderbird doesn’t really store the Local Folder emails in an obvious location. But you can find the location easy enough. Right-click your Local Folder where you are archiving the emails and select “Properties.”

You’ll get this window:

Basically, in the Location box you’ll see the folder’s location. This is telling you where to find your Local Folder where your email is stored. On Linux, make sure that you have “view hidden files” turned on in your file manager (I’m using Dolphin), the location is your home folder, followed by your user folder, then it’s inside the hidden “.thunderbird” folder followed by a randomly generated folder that Thunderbird creates. Inside that folder, look for the “Mail/Local Folders” folder. Or, simply:

/home/ryan/.thunderbird/R$ND0M.default-release/Mail/Local Folders/
I have opened all the relevant folders on my computer in Dolphin see you can see the file structure.

Since I created an additional folder, there are two files in my Archive.sbd folder that contain all the emails I have put into that folder: “2003” and “2003.msf.” You can learn more about the contents of those files here, but, basically, the file with no file extension stores the actual contents of the emails. The .msf file is basically an index of the emails (I was able to open them both in Kate, a text editor, and read them fine). In short, you won’t have a bunch of files in your archive. You’ll have two files – one with the contents of the emails and one that serves as an index of the emails that Thunderbird can read. These are the two files that you’ll want to backup.

I’ll admit, this was the part that scared me. I needed to know that I could just take those two files, move them to wherever I wanted to ultimately store them and then, when needed, move them back into my Thunderbird profile and read them just fine. So, I tested it repeatedly.

Here’s what I did. First, I clicked on my backup folder in Thunderbird to make sure all of the emails were there:

I then went into the Local Folders directory and moved the files to where I want to back them up.

I then clicked on a different folder in Thunderbird and then hit F5 to make Thunderbird refresh the Local Folders. It took a couple of minutes for Thunderbird to refresh the folder, but eventually, it did. Then, I selected the 2003 Archive folder again and the emails were gone:

This is what I expected. The emails are in the “2003.msf” and “2003” files on my backup server. Now for the real test. I copied the two backed up files back to the Archive.sbd folder in my Thunderbird profile, selected the parent folder in Thunderbird, and hit F5 to refresh the folders again. It took a minute for the folder to refresh, but eventually, when I clicked on the 2003 folder and…

The emails are back!

It worked!!!

What this means is that you can put all of the email you want to back up into a folder; that folder is stored in your Thunderbird profile. You can then find the relevant .msf file and its corresponding data file, move them wherever you want for storage and, if needed, move them back to your Thunderbird profile (or, technically, any Thunderbird profile using the same folder structure) and you’ll still have all of your email.

This may not be the most elegant method for backing up your email but it’s free, it’s relatively simple and straightforward, and works reliably. Your email is not in a proprietary format but rather in an open source format that can actually be ready in a simple text editor. Of course, it’s easiest to read it in Thunderbird, but you have the files in a format that is open and secure.

BONUS:

If you don’t think you’re going to need to access your email on a regular basis, you can always compress the files before storing them. Given that most of email is text, you’ll likely realize a savings of close to 50% if space is at a premium (once I moved all of my 2003 email to storage, I compressed it and saw a savings of 60%). This will add a small amount of time to accessing the email as you’ll have to extract it from the compressed format, but it could mean pretty substantial space savings depending on how many emails you’re archiving.

EXTRA BONUS:

This is also a way to move emails between computers. I ended up using this approach to move my email archives to my laptop so I could go through my old email while I’m watching TV at night and delete all the emails I’ll never want in the future (you could of course do that with the email online before you archive it). I’m pretty good about deleting useless emails as I go, but I don’t catch them all. With the .msf file and its accompanying data file, I was able to transport my email archives to whichever computer I wanted and modify the file, then return it to my file server for long term storage.

 3,912 total views,  1 views today

Linux – Bulk UnRAR

If you’re not familiar with RAR files, they are like ZIP files. RAR is an archive format.

I had a collection of more than 150 RAR files in a single folder I needed to unrar (that is, open and extract from the archive). Doing them one at a time via KDE’s Ark software would work, but it would have taken a long time. Why spend that much time when I could automate the process.

Enter a loop bash script in KDE’s Konsole:

for i in *.rar; do unrar x "$i"; done

Here’s how the command works. The “for i in” part starts the loop (note: you can use any letter here). The “*.rar” component indicates that we want the loop to run through all the RAR files in the directory, regardless of the name of the file. The “do” command tells the loop what command to run. The “unrar x “$i”” component tells the software to use the unrar function which unpacks the archive. The “x” in that command tells the software to use the directory structure inside the RAR archive. The final piece of the loop after the second semi-colon – “done” – indicates what the terminal should do once the loop completes.

It took about 20 minutes to run through all the RAR files, but I used that time to create this tutorial. Doing this by hand would have taken me hours.

 2,703 total views,  2 views today