
Stufftar

Overload Journal #132 - April 2016 + Programming Topics   Author: Ian Bruntlett
How do you quickly transfer data from one machine to another? Ian Bruntlett shows us the bash script he uses.

A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.
~ John Gall

Some time ago Frances Buontempo was looking for articles for Overload. I mentioned in an e-mail that I had a backup script called stufftar that I could write about. Frances kindly provided the questions that this article was built on.

What inspired it? Were you trying to solve a specific problem?

I use both Ubuntu and Lubuntu Linux, and I am a volunteer tester of both. I want to keep my machines up to date, so I needed a way to create backups and to transport key files and folders as flexibly as possible.

Initially I was backing up key files (all of ~/stuff) to a .tar.gz file. Then I decided I wanted to automate it a bit so I worked out how to automatically create its destination filename:

  DESTINATION_FILENAME=$1`date "+_%d_%B_%Y.tar.gz"`
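As a minimal sketch of the same idea outside the script, the snippet below builds a date-stamped name from a stub. The stub Desktop is only an example value, and $(...) is used in place of backticks; the two forms behave the same way here.

```shell
# Build a date-stamped archive name from a filename stub.
# "Desktop" is an example stub chosen for this sketch.
STUB="Desktop"
DESTINATION_FILENAME="${STUB}$(date "+_%d_%B_%Y.tar.gz")"
echo "$DESTINATION_FILENAME"   # same shape as Desktop_01_March_2014.tar.gz
```

Note that %B produces the full month name in the current locale, so the name will differ on a system that is not set to English.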

Then I added commands to display the time taken to do the backups using the line:

/usr/bin/time -f "%E mins:secs " tar -czf $DESTINATION_FILENAME $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20}

I had to call time from /usr/bin because bash was ‘hiding’ it with its own, less flexible, time implementation, which doesn’t support the options I wanted.

Because of my inexperience with bash, testing was very important to me. As new features were developed, I would put a ‘test’ framework in place, thoroughly test each new feature and then remove the ‘test’ framework. Having functions ensure they had been given sufficient parameters was particularly important.

As time went by, I added more options. In particular I wanted to back up individual folders, which led to the options desktop, extra, rpg, home and localhost. For convenience I wanted to be able to specify ‘backup everything’, so I implemented the all option. To keep things transparent, I implemented the verbose option, which makes stufftar output more information about what it is currently doing.

My main computer is a refurbished Lenovo ThinkPad T420 (from www.tier1online.com, hostname newton) running Ubuntu Linux. I also have an old 32 bit Samsung NC10, running Lubuntu Linux, that I take with me when visiting family.

I back up my .tar.gz files to a USB flash drive and also to external hard drives. I’ve got three external 500GB hard drives and, once a month, I copy that month’s latest .tar.gz files to one of them. I rotate the drives so that I’m always backing up to the one containing the oldest backup of the set. About once a year I archive everything to dual-layer DVD-R, mainly as individual folders, but I put the ‘home’ and ‘localhost’ .tar.gz files as is onto the DVD-R.

What does it do?

I’ve got a bunch of important digital files that I carry around and back up to USB flash drives and hard drives. My ~/Desktop files are my key files for general use. Other folders are ~/stuff (the original folder I put my ‘stuff’ in), ~/extra (the folder I moved from ~/stuff when it became too large) and ~/RPG (the folder I use to keep RPG PDFs etc in). The home option backs up key shell scripts. The localhost option backs up key files (/var/www/) from LAMP studies.

How do you use it?

For help type in ./stufftar and it gives you a list of command line options (see Figure 1).

ian@newton:~$ ./stufftar
./stufftar Usage : stufftar followed by one or more commands: desktop, 
extra, rpg, localhost, stuff, home and all
All data files are:-
1. Named after the relevant command name, followed by day number, month, year.
   For example: Desktop_01_March_2014.tar.gz
2. Are created using the tar command with file compression switched on.

Explanation of stufftar commands:-
desktop   - copy desktop files to a desktop tar file
extra     - copy extra to a tar file –  Linux Voice, Overload, QL Today
stuff     - copy stuff – anything I want to keep (main files are here)
rpg       - copy ~/RPG to a tar file
home      - copy refurb, stufftar, coder, removefiles.c to a home tar file
localhost - copy the whole /var/www/html subtree to a tar file
all       - execute stuff, extra, desktop and home commands in one go. 
            Use when you want a full backup.
verbose   - display more details about the work being done.
status    - display status info about the stufftarred files on this system.

Also consider backing up Firefox bookmarks, and .emacs config file.
			
Figure 1

What future improvements do you envisage?

As a side-effect of writing this article, I’ve started a ‘TO DO’ comment section, although the script currently does everything I need it to do. Also, despite having written a bash shell script with help from my copy of the Linux Pocket Guide (O’Reilly), I’ve never really studied bash; I’ve just muddled through. I have copies of How Linux Works and The Linux Command Line to make my way through sometime next year, as I am concentrating on Ruby this year. The command line options are non-standard but OK for my purposes; they could be modified to handle arguments like -f some_kind_of_parameter. There is a UNIX tool called TripWire, used to report changes to folders of files. I think I’ll be looking at tackling that in the future.

A walk through the code (edited highlights)

The line #!/bin/bash tells Linux that this is a bash shell script. Some people prefer to specify sh instead of bash but, as this script is for personal use, I’m using bash.

  #!/bin/bash
  
  # stufftar backup script by Ian Bruntlett, 
  # 2012 - December 2015, expanded and desktar 
  # merged in on August 11th 2012, 
  # March 2013 added "coder" file to Desktop tar,
  # added BACKUP_HOME

As is usual, information about the script is stored at the start of the script (summarised for brevity).

  # echo_and_log(logfilename, text to put in log
  # file and echo to screen)
  function echo_and_log()

This is a ‘helper’ function that echoes its parameters both to the screen and to a specified log file. Useful to avoid repeated identical echo statements.
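The body of echo_and_log is not shown in these highlights. As a rough sketch only, a helper with that signature could be built on tee -a, which writes its input to the screen and appends it to a file. Everything inside the function below is an assumption, not the author's code.

```shell
# Hypothetical body for echo_and_log: $1 is the log file, the remaining
# arguments are the text. The published script's body is not shown here.
echo_and_log() {
  if [ $# -lt 2 ]
  then
    echo "Error echo_and_log() insufficient no of parameters"
    return 1
  fi
  log_file=$1
  shift                            # drop the log filename, leaving only the text
  echo "$*" | tee -a "$log_file"   # tee -a echoes to the screen and appends to the log
}

# Example call, logging to a temporary file created for this demonstration.
demo_log=$(mktemp)
echo_and_log "$demo_log" 12:00:00 Backing up key files
```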

  # if error code set ($1), display error messages
  # and exit programme
  function exit_if_failed()

This is another ‘helper’ function. It gets passed an error code by its caller. Normally it is 0 so this function does nothing. If it is non-zero then diagnostic information is echoed and it exits/aborts the script with a return code of 1.
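Again the body is not reproduced here, so the following is only a guess at how such a function might look. The wording of the message is invented; the behaviour (do nothing on 0, otherwise report and exit with 1) follows the description above.

```shell
# Hypothetical body for exit_if_failed: $1 is an exit status, the remaining
# arguments describe what was being attempted.
exit_if_failed() {
  status=$1
  shift
  if [ "$status" -ne 0 ]
  then
    echo "Error: $* failed with exit status $status" >&2
    exit 1
  fi
}

# Run in subshells so the demonstration does not end the calling shell.
ok_output=$(exit_if_failed 0 "a step that worked"; echo "carried on")
fail_status=0
(exit_if_failed 2 "a step that failed" 2>/dev/null; echo "never printed") || fail_status=$?
echo "ok_output=$ok_output fail_status=$fail_status"
```

In the script itself the first argument is $?, captured immediately after the command being checked.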

  # $1 log filename aka $LOG_FILE
  # $2 file to get MD5 from aka $SOURCE_FILENAME
  # example:- get_and_log_md5 "~/md5log.txt"
  # "localhost_04_May_2015.tar.gz"
  function get_and_log_md5()

This function calculates the MD5 checksum of the file (function parameter $2) and logs it to a logfile (function parameter $1). It is a ‘helper’ function used by function perform_backup (Listing 1) when global variable $STUFFTAR_VERBOSE is greater than zero.
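A possible shape for this helper, assuming the GNU md5sum utility is installed, is sketched below. The function body is hypothetical; only the name and the meaning of the two parameters come from the script's comments.

```shell
# Hypothetical body for get_and_log_md5: $1 is the log file, $2 the file to checksum.
get_and_log_md5() {
  if [ $# -ne 2 ]
  then
    echo "Error get_and_log_md5() needs a log filename and a source filename"
    return 1
  fi
  # md5sum prints "<checksum>  <filename>"; tee -a shows it and appends it to the log.
  md5sum "$2" | tee -a "$1"
}

# Example call on a small file in a temporary directory.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/sample.txt"
get_and_log_md5 "$demo_dir/md5log.txt" "$demo_dir/sample.txt"
```

One thing to watch when calling a function like this: bash does not expand a ~ that is inside double quotes, so "$HOME/md5log.txt" is the safer way to name a log file in the home directory.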

# perform_backup
# $1 - stub of .tar.gz filename
# $2 - name of log file 
# e.g. "scripts/stufftarlog.txt"
# $3 - directory to do the tarring in
# $4 onwards - files/directories to put in .tar.gz
# file relative to $3
function perform_backup()
{
  if [ $# -lt 4  ]
  then
    echo "Error perform_backup() insufficient no of parameters";
    return 1;
  fi;
  FILENAME_STUB=$1
  DESTINATION_FILENAME=$1`date "+_%d_%B_%Y.tar.gz"`
  LOGFILE=$2
  TAR_DIR=$3

  if [ $STUFFTAR_VERBOSE -gt 0 ]
  then
    echo $0 $1 $2 $3 $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20}
    echo DESTINATION_FILENAME=$DESTINATION_FILENAME e.g. Desktop_28_December_2014.tar.gz
    echo FILENAME_STUB=$FILENAME_STUB
    echo LOGFILE=$LOGFILE
    echo TAR_DIR=$TAR_DIR
  fi
  CURRENT_TIME=`date "+%H:%M:%S"`
  cd $TAR_DIR
  echo_and_log $LOGFILE $CURRENT_TIME Backing up key $TAR_DIR $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20} files to $DESTINATION_FILENAME
  /usr/bin/time -f "%E mins:secs " tar -czf $DESTINATION_FILENAME $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20}
  exit_if_failed $? "perform_backup to " $DESTINATION_FILENAME
  echo File count:-
  tar -tvf $DESTINATION_FILENAME | wc -l
  ls -lh $DESTINATION_FILENAME
  if [ $STUFFTAR_VERBOSE -gt 0 ]
  then
    get_and_log_md5 $LOGFILE $DESTINATION_FILENAME 
  fi
  echo
  cd;
  return 0;
}
# end function perform_backup	
			
Listing 1

perform_backup is the ‘engine’ of the script. It validates its parameters and, if running in verbose mode, does some logging. It then does the backup using both tar and the time command: tar creates the backup and time outputs how long it took. It also reports the number of files in the tar file by piping a list of the files to the word count utility wc.

  # show_last_line
  # $1 is the name to show
  # $2 is the log file to show the tail end of
  # $3 is the number of lines to show 

This ‘helper’ function, show_last_line, echoes some information about a logfile – the name of the archive (‘stuff’, ‘localhost’ etc) and the last $3 lines of the log file $2. See Listing 2.

function show_last_line()
{
  if [ $# -ne 3  ]
  then
    echo "Error show_last_line() insufficient no of parameters ($#)";
    echo usage "show_last_line name of file, source log file, no of lines to show"
    return 1;
  fi;
  echo -n "$1 - $2 "
  tail -$3 $2
  echo -n
  return 0;
}
# end function show_last_line
			
Listing 2

The function in Listing 3 uses show_last_line to display the last three lines of each stufftarlog.txt file. The logfile serves two purposes. On the master computer, show_status() indicates when a particular folder was last backed up to tar file. On other computers, it shows the age of the data that has been transferred by tar file.

function show_status()
{
  echo STUFFTAR STATUS
  show_last_line "Desktop" "$HOME/Desktop/stufftarlog.txt" 3
  echo
  show_last_line "extra" "extra/stufftarlog.txt" 3
  echo
  show_last_line "localhost" "/var/www/html/stufftarlog.txt" 3
  echo
  show_last_line "RPG"   "RPG/stufftarlog.txt" 3 
  echo
  show_last_line "scripts and home" "scripts/stufftarlog.txt" 3
  echo
  show_last_line "stuff" "stuff/stufftarlog.txt" 3
  echo
  return 0;
}
# end function show_status
			
Listing 3

This function, show_status, is triggered when a parameter of status is passed on the command line. It can be used on its own or in conjunction with other commands.

This is the main part of this script. If no parameters are passed, a help message is displayed explaining the use and parameters of the script.

The worker variables BACKUP_DESKTOP, BACKUP_EXTRA, BACKUP_HOME, BACKUP_LOCALHOST, BACKUP_RPG, BACKUP_STUFF are initialised to zero here. Another variable, $STUFFTAR_VERBOSE, is initialised near the start of the script.

Listing 4 is where I loop through the script’s command line arguments, setting worker variables accordingly. Note that if the parameter all is found, then a bunch of worker variables are set to 1.

Listing 4 (image: Bruntlett-7.gif)
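Listing 4 exists only as an image and its text is not available here, so the following is a guess at the general shape of such a loop rather than the author's code. The worker variable names and the meaning of all (stuff, extra, desktop and home) are taken from the article and Figure 1. The parse_arguments wrapper and the use of case are assumptions; the article suggests the original uses if statements.

```shell
# A guess at the shape of the argument loop. parse_arguments is a
# hypothetical wrapper so that the sketch can be run on its own.
parse_arguments() {
  BACKUP_DESKTOP=0; BACKUP_EXTRA=0; BACKUP_HOME=0
  BACKUP_LOCALHOST=0; BACKUP_RPG=0; BACKUP_STUFF=0
  STUFFTAR_VERBOSE=0; STUFFTAR_STATUS=0
  for argument in "$@"
  do
    case $argument in
      desktop)   BACKUP_DESKTOP=1 ;;
      extra)     BACKUP_EXTRA=1 ;;
      home)      BACKUP_HOME=1 ;;
      localhost) BACKUP_LOCALHOST=1 ;;
      rpg)       BACKUP_RPG=1 ;;
      stuff)     BACKUP_STUFF=1 ;;
      # Figure 1: all means stuff, extra, desktop and home.
      all)       BACKUP_STUFF=1; BACKUP_EXTRA=1; BACKUP_DESKTOP=1; BACKUP_HOME=1 ;;
      verbose)   STUFFTAR_VERBOSE=1 ;;
      status)    STUFFTAR_STATUS=1 ;;
      *)         echo "Unknown command: $argument" ;;
    esac
  done
}

parse_arguments all status
echo "BACKUP_STUFF=$BACKUP_STUFF BACKUP_RPG=$BACKUP_RPG STUFFTAR_STATUS=$STUFFTAR_STATUS"
```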

For information purposes, the name of the script is echoed ($0) and if in verbose mode, ls -lh is used to show even more information about the script file. Also for information purposes, the status of worker variables is displayed.

  echo $# PARAMETER\(S\), \
  BACKUP_STUFF=$BACKUP_STUFF, \
  BACKUP_EXTRA=$BACKUP_EXTRA, \
  BACKUP_DESKTOP=$BACKUP_DESKTOP, \
  BACKUP_HOME=$BACKUP_HOME, \
  BACKUP_RPG=$BACKUP_RPG, \
  BACKUP_LOCALHOST=$BACKUP_LOCALHOST, \
  STUFFTAR_VERBOSE=$STUFFTAR_VERBOSE, \
  STUFFTAR_STATUS=$STUFFTAR_STATUS

For consistency, the script changes the current directory to the current user’s home directory before doing any file handling.

  cd ~

Then the .tar.gz files are created – basically checking to see if a worker variable is 1 and then calling perform_backup to do the work.

  # backup to a stuff tar
  if [ $BACKUP_STUFF -eq 1 ]
  then
    perform_backup "Stuff" "stuff/stufftarlog.txt" \
    "$HOME" "stuff"
  fi

Similar clauses are used for the creation of the extra, rpg, desktop .tar.gz files. Backing up the key ‘home’ files is a little different:

  # backup key /home/ian things e.g. this backup
  # script to a tar file
  if [ $BACKUP_HOME -eq 1 ]
  then
    perform_backup "Home" "scripts/stufftarlog.txt" \
    "$HOME" refurb scripts synclamp stufftar coder \
    removefiles.c;
  fi

And localhost is used to back up key LAMP files (see Listing 5).

Listing 5 (image: Bruntlett-8.gif)

As its final act, if the ‘status’ option has been activated, the status of every backup file is displayed.

  if [ $STUFFTAR_STATUS -eq 1 ]
  then
    show_status;
  fi

And that is it.

Feedback from Overload technical reviewers

This script serves as a decent example of how to back things up in a reproducible way using tar. I didn’t see any glaring errors in it, but I would comment that it is not tolerant of spaces in filenames (fixing that would require liberal use of double-quotes, and I usually find some trial-and-error is required to get this right).

The script was written by me – a bash novice. Given the importance of the data, it was tested heavily as it evolved. As I expected to use the resulting .tar.gz files from the command line, I decided that my filenames would not contain spaces.
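For readers who do need spaces in filenames, one common approach is to shift the fixed parameters away and pass the rest to tar as "$@", which keeps each argument as a single word however many there are. The function below is a hypothetical, cut-down variant of perform_backup written for illustration; it leaves out the logging and timing and is not part of stufftar.

```shell
# Hypothetical space-tolerant variant of the tar step in perform_backup.
# $1 - stub of .tar.gz filename, $2 - log file (unused in this sketch),
# $3 - directory to do the tarring in, $4 onwards - things to archive.
backup_quoted() {
  if [ $# -lt 4 ]
  then
    echo "Error backup_quoted() insufficient no of parameters"
    return 1
  fi
  destination_filename="$1$(date "+_%d_%B_%Y.tar.gz")"
  tar_dir=$3
  shift 3                    # "$@" now holds only the files/directories
  # Subshell: the cd does not leak out to the caller.
  ( cd "$tar_dir" && tar -czf "$destination_filename" "$@" )
}

# Example: archive a directory whose name contains a space.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my stuff"
echo "notes" > "$demo_dir/my stuff/to do.txt"
backup_quoted "Demo" "unused.log" "$demo_dir" "my stuff"
tar -tzf "$demo_dir"/Demo_*.tar.gz
```

This also removes the need to list $4 through ${20} by hand, at the cost of the quoting care the reviewer mentions.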

The article as it is now is a bit specific to one use case – I think it would be more useful if it explained the ideas and techniques being used rather than presenting the details of the script itself. E.g.:

  • Interesting policy decisions like creating separate log files for each piece of work being done – it would be interesting to hear how this supports the workflow.

    I thought about having a single log file, ~/stufftarlog.txt, but it wasn’t flexible enough and I’d have to somehow process that log file, looking for status information about each type of backup. By having separate log files, I avoid that problem and it means I can decide to just back up certain bunches of files instead of doing a full-blown backup of everything.

  • How to get the last relevant line out of a log file (as the script does) – this seems more widely-applicable and a useful little nugget.

    I added the function body of show_last_line to this article. It is quite simple and uses the tail command.

  • How to deal with command line arguments (the for loop used here looks quite convenient for simple applications like this).

    Yes. With a bit of effort it can be more flexible. At the moment it handles one word parameters that act as flags or specify a certain backup to perform.

    Supporting a syntax like -f some_kind_of_flag would be possible. I have some ideas about it, mainly involving extending the loop to set a flag ($ARGUMENT_F_EXPECTED) when a parameter of -f is detected. Then the head of the loop would need another set of if statements, followed by use of the continue loop modifier.

  • The benefits of writing a script rather than doing this manually (e.g. reduced errors and less time taken).

    Spot on. Being able to run a command to do all the backups I wanted, have standard filenames and contents, and walk away was crucial. That dealt with errors during creation of the backups. However, I used to transport my files to a Samsung NC10 NetBook (when it wasn’t being wiped and used to test Lubuntu pre-releases), and I noticed that I was occasionally forgetting to install the contents of newer .tar.gz files. So I needed to know when a particular folder’s files were created. This resulted in the function show_last_line (it can show more than one line) which was discussed earlier. When I’m working on the Samsung NC10, typing in ./stufftar status means I can see how fresh this copy of my files is.
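The -f idea from the answer on command line arguments can be sketched as follows. This is only an illustration of the approach described there: ARGUMENT_F_EXPECTED is the flag name from the article, while parse_with_f and F_VALUE are made-up names, and only one ordinary command (stuff) is handled to keep the example short.

```shell
# Hypothetical sketch of handling "-f value" inside a simple for loop.
parse_with_f() {
  ARGUMENT_F_EXPECTED=0
  F_VALUE=""
  BACKUP_STUFF=0
  for argument in "$@"
  do
    if [ "$ARGUMENT_F_EXPECTED" -eq 1 ]
    then
      F_VALUE=$argument        # this word is the value that belongs to -f
      ARGUMENT_F_EXPECTED=0
      continue
    fi
    if [ "$argument" = "-f" ]
    then
      ARGUMENT_F_EXPECTED=1    # remember that the next word is -f's value
      continue
    fi
    if [ "$argument" = "stuff" ]
    then
      BACKUP_STUFF=1
    fi
  done
  if [ "$ARGUMENT_F_EXPECTED" -eq 1 ]
  then
    echo "Error: -f needs a value"
    return 1
  fi
}

parse_with_f -f some_kind_of_parameter stuff
echo "F_VALUE=$F_VALUE BACKUP_STUFF=$BACKUP_STUFF"
```

The final check catches a trailing -f with nothing after it, a case the loop alone would silently accept.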

Reference

stufftar can be downloaded from: https://sites.google.com/site/ianbruntlett/home/free-software/linux
