Monday, May 10, 2010

Monster Bash

This post is meant to cover some of the subtleties of learning shell programming. TLDP's Bash Beginner's Guide and TLDP's Advanced Bash Scripting Guide are really worth a read, but I'll point out some highlights that every data-munging NLP practitioner should know.


Variable Evaluation

Ever read a bash script and wondered what ${!x} means? Or what echo <(true) does? Do you want to know how to evaluate ${x-y} and ${x%y} (no, it's not subtraction and modulo)?
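
Here is a minimal sketch that answers those questions. The variable names (greeting, maybe, file) are made up for illustration and do not come from any particular script.

```bash
#!/usr/bin/env bash

# Indirect expansion: ${!x} expands to the value of the variable
# whose NAME is stored in x.
greeting="hello"
x="greeting"
indirect="${!x}"                     # hello

# Default values: ${x-y} expands to y only when x is unset.
unset maybe
default_unset="${maybe-fallback}"    # fallback
maybe=""
default_empty="${maybe-fallback}"    # empty: maybe is set, just empty
default_colon="${maybe:-fallback}"   # fallback: ":-" also covers empty

# Suffix removal: ${x%pattern} strips the shortest match of the glob
# pattern from the end of x; %% strips the longest match.
file="corpus.tok.gz"
no_ext="${file%.*}"                  # corpus.tok
no_exts="${file%%.*}"                # corpus

# Process substitution expands to a filename, so echo simply prints it.
# The exact path (often something like /dev/fd/63) varies by system.
echo <(true)

printf '%s\n' "$indirect" "$default_unset" "$default_colon" "$no_ext" "$no_exts"
```

The mirror-image operators ${x#pattern} and ${x##pattern} strip from the front of the value instead of the end.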


Options

Options change how bash behaves when executing a script. You can set them either when invoking bash (for example, "bash -x script.sh") or during script execution with "set".

My favorite set of flags is:
set -e # Stop on non-zero exit codes
set -o pipefail # Make a pipeline fail if any program in it exits non-zero (with -e, this stops the script)
set -x # Print each command to stderr as it is being executed

Alternatively, "set -v" prints each line of the script as it is read, before any expansion (as opposed to -x, which shows each command after variable substitution, with pipelines broken into their individual commands).
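
To see what -e and pipefail actually change, here is a small sketch. Each case runs in a child bash process (via bash -c) so that the options under test cannot stop the demo itself; the variable names are just for illustration.

```bash
#!/usr/bin/env bash

# Without pipefail, a pipeline's exit status is that of its LAST command,
# so the failure of "false" is silently lost.
status_default=$(bash -c 'false | true; echo $?')

# With pipefail, the pipeline reports the rightmost non-zero exit status.
status_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

# With -e, the script stops at the first failing command,
# so "after" is never printed.
output_errexit=$(bash -c 'set -e; echo before; false; echo after') || true

echo "default:  $status_default"
echo "pipefail: $status_pipefail"
echo "errexit:  $output_errexit"
```

One caveat worth knowing: -e is ignored for commands that are tested by if, while, && or ||, which is why the demo uses separate child processes instead of wrapping the failing code in a "|| true".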

For a full rundown of options, see http://tldp.org/LDP/abs/html/options.html.


Traps

Now that you know how to make sure your scripts don't keep charging forward after a fatal error, you might be wondering "What if I need to do some cleanup before exiting? What if I want to print an error message before exiting?" Other languages provide exception handling for this; bash provides traps. A trap registers a command to run when the shell receives a signal or reaches a pseudo-signal such as EXIT or ERR.
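
A minimal sketch of both uses, cleanup and error reporting, might look like the following. The scratch directory and the messages are hypothetical, and the demo script is run as a child bash process so that its "set -e" and traps do not affect the calling shell.

```bash
#!/usr/bin/env bash

# Feed a small script to a child bash and capture everything it prints.
trap_log=$(bash 2>&1 <<'DEMO'
set -e
workdir=$(mktemp -d)                                 # scratch space to clean up
trap 'rm -rf "$workdir"; echo "cleanup done"' EXIT   # runs on any exit
trap 'echo "error: a command failed"' ERR            # runs when a command fails
echo "working"
false                                                # fails: ERR fires, then -e exits
echo "never reached"
DEMO
) && trap_status=0 || trap_status=$?

echo "$trap_log"
echo "exit status: $trap_status"
```

The ERR trap fires first, then "set -e" ends the script, and the EXIT trap runs on the way out. The script still exits with the failing command's status, so callers can tell that something went wrong.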



Heredocs

Ever wanted to generate a script from within a script? Send a long string of commands over ssh within a script? Hard-code an entire document within a script? Heredocs are the answer.

You can also use <<-EOF (notice the dash before the limit string) to indicate that leading tabs (tabs only, not spaces) should be stripped from each line of the heredoc, or <<"EOF" to indicate that variable substitution should not be performed inside the heredoc.
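
The three forms can be sketched as follows. The variable name and the text are placeholders; the <<- case is built with printf so that the tab characters survive copy and paste.

```bash
#!/usr/bin/env bash
name="world"

# Plain limit string: variables and command substitutions are expanded.
expanded=$(cat <<EOF
Hello, $name
EOF
)

# Quoted limit string: the body is passed through literally.
literal=$(cat <<"EOF"
Hello, $name
EOF
)

# <<- strips leading TAB characters from the body and from the closing
# limit string. printf turns each \t below into a real tab.
stripped=$(bash -c "$(printf 'cat <<-EOF\n\tindented line\n\tEOF\n')")

echo "$expanded"    # Hello, world
echo "$literal"     # Hello, $name
echo "$stripped"    # indented line
```

The quoted form is the one you usually want when generating another script, since it keeps the $variables intact for the generated script to expand later.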



Process Substitution and Named Pipes

Heredocs are for the simple case when you just want to write some data to a process's stdin. What if the tool you want to use takes a filename? Or multiple filenames? And you don't have a file? Process substitution to the rescue. It connects the stdout of a process to a file descriptor and hands the command a path to it (such as /dev/fd/63), like so: cat <(yes). Named pipes (or FIFOs) generalize this concept by letting you choose a filename on disk yourself instead of relying on a path that bash generates.
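
As a rough sketch, here is each technique applied to tools that insist on filenames. The sample data and the FIFO name are invented for the example.

```bash
#!/usr/bin/env bash

# comm needs two sorted FILES; process substitution supplies them
# without any temporary files. -12 keeps only lines common to both.
common=$(comm -12 <(printf 'b\na\n' | sort) <(printf 'c\nb\n' | sort))

# diff exits 1 when its inputs differ, hence the "|| true".
diff_out=$(diff <(printf 'a\nb\n') <(printf 'a\nc\n')) || true

# A named pipe does the same job with a filename you pick yourself.
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/tokens.fifo"
mkfifo "$fifo"
printf 'the\ncat\n' > "$fifo" &     # the writer blocks until a reader opens the pipe
line_count=$(wc -l < "$fifo")       # the reader drains it
wait                                # reap the background writer
rm -rf "$fifo_dir"

echo "common lines: $common"
echo "lines read from fifo: $line_count"
```

A named pipe is the better fit when the reader and writer are started separately, or when a program needs a path that still exists after the command line that created it has finished.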



Self-extracting scripts (A Fun Parlor Trick)

Some example use cases: 1) You have a tarball that you want to install, but you want to include an installer script AND you only want to distribute a single shell script. 2)

Note that you might want to extract the file to /dev/shm (shared memory on Linux) rather than to disk, since writing it to disk would add unnecessary startup time.
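
One common way to build such a script is to append the tarball after a marker line and have the script extract everything below the marker from itself. The sketch below assumes that approach; the marker __PAYLOAD_BELOW__, the file names, and the payload contents are all made up for illustration, and this is not necessarily the method used in the article linked below.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)

# 1. Build a small payload tarball to embed.
mkdir "$demo_dir/payload"
echo "hello from the payload" > "$demo_dir/payload/README"
tar -czf "$demo_dir/payload.tar.gz" -C "$demo_dir" payload

# 2. Write the installer stub. Everything after the marker line is data.
cat > "$demo_dir/install.sh" <<'STUB'
#!/usr/bin/env bash
set -e
dest=${1:-.}
# Find the first line after the marker, then stream the rest into tar.
start=$(awk '/^__PAYLOAD_BELOW__$/ { print NR + 1; exit }' "$0")
tail -n +"$start" "$0" | tar -xzf - -C "$dest"
exit 0   # stop here so bash never tries to parse the binary payload
__PAYLOAD_BELOW__
STUB

# 3. Append the tarball to the stub: one file to distribute.
cat "$demo_dir/payload.tar.gz" >> "$demo_dir/install.sh"

# 4. Run the self-extracting script into an output directory.
mkdir "$demo_dir/out"
bash "$demo_dir/install.sh" "$demo_dir/out"
extracted=$(cat "$demo_dir/out/payload/README")
echo "$extracted"

rm -rf "$demo_dir"
```

To follow the /dev/shm suggestion, you would pass a directory under /dev/shm as the destination argument on systems that provide it.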

For the full scoop, have a look at http://www.linuxjournal.com/node/1005818.


Other bash necessities

Passwordless SSH key forwarding, awk scripting, and
