Monday, May 10, 2010
The Bag of Tricks
It seems like everyone else is blogging these days, so I've decided to keep track of my bag of programming tricks here so that I can find them later and perhaps you'll find some of them useful as well. I'm a graduate student in machine translation (read: I do a lot of data munging and plumbing in Natural Language Processing and occasionally play with some large statistical models), so I might also use this space to celebrate or bemoan various things in our field.
Monster Bash
This post is meant to cover some of the subtleties of learning shell programming. TLDP's Bash Beginner's Guide and TLDP's Advanced Bash Scripting Guide are really worth a read, but I'll point out some highlights that every data-munging NLP practitioner should know.
Variable Evaluation
Options
Options change how bash behaves when executing a script. You can set bash options either when invoking bash or during script execution with "set".
My favorite set of flags is:
set -e # Exit as soon as a command returns a non-zero exit code
set -o pipefail # Make a pipeline fail if any program in it fails, not just the last one
set -x # Print each command to stderr as it is being executed
Alternatively, "set -v" prints each line of the script as it is read, before any expansion (as opposed to -x, which prints each command after variable substitution and with pipelines broken into their individual commands).
For a full rundown of options, see http://tldp.org/LDP/abs/html/options.html.
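To see why I pair -e with pipefail, here is a minimal sketch (my own toy example, not taken from the guides above). Each case runs in a child bash via bash -c, so the failure being demonstrated doesn't take down the shell doing the demonstrating.

```shell
#!/usr/bin/env bash
# Toy demonstration: how `set -e` behaves with and without pipefail.

# Without pipefail, a pipeline's exit status is that of its LAST command,
# so the failing `false` is masked by the succeeding `cat`.
status_without=0
bash -c 'set -e; false | cat; echo "reached the end"' || status_without=$?

# With pipefail, the pipeline itself reports failure, and -e stops the
# child script before the echo runs.
status_with=0
bash -c 'set -e; set -o pipefail; false | cat; echo "reached the end"' || status_with=$?

echo "without pipefail: exit status $status_without"
echo "with pipefail:    exit status $status_with"
```

Only the first child gets as far as its echo; the second one stops at the broken pipeline and returns a non-zero status.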
Traps
Now that you know how to make sure your scripts don't keep charging forward after a fatal error, you might be wondering "What if I need to do some cleanup before exiting? What if I want to print an error message before exiting?" Other languages provide this through exception handling; in bash, the mechanism is called a trap. A trap registers a command to run when the shell receives a signal or reaches a pseudo-signal such as EXIT or ERR.
Have a look here: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_12_02.html.
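Here is a rough sketch of the cleanup-plus-error-message pattern, assuming a script that keeps its intermediate files in a scratch directory. The messages and the workdir variable are made up for illustration. The demo script is fed to a child bash through a quoted heredoc (see the next section) so that its "set -e" and its traps stay out of the calling shell.

```shell
#!/usr/bin/env bash
# Toy demonstration: an ERR trap for the error message, an EXIT trap for cleanup.

status=0
bash -s <<'EOF' || status=$?
set -e
workdir=$(mktemp -d)

# EXIT fires however the script ends: normally, via exit, or via set -e.
trap 'rm -rf "$workdir"; echo "cleanup: removed scratch directory"' EXIT

# ERR fires when a command fails, just before set -e makes the script exit.
trap 'echo "error: a command failed, aborting" >&2' ERR

echo "writing intermediate files to $workdir"
false                       # stand-in for a command that fails
echo "never reached"
EOF

echo "child script exited with status $status"
```

The EXIT trap doesn't change the exit status on its own, so the caller still sees the failure of the command that triggered it.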
Heredocs
Ever wanted to generate a script from within a script? Send a long string of commands over ssh within a script? Hard-code an entire document within a script? Heredocs are the answer.
A heredoc begins with << followed by a limit string such as EOF and runs until a line containing only that string. You can also use <<-EOF (notice the dash before the limit string) to strip leading tab characters (tabs only, not spaces) from each line of the heredoc, or <<"EOF" to indicate that variable substitution should not be performed inside the heredoc.
The full story is here: http://tldp.org/LDP/abs/html/here-docs.html
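A small sketch of the difference between an unquoted and a quoted limit string. The function names and the greeting variable are hypothetical, just for illustration.

```shell
#!/usr/bin/env bash
# Toy demonstration: unquoted vs. quoted heredoc limit strings.

greeting="hello"

# Unquoted limit string: variables inside the heredoc are expanded.
print_expanded() {
    cat <<EOF
greeting is $greeting
EOF
}

# Quoted limit string: the body is passed through literally.
print_literal() {
    cat <<"EOF"
greeting is $greeting
EOF
}

print_expanded
print_literal
```

The quoted form is the one you want when generating another script, since the $variables should usually be expanded when the generated script runs, not when it is written. I left <<- out of the sketch because it depends on literal tab characters, which are easy to lose when copying code from a web page.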
Process Substitution and Named Pipes
Heredocs are for the simple case when you just want to write some data to a process's stdin. What if the tool you want to use takes a filename, or several filenames, but your data isn't in a file? Process substitution to the rescue. It connects the stdout of a process to a file descriptor and hands the command a path to it (typically something like /dev/fd/63), like so: cat <(yes). Named pipes (or FIFOs) just generalize this concept by letting you pick a filename on disk instead of being handed a file descriptor.
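For instance, comm insists on two sorted input files. A minimal sketch with two made-up word lists (words_a and words_b are hypothetical stand-ins for whatever produces your data):

```shell
#!/usr/bin/env bash
# Toy demonstration: feeding two process outputs to a tool that wants filenames.

words_a() { printf '%s\n' the cat sat; }
words_b() { printf '%s\n' the dog sat; }

# Each <(...) is replaced by a path that comm opens like an ordinary file.
# comm -12 prints only the lines common to both inputs.
shared=$(comm -12 <(words_a | sort) <(words_b | sort))

echo "$shared"
```

No temporary files are created, and both sorts run concurrently with comm.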
For process substitution, see http://tldp.org/LDP/abs/html/process-sub.html and for named pipes: http://tldp.org/LDP/abs/html/extmisc.html#NAMEDPIPEREF.
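And a comparable sketch with a named pipe, assuming a scratch directory from mktemp and a made-up FIFO name. The writer has to run in the background, because opening a FIFO for writing blocks until some reader opens the other end.

```shell
#!/usr/bin/env bash
# Toy demonstration: passing data through a named pipe (FIFO).

fifo_dir=$(mktemp -d)
fifo="$fifo_dir/tokens.fifo"     # hypothetical name
mkfifo "$fifo"

# Writer: runs in the background and blocks until a reader shows up.
printf '%s\n' b a c > "$fifo" &

# Reader: any tool that takes a filename can read from the FIFO.
sorted=$(sort "$fifo")

wait                             # reap the background writer
rm -rf "$fifo_dir"

echo "$sorted"
```

Unlike process substitution, the FIFO has a name you choose, so unrelated processes can open it, including tools that need the same path mentioned in a config file.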
Self-extracting scripts (A Fun Parlor Trick)
Some example use cases: 1) You have a tarball that you want to install, but you want to include an installer script AND you only want to distribute a single shell script. 2)
Note that you might want to extract the files to /dev/shm (shared memory) rather than to disk, since writing them to disk would add unnecessary startup time.
For the full scoop, have a look at http://www.linuxjournal.com/node/1005818.
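The general shape of the trick, as I understand it, is a shell stub followed by a marker line, with the tarball bytes appended after the marker. Here is a minimal sketch that builds such a script and runs it. The marker __PAYLOAD_BELOW__, the file names, and the payload contents are all my own placeholders, not necessarily what the article uses.

```shell
#!/usr/bin/env bash
# Toy demonstration: build a self-extracting script, then run it.

work=$(mktemp -d)

# 1. Make a small tarball to act as the payload.
mkdir "$work/payload"
echo "hello from the payload" > "$work/payload/README"
tar -czf "$work/payload.tar.gz" -C "$work" payload

# 2. Write the stub. It finds the marker line in its own file ($0) and
#    untars everything after it. The `exit 0` keeps bash from trying to
#    execute the binary data that follows the marker.
cat > "$work/install.sh" <<'EOF'
#!/usr/bin/env bash
set -e
dest=${1:-.}
# Line number of the first payload line, i.e. one past the marker.
start=$(awk '/^__PAYLOAD_BELOW__$/ { print NR + 1; exit }' "$0")
tail -n +"$start" "$0" | tar -xzf - -C "$dest"
exit 0
__PAYLOAD_BELOW__
EOF

# 3. Append the tarball right after the marker line.
cat "$work/payload.tar.gz" >> "$work/install.sh"

# 4. Run the single-file installer and look at what it unpacked.
mkdir "$work/out"
bash "$work/install.sh" "$work/out"
extracted=$(cat "$work/out/payload/README")
rm -rf "$work"

echo "$extracted"
```

The destination is just the stub's first argument, so pointing it at a directory under /dev/shm is one way to keep the extracted files off disk.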
Other bash necessities
Passwordless SSH key forwarding, awk scripting, and