Thursday, March 31, 2011

Highlight lines in stdout/stderr containing a pattern

So you have lots of log lines flying by on stdout, but some of them are much more important than others. It's not always easy to identify which lines you should be paying attention to. One solution is to grep for patterns and then use color to draw your attention to the important part:


# Show only matching lines from stdout and stderr, with 5 surrounding lines

$ ./verboseProgram |& grep -C5 --color=yes "ERROR"

-or-

# Redirect stdout and stderr to a file while also printing them to screen
$ ./verboseProgram |& tee log
# Then do the same matching post-mortem
$ grep -C5 --color=yes "ERROR" log


Another solution is to highlight the lines matching patterns you care about without discarding everything else. This can be done using the following bash/awk script:

## highlight.sh ##

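#!/usr/bin/env bash
set -ueo pipefail

# Usage: highlight.sh PATTERN
# Copies stdin to stdout, printing lines that match PATTERN (an awk regex) in bold red.
pattern=$1
awk -v pat="$pattern" '$0 ~ pat {print "\033[1;31m" $0 "\033[0m"; next} {print}'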
###############

Then you can run your program, keep a log file, and still not miss things as they arrive:
$ ./verboseProgram |& tee log | highlight.sh "ERROR"

If you're not a fan of red, you can pick your favorite color from: http://en.wikipedia.org/wiki/ANSI_escape_code
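
Here is a rough sketch of how the color could become a parameter instead of being hard-coded. The function name highlight and the optional second argument are my own assumptions for illustration, not part of the script above. The color is given as the ANSI SGR code that goes between "\033[" and "m".

```shell
#!/usr/bin/env bash
# Hypothetical variant of highlight.sh, written as a function:
#   highlight PATTERN [SGR_CODE]
# SGR_CODE is an ANSI color code such as "1;31" (bold red, the default),
# "1;32" (bold green) or "1;33" (bold yellow).
highlight() {
  local pattern=$1
  local color=${2:-"1;31"}
  awk -v pat="$pattern" -v color="$color" '
    # Matching line: wrap it in the color escape and a reset.
    $0 ~ pat { printf "\033[%sm%s\033[0m\n", color, $0; next }
    # Any other line passes through untouched.
    { print }
  '
}

# Example: show WARN lines in bold yellow.
printf 'ok\nWARN disk is nearly full\nok\n' | highlight "WARN" "1;33"
```

Passing the pattern with awk's -v flag keeps it out of the awk program text, so patterns containing spaces or slashes don't break the script.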

Thursday, March 17, 2011

The missing operator<< for C++ STL's vector

Ever wanted to print a std::vector to stdout/stderr, but been met with a missing operator compile error? It's annoying but simple to fix. Just define operator<< in a header file. Here's a short example:

#include <iostream>
#include <vector>

template <typename T>
std::ostream& operator<<(std::ostream& out, const std::vector<T>& v) {
  out << "[";
  for(typename std::vector<T>::const_iterator it = v.begin(); it != v.end(); ++it) {
    out << *it;
    if(it != v.end()-1) out << ", ";
  }
  out << "]";
  return out;
}

int main() {
  std::vector<int> v;
  v.push_back(1);
  std::cerr << v << std::endl;
}

Wednesday, March 9, 2011

Ubuntu mouse issues

Just in case someone else runs across this issue:
From time to time on my Ubuntu box (server 10.04 LTS lucid) running Gnome, the X server decides that the mouse needs to always be moving toward the top, bottom, left, or right of the screen. The pointer appears to be stuck there. You can move it away, but it quickly moves back to the edge of the screen along one of these axes. However, this can be fixed without restarting Xorg:

$ sudo -s
$ modprobe -r usbhid; modprobe usbhid

This just removes and reloads the kernel module that handles human interface devices over USB. Don't try to type the modprobe commands as 2 separate lines: once the module is removed, no USB input devices (read: your keyboard) will work, so you won't be able to type the second one.

Wednesday, December 15, 2010

SSH keys in 3 Commands

SSH keys are useful for accessing machines over SSH without having to type your password every time. I'll be discussing a method here that uses an SSH agent, which is arguably more secure than using a blank passphrase for your key. Also, I assume you're running a bash-like shell. The "server" is the machine you want to connect to and the "client" is the machine you'll be connecting from.

What to run:
1) you@client$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
2) you@client$ cat ~/.ssh/id_rsa.pub | ssh YOU@SERVER.ORG "mkdir -p ~/.ssh; cat - >> ~/.ssh/authorized_keys"
3) you@client$ source <(ssh-agent) && ssh-add ~/.ssh/id_rsa
Now, when you ssh YOU@SERVER.ORG, you should not be prompted for a password. You will need to rerun step 3 each time you open a new terminal or use your window manager to

What these do:
1) Creates a new public/private RSA keypair. The private key (id_rsa) will always stay on your computer. The public key will be copied to servers you wish to access. I recommend using a strong (that definitely means non-blank) passphrase.
2) This *appends* your new public key to the list of keys that are allowed to access the machine (a magic file, which sshd always looks for in ~/.ssh/authorized_keys). If we had just used scp to copy the file here, we could have overwritten other keys that are allowed to access the machine.
3) This first starts the ssh-agent program, which prints some bash commands to stdout that set environment variables. By enclosing this in a process substitution and passing the result as a "file" to source, this stdout is executed as commands in the current shell. Finally, it adds your private key to the ssh-agent, which holds the unlocked key in memory and uses it to answer authentication challenges from remote hosts you try to log in to. The private key itself is never sent to the remote host.

To keep things secure, it is important that the private key itself be kept secure. That means:

1) You should set permissions so that only you can read the key (chmod go-rwx ~/.ssh/id_rsa)
2) You should never place your private keys on an insecure filesystem. This isn't as obvious as it sounds. Network filesystems such as AFS and NFS are apt to pass files that can "only be read by you" according to their permissions over the network unencrypted, so that anyone who can sniff the network traffic can acquire your key. Be sure to keep your keys on a local filesystem.
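
If you want to see what the permission change in item 1 does, here is a small sketch. It uses a temporary file as a stand-in for the key (my assumption, so the example never touches ~/.ssh) and removes all permissions for both group and others, since removing them for others alone would still leave the file readable by your group.

```shell
#!/usr/bin/env bash
# Stand-in for a private key. In real use the target would be ~/.ssh/id_rsa.
keyfile=$(mktemp)
chmod 644 "$keyfile"        # deliberately too permissive: group and others can read

chmod go-rwx "$keyfile"     # remove every permission for group and others

# The first ten characters of `ls -l` are the file type and permission bits.
perms=$(ls -l "$keyfile" | cut -c1-10)
echo "$perms"
rm -f "$keyfile"
```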

NOTES:
Many modern servers now use Kerberos via GSSAPI for authentication. If your server is using Kerberos for authentication, the method here may either 1) not let you log in without a password or 2) let you log in, but then produce strange security errors when you try to access filesystems such as AFS or NFS4 or resources such as Hadoop MapReduce.

Monday, May 10, 2010

The Bag of Tricks

It seems like everyone else is blogging these days, so I've decided to keep track of my bag of programming tricks here so that I can find them later and perhaps you'll find some of them useful as well. I'm a graduate student in machine translation (read: I do a lot of data munging and plumbing in Natural? Language Processing and occasionally play with some large statistical models), so I might also use this space to celebrate or bemoan various things in our field.

Monster Bash

This post is meant to cover some of the subtleties of learning shell programming. TLDP's Bash Beginner's Guide and TLDP's Advanced Bash Scripting Guide are really worth a read, but I'll point out some highlights that every data-munging NLP practitioner should know.


Variable Evaluation

Ever read a bash script and wondered what ${!x} means? Or what echo <(true) does? Do you want to know how to evaluate ${x-y} and ${x%y} (no, it's not subtraction and modulo)?
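
Here is a short sketch of what those expansions do. The variable names and values are made up for illustration.

```shell
#!/usr/bin/env bash
# Indirect expansion: ${!x} expands the variable whose NAME is stored in x.
target="hello"
x="target"
indirect=${!x}                 # hello

# Default value: ${x-y} expands to y only when x is unset.
unset maybe
default=${maybe-fallback}      # fallback
maybe=""
still_empty=${maybe-fallback}  # empty, because maybe is set (to the empty string)
colon=${maybe:-fallback}       # fallback, because ":-" also treats empty as unset

# Suffix removal: ${x%y} strips the shortest match of pattern y from the end.
file="corpus.tok.gz"
shortest=${file%.*}            # corpus.tok
longest=${file%%.*}            # corpus (%% strips the longest match)

# Process substitution: <(cmd) expands to a path that reads from cmd's stdout.
fd_path=$(echo <(true))        # typically something like /dev/fd/63

printf '%s\n' "$indirect" "$default" "$colon" "$shortest" "$longest" "$fd_path"
```

The prefix forms ${x#y} and ${x##y} work the same way as % and %%, but strip from the start of the value.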


Options

Options change how bash behaves when executing a script. You can set bash options either when invoking bash or during script execution with "set".

My favorite set of flags is:
set -e # Stop on non-zero exit codes
set -o pipefail # Stop on non-zero exit codes for any program in a pipe
set -x # Print each command to stderr as it is being executed

Alternatively, "set -v" shows the actual commands from the script before executing them (as opposed to the behavior of -x, which shows the command after variable substitution and decomposing pipes).
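
To see the difference -e and pipefail make, here is a small sketch. Each case runs in a child bash so that a failure doesn't abort the demo itself; the variable names are my own.

```shell
#!/usr/bin/env bash
# Without -e, execution continues past the failing command.
without_e=$(bash -c 'false; echo reached')

# With -e, the child stops at the first non-zero exit code, so nothing is printed.
with_e=$(bash -c 'set -e; false; echo reached' || true)

# A pipeline normally reports only the exit code of its LAST command...
plain_status=0
bash -c 'false | true' || plain_status=$?

# ...but with pipefail, any failing stage makes the whole pipeline fail.
pipefail_status=0
bash -c 'set -o pipefail; false | true' || pipefail_status=$?

echo "without -e: '$without_e'  with -e: '$with_e'"
echo "pipe status: $plain_status  with pipefail: $pipefail_status"
```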

For a full rundown of options, see http://tldp.org/LDP/abs/html/options.html.


Traps

Now that you know how to make sure your scripts don't keep charging forward after a fatal error, you might be wondering "What if I need to do some cleanup before exiting? What if I want to print an error message before exiting?" Other languages provide this through exception handling; bash provides it through traps.
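
A minimal sketch of both uses, cleanup and error reporting, might look like this. The handler names cleanup and on_error and the scratch directory are assumptions for illustration. The failing job runs in a child bash so its exit doesn't take the demo down with it.

```shell
#!/usr/bin/env bash
status=0
output=$(bash -c '
  set -e
  workdir=$(mktemp -d)                  # scratch space that must not be left behind

  cleanup()  { rm -rf "$workdir"; echo "cleanup: removed scratch dir"; }
  on_error() { echo "error: a command failed"; }

  trap cleanup EXIT                     # runs whenever the shell exits
  trap on_error ERR                     # runs when a command returns non-zero

  echo "working"
  false                                 # simulated failure
  echo "never reached"
' 2>&1) || status=$?

echo "$output"
echo "exit status: $status"
```

The ERR handler runs first, then set -e ends the script, and the EXIT handler runs on the way out. The exit status of the failing command is preserved.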



Heredocs

Ever wanted to generate a script from within a script? Send a long string of commands over ssh within a script? Hard-code an entire document within a script? Heredocs are the answer.

You can also use <<-EOF (notice the dash before the limit string) to indicate that leading tabs (only tabs, not spaces) should be stripped from each line of the heredoc, or <<"EOF" to indicate that variable substitution should not be performed inside the heredoc.
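
Here is a sketch of the three flavors. The variable names and the generated script are made up for illustration.

```shell
#!/usr/bin/env bash
name="world"

# Unquoted limit string: $name is expanded while the heredoc is read.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted limit string: the body is literal, which is what you want when
# generating a script whose variables must be expanded later, not now.
generated=$(mktemp)
cat > "$generated" <<"EOF"
echo "generated script received: $1"
EOF
generated_output=$(bash "$generated" hello)
rm -f "$generated"

# <<- strips leading TAB characters. The tiny script is built with printf
# so that the tabs are explicit (\t) and survive copy and paste.
stripped=$(printf 'cat <<-EOF\n\tindented with a tab\n\tEOF\n' | bash)

printf '%s\n' "$expanded" "$generated_output" "$stripped"
```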



Process Substitution and Named Pipes

Heredocs are for the simple case when you just want to write some data to a process's stdin. What if the tool you want to use takes in a file? Or multiple filenames? But you don't have a file. Process substitution to the rescue. It can pipe the stdout of a process to a file descriptor like so: cat <(yes). Named pipes (or FIFO pipes) just generalize this concept by letting you assign filenames on disk instead of passing a file descriptor.
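
As a sketch, here is diff (which insists on two filenames) reading from two commands, followed by the same idea with a named pipe. The file names and sample data are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Process substitution: each <(...) expands to a path such as /dev/fd/63
# that diff can open like any other file. diff exits non-zero when the
# inputs differ, hence the "|| true".
diff_output=$(diff <(printf 'a\nb\nc\n') <(printf 'a\nB\nc\n') || true)

# Named pipe: the same idea, but with a filename we choose ourselves.
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/sorted.fifo"
mkfifo "$fifo"

# The writer blocks until a reader opens the pipe, so run it in the background.
printf 'pear\napple\nfig\n' | sort > "$fifo" &

# Any tool that takes a filename can now read the writer's output.
fifo_output=$(cat "$fifo")
wait                        # reap the background writer
rm -rf "$fifo_dir"

echo "$diff_output"
echo "$fifo_output"
```

No intermediate data is written to disk in either case; the FIFO is only a name in the filesystem that connects the two processes.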



Self-extracting scripts (A Fun Parlor Trick)

Some example use cases: 1) You have a tarball that you want to install, but you want to include an installer script AND you only want to distribute a single shell script. 2)

Note that you might want to extract the file to /dev/shm (shared memory), rather than putting it on disk, since writing the file would add unnecessary startup time otherwise.
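
One way the trick can be put together is sketched below. This is my own minimal version, not necessarily the approach in the article linked from this section. The __ARCHIVE__ marker, the file names, and the payload contents are all assumptions for illustration, and it extracts into a temporary directory rather than /dev/shm.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)

# 1. Build a small tarball to act as the payload.
mkdir "$work/payload"
echo "hello from the payload" > "$work/payload/README"
tar -czf "$work/payload.tar.gz" -C "$work" payload

# 2. Write the installer stub. Everything after the __ARCHIVE__ marker line is
#    raw tarball bytes, so the stub must exit before bash reaches them.
cat > "$work/installer.sh" <<"EOF"
#!/usr/bin/env bash
set -e
dest=${1:-.}
# Line number where the payload starts: one past the marker line.
start=$(awk '/^__ARCHIVE__$/ { print NR + 1; exit }' "$0")
tail -n +"$start" "$0" | tar -xzf - -C "$dest"
echo "extracted to $dest"
exit 0
__ARCHIVE__
EOF

# 3. Append the payload to the stub: one file, script plus archive.
cat "$work/payload.tar.gz" >> "$work/installer.sh"

# 4. Run the single-file installer and look at what it unpacked.
mkdir "$work/dest"
bash "$work/installer.sh" "$work/dest"
extracted=$(cat "$work/dest/payload/README")
echo "$extracted"
rm -rf "$work"
```

The explicit exit 0 before the marker matters: without it, bash would keep reading and try to execute the compressed bytes as commands.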

For the full scoop, have a look at http://www.linuxjournal.com/node/1005818.


Other bash necessities

Passwordless SSH key forwarding, awk scripting, and