
Checking git staged files for tabs before committing

Posted on 10 January 2010 @ 19:42

I recently cocked up a bunch of files in a project I'm working on because I'd inadvertently changed the language indentation from Soft Tabs to Tabs in TextMate. When I finally noticed this I was a mite annoyed, not least because I'd have to change all the files and thereby nuke half of the git blame history. The first step was to find the files with tab characters in them; a quick google revealed this

$ TAB=$(printf "\t")
$ grep -Hnr "$TAB" *

which helped me solve the initial problem. I then decided I'd write a rake task for my project so I can check for tab indented files in the future more easily. I aimed to make this Ruby 1.8 and Ruby 1.9 friendly and came up with the following function

# Print every line of srcfile that contains a tab character.
def check_for_tabs_in(srcfile)
  File.readlines(srcfile).each_with_index do |line, index|
    puts "#{srcfile}:#{index + 1}\t#{line}" if line.include?("\t")
  end
end

namespace :check_tabs do
  task :all => [:source, :tests]
  task :source do
    Dir.glob("Source/**/*.[hm]").each do |srcfile|
      check_for_tabs_in srcfile
    end
  end
  task :tests do
    Dir.glob("Test/Units/*.[hm]").each do |srcfile|
      check_for_tabs_in srcfile
    end
  end
end

which left me feeling quite happy, then it struck me. What if git could warn me when I was about to commit files with tabs and let me know which files were affected so I could go and fix them? Well, the code was already there above, but I hardly wanted a git hook to call my rake file, so I rolled the function straight into the hook file. Then to make the error output look better I cached the list of found lines and printed them at the end of the message. The result is this lovely life-saving snippet of code.

#!/usr/bin/env ruby

found = []
# --diff-filter=ACM skips deleted files, which can no longer be read
`git diff --cached --name-only --diff-filter=ACM`.split("\n").each do |srcfile|
  line_number = 0
  File.readlines(srcfile).each do |line|
    line_number += 1
    if line.index("\t")
      found << "#{srcfile}:#{line_number}\t#{line}"
    end
  end
end

unless found.empty?
  puts "Error: Attempt to add file with tab indentation"
  puts ""
  puts "This project uses spaces rather than tabs for indentation,"
  puts "please fix the lines of the following files and then re-add"
  puts "the files to the index and re-commit.\n\n"
  puts found.join("\n")
  exit 1
end

exit 0

Feel free to use it yourself.
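The hook flags a tab anywhere on a line. If you only care about indentation, a variation along these lines might do. This is a minimal sketch rather than part of the hook itself: tab_indented_lines is a made-up helper name, and it takes an array of lines so it can be tried without a git repository.

```ruby
# Hypothetical helper: returns "file:line\ttext" entries for lines whose
# leading whitespace contains a tab. Tabs later in the line are ignored.
def tab_indented_lines(name, lines)
  found = []
  lines.each_with_index do |line, index|
    # \A anchors at the start of the line; spaces may precede the tab
    found << "#{name}:#{index + 1}\t#{line.chomp}" if line =~ /\A *\t/
  end
  found
end

sample = ["def foo\n", "\tbar\n", "  baz\t# trailing tab\n", "end\n"]
puts tab_indented_lines("example.rb", sample)
```

Only the second line of the sample is reported, since the tab on the third line comes after the code rather than in the indentation.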

Running Stompserver with Daemontools

Posted on 23 October 2009 @ 08:57

I've been messing around with chef recently and needed to set up a stompserver for the chef indexer. As I normally have daemontools installed on my servers anyway I thought I'd have a go at running stompserver under daemontools rather than runit.

My instructions will deviate a little from the prescribed daemontools configuration. I'm running FreeBSD and using the sysutils/daemontools port which runs out of /var/service rather than /service. This should be the only difference between my instructions and those for a Linux based operating system.

I'm assuming you've already got daemontools installed and that it's running.

Installing Stompserver

Stompserver is a RubyGem which requires Ruby and RubyGems to be installed. On FreeBSD the following commands will install Ruby and RubyGems

cd /usr/ports/lang/ruby18
make install clean
cd /usr/ports/devel/ruby-gems
make install clean

or if you've got portupgrade installed then you'll already have lang/ruby18 so we just need the gems

portinstall devel/ruby-gems

Now that we've got Ruby and RubyGems we can install stompserver with

gem install stompserver --no-ri --no-rdoc

I've added the --no-ri --no-rdoc as I'm going to be running this on a server and don't want the docs as well.

Creating the Stompserver directory

To begin with we need a place to keep all the run, env, log files and directories which daemontools will use to run the server. I've opted for /usr/local/etc/stompserver and a working directory of /var/db/stompserver

mkdir -p /usr/local/etc/stompserver
cd /usr/local/etc/stompserver
mkdir -p env log/main
mkdir /var/db/stompserver

Creating a user for the stompserver

This is short and sweet really, but we'll chown the folders as well

pw useradd stompserver -d /nonexistent -s /sbin/nologin
chown -R stompserver:stompserver log/main
chown -R stompserver:stompserver /var/db/stompserver

The run scripts

Starting off with the multilog run file

cat <<'EOF' > log/run
#!/bin/sh
exec setuidgid stompserver \
multilog \
    t \
    ${MAXFILESIZE+"s$MAXFILESIZE"} \
    ${MAXLOGFILES+"n$MAXLOGFILES"} \
    ${PROCESSOR+"!$PROCESSOR"} \
    ./main
EOF
chmod 755 log/run

Now for the stompserver run file. First let's set some env variables we'll use in the run file

echo localhost > env/HOST
echo 61613 > env/PORT
echo memory > env/QUEUETYPE
echo 0 > env/SECONDS
echo /var/db/stompserver > env/WORKING_DIR

and now the run file

cat <<'EOF' > run
#!/bin/sh
exec 2>&1
envdir ./env \
sh -c '
    exec \
    setuidgid stompserver \
    /usr/local/bin/stompserver \
        ${PORT+"-p$PORT"} ${HOST+"-b$HOST"} \
        ${QUEUETYPE+"-q$QUEUETYPE"} \
        ${WORKING_DIR+"-w$WORKING_DIR"} \
        ${DEBUG+"-d"} \
        ${AUTH+"-a"} \
        ${SECONDS+"-c$SECONDS"}
'
EOF
chmod 755 run

It's worth noting that if you set either env/DEBUG or env/AUTH then the debug or auth flag will be passed when stompserver is run.
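To make the flag handling a little more concrete, here is a rough Ruby model of what the ${VAR+"..."} expansions in the run file do. It is only a sketch: OPTIONS and stompserver_args are names invented for illustration, and the hash stands in for the environment that envdir builds from the env directory. One caveat worth checking against the envdir documentation: as far as I understand it, a completely empty file removes the variable rather than setting it, so env/DEBUG would need at least a newline in it.

```ruby
# [variable, flag, takes_value] in the order the run file lists them.
OPTIONS = [
  ["PORT", "-p", true], ["HOST", "-b", true], ["QUEUETYPE", "-q", true],
  ["WORKING_DIR", "-w", true], ["DEBUG", "-d", false], ["AUTH", "-a", false],
  ["SECONDS", "-c", true]
]

# Build the argument list the way ${VAR+"-xVAR"} would: a flag appears
# only when the variable is set, whatever its value.
def stompserver_args(env)
  OPTIONS.each_with_object([]) do |(name, flag, takes_value), args|
    next unless env.key?(name)
    args << (takes_value ? "#{flag}#{env[name]}" : flag)
  end
end

env = { "HOST" => "localhost", "PORT" => "61613", "QUEUETYPE" => "memory",
        "SECONDS" => "0", "WORKING_DIR" => "/var/db/stompserver" }
puts stompserver_args(env).join(" ")
```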

Starting the Stompserver

Now it is only required to link the stompserver directory into the daemontools service directory

ln -s /usr/local/etc/stompserver /var/service

You can check that it's working with

svstat /var/service/stompserver
ps aux | grep stompserver

Handling arguments with execve

Posted on 13 September 2009 @ 11:53

As I mentioned in my previous post I've been working on a project at work which required me to manage shell commands from a C program. The solution I documented worked fine so long as you knew all of the arguments at compile time. Another common requirement is to generate a command string at run time and have it passed to the external program. Unfortunately this adds some complications.

Here is the main executing body of our program from last time.

 1 pid_t pid;
 2 int status;
 3 char *envp[] = { NULL };
 4 char *argv[] = { "./test_args", "hello", "there", NULL };
 5 
 6 switch ( pid = fork() ) {
 7   case -1:
 8     perror("fork()");
 9     exit(EXIT_FAILURE);
10   case 0: // in the child
11     status = execve("./test_args", argv, envp);
12     exit(status); // only happens if execve(2) fails
13   default: // in parent
14     if ( waitpid(pid, &status, 0) < 0 ) {
15       perror("waitpid()");
16       exit(EXIT_FAILURE);
17     }
18 
19     if ( WIFEXITED(status) ) {
20       // return status from child, ie ./test_args
21       exit( WEXITSTATUS(status) );
22     }
23     exit(EXIT_FAILURE);
24 }

So say we wanted to throw "hello there world" as arguments to test_args. We could change line 4 to read

char *argv[] = { "./test_args", "hello there world", NULL };

and I would forgive you for thinking that test_args would get 4 arguments; in fact it only gets 2. The second argument is just the full string. If you look at the line above this is actually fairly obvious.
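The same behaviour is easy to see from Ruby, whose multi-argument process calls skip the shell in much the same way. The following is just an illustrative sketch: count_args is a made-up helper that launches a tiny child Ruby and asks it how many arguments it received. Note that Ruby's ARGV leaves out the program name, so its count is one less than argc would be in C.

```ruby
require 'open3'
require 'rbconfig'

# Run a child Ruby that reports how many arguments it was given.
# Each element passed here becomes exactly one argument in the child.
def count_args(*args)
  out, _status = Open3.capture2(RbConfig.ruby, "-e", "puts ARGV.length", *args)
  out.to_i
end

puts count_args("hello there world")        # one element, one argument
puts count_args("hello", "there", "world")  # three elements, three arguments
```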

Generating our arguments

So in order for this to work we have to preprocess the "hello there world" string into a char **argv list. Unfortunately it's now been a while since I had to do this so I won't be going into great detail as to how the function evolved. I'll just splat it straight down and then go through it.

 1 static char** argv_from_string(char *cmd, char *args) {
 2  int i, spaces = 0, argc = 0, len = strlen(args);
 3  char **argv;
 4 
 5  for ( i = 0; i < len; i++ )
 6      if ( isspace(args[i]) ) spaces++;
 7 
 8  // add 1 for cmd, 1 for NULL and 1 as spaces will be one short
 9  argv = (char**) malloc ( (spaces + 3) * sizeof(char*) );
10  argv[argc++] = cmd;
11  argv[argc++] = args;
12 
13  for ( i = 0; i < len; i++ ) {
14      if ( isspace(args[i]) ) {
15          args[i] = '\0';
16          if ( i + 1 < len )
17            argv[argc++] = args + i + 1;
18      }
19  }
20 
21  argv[argc] = (char*)NULL;
22  return argv;
23 }

The algorithm is quite simple: lines 5 and 6 count the number of whitespace characters in the argument string. This tells us how big we need to make the argv list. Once we know how big the argument list is we need to allocate the memory for that list, which happens on line 9.

NOTE that whatever function calls this will need to free the memory allocated by it. This is slightly bad form, but I felt it was acceptable in this case.

Next the arguments are added to the list using the argv[argc++] lines. On line 10 the command is added as the first argument, and on line 11 the start of the argument string follows it. Over the course of lines 13 to 19 the spaces within the argument string are replaced with null characters to terminate the strings, and the starts of the new strings are added to the argv list. Finally on line 21 the null sentinel is added at the end of the argv list.

NOTE the char *args contents are permanently changed by argv_from_string, if the calling function does not want to alter its argument string it should duplicate it and pass the clone to argv_from_string.
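One thing to watch with this approach: reading the C above, two spaces in a row produce an empty-string argument, and quotes get no special treatment. For comparison, here is how the same splitting looks in Ruby using only the standard library. It is a sketch for contrast, not a port of argv_from_string.

```ruby
require 'shellwords'

args = "hello  there world"   # note the double space

# String#split with no argument splits on runs of whitespace,
# so the double space does not yield an empty argument.
plain = args.split
p plain   # → ["hello", "there", "world"]

# Shellwords.split also honours shell-style quoting.
quoted = Shellwords.split('hello "there world"')
p quoted  # → ["hello", "there world"]
```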

Using argv_from_string

Now I've already alluded to two issues when calling argv_from_string. The caller needs to ensure its memory is freed and potentially make a clone of the argument string before passing it if it cares about it being modified. So then to use this new function we need to change the main executing body above to

 1 pid_t pid;
 2 int status;
 3 char *args, **argv = NULL;
 4 char *envp[] = { NULL };
 5 
 6 switch ( pid = fork() ) {
 7   case -1:
 8     perror("fork()");
 9     exit(EXIT_FAILURE);
10   case 0: // in the child
11     args = strdup(cmd); // cmd is our argument string
12     argv = argv_from_string("./test_args", args);
13 
14     status = execve("./test_args", argv, envp);
15 
16     // only happens if execve(2) fails
17     free( args );
18     free( argv );
19     exit(status);
20   default: // in parent
21     if ( waitpid(pid, &status, 0) < 0 ) {
22       perror("waitpid()");
23       exit(EXIT_FAILURE);
24     }
25 
26     if ( WIFEXITED(status) ) {
27       // return status from child, ie ./test_args
28       exit( WEXITSTATUS(status) );
29     }
30     exit(EXIT_FAILURE);
31 }

The main body of changes are on lines 3 to 4 and 10 to 20. On lines 3 and 4 we've altered our variable list to include a string called args and an array of strings called argv. The real work begins on line 11, where we duplicate the string of arguments we received so that we don't trample the original contents. We then pass this copy to argv_from_string along with the name of the program we are executing.

The execve(2) call itself hasn't changed, but should it fail we first cleanup the argument list and duplicated arguments string before exiting.

Now you might wonder why I chose to do all of this in the forked child instead of doing this before calling fork(2). My reasoning for this is that the external program I am executing is likely to be short lived and so memory created in its space will be freed by the operating system when it's done. Also by keeping all of this in only the child I don't have to worry about freeing the memory in the parent or in the event that fork(2) fails.

Well I hope that's been useful for you.

XML Demolisher

Posted on 28 August 2009 @ 11:25

The majority of people familiar with Ruby have at one time or another used the Builder library to generate XML files. Builder is nice in that it provides a very ruby-ish interface to writing XML.

For example the following Builder code

require 'builder'

people = [
  { :firstname => 'Enoch', :lastname => 'Root',
    :phone => '01234 567 8900', :email => 'enoch@example.com',
    :active => true },
  { :firstname => 'Randy', :lastname => 'Waterhouse',
    :phone => '01234 567 8901', :email => 'randy@example.com',
    :active => false },
]

File.open('addressbook.xml', 'w') do |file|
  xml = Builder::XmlMarkup.new(:target => file, :indent => 2)
  xml.instruct!

  xml.addressbook do
    people.each do |person|
      xml.person do
        xml.firstname person[:firstname]
        xml.lastname  person[:lastname]
        xml.contact do
          xml.phone person[:phone]
          xml.email person[:email]
        end
        xml.active person[:active] ? 'YES' : 'NO'
      end
    end
  end
end

will generate this XML

<?xml version="1.0" encoding="UTF-8"?>
<addressbook>
  <person>
    <firstname>Enoch</firstname>
    <lastname>Root</lastname>
    <contact>
      <phone>01234 567 8900</phone>
      <email>enoch@example.com</email>
    </contact>
    <active>YES</active>
  </person>
  <person>
    <firstname>Randy</firstname>
    <lastname>Waterhouse</lastname>
    <contact>
      <phone>01234 567 8901</phone>
      <email>randy@example.com</email>
    </contact>
    <active>NO</active>
  </person>
</addressbook>

Nice and easy huh. Despite me complicating it by using an array of people in the ruby code, you can see how the XML is being put together. Now with the likes of REXML and Hpricot you can, without a great deal of effort, extract pieces of this XML file back out and re-create the original array. The trouble is that neither of these projects is that well designed for this kind of process, as I discovered.

What I was needing to do was export vast amounts of data into a portable format and then either import it back into the same system or potentially another system. Unfortunately coping with differences between how some other system might want to store that data was very tricky with the normal pick-and-choose method of XML parsing. What I really wanted was something like Builder but the other way round. This was actually a lot simpler to achieve than I thought, and with only a couple of hours' work I had some code to do just that.

Demolisher

So Demolisher was born, first onto GitHub and then properly, after adding tests, to Rubyforge. So to proceed with our addressbook example a bit more, how would you get the information back out?

require 'demolisher'
people = []

xml = Demolisher.demolish('addressbook.xml')
xml.addressbook do
  xml.person do
    person = {}

    person[:firstname] = xml.firstname
    person[:lastname]  = xml.lastname
    person[:active] = xml.active?
    xml.contact do
      person[:phone] = xml.phone
      person[:email] = xml.email
    end

    people << person
  end
end

Notice the active? call. If you append a ? to the end of an element name then it will return true provided the contents of the element are "t", "y", "true", "yes" or 1. This is case insensitive.
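The rule itself is small enough to sketch in a few lines. To be clear, this is only a model of the behaviour described above and not Demolisher's actual implementation: TRUTHY and truthy_element? are names made up for the illustration, and trimming surrounding whitespace is my own assumption.

```ruby
# Values treated as true, compared case-insensitively.
TRUTHY = %w[t y true yes 1]

# Hypothetical helper: true when the element text is one of the
# accepted truthy spellings. Assumes stray whitespace should be ignored.
def truthy_element?(text)
  TRUTHY.include?(text.to_s.strip.downcase)
end

puts truthy_element?("YES")  # true, as for Enoch in the addressbook
puts truthy_element?("NO")   # false, as for Randy
```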

As I found it common in my import/export code to have little wrapped elements for logical grouping like the contact element above I also added a shorthand approach to accessing sub elements. You can get the contact phone using xml.contact.phone as well as the block method, so it could also be written as (lines 11 and 12)

 1 require 'demolisher'
 2 people = []
 3 
 4 xml = Demolisher.demolish('addressbook.xml')
 5 xml.addressbook do
 6   xml.person do
 7     person = {}
 8 
 9     person[:firstname] = xml.firstname
10     person[:lastname]  = xml.lastname
11     person[:phone] = xml.contact.phone
12     person[:email] = xml.contact.email
13     person[:active] = xml.active?
14 
15     people << person
16   end
17 end

I hope you like it and have some more fun demolishing XML data files. You can view the rdoc for only the tiniest bit more information, hey I told you it was pretty simple.

Using execve for the first time

Posted on 28 August 2009 @ 00:20

Recently I've been working on a project at work which required me to manage shell commands from a C program. It's quite an interesting project, and when it's ready I hope to do a bit of a post on it. I've got quite a limited history of C experience but I do quite enjoy working with the language. Nowhere in my past experience with it had I ever invoked other programs within my code, so this was definitely a learning experience for me.

For the purposes of any examples here is the source for the program I will be invoking as test_args

#include <stdio.h>

int main(int argc, char **argv) {
  FILE *log;
  int i;

  log = fopen("./out.txt", "a+"); // assume it worked
  fprintf(log, "Called with %d args as: ", argc);

  for ( i = 0; i < argc ; i++ ) {
    // pass argv[i] as data, never as the format string
    fprintf(log, "\"%s\" ", argv[i]);
  }

  fprintf(log, "\n");
  fclose(log);

  return 0;
}

I've opted to write the arguments passed to the program to a text file for the sake of simplicity.

execve(2) and friends

The main system call to invoke an external program within your code is execve(2). Now that I've said that I'm going to retract nearly all of it. From the manpage

execve() transforms the calling process into a new process.

we see something a bit different, and initially for me a source of confusion. To begin with I was simply calling execve(2) in my code and letting it do its thing. This is not correct, as any veteran of execve(2) knows; they will probably baulk at my stupidity for not getting it right away.

So what does the manpage mean in this case, well as I found out to my cost, it means exactly what it says. The moment execve(2) is entered, provided it doesn't encounter an error, your program becomes the program invoked by execve(2). It will run to the end of that invoked program and return you whatever that program would if you'd run it yourself. This was my first tripping point and I quickly found out that I needed to fork(2) my way around it. As an example, most definitely not to be followed, here is what I had started with

int ret;
char *envp[] = { NULL };
char *argv[] = { "./test_args", "hello", "there", NULL };

ret = execve("./test_args", argv, envp);

// do things based on ret from test_args down here

as I've described, with the exception of an error, nothing after the execve(2) would be run.

Now I mentioned friends, execve(2) is wrapped by a number of functions such as execl, execle, execlp, execv, execvp and execvP which are documented in exec(3). These functions primarily offer a selection of different ways to wrap up the call to execve(2) for your convenience. I won't touch on these again as you can look into them yourself.

Another friend of execve(2) is system(3); this function takes a command string and passes it to sh(1), the default shell, for interpretation and execution. For small commands it can be a quick way of getting things done but for my needs I didn't feel like it gave me enough control.

fork(2)ing around to keep in your program

So I needed to first fork(2) my program before calling execve(2) to invoke the external program. Now the important thing to remember about fork(2) is that your program branches into a child and parent. The parent is your controlling process and the child is what you will offer up to the gods of execve(2) to be transformed into the program you wish to invoke.

Here follows a crude example...

pid_t pid;
int status;
char *envp[] = { NULL };
char *argv[] = { "./test_args", "hello", "there", NULL };

switch ( pid = fork() ) {
  case -1:
    perror("fork()");
    exit(EXIT_FAILURE);
  case 0: // in the child
    status = execve("./test_args", argv, envp);
    exit(status); // only happens if execve(2) fails
  default: // in parent
    if ( waitpid(pid, &status, 0) < 0 ) {
      perror("waitpid()");
      exit(EXIT_FAILURE);
    }

    if ( WIFEXITED(status) ) {
      // return status from child, ie ./test_args
      exit( WEXITSTATUS(status) );
    }
    exit(EXIT_FAILURE);
}

So again we set up the envp and argv string arrays to pass to execve(2) but before that we declare a variable of type pid_t. This is a type to store Process Identifiers, PIDs, which are returned from fork(2).

I've opted for a switch statement here, you could equally use an if but I prefer the readability and clarity of the switch.

So we fork our process and save the returned PID value as we'll need this in the parent. If our PID has a value of -1 then fork(2) has failed. Now we have the fun part, the child will always get a 0 return value from fork(2), if you want the PID in the child you can call getpid(2). So inside the child we make the call to execve(2) and capture the result of the call in the event of an error. From this point on the child is test_args and not our main program. Now the fork(2) call will return the value of the PID for the child to the parent, so we can catch this with the default case. In this rather simple example we wait for the child process to complete with waitpid(2) which will fill in our status variable with the return value of the child. Next we check the status to ensure the child exited and didn't crash or die, we then return the status code or indicate failure.
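For anyone more at home in Ruby, the same fork, exec and wait dance can be sketched with the standard Process calls. This is an illustration of the pattern rather than a translation of the C: run_child is a made-up helper name, and a child Ruby that simply exits with status 3 stands in for test_args. One difference to be aware of is that Ruby's exec raises an exception on failure instead of returning.

```ruby
require 'rbconfig'

# Fork, replace the child with another program, and collect its exit status.
def run_child(*argv)
  pid = fork do
    # In the child: on success this process becomes the new program.
    exec(*argv)
  end
  # In the parent: block until the child finishes, like waitpid(2).
  _pid, status = Process.wait2(pid)
  # Mirror WIFEXITED / WEXITSTATUS; treat anything else as failure.
  status.exited? ? status.exitstatus : 1
end

puts run_child(RbConfig.ruby, "-e", "exit 3")
```

Passing the command as separate strings keeps the shell out of the picture, which is the closest match to calling execve(2) directly.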

That's the basic gist of using execve(2) to invoke external programs in your code. In a later post I'll describe how I went about handling more free-form argument lists.

I'm Expert Staff apparently

Posted on 27 August 2009 @ 23:06

Rather recently the company I work for went through a bit of a re-branding and structural reorganisation. So now Open Hosting is a division of a new parent company M247 Ltd. Part of this re-branding was the creation of a couple of new divisions, one of which is Ice Colo through which our data centre services are being sold.

Now a while ago I'd rather foolishly agreed to allow myself to be photographed as part of some generic PR effort, this resulted in the following image

Geoff IceColo PR Image

which was at least not quite as bemusing as

Chris IceColo PR Image

which I think is fantastic and cheesy at the same time.

Anyway I digress, the first picture found its way into I believe Data Centre Management magazine as part of a piece on the data centre we've built in Manchester.

Expert staff

With the creation of the Ice Colo brand came the inevitable website, where once again you can see the image of my lovely self as part of the auto-scrolling spiel under Expert Staff. Wonderful eh, guess I've now got a reputation to live up to.

Marketing Idea for Apple

Posted on 27 August 2009 @ 22:28

Portrait style banner, fresh green apple at the bottom with a bite taken out of it. Text reads

They're just better for you ;)

As an added bonus, have people standing outside Microsoft stores selling bushels of apples.

RSpec with MacRuby, part 2

Posted on 24 July 2009 @ 16:35


RSpec with MacRuby, part 1

Posted on 24 July 2009 @ 16:35


Failure in the GPS

Posted on 26 May 2009 @ 14:36

I swear, sometimes it's as if Hollywood sets out with "failure" plugged right into the GPS.

A Joss Whedon-less 'Buffy' movie: Worst idea ever of the year

Harsh highlighting

Posted on 2 April 2009 @ 13:23

Just read about the Harsh ERb/HAML highlighter, it looks quite cool. Also because it uses UltraViolet it can be powered by my version of the Oniguruma gem.