My OSX Setup

October 9, 2011

After over 10 years of Linux I recently switched to OSX on a MacBook Air. I can’t say enough good things about the Air hardware, but it’s taken me a few months to get completely comfortable with the operating system. Here are the apps I’ve settled on, which I think make for a fairly awesome OSX setup:

Alfred 

Alfred is a powerful launcher, which allows you to start any application simply by pressing Alt-Space and then typing the application name. Alfred also has a built-in calculator, which is really handy, and can be used to quickly search Google too.

Chrome

I did try Safari for a while but came back to Chrome, my browser of choice. It’s fast, has a clean UI, and more and more great extensions are available all the time. My current extensions: Rapportive, Pretty Beautiful Javascript, YSlow, GitHub Inbox, and Screen Capture.

Cyberduck

Cyberduck is a handy little file transfer app that supports FTP, SCP, WebDAV and a load of cloud storage services such as Amazon’s S3. It integrates well with Finder, so you can simply drag and drop files between computers.

Fluid

A lot of the applications that I use these days are actually web apps. For the ones I use most frequently (Google Calendar and WorkFlowy) I use Fluid to turn them into desktop-like apps in their own window, which makes them easier to access.

Homebrew

A package manager for OSX that makes it as easy as brew install <package> to install almost any UNIX package you can think of. For those it doesn’t support it’s relatively easy to add support yourself. Here are the apps installed with brew so far:

$ brew list
ack             fortune         jpeg            ngrep           pv              sqlite
android-sdk     gdbm            libevent        nmap            readline        unrar
cmake           git             libmemcached    pidof           redis           watch
ctags           graphviz        macvim          pil             siege           wget
curl            htop            memcached       pkg-config      solr

iTerm2

OSX ships with a terminal application, but iTerm2 comes with such a host of advanced features that it became my terminal app of choice. Some of those features include 256 colour support and allowing scrollback within screen. The killer feature for me though is the system-wide hotkey. I’ve set it up so that whenever I press F12 iTerm2 drops down from the top of the screen, Quake style, and I’m good to go.

MacVim 

On Linux I used Vim within a terminal, but I’m really liking MacVim on OSX. It integrates really nicely with the system clipboard and is customisable enough that you can get most of the default GUI elements out of the way. I had to make some minor adjustments to my existing vimrc but almost everything I had working on Vim under Linux, including all of the plugins, just worked.

VLC

I find iTunes a bit too heavyweight for most of my media needs, and it also crashes a lot! I used VLC a lot on Linux, and the Mac client is just as good. I’ve yet to find an audio or video format it doesn’t support.

Spaces

Initially the hardest part of the transition to OSX was the lack of a good virtual desktop manager. With Linux I’d constantly be changing virtual desktops, and moving windows around with the various keyboard shortcuts. OSX’s manager, Spaces, required me to move windows around with the mouse though! It wasn’t until I discovered the “always open this app on this desktop” feature that I really began to like Spaces. I have Chrome always open on its own desktop, Vim on another, WorkFlowy on another, and everything else on yet another. I haven’t upgraded to Lion yet, and I hear that Spaces has been replaced. Hopefully I’ll get on with the replacement just as well.

What am I missing?

Let me know what great OSX apps I’m missing out on!

How I got the Turntable.fm Gorilla in less than 48 hours

September 13, 2011

Turntable.fm has taken off in a big way. It launched in May, and over 140,000 users signed up in the first month. Celebrities regularly hang out there, and Lady Gaga and Kanye West are even investing in the site.

For those that haven’t tried it out yet (or perhaps can’t, more on that shortly) turntable is somewhere that you can listen to great music and discover new artists and songs that you might not otherwise have come across.

The site features a number of rooms, each with a different theme (e.g. chillout, dubstep, indie) and in each room there are up to 5 DJs. Everyone in a room can “awesome” or “lame” each song that is played.

If you’re the DJ you get a point for every awesome you receive. The more points you get the better the avatar you can select. The most prized avatar on the site, a gorilla, requires 1000 points. Currently that’s something that has only been achieved by 4000 of the site’s users, and something that has taken most of them weeks or even months of effort. Here’s how I got the gorilla in less than 48 hours…

Getting in

Due to some licensing issues Turntable has been unavailable to anyone outside the US since the end of June. I’m based in the UK, so the first challenge I faced was simply getting into the site. I’d need to make it appear as though I was in the US. The sshuttle app makes it extremely easy to do exactly that. You just need a host in the target country (fortunately this blog is hosted in the US, so that’s what I used), and the IP address of the target site. It gets a little more complicated if the target site has several IP addresses, but a quick check with dig shows that turntable currently has just the one:

$ dig -tA +short turntable.fm
50.16.229.9

Tunnelling all traffic to this IP via my US based server is as simple as running this command:

sshuttle -r coderholic.com 50.16.229.9/32

It’ll now appear to turntable that I’m in the US, and I can log in to the site using my Facebook account.
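For the several-IP-addresses case mentioned above, the route list can be generated rather than typed by hand. This is only a sketch: the helper names buildSshuttleCommand and sshuttleCommandFor are made up for illustration, and it assumes Node's built-in dns module is available. It relies on sshuttle accepting more than one subnet on the command line.

```javascript
// Sketch only: build an sshuttle command covering every IPv4 address a site
// resolves to. The function names here are illustrative, not part of sshuttle.
const dns = require('dns').promises;

// Pure helper: turn a list of IPv4 addresses into /32 routes for sshuttle.
function buildSshuttleCommand(remoteHost, addresses) {
    const routes = addresses.map(function (ip) { return ip + '/32'; });
    return ['sshuttle', '-r', remoteHost].concat(routes).join(' ');
}

// Look up the A records (what `dig -tA +short` prints) and build the command.
// Not called here, so loading this file does no network access.
async function sshuttleCommandFor(remoteHost, siteHost) {
    const addresses = await dns.resolve4(siteHost);
    return buildSshuttleCommand(remoteHost, addresses);
}

// Example with the single address returned by dig above:
console.log(buildSshuttleCommand('coderholic.com', ['50.16.229.9']));
// prints: sshuttle -r coderholic.com 50.16.229.9/32
```

If the site's addresses change often, the lookup would need to be repeated before each run.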

Getting 1000 points

In popular rooms it can be really difficult to get a DJ spot. Even if you manage to get one it’s not easy to pick songs that get a lot of points, and it can be hard to keep your spot. That’s why it usually takes weeks if not months of effort to get the 1000 points required for the gorilla. My plan was to automate the process as much as possible.

There are already lots of scripts and plugins available for Turntable. I started digging into the code of frankielaguna’s Auto-Awesome bookmarklet to see what was going on under the hood. The code starts with this:

//Attempt to find the room manager object
for (var prop in window) { 
    if (window.hasOwnProperty(prop) && window[prop] instanceof roommanager){ 
        ttObj = window[prop];
        break;
    } 
}

The “room manager object” sounded very interesting! Pasting the above code into the Chrome javascript console showed right away how much interesting stuff there really is in this object:

It contains details of who’s DJing, how many DJ slots there are, callbacks for when you get points, and lots lots more. Certainly everything I would need was there.

My plan was to create a new room and log in with 2 accounts: my actual account and one fake account. I’d have both the accounts DJ and automatically awesome each other. To speed the process up I made some changes to the auto-awesome code so that it would “awesome” every 5 seconds, and so that the current DJ would skip the rest of their song as soon as they received a point. I noticed a set_dj_points method in the room manager object, and overrode it like so:

// Override the set_dj_points function, so we skip to the next DJ as soon as we get a point
var set_dj_points = ttObj['set_dj_points'];
ttObj['set_dj_points'] = function(j) {
    if(ttObj.myuserid == ttObj.current_dj[0]) {
        console.log("I've got more points:", j);
        // We're done - skip to the next DJ
        ttObj.callback('stop_song');
    }
    // Call the original with ttObj as 'this', as it would be when invoked as a method
    set_dj_points.call(ttObj, j);
};
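The same wrapping idea can be tried outside Turntable. Below is a minimal sketch against a stand-in object: fakeRoom, its fields, and the wrapMethod helper are all invented for illustration and are not part of the site's code. It shows a general way to run extra logic before a method while passing the original its usual `this` and arguments.

```javascript
// Stand-in for the real room manager; these field names are made up
// for the sketch and are not Turntable's.
var fakeRoom = {
    points: 0,
    set_dj_points: function (n) { this.points = n; }
};

// Wrap obj[name] so `before` runs first, then the original runs with the
// same `this` and arguments it would normally receive.
function wrapMethod(obj, name, before) {
    var original = obj[name];
    obj[name] = function () {
        before.apply(this, arguments);
        return original.apply(this, arguments);
    };
}

var seen = [];
wrapMethod(fakeRoom, 'set_dj_points', function (n) { seen.push(n); });

fakeRoom.set_dj_points(7);
// The hook saw the value, and the original still updated the object.
console.log(fakeRoom.points, seen);
```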

After setting things in motion I discovered that turntable requires a variable amount of the song to be played before an awesome is actually counted, so it would usually take longer than 5 seconds for the current DJ to get a point. Not a huge problem, but it meant things would take longer than expected.

The next problem I ran into was a little more serious. After each DJ had played 40 songs they got kicked off. I could manually make them a DJ again, but that’d require me to check the site every so often. Instead I updated the script so that every 5 second iteration we check to see if we’re the DJ, and if not then become one:

// Check to see if we're in the DJ queue or not
if(!(ttObj.myuserid in ttObj.djs_uid)) { // parentheses needed: ! binds tighter than in
    // We're not!! Become a DJ
    ttObj.callback('become_dj');
}
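A membership test like this is easy to get wrong in JavaScript because of operator precedence. The sketch below uses made-up data (the djs object, djList array, and user ids are hypothetical, not Turntable's real structures) to show how the negation and the `in` test interact, and what to use if the list of DJs turns out to be an array rather than a keyed object.

```javascript
// Hypothetical data: a lookup keyed by user id.
var djs = { 'user-1': true, 'user-2': true };
var me = 'user-3';

// `!` binds tighter than `in`, so this is really (!me) in djs,
// which asks whether the key 'false' exists.
var wrong = !me in djs;

// Parenthesising the `in` test gives the intended "am I missing?" check.
var right = !(me in djs);

console.log(wrong, right); // false true

// If the DJs are held in an array, `in` would test indices, not values,
// so indexOf is the safer check.
var djList = ['user-1', 'user-2'];
var missingFromList = djList.indexOf(me) === -1;
console.log(missingFromList); // true
```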

The next problem that I ran into was that turntable limits the number of awesomes you can get from a single user to 50! Therefore I had to sign up for more fake Facebook accounts and get them in on the act. Rather than signing up for one new account at a time I instead created several accounts and got them all into the room at the same time. I created a slightly different script for these other accounts, so that they’d just awesome the song rather than DJ. I also modified the DJ script to DJ for a fixed amount of time, rather than give up DJing after the first awesome. That way I could collect a few points for each song play.

Less than 48 hours and a load of fake Facebook accounts later I’d managed to get the required 1000 points. The gorilla was mine! The complete code that I used is up on GitHub: https://github.com/coderholic/turntable.fm

Preventing it

I went through this process mostly out of curiosity. It’s clear that not many people are employing the same kind of tactics, with only around 4000 users of the site having obtained the gorilla. It’d certainly be a problem for turntable if everyone started doing this though, so what could they do to prevent it?

It seems as though turntable are already doing quite a bit to make it difficult, by limiting the number of awesomes received from each profile, requiring the song to be played for a certain amount of time, and booting DJs off after they’ve played a certain number of songs. There’s certainly more that they could do. For example, they could make it harder to get hold of the room manager object. Within the room manager object itself they could ignore any actions unless the browser has focus.

Ultimately the client code must communicate with the Turntable server though, so any client side changes would only make things harder. They wouldn’t actually prevent anything. In fact Alain Gilbert has put together a node.js based Turntable client that does exactly that.

Comment or vote at Hacker News

Invaluable command line tools for web developers

August 13, 2011

Life as a web developer can be hard when things start going wrong. The problem could be in any number of places. Is there a problem with the request you’re sending, is the problem with the response, is there a problem with a request in a third party library you’re using, is an external API failing? There are lots of different tools that can make our life a little bit easier. Here are some command line tools that I’ve found to be invaluable.

Curl
Curl is a network transfer tool that’s very similar to wget, the main difference being that by default wget saves to file, and curl outputs to the command line. This makes it really simple to see the contents of a website. Here, for example, we can get our current IP from the ifconfig.me website:

$ curl ifconfig.me
93.96.141.93

Curl’s -i (show headers) and -I (show only headers) options make it a great tool for debugging HTTP responses and finding out exactly what a server is sending to you:

$ curl -I news.ycombinator.com
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: private
Connection: close

The -L option is handy, and makes Curl automatically follow redirects. Curl has support for HTTP Basic Auth, cookies, manually setting headers, and much much more.

Siege
Siege is an HTTP benchmarking tool. In addition to the load testing features it has a handy -g option that is very similar to curl -iL except it also shows you the request headers. Here’s an example with www.google.com (I’ve removed some headers for brevity):

$ siege -g www.google.com
GET / HTTP/1.1
Host: www.google.com
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.70)
Connection: close
 
HTTP/1.1 302 Found
Location: http://www.google.co.uk/
Content-Type: text/html; charset=UTF-8
Server: gws
Content-Length: 221
Connection: close
 
GET / HTTP/1.1
Host: www.google.co.uk
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.70)
Connection: close
 
HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
X-XSS-Protection: 1; mode=block
Connection: close

What siege is really great at is server load testing. Just like ab (apache benchmark tool) you can send a number of concurrent requests to a site, and see how it handles the traffic. With the following command we test google with 20 concurrent connections for 30 seconds, and then get a nice report at the end:

$ siege -c20 www.google.co.uk -b -t30s
...
Lifting the server siege...      done.
Transactions:                    1400 hits
Availability:                 100.00 %
Elapsed time:                  29.22 secs
Data transferred:              13.32 MB
Response time:                  0.41 secs
Transaction rate:              47.91 trans/sec
Throughput:                     0.46 MB/sec
Concurrency:                   19.53
Successful transactions:        1400
Failed transactions:               0
Longest transaction:            4.08
Shortest transaction:           0.08

One of the most useful features of siege is that it can take a URL file as input, and hit those URLs rather than just a single page. This is great for load testing, because you can replay real traffic against your site and see how it performs, rather than just hitting the same URL again and again. Here’s how you would use siege to replay your apache logs against another server to load test it:

$ cut -d ' ' -f7 /var/log/apache2/access.log > urls.txt
$ siege -c<concurrency rate> -b -f urls.txt
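One thing to watch for: the seventh field of a standard Apache access log line is the request path, and as far as I’m aware siege expects complete URLs in its URL file, so each path may need the target server prepended first. Here is a rough sketch of that step in JavaScript. The function name logLinesToUrls, the sample log lines, and the staging.example.com host are all placeholders, and the field position assumes the common/combined log format.

```javascript
// Turn Apache access-log lines into full URLs for a siege URL file.
// `baseUrl` is a placeholder for the server you want to load test.
function logLinesToUrls(lines, baseUrl) {
    return lines
        // 7th space-separated field, the same one `cut -d ' ' -f7` picks out
        .map(function (line) { return line.split(' ')[6]; })
        // skip blank or malformed lines that have no request path
        .filter(function (path) { return path && path.charAt(0) === '/'; })
        .map(function (path) { return baseUrl + path; });
}

// Made-up log lines in the common log format.
var sample = [
    '127.0.0.1 - - [13/Aug/2011:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [13/Aug/2011:10:00:01 +0000] "GET /about?x=1 HTTP/1.1" 200 128',
    ''
];

console.log(logLinesToUrls(sample, 'http://staging.example.com').join('\n'));
```

The output can be written to urls.txt and passed to siege with -f as above.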

Ngrep
For serious network packet analysis there’s Wireshark, with its thousands of settings, filters and different configuration options. There’s also a command line version, tshark. For simple tasks I find Wireshark can be overkill, so unless I need something more powerful, ngrep is my tool of choice. It allows you to do with network packets what grep does with files.

For web traffic you almost always want the -W byline option which preserves linebreaks, and -q is a useful argument which suppresses some additional output about non-matching packets. Here’s an example that captures all packets that contain GET or POST:

ngrep -q -W byline "^(GET|POST) .*"

You can also pass in additional packet filter options, such as limiting the matched packets to a certain host, IP or port. Here we filter all traffic going to or coming from google.com, port 80, and that contains the term “search”.

ngrep -q -W byline "search" host www.google.com and port 80
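To get a feel for what the first pattern selects, it can be tried against a few strings before running a capture. This is just an illustration in JavaScript: the payloads are invented, and JavaScript’s regex engine is standing in for ngrep’s, so edge cases may differ. The point is that the ^ anchor limits matches to payloads that begin with a request line.

```javascript
// The same pattern as the first ngrep example above.
var requestLine = /^(GET|POST) .*/;

// Made-up payloads standing in for captured packets.
var payloads = [
    'GET /search?q=ngrep HTTP/1.1\r\nHost: www.google.com\r\n\r\n',
    'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n',
    'POST /login HTTP/1.1\r\nHost: example.com\r\n\r\n'
];

// Keep only payloads that start with GET or POST followed by a space.
var matches = payloads.filter(function (p) { return requestLine.test(p); });

matches.forEach(function (p) {
    console.log(p.split('\r\n')[0]); // first line of each matching payload
});
```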

Hacker News London Meetup: A Year On

May 29, 2011

Almost exactly a year ago I posted a comment to Hacker News about organizing a London meetup. Soon after that I met up with Dimitri, and over a beer we came up with a plan for the Hacker News London meetup. The first event was a lot of fun, with around 40 London based hackers turning up to a pub to drink beer and chat about what they were all hacking on.

We’ve had 7 more meetups since then, and in that time we’ve grown from 40 hackers to almost 200! We’ve changed the format slightly, starting the evening with 8 short 5 minute talks, and we’ve managed to bag a few sponsors who make sure everyone who attends is well fed with pizza, and never short of a beer. We’re still toying with the format a bit, and at the most recent meetup we had a panel of YC alumni (Pete Smith and Phil Cowans from SongKick, Josh Buckley from MinoMonsters, and Colin Beattie from Tuxebo – see the picture below) that I think worked well.

The meetups are a great opportunity to meet like-minded people, discover new opportunities, and relax over a few beers! Our next event is going to be on June 23rd, so if you’re a Hacker News reader, hacker, or simply interested in technology and startups and not too far from London then sign up on our meetup page and come along! If you’re not in London then see if there’s a HN meetup near you, and if not start one!

YC alumni panel

Scraping the web with Node.io

April 15, 2011

Node.io is a relatively new screen scraping framework that allows you to easily scrape data from websites using Javascript, a language that I think is perfectly suited to the task. It’s built on top of Node.js, but you don’t need to know any Node.js to get started, and can run your node.io jobs straight from the command line.

The existing documentation is pretty good, and includes a few detailed examples, such as the one below that returns the number of google search results for some given keywords:

var nodeio = require('node.io');
var options = {timeout: 10};
 
exports.job = new nodeio.Job(options, {
    input: ['hello', 'foobar','weather'],
    run: function (keyword) {
        var self = this, results;
        this.getHtml('http://www.google.com/search?q=' + encodeURIComponent(keyword), function (err, $) {
            results = $('#resultStats').text.toLowerCase();
            self.emit(keyword + ' has ' + results);
        });
    }
});

Running this from the command line gives you the following output:

$ node.io google.js
hello has about 878,000,000 results
foobar has about 2,630,000 results
weather has about 719,000,000 results
OK: Job complete

Scraping Multiple Pages

Unfortunately some of the documentation simply says coming soon, so you’re left to guess the best way to put together more advanced scraping workflows. For example, I wanted to scrape the search results from GitHub. If you search for “django” then you (currently) get 6067 results spread over 203 pages.

What I could figure out from the documentation is that a node.io job passes through several stages: input, run, reduce, and output. The documentation also mentions that multiple invocations of the run method can be run in parallel, so the logical thing to do seems to be to pass in the page number to run, and have it scrape the results from a single page. You can then scrape lots of different pages in parallel.

To calculate the total number of pages, and pass the page numbers to the run method, I implemented an input method. There’s not much documentation on this, but the key thing is to make sure it returns false once you’re done, otherwise it’ll keep getting called again and again. The other key thing is that you need to pass your data to the run method via the callback function, and it needs to be wrapped in an array. Here’s the complete GitHub search results scraper:

var nodeio = require('node.io');
exports.job = new nodeio.Job({benchmark: true, max: 50}, {
    input: function(start, num, callback) {
        if(start !== 0) return false; // We only want the input method to run once
        var self = this;
 
        this.getHtml('https://github.com/search?type=Repositories&language=python&q=django&repo=&langOverride=&x=0&y=0&start_value=1', function(err, $) {
            if (err) self.exit(err);
            var total_pages = parseInt($('.pager_link').last().text, 10);
            for(var i = 1; i <= total_pages; i++) { // <= so the last page is included
                callback([i]); // The page number will be passed to the run method
            }
            callback(null, false);
        });
    }, 
    run: function(page_number) {
        var self = this;
        this.getHtml('https://github.com/search?type=Repositories&language=python&q=django&repo=&langOverride=&x=0&y=0&start_value=' + page_number, function(err, $) {
            if (err) {
                console.log("ERROR", err);
                self.retry();
            }
            else {
                $('.result').each(function(listing) {
                    var project = {}
                    var title = $('h2 a', listing).fulltext;
                    project.author = title.substring(0, title.indexOf(" / "));
                    project.title = title.substring(title.indexOf(" / ") + 3);
                    project.link = "https://github.com" + $('h2 a', listing).attribs.href; 
                    var language = $('.language', listing).fulltext;
                    project.language = language.substring(1, language.length - 1); // Strip off leading and trailing brackets
                    project.description = $('.description', listing).fulltext
                    self.emit(project)
                });
            }
        });
    }
});
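The page-number and URL logic in the scraper can also be pulled out into small plain functions, which makes the loop bounds easy to check and keeps the search term out of the URL string. This is only a sketch: pageNumbers and searchUrl are hypothetical helpers of my own, not part of node.io, and the query string is trimmed to the parameters that look essential, which is an assumption about GitHub’s search rather than something verified.

```javascript
// Build the list of page numbers 1..total (inclusive) from the text of the
// last pager link. Non-numeric text yields an empty list.
function pageNumbers(lastPageText) {
    var total = parseInt(lastPageText, 10);
    var pages = [];
    for (var i = 1; i <= total; i++) {
        pages.push(i);
    }
    return pages;
}

// Hypothetical helper: build a search URL for any term and page, escaping
// the term so spaces and symbols survive the query string.
function searchUrl(term, page) {
    return 'https://github.com/search?type=Repositories&q=' +
        encodeURIComponent(term) + '&start_value=' + page;
}

console.log(pageNumbers('3'));     // [ 1, 2, 3 ]
console.log(searchUrl('c++', 2));  // https://github.com/search?type=Repositories&q=c%2B%2B&start_value=2
```

Inside the job, each number from pageNumbers would be passed to the callback wrapped in an array, as described above.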

While my solution works I’m sure it’s not optimal. By implementing an input method there’s no way to specify a search term from the command line, which is far from ideal. Hopefully I’ll be able to improve the scraper once some additional documentation is written, or after I’ve dug through the node.io code some more.

There’s lots more that node.io can do. It has built-in functions to do things like calculate the pagerank of a domain and resolve domain names to IPs, along with lots of other useful utilities. Like Node.js it also has full support for CoffeeScript. It’s a fantastic tool to have in your toolbox!
