Thursday, November 30, 2017

Perl for DevOps: perlbrew and carton

I'm sick of seeing the same old, very dated articles/books/whatever relating to Perl and systems administration. There are a ton of Perl modules and tools available to make life easy for developers, testers and operations staff in a DevOps environment, but unless you're already deep in the Perl world, many remain fairly hidden from the public eye and hard to come by.

I'm hoping that the next few posts will show off what's currently on offer in the Perl world. Whether it's a brand new startup launching a product from scratch, or an established organisation with an already mature product, I'm not trying to convince anyone to change their main product's stack, but for all of the glue required to support the application in a production environment, Perl is an excellent choice.

Perl's "There's More That One Way To Do It" attitude has inspired a variety of modules with expressive APIs and tools that make working with Perl in a production environment easy.

But before getting into specific modules and tools, the first thing to discuss, even if it's less exciting, is the management of multiple perl versions and the management of CPAN dependencies.

What I'm going to discuss here isn't new material; an article from 2016, A Perl toolchain for building micro-services at scale, summed up a great set of relevant tools for using Perl which can be extended from building microservices to doing almost any other development work with Perl. This first post will focus on the two tools that I think are the most important.

Perlbrew

Unfortunately, most Linux distributions still ship with perl 5.8, despite it having reached end-of-life years ago. This often leads to people sticking with perl 5.8 and installing modules from CPAN into the system-level perl, sometimes even using their distribution's package manager instead of a CPAN client to do it. This is a terrible idea. Often, depending on which OS and distribution you're running, the system-level perl is used for internal tools, and breaking the system-level perl starts to break other important things.

This is where perlbrew is a no-brainer.

Perlbrew is just like pyenv for Python, or rbenv for Ruby; it's a tool for managing and using various perl versions without interfering with the system-level perl.

An added bonus of running a more recent version of perl out of perlbrew is access to modules that require perl 5.10 or later, having left 5.8 behind long ago, e.g. Mojolicious.
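
If you haven't seen perlbrew in action, the day-to-day workflow looks roughly like this (the version number is just an example):

$ perlbrew install perl-5.26.1
$ perlbrew list
$ perlbrew switch perl-5.26.1
$ perl -v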

Alternatively, plenv is another tool for managing Perl versions, although it's not a tool I have a lot of experience with.

Carton

There are a few options for managing Perl dependencies. I'm only going to describe Carton, but there are also distribution- or OS-specific options that cover some or all of the functionality of Carton, e.g. Red Hat Software Collections.

While not strictly necessary, carton - comparable to using a combination of virtualenv + pip for Python or Bundler for Ruby - is an excellent tool to manage dependencies.

The cpanfile (used by carton) provides the ability to specify the direct dependencies of the script or system and, along with the generated cpanfile.snapshot file which contains the full dependency tree, can be checked into a source control system along with the code it supports. The carton utility then provides the ability to use this cpanfile to create a local repository containing only the modules and versions specified in the snapshot.
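
To make that concrete, a cpanfile is nothing more than a list of requires lines (the modules here are purely illustrative):

requires 'Mojolicious';
requires 'DBI';

... and the day-to-day carton commands are short:

$ carton install
$ carton exec -- perl my-script.pl

The first installs everything into local/ and writes cpanfile.snapshot; the second runs a script against that local repository.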

Multiple cpanfiles may be used to track the dependencies of multiple different systems or subsystems.

An example setup might be to only use a carton bundle for critical customer-facing services, as you would want that environment to be as static as possible and not be prone to failure just because someone updated a dependency for a utility script. Or perhaps use one carton bundle for critical production stuff, and another one for the less critical stuff. Or perhaps a more granular setup, depending on the situation.

The caveat with carton is that for any dependency on third party libraries (e.g. IO::Socket::SSL requiring openssl, or EV requiring libev), the third party library will not be bundled into your carton repository.

Kinda Boring But Important

I feel like this was a pretty boring introduction to using Perl as a language for your devops needs, but it's an important topic that - unfortunately, in my own personal experience - can be a real pain in the ass to deal with if it's not considered early on in the piece.

Perlbrew and Carton are powerful tools, both worth knowing, and when used in tandem they keep development work as isolated as possible, so it interferes with as little else on the system as possible.

Friday, October 27, 2017

Perl Hack: perlbrew libs

The libs feature of perlbrew is one I don't see used very often. At least, not by the developers I currently work with and have worked with in the past.

Sometimes I want to run a piece of code against the core libraries and only the core libraries. Sometimes I wrap a script up with Carton and want to verify that a base install + Carton can run my script. And sometimes I just want a place to install anything and everything from CPAN, play with new versions' features, etc...

This is where the libs feature comes in handy.

I have three sets of perl 5.20.3 libs:

$ perlbrew list
  perl-5.20.3
  perl-5.20.3@carton
* perl-5.20.3@dev

99% of my time is spent on the "dev" lib, where I install anything I want. The "perl-5.20.3" is just a base installation of 5.20.3. And the "carton" lib is just a base 5.20.3 installation with only Carton installed. And if I ever break the "carton" or "dev" libs, they're easily recreated from the base installation.
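
For anyone who hasn't used the feature, setting up a lib like the "carton" one above is only a few commands (assuming cpanm is already available):

$ perlbrew lib create perl-5.20.3@carton
$ perlbrew use perl-5.20.3@carton
$ cpanm Carton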

Thursday, September 28, 2017

Rehab

In December 2016, on my last heavy deadlift session before competing in Victoria's Strongest Man in January, I injured my back. Although the damage wasn't too serious, it was enough that I had to pull out of the competition.

At first, the rehab protocol was simply to do 50-100 reps of hyperextensions and reverse hyperextensions as part of my warm-up. During this time I didn't squat or deadlift anything over about 60% of my max. This felt like it was working for a couple of months, and then I hurt my back in the same way again, but this time while doing a speed squat session.

This was frustrating, considering that I thought I was doing the "right thing" this time around.

I changed my rehab protocol to instead focus on a lot of direct ab work - something I've neglected a lot in the past - and regular stretching of the hip region and other areas in close proximity. The ab work was just some ab wheel work to begin with, and then I started doing 100 reps of light pulldown abs as part of my warm-up protocol along with the ab wheel work.

This time, things have started to get better; I've been able to squat up to ~180kg again and deadlift in the 200-240kg range again, although not for the higher volume I was used to before the injury. There has been some slight discomfort and tightness, but some stretching and mobility work has sorted it out.

I then, more recently, added the lower-back rehab work from earlier in the year back in; 100 reps of hyperextensions in my warm-up protocol on top of the ab work I'm already doing.

This has been working well and I'm back on track to squat and pull some heavier numbers by the end of the year. Just in time for the next Victoria's Strongest Man competition in January 2018.

Tuesday, August 29, 2017

Serving the Current Directory over SSL

Recently at work, we needed to set up a dummy HTTPS server just as an endpoint that needed to do... something. Nothing specific. Just something that did SSL/TLS and returned a 200 response. Immediately I thought of Python's built-in SimpleHTTPServer, which can be used to serve the current directory:

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

And away it goes. Putting SSL in front of it requires more code, but there are examples around.

I wondered how easily (or not) I could do it with Perl and a Plack server.

First, I needed the following dependencies installed:

  1. Plack::App::Directory. This comes with the standard Plack distribution and is used to serve a directory listing.
  2. Starman. This is currently the only Plack server that supports SSL without requiring something like nginx in front of it. A little disappointing, but not a big deal.
  3. IO::Socket::SSL. To do the SSL stuff. Requires OpenSSL.

These can either be managed by Carton, or you can just install them with cpanm.

$ cpanm Plack Starman IO::Socket::SSL

Next, I need to generate a dummy SSL certificate.

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout server.key -out server.crt

Now I can run the server:

$ plackup -s Starman --listen :8000 \
  --enable-ssl --ssl-cert server.crt --ssl-key server.key \
  -MPlack::App::Directory -e'Plack::App::Directory->new(root=>".")'
2017/08/29-12:27:01 Starman::Server (type Net::Server::PreFork) starting! pid(38015)
Resolved [*]:8000 to [0.0.0.0]:8000, IPv4
Binding to SSL port 8000 on host 0.0.0.0 with IPv4
Setting gid to "20 20 20 504 401 12 61 79 80 81 98 33 100 204 395 398 399"
Starman: Accepting connections at https://*:8000/

... and the directory listing is available via https://localhost:8000/

It's not as simple as Python's SimpleHTTPServer to get going, but it works!

Friday, July 28, 2017

Concurrency, Perl and Web Services, Oh My!

The majority of the systems I develop - both in my spare time and at work - are heavily IO-based; connecting various web services, databases and files on disk together in some fashion. In those particular systems, I'd be stupid not to promote a concurrency-based solution.

Lately I've spent a lot of time developing small web services (or microservices or whatever it's called this year), both as brand new systems and also as a way of shaving off parts of large monoliths into smaller chunks to auto-scale.

There are many ways to design the architecture for this kind of system and its subsystems, and often there are instances where pre-forking a bunch of worker processes to handle requests is either too resource-hungry or just not appropriate for the work it's doing.

Let's Write Something

I want to write an example web service, but I'm sick of seeing the same "Hello World"-esque web services that can be written in a dozen lines of code and in no way represent any web service anyone has ever written. I want to write an application that can actually benefit from a concurrent solution and semi-resembles a real-world thing. So I've got an example:

  1. HTTP-based web service
  2. Runs in a single process
  3. Accepts a domain name, and returns the geographic locations of the domain's mail servers, in JSON format

To satisfy the first two criteria, a Plack/PSGI application running out of either Twiggy or Feersum should do just fine.

In order to satisfy the last point (i.e. the actual functionality), the app needs to perform a few steps:

  1. Retrieve the mail servers of the domain via a DNS lookup. AnyEvent::DNS can do this.
  2. For each of the mail servers, resolve the IP addresses via another DNS lookup. AnyEvent::DNS to the rescue again.
  3. For each of the IP addresses, I'm going to use the IP Vigilante API to retrieve the geographic location data. There are no modules on CPAN for the IP Vigilante service, so I'll need to write something. AnyEvent::HTTP would work just fine here, but lately I prefer to use Mojo::UserAgent where possible, because it's much more versatile, e.g. providing a proper request/response object for us, and handling JSON responses.

There's a fairly straight-forward sequence of operations to perform, so I'm going to use a Promises-based approach. I've found that this makes concurrent Perl code much easier to follow, especially when other developers need to jump in and understand and maintain it. There's no real reason for why I've settled on Promises over Futures (or any other implementation of these two patterns); either will do just fine.

Firstly, I need a function that can lookup the MX records for a single domain and return an arrayref of addresses (via a promise).


sub lookup_mx {
    my ($domain) = @_;
    AE::log trace => "lookup_mx($domain)";

    my $d = Promises::deferred;

    AnyEvent::DNS::mx $domain, sub {
        my (@addrs) = @_;

        if (@addrs) {
            $d->resolve(\@addrs);
            return;
        }

        $d->reject("unable to perform MX lookup of $domain");
    };

    return $d->promise;
}

This is actually a pretty boring function to look at.

Next, I need a function that can resolve a domain name to an IP address (or addresses).


sub resolve_addr {
    my ($domain) = @_;
    AE::log trace => "resolve_addr($domain)";

    my $d = Promises::deferred;

    AnyEvent::DNS::a $domain, sub {
        my (@addrs) = @_;

        if (@addrs) {
            $d->resolve(\@addrs);
            return;
        }

        $d->reject("unable to resolve $domain");
    };

    return $d->promise;
}

This is also a pretty boring function.

Now I need a function that can perform a lookup to the IP Vigilante service for a single IP address and return an arrayref containing the continent, country and city in which it resides.


my $ua = Mojo::UserAgent->new->max_redirects(5);

sub ipvigilante {
    my ($address) = @_;
    AE::log trace => "ipvigilante($address)";

    my $d = Promises::deferred;
    my $url = sprintf "https://ipvigilante.com/json/%s", $address;

    $ua->get($url, sub {
        my ($ua, $tx) = @_;
        if ($tx->res->is_success) {
            my $json = $tx->res->json;
            my $rv = [
                $json->{data}->{continent_name},
                $json->{data}->{country_name},
                $json->{data}->{city_name},
            ];
            $d->resolve($rv);
            return;
        }
        $d->reject( $tx->res->error );
    } );

    return $d->promise;
}

This function is slightly more interesting - it receives a JSON response from IP Vigilante - but, in the end, is still fairly boring, since Mojo::UserAgent handles all of it for us.

The next function will need to take an arrayref of IP addresses, and collate the IP Vigilante data into a hashref, for which the keys will be the IP addresses and the values will be the IP Vigilante information from the previous function.


sub get_ip_informations {
    my ($ips) = @_;

    my $d = Promises::deferred;

    my %rv;
    Promises::collect( map {
            my $ip = $_;
            ipvigilante($ip)
                ->then( sub {
                    my ($ip_info) = @_;
                    $rv{$ip} = $ip_info;
                } )
            } @$ips )
        ->then( sub { $d->resolve(\%rv) } )
        ->catch( sub { $d->reject(@_) } );

    return $d->promise;
}

This is the first note-worthy function, and it's still not that big of a function. The call to the ipvigilante()->then() chain will return a new promise, and we have used map and the Promises::collect() function to collate the results of multiple promises. This means that if we are trying to get the IP information for 10 addresses, the map will return 10 promises, and for this function to return a result, we need the response from all 10 promises. The entire batch executes concurrently and only runs as slow as the slowest IP Vigilante lookup. Yay concurrency!

Lastly, I need a function that will take an arrayref of domain names, resolve each domain to its IP address(es) and get the IP Vigilante information for each IP address (via the previous function) and return it as a hashref.


sub get_mx_informations {
    my ($addrs) = @_;

    my $d = Promises::deferred;

    my %rv;
    Promises::collect( map {
                my $mx = $_;
                resolve_addr($mx)
                    ->then( sub { get_ip_informations($_[0]) } )
                    ->then( sub { $rv{$mx} = $_[0] } );
            } @$addrs )
        ->then( sub { $d->resolve(\%rv) } )
        ->catch( sub { $d->reject(@_) } );


    return $d->promise;
}

This function is basically the bulk of the application.

I feel like these last two functions shouldn't be necessary, and they rub me the wrong way a little; they're essentially just for-loops whose bodies have already been factored out into other functions. But for the purposes of maintainability and testability, I kept them.

The beauty of all the code written so far is that because Promises, AnyEvent and Mojo all integrate with the lower-level EV event loop, and in some cases with each other, everything works together. This makes it simple to mix and match your favourite libraries, even when they were originally written for different frameworks.
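
As a rough illustration of that interplay (the URL is just a placeholder), an AnyEvent condvar can happily wait on a non-blocking Mojo::UserAgent request, because both end up registering watchers with the same default EV loop:

use strict;
use warnings;
use EV;
use AnyEvent;
use Mojo::UserAgent;

my $cv = AnyEvent->condvar;
my $ua = Mojo::UserAgent->new;

# The non-blocking request registers its watchers via Mojo::Reactor::EV ...
$ua->get('http://example.com/' => sub {
    my ($ua, $tx) = @_;
    $cv->send($tx->res->code);
});

# ... and waiting on the condvar spins EV, which fires the Mojo callback too.
print $cv->recv, "\n";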

The whole thing just needs to be wrapped in a Plack/PSGI application.


my $app = sub {
    my ($env) = @_;
    my $request = Plack::Request->new($env);

    if ($request->method ne 'GET') {
        return [ 400, [], [] ];
    }

    (my $domain = $request->path_info) =~ s{^/}{};

    if (not $domain) {
        return [
            400,
            [ 'Content-Type' => 'application/json' ],
            [ Mojo::JSON::encode_json( { error => 'domain required' } ) ]
        ];
    }

    return sub {
        my ($responder) = @_;
        my $response = Plack::Response->new;

        lookup_mx($domain)
            ->then( sub { get_mx_informations($_[0]) } )
            ->then( sub {
                    my ($mx_informations) = @_;
                    $response->status(200);
                    return { $domain => $mx_informations };
                } )
            ->catch( sub {
                    my ($error) = @_;
                    $response->status(400);
                    return { error => $error };
                } )
            ->finally( sub {
                    my ($json) = @_;
                    $response->headers( [
                        'Content-Type' => 'application/json'
                    ] );
                    $response->body( Mojo::JSON::encode_json($json) );
                    $responder->( $response->finalize )
                } );
    }
};

I'm going to use Carton to handle and bundle the dependencies. This step isn't absolutely necessary, but when deploying Perl applications across many machines in a production environment, it's a solid tool for keeping things consistent across the board. Not having a solution for this is a massive headache once many different pieces of code have been deployed again and again for a few years. The Carton FAQ has a good rundown of its use-cases. I now need to declare my immediate dependencies in a new file for Carton to consume: cpanfile.


requires 'Plack';
requires 'Feersum';
requires 'AnyEvent';
requires 'IO::Socket::SSL';
requires 'Mojolicious';
requires 'Promises';

I'm not tied down to specific versions of any of these modules.
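
If I did want to pin things down, cpanfile supports version constraints; the version numbers here are purely illustrative:

requires 'Mojolicious', '>= 7.0';
requires 'AnyEvent', '>= 7.0, < 8.0';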

The last step is to - with the help of carton - install the dependencies, which will also generate a snapshot file with all my dependencies' dependencies, and then run the server.

$ carton install
$ carton exec -- feersum --listen :5000 -a mx.psgi

... and in another shell ...

$ curl -s http://localhost:5000/nab.com.au
{
   "nab.com.au" : {
      "cust23659-2-in.mailcontrol.com" : {
         "116.50.58.190" : [
            "Oceania",
            "Australia",
            null
         ]
      },
      "cust23659-1-in.mailcontrol.com" : {
         "116.50.59.190" : [
            "Asia",
            "India",
            null
         ]
      }
   }
}

The full code is available on GitHub.

Just Release It Now, Right?

This is beyond the original scope of this post, but there's still a lot more to do. The application is just barely in an acceptable state. There are a number of extra steps before this can/should be deployed to production, for which I may write follow-up posts:

  1. Unit tests. The application and functions should be moved into their own package in order to have unit tests written against them. I've had great success using Plack::Test to test Plack applications and Test::MockObject::Extends to mock functions that would perform network calls, so that I don't require a working internet connection to run unit tests.
  2. Logging. Self-explanatory (I hope).
  3. Rate limiting the ipvigilante.com API requests. I don't want the service to inundate IP Vigilante with tons of connections/requests at the same time.
  4. Dealing with ipvigilante.com failures. The circuitbreaker pattern will help the service remain stable and not constantly hit a remote service which is having an outage.
  5. Caching. IP addresses aren't likely to move geographic locations very often (if at all), so caching the IP Vigilante responses will be of great benefit. Either a simple local cache with Cache::FastMmap, or perhaps with a remote cache in Cache::Memcached, if I end up with a cluster of servers - which are an auto-scaling group - and I want a centralised cache for all hosts to use.
  6. Monitoring. How long do DNS lookups take? How long do ipvigilante.com API requests take? How often do they fail? When they fail, do they fail fast or do they timeout after 5 minutes of waiting?
  7. There's probably more...

Friday, June 30, 2017

Mojo::UserAgent is Best User Agent

For the longest time, I used LWP::UserAgent, because it was always there and it was reliable and there were tons of examples on the internet and almost every other Perl programmer I'd interacted with had used it before. So it was just easier to use it.

But then my requirements for an HTTP client changed.

A few years ago when I started tackling a lot of problems with concurrent/non-blocking solutions, I needed an HTTP client. Initially, because I was using AnyEvent, I just used AnyEvent::HTTP. And then I came across a really annoying issue: all headers are lower-cased and then the first letter is upper-cased. For the particular problem I was solving, this wasn't gonna work.

The next step was to combine AnyEvent::Handle with HTTP::Request and HTTP::Parser::XS. But that quickly introduced new problems; handling 302 responses, keep-alive, chunked responses and everything in between. I also had to package responses into robust objects myself. This was way more work than it was worth.

Finally, I ended up at Mojo::UserAgent. Because it has a blocking interface, it works great as a replacement for LWP::UserAgent. And because everything in Mojolicious was built to be non-blocking and because Mojo supports EV, it ties in well with all of the other libraries I use for my non-blocking code and it replaces my hand-rolled AnyEvent/HTTP::Request/HTTP::Parser::XS approach.
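
A rough sketch of both styles side by side (the URL is just a placeholder):

use strict;
use warnings;
use Mojo::UserAgent;
use Mojo::IOLoop;

my $ua = Mojo::UserAgent->new;

# Blocking, as a drop-in for the LWP::UserAgent style of code:
my $body = $ua->get('http://example.com/')->res->body;

# Non-blocking, with a callback, for the event-driven code:
$ua->get('http://example.com/' => sub {
    my ($ua, $tx) = @_;
    print $tx->res->code, "\n";
});
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;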

And for all of this, I only need to remember one API and manage one dependency.

Monday, May 29, 2017

Lack of Posting

I don't really have a good excuse for not posting anything here for the past month and a half. I'm thinking about a juicy Perl/concurrency-related post though.

Friday, April 7, 2017

Performance: Pick Better Libraries/Frameworks

I both love and hate CPAN; it has a module for every need, but often, for one reason or another, those modules aren't written by someone who cares about performance.

For example, Crypt::Blowfish is an XS module, which is great, but it's meant to be used via something like Crypt::CBC, which is a pure-Perl library and runs terribly slowly if you encrypt a lot of data (and we do). On the flipside, Crypt::Mode::CBC is an XS module and performs brilliantly (as does the rest of CryptX). Swapping from Crypt::CBC/Crypt::Blowfish to Crypt::Mode::CBC/Crypt::Cipher::Blowfish, we noticed a 7x performance gain in our system.
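
To make the swap concrete, here's a rough before-and-after sketch (the key, IV and plaintext are placeholders, and the two ciphertexts won't match byte-for-byte because Crypt::CBC derives its own key and IV by default):

use strict;
use warnings;
use Crypt::CBC;
use Crypt::Mode::CBC;

my $key       = 'x' x 16;    # placeholder key
my $iv        = 'y' x 8;     # Blowfish has an 8-byte block size
my $plaintext = 'some payload ' x 1000;

# Before: pure-Perl CBC chaining around the XS Crypt::Blowfish primitive.
my $old    = Crypt::CBC->new( -key => $key, -cipher => 'Blowfish' );
my $ct_old = $old->encrypt($plaintext);

# After: CBC chaining done in XS by CryptX (Crypt::Cipher::Blowfish underneath).
my $new    = Crypt::Mode::CBC->new('Blowfish');
my $ct_new = $new->encrypt( $plaintext, $key, $iv );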

In the Perl world, when microservices are the right tool for the job, Plack/PSGI is the winning platform for writing RESTful webservices.

An old service that I was maintaining used HTTP::Server::Simple::CGI, which was originally used for the sake of simplicity and getting something up and running with what seemed like a lightweight framework. It was also able to plug in a Net::Server subclass to leverage Net::Server's features, which was desirable from an operational point-of-view given we'd used Net::Server successfully for other purposes. However, HTTP::Server::Simple is terribly written from a performance point-of-view.

Simply porting the service over to Plack/PSGI and serving it out of Starman resulted in a 6x performance gain.
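
For anyone who hasn't made that jump yet, the PSGI side can start as small as this trivial sketch (not the actual service):

# hello.psgi
my $app = sub {
    my ($env) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ], ["hello\n"] ];
};

... and serve it with:

$ plackup -s Starman --workers 10 hello.psgi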

Popular web frameworks, like Mojolicious, support PSGI out of the box, if you'd prefer the abstraction they give you. However, in my experience, even a "Hello World" will run significantly slower. But, depending on the complexity of your service, the performance hit may be worth it.

Friday, March 10, 2017

Performance: Compress Your Payloads

... unless the CPU is the source of your latency.

Compressing payloads requires a little more CPU, but it can save on network IO, i.e. bandwidth, which can also have cost-saving benefits if your bandwidth costs are high.

The beauty is, if you're serving content out of an HTTP server, the popular ones can handle this for you with a small amount of configuration.
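
The popular servers (nginx, Apache) only need their gzip/deflate settings turned on. If you'd rather do it at the application layer in a Plack app, Plack::Middleware::Deflater (a separate CPAN distribution) does the same job; a minimal sketch:

# app.psgi
use Plack::Builder;

my $app = sub {
    my ($env) = @_;
    return [ 200, [ 'Content-Type' => 'application/json' ], ['{"ok":true}'] ];
};

builder {
    enable 'Deflater',
        content_type => [ 'application/json', 'text/plain', 'text/html' ];
    $app;
};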


Friday, February 10, 2017

Performance: Use Persistent Connections

Small Changes First

The first few points of this series are going to be around moderately small changes for the greatest gain.

It's easy to recommend the complete overhaul and re-engineering of a system using all the latest and greatest technologies and platforms available, but for many organisations, especially if you're having to maintain and slowly migrate a monolith to something more granular, that kind of change takes many months (and longer) to achieve once you factor in costs (both in time and dollars), training, testing, competing business priorities, etc...

As I run out of quick wins, I'll get into more architectural changes because they're pretty much inevitable as a system grows.

Be Persistent

If a single process is making multiple network calls to the same service, it makes sense that maintaining a persistent connection instead of creating a new connection for each individual call will lead to a performance gain; it reduces the number of connect() syscalls, the number of TLS handshakes and any sort of authentication/authorisation that happens once a connection is established.
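
In Perl land this can be a one-argument change; LWP::UserAgent, for example, has a keep_alive option that turns on a connection cache (the URL below is just a placeholder):

use strict;
use warnings;
use LWP::UserAgent;

# keep_alive => 10 gives the agent a cache of up to 10 persistent connections
# (via LWP::ConnCache), so repeated requests to the same host reuse a socket.
my $ua = LWP::UserAgent->new( keep_alive => 10 );

for (1 .. 100) {
    my $res = $ua->get('http://example.com/health');
    # ... do something with $res ...
}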


The question then becomes: can the endpoint you're connecting to (if you control it) handle the number of persistent connections you'll have open in the worst case?

Monday, January 30, 2017

Performance: Profile Your Damn Code First

I started jotting down notes for this post without much of an idea of how I was going to present them; I was just working on a bunch of performance issues and investigating potential future improvements and noticed a number of recurring ideas across different systems, and I thought they were worth sharing with the world.

Originally I wanted to post all of these ideas in one large post, but with the number of things to discuss, it felt like I was never gonna get the damn thing finished, so I decided to split it up instead.

This isn't for the developer who is already well-seasoned in tackling performance issues; it's for the developer who may need to do it soon, but doesn't know exactly where to start.

Although the points in this series will be on the Perl-ish side, the ideas should easily transfer.

Latency

Mostly what I'm going to be talking about in this series relates to latency.

With the various infrastructure providers around today, if a system is running slow, it's pretty easy to add more machines to the mix to pick up the slack. But all you've done is increased the capacity of the system. The latency - the time it takes to serve a single request - still sucks, no matter how many more machines you add.

Until you address the latency issues, you won't be utilising all of the resources of the individual machines you're paying for and your cost-to-serve (which your CFO cares about) will be unnecessarily high. That's bad.

Stretching the hardware and infrastructure you've already got by managing and minimising latency lowers cost-to-serve and increases capacity and throughput.

Use a Profiler

I feel like this goes without saying but, evidently, having a vested interest in this kind of work makes me the exception to the rule, so: profile your code first. If you have a slow application, there's no point guessing which bit is the slow bit.

In the past, I've seen slow page-load times and memory spikes, and everyone was sure what the cause was going to be, only to hook up NYTProf and quickly see from the flame graph that a Class::DBI relationship inflating timestamps into DateTime objects was the cause.
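
Getting that kind of answer is cheap; hooking up Devel::NYTProf is roughly:

$ perl -d:NYTProf slow-script.pl
$ nytprofhtml --open

The generated HTML report (flame graph included) points straight at the hot spots.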

Don't guess! Profile the damn thing!