Friday, 1 May 2015

Datadog for server monitoring and stats

I was looking for a simpler way to monitor server-side applications and publish custom metrics. Open-source tools like Zabbix are good for monitoring infrastructure, and they're free, but they can be tricky to set up and maintain. It's also difficult to graph custom metrics from our apps, and you can forget about presenting the resulting graphs to "management"... they're not pretty.

I was looking for a tool that met the following requirements:

  • View and monitor server statistics (CPU usage, disk usage etc.)
  • Publish custom metrics from server-side apps.
  • Present metrics in user-friendly dashboards.

After some googling I came across Datadog, "Cloud Monitoring as a Service". So far I'm impressed. Setup is incredibly fast: I installed the Datadog agent on a Linux VM by pasting one line into a terminal (from https://app.datadoghq.com/account/settings#agent/ubuntu). That alone enables all the infrastructure monitoring I need for that VM:


It's really easy to view infrastructure graphs, set up monitoring alerts and build dashboards.

With that done, I wanted to log custom metrics from my Java app. Datadog provides a simple Java client library for logging metrics. Metrics are sent via UDP to the Datadog agent, which collates them and publishes them to the Datadog service. The nice thing is that the metrics are then automatically available; no config required!

I'm using the Spring framework so I exposed the Java client as a bean like so:


@Bean(name = "dataDogMetrics")
public StatsDClient dataDogMetrics() throws StatsDClientException {
    // prefix for all metrics, agent host, and the agent's default StatsD port
    return new NonBlockingStatsDClient("myapp", "localhost", 8125);
}


Now I could easily inject the bean anywhere and log stats:


@Autowired
private StatsDClient dataDogMetrics;

// ...

dataDogMetrics.incrementCounter("test");


I could then immediately see the stats in the Datadog service and produce pretty graphs like this:

...nice!

Monday, 6 October 2014

Spring Javaconfig Logback

This post shows how to add logging to a basic Spring project using Logback and slf4j.

This builds on the Minimal REST Api project.

Maven Dependencies
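The dependency snippet didn't survive the move to this page; a minimal sketch of what it would contain (version numbers are assumptions, check for current releases) is logback-classic, which pulls in slf4j-api, plus a bridge so Spring's commons-logging output goes through slf4j too:

```xml
<!-- Logback implementation of the slf4j API -->
<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.1.2</version>
</dependency>
<!-- Route Spring's commons-logging calls through slf4j -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>jcl-over-slf4j</artifactId>
    <version>1.7.7</version>
</dependency>
```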

Logback Configuration

Add this to src/main/resources/logback.xml
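The original file contents were lost from this copy; a minimal sketch that logs everything at INFO and above to the console:

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
```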



Logback shutdown gotcha

Until this has been fixed, using the SMTP appender in Logback prevents Tomcat from shutting down. Add the following to your context configuration to shut down properly:
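The snippet is missing from this copy; one way to do it (a sketch, assuming Logback classic's LoggerContext API) is a bean that stops Logback when the Spring context is destroyed, so the appender's background threads don't keep Tomcat alive. Register it as a @Bean in your context configuration:

```java
import javax.annotation.PreDestroy;

import org.slf4j.LoggerFactory;

import ch.qos.logback.classic.LoggerContext;

// Stops Logback's LoggerContext (and with it the SMTP appender's
// worker threads) when the Spring context shuts down.
public class LogbackShutdown {

    @PreDestroy
    public void stopLogback() {
        ((LoggerContext) LoggerFactory.getILoggerFactory()).stop();
    }
}
```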


Controller updates

Now you can update the controller to do some logging:
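The updated controller didn't survive in this copy; a sketch of what it would look like (class and mapping names are placeholders, see the linked repo for the real thing):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class HelloController {

    private static final Logger log = LoggerFactory.getLogger(HelloController.class);

    @RequestMapping("/hello")
    @ResponseBody
    public String hello() {
        log.info("Handling /hello request");
        return "Hello!";
    }
}
```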



Monday, 29 September 2014

Spring Javaconfig Minimal REST Api

This post shows you how to create a minimal REST API in Spring using Java Config (no xml) and maven.

Check out the code: https://github.com/sparso/spring-javaconfig-examples (look at spring-barebones-javaconfig).

Main Components


It's helpful to have some background on the components that we'll be setting up:

Web Application: The term for a collection of servlets packaged in a war file.
ServletContext: Defines methods to talk to the servlet container (e.g. Tomcat).
ApplicationContext: Defines methods to access Spring Beans etc.
WebApplicationContext: Extends the above, adds a getServletContext() method.
AnnotationConfigApplicationContext: Implementation of the above which accepts annotated classes as input rather than XML.
ContextLoaderListener: Creates the application context. Ties the lifecycle of the ApplicationContext to the lifecycle of the ServletContext.
DispatcherServlet: Takes an incoming request, delegates handling of it to a controller.

Maven


The following is set up in the pom.xml:

spring-framework-bom: Sets version numbers of spring modules so we don't have to.
spring-core, spring-webmvc, spring-web: Minimal dependencies for a REST API.
javax.servlet-api: So we can deploy our module in Tomcat.
maven-war-plugin: Build the war and set failOnMissingWebXml to false.
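The pom.xml itself isn't reproduced in this copy; the list above corresponds to something like the following sketch (version numbers are assumptions; spring-webmvc pulls in spring-core and spring-web transitively):

```xml
<dependencyManagement>
  <dependencies>
    <!-- BOM pins the versions of all Spring modules -->
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-framework-bom</artifactId>
      <version>4.1.0.RELEASE</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
  </dependency>
  <!-- Provided by Tomcat at runtime -->
  <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-war-plugin</artifactId>
      <version>2.4</version>
      <configuration>
        <!-- We configure the ServletContext in code, so no web.xml -->
        <failOnMissingWebXml>false</failOnMissingWebXml>
      </configuration>
    </plugin>
  </plugins>
</build>
```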

WebApplicationInitializer

You implement this interface in order to configure the ServletContext programmatically.



Option 1: Implement it yourself

The following code shows you how to bootstrap the container yourself. It:

  • Creates the root ApplicationContext (where spring beans live)
  • Attaches a ContextLoaderListener to it (connecting it to the servlet lifecycle)
  • Creates a DispatcherServlet to route HTTP requests to your app.
  • Adds your Java Config classes to each step.
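The code block is missing from this copy; the steps above map to something like the following sketch (RootConfig and WebConfig are placeholder names for your @Configuration classes):

```java
import javax.servlet.ServletContext;
import javax.servlet.ServletRegistration;

import org.springframework.web.WebApplicationInitializer;
import org.springframework.web.context.ContextLoaderListener;
import org.springframework.web.context.support.AnnotationConfigWebApplicationContext;
import org.springframework.web.servlet.DispatcherServlet;

public class AppInitializer implements WebApplicationInitializer {

    @Override
    public void onStartup(ServletContext servletContext) {
        // Root application context: where your Spring beans live
        AnnotationConfigWebApplicationContext rootContext =
                new AnnotationConfigWebApplicationContext();
        rootContext.register(RootConfig.class);

        // Tie the root context's lifecycle to the servlet container's
        servletContext.addListener(new ContextLoaderListener(rootContext));

        // DispatcherServlet gets its own context for web-layer beans
        AnnotationConfigWebApplicationContext webContext =
                new AnnotationConfigWebApplicationContext();
        webContext.register(WebConfig.class);

        ServletRegistration.Dynamic dispatcher = servletContext.addServlet(
                "dispatcher", new DispatcherServlet(webContext));
        dispatcher.setLoadOnStartup(1);
        dispatcher.addMapping("/");
    }
}
```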


Option 2: Use an AbstractAnnotationConfigDispatcherServletInitializer

This gets rid of most of the boilerplate code:
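The snippet didn't make it into this copy; in outline it's something like this (again, RootConfig is a placeholder for your own @Configuration class):

```java
import org.springframework.web.servlet.support.AbstractAnnotationConfigDispatcherServletInitializer;

public class AppInitializer extends AbstractAnnotationConfigDispatcherServletInitializer {

    @Override
    protected Class<?>[] getRootConfigClasses() {
        return new Class<?>[] { RootConfig.class };
    }

    @Override
    protected Class<?>[] getServletConfigClasses() {
        // Keep everything in the root context for this minimal example
        return null;
    }

    @Override
    protected String[] getServletMappings() {
        return new String[] { "/" };
    }
}
```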


Context Configuration


Here's my context configuration, documentation in the comments:
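The configuration class isn't reproduced in this copy; in outline it's something like this sketch (the package name is a placeholder, point it at your own code):

```java
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.EnableWebMvc;

// Enables Spring MVC request handling and scans the given
// package for @Controller and other annotated beans.
@Configuration
@EnableWebMvc
@ComponentScan(basePackages = "com.example.web")
public class RootConfig {
}
```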


REST Controller


Finally here's the controller:
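The controller code is missing from this copy; a minimal sketch of what it would look like (names and mapping are placeholders, the real version is in the linked repo):

```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class HelloController {

    // @ResponseBody writes the return value straight into the
    // HTTP response instead of resolving it as a view name
    @RequestMapping("/hello")
    @ResponseBody
    public String hello() {
        return "Hello, world!";
    }
}
```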

Sunday, 25 November 2012

I recently switched my offsite backup solution (for personal files) to Google Drive. Here's why:

  • I back up a lot of home movies and Google Drive has a great feature that automatically transcodes AVCHD movies and makes them available to view online. So now sharing movies with the rest of the family is as easy as sending them a link.
  • The storage solution is backed by a large company that's not going anywhere soon - important consideration when backing up precious data.
  • It's relatively cheap - about half the price of Dropbox. As a side note, I expect Google storage will always be able to undercut the competition due to the scale at which they purchase and maintain storage for their existing services. Competitors such as Dropbox either run on third-party storage solutions (Amazon's S3, for example) or provide their own, but would not be able to match Google's economies of scale.
  • It scales up to terabytes - so as I add more and more data over the years it will always be able to match my storage needs (although it might hurt my wallet somewhat).
The biggest problem with Google Drive for me is that they do not provide a Linux version. Some third-party software called Insync allows me to sync to Drive, but it's still a beta version (which is a little worrying when backing up important data). It also does not let me sync only some folders from my Drive to my Linux machine, which is a little annoying as this is a nice feature of the Google Drive Windows client.

Ah well, can't have everything I suppose (Google please release a Linux client!!!)

Monday, 30 January 2012

Online Data Backup & Storage – CrashPlan – Backup Software, Disaster Recovery


I regularly used Dropbox for my offsite backup needs, until I purchased an HD camcorder and my data storage requirements increased significantly. I find Dropbox excellent for syncing files between PCs, but too expensive for backing up large amounts of data. Dropbox is not really an offsite backup solution anyway; it is really aimed at synchronising files. Apart from the cost, another reason I moved away from Dropbox for backup purposes is that if a problem on the Dropbox servers deleted all my files, the deletion could then sync to both my laptop and PC, and I would lose everything! So I downgraded to a free Dropbox account and looked around for a new backup solution.

So far my favourite solution is Crashplan. They have clients for all the major OS's including Linux. You can easily backup to an offsite location by just running an instance of Crashplan on a remote PC. If you don't have an always-on offsite machine, then you can use their cloud offering which is remarkably cheap. So far I have only used the free solution of backing up to a remote PC. This works very well. The client automatically determines your external IP address and all your client instances can discover each other via your Crashplan account.

Tuesday, 13 September 2011

Google Native Client: The web of the future - or the past? • The Register

Google Native Client: The web of the future - or the past? • The Register: An interesting discussion about running native code in the browser - the advantages and disadvantages, and whether it "should" be done.

My personal opinion is that over the long term, the ideas behind cloud gaming will be the right direction for graphics/cpu intensive applications on the client. This model assumes a fast internet connection with low latency, but this is rapidly becoming a reality in many parts of the world.

A thin-client (or browser) would send commands to the server which would react and send images back. So there is no need to update your hardware, install or patch your application. It is good for the application developers too because it can reduce piracy.


Friday, 26 August 2011

First Post - Amazon EC2 vs Google App Engine vs Traditional Hosting

So this is my first blog post! I'll start off with an account of my recent experiences with data centres, EC2 and Google App Engine: a brief comparison of them, plus some instructions on getting Google App Engine to run on Ubuntu 10.10 (Maverick).

At work (an instant messaging company), we recently moved data centres and performed a brief exercise to evaluate whether we should move to EC2 or take out another data centre contract. We settled on a hybrid approach: running our core service in a "normal" data centre and using EC2 for services that need to scale rapidly. This gives us the following advantages over a complete EC2-based solution:

  • Lower cost. The total cost of running our core service is about 20% of the approximate cost of running it on EC2. We did, however, purchase the hardware ourselves (with some negotiation on price) and take out a long-term contract with the data centre (again with cost negotiations).
  • Predictable costs. During our costing exercise we found it difficult to predict what the actual costs of running an EC2 solution would be, particularly when it came to predicting how many I/O requests to EBS volumes we would have. Additionally there is no way to "cap" your monthly EC2 costs that I am aware of, so if something unexpected occurs you could be left with a very large bill.
  • More control: We purchased the hardware and therefore have complete control over our servers, switches and storage solutions.
  • More personal service: We can actually go and meet the people at the data centre to discuss our support requirements. They work closely with us to resolve issues such as a recent DDoS attack.
For our highly scalable services we use EC2 for the following reasons:
  • On-demand instances. Of course the first reason is that we can just add instances in minutes. Although the cost of EC2 is higher than a traditional data centre, it is vastly quicker to scale up and down by adding and removing machines. No calls to the data centre or cost negotiations required. I should probably add that our data centre does not currently offer a "cloud computing" solution.
  • Cost grows with usage. As we deploy only new features on our EC2 instances, the cost grows gradually as the service grows. So it's easier to estimate future costs based on current usage and costs figures. Once the service grows to a certain size we can then consider purchasing hardware and offloading the service to our data centre to reduce costs.
  • No maintenance required. The big problem with purchasing hardware is that we have to maintain it! Everything from installation to hardware failures requires a trip to the data centre which means lost programming-time. This is more of a "purchase vs lease" point I suppose.
  • Just do it their way. Whether it is a good or bad point that services such as EC2 usually offer just one way to do something is debatable. Take HTTP load balancing, for example: if you want it on EC2, use their load balancer - it's simple and easy to configure. If we want to do it on our own hardware, we have to do it ourselves by evaluating different software/hardware options and configuring them - more lost programming time. Again the total cost is lower for our (software) solution, but that does not take into account our time spent configuring and maintaining it. I've listed this as an advantage because you can of course still implement your own software solutions on EC2 if required (just not hardware).
On reflection we should have included our time spent purchasing, configuring and maintaining hardware and talking to the data centre in our cost estimates as we would not incur many of these costs with EC2.

I have also recently been experimenting with Google App Engine. It is very different from EC2: a cloud platform rather than cloud infrastructure. My observations in general are:
  • More "Cloudy": In my opinion GAE is more in keeping with the "cloud services" idea. You just deploy your app and it automatically scales from zero requests upwards. EC2 is a middle ground between handling all the data centre stuff yourself and a full cloud solution: you still have to monitor your app to see if it needs more resources, provision more if required, and install the OS and any software you need. I should point out that I am aware of Amazon's Elastic Beanstalk but haven't yet looked at it in much detail.
  • Limited functionality: The above advantage (as always) comes at a price. Understandably there are a lot of things that you cannot do (yet) as every feature must fit into a scalable architecture. GAE is basically limited to handling HTTP requests within a limited time or more persistent backend tasks. You can't handle or create raw TCP connections and the only persistent connection you can create from a client is using the Channel API which limits you to Javascript clients. So if your application cannot fit into the GAE limitations then you just have to go somewhere else.
  • Cost Caps! You can cap your monthly bill; if you reach the cap your app will just stop accepting requests (I think). This is very useful for a cloud-based solution because you don't want a dodgy loop in your software or a malicious attack on your system resulting in a huge monthly bill because of massive I/O / storage / app-instance costs.
  • Free to start with. If your app is not being used then you're not being charged. With EC2, even if you're not doing anything you still require a whole instance sitting there. Also when you scale-up on EC2 it is by adding whole machines or moving up to the next size of instance. GAE scales (in terms of cost) more linearly. It is worth mentioning that Amazon have recently started a "free micro-instance for a year" for new accounts.
Running GAE on Ubuntu

GAE requires Python 2.5 at the moment and Ubuntu 10.10 comes with Python 2.6. Support for 2.7 is on the roadmap, but until then, here's how I got it running on 10.10 (in a terminal):
  • sudo apt-add-repository ppa:fkrull/deadsnakes
  • sudo apt-get update
  • sudo apt-get install python2.5 python2.5-dev libjpeg62 libjpeg62-dev build-essential gcc libssl-dev libbluetooth-dev sqlite3 libsqlite3-dev
  • wget http://effbot.org/media/downloads/Imaging-1.1.6.tar.gz
  • tar xzf Imaging-1.1.6.tar.gz
  • cd Imaging-1.1.6
  • edit setup.py line 38: JPEG_ROOT = libinclude("/usr/lib")
  • sudo python2.5 setup.py install
  • wget http://pypi.python.org/packages/source/s/ssl/ssl-1.15.tar.gz
  • tar xzf ssl-1.15.tar.gz
  • cd ssl-1.15/
  • sudo python2.5 setup.py install