Sans Bytes

Daniel Heath blogs here

Learn browser automation from scratch

Browser automation with selenium-webdriver

Why?

Browser tests are super useful, but they’re famous for hard-to-debug, unreliable failures.

This reputation is largely the result of developers learning high-level frameworks without understanding the underlying abstractions.

In this guide I’ll show you some cool terminal tricks - including how to drive a browser directly.

I wouldn’t write my tests for a production app this way, but it illustrates the underlying concepts in a way most tools abstract away.

This will help you to develop a better understanding of how most web testing tools work.

Setup

You will need:

If you’re using Homebrew on a Mac, you can install the last three that way:

brew install selenium-server-standalone
brew install jq
brew install chromedriver

Getting started

Selenium is the predominant tool for writing automated browser tests. It can drive all major browsers.

Selenium runs on a client/server model: a server that controls the browser (via a driver program such as chromedriver), and clients that send it commands over HTTP (here, curl).

Open a terminal and run the standalone server:

# On OSX using homebrew:
/usr/local/bin/selenium-server -port 4444

# Everyone else - you'll need to be in the directory where selenium is downloaded to / installed:
java -jar selenium-server-standalone-3.0.1.jar -port 4444

Leave this running and open another terminal window. In the new window, run the following to ask the server to start a new chrome session (copy & paste will work, or save it to a file and run source <file>):

`# Save the URL to the local selenium server`
`# to the variable SELENIUM_URL`
SELENIUM_URL="http://localhost:4444/wd/hub/session"

`# Save the result of asking for a new browser session`
`# to the variable SELENIUM_SESSION_ID`
SELENIUM_SESSION_ID=$( \
  `# Use curl to make a HTTP POST asking the selenium` \
  `# server to open chrome` \
  curl "$SELENIUM_URL" \
       --data-ascii \
       '{"desiredCapabilities":{"browserName":"chrome"}}' \
       | jq -r .sessionId \
)

You should see a new window running chrome, looking at a blank page.

The above command sets your current shell (terminal window) up with the variables SELENIUM_URL and SELENIUM_SESSION_ID, which are used in later commands.

If you open a new shell, those variables will not be present and the commands won’t work.
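
Every later command is just an HTTP request to a URL built from these two variables plus an endpoint path. A quick sketch of the pattern (no server needed; the session ID below is invented, and a separate variable name is used so you don’t clobber your real one):

```shell
# Illustration only - a fake session ID, kept in its own variable
# so the real SELENIUM_SESSION_ID in your shell is untouched:
EXAMPLE_SESSION_ID="0123-example-session"

# Each WebDriver endpoint used below lives under the session URL:
for endpoint in /url /element /screenshot ; do
  echo "http://localhost:4444/wd/hub/session/${EXAMPLE_SESSION_ID}${endpoint}"
done
```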

Let’s make our browser go to Google.

curl "$SELENIUM_URL/$SELENIUM_SESSION_ID/url" \
     --data-ascii \
     '{"url": "https://google.com"}' \
     | jq .

You should see a JSON response from the server, and the browser should navigate to Google.

Filling in forms

Let’s automate issuing a search. First, find the input element:

SEARCH_BOX_ELEMENT=$(
  curl "$SELENIUM_URL/$SELENIUM_SESSION_ID/element" \
       --data-ascii \
       '{"using": "css selector", "value": "input[name=q]"}' \
       `# Use 'jq' to extract part of the response. ` \
       `# -r to get the text without quotes. ` \
       | jq -r .value.ELEMENT \
)

Now we’ll type something into the search box:

curl "$SELENIUM_URL/$SELENIUM_SESSION_ID/element/$SEARCH_BOX_ELEMENT/value" \
     --data-ascii \
     '{"value": ["s","e","l","e","n","i","u","m", "\n"]}' \
     | jq .

Switching to the chrome window, you should be able to see the search results.
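
As an aside, the value field is an array of individual keystrokes. If you’d rather not type those arrays out by hand, a small bash helper can build one from a string (a sketch; it is naive and does not escape quotes or backslashes):

```shell
# Build a WebDriver-style keystroke array from a string.
# Naive: does not escape quotes or backslashes in the input.
function keystrokes {
  local str="$1" out="" i
  for (( i=0; i<${#str}; i++ )); do
    out+="\"${str:i:1}\","
  done
  echo "[${out%,}]"   # trim the trailing comma, wrap in brackets
}

keystrokes "selenium"   # prints ["s","e","l","e","n","i","u","m"]
```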

Customizing your shell

This is getting repetitive. Let’s define some bash functions to make it easier:

function debug {
  if [ -n "$DEBUG" ] ; then
    # Print debug messages to STDERR
    echo "$@" 1>&2
  fi
}

function get {
  # Build the URL, requiring that a path was provided
  local URL="${SELENIUM_URL}/${SELENIUM_SESSION_ID}${1?Must provide URL}"

  debug "$URL"

  # Fetch the URL and format with jq.
  curl --silent "${URL}" | jq .
}

function post {
  # Build the URL, requiring that a path was provided
  local URL="${SELENIUM_URL}/${SELENIUM_SESSION_ID}${1?Must provide URL}"

  # Set a default value for the POST data
  local DATA="${2-{\}}"

  # Validate that the provided data is valid JSON
  echo "$DATA" | jq . > /dev/null || \
    ${ALLOW_INVALID_JSON?Second argument is not valid JSON}

  debug "$URL"

  # Fetch the URL and format the response with jq.
  curl --silent "${URL}" --data-ascii "${DATA}" | jq .
}
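
These functions lean on two bash parameter expansions that are worth knowing on their own. A standalone sketch of each:

```shell
# "${1?message}" aborts with the message when no argument is given.
# Run it in a subshell here so the demo itself survives:
( URL="${1?Must provide URL}" ) 2>/dev/null || echo "refused: no argument"

# "${2-default}" substitutes a default when $2 is unset:
set -- /url          # pretend the function was called with one argument
DATA="${2-{\}}"      # there is no $2, so DATA falls back to "{}"
echo "$DATA"         # prints {}
```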

Screenshots

Let’s use these new functions to take a screenshot:

get /screenshot `# Ask for a screenshot` \
    | jq -r .value `# Fetch the raw (-r) string from the 'value'` \
    | base64 -D `# Decode the base64-encoded data (-D is the macOS flag; GNU base64 uses -d)` \
    > screenshot.png `# Save the result to screenshot.png`

If you open ‘screenshot.png’ you should see a screenshot of the page.
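
If the base64 step seems mysterious: selenium sends the PNG bytes encoded as base64 text, and decoding simply reverses that. In miniature (using the long --decode spelling, which both GNU and macOS base64 accept):

```shell
# Round-trip some bytes through base64, like the screenshot pipeline does.
printf 'not really a PNG' | base64 | base64 --decode   # prints: not really a PNG
```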

Extracting text from the page

Let’s grab the URL of each search result.

Looking at the document using ‘inspect element’, I can see that search result URLs all appear in cite elements, and nothing else on the page does. That sounds like a good way to identify those URLs.

for element_id in $( \
  post /elements \
       '{"using": "css selector", "value": "cite"}' \
       | jq -r .value[].ELEMENT \
  ) ; do
  get "/element/$element_id/text" | jq -r .value
done

I get the following output:

www.seleniumhq.org/
www.seleniumhq.org/
www.webmd.com/a-to-z-guides/supplement-guide-selenium
https://en.wikipedia.org/wiki/Selenium
https://ods.od.nih.gov/factsheets/Selenium-HealthProfessional/
https://www.nrv.gov.au/nutrients/selenium
https://cosmosmagazine.com › Geoscience › Topics
www.whfoods.com/genpage.php?dbid=95&tname=nutrient
selenium-python.readthedocs.io/
selenium-python.readthedocs.io/getting-started.html
https://seleniumhq.wordpress.com/
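
If the jq filter in that loop looks opaque, here’s what .value[].ELEMENT extracts from a canned /elements response (the IDs are invented):

```shell
# A canned response, shaped like selenium's answer to POST /elements:
RESPONSE='{"value":[{"ELEMENT":"id-1"},{"ELEMENT":"id-2"}]}'

# '.value' selects the array, '[]' iterates it, '.ELEMENT' picks each ID:
echo "$RESPONSE" | jq -r '.value[].ELEMENT'   # prints id-1, then id-2
```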

Further reading

I’ve referred frequently to the spec in writing this. In particular, the “List of Endpoints” and “Command Contexts” sections are useful. At the time of writing you can search the page for those terms to get there quickly.

Builtin methods

The list of supported actions (from the spec) is summarized below.

The spec makes for hard reading but each section is quite short.

Element interaction

These APIs operate on element handles, which are opaque IDs allocated by selenium when you find/get an element.

Use those IDs to interact with elements.

Javascript

Inject arbitrary JS to run in the page.

Modals: ‘alert’, ‘confirm’ and ‘prompt’

Cookies

Window management

Lets you open multiple tabs/windows in the same selenium session.

Timeout management

Lets you define how long a page is allowed to spend loading html/scripts.

I haven’t had a reason to use them to date so can’t describe them in much more detail than that.

Action API

This API lets you script relatively advanced sequences of user behaviors - eg by describing the touch events that form a gesture.

I haven’t had a reason to use them to date so can’t describe them in much more detail than that.

Postgres features for Ruby developers

About postgres

Postgres is a relational database which is widely used by Ruby developers.

For instance, it’s the default database for Heroku (a popular hosting provider for Ruby apps).

Like most databases, postgres uses standard SQL. It also offers many features which are not available in most other databases.

I’m going to provide a high-level overview of features here without providing much detail. If you have questions, the online documentation for postgres is comprehensive, useful, and up to date.

Overview of Features

UUID columns

By default, Rails gives you tables with an integer ID. This ID is always generated when you create a new record - you can’t specify the value for a new record’s ID.

Postgres supports using a Universally Unique ID (UUID) instead.

UUIDs look like this: 93743318-67f4-4c29-8ba0-5ba4f667c7b2.

This can be a useful alternative because it lets you specify the ID, but will still generate an ID if you don’t specify one.

Fulltext search

Some apps need fulltext search, but running a separate search engine is hard. Postgres has a reasonably good text-search engine built in.

You’ll need to spend some time perusing the documentation to use this - every app is different and you’ll need to configure the search to suit your needs.

Spatial data

One of the most commonly used extensions to postgres is called PostGIS. This lets you store geographic data (e.g. points or areas on a map).

You can then write queries to find (e.g.) what’s within 10km of this point, with the closest points first.

Scaling

Most web applications using postgres will have one server dedicated to running postgres, with other servers running the web application.

As the amount of data you’re dealing with increases, the database server can become overloaded.

There are three common approaches to avoiding this:

Throw money at it

The easiest answer is usually “buy more ram/CPU/disk”.

Partitioning via Child Tables

Partitioning lets you use multiple servers, storing part of your data on each of them.

This technique helps you scale a database which is overloaded due to too many update, delete & insert commands.

For instance, all records where the name field starts with A through F go to one server, G through P to a second, and the remainder to a third.

You can achieve this using a feature called child tables (the documentation has good examples).

Replication

Replication lets you use multiple servers, storing all of your data on each of them. Only one of the servers is allowed to handle update, insert or delete queries.

This technique helps you scale a database which is overloaded due to too many select queries running at once.

It is also good because each server has a full copy of the database, meaning that if the primary database fails, you can quickly switch to one of the replica servers.

Most modern databases, including Postgres, have good support for replication.

Arrays and Hashes and JSON, oh my! (XML too)

Arrays

A column in an SQL table has a type (e.g. string, integer, timestamp). Postgres also lets you create a column which stores an array of strings, or timestamps, or any other type (even arrays of arrays!).

If you’re using Rails with ActiveRecord, these arrays work like normal Ruby arrays.

Hashes (via HSTORE)

Similarly to arrays, you can store hashes like {'foo' => 'bar', 'k' => 'v'}. This works like a normal Ruby hash through ActiveRecord, except you can only use strings for the keys and values.

JSON

Postgres has a JSON type which can store any JSON value. ActiveRecord makes this work nicely with Ruby (with caveats; it can’t tell that the JSON has changed when you modify part of a nested structure).

You can query your JSON data using postgres built-in functions.

XML

The XML type will stop you from accidentally storing something which is not valid XML. This is better than just using a string field, as you can be sure that the data you have stored is valid XML.

You can query your XML data using XPATH selectors.

Stored procedures

Postgres lets you create stored procedures (functions written in SQL).

You can write these functions in other languages if you have the right extensions installed; for instance, the PLV8 extension lets you write stored procedures in JavaScript. There is also a Ruby extension.

Writing stored procedures in Javascript is clearly a good idea which will not upset your teammates.

Indexing

Big database tables can be hundreds of gigabytes, and it can take a long time to scan through all that data to find the record you’re looking for.

An index is a much smaller file which can be scanned quickly and tells the database where to find the records which match a query.

Practically all databases support some kind of indexing.

Partial indexes

Unlike most other databases, postgres lets you create an index with a where condition (e.g. create index (...) where (condition)).

This means that the index file can be even smaller and quicker to use.

Expression indexes

Postgres also allows you to create an index on any expression (not just a column). This means you can do things like create index index_name on table_name ( upper(description) ).

DDL Transactions

This feature protects you when running migrations.

In other databases, if a migration fails part-way through, the database can be left half-way between the old state and the new state. Postgres stops this from happening - migrations either succeed completely, or do nothing at all.

Constraints & Deferred constraints

Many databases support constraints (like validations built into the database).

However, they are often hard to use because they can’t be broken, even temporarily.

Postgres supports deferred constraints, which only apply when you commit a transaction. This makes constraints far more usable with Rails.

Foreign Tables

Foreign tables let you run join queries against tables which are in a totally different database (e.g. MySQL or any JDBC database).

Handy when dealing with really big, organically grown systems.

Questions for prospective employers

I’ve been job hunting a bit recently. Below are a few of the questions I find give me a reasonable sense of what a company is about.

Corporate

How many large customers? How many small ones? How much variance in customer spend is there? (companies with few large customers are probably going to work on whatever those customers tell them to work on; if they have many small customers and don’t A/B test something might be wrong)

Org chart: How many sales / dev / product / UX / admin?

Has the team released any interesting OSS in the last year (on work time)?

Has the team released any interesting OSS in the last year (on their own time)?

How much pairing do you do and when/why do you do it?

How/when do you do code reviews? During pairing?

Code

What’s your take on test coverage?

What quality metrics do you use?

Do you enforce a styleguide? How?

What languages & frameworks do you use?

How long does a typical CI build take? How long does a typical local build take?

What techniques do you use to keep tests running fast? (e.g. parallel tests, GC tweaks, avoiding I/O)

How many code repositories? apps?

Devops

How similar is your development environment to production?

How many architectural pieces are there (appworker, queue, cache, database)?

How many of these are outsourced vs self run (e.g. are you using RDS/heroku PG, or running mysql/postgres yourselves? Do you run your own queue or use iron.io?)

How often do you deploy? Continuous delivery? Fortnightly sprints?

How many production-like environments do you have (eg staging, uat)?

How long from ‘lets deploy’ to ‘done’?

How long to roll-back from a bad deploy?

How many servers are you running?

How many kinds of server are you running (db, cache, web, ci, SCM hosting…)?

Cloud or self-hosted (or something else)?

Team & Culture

How is the team organised? Small groups? Hierarchy? Triads?

How do the dev team decide what to work on?

How do you stay in sync across multiple locations (if applicable)?

How do the team keep each other informed (e.g. email/chat/face-to-face)?

How often do you use A/B testing to determine whether a change stays? Who makes the call on what to test?

How do you set up a new development environment?

Learning Go

I’ve been using Go full-time for ~9 months now.

The big differences I’ve noticed coming from Ruby:

Imports

Imports can’t clobber the global namespace or change the behavior of other packages (unless those packages were designed to allow it).

Only types/methods that are exported by a package are available when you import it.

Static language

There’s no monkey patching. When writing packages, export interfaces, not types (to make testing possible).

Composition, not inheritance

Go has no classes. Group data together using structs.

Any type can have methods, e.g.:

type Foo string
func (f Foo) Print() { fmt.Println(f) }
Foo("a string").Print() // Prints "a string"

type Bar struct { str string }
func (b Bar) Print() { fmt.Println(b.str) }
Bar{"A different string"}.Print() // Prints "A different string"

Important differences moving from Ruby to Go:

* Code re-use via composition feels very natural coming from Ruby.
* Where you would have used a module Foo in Ruby, embed an interface Foo in Go.

The rails asset pipeline - Now for every framework.

I recently found myself wanting the features of the rails asset pipeline in my golang project at work.

Since there isn’t much in the way of asset pipelining for golang yet, I built it.

Turns out, sprockets is really easy to integrate.

Assets in development

First things first - let’s get to ‘it works on my machine’.

I’ve put together a sample repo using the asset pipeline.

The setup for your app will be similar:

* The assets folder contains your stylesheets, javascript, etc (this directory name is set in sprockets/environment.rb).
* You’ll need a similar Rakefile to build assets (and maybe launch the server).
* You might store the sprockets directory somewhere else - update the Rakefile to match.
* Use a Gemfile and the bundler rubygem to manage dependencies.
* Edit the rakefile to change the port the asset server runs on.

When your app starts (in development), it should make a request to http://localhost:11111/assets/manifest.json.

Parse this JSON hash; the keys are asset names (e.g. “application.css”) and the values are relative URLs the compiled assets can be fetched from.

When generating a link to an asset in your app, use the JSON hash you fetched to lookup the URL. In the case of “application.css” this might look like http://localhost:11111/application-8e5bf6909b33895a72899ee43f5a9d53.css.
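
As a sketch of that lookup (the file contents here are invented, matching the example URL above):

```shell
# A manifest like the one the asset server generates
# (contents invented for illustration):
cat > manifest.json <<'JSON'
{"application.css": "/application-8e5bf6909b33895a72899ee43f5a9d53.css"}
JSON

# Look up the compiled URL for an asset by name:
jq -r '."application.css"' manifest.json
```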

That should be all you need for development - you should be able to see SASS/Coffeescript assets compiled and loading normally.

Assets in production

For production we want to pre-compile assets rather than regenerating them each time they change.

rake assets will create a ‘public’ folder containing ‘manifest.json’ (same format as before). Get this directory onto your production servers (git add -Af public/ will add it to source control if you deploy via git).

When generating a link to an asset, look up manifest.json (the same as in development, but from the filesystem instead of over HTTP).

Fin

The whole thing, including deployment, took me well under a day to add to our app. The resulting assets are minified, concatenated, and gzipped (for size). They are also fingerprinted (so you can set an unlimited cache lifetime).