Quickstart

These are the basic requirements for working on the pipeline.

Tech setup

Git

The whole pipeline is stored in one git repository, hosted here. Your account must be given access to contribute.

Credentials

The pipeline uses private credentials to connect to services such as our databases and S3. These credentials are stored in our 1Password vault; ask an IJF employee for them.

Then put the credentials in a .env file at the project root, and nowhere else.
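Before starting the containers, it can help to confirm that the .env file exists and holds the values you were given. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, the parser handles only simple KEY=VALUE lines, and the real key names are whatever the 1Password entry provides.

```python
from pathlib import Path


def load_env(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path, required):
    """Return the required keys that are absent or empty in the .env file."""
    env = load_env(path)
    return [key for key in required if not env.get(key)]
```

Calling missing_keys(".env", [...]) with the key names you expect returns an empty list when everything is filled in.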

Docker

The pipeline uses Docker containers. Your machine must have both docker and docker-compose installed. Start the container(s) with:

docker-compose up -d

This starts two containers, app and db. The former holds this repository and its dependencies; the latter holds a local Postgres database. The db container is used only in debug mode and when running tests.
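If the docker-compose command fails immediately, the first thing to rule out is a missing install. Here is a small preflight sketch, run on the host rather than in a container; missing_tools is a hypothetical helper, not something shipped with the repository, and it only checks that both executables are on PATH.

```python
import shutil


def missing_tools(tools=("docker", "docker-compose"), which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH: " + ", ".join(absent))
    else:
        print("docker and docker-compose are both available")
```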

You can use these containers as you normally would. VSCode makes it easy to connect to a running container and develop inside it.

The app image is built with COPY . ., so your current copy of the pipeline code is copied 1:1 into the container. This includes git-ignored files (like the .env file, which you need) and changes not yet committed to git.

Survival guide

Version control

The repository’s main branch requires approval from a repo owner before merging in any changes. When doing any development, create a branch and work from there before ultimately opening a PR.

The CLI

This repo has a single CLI, available only inside the Docker environment through the pipe command. There is currently no other supported interface.

pipe -d lob qc crawl -f 2020-01-01

Every CLI call, like the example above, has two components: the root and the step.

The root begins with the pipe command. It necessarily has some db, s value. It also has some options that apply to all possible steps (especially debug; see below).

The step begins with a step name, like crawl above. Source-specific arguments, like -f above, follow it.
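To make the root/step split concrete, the sketch below cuts the example call at the step name. It only illustrates the structure described above and is not how the CLI itself parses arguments; split_call is a hypothetical helper, and the set of step names is an assumption containing only the one step shown in the example.

```python
def split_call(argv, step_names):
    """Split a CLI call into (root, step) at the first known step name."""
    for index, token in enumerate(argv):
        # Skip index 0, which is the pipe command itself.
        if index > 0 and token in step_names:
            return argv[:index], argv[index:]
    raise ValueError("no known step name found in: " + " ".join(argv))


root, step = split_call(
    ["pipe", "-d", "lob", "qc", "crawl", "-f", "2020-01-01"],
    step_names={"crawl"},  # assumption: only the step name seen above
)
print(root)  # the root: pipe, its options, and the values before the step
print(step)  # the step: the step name and its arguments
```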

The CLI is documented fully here.

Debug mode

Whenever you run the pipe command, run it with the debug flag:

pipe -d ...

Debug mode enables debug logging but, more importantly, points the pipeline away from our production storage and toward local resources, in particular the local Postgres container provisioned by the docker-compose command above.
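If you script pipe calls, a small guard can keep the debug flag from being forgotten. This is a sketch under stated assumptions, not an existing feature: with_debug is a hypothetical helper, and it assumes that root options are plain flags placed before the first positional value, as in the example call earlier.

```python
def with_debug(argv):
    """Return argv with -d added after 'pipe' unless the root already has it."""
    if not argv or argv[0] != "pipe":
        raise ValueError("expected a call starting with 'pipe'")
    for token in argv[1:]:
        if token == "-d":
            return list(argv)
        if not token.startswith("-"):
            break  # first positional value reached; root options are over
    return [argv[0], "-d", *argv[1:]]
```

The returned list can then be passed to whatever runs the command inside the app container.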