The Source Root
Sequins is generally stateless; it works by mirroring data at rest somewhere else. When sequins starts up, and whenever it is told to refresh its local data, it will do its best to mirror the organizational structure of the source root.
Data on local disk can be referred to by just the path, or with a file:// URI.
Data in HDFS can be referred to with an hdfs:// URI, using the namenode address and port as the hostname.
Data in S3 can be referred to by an s3:// URI, using the bucket name as the host.
In the latter case, the "path" is really a key prefix; S3 does not have real directories. Sequins treats prefix components separated by / as directories, just like awscli or other tools.
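As a rough illustration of the three addressing styles above, the following sketch splits a source root string into a backend, a host, and a path. It is not how sequins itself parses the setting; the function name `describe_source_root` and the example host and bucket names are made up for this example.

```python
from urllib.parse import urlsplit


def describe_source_root(root: str) -> dict:
    """Split a source root string into backend, host, and path (illustrative only)."""
    parts = urlsplit(root)
    if parts.scheme in ("", "file"):
        # A bare path or a file:// URI both point at local disk.
        return {"backend": "local", "host": None, "path": parts.path}
    if parts.scheme == "hdfs":
        # The "host" is the namenode address and port.
        return {"backend": "hdfs", "host": parts.netloc, "path": parts.path}
    if parts.scheme == "s3":
        # The "host" is the bucket; the rest is a key prefix, not a real directory.
        return {"backend": "s3", "host": parts.netloc, "path": parts.path.lstrip("/")}
    raise ValueError(f"unsupported source root scheme: {parts.scheme!r}")


if __name__ == "__main__":
    # Hypothetical hosts and bucket names, used only to show the shapes.
    for example in (
        "/path/to/data",
        "hdfs://namenode:8020/path/to/data",
        "s3://my-bucket/path/to/data",
    ):
        print(describe_source_root(example))
```

Stripping the leading slash in the S3 case reflects that the remainder is a key prefix within the bucket rather than an absolute filesystem path.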
Under the source root, the data should be organized into databases and below that into versions:
/path/to/data
├── <database>
│   ├── <version>
│   │   ├── <data file>
│   │   ├── <data file>
│   │   ├── <data file>
│   │   └── ...
│   └── <version>
│       └── ...
└── <database>
    └── <version>
        └── ...
Both the database and version can be arbitrary strings, but URI-friendly strings are recommended.
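To make the two-level layout concrete, here is a minimal sketch that walks a source root on local disk and reports which versions exist under each database. It only covers the local-disk case, and the helper name `list_versions` is hypothetical; it is not part of sequins.

```python
import os


def list_versions(source_root: str) -> dict:
    """Map each database directory to a sorted list of its version directories."""
    layout = {}
    for database in sorted(os.listdir(source_root)):
        db_path = os.path.join(source_root, database)
        if not os.path.isdir(db_path):
            continue  # only directories count as databases in this sketch
        layout[database] = sorted(
            version
            for version in os.listdir(db_path)
            if os.path.isdir(os.path.join(db_path, version))
        )
    return layout
```

Calling `list_versions("/path/to/data")` on a tree shaped like the one above would return a dictionary keyed by database name, with each value listing that database's versions in ascending order.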
Databases and Versions
A single sequins instance can hold multiple databases. This is the first layer of directories in the source root.
Sequins doesn't support individual writes, like other object stores would. Instead, databases are versioned. To update a database, you present it with an entirely new copy of the dataset, called a version, by dropping it into the database folder in the source root.
Database and version names must consist of alphanumeric characters, hyphens, and underscores; path components that include other characters are ignored. Otherwise they are arbitrary strings and are compared lexicographically, so to update a database you need to give it a version that is lexicographically greater than the current one. At Stripe we use date +%Y%m%d (e.g. 20160901) and timestamps.
Sequins will load the new version in the background and hotswap it in atomically. Additionally, Sequins returns an X-Sequins-Version header on responses, in case you need to track what version you're getting.
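The naming and ordering rules above can be sketched in a few lines. This is an illustration rather than sequins' own code: it assumes "alphanumeric" means ASCII letters and digits, and the names `is_valid_name` and `latest_version` are invented for the example.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_name(name: str) -> bool:
    """Return True if the name uses only alphanumerics, hyphens, and underscores."""
    return NAME_PATTERN.fullmatch(name) is not None


def latest_version(candidates):
    """Pick the lexicographically greatest valid name, or None if there is none."""
    valid = [name for name in candidates if is_valid_name(name)]
    return max(valid) if valid else None
```

Because the comparison is on strings rather than numbers, `latest_version(["9", "10"])` yields `"9"`. Fixed-width names such as dates in `%Y%m%d` form avoid that surprise.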
Loading New Data
You can tell sequins to check for new data in two ways. The first is the refresh_period configuration property, which instructs sequins to continually check for new data. You can also send a SIGHUP to the process and it will reload a single time.
Either way, sequins will perform three operations:
For any new databases that weren't there before, load the latest version and start serving it.
For each database which is no longer in the source root, stop serving it and delete it locally.
For each existing database, load whichever version is lexicographically greatest, if it is not the current local version.
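The three operations above amount to a reconciliation between what is held locally and what the source root currently contains. The sketch below computes such a plan from two plain dictionaries. The function name `plan_refresh`, the action labels, and the choice to skip a database with no versions are assumptions made for illustration, not a description of sequins internals.

```python
def plan_refresh(local: dict, remote: dict) -> list:
    """Compute refresh actions.

    local maps database -> version currently served.
    remote maps database -> list of versions present in the source root.
    """
    actions = []
    for database in sorted(remote):
        versions = remote[database]
        if not versions:
            continue  # assumption: a database with no versions is left alone
        newest = max(versions)  # lexicographically greatest version
        if database not in local:
            actions.append(("load", database, newest))
        elif local[database] != newest:
            actions.append(("upgrade", database, newest))
    for database in sorted(local):
        if database not in remote:
            # No longer in the source root: stop serving and delete locally.
            actions.append(("remove", database, local[database]))
    return actions
```

For example, with `local = {"a": "1", "b": "2"}` and `remote = {"a": ["1", "2"], "c": ["5"]}`, the plan upgrades `a` to `"2"`, loads `c` at `"5"`, and removes `b`.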