Configuration Reference
The sequins configuration is in the TOML format. Sequins will look for a
sequins.conf file in the local directory, and then /etc/sequins.conf if that
doesn't exist.
Below is a full list of the configuration properties. Some configuration
properties are nested under headers, like [s3]. See the sequins.conf.example
file that ships with releases (also on GitHub) for an example of the file
layout.
A few properties below are durations. These are strings with a shorthand unit,
like "1s" or "20m". Valid units are ns, us (or µs), ms, s, m, and h.
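As an illustration, a few of the duration-valued properties from this reference might be written as follows. The values are arbitrary examples, not recommendations.

```toml
# Illustrative values only. Each duration is a quoted string
# made of a number followed by one of the valid units.
throttle_loads = "800us"   # microseconds
refresh_period = "10m"     # minutes

[sharding]
proxy_timeout = "100ms"    # milliseconds

[zk]
connect_timeout = "1s"     # seconds
```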
Top Level Properties
source
Type | Default |
---|---|
string | unset (eg "hdfs://<namenode>:<port>/path/to/stuff" ) |
The URL or directory where the sequencefiles are. This can be a local directory,
an HDFS URL of the form hdfs://<namenode>:<port>/path/to/stuff, or an S3 URL
of the form s3://<bucket>/path/to/stuff. This should be a directory of
directories of directories; each first-level directory represents a 'database',
and each subdirectory therein represents a 'version' of that database. This must
be set, but can be overridden from the command line with --source.
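A sketch of how the expected layout might map onto a source setting. The database and version names in the comments are hypothetical placeholders, and the URL is the placeholder form used above rather than a working address.

```toml
# Expected shape under the source root (names are hypothetical):
#
#   /path/to/stuff/
#     <database-a>/
#       <version-1>/    <- sequencefiles for this version
#       <version-2>/
#     <database-b>/
#       <version-1>/
#
source = "hdfs://<namenode>:<port>/path/to/stuff"
```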
bind
Type | Default |
---|---|
string | "0.0.0.0:9599" |
The address to bind on. This can be overridden from the command line with
--bind.
local_store
Type | Default |
---|---|
string | "/var/sequins" |
This is where sequins will store its internal copy of all the data it ingests.
This can be overridden from the command line with --local-store.
max_parallel_loads
Type | Default |
---|---|
int | unset (eg 4 ) |
If this flag is set, sequins will only update this many databases at a time, minimizing disk usage while new data is being loaded. If you set this to 1, then loads will be completely serialized.
throttle_loads
Type | Default |
---|---|
string | unset (eg "800μs" ) |
If this flag is set, sequins will sleep this long between writes while loading data, artificially slowing down loads and reducing disk i/o. If you are using disks where the latency is extremely sensitive to activity, then loading large amounts of data can negatively impact your latency, and you may want to experiment with this setting.
refresh_period
Type | Default |
---|---|
string | unset (eg "10m" ) |
If this flag is set, sequins will periodically download new data this often. If
you enable this, you should also enable require_success_file, or sequins may
start automatically downloading a partially-created set of files.
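For instance, the pairing recommended above might look like this. The ten-minute period is only the example value from the table.

```toml
# Illustrative: look for new data every 10 minutes, and only ingest
# directories that already contain a _SUCCESS file.
refresh_period = "10m"
require_success_file = true
```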
require_success_file
Type | Default |
---|---|
bool | false |
If this flag is set, sequins will only ingest data from directories that have a _SUCCESS file (which is produced by hadoop when it completes a job).
content_type
Type | Default |
---|---|
string | unset (eg "application/json" ) |
If this is set, sequins will set this Content-Type header on responses.
[storage]
compression
Type | Default |
---|---|
string | "snappy" |
This can be either 'snappy' or 'none', and defines how data is compressed on disk.
block_size
Type | Default |
---|---|
int | 4096 |
This controls the block size for on-disk compression.
[s3]
region
Type | Default |
---|---|
string | unset (eg "us-west-2" ) |
The S3 region for the bucket where your data is. If unset, and sequins is running on EC2, this will be set to the instance region.
access_key_id
Type | Default |
---|---|
string | see below (eg "AKIAIOSFODNN7EXAMPLE" ) |
secret_access_key
Type | Default |
---|---|
string | see below (eg "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" ) |
The access key and secret to use for S3. If unset, the environment variables
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will be used, or IAM instance
role credentials if they are available.
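A minimal sketch of an S3-backed configuration. The bucket is the placeholder used earlier, and the credentials are the example values from the tables above, not real keys.

```toml
source = "s3://<bucket>/path/to/stuff"

[s3]
region = "us-west-2"

# Optional. Omit both lines to fall back to the environment variables
# or to IAM instance role credentials, as described above.
access_key_id = "AKIAIOSFODNN7EXAMPLE"
secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```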
[sharding]
enabled
Type | Default |
---|---|
bool | false |
If true, sequins will attempt to connect to zookeeper at the specified addresses (see zk.servers), and coordinate with peer instances to shard datasets. For a complete description of the sharding algorithm, see the manual.
replication
Type | Default |
---|---|
int | 2 |
This is the number of replicas responsible for each partition.
min_replication
Type | Default |
---|---|
int | 1 |
This is the minimum number of replicas required for sequins to switch to a new version. Set this to a higher value to ensure data redundancy before upgrading.
You probably don't want this to be equal to replication, or sequins will never upgrade versions if any node at all is down.
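One way the two settings might be combined, assuming a cluster large enough to hold three replicas of each partition. The numbers are illustrative.

```toml
[sharding]
enabled = true
replication = 3       # illustrative; the default is 2
min_replication = 2   # kept below replication, so a version switch
                      # does not need every replica to be available
```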
time_to_converge
Type | Default |
---|---|
string | "10s" |
Upon startup, sequins will wait this long for the set of known peers to stabilize.
proxy_timeout
Type | Default |
---|---|
string | "100ms" |
This is the total timeout (connect + request) for proxied requests to peers in a sequins cluster. You may want to increase this if you're running on particularly cold storage, or if there are other factors significantly increasing request time.
proxy_stage_timeout
Type | Default |
---|---|
string | see below (eg "50ms" ) |
After this interval, sequins will try another peer concurrently with the first,
as long as there are other peers available and the total time is less than
proxy_timeout. If left unset, this defaults to proxy_timeout divided by
replication, which leaves enough time for all peers to be tried within the
total timeout.
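To make the default concrete, here is a sketch with illustrative values. The arithmetic in the comment follows the description above and has not been checked against the implementation.

```toml
[sharding]
enabled = true
replication = 3
proxy_timeout = "300ms"

# Left unset, proxy_stage_timeout would default to 300ms / 3 = 100ms.
# Setting it lower starts the concurrent request to a second peer sooner.
proxy_stage_timeout = "50ms"
```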
cluster_name
Type | Default |
---|---|
string | "sequins" |
This defines the root prefix to use for zookeeper state. If you are running multiple sequins clusters using the same zookeeper for coordination, you should change this so they can't conflict.
advertised_hostname
Type | Default |
---|---|
string | see below (eg "sequins1.example.com" ) |
This is the hostname sequins uses to advertise itself to peers in a cluster. It should be resolvable by those peers. If left unset, it will be set to the hostname of the server.
shard_id
Type | Default |
---|---|
string | see below (eg "sequins1" ) |
The shard ID is used to determine which partitions the node is responsible for.
By default, it is the same as advertised_hostname. Unlike the hostname,
however, it doesn't have to be unique; two nodes can have the same shard_id, in
which case they will download the same partitions. This can be useful if you
don't have stable hostnames, but want to be able to rebuild a server to take the
place of a dead or decommissioning one.
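As a sketch of the replacement scenario, a rebuilt server could keep the old node's shard ID while advertising a new hostname. Both values below are hypothetical.

```toml
[sharding]
enabled = true

# Hostname of the rebuilt server; must be resolvable by peers.
advertised_hostname = "sequins7.example.com"

# Shard ID carried over from the node being replaced, so this server
# is assigned the same partitions.
shard_id = "sequins1"
```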
[zk]
servers
Type | Default |
---|---|
array of string | ["localhost:2181"] |
If set and 'sharding.enabled' is true, sequins will connect to zookeeper at the given addresses.
connect_timeout
Type | Default |
---|---|
string | "1s" |
This specifies how long to wait while connecting to zookeeper.
session_timeout
Type | Default |
---|---|
string | "10s" |
This specifies the session timeout to use with zookeeper. The actual timeout is negotiated between server and client, but will never be lower than this number.
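Putting the zookeeper settings together with sharding, a configuration might look like the following. The server addresses and the cluster name are hypothetical; the timeouts are the defaults listed above.

```toml
[sharding]
enabled = true
cluster_name = "sequins-staging"   # hypothetical; separates this cluster's zookeeper state

[zk]
servers = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]
connect_timeout = "1s"
session_timeout = "10s"
```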
[datadog]
url
Type | Default |
---|---|
string | "localhost:8200" |
If set, sequins will send metrics concerning S3 file downloads using the DogStatsD protocol to this address.
[debug]
bind
Type | Default |
---|---|
string | unset (eg "localhost:6060" ) |
If set, binds the golang debug http server, which can serve expvars and profiling information, to the specified address.
expvars
Type | Default |
---|---|
bool | true |
If set, this adds expvars to the debug HTTP server, including the default ones and a few sequins-specific ones.
pprof
Type | Default |
---|---|
bool | false |
If set, this adds the default pprof handlers to the debug HTTP server.
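For example, to expose both expvars and profiling on a local-only address, the section might be written as below. The address is the example value from the table above.

```toml
[debug]
bind = "localhost:6060"   # debug server is only started when this is set
expvars = true            # the default
pprof = true              # off by default
```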