Configuration Reference

The sequins configuration file uses the TOML format. Sequins looks for a sequins.conf file in the local directory first, and falls back to /etc/sequins.conf if that doesn't exist.

Below is a full list of the configuration properties. Some configuration properties are nested under headers, like [s3]. See the sequins.conf.example file that ships with releases (also on GitHub) for an example of the file layout.

A few properties below are durations. These are strings with a shorthand unit, like "1s" or "20m". Valid units are ns, us (or µs), ms, s, m, and h.
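As a rough illustration of the duration syntax, the fragment below sets a few of the duration properties documented on this page. The values are the examples and defaults listed in this reference, not tuning recommendations.

```toml
# Durations are quoted strings: a number followed by a unit.
refresh_period = "10m"       # ten minutes
throttle_loads = "800μs"     # eight hundred microseconds

[sharding]
time_to_converge = "10s"     # ten seconds
proxy_timeout = "100ms"      # one hundred milliseconds
```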

Top Level Properties

source

Type	Default
string	unset (eg `"hdfs://<namenode>:<port>/path/to/stuff"`)

The URL or directory where the sequencefiles are. This can be a local directory, an HDFS URL of the form hdfs://<namenode>:<port>/path/to/stuff, or an S3 URL of the form s3://<bucket>/path/to/stuff. This should be a directory of directories of directories: each first-level directory represents a 'database', and each subdirectory therein represents a 'version' of that database. This must be set, but can be overridden from the command line with --source.
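A sketch of the expected layout, using a local directory as the source. The path, the database names (users, orders), the version names, and the part-00000 file name are all hypothetical and only illustrate the database/version nesting described above.

```toml
# Hypothetical layout on disk (all names are made up for illustration):
#
#   /data/sequins/          <- source
#     users/                <- a database
#       version-1/          <- a version of that database
#         part-00000        <- a sequencefile
#       version-2/
#         part-00000
#     orders/               <- another database
#       version-1/
#         part-00000
source = "/data/sequins"
```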

bind

Type	Default
string	`"0.0.0.0:9599"`

The address to bind on. This can be overridden from the command line with --bind.

local_store

Type	Default
string	`"/var/sequins"`

This is where sequins will store its internal copy of all the data it ingests. This can be overridden from the command line with --local-store.

max_parallel_loads

Type	Default
int	unset (eg `4`)

If this flag is set, sequins will only update this many databases at a time, minimizing disk usage while new data is being loaded. If you set this to 1, then loads will be completely serialized.

throttle_loads

Type	Default
string	unset (eg `"800μs"`)

If this flag is set, sequins will sleep this long between writes while loading data, artificially slowing down loads and reducing disk i/o. If you are using disks where the latency is extremely sensitive to activity, then loading large amounts of data can negatively impact your latency, and you may want to experiment with this setting.
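For example, a node with latency-sensitive disks might combine the two load settings as sketched below. The values are illustrative starting points taken from the examples above, not measured recommendations.

```toml
# Load one database at a time (fully serialized loads).
max_parallel_loads = 1

# Sleep between writes while loading, trading load speed for lower disk I/O.
throttle_loads = "800μs"
```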

refresh_period

Type	Default
string	unset (eg `"10m"`)

If this flag is set, sequins will periodically download new data this often. If you enable this, you should also enable require_success_file, or sequins may start automatically downloading a partially-created set of files.

require_success_file

Type	Default
bool	`false`

If this flag is set, sequins will only ingest data from directories that have a _SUCCESS file (which is produced by hadoop when it completes a job).
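A minimal sketch of periodic refresh paired with the success-file check, as recommended under refresh_period. The ten-minute interval is just the example value from this page.

```toml
# Check for new data every ten minutes...
refresh_period = "10m"

# ...but only ingest version directories that contain a _SUCCESS file,
# so a partially written set of files is never picked up.
require_success_file = true
```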

content_type

Type	Default
string	unset (eg `"application/json"`)

If this is set, sequins will set this Content-Type header on responses.

[storage]

compression

Type	Default
string	`"snappy"`

This can be either 'snappy' or 'none', and defines how data is compressed on disk.

block_size

Type	Default
int	4096

This controls the block size for on-disk compression.

[s3]

region

Type	Default
string	unset (eg `"us-west-2"`)

The S3 region for the bucket where your data is. If unset, and sequins is running on EC2, this will be set to the instance region.

access_key_id

Type	Default
string	see below (eg `"AKIAIOSFODNN7EXAMPLE"`)

secret_access_key

Type	Default
string	see below (eg `"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"`)

The access key and secret to use for S3. If unset, the env variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY will be used, or IAM instance role credentials if they are available.
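Putting the S3 properties together might look like the fragment below. The bucket placeholder and the credentials are the example values from this page, not working ones.

```toml
source = "s3://<bucket>/path/to/stuff"

[s3]
region = "us-west-2"

# Leave both of these out to fall back to AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY, or to IAM instance role credentials.
access_key_id = "AKIAIOSFODNN7EXAMPLE"
secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```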

[sharding]

enabled

Type	Default
bool	`false`

If true, sequins will attempt to connect to zookeeper at the specified addresses (see zk.servers), and coordinate with peer instances to shard datasets. For a complete description of the sharding algorithm, see the manual.

replication

Type	Default
int	2

This is the number of replicas responsible for each partition.

min_replication

Type	Default
int	1

This is the minimum number of replicas required for sequins to switch to a new version. Set this to a higher value to ensure data redundancy before upgrading.

You probably don't want this to be equal to replication, or sequins will never upgrade versions if any node at all is down.
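One way to leave headroom between the two settings is sketched below. The specific numbers are an assumption chosen for illustration: with three replicas per partition and a minimum of two, a version switch should still be possible while one replica of a partition is unavailable.

```toml
[sharding]
enabled = true

# Three replicas are responsible for each partition.
replication = 3

# Require two replicas before switching to a new version,
# rather than all three.
min_replication = 2
```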

time_to_converge

Type	Default
string	`"10s"`

Upon startup, sequins will wait this long for the set of known peers to stabilize.

proxy_timeout

Type	Default
string	`"100ms"`

This is the total timeout (connect + request) for proxied requests to peers in a sequins cluster. You may want to increase this if you're running on particularly cold storage, or if there are other factors significantly increasing request time.

proxy_stage_timeout

Type	Default
string	see below (eg `"50ms"`)

After this interval, sequins will try another peer concurrently with the first, as long as there are other peers available and the total time is less than proxy_timeout. If left unset, this defaults to proxy_timeout divided by replication, which is enough time for all peers to be tried within the total timeout.
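To make the default concrete: with the default proxy_timeout of "100ms" and the default replication of 2, the derived stage timeout works out to 100ms / 2 = 50ms. The fragment below spells that value out explicitly. It is a sketch of the arithmetic, not a recommended setting.

```toml
[sharding]
enabled = true
replication = 2
proxy_timeout = "100ms"

# Equivalent to leaving this unset with the values above:
# 100ms / 2 replicas = 50ms per stage.
proxy_stage_timeout = "50ms"
```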

cluster_name

Type	Default
string	`"sequins"`

This defines the root prefix to use for zookeeper state. If you are running multiple sequins clusters using the same zookeeper for coordination, you should change this so they can't conflict.

advertised_hostname

Type	Default
string	see below (eg `"sequins1.example.com"`)

This is the hostname sequins uses to advertise itself to peers in a cluster. It should be resolvable by those peers. If left unset, it will be set to the hostname of the server.

shard_id

Type	Default
string	see below (eg `"sequins1"`)

The shard ID is used to determine which partitions the node is responsible for. By default, it is the same as advertised_hostname. Unlike the hostname, however, it doesn't have to be unique; two nodes can have the same shard_id, in which case they will download the same partitions. This can be useful if you don't have stable hostnames, but want to be able to rebuild a server to take the place of a dead or decommissioning one.
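As a sketch of the replacement scenario described above, a rebuilt server with a new hostname could reuse the shard ID of the node it replaces. Both hostnames here are hypothetical.

```toml
[sharding]
enabled = true

# The new machine's own hostname (hypothetical); peers must be able to resolve it.
advertised_hostname = "sequins1-replacement.example.com"

# Reuse the shard ID of the node being replaced, so this node
# downloads the same partitions that node was responsible for.
shard_id = "sequins1"
```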

[zk]

servers

Type	Default
array of string	`["localhost:2181"]`

If set and 'sharding.enabled' is true, sequins will connect to zookeeper at the given addresses.

connect_timeout

Type	Default
string	`"1s"`

This specifies how long to wait while connecting to zookeeper.

session_timeout

Type	Default
string	`"10s"`

This specifies the session timeout to use with zookeeper. The actual timeout is negotiated between server and client, but will never be lower than this number.
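A sharded setup that points at a shared zookeeper ensemble might look like the fragment below. The zookeeper hostnames and the cluster name are hypothetical; the timeouts are the defaults listed above.

```toml
[sharding]
enabled = true

# Use a distinct prefix if several sequins clusters share this zookeeper.
cluster_name = "sequins-staging"

[zk]
servers = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]
connect_timeout = "1s"
session_timeout = "10s"
```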

[datadog]

url

Type	Default
string	`"localhost:8200"`

If set, sequins will send metrics concerning S3 file downloads using the DogStatsD protocol to this address.

[debug]

bind

Type	Default
string	unset (eg `"localhost:6060"`)

If set, binds the golang debug http server, which can serve expvars and profiling information, to the specified address.

expvars

Type	Default
bool	`true`

If set, this adds expvars to the debug HTTP server, including the default ones and a few sequins-specific ones.

pprof

Type	Default
bool	`false`

If set, this adds the default pprof handlers to the debug HTTP server.
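For instance, to expose both expvars and profiling on a local-only address, the section could be written as sketched below. The address is the example value from this page.

```toml
[debug]
# Bind to localhost so the debug server is not reachable from other hosts.
bind = "localhost:6060"
expvars = true
pprof = true
```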
