I am currently working on two great technologies: Puppet and ElasticSearch.
The purpose of this post is to describe a Puppet module which deploys ElasticSearch and selected plugins as a service in Linux or Mac-OS environments.
I am not an expert in these technologies; what I describe here is just a snapshot of what I was able to do at a given point in time (March 2013) on my Puppet & ElasticSearch learning curve...
This kind of work has already been done many times, and now it's my turn!
Limitations
This module has been tested against OpenSuse 12.2 and Mac-OS 10.6.8, ElasticSearch 0.20.5 and the latest version of the ElasticSearch service wrapper as I found it on GitHub around mid-February 2013.
I tested the resulting ElasticSearch installations in the following ways:
- batch indexing on the OpenSuse (master) node (180 million documents distributed over 8 shards)
- replication on Mac-OS nodes (puppetized or not)
- search on all types of covered nodes
- usage of the head plugin
- upgrading / downgrading in place with 0.20.5 and 0.20.6 versions of Elasticsearch
No further warranty: this is still unfinished work!
Context
- The Puppet Master is hosted on a Raspberry Pi model B running Raspbian Wheezy (Puppet: 2.7.18-2, Ruby: 1.8.7). I did not compile Puppet from source; I used what comes with the distribution (the same goes for OpenSuse 12.2). Modules are managed using git, thanks to Git and my Synology Diskstation.
- Puppet (2.7.18) and Facter (1.6.11) on Mac-OS come from the Mac-OS download puppet site. I deliberately did not use the latest versions available in each case: my intention was to keep the puppet agents at the same level as the master. Installing these disk images is not sufficient to launch the puppet agent at Mac-OS start-up; do not forget to read this.
- The version of the puppet agent on OpenSuse 12.2 is 2.7.6-4.1.2. It runs under Ruby 1.9.3-2.2.1. Talking about Ruby 1.9.3 and Puppet, I remember that this post helped me a lot to get my client cert signed by the puppet master.
I will not describe here how to set up the Puppet Master and its agents, as this topic is widely covered by multiple resources on the Internet.
Prerequisite
ElasticSearch and the service wrapper zip binaries must be located under the files/elasticsearch mount point of the Puppet Master (the service wrapper adds multi-platform service functionality to ElasticSearch, thanks to the Tanuki Software wrapper, which to my knowledge is also used (and embedded) in Apache Archiva).
/etc/puppet/fileserver.conf must contain a section resembling:
[files]
  path /etc/puppet/files
  allow *.your.domain
Under the path /etc/puppet/files, there should be a directory for the ElasticSearch module containing the following files (wrapper.zip being a zip archive built using the git archive command and elasticsearch-VERSION.zip being the version of ElasticSearch to be distributed by Puppet):
/etc/puppet/files
└── elasticsearch
    ├── elasticsearch-0.20.5.zip
    └── wrapper.zip
Module parameters
This module is implemented as a single puppet class with the following parameters:
- els_username which defaults to elasticsearch, is the Unix user name used to run the ElasticSearch node.
- els_uid is the UID of els_username.
- els_groupname which defaults to elasticsearch, is the Unix group name of els_username.
- els_gid is the GID of els_groupname.
- els_homedir is the directory under which ElasticSearch will be deployed.
- els_env is the path where the environment file will be created from its template. The resulting file gathers environment variables and system settings. It is sourced by the service wrapper script (see below).
- els_version is the version of ElasticSearch that has to be installed. There must be an existing elasticsearch-<els_version>.zip file under the /etc/puppet/files/elasticsearch directory (see the prerequisite above).
- els_clustername is the ElasticSearch cluster name of the node (all the nodes with the same cluster name belong to the same cluster).
- els_nodename is the ElasticSearch node name.
- els_masters is an array of ElasticSearch masters (see below for an explanation of why I did not use multicasting, which is the default for ElasticSearch).
- els_node2node_port which defaults to 9300, is the port number used by ElasticSearch nodes to communicate with each other.
- els_http_port which defaults to 9200, is the port number used to communicate with the cluster (e.g. REST API, site plugins, ...).
- els_is_master which defaults to false, indicates if the node is a master node.
- els_is_data which defaults to true, indicates if the node is a data node.
- els_nb_of_shards which defaults to 5, is the default number of shards.
- els_nb_of_replicas which defaults to 1, is the default number of replicas for each shard.
- els_heap_size which defaults to 1024, is the heap size of the JVM of the ElasticSearch server. The unit is megabytes, and the usual letters (G, M) must not be used as they generate an error with the service wrapper script.
- els_plugins which defaults to [], is an array of the plugins which must be installed on each node. Each plugin is described by a space-separated string: the first part of the string is the name of the plugin as it is passed to the bin/plugin ElasticSearch shell script for installation, the second part is the name of the directory which is created under the ElasticSearch plugins directory once the installation of the plugin is completed. This trick is used to check whether the plugin has already been installed. Example of such a string: mobz/elasticsearch-head head.
- els_ensure_running which defaults to false, indicates if Puppet should verify that ElasticSearch is running as a service.
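To make the els_plugins format concrete, here is a minimal shell sketch of what happens for each entry: the string is split on its space, and the installation is skipped when the directory already exists. The function name is hypothetical and only serves the illustration; the bin/plugin -install call is the one used by the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): prints the install command
# for a plugin spec, or nothing when the plugin directory already exists.
# $1: ElasticSearch installation directory
# $2: plugin spec "<plugin name> <plugin directory>"
plugin_install_command() {
    es_home=$1
    spec=$2
    p_name=${spec%% *}   # text before the first space
    p_dir=${spec#* }     # text after the first space
    if [ -d "$es_home/plugins/$p_dir" ]; then
        return 0         # already installed: nothing to do
    fi
    printf '%s\n' "$es_home/bin/plugin -install $p_name"
}
```

Called with 'mobz/elasticsearch-head head', it prints the install command only as long as plugins/head is missing, which is the same idea as the creates attribute in the install_plugin define.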
This class is able to update ElasticSearch just by changing the value of the els_version parameter in the node definition on the puppet master, as long as no more than a service restart is required (and obviously after having copied the corresponding binary zip file into the /etc/puppet/files/elasticsearch directory of the puppet master).
Note that plugins can be neither updated nor removed; they can only be installed. It should not be too difficult to make them more dynamic, for instance by adding an action sign before the plugin name (! to force a reinstall, - to remove, ...) and modifying the install_plugin define (see below).
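The action sign idea could look like the following shell sketch. Nothing here exists in the module: the function name, the action names and the output format are assumptions made for the illustration.

```shell
#!/bin/sh
# Hypothetical parser for "<sign><plugin name> <plugin directory>" specs.
# Prints "<action> <plugin name> <plugin directory>".
parse_plugin_spec() {
    spec=$1
    case $spec in
        '!'*) action=reinstall; spec=${spec#?} ;;  # leading "!": force reinstall
        -*)   action=remove;    spec=${spec#?} ;;  # leading "-": remove
        *)    action=install ;;                    # no sign: plain install
    esac
    printf '%s %s %s\n' "$action" "${spec%% *}" "${spec#* }"
}
```

The define would then have to branch on the action, for instance by dropping the creates guard for a reinstall.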
Usage example
node 'mini.your.domain' {
  class { 'elasticsearch' :
    els_uid            => 1963,
    els_gid            => 1963,
    els_heap_size      => 1536,
    els_homedir        => "/Users/elasticsearch",
    els_env            => "/Users/elasticsearch/elasticsearch.env.sh",
    els_version        => "0.20.5",
    els_nodename       => "elasticsearch@$fqdn",
    els_clustername    => "mycluster",
    els_masters        => [ 'mahina.your.domain' ],
    els_nb_of_shards   => 8,
    els_ensure_running => true,
    els_plugins        => [
      'mobz/elasticsearch-head head',
      'elasticsearch/elasticsearch-mapper-attachments/1.6.0 mapper-attachments'
    ],
  }
}
The module's files
The module is made up of an init.pp file and three templates (one for the ElasticSearch configuration, one for installing it as a MacOS service, and the last one for hosting environment variables used by the service wrapper).
manifests/init.pp
This is the main file. It creates a tree that will look like:
homedir
├── downloads
│   ├── elasticsearch-0.20.5.zip
│   └── wrapper.zip
├── elasticsearch-0.20.5
│   ├── bin
│   │   ├── elasticsearch
│   │   ├── elasticsearch.bat
│   │   ├── elasticsearch.in.sh
│   │   ├── plugin
│   │   ├── plugin.bat
│   │   └── service
│   │       ├── elasticsearch
│   │       ├── ...
│   ├── ...
│   ├── config
│   │   ├── elasticsearch.yml
│   ├── lib
│   │   ├── elasticsearch-0.20.5.jar
│   │   ├── ...
│   ├── plugins
│   │   ├── head
│   │   │   └── ...
│   │   └── mapper-attachments
│   │       ├── ...
├── elasticsearch.env.sh
├── elasticsearch_content
│   ├── data
│   ├── log
│   └── piddir
└── elasticsearch_current -> /home/elasticsearch/elasticsearch-0.20.5
First of all, the group and the user under which ElasticSearch will run are created if they are not present on the target node. The same goes for the home directory.
Under the home directory, a downloads directory is created. This directory is used to store the files downloaded from the puppet master fileserver.
An elasticsearch_content directory is also created. It is used to store all the content of a running node (shards, log files, ...), independently of the version of ElasticSearch itself. Under this directory, a piddir directory is created; it is used by the service wrapper to store its status independently of the version of ElasticSearch it is bound to.
Then the ElasticSearch zip file is downloaded and stored in the downloads directory. It is unzipped under the home directory using its native zip name: elasticsearch-<els_version> (e.g.: elasticsearch-0.20.5).
Permissions of the unzipped ElasticSearch directory (and its sub-directories) are fixed thanks to Perl and find. Just a word about that: I think that Perl is a good match with puppet for Unix boxes, as it is portable for simple things, far more portable than sed, for instance, for the same kind of tasks.
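As an illustration of this permission fix, here is a shell-only sketch of the same logic. It replaces the Perl test used in the module with find -perm, so it is an alternative formulation and not the module's code; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: remove group/other write access from every directory of a tree,
# but only when the top directory still has all group/other bits set (77),
# which is the condition tested with Perl in the module.
fix_dir_perms() {
    base=$1
    # -prune restricts find to $base itself; -perm -077 matches when all
    # the group and other permission bits are set.
    if [ -n "$(find "$base" -prune -perm -077)" ]; then
        find "$base" -type d -exec chmod go-w {} \;
    fi
}
```

Once the top directory has been fixed, the test no longer matches, so later runs leave the tree alone.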
A symbolic link named elasticsearch_current is created, pointing to elasticsearch-<els_version>.
Note that prefixing the content and current directories with elasticsearch allows this class to deploy ElasticSearch alongside other middleware under the same home directory using the same deployment pattern (current/content).
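The current/content pattern can be summarized by the shell sketch below: switching version only means re-pointing the link, while the content directory is left untouched. The function name is hypothetical, and the -n option of ln is assumed to be available (it is on GNU and BSD/Mac-OS systems).

```shell
#!/bin/sh
# Sketch: point <homedir>/elasticsearch_current at <homedir>/elasticsearch-<version>.
# -n replaces the existing link itself instead of following it into the old version.
switch_current() {
    homedir=$1
    version=$2
    ln -sfn "$homedir/elasticsearch-$version" "$homedir/elasticsearch_current"
}
```

For example, switch_current /home/elasticsearch 0.20.6 followed by a service restart is roughly what an upgrade (or a downgrade) boils down to.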
Now that an ElasticSearch version is available, the specified plugins are installed using the ElasticSearch plugin script (see the define install_plugin).
The ElasticSearch configuration file (elasticsearch.yml) is then created using the Puppet templating mechanism.
Next, the service wrapper is downloaded from the Puppet master, unzipped at the right place (under the bin directory of the ElasticSearch installation), and the service file is patched (thanks to Perl) to include the environment file that will be created at the next step, and to set the proper values for PIDDIR and ES_HOME. These two variables must not be left at their default values, so that ElasticSearch restarts properly when a new version is puppet-installed.
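The first of these patches (sourcing the environment file from the second line of the service script) can be sketched as follows. This version uses awk and a temporary file where the module uses perl -pi, so it is an alternative written for the illustration; the function name is hypothetical, and the marker comment is the one used by the module.

```shell
#!/bin/sh
# Sketch: insert ". <env file> # KILROY WAS HERE" as line 2 of the wrapper
# script, unless a line ending with the marker is already present.
source_env_in_wrapper() {
    script=$1
    env_file=$2
    marker='# KILROY WAS HERE'
    if grep -q "$marker\$" "$script"; then
        return 0    # already patched
    fi
    patched=$(mktemp) || return 1
    # Print the new line just before the original line 2, then every line.
    awk -v line=". $env_file $marker" 'NR == 2 { print line } { print }' "$script" > "$patched" &&
        cat "$patched" > "$script"    # cat keeps the mode and owner of the script
    status=$?
    rm -f "$patched"
    return $status
}
```

The marker makes the patch safe to run several times: the second call finds it and returns without touching the file.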
The final step is dedicated to the service installation. Depending on the kind of target node, a plist file is installed (MacOS) or a link is set under /etc/init.d (Linux).
In both cases, the service is then enabled by Puppet.
class elasticsearch (
  $els_username       = 'elasticsearch',
  $els_uid,
  $els_groupname      = 'elasticsearch',
  $els_gid,
  $els_homedir,
  $els_env,
  $els_version,
  $els_clustername,
  $els_nodename,
  $els_masters,
  $els_node2node_port = 9300,
  $els_http_port      = 9200,
  $els_is_master      = false,
  $els_is_data        = true,
  $els_nb_of_shards   = 5,
  $els_nb_of_replicas = 1,
  $els_heap_size      = 1024,
  $els_plugins        = [],
  $els_ensure_running = false
) {

  # Paths derived from the home directory and the version
  $els_name="elasticsearch-${els_version}"
  $els_base="$els_homedir/$els_name"
  $els_current="$els_homedir/elasticsearch_current"
  $els_content="$els_homedir/elasticsearch_content"
  $els_downloads="$els_homedir/downloads"
  $els_datadir="$els_content/data"
  $els_workdir="$els_content/work"
  $els_logdir="$els_content/log"
  $els_piddir="$els_content/piddir"
  $els_wrapper_script="$els_current/bin/service/elasticsearch"

  # Service to notify when something changes (its name depends on the platform)
  if ($operatingsystem == "Darwin") {
    $els_notify="Service[org.tanukisoftware.wrapper.elasticsearch]"
  } else {
    $els_notify="Service[elasticsearch]"
  }

  # Resource defaults for this class
  File {
    owner => $els_username,
    group => $els_groupname,
    mode  => '0644',
  }

  Exec {
    user    => $els_username,
    group   => $els_groupname,
    cwd     => $els_homedir,
    path    => "/usr/bin/:/bin",
    timeout => 900,
  }

  # $title is "<plugin name> <plugin directory>"
  define install_plugin {
    $p_array=split($title,' ')
    $p_name=$p_array[0]
    $p_dir=$p_array[1]
    exec { "install elasticsearch plugin $p_name":
      command   => "$els_current/bin/plugin -install $p_name",
      logoutput => true,
      creates   => "$els_current/plugins/$p_dir",
    }
  }

  group { "$els_groupname" :
    ensure => present,
    name   => $els_groupname,
    gid    => $els_gid,
  }

  user { "$els_username" :
    require => Group[$els_groupname],
    ensure  => present,
    name    => $els_username,
    uid     => $els_uid,
    gid     => $els_groupname,
    shell   => '/bin/bash',
    home    => $els_homedir,
    comment => 'ElasticSearch User',
  }

  file { "$els_homedir" :
    require => User[$els_username],
    ensure  => directory,
    mode    => '0755',
  }

  file { "$els_content" :
    require => File[$els_homedir],
    ensure  => directory,
    mode    => '0755',
  }

  file { "$els_piddir" :
    require => File[$els_content],
    ensure  => directory,
    mode    => '0755',
  }

  file { "$els_downloads" :
    require => File[$els_homedir],
    ensure  => directory,
    mode    => '0755',
  }

  file { "$els_downloads/$els_name.zip" :
    require => File["$els_downloads"],
    source  => "puppet:///files/elasticsearch/$els_name.zip",
  }

  exec { 'unzip elasticsearch' :
    require => File["$els_downloads/$els_name.zip"],
    command => "unzip $els_downloads/$els_name.zip",
    creates => "$els_base",
  }

  # Perms in zip file are too wide (777).
  # The Perl test exits with 0 (so the fix runs) only while the group/other bits are still 77.
  exec { 'fix directories perms elasticsearch' :
    require => Exec['unzip elasticsearch'],
    command => "find $els_base -type d -exec chmod go-w {} \;",
    onlyif  => "perl -e 'exit(sprintf(\"%o\", (stat(\"$els_base\"))[2]&00077) ne \"77\")'",
  }

  file { "$els_current" :
    require => Exec['unzip elasticsearch'],
    ensure  => link,
    target  => "$els_base",
    notify  => $els_notify,
  }

  install_plugin { $els_plugins :
    require => File["$els_current"],
    notify  => $els_notify,
  }

  file { "$els_current/config/elasticsearch.yml" :
    require => File["$els_current"],
    content => template("elasticsearch/elasticsearch.yml"),
    notify  => $els_notify,
  }

  file { "$els_downloads/wrapper.zip" :
    require => File["$els_downloads"],
    source  => "puppet:///files/elasticsearch/wrapper.zip"
  }

  exec { 'install elasticsearch wrapper' :
    require => [ File["$els_current"], File["$els_downloads/wrapper.zip"] ],
    cwd     => "$els_current/bin",
    command => "unzip '$els_downloads/wrapper.zip'",
    creates => "$els_current/bin/service",
  }

  # Source the environment file from line 2 of the service script.
  # The marker comment prevents the line from being inserted twice
  # (grep -v would succeed on any script having other lines, so grep -q with unless is used).
  exec { 'source env in elasticsearch wrapper':
    require => [ Exec['install elasticsearch wrapper'], File["$els_env"] ],
    command => "perl -pi.bak -e 'print \". $els_env # KILROY WAS HERE\n\" if $. == 2' $els_current/bin/service/elasticsearch",
    unless  => "grep -q '# KILROY WAS HERE$' $els_current/bin/service/elasticsearch",
    creates => "$els_current/bin/service/elasticsearch.bak",
  }

  # Set PIDDIR and ES_HOME so that they do not depend on the installed version
  exec { 'fix elasticsearch wrapper':
    require => Exec['source env in elasticsearch wrapper'],
    command => "perl -pi.fix -e 's|^PIDDIR=\".\"$|PIDDIR=$els_piddir|; s|^export ES_HOME=.*$|export ES_HOME=$els_current|' $els_current/bin/service/elasticsearch",
    creates => "$els_current/bin/service/elasticsearch.fix",
  }

  file { "$els_env" :
    require => [ File["$els_homedir"], User["$els_username"] ],
    content => template("elasticsearch/env"),
    notify  => $els_notify,
  }

  if ($operatingsystem == 'Darwin') {
    # MacOS: launchd service described by a plist file
    file { '/Library/LaunchDaemons/org.tanukisoftware.wrapper.elasticsearch.plist':
      require => Exec['fix elasticsearch wrapper'],
      content => template("elasticsearch/elasticsearch.plist"),
      owner   => root,
      group   => wheel,
      mode    => '0644',
    }
    service { 'org.tanukisoftware.wrapper.elasticsearch' :
      require => File['/Library/LaunchDaemons/org.tanukisoftware.wrapper.elasticsearch.plist'],
      enable  => true,
      ensure  => $els_ensure_running,
    }
  } else {
    # Linux: the wrapper script is linked as an init.d script
    file { '/etc/init.d/elasticsearch':
      require => Exec['fix elasticsearch wrapper'],
      ensure  => link,
      owner   => root,
      group   => root,
      target  => "$els_current/bin/service/elasticsearch",
    }
    service { 'elasticsearch' :
      require => File['/etc/init.d/elasticsearch'],
      name    => "elasticsearch",
      enable  => true,
      ensure  => $els_ensure_running,
    }
  }
}
templates/elasticsearch.yml
This is the configuration file used for all the nodes of the cluster, whether they are master or data nodes.
It is rather simple and probably far from what is required in a production environment.
Just a word about it: I am not using multicasting to discover master nodes (see the <% if !els_is_master %> section below) because of my home network peculiarities and what I want to do with ElasticSearch at the present time. I guess that this is not a good practice, and this is one more good reason to adapt this file to your needs.
cluster.name: "<%= els_clustername %>"
<% if @els_nodename %>
node.name: "<%= els_nodename %>"
<% end %>
node.master: <%= els_is_master %>
node.data: <%= els_is_data %>
index.number_of_shards: <%= els_nb_of_shards %>
index.number_of_replicas: <%= els_nb_of_replicas %>
path.data: <%= els_datadir %>
path.work: <%= els_workdir %>
path.logs: <%= els_logdir %>
transport.tcp.port: <%= els_node2node_port %>
http.port: <%= els_http_port %>
<% if !els_is_master %>
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [<% els_masters.each do |host| -%> <%= host -%>, <%end -%> ]
<% end %>
templates/env
This file is sourced by the service wrapper (see how above in the init.pp file). The first three export lines are used by the wrapper. The two following lines may be used by ElasticSearch clients (such as scripts made up of curl requests or Java programs) to access the cluster.
The purpose of the last line is to give the indexing nodes enough resources to index a big dataset (in my case a corpus of 1.3 billion words). I guess that it should be turned into a class parameter, perhaps conditioned on the node type (master node, data node only, ...).
export RUN_AS_USER=<%= els_username %>
export ES_HOME=<%= els_current %>
export ES_HEAP_SIZE=<%= els_heap_size %>
export ES_HTTP_PORT=<%= els_http_port %>
export ES_NODE2NODE_PORT=<%= els_node2node_port %>
# Number of files that can be opened simultaneously
ulimit -n 4096
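As an example of a client script relying on this file, here is a small sketch built on ES_HTTP_PORT. The function names and the localhost host name are assumptions made for the illustration; the cluster health request needs curl and a running node.

```shell
#!/bin/sh
# Sketch of a client helper using the variables exported by the env file.
# Assumption: the node is reachable on localhost; the port falls back to 9200
# when ES_HTTP_PORT is not set.
es_url() {
    printf 'http://localhost:%s%s\n' "${ES_HTTP_PORT:-9200}" "$1"
}

# Ask the node for the cluster health (requires curl and a running node).
es_cluster_health() {
    curl -s "$(es_url '/_cluster/health?pretty=true')"
}
```

A script would first source the environment file (for instance . /Users/elasticsearch/elasticsearch.env.sh) and then call es_cluster_health. Note that sourcing the file also applies its ulimit line to the calling shell.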
templates/elasticsearch.plist
This file is specific to MacOS; it is used to install ElasticSearch as a service.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>org.tanukisoftware.wrapper.elasticsearch</string>
<key>ProgramArguments</key>
<array>
<string><%= els_wrapper_script %></string>
<string>launchdinternal</string>
</array>
<key>OnDemand</key>
<true/>
<key>RunAtLoad</key>
<true/>
<key>UserName</key>
<string><%= els_username %></string>
</dict>
</plist>
Resources
Internet resources put aside (among them this one), I particularly appreciated the content of these three books, each of them having its own interest.