DCOS-Training

Notes for training on DC/OS


00 A review of how we got here

01 DC/OS - Theory 01 - Basic Concepts

Key terms to keep in mind:

DC/OS is a distributed OS for the data center built on top of Apache Mesos.

Apache Mesos is a distributed systems kernel that manages cluster resources and tasks.

DC/OS builds on top of Mesos and uses a lot of similar terminology, so it helps to consider each term in the context of both DC/OS and Mesos.

Nodes

A DC/OS node is a virtual or physical machine on which a Mesos agent and/or Mesos master process runs. DC/OS nodes are networked together to form a DC/OS cluster.

Master Node

A DC/OS master node is a virtual or physical machine that runs a collection of DC/OS components that work together to manage the rest of the cluster.

Agent Node

A DC/OS agent node is a virtual or physical machine on which Mesos tasks are run.

Private Agent Node

A private agent node is an agent node that is on a network that does not allow entry or ‘ingress’ from outside of the cluster via the cluster’s infrastructure networking.

Public Agent Node

A public agent node is an agent node that is on a network that does allow entry or ‘ingress’ from outside of the cluster via the cluster’s infrastructure networking.

Host Operating System

A host operating system is the operating system that runs on each DC/OS node underneath the DC/OS components, manages the local hardware and software resources, and provides common services for running other programs and services.

Bootstrap Machine

A bootstrap machine is the machine on which the DC/OS installer artifacts are configured, built, and distributed.

Task

A Mesos task is an abstract unit of work, whose lifecycle is managed by a Mesos executor, that runs on a DC/OS agent node.

Executor

A Mesos executor is a method by which Mesos agents launch tasks.

Scheduler

A Mesos scheduler is a program that defines new Mesos tasks and assigns resources to them (placing them on specific nodes).

Framework

A Mesos framework consists of a scheduler, tasks, and optionally custom executors.

Resource Offer

A Mesos resource offer provides a set of unallocated resources (e.g. cpu, disk, memory) from an agent to a scheduler so that the scheduler may allocate those resources to one or more tasks. Resource offers are constructed by the leading Mesos master, but the resources themselves are reported by the individual agents.

Ref: Application Framework development guide.

02 DC/OS - Theory 02 - Architecture and Components

A quick look at architecture

The DC/OS architecture can be divided into three layers:

[Figure: DC/OS architecture layers]

Software Layer

Package management and repository for multiple types of services: databases, message queues, stream processors, artifact repositories, monitoring solutions, continuous integration tools, source control management, log aggregators, etc.

Platform Layer

The platform layer takes care of cluster management, container orchestration, networking, logging and metrics, and package management, using components we’ll talk about in a bit.

These components are divided across multiple node types: master nodes, private agent nodes, and public agent nodes.

Infrastructure Layer

At the infrastructure layer, DC/OS can be installed on public clouds, private clouds, or on-premises hardware. Some of these install targets have automated provisioning tools, but almost any infrastructure can be used, as long as it includes multiple x86 machines on a shared IPv4 network.

Additional Components

Also called external components, these include the CLI and GUI that let you interact with the cluster.

Components:

Most DC/OS components run as systemd services on the DC/OS nodes.

To see a list of the systemd components running on any node, list the contents of the /etc/systemd/system/dcos.target.wants/ directory or execute systemctl | grep dcos- to see their current status.

This should list all the components on that particular node of your DC/OS installation.
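
For example, on any node of the cluster:

    # list the DC/OS component units installed on this node
    ls /etc/systemd/system/dcos.target.wants/

    # show the current status of every DC/OS component service
    systemctl | grep dcos-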

Here’s a bird’s eye view of the key components:

[Figure: Bird's eye view of the DC/OS components]

DC/OS components are installed, upgraded, and managed by DC/OS Component Package Manager (Pkgpanda - Package Panda - a package manager for systemd units).

Check out the packages directory of the DC/OS source repository to see what gets installed when we vagrant up our cluster.
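
For instance, something along the following lines (assuming git is installed locally) lets you browse those package definitions on your own machine:

    # fetch the DC/OS source and list the component package definitions
    git clone https://github.com/dcos/dcos.git
    ls dcos/packages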

Let’s take a closer look at some of the components; we’ll be working with a few of them in detail in the labs that follow:

Cluster Management via Mesos Master, Exhibitor and Zookeeper

Mesos

manages resources and tasks as a distributed systems kernel. Mesos Master exposes scheduler, executor, and operator interfaces to facilitate cluster management. Mesos Agent manages individual executors, tasks, and resources on each DC/OS agent node. Mesos Agent Public is a Mesos Agent configured to run on DC/OS public agent nodes.

System Services:
Zookeeper

provides consistent, highly available, distributed key-value storage for configuration, synchronization, name registration, and cluster state storage.

Exhibitor

supervises Zookeeper and provides a management web interface.

System Service:

Container orchestration

The continuous, automated scheduling, coordination, and management of containerized processes and the resources they consume.

Marathon

orchestrates long-running containerized services (apps and pods).

System Service:

dcos-marathon.service

DC/OS Jobs or Metronome

orchestrates short-lived, scheduled or immediate, containerized jobs.

System Service:

dcos-metronome.service

Logging and Metrics

DC/OS Diagnostics (3DT)

aggregates and exposes component health. DC/OS Diagnostics is also known as DC/OS Distributed Diagnostics Tool (3DT).

Services:

DC/OS Log

exposes node, component, and container (task) logs.

Services:

dcos-log-master.service dcos-log-master.socket dcos-log-agent.service dcos-log-agent.socket

Logrotate

manages rotation, compression, and deletion of historical log files.

Services:

dcos-logrotate-master.service dcos-logrotate-master.timer dcos-logrotate-agent.service dcos-logrotate-agent.timer

DC/OS Metrics service

exposes node, container, and application metrics.

Services:

dcos-metrics-master.service dcos-metrics-master.socket dcos-metrics-agent.service dcos-metrics-agent.socket

DC/OS Signal service

reports cluster telemetry and analytics to help improve DC/OS. Administrators can opt-out of telemetry at install time.

Services:

dcos-signal.service dcos-signal.timer

DC/OS History service

caches and exposes historical system state to facilitate cluster usage statistics in the GUI.

Service:

dcos-history.service

Networking

Admin Router

exposes a unified control plane proxy for components and services using NGINX. Admin Router Agent proxies node-specific health, logs, metrics, and package management internal endpoints.

Services:
Mesos DNS

provides domain name based service discovery within the cluster.

Services:
DNS Forwarder (Spartan)

forwards DNS requests to multiple DNS servers. Spartan Watchdog restarts Spartan when it is unhealthy.

Services:
Generate resolv.conf

configures network name resolution by updating /etc/resolv.conf to facilitate DC/OS’s software defined networking.

Services:
Minuteman

provides distributed Layer 4 virtual IP load balancing.

Services:

DC/OS Overlay

orchestrates virtual overlay networks using VXLAN.

Services:
Erlang Port Mapping Daemon (EPMD)

facilitates communication between distributed Erlang programs.

Services:

Package Management

DC/OS Package Manager (Cosmos)

installs and manages DC/OS packages from DC/OS package repositories, such as the Mesosphere Universe.

Services:
DC/OS Component Package Manager (Pkgpanda)

installs and manages DC/OS components.

Services:

03 Hands on: SSH into our cluster

To install PuTTY, PuTTYgen, and Pageant, download the Windows installer from the official PuTTY download page.

Convert the .pem file type to .ppk by using PuTTYgen:

  1. Open PuTTYgen, select File > Load Private Key, and choose your .pem file.

  2. Select SSH-2 RSA as the key type, click Save private key, then choose the name and location to save your new .ppk key.

Windows

  1. Close PuTTYgen.

  2. SSH into the cluster.

    • To SSH to a master node:

      1. From the DC/OS web interface, copy the IP address of the master node. It will be the IP address that you used to connect to the GUI.

      2. Open PuTTY and enter the master node IP address in the Host Name (or IP address) field.

        [Screenshot: PuTTY Configuration]

      3. In the Category pane on the left side of the PuTTY window, choose Connection > SSH > Auth, click Browse, locate and select your .ppk file, then click Open.

        [Screenshot: PuTTY SSH options]

      4. Log in as user “core” if you’re running CoreOS. The default user on CentOS is “centos”.

        [Screenshot: Windows login]

    • To SSH to an agent node

      Prerequisite: You must be logged out of your master node.

      1. Enable agent forwarding in PuTTY.

        Caution: SSH agent forwarding has security implications. Only add servers that you trust and that you intend to use with agent forwarding. For more information, see Using SSH agent forwarding.

        1. Open PuTTY. In the Category pane on the left side of the PuTTY window, choose Connection > SSH > Auth and check the Allow agent forwarding box.

        2. Click the Browse button and locate the .ppk file that you created previously using PuTTYgen.

          [Screenshot: Windows agent forwarding]

      2. Add the .ppk file to Pageant.

        1. Open Pageant. If the Pageant window does not appear, look for the Pageant icon in the notification area in the lower right area of the screen next to the clock and double-click it to open Pageant’s main window.

        2. Click the Add Key button.

        3. Locate the .ppk file that you created using PuTTYgen and click Open to add your key to Pageant.

          [Screenshot: Windows Pageant]

        4. Click the Close button to close the Pageant window.

      3. SSH into the master node.

        1. From the DC/OS web interface, copy the IP address of the master node. The IP address is displayed beneath your cluster name.

        2. In the Category pane on the left side of the PuTTY window, choose Session and enter the master node IP address in the Host Name (or IP address) field.

        3. Log in as user “core” if you’re running CoreOS. The default user on CentOS is “centos”.

          [Screenshot: Windows login]

      4. From the master node, SSH into the agent node.

        1. From the Mesos web interface, copy the agent node hostname. You can find hostnames on the Frameworks (<master-node-IPaddress>/mesos/#/frameworks) or Slaves page (<master-node-IPaddress>/mesos/#/slaves).

        2. SSH into the agent node as the user core with the agent node hostname specified:

          ssh core@<agent-node-hostname>
          

04 Hands on: Service Discovery and Load Balancing

Vagrant up your cluster please. :)

Service Discovery is the process of finding the IP:PORT combination at which a service is available in a cluster of machines. The Mesos Master is the ultimate source of truth for this information, so any Service Discovery (SD) mechanism needs to consult it to learn the mapping of a Mesos task (that is, a service instance) to an IP and port.

Let’s deploy a simple service using a named Virtual IP (VIP); then we will scale it and observe the automagic load balancing, and last but not least we will make the service publicly accessible.

Setup

The simple service we will be using is written in Python and exposes only one endpoint, /endpoint0.

This endpoint responds with something like:

    {
        "host": "localhost:9876",
        "result": "all is well",
        "version": "0.3.0"
    }

The host field shows the host and port at which the service was reached, and the version field states the version of the service itself. Naturally, this is only a dummy service that does nothing useful, but it exposes the basic features needed for the following steps.

Invoking the service within the cluster via a named VIP

The containerized version of the simple service is a Docker image available via mhausenblas/simpleservice. This can be launched as a DC/OS service using the following specification:

    {
      "id": "/sise",
      "instances": 1,
      "cpus": 0.1,
      "mem": 32,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "mhausenblas/simpleservice:0.3.0",
          "network": "HOST",
          "forcePullImage": true
        }
      },
      "portDefinitions": [
        {
          "labels": {
            "VIP_0": "/sise:80"
          }
        }
      ]
    }

Let’s assume you’ve saved the above specification in a file called example-service.json; then you can launch the service using the DC/OS CLI:

    $ dcos marathon app add example-service.json
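
Before invoking the service you can check that the app has deployed and its task is running; for example (exact output will differ per cluster):

    # show the app definition and current instance count
    $ dcos marathon app show /sise

    # list the running Marathon task(s) matching "sise"
    $ dcos task sise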

From within the cluster (for example, after SSHing into a master node), the simple service can be invoked like so:

    core@ip-10-0-5-223 ~ $ curl sise.marathon.l4lb.thisdcos.directory:80/endpoint0
    {"host": "sise.marathon.l4lb.thisdcos.directory", "version": "0.3.0", "result": "all is well"}

To maintain uniqueness, the following URL pattern is used:

<vip-name>.<scheduler>.l4lb.thisdcos.directory:<vip-port>

Also, no matter on which host and port the service actually runs, you can invoke it from anywhere within the cluster using sise.marathon.l4lb.thisdcos.directory:80.

To make the service listen on the random port assigned by DC/OS, the special environment variable PORT0, which DC/OS automagically makes available in the task environment, is passed to the service.
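
As an illustration only (the simpleservice image already handles this internally), a command-style task could pick up PORT0 along these lines, with python3 -m http.server standing in as a placeholder server:

    # PORT0 is injected by DC/OS/Marathon for the first port definition of the task
    echo "serving on port ${PORT0}"
    exec python3 -m http.server "${PORT0}"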

You can now see the result of the simple service being invoked from the DC/OS CLI:

    $ dcos task log --follow sise stderr
    I0923 14:12:46.683511 22454 exec.cpp:161] Version: 1.0.1
    I0923 14:12:46.687604 22463 exec.cpp:236] Executor registered on agent a9d74521-cf9f-479a-8b6e-f10ae1b82346-S0
    I0923 14:12:46.688691 22459 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-09-23T14:12:42.351Z -e HOST=10.0.0.114 -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=mhausenblas/simpleservice:0.3.0 -e PORT_10000=31747 -e MESOS_TASK_ID=sise.cac98ccf-8197-11e6-8b52-26ceb6df775b -e PORT=31747 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=31747 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/sise -e PORT0=31747 -e LIBPROCESS_IP=10.0.0.114 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a9d74521-cf9f-479a-8b6e-f10ae1b82346-S0.f19b3376-e999-41cd-9f00-12b9f7a01ae9 -v /var/lib/mesos/slave/slaves/a9d74521-cf9f-479a-8b6e-f10ae1b82346-S0/frameworks/a9d74521-cf9f-479a-8b6e-f10ae1b82346-0000/executors/sise.cac98ccf-8197-11e6-8b52-26ceb6df775b/runs/f19b3376-e999-41cd-9f00-12b9f7a01ae9:/mnt/mesos/sandbox --net host --name mesos-a9d74521-cf9f-479a-8b6e-f10ae1b82346-S0.f19b3376-e999-41cd-9f00-12b9f7a01ae9 mhausenblas/simpleservice:0.3.0
    2016-09-23T02:14:43 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:14:43 INFO 200 GET /endpoint0 (10.0.5.223) 1.08ms [at line 1946]
    2016-09-23T02:15:00 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:15:00 INFO 200 GET /endpoint0 (10.0.5.223) 0.66ms [at line 1946]

Scaling the service

So far we have had one instance of the simple service running. Next, assume we want to serve more load and hence scale the simple service up to 3 instances.
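
One way to do that from the CLI (you can also change the instance count in the DC/OS GUI):

    # scale the /sise app to 3 instances
    $ dcos marathon app update /sise instances=3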

When we now look at the logs again, we see the automatic load balancing across the three tasks (these are example hashes; your task IDs will differ) - sise.5ca09a40-8198-11e6-8b52-26ceb6df775b, sise.5ca0c151-8198-11e6-8b52-26ceb6df775b, and sise.cac98ccf-8197-11e6-8b52-26ceb6df775b - taking place:

    $ dcos task log --follow sise stderr
    ===> task:sise.5ca09a40-8198-11e6-8b52-26ceb6df775b:stderr <===
    I0923 14:16:48.579545   448 exec.cpp:236] Executor registered on agent a9d74521-cf9f-479a-8b6e-f10ae1b82346-S2
    I0923 14:16:48.580734   447 docker.cpp:815] Running docker -H unix:///var/run/docker.sock run --cpu-shares 102 --memory 33554432 -e MARATHON_APP_VERSION=2016-09-23T14:16:47.027Z -e HOST=10.0.0.115 -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_DOCKER_IMAGE=mhausenblas/simpleservice:0.3.0 -e PORT_10000=29727 -e MESOS_TASK_ID=sise.5ca09a40-8198-11e6-8b52-26ceb6df775b -e PORT=29727 -e MARATHON_APP_RESOURCE_MEM=32.0 -e PORTS=29727 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/sise -e PORT0=29727 -e LIBPROCESS_IP=10.0.0.115 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-a9d74521-cf9f-479a-8b6e-f10ae1b82346-S2.07e049b6-201c-47dc-9228-fad4048e387d -v /var/lib/mesos/slave/slaves/a9d74521-cf9f-479a-8b6e-f10ae1b82346-S2/frameworks/a9d74521-cf9f-479a-8b6e-f10ae1b82346-0000/executors/sise.5ca09a40-8198-11e6-8b52-26ceb6df775b/runs/07e049b6-201c-47dc-9228-fad4048e387d:/mnt/mesos/sandbox --net host --name mesos-a9d74521-cf9f-479a-8b6e-f10ae1b82346-S2.07e049b6-201c-47dc-9228-fad4048e387d mhausenblas/simpleservice:0.3.0
    2016-09-23T02:17:48 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:17:48 INFO 200 GET /endpoint0 (10.0.5.223) 1.12ms [at line 1946]
    2016-09-23T02:19:18 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:19:18 INFO 200 GET /endpoint0 (10.0.5.223) 0.61ms [at line 1946]
    2016-09-23T02:19:20 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:19:20 INFO 200 GET /endpoint0 (10.0.5.223) 0.56ms [at line 1946]
    2016-09-23T02:25:07 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:07 INFO 200 GET /endpoint0 (10.0.5.223) 0.65ms [at line 1946]
    ===> task:sise.5ca0c151-8198-11e6-8b52-26ceb6df775b:stderr <===
    2016-09-23T02:19:22 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:19:22 INFO 200 GET /endpoint0 (10.0.5.223) 0.87ms [at line 1946]
    2016-09-23T02:24:56 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:24:56 INFO 200 GET /endpoint0 (10.0.5.223) 0.67ms [at line 1946]
    2016-09-23T02:25:00 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:00 INFO 200 GET /endpoint0 (10.0.5.223) 0.65ms [at line 1946]
    2016-09-23T02:25:01 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:01 INFO 200 GET /endpoint0 (10.0.5.223) 0.68ms [at line 1946]
    2016-09-23T02:25:06 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:06 INFO 200 GET /endpoint0 (10.0.5.223) 0.65ms [at line 1946]
    ===> task:sise.cac98ccf-8197-11e6-8b52-26ceb6df775b:stderr <===
    2016-09-23T02:19:17 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:19:17 INFO 200 GET /endpoint0 (10.0.5.223) 0.59ms [at line 1946]
    2016-09-23T02:25:02 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:02 INFO 200 GET /endpoint0 (10.0.5.223) 0.60ms [at line 1946]
    2016-09-23T02:25:03 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:03 INFO 200 GET /endpoint0 (10.0.5.223) 0.63ms [at line 1946]
    2016-09-23T02:25:05 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:05 INFO 200 GET /endpoint0 (10.0.5.223) 0.58ms [at line 1946]
    2016-09-23T02:25:30 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:30 INFO 200 GET /endpoint0 (10.0.5.223) 0.59ms [at line 1946]
    ===> task:sise.5ca09a40-8198-11e6-8b52-26ceb6df775b:stderr <===
    2016-09-23T02:25:33 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:33 INFO 200 GET /endpoint0 (10.0.5.223) 0.63ms [at line 1946]
    2016-09-23T02:25:35 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:35 INFO 200 GET /endpoint0 (10.0.5.223) 0.62ms [at line 1946]
    ===> task:sise.5ca0c151-8198-11e6-8b52-26ceb6df775b:stderr <===
    2016-09-23T02:25:38 INFO /endpoint0 has been invoked from 10.0.5.223 [at line 35]
    2016-09-23T02:25:38 INFO 200 GET /endpoint0 (10.0.5.223) 0.86ms [at line 1946]

Exposing the service to the Web using Marathon-LB

So far we were able to access the simple service from within the cluster, for example, from another service or manually as we did above using curl. Now we want to make the service available to the public.

To expose the simple service to the Web, we use Marathon-LB.

If needed, install marathon-lb from the GUI or CLI:

    $ dcos package install marathon-lb

To expose the simple service you only need to add two labels, HAPROXY_GROUP and HAPROXY_0_VHOST (set to the public agent host), and define a service port, 10010 in our case:

    {
      "id": "/sise",
      "instances": 3,
      "cpus": 0.1,
      "mem": 32,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "mhausenblas/simpleservice:0.3.0",
          "network": "HOST",
          "forcePullImage": true
        }
      },
      "labels": {
        "HAPROXY_GROUP": "external",
        "HAPROXY_0_VHOST": "http://ec2-52-42-2-149.us-west-2.compute.amazonaws.com"
      },
      "portDefinitions": [{
        "port": 10010,
        "labels": {
          "VIP_0": "/sise:80"
        }
      }]
    }

Now we can deploy the simple service again (if the earlier version of /sise is still running, remove it first with `dcos marathon app remove /sise`):

    $ dcos marathon app add example-service-external.json

And list all of the tasks that currently run (we expect MLB and three instances of our simple service):

    $ dcos task
    NAME         HOST        USER  STATE  ID
    marathon-lb  10.0.4.224  root    R    marathon-lb.dd8c87c5-819a-11e6-8b52-26ceb6df775b
    sise         10.0.0.114  root    R    sise.1417e3b9-819c-11e6-8b52-26ceb6df775b
    sise         10.0.0.114  root    R    sise.d45af1db-819c-11e6-8b52-26ceb6df775b
    sise         10.0.0.115  root    R    sise.d45acaca-819c-11e6-8b52-26ceb6df775b

Marathon-LB (MLB) uses HAProxy on a public node to expose services to the public.
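
If you want to peek at what HAProxy is doing, Marathon-LB serves a statistics page on the public agent, by default on port 9090; treat the exact port and path as assumptions for your setup:

    # HAProxy statistics page exposed by Marathon-LB on the public agent
    $ curl "http://ec2-52-42-2-149.us-west-2.compute.amazonaws.com:9090/haproxy?stats"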

As a result of using MLB we can now invoke our simple service from anywhere on the Internet:

    $ http http://ec2-52-42-2-149.us-west-2.compute.amazonaws.com:10010/endpoint0
    HTTP/1.1 200 OK
    Content-Length: 110
    Content-Type: application/json
    Date: Fri, 23 Sep 2016 14:46:54 GMT
    Etag: "d9ba10b4a90ac72d4578dab377b010ccfaaaaef5"
    Server: TornadoServer/4.3
    
    {
        "host": "ec2-52-42-2-149.us-west-2.compute.amazonaws.com:10010",
        "result": "all is well",
        "version": "0.3.0"
    }

Summing up, we exposed and invoked a simple service using VIPs and MLB; only a few configuration settings were necessary, with most of the heavy lifting taken care of by DC/OS.

05 Hands on: Installation revisited

Run `vagrant destroy -f` on everything before moving on.

Create a new folder (leave your earlier cluster alone for this one).

We’ll follow the basic steps again, just to kickstart things:

* Install the vagrant hostmanager plugin: 

```shell
vagrant plugin install vagrant-hostmanager
```

* Now clone the Git repo into your folder:

```shell
git clone https://github.com/dcos/dcos-vagrant
```

* Move into the new directory:

```shell
cd dcos-vagrant
```

This installation method uses a bootstrap node to administer the DC/OS installation across your cluster. The bootstrap node uses an SSH key to connect to each node in your cluster to automate the DC/OS installation.

The DC/OS installation creates these folders:

| Folder | Description |
| --- | --- |
| `/opt/mesosphere` | Contains all the DC/OS binaries, libraries, and cluster configuration. Do not modify. |
| `/etc/systemd/system/dcos.target.wants` | Contains the systemd services that start the components that make up DC/OS. They must live outside of `/opt/mesosphere` because of systemd constraints. |
| `/etc/systemd/system/dcos.<units>` | Contains copies of the units in `/etc/systemd/system/dcos.target.wants`. They must be at the top folder as well as inside `dcos.target.wants`. |
| `/var/lib/dcos/exhibitor/zookeeper` | Contains the ZooKeeper data. |
| `/var/lib/docker` | Contains the Docker data. |
| `/var/lib/dcos` | Contains the DC/OS data. |
| `/var/lib/mesos` | Contains the Mesos data. |

Install DC/OS

  1. Download the DC/OS installer to your bootstrap node:

     curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh

  2. From your terminal, start the DC/OS GUI installer with this command:

     sudo bash dcos_generate_config.sh --web

Here is an example of the output:

    Running mesosphere/dcos-genconf docker with BUILD_DIR set to /home/centos/genconf
    16:36:09 dcos_installer.action_lib.prettyprint:: ====> Starting DC/OS installer in web mode
    16:36:09 root:: Starting server ('0.0.0.0', 9000)

You can add the verbose (-v) flag to see debug output:

    sudo bash dcos_generate_config.sh --web -v
  3. Launch the DC/OS web installer in your browser at: http://<bootstrap-node-public-ip>:9000.

  4. Click Begin Installation.

  5. Specify your Deployment and DC/OS Environment settings:

    Deployment Settings

    Master Private IP List

    Specify a comma-separated list of your internal static master IP addresses.

    Agent Private IP List

    Specify a comma-separated list of your internal static private agent IP addresses.

    Agent Public IP List

    Specify a comma-separated list of your internal static public agent IP addresses.

    Master Public IP

    Specify a publicly accessible proxy IP address to one of your master nodes. If you don’t have a proxy or already have access to the network where you are deploying this cluster, you can use one of the master IPs that you specified in the master list. This proxy IP address is used to access the DC/OS web interface on the master node after DC/OS is installed.

    SSH Username

    Specify the SSH username, for example centos.

    SSH Listening Port

    Specify the port to SSH to, for example 22.

    Private SSH Key

    Specify the private SSH key with access to your master IPs.

    DC/OS Environment Settings

    Upstream DNS Servers

    Specify a comma-separated list of DNS resolvers for your DC/OS cluster nodes. Set this parameter to the most authoritative nameservers that you have. If you want to resolve internal hostnames, set it to a nameserver that can resolve them. If you have no internal hostnames to resolve, you can set this to a public nameserver like Google or AWS.

    For example, the Google Public DNS IPv4 addresses are 8.8.8.8 and 8.8.4.4. If Google DNS is not available in your country, you can replace the Google DNS servers with your local DNS servers.

    **Caution:** If you set this parameter incorrectly, you will have to reinstall DC/OS.

    IP Detect Script

    Choose an IP detect script from the dropdown to broadcast the IP address of each node across the cluster. Each node in a DC/OS cluster has a unique IP address that is used to communicate between nodes in the cluster. The IP detect script prints the unique IPv4 address of a node to STDOUT each time DC/OS is started on the node. (A minimal example script is sketched after these settings.)

    Important: The IP address of a node must not change after DC/OS is installed on the node. For example, the IP address must not change when a node is rebooted or if the DHCP lease is renewed. **If the IP address of a node does change, the node must be wiped and reinstalled.**

    Send Anonymous Telemetry

    Indicate whether to allow Mesosphere to collect anonymous DC/OS usage data.

    Enable Authentication

    Indicate whether to enable authentication for your DC/OS cluster.
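
    For reference, here is a minimal sketch of what an ip-detect script can look like, assuming eth0 is the interface on the cluster network (adjust for your environment):

        #!/usr/bin/env bash
        # Minimal ip-detect sketch; assumes eth0 is the cluster-facing interface.
        # DC/OS runs this on every node and expects the node's IPv4 address on stdout.
        set -o nounset -o errexit
        ip -4 addr show eth0 | awk '/inet / {split($2, a, "/"); print a[1]; exit}'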

  6. Click Run Pre-Flight. The preflight script installs the cluster prerequisites and validates that your cluster is installable. This step can take up to 15 minutes to complete. If any errors are found, fix them and then click Retry.

**Important:** If you exit your GUI installation before launching DC/OS, you must do the following before reinstalling:

*   SSH to each node in your cluster and run `rm -rf /opt/mesosphere`.
*   SSH to your bootstrap master node and run `rm -rf /var/lib/dcos/exhibitor/zookeeper`.

  7. Click Deploy to install DC/OS on your cluster. If any errors are found, fix them and then click Retry.

    This step might take a few minutes, depending on the size of your cluster.

  8. Click Run Post-Flight. If any errors are found, fix them and then click Retry.

    Notes:

    • If you encounter errors such as Time is marked as bad, adjtimex, or Time not in sync during Post-Flight, verify that Network Time Protocol (NTP) is enabled on all nodes.
    • You can click Download Logs to view your logs locally.
    • If this takes longer than about 10 minutes, you’ve probably misconfigured your cluster.
  9. Click Log In To DC/OS.

    You are done!

Whoa! We are done!

This was fun for me. Thank you for being so nice to me.

DC/OS and distributed computing form a growing, involved field. Here’s wishing you happy discovering.

You can find more cool things to read and refer to here: https://github.com/theanalyst/awesome-distributed-systems

au revoir!

noun au re·voir \ˌōr-ə-ˈvwär, ˌȯr-; French ōr(-ə)-vwär\
Till we meet again.