MetaRouter Enterprise

Data Streaming on your own terms

MetaRouter Enterprise is a private, cloud-agnostic, streaming data platform designed for companies that handle sensitive data or want full control over their data ingestion. Deploy an end-to-end data router on your own cloud, maximizing control, flexibility, and security.

With support for a variety of Destinations, MetaRouter Enterprise leverages Kubernetes to allow all of your streaming data processing to happen in one place.

Architecture

The Data Streaming Engine

The primary component of our platform, consisting of three distinct layers of services, allows for the real-time ingestion, durable processing, and monitored delivery of data from your applications.

(Figure: MetaRouter Overview)

Layer 1: Ingestion - A high-performance API, and the only outward-facing component of the platform, that accepts incoming data from your sources and writes it into durable, stateful message queues.

Layer 2: Routing - Distributes event processing across Layer 3 to ensure data is sent to its destination based on the configuration you provide for that individual source.

Layer 3: Forwarding - Maps the data schema implemented within your sources to the proprietary format of each destination the event is sent to, and tracks delivery success for destinations outside of the platform.

As with all of our Clickstream products, you only have to implement a centralized schema for the data you want tracked; the platform handles validating and parsing that data to match the requirements of every distinct service you want to receive it.

Customization

The design of MetaRouter Enterprise is modular, allowing for an unprecedented amount of flexibility among real-time streaming data processors. It can sit downstream from existing APIs to receive data or feed data directly to your own internal data processors.

Each deployment supports multi-tenancy out of the box, allowing you to implement tracking with a variety of data sources and map their data distinctly based on the configurations you set for that specific instance.

Resource Requirements

Kubernetes

Minimum Node Resources

We recommend at least 3 nodes per zone, each offering 4 CPU cores and 16 GB of memory, spread across at least 2 zones (to reduce the chance of downtime from your Cloud provider), in order to support a majority of traffic with just the core platform.

Also, for every Destination you want supported, have an additional 0.5 CPU core and 1 GB of memory available per region for every 1,000 events/sec of your peak expected traffic.

The minimum recommended size of our platform is, per zone:

  • 3x Load Balancer (0.25 CPU, 0.5 GB memory)
  • 3x Ingestors (0.25 CPU, 0.25 GB memory)
  • 2x Routers (0.25 CPU, 0.5 GB memory)
  • 1x Forwarder per Destination (0.5 CPU, 1 GB memory)

Calculating Scaling

The platform is set to automatically scale to meet event demand, so if you expect a wide range of event volumes or ever-increasing traffic peaks, please leave plenty of headroom or set your cluster to autoscale.

The Load Balancer, Ingestor and Router will each autoscale roughly by one pod for every 1,000 events/sec over the minimum rate.

To calculate the number of Forwarders you will need, take the number of Destinations you want the platform to support and multiply it by your expected peak events per second divided by 1,000. Then, compare the result against the number of Destinations and take the higher number, rounding up.

For example, if I only want to support Google Analytics, Facebook Pixel, and Pinterest on my platform, and I expect 500 events/sec, then I would expect to need 3 Forwarders.

(3*(500/1000)) = 1.5 < 3

However, supporting those 3 Destinations at 5000 events/sec, I should expect to need 15 Forwarders to handle that traffic.

(3*(5000/1000)) = 15 > 3
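
If you'd like to script this calculation, here is a minimal shell sketch of the rule above (the variable values are just examples, and the script itself is illustrative, not part of the platform):

destinations=3      # Number of Destinations the platform supports
peak_eps=5000       # Expected peak events/sec

# ceil(destinations * peak_eps / 1000), via integer arithmetic
scaled=$(( (destinations * peak_eps + 999) / 1000 ))

# Take the higher of the scaled count and the Destination count
forwarders=$(( scaled > destinations ? scaled : destinations ))

echo "Recommended Forwarders: $forwarders"   # Prints 15 for these inputs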

Supported Cloud Providers

Currently Verified Kubernetes Providers:

  • Google Cloud Platform (GCP)
  • Amazon Web Services (AWS)

Coming Soon:

  • Digital Ocean (DO)
  • Microsoft Azure

Load Balancer

Our platform will spin up an internal Load Balancer and connect it with your Cloud Provider, which, depending on the provider, may create ingress rules or spin up resources represented outside of Kubernetes. The only manual step currently required is for you to link DNS records to the Ingress instances created for the Load Balancer; our platform takes care of the rest. Learn more about how Kubernetes leverages Ingress with Cloud Providers on their Docs.
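
As a quick check after installation, assuming the platform runs in the mr-ee namespace, you can look up the address your DNS records should point at with:

kubectl get ingress --namespace mr-ee

The ADDRESS column of the output lists the external IP or hostname your Cloud Provider assigned to the Ingress.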

Message Queues

We support connecting our services to different Stateful Message Queues in order to provide data durability and give you finer-grained control over sending data to other parts of your platform.

Our currently supported queues are:

  • Confluent Kafka
  • AWS Kafka (MSK)
  • Google Pub/Sub

Coming Soon:

  • Azure Service Bus (planned)
  • AWS Kinesis (planned)
  • NATS (investigating)
  • Pulsar (investigating)

Although we do not currently support queues managed from within the platform, we are planning to introduce them this year. This will save you from having to link externally provided services; however, the total amount of resources needed will be much higher.

Before Enterprise Platform Installation

Cloud Provider Prerequisites

Google Cloud (GCP)

For a proper DNS setup, our platform will require a Static IP to be assigned to it. First, reserve a Static IP (Found in GCP > Networking > VPC Network > External IP Addresses) and note the IP Address. This will be needed during the installation process.
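
If you prefer the gcloud CLI to the console, a sketch of reserving and reading back a Static IP looks like the following (the name mr-ee-ip is just an example; depending on your setup, you may need --region=<REGION> instead of --global):

gcloud compute addresses create mr-ee-ip --global

gcloud compute addresses describe mr-ee-ip --global --format="get(address)"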

Amazon Web Services (AWS)

Note: This is a stub of an existing feature. Details coming soon.

Kubernetes Prerequisites

Create the namespace for the platform. We recommend creating a namespace that will be used solely by the MetaRouter Platform:

kubectl create namespace mr-ee

Install Tiller

Prep cluster for installation of Tiller

kubectl -n kube-system create serviceaccount tiller

kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller

Start Tiller

helm init --service-account tiller

Then, verify that Tiller is running with:

kubectl get pods --namespace kube-system

Adding Custom Definitions

Ensure Cluster has access to cert-manager definitions

kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.6/deploy/manifests/00-crds.yaml

Message Queue Preparations

Topic Creation

The EE Platform uses topics to pass events from one stage of the service to another. The topic from the first to the second stage is called main, and the topics from the second to the third stage are named after the destination they send events into (in kebab-case).

For example, if I want the platform to send events to Google Analytics, Adobe Analytics and Pinterest, I would need to create the following topics:

  • main
  • google-analytics
  • adobe-analytics
  • pinterest

To ensure the uniqueness of the names if the Message Queue is shared by other applications, we recommend supplying a Topic Prefix to the platform upon installation. If you add a prefix, you will need to ensure that topics exist following the naming convention above with the prefix you specified. For example, if I use the default Topic Prefix of mr-ee, I will need to make the following topics:

  • mr-ee-main
  • mr-ee-google-analytics
  • mr-ee-adobe-analytics
  • mr-ee-pinterest

We recommend always specifying a prefix to future-proof any shared resources.

Custom Topics

If you would like the Router to send a copy of events to a Topic you will read from manually, you will need to push a custom Destination for your Write Key into the Configuration Store. Keep in mind that whatever you add here will have the same prefix added as specified in the Topic Creation section above.

For example, if you have the prefix mr-ee- and add logs as a destination, the Router will write to the Topic mr-ee-logs.

We only create Event Forwarders based on the values used to install or update the platform, so there is no need to worry about an additional service spinning up to consume these messages. However, we do recommend ensuring that the naming is unique and will not clash with future updates in which we make additional destinations available. Stay away from names that describe third-party services or message delivery types, such as amazon-kinesis or redis, as they may conflict in the future and require further customization to support.
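
As an illustration, using the Canary API covered in the Canary Configuration Management section below, pushing a logs destination for a Source with Write Key WEB01 might look like this (the empty config object is a placeholder for whatever settings your consumer needs):

curl localhost:8080/cache/set/WEB01 -X POST --data "{\"logs\": {\"config\": {}}}" -H "Content-type: application/json"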

Partition Creation

If supported by your Message Queue of choice, such as Kafka or Kinesis, Partitions allow multiple services in each group to read events simultaneously. You will want at least a 1:1 ratio of partitions to consumers. We recommend assuming a 3:1 ratio of partitions to consumers to allow for easy scaling of consumers.

Our minimal production size is two Routers and one of each Forwarder. An example of partitions, according to our recommendation, is as follows:

  • main: 6 Partitions
  • google-analytics: 3 Partitions
  • adobe-analytics: 3 Partitions
  • pinterest: 3 Partitions
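
If your Message Queue is Kafka, a minimal sketch of creating these topics with the standard kafka-topics tool might look like the following (this assumes the mr-ee prefix, a broker reachable at localhost:9092, a replication factor of 3, and Kafka 2.2+; older versions use --zookeeper instead of --bootstrap-server):

kafka-topics.sh --bootstrap-server localhost:9092 --create --topic mr-ee-main --partitions 6 --replication-factor 3

for topic in google-analytics adobe-analytics pinterest; do
  kafka-topics.sh --bootstrap-server localhost:9092 --create --topic "mr-ee-$topic" --partitions 3 --replication-factor 3
done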

Installation of the Platform

What you will need

  1. Kubernetes and Helm installed and configured on your computer
  2. Docker installed and configured on your computer
  3. Operational access to your Kubernetes Cluster
  4. The domain(s) you will use for sending events, pointing to the Static IP/Load Balancer created in the Cloud Provider Prerequisites > Networking section.

Step 1: Create Install Directory

Create a new folder or directory on your local system. Our installation process will generate files into this directory, which you will use with Helm for the final installation of the platform.

Step 2: Run the Installation Script

Using your terminal, navigate into the directory you created in Step 1. Then, run the installation script we've provided you, passing along the License Key (one per contracted platform instance) once the installation process asks for it. After providing the Key, the rest of the installation will continue in your web browser. Follow the on-screen instructions to finish configuring your platform's details.

Step 3: Installing the platform

Once the installation is complete, the directory you created in Step 1 will contain all of our platform charts, customized to your instructions. To install the platform into Kubernetes, run:

helm install metarouter-enterprise -f ./metarouter-ee.yaml --name <RELEASE> --namespace <NAMESPACE>

Replace <RELEASE> with the name you would like to use to reference the platform (we recommend ee for just the one deployment) and <NAMESPACE> with the namespace you've previously configured for our platform to run in (as specified in the Kubernetes Prerequisites section).

Installation complete

That's it! The platform will automatically pick up events sent to the domain you have specified and process them using your provided Message Queue.

Please keep the files generated by the installation process in a safe place, as they will make updating the platform much easier. If you would like, keep those files in Revision Control to maintain a record of the platform's state.

Updates to the Platform

Manual Updates

Using the files generated by the Installation Process, you can repeat Step 2 and Step 3 of the installation process to create updated instructions. Then, run:

helm upgrade <RELEASE> metarouter-enterprise -f ./metarouter-ee.yaml

Helm will handle rolling out changes to the updated components. There is no need to stop or switch traffic during this process.

Automated Stable Updates

Note: This is a stub of a future feature. Currently not available.

Controlling how the MetaRouter Platform sends data

The MetaRouter Streaming Data Platform accepts events from Sources, identified by their Write Key, and fans out that data to the Source's Destinations, which map and forward the data out of the platform to their respective endpoints. You control which Sources the platform accepts data from and which Destinations receive that data by providing the platform with a Source Configuration.

For example, if you have two applications, one a website and the other a server that supports said website, you can give each a unique identifier, called the Write Key, that allows you to detail where their events go. Using the Write Keys WEB01 and SVR01, for example, you would then create two Source Configurations so the platform knows what to do once it receives events with those Write Keys.

In order to create effective Source Configurations, you'll need to decide which Destinations receive events. Let's say that you'd like events from the server to go into S3, but want the web events to go to both S3 and Google Analytics. Your Source Configurations would look something like this:

"WEB01": {
  "s3": {
    "config": {
      ...
    }
  },
  "google-analytics": {
    "config": {
      ...
    }
  }
}

"SVR01": {
  "s3": {
    "config": {
      ...
    }
  }
}

Each configuration for a Destination can be unique per Source, but cannot have more than one of the same Destination for a single Source. To find out what the config parameters are for a Destination, visit the Destinations section.

Version Control

We recommend using version control to save the contents of the configuration JSON file. The configuration stored inside the cluster does not maintain a state history: if a configuration is updated through the process outlined below, the current configuration is overwritten with the new revision. To roll back a configuration change, the previous version of the JSON file will need to be uploaded again.
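
As a sketch, assuming the configuration lives in a file named source-config.json (the filename is illustrative), a simple git workflow covers both history and rollback:

git add source-config.json
git commit -m "Update WEB01 Source Configuration"

# To roll back, restore the previous revision and upload it again:
git show HEAD~1:source-config.json > source-config.json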

Canary Configuration Management

Canary is our tool that delegates instructions to the rest of the platform. It has a built-in API that can take updates to Sources and immediately deploy them to the rest of the platform.

Manual Updates

Step 1: Locate Canary

Using Kubernetes' built-in tools, you can connect your local environment to Canary to pass it instructions directly.

Find the pod name of Canary by running kubectl get pods -n <NAMESPACE>, filling in the correct namespace for your deployment, and look for a pod with canary in its name. Copy the full pod name.

Step 2: Port Forward Canary

Using the pod name you copied in Step 1, run the command kubectl port-forward <POD_NAME> 8080 -n <NAMESPACE>, replacing the pod name and namespace with the correct values. The console should confirm that your local system is now connected to Canary on port 8080.

Step 3: POST the Source Configuration to Canary

Send a POST request to localhost:8080/cache/set/WRITEKEY, where WRITEKEY is the Write Key of your Source, with an application/json body containing an object of the Destinations and their configurations.

Here is an example of the request body:

"WEB01": {
  "facebook-pixel": {
    "config": {
      "pixelId": "348149915804750",
      "valueFieldIdentifier": "price",
      "whitelistPII": [
        "email",
        "phone"
      ]
    }
  },
  "google-analytics": {
    "config": {
      "appId": "UA-XXXXXXXX-4",
      "searchParam": "searchRedirect",
      "baseURL": "http://www.metarouter.io/"
    }
  }
}

And here is a cURL example of the request:

curl localhost:8080/cache/set/WRITEKEY -X POST --data "{\"google-analytics\": {\"config\": {\"appId\": \"UA-XXXXXXXX-4\"}}}" -H "Content-type: application/json"

Or from a file:

curl localhost:8080/cache/set/WRITEKEY -X POST --data-binary "@PATH/TO/YOUR-CONFIG-FILE.json" -H "Content-type: application/json"

Step 4: Verify Source Configuration

You can also make a GET request to Canary to read the existing Source Configuration by the Source's Write Key. Simply make a GET request to localhost:8080/cache/get/WRITEKEY and Canary will respond with the JSON payload it has for that configuration.

Here is a cURL example of that request:

curl localhost:8080/cache/get/WRITEKEY

Using our Management Tools

Note: This is a stub of a future feature. Coming Soon.

Using our Remote Management Connection

Note: This is a stub of a future feature. Coming Soon.

© 2020 MetaRouter

Disclaimer: MetaRouter Cloud leverages code from several of Segment's fabulous library of open-source tools, including their source SDKs, destinations, and core Analytics.js library. As some of the core configurations outlined in our docs are largely identical to Segment's, we often re-purpose, and in some cases copy, directly from Segment's docs. In the cases where we have forked Segment's tools, we acknowledge and attribute all credit to Segment for their creation and subsequent documentation.