Schema Enforcement (Limited Release)
Overview
(Please reach out to your MetaRouter representative to learn more and gain access)
Schema Enforcement is a MetaRouter platform feature that lets users define, assign, and monitor data schemas across pipelines. By adhering to the JSON Schema format (json-schema.org), Schema Enforcement ensures that only valid data is processed, enhancing data quality, consistency, and governance. Users can enforce schema compliance at the pipeline level, track violations, and take corrective action to maintain the integrity of their data across all systems and use cases.
Prerequisites
To effectively use Schema Enforcement, ensure you have:
- Access to the MetaRouter platform with schema enforcement features enabled.
- Basic knowledge of JSON Schema (json-schema.org) for defining and managing data schemas.
- Familiarity with data pipelines and the events you intend to enforce schema rules on.
- Understanding of the data sources and systems integrated with MetaRouter.
Configuring Schema Enforcement
Schema Enforcement configuration consists of several key steps, from defining the events you want to enforce to assigning them to pipelines. Follow these steps to set up your enforcement layer:
Navigate to the Schema Enforcement Tab
Log in to the MetaRouter platform. From the main dashboard, navigate to the Schema Enforcement tab in the left-hand navigation menu. This is where you will define your events, create schemas, and build your schema library.
Define Events
Begin by creating the events you wish to enforce. These event names should align with the events your website or application is expected to send. Click Add Event, provide the event name, and save.
Examples of common events:
- pageview
- add_to_cart
- purchased
Create Schemas
Once your events are defined, create schemas to validate the structure and data quality of those events. Schemas can be:
- Event-Specific: A unique schema for a single event.
- Shared Across Events: A schema that applies to multiple events with similar structures.
To create a schema:
- Go to the Schema tab, add a schema, and provide a name and description.
- Enter the schema in JSON format, ensuring it aligns with the expected structure of your events.
- Save the schema.
Once a schema is assigned to a published Schema Library, it cannot be edited. To change it, first remove the schema from the Schema Library, then make your changes.
Example schema for a Purchase event:
{
  "title": "Purchase Event",
  "type": "object",
  "properties": {
    "event": {
      "type": "string",
      "const": "Purchase"
    },
    "userId": {
      "type": "string",
      "description": "Unique identifier for the user making the purchase"
    },
    "anonymousId": {
      "type": "string",
      "description": "Anonymous identifier if the user is not logged in"
    },
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 timestamp of the event"
    },
    "properties": {
      "type": "object",
      "properties": {
        "orderId": {
          "type": "string",
          "description": "Unique identifier for the order"
        },
        "total": {
          "type": "number",
          "description": "Total amount of the purchase"
        },
        "currency": {
          "type": "string",
          "description": "Currency code in ISO 4217 format"
        },
        "products": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "productId": {
                "type": "string",
                "description": "Unique identifier for the product"
              },
              "name": {
                "type": "string",
                "description": "Name of the product"
              },
              "category": {
                "type": "string",
                "description": "Category of the product"
              },
              "price": {
                "type": "number",
                "description": "Price of the product"
              },
              "quantity": {
                "type": "integer",
                "description": "Quantity of the product purchased"
              }
            },
            "required": ["productId", "name", "price", "quantity"]
          },
          "description": "List of products in the purchase"
        }
      },
      "required": ["orderId", "total", "currency", "products"]
    },
    "context": {
      "type": "object",
      "properties": {
        "ip": {
          "type": "string",
          "format": "ipv4",
          "description": "IP address of the user"
        },
        "userAgent": {
          "type": "string",
          "description": "User agent of the browser or device"
        }
      },
      "additionalProperties": false
    }
  },
  "required": ["event", "userId", "timestamp", "properties"],
  "additionalProperties": false
}
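To make the example schema more concrete, here is a minimal Python sketch of a payload that should satisfy it, plus a quick check for required keys. Every value in `sample_event` is invented for illustration, and the `REQUIRED_TOP_LEVEL` and `REQUIRED_PROPERTIES` lists are simply copied from the schema's `required` arrays. The helper `missing_required` is hypothetical and is not part of MetaRouter; it only looks for absent keys and is not a full validator.

```python
# Hypothetical sample payload; every value is invented for illustration.
sample_event = {
    "event": "Purchase",
    "userId": "user-123",
    "timestamp": "2024-01-15T10:30:00Z",
    "properties": {
        "orderId": "order-456",
        "total": 59.98,
        "currency": "USD",
        "products": [
            {"productId": "sku-1", "name": "T-Shirt", "price": 29.99, "quantity": 2}
        ],
    },
}

# Copied from the "required" arrays in the example schema.
REQUIRED_TOP_LEVEL = ["event", "userId", "timestamp", "properties"]
REQUIRED_PROPERTIES = ["orderId", "total", "currency", "products"]


def missing_required(event: dict) -> list:
    """Return dotted paths of required keys that are absent from the event."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in event]
    props = event.get("properties")
    if not isinstance(props, dict):
        props = {}  # treat a missing or malformed "properties" as empty
    missing += [f"properties.{key}" for key in REQUIRED_PROPERTIES if key not in props]
    return missing
```

Calling `missing_required(sample_event)` returns an empty list. Removing `userId` from the payload would return `["userId"]`, which corresponds to the "required parameter is missing" kind of violation.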
Build Your Schema Library
The Schema Library serves as your enforcement plan, grouping events and their respective schemas.
- Go to the Schema Libraries tab.
- Click Add Library and name your library.
- Assign events to schemas by selecting an event and linking it to the appropriate schema using the dropdowns. Then select the schema version you want to use (if multiple versions have been created).
- Review your library to ensure all events are accounted for and linked to schemas.
- Select the library from the list.
- Click Publish to lock the schema assignments and make the library available for use.
Published schema libraries will be listed as available resources for Pipelines.
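One way to picture a Schema Library is as a mapping from event names to a schema and version. The Python sketch below is only a mental model under that assumption: the `SchemaRef` class, the `library` layout, the schema names, and the integer version numbers are all hypothetical and do not reflect how MetaRouter stores libraries. The `unassigned_events` helper mirrors the review step of confirming that every expected event is linked to a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaRef:
    """Hypothetical reference to one version of a named schema."""
    name: str
    version: int


# Hypothetical library: event name -> schema reference.
library = {
    "name": "ecommerce-library",
    "published": True,
    "assignments": {
        "pageview": SchemaRef("Page Event", 1),
        "add_to_cart": SchemaRef("Cart Event", 2),
        "purchased": SchemaRef("Purchase Event", 1),
    },
}


def unassigned_events(expected_events, assignments):
    """Return expected event names that have no schema assigned, sorted."""
    return sorted(set(expected_events) - set(assignments))
```

For example, `unassigned_events(["pageview", "purchased", "refund"], library["assignments"])` reports that `refund` still needs a schema before the library is published.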
Assign the Schema Library to a Pipeline
Navigate to the Pipelines tab in the navigation bar and find the pipeline you wish to enforce schemas on:
- Click the Schema tile within the pipeline card.
- Select the published Schema Library from the dropdown menu.
- Save your changes.
Only one Schema Library can be assigned to a pipeline at a time, and the Schema Library must be published before it can be used on a pipeline.
Once the Schema Library has been assigned and data is flowing, any events that violate the Schema Library will be flagged, and a violation count will be shown on the Schema Library tile.
How Schema Enforcement Works
Schema Enforcement monitors the events flowing through pipelines and evaluates them against the schemas defined in the Schema Library assigned to each pipeline. Here’s how the process works:
Pipeline Monitoring: Once a Schema Library is assigned to a pipeline, Schema Enforcement actively monitors all events the pipeline collects.
Validation: Each event is evaluated against its assigned schema. The system checks that:
- Required parameters and events are present.
- Parameters adhere to the data types and formats defined in the schema.
Flagging Violations: If an event does not meet the schema requirements, it is flagged as a violation. Examples include:
- A required parameter is missing.
- A parameter has a mismatched data type (e.g., a string instead of a number).
Aggregating Violations: All schema violations are aggregated and displayed in the Violation Report, accessible via the Schema tile on the pipeline card.
Drill-Down Insights: Click the violation count on the pipeline card to open the Violation Report slide-out, where you can:
- View a list of events flagged as violations.
- Click an individual event to see detailed information about the violation types associated with it.
Actionable Insights: Use the Violation Report to identify issues in the event payload. Work with your web development team to determine the root cause of the violations. The team can then fix the issue on the website or application, or update the Schema Library to reflect any intentional changes. As new events are collected, the violation count on the pipeline card updates continuously.
By consistently monitoring and resolving schema violations, you ensure that your data pipelines and downstream tools maintain high data quality and integrity.
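The validation and flagging steps described above can be sketched in a few lines of Python. This is an illustration of the idea only, not MetaRouter's implementation: `find_violations` is a hypothetical helper that covers a small subset of JSON Schema (`type`, `required`, `properties`, `items`, and `additionalProperties: false`) and ignores keywords such as `const` and `format`. As a simplification, it also treats a float like `1.0` as not being an integer.

```python
# Map JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "object": dict,
    "array": list,
    "boolean": bool,
}


def find_violations(value, schema, path="$"):
    """Return a list of human-readable violations for value against schema."""
    violations = []
    expected = schema.get("type")
    if expected in TYPE_MAP:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("number", "integer")
        if bool_as_number or not isinstance(value, TYPE_MAP[expected]):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if isinstance(value, dict):
        # Flag required parameters that are absent.
        for key in schema.get("required", []):
            if key not in value:
                violations.append(f"{path}.{key}: required parameter is missing")
        # Recurse into declared properties; flag undeclared ones if disallowed.
        properties = schema.get("properties", {})
        for key, child in value.items():
            if key in properties:
                violations += find_violations(child, properties[key], f"{path}.{key}")
            elif schema.get("additionalProperties") is False:
                violations.append(f"{path}.{key}: unexpected parameter")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            violations += find_violations(item, schema["items"], f"{path}[{index}]")
    return violations
```

An empty list means the event passed every check in this subset; each string in a non-empty list corresponds to one flagged violation, such as a missing required parameter or a string where a number was expected.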
Additional Considerations
Versioning and Testing
Versioning: All Schemas and Schema Libraries support versioning, so they can evolve as your data shape changes while you maintain the historical lineage of your customers' data.
Testing: Before publishing updates, test new schemas against sample data to ensure compatibility with your pipeline.
Schema Violation Impacts
Pipeline Health: Frequent schema violations may indicate an issue with your data source or schema definition. Regularly review the Violation Report to identify trends and take corrective actions.
Data Governance: Schema Enforcement plays a critical role in maintaining compliance with data governance policies. Violations should be promptly addressed to avoid downstream impacts.
Schema Updates and Deployment
Backward Compatibility: When updating schemas, consider the impact on existing data sources. Communicate schema changes to all stakeholders to avoid disruptions.
Publishing Changes: Changes to schemas or the Schema Library will not take effect until published and assigned to a pipeline.
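Before publishing a new schema version, it can help to compare it with the previous version and list changes that are likely to cause violations from existing data sources. The Python sketch below is a hypothetical pre-publish check, not a MetaRouter feature. It inspects only top-level `required` and `properties` entries and treats newly required parameters and changed types as breaking.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List top-level changes likely to break sources that sent valid data before."""
    changes = []
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))
    # Parameters that were optional (or absent) and are now mandatory.
    for key in sorted(new_required - old_required):
        changes.append(f"'{key}' is newly required")
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    # Parameters present in both versions whose declared type differs.
    for key in sorted(old_props.keys() & new_props.keys()):
        old_type = old_props[key].get("type")
        new_type = new_props[key].get("type")
        if old_type != new_type:
            changes.append(f"'{key}' changed type from {old_type} to {new_type}")
    return changes
```

An empty result suggests the update is safe at the top level; nested objects would need the same comparison applied recursively.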
Performance Considerations
Validation Overhead: Schema validation introduces minimal latency. Monitor your pipeline performance to ensure it meets operational requirements.
Scalability: Schema Enforcement is designed to scale with high data volumes. Ensure your schema definitions are optimized to avoid unnecessary complexity.
Collaboration Between Teams
Developer Coordination: Use the Violation Report as a bridge between your data and development teams. Detailed violation insights help teams quickly diagnose and resolve issues.
Documentation: Keep an up-to-date reference of your schemas and assigned events. This reduces ambiguity and accelerates troubleshooting.
Using Violations for Continuous Improvement
Identify Trends: Regularly review violation trends to identify recurring issues or areas for improvement.
Optimize Schema Definitions: Use violation insights to refine your schemas, ensuring they better align with your data collection requirements.
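If violation details are recorded outside the platform, trends can be summarized with a simple count. The sketch below assumes a hypothetical record shape (`event`, `type`, `parameter`); the actual Violation Report format may differ, and the sample records are invented for illustration.

```python
from collections import Counter

# Hypothetical violation records; the field names are assumptions.
violations = [
    {"event": "purchased", "type": "missing_required", "parameter": "orderId"},
    {"event": "add_to_cart", "type": "type_mismatch", "parameter": "price"},
    {"event": "purchased", "type": "missing_required", "parameter": "orderId"},
]


def top_violations(records, limit=3):
    """Count violations by (event, type, parameter) and return the most frequent."""
    counts = Counter((r["event"], r["type"], r["parameter"]) for r in records)
    return counts.most_common(limit)
```

Here the most frequent entry is the missing `orderId` on `purchased`, seen twice, which points to a recurring issue worth raising with the development team first.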
By addressing these considerations, you can maximize the value of Schema Enforcement and maintain reliable, high-quality data pipelines.