Working with Filters

In this guide we explore how to use the Control API to manage filters.

What are Filters

When content is retrieved from (social) media services, we usually want extra control over what is collected from those services. Filters allow us to do this automatically. For example, we might want to reject all content that contains profanity (blacklisting). Or, the other way around, we might want to accept only content that matches certain criteria (whitelisting).

After filtering, accepted content can be viewed and moderated in the so-called content queue. Optionally, this content can be published as a so-called publication content feed. See the Publication Guide for more information.

How does Content flow through a Filter

A filter can be used on three levels:

  • Contract

    When a filter is set on a contract level, all content resulting from all collections that belong to that contract (and therefore from all inputs grouped by those collections!) will be controlled by that filter.

  • Collection

    When a filter is set on a collection level, only the content resulting from that single collection (and therefore all inputs grouped by said collection!) will be controlled by that filter.

  • Input

    When a filter is set on an input level, only the content resulting from that single input will be controlled by that filter.

The filter of a higher level (for example contract level) can be overridden by a filter on a lower level (for example collection- or input level).

The use of a filter is optional. If no filter is used on the input, collection or contract level, all content resulting from the inputs grouped by a collection in a contract will be unfiltered. If, however, a filter is used (on any level), the content queue will only contain content items that are accepted by that filter.
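The override order between the three levels can be sketched as a lookup that prefers the most specific level. This is only an illustration: the function name resolveFilterId and the object shapes are assumptions, and only the filterId property name is taken from this guide.

```javascript
// Hypothetical helper: pick the filter that applies to an input.
// The most specific level that defines a filterId wins.
function resolveFilterId(input, collection, contract) {
    const levels = [input, collection, contract];
    for (const level of levels) {
        if (level && level.filterId) {
            return level.filterId;
        }
    }
    // No filter on any level: the content stays unfiltered.
    return null;
}

const contract = { filterId: 'chain-contract' };
const collection = { filterId: 'chain-collection' };
const inputWithFilter = { filterId: 'chain-input' };
const inputWithoutFilter = {};

console.log(resolveFilterId(inputWithFilter, collection, contract));    // chain-input
console.log(resolveFilterId(inputWithoutFilter, collection, contract)); // chain-collection
console.log(resolveFilterId(inputWithoutFilter, {}, {}));               // null
```

A return value of null stands for the unfiltered case, where every content item reaches the content queue.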

See the Collection Guide and the Contract Guide for more information.

[Figure: Content and Filter flow]

The Anatomy of a Filter

We need to define filter rules to tell a filter when it needs to reject or accept a content item. Because filtering is usually a complex process we allow multiple rules to be used by a filter to offer as much flexibility as possible. This is accomplished by grouping filter rules together in a filter set. In turn these sets are grouped together by a filter chain, which in essence is the (entire) filter.

What is a Rule

A rule operates on a content item by applying logic to it. To do this, the rule needs to know the field of the content item to which the logic applies. It then evaluates that field against the provided rule value to determine whether the content item should be rejected or accepted. For example, the equals operator can be used to check if the name field of a content item equals the value John Doe. If this rule evaluates to true, i.e. the name of the content item equals the value John Doe, the content item will be rejected, i.e. filtered.

Besides the properties mentioned above, a filter rule also has the not property. By default this value is set to false, but when it is set to true, the outcome of the rule is flipped. Reusing the rule example above, this means that only content items whose name field equals the value John Doe will be accepted. For more information see how to whitelist.
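The effect of the not property can be illustrated with a minimal sketch. The function name evaluateEqualsRule is hypothetical, and the sketch assumes a flat content item and only handles the equals operator.

```javascript
// Minimal sketch: evaluate an `equals` rule and apply the `not` flag.
// A result of true means "reject the content item".
function evaluateEqualsRule(rule, item) {
    const matches = item[rule.field] === rule.value;
    // `not: true` flips the outcome, turning the rule into a whitelist rule.
    return rule.not ? !matches : matches;
}

const item = { name: 'John Doe' };
const blacklistRule = { field: 'name', operator: 'equals', value: 'John Doe', not: false };
const whitelistRule = { ...blacklistRule, not: true };

console.log(evaluateEqualsRule(blacklistRule, item)); // true: rejected
console.log(evaluateEqualsRule(whitelistRule, item)); // false: accepted
```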

Operator Overview

The example use cases below all assume the default filter rule logic, where the not property of the rule is set to false.

| OPERATOR | USED WHEN | EXAMPLE USE CASE |
|---|---|---|
| equals | A content item field must exactly match the rule value. | If a content item should be rejected when its name field matches the string Justin Bieber. |
| gt | A content item field must be greater than the rule value. | If a content item should be rejected when its followers field is greater than 10000. |
| gte | A content item field must be greater than or equal to the rule value. | If a content item should be rejected when its followers field is equal to 10000 or more. |
| lt | A content item field must be less than the rule value. | If a content item should be rejected when its followers field is less than 10. |
| lte | A content item field must be less than or equal to the rule value. | If a content item should be rejected when its followers field is equal to 10 or less. |
| pattern | A content item field must exactly match the pattern of the rule value. This pattern must be a valid regular expression. | If a content item should be rejected when its text field matches the pattern /[\d]/. |
| in | A content item field must exactly match a single entry in the rule value. For more information see using in operators. | If a content item should be rejected when its name field exactly matches a single entry in the array [ 'Justin Bieber', 'Bob Saget' ]. |
| patternin | A content item field must exactly match a single pattern entry in the rule value. This pattern must be a valid regular expression. For more information see using in operators. | If a content item should be rejected when its text field exactly matches a single entry in the array [ '/coca cola/i', '/fanta/i', '/sprite/i' ]. |
| datediff | A content item date field must be within the bounds of the rule value. | If a content item should be rejected when it is older than 86400 seconds (24 hours). |
| exists | A content item field must exist as a property of the content item. | If a content item should be rejected when it has no media field. |

Rule examples by Operator

| RULE OPERATOR | RULE VALUE | RULE EXAMPLE |
|---|---|---|
| equals | String | { field: 'normalized.name', operator: 'equals', value: 'Justin Bieber', not: false } |
| gt | Number | { field: 'normalized.followerCount', operator: 'gt', value: 10000, not: false } |
| gte | Number | { field: 'normalized.followerCount', operator: 'gte', value: 10000, not: false } |
| lt | Number | { field: 'normalized.followerCount', operator: 'lt', value: 10, not: false } |
| lte | Number | { field: 'normalized.followerCount', operator: 'lte', value: 10, not: false } |
| pattern | String | { field: 'normalized.text', operator: 'pattern', value: '/\d/', not: false } |
| in | Array | Providing an Array of entries: { field: 'normalized.name', operator: 'in', value: [ 'Justin Bieber', 'Bob Saget' ], not: false } |
| in | Array | Providing an external list reference: { field: 'normalized.text', operator: 'in', listId: '57287ccea1e26b9b67362f92', not: true } |
| patternin | Array | Providing an Array of entries: { field: 'normalized.text', operator: 'patternin', value: [ '/coca cola/i', '/fanta/i', '/sprite/i' ], not: false } |
| patternin | Array | Providing an external list reference: { field: 'normalized.text', operator: 'patternin', listId: '57287ccea1e26b9b67362f93', not: true } |
| datediff | Number | { field: 'normalized.date', operator: 'datediff', value: 86400, not: false } |
| exists | Number | { field: 'normalized.media', operator: 'exists', not: false } |
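To make the operators more concrete, here is a rough sketch of how a rule could be evaluated against a content item. It is not the actual implementation: the helper names getField, toRegExp and evaluateRule are hypothetical, pattern strings are assumed to use the '/pattern/flags' notation, and patterns are assumed to be tested anywhere in the field. The datediff and exists operators are left out because this guide does not pin down their exact semantics.

```javascript
// Read a nested field such as 'normalized.followerCount' from an item.
function getField(item, path) {
    return path.split('.').reduce(
        (value, key) => (value == null ? undefined : value[key]),
        item
    );
}

// Turn a string like '/fanta/i' into a RegExp (assumed pattern notation).
function toRegExp(pattern) {
    const match = /^\/(.*)\/([a-z]*)$/.exec(pattern);
    return match ? new RegExp(match[1], match[2]) : new RegExp(pattern);
}

// One predicate per operator; `datediff` and `exists` are omitted on purpose.
const operators = {
    equals: (actual, value) => actual === value,
    gt: (actual, value) => actual > value,
    gte: (actual, value) => actual >= value,
    lt: (actual, value) => actual < value,
    lte: (actual, value) => actual <= value,
    pattern: (actual, value) => toRegExp(value).test(String(actual)),
    in: (actual, value) => value.includes(actual),
    patternin: (actual, value) => value.some((p) => toRegExp(p).test(String(actual))),
};

// Returns true when the rule says "reject", after applying `not`.
function evaluateRule(rule, item) {
    const check = operators[rule.operator];
    if (!check) {
        throw new Error('Unsupported operator: ' + rule.operator);
    }
    const result = check(getField(item, rule.field), rule.value);
    return rule.not ? !result : result;
}

const item = { normalized: { name: 'Bob Saget', followerCount: 12, text: 'I drink Fanta' } };

console.log(evaluateRule({ field: 'normalized.followerCount', operator: 'gt', value: 10, not: false }, item)); // true
console.log(evaluateRule({ field: 'normalized.name', operator: 'in', value: ['Justin Bieber', 'Bob Saget'], not: false }, item)); // true
console.log(evaluateRule({ field: 'normalized.text', operator: 'patternin', value: ['/coca cola/i', '/fanta/i'], not: true }, item)); // false
```

Note that a backslash inside a JavaScript string literal must be escaped, so a digit pattern would be written as '/\\d/' in code.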

Using in Operators

The in and patternin operators expect to operate on multiple values when evaluating a rule. These so-called entries must always be string elements, but we distinguish between regular strings (used by the in operator) and pattern strings (used by the patternin operator). For example, [ 'a', 'b', 'c' ] might be a value used by the in operator and [ '/[a-f]/', '/\d/' ] by the patternin operator. Note that patterns must always be valid regular expressions.

The entries of the rule value must either be part of an array or an external list. Depending on whether an array or an external list is used, the rule has different properties set:

  • Array of entries: the rule has a value property, whose value must be an Array.

  • List of entries: the rule has no value property, but instead a listId property, whose value must be a valid resource ID.

The benefit of using lists over arrays is that the lists can be reused across rules. However do keep in mind that when a listId is provided, the consumer creating the rule must have access to the contract the list belongs to. Additionally, the contract the list belongs to must be the same contract as the contract the set belongs to (a rule is always part of a set). See the Contract Guide for more information on this matter.
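The difference between the two forms, including the contract restriction, can be sketched as follows. The in-memory list store, the entries and contractId properties on a list, and the resolveEntries helper are assumptions made for illustration; only value and listId come from this guide.

```javascript
// Hypothetical in-memory list store keyed by list resource ID.
const lists = new Map([
    ['57287ccea1e26b9b67362f92', { contractId: 'contract-1', entries: ['Justin Bieber', 'Bob Saget'] }],
]);

// Return the entries a rule operates on, from `value` or from `listId`.
function resolveEntries(rule, set, listStore) {
    if (Array.isArray(rule.value)) {
        return rule.value;
    }
    if (rule.listId) {
        const list = listStore.get(rule.listId);
        if (!list) {
            throw new Error('Unknown list: ' + rule.listId);
        }
        // The list must belong to the same contract as the set.
        if (list.contractId !== set.contractId) {
            throw new Error('List and set belong to different contracts');
        }
        return list.entries;
    }
    throw new Error('Rule needs either a value array or a listId');
}

const set = { contractId: 'contract-1' };
const byArray = { field: 'normalized.name', operator: 'in', value: ['a', 'b'], not: false };
const byList = { field: 'normalized.name', operator: 'in', listId: '57287ccea1e26b9b67362f92', not: false };

console.log(resolveEntries(byArray, set, lists)); // [ 'a', 'b' ]
console.log(resolveEntries(byList, set, lists));  // [ 'Justin Bieber', 'Bob Saget' ]
```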

What is a List

A list is a collection of entries that can be used as rule values. Lists may only be used in combination with the in or patternin operators. For more information see using in operators.

What is a Set

A filter set groups multiple rules together and allows you to specify special logic to only evaluate its rules when a precondition is met. This allows for more efficient and fine-grained filtering. For example, you might only want to apply a set of rules when a content item originates from a specific (social) media provider.
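A precondition check might look like the sketch below. The setApplies helper is hypothetical, only the equals operator is handled, and a set without a precondition is assumed to apply to every content item.

```javascript
// Only `equals` is handled here to keep the sketch short.
function evaluateRule(rule, item) {
    const result = item[rule.field] === rule.value;
    return rule.not ? !result : result;
}

// The rules of a set are only considered when its precondition holds.
// A set without a precondition is assumed to apply to every item.
function setApplies(set, item) {
    return set.preCondition ? evaluateRule(set.preCondition, item) : true;
}

const twitterSet = {
    preCondition: { operator: 'equals', field: 'service', value: 'twitter', not: false },
    rules: [],
};

console.log(setApplies(twitterSet, { service: 'twitter' }));   // true
console.log(setApplies(twitterSet, { service: 'instagram' })); // false
```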

What is a Chain

A filter chain groups multiple sets together. In essence the chain is referenced whenever an entity wants to use a filter, i.e. an input, collection or contract.

How do Chains, Sets and Lists fit together

A chain can reference one or more sets, and a rule (the set precondition is also a rule!) can reference a list. The chain can then be used by an input, collection and/or contract by specifying the chain resource ID as the value for the filterId property.
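The references between these resources can be pictured with a few plain objects. The shapes are hypothetical: the sets property on a chain and the entries property on a list are assumed names, while filterId, listId, preCondition, rules and or appear in this guide.

```javascript
// Hypothetical resources, stored in Maps keyed by resource ID.
const lists = new Map([
    ['list-1', { _id: 'list-1', entries: ['/coca cola/i', '/fanta/i'] }],
]);

const sets = new Map([
    ['set-1', {
        _id: 'set-1',
        preCondition: { field: 'service', operator: 'equals', value: 'twitter', not: false },
        rules: [{ field: 'text', operator: 'patternin', listId: 'list-1', not: false }],
        or: false,
    }],
]);

const chains = new Map([
    ['chain-1', { _id: 'chain-1', sets: ['set-1'] }],
]);

// An input, collection or contract points at the chain through filterId.
const input = { _id: 'input-1', filterId: 'chain-1' };

// Walk chain -> sets -> rules (including the precondition) -> lists.
function listIdsUsedBy(entity) {
    const chain = chains.get(entity.filterId);
    if (!chain) return [];
    const ids = new Set();
    for (const setId of chain.sets) {
        const set = sets.get(setId);
        if (!set) continue;
        const rules = set.preCondition ? [set.preCondition, ...set.rules] : set.rules;
        for (const rule of rules) {
            if (rule.listId) ids.add(rule.listId);
        }
    }
    return [...ids];
}

console.log(listIdsUsedBy(input)); // [ 'list-1' ]
console.log(listIdsUsedBy(input).map((id) => lists.get(id).entries));
```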

[Figure: Chains, Sets and Lists Relation]

How are Rules Processed

Whenever a filter rule evaluates to true, the content item the rule is applied to is rejected.

Rules are processed individually per set and by default, processing of a set stops when a rule evaluates to true, because the content item is then rejected and there is no need to evaluate the other rules. However, this functionality can be altered by the or property defined on a filter set. For more information see how to blacklist and how to whitelist, where manipulating the or property is explored.
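The two processing modes can be sketched as follows. This reflects one reading of the text, not a confirmed implementation: with or set to false the first rule that evaluates to true rejects the item and stops processing, and with or set to true all rules are evaluated and the item is rejected only if every rule evaluates to true. The processSet helper and the representation of rules as functions are hypothetical.

```javascript
// Each rule is represented by a function returning true ("reject") or false.
function processSet(ruleChecks, or = false) {
    const results = [];
    for (const check of ruleChecks) {
        const result = check();
        results.push(result);
        // Default mode: the first rule that evaluates to true rejects the item.
        if (!or && result) break;
    }
    const rejected = or
        ? results.length > 0 && results.every(Boolean)
        : results.some(Boolean);
    return { rejected, evaluated: results.length };
}

const rules = [() => true, () => false];

console.log(processSet(rules, false)); // { rejected: true, evaluated: 1 }
console.log(processSet(rules, true));  // { rejected: false, evaluated: 2 }
```

The evaluated count shows the short-circuit: in the default mode the second rule is never run.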

How to Blacklist

A filter chain can support blacklisting functionality (on a set level) by setting the or property of a set to false, which is the default value.

Set Representation

We can create the following filter set to blacklist the user Justin Bieber from all Twitter content:

{
    "_id": "565d4de4056f859526d53389",
    "created": "2016-05-03T10:26:22.009Z",
    "lastModified": "2016-05-03T10:26:22.009Z",
    "lastModifiedBy": "John Doe",
    "createdBy": "John Doe",
    "contractId": "565c4df4056e859526e62257",
    "name": "Twitter Blacklist",
    "preCondition": {
        "operator": "equals",
        "field": "service",
        "value": "twitter",
        "not": false
    },
    "active": true,
    "rules": [
        {
            "operator": "equals",
            "field": "name",
            "value": "Justin Bieber",
            "not": false
        }
    ],
    "or": false
}

Behavior

To blacklist, the rules of a set must be processed until the first rule that evaluates to true (rejecting the content item). After that there is no need to evaluate the other rules, because the content item has already been rejected. This functionality is provided by setting the or property of the set to false, which is the default value.

Rule Evaluation

{
    "operator": "equals",
    "field": "name",
    "value": "Justin Bieber",
    "not": false
}

The rule above dictates that every content item whose name field equals the value Justin Bieber must be rejected.

Therefore, if we evaluate the following content:

[
    {
        "name": "Chuck Norris"
    },

    {
        "name": "Justin Bieber"
    }
]

The following will happen during the filter process:

1. Evaluate the Rule for the first Content Item:
  1. Does the content item have a field name where the value is Justin Bieber?
    • no: rule evaluates to false
  2. Does the rule have the not property set to true?
    • no: the rule still evaluates to false
  3. Will the content item be rejected?
    • no: the rule evaluates to false

2. Evaluate the Rule for the second Content Item:
  1. Does the content item have a field name where the value is Justin Bieber?
    • yes: rule evaluates to true
  2. Does the rule have the not property set to true?
    • no: the rule still evaluates to true
  3. Will the content item be rejected?
    • yes: the rule evaluates to true

The content queue would therefore contain the following accepted content after filtering:

[
    {
        "name": "Chuck Norris"
    }
]
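The blacklist walkthrough above can be reproduced with a small sketch. It assumes the rule targets the name field used by the example content; the evaluateRule and filterContent helpers are hypothetical and only cover the equals operator.

```javascript
// Evaluate an `equals` rule; true means "reject".
function evaluateRule(rule, item) {
    const matches = item[rule.field] === rule.value;
    return rule.not ? !matches : matches;
}

// Blacklist mode (or: false): reject as soon as one rule evaluates to true.
function filterContent(items, rules) {
    return items.filter((item) => !rules.some((rule) => evaluateRule(rule, item)));
}

const rules = [{ operator: 'equals', field: 'name', value: 'Justin Bieber', not: false }];
const content = [{ name: 'Chuck Norris' }, { name: 'Justin Bieber' }];

console.log(filterContent(content, rules)); // [ { name: 'Chuck Norris' } ]
```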

How to Whitelist

A filter chain can support whitelisting functionality (on a set level) by setting the or property of a set to true.

This means that an entire set is considered a whitelist and said set must only contain whitelisting rules.

Set Representation

{
    "_id": "565d4de4056f859526d53390",
    "created": "2016-05-03T10:26:22.009Z",
    "lastModified": "2016-05-03T10:26:22.009Z",
    "lastModifiedBy": "John Doe",
    "createdBy": "John Doe",
    "contractId": "565c4df4056e859526e62257",
    "name": "Twitter Whitelist",
    "preCondition": {
        "operator": "equals",
        "field": "service",
        "value": "twitter",
        "not": false
    },
    "active": true,
    "rules": [
        {
            "operator": "equals",
            "field": "name",
            "value": "Steven Seagal",
            "not": true
        },

        {
            "operator": "patternin",
            "field": "text",
            "value": [ "/apples/", "/bananas/" ],
            "not": true
        }
    ],
    "or": true
}

Behavior

To whitelist, all rules of a set must be processed, because even though a rule evaluates to true (normally immediately rejecting the content item), a different rule might evaluate to false (accepting the content item). This functionality is provided by setting the or property of the set to true.

Rule Evaluation

[
    {
        "operator": "equals",
        "field": "name",
        "value": "Steven Seagal",
        "not": true
    },

    {
        "operator": "patternin",
        "field": "text",
        "value": [ "/apples/", "/bananas/" ],
        "not": true
    }
]

The rules above dictate that every content item whose name field equals the value Steven Seagal, or whose text field contains the value apples or bananas, must be accepted.

Therefore, if we evaluate the following content:

[
    {
        "name": "Chuck Norris",
        "text": "I love bananas!"
    }
]

The following will happen during the filter process:

1. Evaluate the first Rule:

{
    "operator": "equals",
    "field": "name",
    "value": "Steven Seagal",
    "not": true
}
  1. Does the content item have a field name where the value is Steven Seagal?
    • no: rule evaluates to false
  2. Does the rule have the not property set to true?
    • yes: flip the evaluated rule value, the rule now evaluates to true
  3. Will the content item be rejected?
    • yes: the rule evaluates to true

Even though the first rule says the content item must be rejected because the name is not 'Steven Seagal', we can't stop processing the set, because the next (whitelisting) rule might apply. This behavior is enabled by setting the or property of the set to true.

2. Evaluate the second Rule:

{
    "operator": "patternin",
    "field": "text",
    "value": [ "/apples/", "/bananas/" ],
    "not": true
}
  1. Does the content item have a field text that contains either the value apples or bananas?
    • yes: rule evaluates to true
  2. Does the rule have the not property set to true?
    • yes: flip the evaluated rule value, the rule now evaluates to false
  3. Will the content item be rejected?
    • no: the rule evaluates to false

The content queue would therefore contain the following (accepted) content after filtering:

[
    {
        "name": "Chuck Norris",
        "text": "I love bananas!"
    }
]
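The whitelist walkthrough above can be traced with a short sketch. It assumes that a set with or set to true rejects an item only when every rule evaluates to true, that the first rule targets the name field of the example content, and that patterns are written in the '/pattern/flags' notation. The helper names are hypothetical.

```javascript
// Turn a string like '/apples/' into a RegExp (assumed pattern notation).
function toRegExp(pattern) {
    const match = /^\/(.*)\/([a-z]*)$/.exec(pattern);
    return match ? new RegExp(match[1], match[2]) : new RegExp(pattern);
}

// Evaluate a rule; true means "reject". Only two operators are covered.
function evaluateRule(rule, item) {
    const actual = item[rule.field];
    let result;
    if (rule.operator === 'equals') {
        result = actual === rule.value;
    } else if (rule.operator === 'patternin') {
        result = rule.value.some((p) => toRegExp(p).test(String(actual)));
    } else {
        throw new Error('Unsupported operator: ' + rule.operator);
    }
    return rule.not ? !result : result;
}

// Whitelist mode (or: true): reject only when every rule evaluates to true.
function isRejected(item, rules) {
    return rules.every((rule) => evaluateRule(rule, item));
}

const rules = [
    { operator: 'equals', field: 'name', value: 'Steven Seagal', not: true },
    { operator: 'patternin', field: 'text', value: ['/apples/', '/bananas/'], not: true },
];
const item = { name: 'Chuck Norris', text: 'I love bananas!' };

console.log(rules.map((rule) => evaluateRule(rule, item))); // [ true, false ]
console.log(isRejected(item, rules)); // false: the item is accepted
```

An item that matches neither whitelist rule, for example one by Chuck Norris about pears, makes both rules evaluate to true and is therefore rejected.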

Managing Chains, Sets and Lists in the Control API