Working with Filters¶
In this guide we explore how to use the Control API to manage filters.
What are Filters¶
When content is retrieved from (social) media services, we usually want extra control over what is collected from those services. Filters allow us to do this automatically. For example, we might want to reject all content that contains profanity (blacklisting). Conversely, we might want to accept only content that matches certain criteria (whitelisting).
After filtering, accepted content can be viewed and moderated in the so-called content queue. Optionally, this content can be published as a so-called publication content feed. See the Publication Guide for more information.
How does Content flow through a Filter¶
A filter can be used on three levels:
- Contract: when a filter is set on contract level, all content resulting from all collections that belong to that contract (and therefore from all inputs grouped by those collections!) will be controlled by that filter.
- Collection: when a filter is set on collection level, only the content resulting from that single collection (and therefore from all inputs grouped by that collection!) will be controlled by that filter.
- Input: when a filter is set on input level, only the content resulting from that single input will be controlled by that filter.
The filter of a higher level (for example contract level) can be overridden by a filter on a lower level (for example collection- or input level).
The use of a filter is optional. If no filter is used on input, collection or contract level, all content resulting from the inputs grouped by a collection in a contract will be unfiltered. If, however, a filter is used (on any level), the content queue will only contain content items that are accepted by that filter.
See the Collection Guide and the Contract Guide for more information.
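The override order described above can be sketched as a lookup that walks from the most specific level to the least specific one. This is only an illustration: it assumes each entity is a plain dictionary that may carry the filterId property described later in this guide, and the function name and chain IDs are hypothetical.

```python
from typing import Optional


def effective_filter_id(input_: dict, collection: dict, contract: dict) -> Optional[str]:
    """Return the filter chain ID that applies to an input, or None if unfiltered."""
    # Lower levels override higher ones: input first, then collection, then contract.
    for entity in (input_, collection, contract):
        filter_id = entity.get("filterId")
        if filter_id:
            return filter_id
    return None  # no filter on any level: content stays unfiltered


# Hypothetical IDs, for illustration only.
contract = {"filterId": "chain-for-contract"}
collection = {"filterId": "chain-for-collection"}

print(effective_filter_id({}, collection, contract))
print(effective_filter_id({"filterId": "chain-for-input"}, collection, contract))
print(effective_filter_id({}, {}, {}))
```

An input without its own filter falls back to the collection filter, and only when neither is set does the contract filter apply.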
The Anatomy of a Filter¶
We need to define filter rules to tell a filter when it needs to reject or accept a content item. Because filtering is usually a complex process we allow multiple rules to be used by a filter to offer as much flexibility as possible. This is accomplished by grouping filter rules together in a filter set. In turn these sets are grouped together by a filter chain, which in essence is the (entire) filter.
What is a Rule¶
A rule operates on a content item by applying logic to it. To do this, the rule needs to know the field of the content item to which it applies that logic. It can then evaluate the provided rule value to determine whether the content item in question should be rejected or accepted. For example, the equals operator can be used to check if the name field of a content item equals the value John Doe. If this rule evaluates to true, i.e. the name of the content item equals the value John Doe, the content item will be rejected, i.e. filtered.
Besides the mentioned properties, a filter rule also has the not property. By default this value is set to false, but when set to true, the outcome of the rule is flipped. To reuse the rule example above, this would mean that only content items where the name field equals the value John Doe will be accepted. For more information see how to whitelist.
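As a rough sketch of the behavior described above, the snippet below evaluates an equals rule and then applies the not property. The function name evaluate_equals_rule and the flat content items are assumptions made for this example; the actual evaluation inside the service may differ.

```python
def evaluate_equals_rule(rule: dict, item: dict) -> bool:
    """Return True when the rule says the content item must be rejected."""
    matched = item.get(rule["field"]) == rule["value"]
    # The not property (false by default) flips the outcome of the rule.
    if rule.get("not", False):
        return not matched
    return matched


rule = {"field": "name", "operator": "equals", "value": "John Doe", "not": False}
john = {"name": "John Doe"}
jane = {"name": "Jane Roe"}

print(evaluate_equals_rule(rule, john))  # True: John Doe is rejected
print(evaluate_equals_rule(rule, jane))  # False: Jane Roe is accepted

flipped = {**rule, "not": True}
print(evaluate_equals_rule(flipped, john))  # False: John Doe is now accepted
print(evaluate_equals_rule(flipped, jane))  # True: everyone else is rejected
```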
Operator Overview¶
The example use cases below all assume default filter rule logic, where the not property of the rule is set to false.
OPERATOR | USED WHEN | EXAMPLE USE CASE |
---|---|---|
equals | A content item field must exactly match the rule value. | If a content item should be rejected when its name field matches the string Justin Bieber. |
gt | A content item field must be greater than the rule value. | If a content item should be rejected when its followers field is greater than 10000. |
gte | A content item field must be greater than or equal to the rule value. | If a content item should be rejected when its followers field is equal to 10000 or more. |
lt | A content item field must be less than the rule value. | If a content item should be rejected when its followers field is less than 10. |
lte | A content item field must be less than or equal to the rule value. | If a content item should be rejected when its followers field is equal to 10 or less. |
pattern | A content item field must exactly match the pattern of the rule value. This pattern must be a valid regular expression. | If a content item should be rejected when its text field matches the pattern /[\d]/. |
in | A content item field must exactly match a single entry in the rule value. For more information see using in operators. | If a content item should be rejected when its name field exactly matches a single entry in the array [ 'Justin Bieber', 'Bob Saget' ]. |
patternin | A content item field must exactly match a single pattern entry in the rule value. This pattern must be a valid regular expression. For more information see using in operators. | If a content item should be rejected when its text field exactly matches a single entry in the array [ '/coca cola/i', '/fanta/i', '/sprite/i' ]. |
datediff | A content item date field must be within the bounds of the rule value. | If a content item should be rejected when it is older than 86400 seconds (24 hours). |
exists | A content item field must exist as a property of the content item. | If a content item should be rejected when it has a media field. |
Rule examples by Operator¶
RULE OPERATOR | RULE VALUE | RULE EXAMPLE |
---|---|---|
equals | String | { field: 'normalized.name', operator: 'equals', value: 'Justin Bieber', not: false } |
gt | Number | { field: 'normalized.followerCount', operator: 'gt', value: 10000, not: false } |
gte | Number | { field: 'normalized.followerCount', operator: 'gte', value: 10000, not: false } |
lt | Number | { field: 'normalized.followerCount', operator: 'lt', value: 10, not: false } |
lte | Number | { field: 'normalized.followerCount', operator: 'lte', value: 10, not: false } |
pattern | String | { field: 'normalized.text', operator: 'pattern', value: '/\d/', not: false } |
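in | Array | { field: 'normalized.name', operator: 'in', value: [ 'Justin Bieber', 'Bob Saget' ], not: false } |
patternin | Array | { field: 'normalized.text', operator: 'patternin', value: [ '/coca cola/i', '/fanta/i', '/sprite/i' ], not: false } |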
datediff | Number | { field: 'normalized.date', operator: 'datediff', value: 86400, not: false } |
exists | (no value) | { field: 'normalized.media', operator: 'exists', not: false } |
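The operators in the tables above could be modelled roughly as follows. This is a sketch under assumptions, not the implementation used by the service: the function names are hypothetical, patterns are assumed to be written as /body/flags with only the i flag handled, pattern matching is assumed to search anywhere in the field value, a missing field is assumed never to match, and datediff is left out because its exact semantics are not spelled out here.

```python
import operator
import re

COMPARISONS = {
    "equals": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def compile_pattern(text: str):
    """Compile a '/body/flags' string; assumes the value is slash-delimited."""
    body, _, flags = text[1:].rpartition("/")
    return re.compile(body, re.IGNORECASE if "i" in flags else 0)


def operator_matches(op: str, item: dict, field: str, rule_value=None) -> bool:
    """Return True when the field of the item satisfies the operator."""
    if op == "exists":
        return field in item
    if field not in item:
        return False  # assumption: a missing field never matches
    value = item[field]
    if op in COMPARISONS:
        return COMPARISONS[op](value, rule_value)
    if op == "in":
        return value in rule_value
    if op == "pattern":
        return compile_pattern(rule_value).search(value) is not None
    if op == "patternin":
        return any(compile_pattern(p).search(value) for p in rule_value)
    raise ValueError(f"unsupported operator: {op}")


item = {"name": "Bob Saget", "followers": 12000, "text": "I like Fanta"}
print(operator_matches("gt", item, "followers", 10000))
print(operator_matches("in", item, "name", ["Justin Bieber", "Bob Saget"]))
print(operator_matches("patternin", item, "text", ["/coca cola/i", "/fanta/i"]))
print(operator_matches("exists", item, "media"))
```

The not property is deliberately not handled here; it would be applied to the boolean that operator_matches returns.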
Using in Operators¶
The in and patternin operators expect to operate on multiple values when evaluating a rule. These so-called entries must always be string elements, but we distinguish between regular strings (used by the in operator) and pattern strings (used by the patternin operator). For example, [ 'a', 'b', 'c' ] might be a value used by the in operator and [ '/[a-f]/', '/\d/' ] by the patternin operator. Note that patterns must always be valid regular expressions.
The entries of the rule value must either be part of an array or an external list. Depending on whether an array or an external list is used, the rule will have different properties set:
- Array of entries: the rule has a value property, which must be an Array.
- List of entries: the rule has no value property but a listId property instead, which must be a valid resource ID.
The benefit of using lists over arrays is that lists can be reused across rules. However, keep in mind that when a listId is provided, the consumer creating the rule must have access to the contract the list belongs to. Additionally, the list must belong to the same contract as the set (a rule is always part of a set). See the Contract Guide for more information on this matter.
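The two ways of supplying entries can be illustrated with a small resolver. Everything about how a list is stored here is an assumption: the lists dictionary, its entries and contractId keys, and the function name are hypothetical and only serve to show the value versus listId distinction and the same-contract constraint.

```python
def resolve_entries(rule: dict, set_contract_id: str, lists: dict) -> list:
    """Return the entries an in or patternin rule should be evaluated against."""
    if "listId" in rule:
        filter_list = lists[rule["listId"]]
        # The list must belong to the same contract as the set holding the rule.
        if filter_list["contractId"] != set_contract_id:
            raise ValueError("list and set belong to different contracts")
        return filter_list["entries"]
    return rule["value"]  # inline array of entries


# Hypothetical stored list, keyed by its resource ID.
lists = {
    "list-1": {"contractId": "contract-a", "entries": ["Justin Bieber", "Bob Saget"]},
}

inline_rule = {"field": "name", "operator": "in", "value": ["a", "b", "c"]}
list_rule = {"field": "name", "operator": "in", "listId": "list-1"}

print(resolve_entries(inline_rule, "contract-a", lists))
print(resolve_entries(list_rule, "contract-a", lists))
```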
What is a List¶
A list is a collection of entries that can be used as rule values. Lists may only be used in combination with the in or patternin operators. For more information see using in operators.
What is a Set¶
A filter set groups multiple rules together and allows you to specify a precondition, so that its rules are only evaluated when that precondition is met. This allows for more efficient and fine-grained filtering. For example, you might only want to apply a set of rules when a content item originates from a specific (social) media provider.
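A precondition can be thought of as a gate in front of the rules of a set. The sketch below uses the preCondition and active properties that appear in the set representations later in this guide; treating an inactive set as one that is skipped, handling only the equals operator, and the function name itself are assumptions.

```python
def set_applies(filter_set: dict, item: dict) -> bool:
    """Return True when the rules of the set should be evaluated for the item."""
    if not filter_set.get("active", True):
        return False  # assumption: inactive sets are skipped
    pre = filter_set.get("preCondition")
    if pre is None:
        return True  # no precondition: the set always applies
    matched = item.get(pre["field"]) == pre["value"]  # equals operator only
    return matched != pre.get("not", False)


twitter_set = {
    "name": "Twitter Blacklist",
    "active": True,
    "preCondition": {
        "operator": "equals",
        "field": "service",
        "value": "twitter",
        "not": False,
    },
    "rules": [],
}

print(set_applies(twitter_set, {"service": "twitter"}))
print(set_applies(twitter_set, {"service": "instagram"}))
```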
What is a Chain¶
A filter chain groups multiple sets together. The chain is what an entity (an input, collection or contract) references whenever it wants to use a filter.
How do Chains, Sets and Lists fit together¶
A chain can have references to one or more sets, and a rule (the set precondition is also a rule!) can have a reference to a list. The chain can then be used by an input, collection and/or contract by specifying the chain resource ID as the value for the filterId property.
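To make the references concrete, the sketch below follows them from an entity to its chain, from the chain to its sets, and from the rules (including the precondition) to any lists. The filterId, preCondition, rules and listId names come from this guide; the setIds property on the chain, the in-memory dictionaries and all resource IDs are hypothetical.

```python
def lists_used_by_filter(entity: dict, chains: dict, filter_sets: dict) -> set:
    """Collect the list IDs reachable from the chain an entity references."""
    chain = chains[entity["filterId"]]
    list_ids = set()
    for set_id in chain["setIds"]:  # hypothetical property name
        filter_set = filter_sets[set_id]
        # The precondition is a rule too, so it may reference a list as well.
        candidates = [filter_set.get("preCondition"), *filter_set.get("rules", [])]
        for rule in candidates:
            if rule and "listId" in rule:
                list_ids.add(rule["listId"])
    return list_ids


chains = {"chain-1": {"setIds": ["set-1", "set-2"]}}
filter_sets = {
    "set-1": {
        "preCondition": {"field": "service", "operator": "in", "listId": "list-services"},
        "rules": [{"field": "name", "operator": "in", "listId": "list-names"}],
    },
    "set-2": {
        "rules": [{"field": "text", "operator": "patternin", "value": ["/spam/i"]}],
    },
}
collection = {"filterId": "chain-1"}

print(sorted(lists_used_by_filter(collection, chains, filter_sets)))
```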
How are Rules Processed¶
Whenever a filter rule evaluates to true, the content item the rule is applied to is rejected.
Rules are processed individually per set. By default, processing of a set stops when a rule evaluates to true, because the content item is then rejected and there is no need to evaluate the other rules. However, this behavior can be altered by the or property defined on a filter set. For more information see how to blacklist and how to whitelist, where manipulating the or property is explored.
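The effect of the or property can be summarized in a few lines. This is a minimal sketch based on the blacklist and whitelist behavior described in this guide: it takes the already evaluated outcome of each rule (True meaning reject) and the function name is an assumption.

```python
from typing import Iterable


def set_rejects(rule_results: Iterable[bool], use_or: bool = False) -> bool:
    """Decide whether a set rejects an item, given each rule outcome (True = reject)."""
    if use_or:
        # or = true: a single rule evaluating to False is enough to accept the
        # item, so the set only rejects when every rule evaluated to True.
        return all(rule_results)
    # or = false (default): the first rule evaluating to True rejects the item;
    # any() stops consuming results at that point.
    return any(rule_results)


print(set_rejects([False, True]))               # True: one blacklist rule matched
print(set_rejects([False, False]))              # False: no blacklist rule matched
print(set_rejects([True, False], use_or=True))  # False: one whitelist rule accepted
print(set_rejects([True, True], use_or=True))   # True: no whitelist rule accepted
```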
How to Blacklist¶
A filter chain can support blacklisting functionality (on a set level) by setting the or property of a set to false, which is the default value.
Set Representation¶
We can create the following filter set to blacklist the user Justin Bieber from all Twitter content:
{
  "_id": "565d4de4056f859526d53389",
  "created": "2016-05-03T10:26:22.009Z",
  "lastModified": "2016-05-03T10:26:22.009Z",
  "lastModifiedBy": "John Doe",
  "createdBy": "John Doe",
  "contractId": "565c4df4056e859526e62257",
  "name": "Twitter Blacklist",
  "preCondition": {
    "operator": "equals",
    "field": "service",
    "value": "twitter",
    "not": false
  },
  "active": true,
  "rules": [
    {
      "operator": "equals",
      "field": "name",
      "value": "Justin Bieber",
      "not": false
    }
  ],
  "or": false
}
Behavior¶
To blacklist, the rules of a set must be processed until the first rule that evaluates to true (rejecting the content item). After that there is no need to evaluate the other rules, because the content item has already been rejected. This functionality is provided by setting the or property of the set to false, which is the default value.
Rule Evaluation¶
{
  "operator": "equals",
  "field": "name",
  "value": "Justin Bieber",
  "not": false
}
The rule above dictates that every content item with a name field that equals the value Justin Bieber must be rejected.
If we evaluate the following content:
[
  {
    "name": "Chuck Norris"
  },
  {
    "name": "Justin Bieber"
  }
]
The following will happen during the filter process:
1. Evaluate the rule for the first content item:
   1. Does the content item have a name field where the value is Justin Bieber?
      * no: the rule evaluates to false
   2. Does the rule have the not property set to true?
      * no: the rule still evaluates to false
   3. Will the content item be rejected?
      * no: the rule evaluates to false
2. Evaluate the rule for the second content item:
   1. Does the content item have a name field where the value is Justin Bieber?
      * yes: the rule evaluates to true
   2. Does the rule have the not property set to true?
      * no: the rule still evaluates to true
   3. Will the content item be rejected?
      * yes: the rule evaluates to true
The content queue would therefore contain the following accepted content after filtering:
[
  {
    "name": "Chuck Norris"
  }
]
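The blacklist walkthrough can be reproduced with a few lines of code. This is a minimal sketch, not the service's implementation: the helper name rule_rejects is hypothetical, only the equals operator is handled, and the rule uses the name field so that it lines up with the sample content items.

```python
def rule_rejects(rule: dict, item: dict) -> bool:
    """Return True when the equals rule rejects the content item."""
    matched = item.get(rule["field"]) == rule["value"]
    return matched != rule["not"]  # not = true flips the outcome


rule = {"operator": "equals", "field": "name", "value": "Justin Bieber", "not": False}
content = [{"name": "Chuck Norris"}, {"name": "Justin Bieber"}]

# Keep only the items that the rule does not reject.
content_queue = [item for item in content if not rule_rejects(rule, item)]
print(content_queue)  # [{'name': 'Chuck Norris'}]
```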
How to Whitelist¶
A filter chain can support whitelisting functionality (on a set level) by setting the or property of a set to true.
This means that an entire set is considered a whitelist and said set must only contain whitelisting rules.
Set Representation¶
{
  "_id": "565d4de4056f859526d53390",
  "created": "2016-05-03T10:26:22.009Z",
  "lastModified": "2016-05-03T10:26:22.009Z",
  "lastModifiedBy": "John Doe",
  "createdBy": "John Doe",
  "contractId": "565c4df4056e859526e62257",
  "name": "Twitter Whitelist",
  "preCondition": {
    "operator": "equals",
    "field": "service",
    "value": "twitter",
    "not": false
  },
  "active": true,
  "rules": [
    {
      "operator": "equals",
      "field": "name",
      "value": "Steven Seagal",
      "not": true
    },
    {
      "operator": "patternin",
      "field": "text",
      "value": [ "/apples/", "/bananas/" ],
      "not": true
    }
  ],
  "or": true
}
Behavior¶
To whitelist, all rules of a set must be processed, because even though a rule evaluates to true (normally immediately rejecting the content item), a different rule might evaluate to false (accepting the content item). This functionality is provided by setting the or property of the set to true.
Rule Evaluation¶
[
  {
    "operator": "equals",
    "field": "name",
    "value": "Steven Seagal",
    "not": true
  },
  {
    "operator": "patternin",
    "field": "text",
    "value": [ "/apples/", "/bananas/" ],
    "not": true
  }
]
The rules above dictate that every content item with a name field that equals the value Steven Seagal, or with a text field that contains the value apples or bananas, must be accepted.
If we evaluate the following content:
[
  {
    "name": "Chuck Norris",
    "text": "I love bananas!"
  }
]
The following will happen during the filter process:
1. Evaluate the first rule:
   {
     "operator": "equals",
     "field": "name",
     "value": "Steven Seagal",
     "not": true
   }
   - Does the content item have a name field where the value is Steven Seagal?
     - no: the rule evaluates to false
   - Does the rule have the not property set to true?
     - yes: flip the evaluated rule value, the rule now evaluates to true
   - Will the content item be rejected?
     - yes: the rule evaluates to true
Even though the first rule says the content item must be rejected because the name is not 'Steven Seagal', we can't stop processing the set, because the next (whitelisting) rule might apply. This behavior is enabled by setting the or property of the set to true.
2. Evaluate the second rule:
   {
     "operator": "patternin",
     "field": "text",
     "value": [ "/apples/", "/bananas/" ],
     "not": true
   }
   - Does the content item have a text field that contains either the value apples or bananas?
     - yes: the rule evaluates to true
   - Does the rule have the not property set to true?
     - yes: flip the evaluated rule value, the rule now evaluates to false
   - Will the content item be rejected?
     - no: the rule evaluates to false
The content queue would therefore contain the following (accepted) content after filtering:
[
  {
    "name": "Chuck Norris",
    "text": "I love bananas!"
  }
]
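The whitelist walkthrough can be checked in the same way. The sketch below is illustrative only: rule_rejects is a hypothetical helper, only equals and patternin are handled, patterns are assumed to be written as /body/ without flags, and a pattern is assumed to match when it is found anywhere in the field value.

```python
import re


def rule_rejects(rule: dict, item: dict) -> bool:
    """Return True when the rule rejects the content item."""
    value = item.get(rule["field"], "")
    if rule["operator"] == "equals":
        matched = value == rule["value"]
    else:  # patternin: strip the surrounding slashes and search the text
        matched = any(re.search(p.strip("/"), value) for p in rule["value"])
    return matched != rule["not"]  # not = true flips the outcome


rules = [
    {"operator": "equals", "field": "name", "value": "Steven Seagal", "not": True},
    {"operator": "patternin", "field": "text", "value": ["/apples/", "/bananas/"], "not": True},
]
item = {"name": "Chuck Norris", "text": "I love bananas!"}

results = [rule_rejects(rule, item) for rule in rules]
print(results)  # [True, False]

# With or = true, one rule evaluating to False is enough to accept the item.
accepted = not all(results)
print(accepted)  # True
```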
Managing Chains, Sets and Lists in the Control API¶
- Create a filter chain
- Get a filter chain
- Get all filter chains
- Update a filter chain
- Remove filter chain(s)
- Create a filter set
- Get a filter set
- Get all filter sets
- Update a filter set
- Remove filter set(s)
- Create a filter list
- Get a filter list
- Get all filter lists
- Update a filter list
- Remove filter list(s)