Indicators
The toolbox provides a repository of predefined indicators and simple tools to add new, custom indicators.
Overview
Structure
By default, the central unit of observation is the forum thread, which can be part of one or multiple sub-forums. We propose that a thread is the observable trace of a sequence of interactions that can represent or display innovation activities. Each post in a thread is a contribution by a community member (the contributor). We single out the initial contribution, as we assume that XYZ. A community consists of a number of contributors and their contributions in threads, which are organized in sub-forums.
Networks
Additional levels of observation are the networks formed by contributors. The two main representations are the co-contributor network and the commenter network.
co-contributor network
-
The co-contributor network is an undirected graph where each node represents a contributor. A pair of contributors is connected by an edge if both have contributed in at least one shared thread. The edge weight is proportional to the number of threads both contributed to.
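The definition above can be illustrated with a minimal standard-library sketch. It assumes posts are available as (thread id, contributor) pairs, which is a hypothetical layout chosen for illustration and not the toolbox's data model; the function name `co_contributor_edges` is likewise made up. Edge weights here are the raw counts of shared threads.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_contributor_edges(posts):
    """Return {(a, b): number of shared threads} for (thread_id, contributor) pairs.

    Each key is an alphabetically sorted pair, so the edge is undirected.
    """
    contributors_by_thread = defaultdict(set)
    for thread_id, contributor in posts:
        contributors_by_thread[thread_id].add(contributor)

    weights = Counter()
    for contributors in contributors_by_thread.values():
        # every unordered pair of contributors in a thread shares that thread
        for a, b in combinations(sorted(contributors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# hypothetical example data
posts = [
    ("t1", "alice"), ("t1", "bob"), ("t1", "alice"),
    ("t2", "alice"), ("t2", "bob"), ("t2", "carol"),
]
edges = co_contributor_edges(posts)
print(edges[("alice", "bob")])  # 2, because alice and bob share threads t1 and t2
```

Repeated posts by the same contributor in one thread do not increase the weight, since the weight counts threads rather than posts.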
commenter network
-
The commenter network is a directed graph where each node represents a contributor. A contributor A has an edge pointing to contributor B if A has contributed in at least one thread where contributor B has posted the initial contribution (A has "commented"). The weight of edge A→B is proportional to the total number of comments A made on initial contributions of B.
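A comparable sketch for the directed case follows. It assumes each thread is given as an ordered list of contributor names whose first entry is the initial contributor (a hypothetical layout), and it assumes that replies by the initial contributor in their own thread do not count as comments. The function name `commenter_edges` is illustrative only.

```python
from collections import Counter


def commenter_edges(threads):
    """Return {(commenter, initial_contributor): number of comments}.

    threads: list of threads, each an ordered list of contributor names,
    where the first name is the author of the initial contribution.
    """
    weights = Counter()
    for posts in threads:
        if not posts:
            continue
        initial_contributor = posts[0]
        for contributor in posts[1:]:
            # assumption: posts by the initial contributor are not comments
            if contributor != initial_contributor:
                weights[(contributor, initial_contributor)] += 1
    return dict(weights)


# hypothetical example data
threads = [
    ["bob", "alice", "alice", "bob"],
    ["bob", "alice", "carol"],
    ["alice", "bob"],
]
comment_edges = commenter_edges(threads)
print(comment_edges[("alice", "bob")])  # 3 comments by alice on bob's initial contributions
```

Unlike the co-contributor network, the edge (alice, bob) and the edge (bob, alice) are distinct and carry separate weights.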
Levels of observation
Each indicator is observed on the contributor, contribution, thread, or community level. Aggregations of indicators are provided for higher levels. For example, the number of contributions made by a contributor can be an indicator of their role in the community. On thread level, this indicator would be provided as, e.g., the average number of contributions per contributor, measured over all contributors that have contributed to a specific thread. Common aggregations are mean, sum, concatenation, standard deviation, min, and max. Other aggregations (or transformations) can also pick out a single one of multiple values. On thread level, an example would be the total number of contributions made by the author of the initial contribution.
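The following sketch shows how a contributor-level indicator could be lifted to thread level, both by aggregating and by picking out a single value. It assumes posts are stored as (thread id, position in thread, contributor) tuples; this layout and the name `thread_aggregates` are assumptions made for illustration, not part of the toolbox.

```python
from collections import Counter
from statistics import mean

# hypothetical example data: (thread_id, position_in_thread, contributor)
posts = [
    ("t1", 0, "alice"), ("t1", 1, "bob"),
    ("t2", 0, "bob"), ("t2", 1, "carol"), ("t2", 2, "bob"),
]

# contributor-level indicator: number of contributions in the whole community
contributions_per_contributor = Counter(c for _, _, c in posts)


def thread_aggregates(thread_id):
    """Aggregate the contributor-level indicator for one thread."""
    thread_posts = sorted(p for p in posts if p[0] == thread_id)
    contributors = {c for _, _, c in thread_posts}
    values = [contributions_per_contributor[c] for c in contributors]
    initial_contributor = thread_posts[0][2]  # post at position 0
    return {
        "mean contributions per contributor": mean(values),
        "max contributions per contributor": max(values),
        # transformation that picks a single value instead of aggregating
        "contributions by initial contributor":
            contributions_per_contributor[initial_contributor],
    }


t1 = thread_aggregates("t1")
t2 = thread_aggregates("t2")
```

In this data, alice has one contribution and bob has three, so thread t1 gets a mean of 2 and a maximum of 3, while the value picked for its initial contributor (alice) is 1.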
The thread observation level can be subdivided into three subsets of posts: the whole thread, only the initial contribution, or only the feedback (all posts that were not authored by the initial contributor). This helps to implement indicators that rely on more detailed concepts, such as ideas in idea communities or questions in question communities, which can potentially be operationalized by the initial contribution.
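As a small illustration of these three subsets, the sketch below splits one thread accordingly. It assumes a thread is an ordered list of (contributor, text) pairs; both this layout and the name `thread_subsets` are hypothetical.

```python
def thread_subsets(thread_posts):
    """Split a thread into the whole thread, the initial contribution, and the feedback.

    thread_posts: non-empty list of (contributor, text) pairs in posting order.
    """
    initial = thread_posts[0]
    initial_contributor = initial[0]
    # feedback: all posts that were not authored by the initial contributor
    feedback = [p for p in thread_posts[1:] if p[0] != initial_contributor]
    return {
        "thread": list(thread_posts),
        "initial": [initial],
        "feedback": feedback,
    }


# hypothetical example data
thread = [
    ("alice", "How do I calibrate the sensor?"),
    ("bob", "Which model do you have?"),
    ("alice", "The first revision."),
    ("carol", "See the wiki page."),
]
subsets = thread_subsets(thread)
print(len(subsets["feedback"]))  # 2, the posts by bob and carol
```

Note that alice's follow-up post belongs to the whole thread but to neither the initial contribution nor the feedback.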
[TODO: figure]
Data
In order to be able to apply indicators to a heterogeneous set of communities, most indicators rely on the following basic data that can be collected from most online forums:
- contribution metadata: contributor, date, associated thread, position in thread, sub-forum
- contribution content: text, html, extracted links, images
- contributor properties: name/id
- thread: title
Other data, such as likes/upvotes, friendship-relations, contributor location, thread views, etc., might be available in some communities and can be included in more specialized indicators.
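One way to picture this basic data is as a single record per contribution. The dataclass below is only a sketch: the class name `Contribution` and all field names are assumptions for illustration and do not reflect how the toolbox stores its data.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Contribution:
    """Hypothetical record holding the basic data of one contribution."""
    # contribution metadata
    contributor: str          # contributor name/id
    date: datetime
    thread_id: str
    thread_title: str
    position: int             # position in thread, 0 = initial contribution
    sub_forum: str
    # contribution content
    text: str = ""
    html: str = ""
    links: list = field(default_factory=list)
    images: list = field(default_factory=list)
    # optional, community-specific data such as likes or thread views
    extras: dict = field(default_factory=dict)


example = Contribution(
    contributor="alice",
    date=datetime(2024, 1, 1),
    thread_id="t1",
    thread_title="Sensor calibration",
    position=0,
    sub_forum="hardware",
)
```

Keeping the community-specific data in a separate `extras` mapping means that indicators relying only on the basic fields work for every community, while specialized indicators can check for the keys they need.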
Generating indicator values
Reports are groups of metrics. You can use reports if you would like to analyze a certain subset of metrics in one table, or if you would like to compare results for different metric parameters.
There are several pre-defined reports that can be generated by calling pici.reports.example_report().
A custom report can be defined by calling Pici.add_report with a list of metrics that should be included in the report. Each metric is listed as a tuple of metric function and metric arguments (as dict).
A report that shows the number of posts in total, per week, and per quarter could be defined like this:
from pici.metrics.basic import number_of_posts, agg_number_of_posts_per_interval

# p is an existing Pici instance
p.add_report(
    name='posts',
    list_of_metrics=[
        (number_of_posts, {}),
        (agg_number_of_posts_per_interval, {'interval': '1W'}),
        (agg_number_of_posts_per_interval, {'interval': '3M'})
    ]
)

# generate report:
p.reports.posts()
Defining new indicators
You can define custom metrics that can then be used just like the pre-defined metrics. A custom metric is implemented as a decorated function and needs to be added to the "metrics registry".
the @metric decorator
You can define a custom metric by implementing a function that has the @metric decorator from [pici.decorators][] and takes (at least) community as a parameter.
The decorator takes two arguments, level and returntype. The level determines which level of observation the metric describes, for example the whole community. The returntype states whether the metric returns values that can be displayed on a single row in a table (TABLE), values on multiple rows (DATAFRAME, for example one result for each thread in the community), or a single value (PLAIN, which can also be something other than a number).
A decorator for a metric that measures a single value for a community, for example the total number of contributors, would be defined like this:
from pici.decorators import metric
from pici.datatypes import CommunityDataLevel, MetricReturnType

@metric(
    level=CommunityDataLevel.COMMUNITY,
    returntype=MetricReturnType.TABLE
)
def total_number_of_contributors(community):
    # calculate something
    ...
A metric that measures multiple values per community, for example the number of posts in each topic, would use the decorator like this:
...
@metric(
    level=CommunityDataLevel.TOPICS,
    returntype=MetricReturnType.DATAFRAME
)
def number_of_posts_per_topic(community):
    # calculate something
    ...
More
- The level can be any value specified in CommunityDataLevel (COMMUNITY, TOPICS, POSTS, ...):
    - CommunityDataLevel.COMMUNITY: One value for the community
    - CommunityDataLevel.POSTS: One value for each post in the community
    - CommunityDataLevel.TOPICS: One value for each topic (thread) in the community
    - ...
- The returntype can be any value of MetricReturnType:
    - MetricReturnType.PLAIN
    - MetricReturnType.TABLE
    - MetricReturnType.DATAFRAME
return values
A metric can return one or multiple values (in a dictionary). If your returntype is TABLE or DATAFRAME, the metric's return values must be named entries in a dictionary. The columns in the resulting dataframe will then correspond to these names.
from pici.decorators import metric
from pici.datatypes import CommunityDataLevel, MetricReturnType

@metric(
    level=CommunityDataLevel.COMMUNITY,
    returntype=MetricReturnType.TABLE
)
def total_number_of_contributors(community):
    n_contributors = ...  # calculate something
    return {
        'number of contributors': n_contributors  # calculated value
    }
A use case for returning multiple values as dictionary entries is, for example, different aggregations:
from pici.decorators import metric
from pici.datatypes import CommunityDataLevel, MetricReturnType

@metric(
    level=CommunityDataLevel.COMMUNITY,
    returntype=MetricReturnType.TABLE
)
def posts_per_week(community):
    # calculate something
    mean_posts, max_posts, min_posts = ..., ..., ...
    return {
        'mean posts per week': mean_posts,  # val 1
        'max posts per week': max_posts,    # val 2
        'min posts per week': min_posts     # val 3
    }
-----------------------------------------------------------------------------
| community | mean posts per week | max posts per week | min posts per week |
-----------------------------------------------------------------------------
| A         | 37.5                | 144                | 16                 |
| B         | 18.2                | 39                 | 5                  |
| C         | 27.6                | 47                 | 0                  |
| ...       | ...                 | ...                | ...                |
-----------------------------------------------------------------------------
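To make the "calculate something" step more concrete, here is a standard-library sketch of how the three values for one table row could be computed from a list of post dates. The helper name `posts_per_week_summary` and its input format are assumptions; it is not the toolbox's implementation and does not use the community object.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean


def posts_per_week_summary(post_dates):
    """Summarise weekly post counts, counting weeks without posts as 0.

    post_dates: non-empty list of datetime.date objects.
    """
    # map each date to the Monday of its week
    week_starts = [d - timedelta(days=d.weekday()) for d in post_dates]
    counts = Counter(week_starts)

    # walk all weeks between the first and the last post
    week, last_week = min(week_starts), max(week_starts)
    weekly_counts = []
    while week <= last_week:
        weekly_counts.append(counts[week])  # Counter yields 0 for empty weeks
        week += timedelta(weeks=1)

    return {
        'mean posts per week': mean(weekly_counts),
        'max posts per week': max(weekly_counts),
        'min posts per week': min(weekly_counts),
    }


# hypothetical example data: three posts in one week, none in the next, one after that
row = posts_per_week_summary([
    date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 17),
])
```

The dictionary keys match the column names of the table above, so a dictionary like `row` would fill one row for one community. Counting empty weeks explicitly is what allows a minimum of 0, as in the row for community C.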
registering a new metric
Custom metrics need to be added to the "metrics registry" so that they can be used like pre-defined metrics through Community.metrics.my_metric() or in [reports][pici.metrics.Reports]. This can be done either via Pici.add_metric() (adds the metric for all communities) or via [Community.metrics.add()][pici.metrics.Metrics] for a single community:
from pici import Pici
from pici.communities import OEMCommunityFactory, PPCommunityFactory
from pici.decorators import metric
from pici.datatypes import CommunityDataLevel, MetricReturnType

p = Pici(communities={
    'OpenEnergyMonitor': OEMCommunityFactory,
    'PreciousPlastic': PPCommunityFactory,
})

@metric(
    level=CommunityDataLevel.COMMUNITY,
    returntype=MetricReturnType.TABLE
)
def my_metric(community):
    # ...
    ...

# add metric to a community:
p.communities['OpenEnergyMonitor'].metrics.add(my_metric)

# call metric
p.communities['OpenEnergyMonitor'].metrics.my_metric()

# add metric to all communities:
p.add_metric(my_metric)

# call metric
p.communities['PreciousPlastic'].metrics.my_metric()
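For readers unfamiliar with the pattern, the following toy example shows how a decorator and a registry can work together so that a registered function becomes callable as an attribute. It is a generic illustration only: the classes `Level` and `ToyMetrics` and the decorator `toy_metric` are hypothetical stand-ins and say nothing about how pici implements its decorator or registry internally.

```python
from enum import Enum
from functools import wraps


class Level(Enum):
    """Hypothetical stand-in for a level-of-observation enum."""
    COMMUNITY = "community"
    TOPICS = "topics"


def toy_metric(level):
    """Toy decorator that tags a function with its level of observation."""
    def decorator(func):
        @wraps(func)  # keeps the original function name for the registry
        def wrapper(community, **kwargs):
            return func(community, **kwargs)
        wrapper.level = level
        return wrapper
    return decorator


class ToyMetrics:
    """Toy per-community registry: added metrics become callable attributes."""

    def __init__(self, community):
        self._community = community
        self._registry = {}

    def add(self, func):
        self._registry[func.__name__] = func

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        registry = self.__dict__.get("_registry", {})
        if name not in registry:
            raise AttributeError(name)
        func = registry[name]
        # bind the community so callers do not have to pass it
        return lambda **kwargs: func(self._community, **kwargs)


@toy_metric(level=Level.COMMUNITY)
def number_of_posts(community):
    return {"number of posts": len(community["posts"])}


metrics = ToyMetrics({"posts": ["first", "second", "third"]})
metrics.add(number_of_posts)
result = metrics.number_of_posts()
print(result)  # {'number of posts': 3}
```

The point of the pattern is that the community is bound at registration time, which is why a registered metric can be called without passing the community again.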