
pici

community

Community

Bases: ABC

Abstract community class.

__eq__(other)

Two communities are the same if they have the same name. This is used to simplify caching.

_generate_temporal_graph(start=None, end=None, kind='co_contributor')

Generate a graph based only on posts created after start (>) and before end (<).

Parameters:

Name Type Description Default
start

datetime

None
end

datetime

None
kind

string (one of 'co_contributor', 'commenter')

'co_contributor'
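
The strict date bounds described above can be sketched without any graph library. This is a minimal illustration, assuming posts are plain (author, created_at) tuples; the helper name filter_posts_by_window and the record layout are hypothetical and not part of pici.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Post = Tuple[str, datetime]  # hypothetical (author, created_at) record


def filter_posts_by_window(
    posts: List[Post],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Post]:
    """Keep posts created strictly after `start` and strictly before `end`.

    A bound of None leaves that side of the window open.
    """
    return [
        (author, created)
        for author, created in posts
        if (start is None or created > start)
        and (end is None or created < end)
    ]


posts = [
    ("ada", datetime(2017, 1, 1)),
    ("bob", datetime(2017, 6, 15)),
    ("cy", datetime(2017, 12, 1)),
]
window = filter_posts_by_window(posts, datetime(2017, 1, 1), datetime(2017, 12, 1))
```

Posts dated exactly at start or end fall outside the window because both comparisons are strict.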

temporal_graph(start=None, end=None, kind='co_contributor')

Cached access to temporal graphs.

Parameters:

Name Type Description Default
start

None or datetime start of graph snapshot

None
end

None or datetime end of graph snapshot

None
kind

string ('co_contributor' or 'commenter').

'co_contributor'

datatypes

CommunityDataLevel

Bases: Enum

View on community.

TODO

Document properly

MetricReturnType

Bases: Enum

Category of representation of metrics' return type.

DATAFRAME = 'dataframe' class-attribute

Metric's return values as series in a Pandas.DataFrame.

PLAIN = 'plain' class-attribute

Metric's original return value (not modified).

TABLE = 'table' class-attribute

Metric's return value(s) in table format.

helpers

aggregate(dict_of_series, aggregations=[np.mean, np.min, np.max, np.std, np.sum])

Applies a number of aggregations to the series supplied as values in dict_of_series. Keys are names of series, the name of the aggregation is appended to the series names as "(agg-name)".

Parameters:

Name Type Description Default
aggregations

list of aggregation functions

[np.mean, np.min, np.max, np.std, np.sum]
dict_of_series

dict of indicator_name:Pandas.Series

required

Returns:

Type Description

dict of formatted indicator_name: aggregated series
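
The key-naming scheme can be illustrated with a small stand-in. The sketch below uses plain lists and stdlib aggregation functions instead of Pandas.Series and NumPy; the function name aggregate_sketch and the exact key format "<name> (<agg-name>)" are assumptions based on the description above.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def aggregate_sketch(
    series_by_name: Dict[str, Sequence[float]],
    aggregations: List[Callable[[Sequence[float]], float]],
) -> Dict[str, float]:
    """Apply every aggregation to every series.

    Result keys follow the "<series name> (<aggregation name>)" pattern.
    """
    return {
        f"{name} ({agg.__name__})": agg(values)
        for name, values in series_by_name.items()
        for agg in aggregations
    }


result = aggregate_sketch(
    {"number of words": [10, 20, 30]},
    [statistics.mean, min, max, sum],
)
```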

apply_to_initial_posts(community, new_cols, func)

Applies func to initial posts (community.posts where post_position_in_thread==1). Returns a DataFrame with the topic_column field as index. Columns in the returned DataFrame are named according to the strings in new_cols; their values follow the order of the values returned by func.

Parameters:

Name Type Description Default
community

pici.Community

required
new_cols

list of strings

required
func

function to apply to each initial post from community.posts

required

Returns a DataFrame with the columns named in new_cols, indexed by thread-ids.

as_table(func)

Decorator that returns results as table, indexed with community name. TODO: document

Parameters:

Name Type Description Default
func required

create_co_contributor_graph(link_data, node_data, node_col, group_col, node_attributes, connected=True)

Creates a networkx.Graph with nodes=users and edges if two users have contributed to the same thread. Edge weights = number of threads where two users co-contributed.

Parameters:

Name Type Description Default
link_data required
node_data required
node_col required
group_col required
node_attributes required
connected True
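
The edge-weight rule (number of threads two users share) can be sketched without networkx. This is an illustration only, assuming posts arrive as (thread_id, user) records; the function name co_contributor_edges is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def co_contributor_edges(
    posts: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], int]:
    """Map each unordered user pair to the number of threads they share.

    `posts` is an iterable of (thread_id, user) records.
    """
    users_by_thread: Dict[str, set] = {}
    for thread_id, user in posts:
        users_by_thread.setdefault(thread_id, set()).add(user)

    weights: Counter = Counter()
    for users in users_by_thread.values():
        # sorted() makes (a, b) and (b, a) collapse into one undirected edge
        for pair in combinations(sorted(users), 2):
            weights[pair] += 1
    return dict(weights)


edges = co_contributor_edges([
    ("t1", "ada"), ("t1", "bob"), ("t1", "ada"),
    ("t2", "ada"), ("t2", "bob"), ("t2", "cy"),
])
```

Multiple posts by the same user in one thread count once, since the weight is defined per thread rather than per post.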

create_commenter_graph(link_data, node_data, node_col, group_col, node_attributes, conntected=True)

Creates a networkx.DiGraph with nodes=users and directed edges a->b if a has replied to an initial post by b. Edge weight is the number of comments.

Parameters:

Name Type Description Default
link_data required
node_data required
node_col required
group_col required
node_attributes required
conntected True
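
The directed edge rule can be sketched in the same dependency-free way. The record layout (thread_id, position_in_thread, user) and the function name commenter_edges are hypothetical, and skipping replies by initiators to their own threads is an assumption that the description above does not settle.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def commenter_edges(
    posts: Iterable[Tuple[str, int, str]],
) -> Dict[Tuple[str, str], int]:
    """Map (commenter, initiator) to the number of comments made.

    `posts` holds (thread_id, position_in_thread, user) records, where
    position 1 marks the initial post of a thread.
    """
    posts = list(posts)
    initiator = {thread: user for thread, pos, user in posts if pos == 1}

    weights: Counter = Counter()
    for thread_id, pos, user in posts:
        target = initiator.get(thread_id)
        # Assumption: replies by the initiator to their own thread are skipped.
        if pos > 1 and target is not None and user != target:
            weights[(user, target)] += 1
    return dict(weights)


edges = commenter_edges([
    ("t1", 1, "bob"), ("t1", 2, "ada"), ("t1", 3, "ada"), ("t1", 4, "bob"),
    ("t2", 1, "ada"), ("t2", 2, "cy"),
])
```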

flat(df, columns='community_name')

Returns a pivoted version of df with flattened index.

Parameters:

Name Type Description Default
df pd.DataFrame

Pandas.DataFrame

required
columns str

Column name to pivot on.

'community_name'

generate_indicator_results(posts, initial_post, feedback, indicator_text, column, aggs=[np.sum, np.mean, np.min, np.max, np.std])

Returns results from column in DataFrames posts, initial_post, and feedback as different aggregations (sum, mean, ...). Initial post is only aggregated as sum. Output is a dict with df/agg: value, e.g. "posts indicator_text (mean)":value.

Parameters:

Name Type Description Default
posts required
initial_post required
feedback required
indicator_text required
column required

join_df(func)

Decorator that joins results to existing dataframe in community. TODO: document

Parameters:

Name Type Description Default
func required

merge_dfs(dfs, only_unique=False)

Wrapper for Pandas.merge(). Merges DataFrames, so that

TODO: document

Parameters:

Name Type Description Default
dfs Iterable[pd.DataFrame] required
only_unique bool False

num_words(text)

Counts the number of words in a text. HTML tags and comments are excluded from the count.

Parameters:

Name Type Description Default
text str

Text to count words in.

required

Returns:

Name Type Description
count int

Number of words.
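
One way to exclude markup from a word count is to collect only text nodes with the standard library's HTMLParser. This is a sketch under that assumption, not necessarily how num_words is implemented; count_words and _TextCollector are illustrative names.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes; tags and comments are never passed to handle_data."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def count_words(text: str) -> int:
    """Count whitespace-separated words, ignoring HTML tags and comments."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()
    # Joining with a space treats every tag boundary as a word boundary.
    return len(" ".join(collector.chunks).split())


n = count_words("<p>Hello <b>open</b> hardware<!-- hidden note --> world</p>")
```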

series_most_common(series)

Get most common element from Pandas.Series.

Parameters:

Name Type Description Default
series pd.Series

Pandas.Series

required

where_all(conditions)

Concatenates logical conditions with a logical AND.

Parameters:

Name Type Description Default
conditions required

word_occurrences(text, words)

Counts the number of occurrences of specified words in text.

Parameters:

Name Type Description Default
text str

A text with words.

required
words list of str

Words.

required

Returns:

Name Type Description
occurrences dict of str:int

A word (str), number of occurrences (int) dictionary
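
The returned word-to-count mapping can be sketched with collections.Counter. The tokenization (runs of word characters) and the case-insensitive matching are assumptions made for this example; count_occurrences is a hypothetical name.

```python
import re
from collections import Counter
from typing import Dict, List


def count_occurrences(text: str, words: List[str]) -> Dict[str, int]:
    """Return {word: count} for each requested word, matched case-insensitively."""
    tokens = Counter(re.findall(r"\w+", text.lower()))
    # Counter returns 0 for tokens that never occur.
    return {word: tokens[word.lower()] for word in words}


occurrences = count_occurrences(
    "The printer jams. Printer firmware needs an update.",
    ["printer", "firmware", "laser"],
)
```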

labelling

InnovationLabels

Bases: Labels

TODO: add documentation

from_limesurvey(limesurvey_results, drop_labellers=None)

Adds label entries from Limesurvey results format. Limesurvey results can contain multiple labelled threads per response. For each thread i and associated url and labels, the data must contain one column, e.g.: "thread1" (=url), "labelA1", "labelB1", ..., "thread2", "labelA2", "labelB2", ...

Parameters:

Name Type Description Default
limesurvey_results

String (path to file) or Pandas.DataFrame

required

LabelCollection

TODO: add documentation

all_label_names() property

TODO: add documentation

by_level(level)

TODO: Add documentation

Parameters:

Name Type Description Default
level required

labels() property

TODO: add documentation

LabelStats

This class provides metrics and visualizations to analyze the annotations made by the labellers.

Available metrics are
  • % agreement ("a_0")
  • Cohen's kappa (two labellers)
  • Fleiss' kappa (multiple labellers)
  • Krippendorff's alpha (multiple labellers, missing data)

Labellers' annotations can furthermore be evaluated against a subsample of "goldstandard" annotations, which makes it possible to assign each labeller a quality score.

TODO refactor --> move visualizations to visualizations.py ?

See [1] for a comparison of inter-rater agreement metrics.

[1] Xinshu Zhao, Jun S. Liu & Ke Deng (2013) Assumptions behind Intercoder Reliability Indices, Annals of the International Communication Association, 36:1, 419-480, DOI: 10.1080/23808985.2013.11679142

_melt_goldstandard_agreement(data)

Prepares interrater_agreement dataframe for plotting (e.g., with seaborn): wide format to long format, shorten label names.

Parameters:

Name Type Description Default
data

Pandas.DataFrame with label names as index, "labellers"-

required

Returns:

Type Description

Pandas.DataFrame with columns:
  • index: shortened label names
  • labellers: labeller names
  • variable: metric name
  • value: metric value

cohen_kappa()

Get Cohen's kappa for all labels, using scikit-learn implementation sklearn.metrics.cohen_kappa_score. Returns NaN if number of labellers != 2.

Returns:

Type Description

dict of (label name, kappa)
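
For orientation, Cohen's kappa for two raters is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The pure-Python sketch below illustrates that formula on two lists of labels; it is not the scikit-learn implementation the method relies on, and returning NaN for the degenerate case p_e = 1 is a choice made for this example.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same, non-empty set of cases")
    n = len(a)
    # Observed agreement: share of cases where both raters chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters always use one label
    return (p_o - p_e) / (1 - p_e)


kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In the example the raters agree on 3 of 4 cases (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.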

complete_agreement()

Get percentage of cases where all labellers agree (per label).

Returns:

Name Type Description
agreement

Pandas.DataFrame with 'label', '% perfect agreement', '% n'

fleiss_kappa()

Get Fleiss kappa for all labels, based on Statsmodels implementation (statsmodels.stats.inter_rater.fleiss_kappa). Returns NaN if number of labellers < 2.

Returns:

Type Description

dict of (label name, kappa)

interrater_agreement()

Calculates the overall interrater agreement for all labellers in data. If number of labellers > 2, all values for Cohen/Fleiss kappa will be NaN.

Returns:

Type Description

agreement dataframe

krippendorff_alpha()

Get Krippendorff alphas using the krippendorff package.

See also: Andrew F. Hayes & Klaus Krippendorff (2007) Answering the Call for a Standard Reliability Measure for Coding Data, Communication Methods and Measures, 1:1, 77-89, DOI: 10.1080/19312450709336664

Returns:

Type Description

Pandas.DataFrame

pairwise_interrater_agreement(goldstandard=None, min_comparisons=1)

Calculates the agreement metrics for all combinations of two labellers. If goldstandard is set (name of labeller), only comparisons with the goldstandard are calculated.

Parameters:

Name Type Description Default
goldstandard

Name of labeller

None
min_comparisons

Minimum number of shared labelled cases

1

Returns:

Type Description

Agreement dataframe with "labellers" column that contains

  • (labeller A, labeller B) tuples (default), or
  • labeller B strings (if labeller A is set as goldstandard)

plot_goldstandard_agreement(kind='label_boxplots', goldstandard=None, data=None)

Plot the labellers' agreement with goldstandard. Provides different plots through kind:
  • label_boxplots: values: each labeller's agreement with goldstandard; x: metric, y: boxplot of values
  • labellers_points: values: each labeller's agreement with goldstandard; grid with one column per metric, x: labels, y: values

Parameters:

Name Type Description Default
kind

One of {'label_boxplots','labellers_points'}

'label_boxplots'
goldstandard

Name of labeller to use as goldstandard (used if data is None, generates data)

None
data

agreement dataframe generated by pairwise_interrater_agreement

None

Labels

Bases: ABC

TODO: add documentation

__init__(data=None, cols=DEFAULT_COLS, filter=None)

Parameters:

Name Type Description Default
data pd.DataFrame None
cols dict DEFAULT_COLS

append(data, cols=DEFAULT_COLS, drop_labellers=None)

TODO: add documentation

Parameters:

Name Type Description Default
data pd.DataFrame required
cols dict DEFAULT_COLS
drop_labellers None

data_by_label(format='sklearn', dropna=False)

TODO: add documentation

Parameters:

Name Type Description Default
dropna False

labellers() property

TODO: add documentation

rating_table(label_name, communities=None, custom_filter=None, allow_missing_data=False)

Get the rating table for one label to be used, e.g., with statsmodels.stats.inter_rater.

Parameters:

Name Type Description Default
communities

List of communities to include in table,

None
label_name

label to be returned in table

required
allow_missing_data

whether to drop columns with missing ratings

False

Returns:

Type Description

rating table: labels as 2-dim table with raters (labellers) in rows and ratings in columns.

set_filter(f)

TODO: add documentation

Parameters:

Name Type Description Default
f required

metrics

basic

Basic metrics based on counts, dates etc. of posts, contributors.

By level of observation / concept:

topics

community

agg_number_of_posts_per_interval(community, interval)

Number of posts per interval.

Total number of posts in community per interval (parameter).

Parameters:

Name Type Description Default
community pici.Community required
interval str

The interval over which to aggregate. See pandas.Timedelta (https://pandas.pydata.org/docs/user_guide/timedeltas.html)

required

Returns:

Name Type Description
results dict of str:int
  • number of posts per <interval>

agg_posts_per_topic(community)

Min, max, and average number of posts authored per topic.

Parameters:

Name Type Description Default
community required

Returns:

Name Type Description
results dict of str:int
  • <agg> posts per topic

contributors_per_interval(community, interval)

Number of users that have authored at least one post in time interval.

TODO
  • document
  • add to TOC

Parameters:

Name Type Description Default
community required
interval required

lorenz(community)

Distribution of posts (in analogy to the Lorenz curve). Returns (x,y) where x is the (least-contributing) bottom x% of users, and y the proportion of posts made by them.

Parameters:

Name Type Description Default
community

report: - % contributors - % posts

required
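
The (x, y) pairs can be sketched from per-user post counts. This is an illustration, assuming a non-empty dict of user name to post count; lorenz_points is a hypothetical name and shares are returned as fractions rather than percentages.

```python
from typing import Dict, List, Tuple


def lorenz_points(posts_per_user: Dict[str, int]) -> List[Tuple[float, float]]:
    """Return (share of users, share of posts) points, least active users first.

    Assumes at least one user and at least one post.
    """
    counts = sorted(posts_per_user.values())
    total_posts = sum(counts)
    n_users = len(counts)
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        points.append((i / n_users, running / total_posts))
    return points


curve = lorenz_points({"ada": 1, "bob": 1, "cy": 2, "dee": 6})
```

Here the least active 75% of users account for 40% of all posts; a perfectly equal community would lie on the diagonal.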

number_of_contributors_per_topic(community)

Number of different contributors that have authored at least one post in a thread.

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str: int
  • number of contributors

number_of_posts(community)

Total number of posts authored by community.

TODO

document

Parameters:

Name Type Description Default
community required

number_of_posts_per_topic(community)

Number of posts per topic.

TODO
  • add to toc

Parameters:

Name Type Description Default
community required

Returns:

Name Type Description
report
  • number of posts

number_of_words(community)

The number of words in a post (removing html).

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str:int
  • number of words

post_dates_per_topic(community)

Date of first post, second post, and last post.

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str:date
  • first post date
  • second post date
  • last post date

post_delays_per_topic(community)

Delays (in days) between first and second post, and first and last post.

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str:int
  • delay first last post
  • delay first second post
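
The two delays can be sketched for a single thread with stdlib datetimes. The result keys mirror the names listed above; the function name post_delays and the handling of single-post threads (None for the missing second post) are assumptions made for this example.

```python
from datetime import datetime
from typing import Dict, List, Optional


def post_delays(post_dates: List[datetime]) -> Dict[str, Optional[int]]:
    """Delays in whole days from a thread's first post to its second and last post."""
    ordered = sorted(post_dates)
    first = ordered[0]
    second = ordered[1] if len(ordered) > 1 else None
    return {
        "delay first second post": (second - first).days if second is not None else None,
        "delay first last post": (ordered[-1] - first).days,
    }


delays = post_delays([
    datetime(2020, 1, 1),
    datetime(2020, 1, 4),
    datetime(2020, 1, 15),
])
```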

posts_per_interval(community, interval)

Number of posts authored by community per time interval.

TODO
  • document
  • add to TOC

Parameters:

Name Type Description Default
community required
interval required

posts_word_occurrence(community, words, normalize=True)

Counts the occurrence of a set of words in each post.

Parameters:

Name Type Description Default
community pici.Community required
words list of str

List of words to count in post texts.

required
normalize bool

Normalize occurrence count by text length.

True

Returns:

Name Type Description
results dict of str:int
  • occurrence of <word> for each provided word

cached_metrics

This is a collection of all cacheable functions that are used in the calculation of indicators. The cache is implemented using functools.lru_cache with maxsize=None. Caching is commonly done at least on community level (pici.Community is hashable). Examples of when caching makes sense:

  • calculating the similarity of post texts (done once for all combinations)
  • generating "temporal networks" (filtered representations of networks, depending on dates of posts)

It is recommended to define cached parts of indicators here.
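
The caching pattern can be sketched as follows. NamedCommunity is a hypothetical stand-in for a hashable community whose equality depends only on its name; cached_post_count is an illustrative function, not part of pici.

```python
from functools import lru_cache


class NamedCommunity:
    """Hypothetical stand-in: equality and hash depend only on the name."""

    def __init__(self, name: str, posts: tuple) -> None:
        self.name = name
        self.posts = posts

    def __eq__(self, other: object) -> bool:
        return isinstance(other, NamedCommunity) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


calls = []


@lru_cache(maxsize=None)
def cached_post_count(community: NamedCommunity) -> int:
    calls.append(community.name)  # records every real (uncached) computation
    return len(community.posts)


first = cached_post_count(NamedCommunity("OpenStreetMap", ("a", "b")))
second = cached_post_count(NamedCommunity("OpenStreetMap", ("a", "b")))
```

The second call is served from the cache even though it receives a different object, because lru_cache keys on hash and equality. With name-only equality, a community reloaded under the same name but with different data would also hit the old cache entry.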

_comments_by_contributor(community, contributor, date_limit=None)

Get all threads initiated by contributor.

Parameters:

Name Type Description Default
community required
contributor

User name

required
date_limit

Date in string format, e.g. '2020-01-15'

None

specified user (before the specified date_limit).

_contribution_regularity(community, contributor, start, end)

Get the contribution regularity of contributor as the percentage of days that contributor posted in the forum, between the dates start and end.

Parameters:

Name Type Description Default
community required
contributor required
start required
end required

_initial_post_author_network_metric(initial_post, community, metric, kind)

Get a cached network metric for the author of an initial post.

Parameters:

Name Type Description Default
initial_post required
metric required
community required
thread_date required
kind required

Returns:

Type Description

The value of the metric.

_replies_to_own_topics(community, contributor, date_limit=None)

The number of replies made to initial posts by specified contributor in community.

Parameters:

Name Type Description Default
community required
contributor required
date_limit

Date in string format, e.g. '2020-01-15'

None

contributor. If date_limit is provided, only threads & replies posted before the date limit are considered.

_temporal_text_similarity_dict(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio')

Returns a dictionary of post-text:1xn-similarity-matrix for similarity subgraph filtered by date.

Parameters:

Name Type Description Default
community required
date required
text_col 'preprocessed_text__words_no_stop'
similarity_metric 'token_sort_ratio'

_temporal_text_similarity_network(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)

Create a subview graph of the text similarity network created by _text_similarity_network() by filtering out all nodes (=posts) where post.date is > date.

Parameters:

Name Type Description Default
community required
date required
text_col 'preprocessed_text__words_no_stop'
similarity_metric 'token_sort_ratio'
only_initial_posts True

_text_similarity_network(community, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)

Create a text-similarity network for all posts in community, using textacy.representations.network.build_similarity_network().

Parameters:

Name Type Description Default
community required
text_col 'preprocessed_text__words_no_stop'
similarity_metric 'token_sort_ratio'
only_initial_posts True

_threads_by_contributor(community, contributor, date_limit=None)

Get all threads initiated by contributor.

Parameters:

Name Type Description Default
community required
contributor

User name

required
date_limit

Date in string format, e.g. '2020-01-15'

None

specified user (before the specified date_limit).

distinctiveness

initial_post_text_distance(community, similarity_metric='token_sort_ratio')

Calculates the text distance of initial posts to previously authored initial posts as a measure of distinctiveness.

Parameters:

Name Type Description Default
community required
similarity_metric 'token_sort_ratio'
agg_method required

elaboration

basic_text_based_elaboration(community, col_n_words='preprocessed_text__n_words', col_n_words_no_stop='preprocessed_text__n_words_no_stop', col_syllables='preprocessed_text__n_syllables', col_avg_syllables='preprocessed_text__avg_syllables_per_word', col_smog_index='preprocessed_text__smog_index', col_auto_readability='preprocessed_text__automated_readability_index', col_coleman_liau='preprocessed_text__coleman_liau_index', col_flesch_kincaid='preprocessed_text__flesch_kincaid_grade_level', col_frac_uppercase='preprocessed_text__frac_uppercase', col_frac_punctuation_marks='preprocessed_text__frac_punctuation_marks')

Provides basic text-based elaboration indicators, such as number of words, number of syllables, and different readability scores.

Parameters:

Name Type Description Default
col_flesch_kincaid 'preprocessed_text__flesch_kincaid_grade_level'
col_coleman_liau 'preprocessed_text__coleman_liau_index'
col_auto_readability 'preprocessed_text__automated_readability_index'
col_smog_index 'preprocessed_text__smog_index'
col_syllables

column name in community.posts to use for mean number

'preprocessed_text__n_syllables'
col_n_words

column name in community.posts to use for word count

'preprocessed_text__n_words'
community required

experience

initiator_experience_by_commenter_network_out_deg_centrality(community)

Determines a thread initiator's 'experience' by their out-degree centrality in the commenter network at the time of thread creation, i.e., the number of users the initiator has 'commented on' (has replied to in a user's thread).

Parameters:

Name Type Description Default
community required

initiator_experience_by_past_contributions(community, ignore_temporal_dependency=True, use_rounded_date=False)

Parameters:

Name Type Description Default
ignore_temporal_dependency True
community required

helpfulness

initiator_helpfulness_by_contribution_regularity(community, lookback_days=100)

Calculates the contribution regularity of the initiator of each thread. Contribution regularity is the percentage of past days in which the initiator posted in the forum. Past days is limited by lookback_days parameter.

Parameters:

Name Type Description Default
lookback_days 100
community required

initiator_helpfulness_by_foreign_thread_comment_frequency(community)

This indicator measures initiator helpfulness by the frequency of comments by the thread's initiator that were posted in threads with a different initiator ('foreign threads').

Parameters:

Name Type Description Default
community required

initiator_helpfulness_by_top_commenter_status(community, contributor, k=90)

Calculates whether a thread's initiator has top commenter status. A 'top commenter' has posted more comments than the k-th percentile (default: k=90).

Parameters:

Name Type Description Default
community required
contributor required
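
The percentile threshold can be sketched with a small stdlib helper. The interpolation method (linear, between closest ranks) is an assumption, since the description above does not state which percentile definition is used; percentile and is_top_commenter are illustrative names.

```python
from typing import Dict, Sequence


def percentile(values: Sequence[float], k: float) -> float:
    """k-th percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * k / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def is_top_commenter(comments_per_user: Dict[str, int], user: str, k: float = 90) -> bool:
    """True if `user` posted more comments than the k-th percentile of all users."""
    threshold = percentile(list(comments_per_user.values()), k)
    return comments_per_user.get(user, 0) > threshold


comment_counts = {"ada": 1, "bob": 2, "cy": 3, "dee": 4, "eve": 50}
top = [user for user in comment_counts if is_top_commenter(comment_counts, user)]
```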

idea_popularity

idea_popularity_by_number_of_unique_users_commenting(community)

Parameters:

Name Type Description Default
community required

network_position

Metrics using the community's graph object (representation of contributor network).

By level of observation:

contributors

  • contributor_degree
  • contributor_centralities
  • contributor_communities

co_contributor_centralities(community)

Contributor centralities.

Includes degree centrality, betweenness centrality, and eigenvector centrality. Using networkx implementation.

Parameters:

Name Type Description Default
community pici.Community required

co_contributor_communities(community, leiden_lib='cdlib')

Find communities within the contributor network.

Uses weighted Leiden algorithm (Traag et al., 2018) implemented in cdlib.algorithms.leiden or leidenalg.

Traag, Vincent, Ludo Waltman, and Nees Jan van Eck. From Louvain to Leiden: guaranteeing well-connected communities. arXiv preprint arXiv:1810.08473 (2018).

Parameters:

Name Type Description Default
leiden_lib

Which Leiden alg. implementation to use, 'cdlib' or

'cdlib'
community required

Returns:

Name Type Description
node_communities_map dict of node:list(communities)

List of communities a contributor belongs to. See cdlib.NodeClustering.to_node_community_map (https://cdlib.readthedocs.io/en/latest/reference/classes/node_clustering.html).

co_contributor_degree(community)

Number of contributors each contributor has co-authored with in a thread.

Using implementation of networkx.Graph.degree.

TODO

document

Parameters:

Name Type Description Default
community pici.Community required

initiator_centrality_in_co_contributor_network(community, k=None)

TODO: implement using _initial_post_author_network_metric()

Parameters:

Name Type Description Default
community required
k None

reports

Reports are groups of metrics evaluated for all communities under analysis. See also: building reports.

Reports by level of observation:

community

posts_contributors_per_interval(pici, interval)

Number of contributors and posts for each time interval.

TODO
  • document
  • add to TOC

Parameters:

Name Type Description Default
pici required
interval required

Returns:

Name Type Description
report
  • number of posts per interval
  • number of contributors per interval

summary(pici)

Summarizes communities by posting behavior.

Parameters:

Name Type Description Default
pici pici.Pici required

Returns:

Name Type Description
report
  • number of posts,
  • number of posts per day (aggregated)
  • number of posts per month (aggregated)

status_reputation

initiator_prestige_by_commenter_network_in_deg_centrality(community)

Determines a thread initiator's 'prestige' by their in-degree centrality in the commenter network at the time of thread creation, i.e., the number of users that have commented on at least one of their threads at that time.

Parameters:

Name Type Description Default
community required

number_of_replies_to_topics_initiated_by_thread_initiator(community, ignore_temporal_dependency=True)

Number of replies to topics initiated by the thread's initiator.

Parameters:

Name Type Description Default
ignore_temporal_dependency

Whether to simply count all replies,

True
community required

pici

Pici

TODO

Add documentation.

Examples:

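# Assumes Pici is importable from the pici package (cf. the pici.Pici type used in this reference).
from pici import Pici
from communities import OEMCommunityFactory, OSMCommunityFactory, PPCommunityFactory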

p = Pici(
    communities={
        'OpenEnergyMonitor': OEMCommunityFactory,
        'OpenStreetMap': OSMCommunityFactory,
        'PreciousPlastic': PPCommunityFactory,
    },
    start='2017-01-01',
    end='2017-12-01',
    cache_nrows=5000
)

__init__(communities=None, labels=[], cache_dir='cache', cache_nrows=None, start=None, end=None)

Loads communities.

Communities can be loaded from cache or scraped. Loaded data can be restricted either by number of rows loaded from cache (cache_nrows), or by setting start and end dates (filter on publication dates of posts).

Parameters:

Name Type Description Default
communities dict of str:pici.CommunityFactory

Dictionary of communities. Communities are provided as name (str): CommunityFactory pairs.

None
cache_dir str

Path to folder that contains cache files.

'cache'
cache_nrows int

Number of rows to load from cache (None (default): load all rows).

None
start str

Start-date for filtering posts. String format must be valid input for pandas.Timestamp.

None
end str

End-date for filtering posts. String format must be valid input for pandas.Timestamp.

None

get_metrics(level=None, returntype=None, unwrapped=False, select_func=set.intersection)

Get all available metrics that are defined for the communities. By default, select_func is set.intersection, so only metrics that exist for all communities are returned. Metrics can be filtered by level and returntype.

Parameters:

Name Type Description Default
level None
returntype None
unwrapped

'Unwrap' the returned metric functions from their

False
select_func set.intersection

Returns:

Type Description

dict of str:func metricname:metric
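
The effect of select_func can be illustrated on sets of metric names. This sketch assumes the per-community name sets are combined by calling select_func(*sets); the sample data and the helper name select_metric_names are hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical metric names available per community.
metrics_by_community: Dict[str, Set[str]] = {
    "OpenEnergyMonitor": {"number_of_posts", "lorenz", "number_of_words"},
    "OpenStreetMap": {"number_of_posts", "lorenz"},
    "PreciousPlastic": {"number_of_posts", "number_of_words"},
}


def select_metric_names(
    available: Dict[str, Set[str]],
    select_func: Callable[..., Set[str]] = set.intersection,
) -> Set[str]:
    """Combine the per-community name sets with `select_func`."""
    return select_func(*available.values())


shared = select_metric_names(metrics_by_community)
any_community = select_metric_names(metrics_by_community, set.union)
```

Passing set.union instead of the default would select every metric that exists for at least one community.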

get_preprocessors(level=None, returntype=None, unwrapped=False, select_func=set.intersection)

Get all available preprocessors that are defined for the communities. By default, select_func is set.intersection, so only preprocessors that exist for all communities are returned. Preprocessors can be filtered by level and returntype.

Parameters:

Name Type Description Default
level None
returntype None
unwrapped

'Unwrap' the returned metric functions from their

False
select_func set.intersection

Returns:

Type Description

dict of str:func metricname:metric

pipelines

Pipelines

_append_preprocessing_results(results) staticmethod

Appends Series generated by preprocessing to the corresponding datalevel objects of a Community.

Parameters:

Name Type Description Default
results

Should have the format ([(datalevel, Series), (datalevel, Series), ...], Community)

required

preprocessors

posts

number_of_words(community)

Adds the number of words in each post as int to community.posts.

post_position_in_thread(community)

Adds each post's position in thread (as int, starting with 1) to community.posts.

preprocessed_text(community, n_topics=10)

This preprocessor supplies cleaned text, text statistics (using Textacy) and sentiment statistics (TextBlob). The following columns are added to Community.posts:

  • clean
  • all_words
  • words_no_stop
  • n_words_no_stop
  • frac_uppercase
  • frac_punctuation_marks
  • avg_syllables_per_word
  • sentiment_polarity
  • sentiment_subjectivity
  • n_words
  • n_chars
  • n_long_words
  • n_unique_words
  • n_syllables
  • n_syllables_per_word
  • entropy
  • ttr
  • segmented_ttr
  • hdd
  • automated_readability_index
  • flesch_reading_ease
  • smog_index
  • coleman_liau_index
  • flesch_kincaid_grade_level
  • gunning_fog_index

Parameters:

Name Type Description Default
community required

rounded_date(community, round_dates_to='7D')

Round the post dates according to the specified frequency. If round_dates_to is None, this preprocessor does nothing.

Parameters:

Name Type Description Default
community required
round_dates_to

Frequency to round the initial posts' dates to. See https://pandas.pydata.org/docs/user_guide/timeseries.html

'7D'

topics

thread_text(community)

Adds column thread_text to community.topics. Supplies texts of all posts in thread as tuple of strings in order of post creation date (starting with initial post).

registries

MetricRegistry

Bases: FuncExposer

This class exposes all methods decorated with @metric as its own methods and passes the community parameter to them.

PreprocessorRegistry

Bases: FuncExposer

This class exposes all methods decorated with @community_preprocessor as its own methods and passes the community parameter to them.

ReportRegistry

Bases: FuncExposer

This class exposes all methods decorated with @report as its own methods and passes the communities parameter to them.

reporting

Metric

TODO: add documentation

Report

TODO: add documentation

metric(level, returntype)

A decorator for community metrics.

The parameters level and returntype determine how, and at which level of observation (topics, posts, etc.), the metric's results are represented.

  • Only methods using this decorator are available as metrics through pici.Community.metrics.

Parameters:

Name Type Description Default
level pici.datatypes.CommunityDataLevel

The metric's data level. Determines the 'view' of pici.Community to which the metric's results are appended.

required
returntype pici.datatypes.MetricReturnType

Data type of metric's return value.

required

Returns:

Type Description

Returns either plain metric value, or determined value(s) appended to community data. Type determined by returntype parameter.
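
The general shape of such a decorator can be sketched as a decorator factory that tags and registers functions. Everything here is illustrative: Level stands in for CommunityDataLevel (whose members are not documented above), REGISTRY is a hypothetical module-level dict, and the real decorator additionally converts results according to returntype.

```python
from enum import Enum
from functools import wraps
from typing import Callable, Dict


class Level(Enum):  # hypothetical stand-in for CommunityDataLevel
    POSTS = "posts"
    TOPICS = "topics"


class ReturnType(Enum):  # mirrors the documented MetricReturnType values
    PLAIN = "plain"
    TABLE = "table"
    DATAFRAME = "dataframe"


REGISTRY: Dict[str, Callable] = {}


def metric(level: Level, returntype: ReturnType) -> Callable:
    """Decorator factory: tag a function with its level/returntype and register it."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(community, *args, **kwargs):
            return func(community, *args, **kwargs)

        wrapper.level = level
        wrapper.returntype = returntype
        REGISTRY[func.__name__] = wrapper
        return wrapper

    return decorator


@metric(level=Level.POSTS, returntype=ReturnType.PLAIN)
def number_of_posts(community):
    return len(community["posts"])


value = REGISTRY["number_of_posts"]({"posts": ["p1", "p2", "p3"]})
```

Because functools.wraps sets __wrapped__, the undecorated function stays reachable, which is one way an 'unwrapped' option could be supported.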

preprocessor(level)

A decorator for preprocessors.

report(func)

TODO: add documentation

Parameters:

Name Type Description Default
func required

visualizations

plot_lorenz_curves(pici)

Plots the %posts vs %contributors Lorenz curves for all communities.

Parameters:

Name Type Description Default
pici required