pici
community
Community
Bases: ABC
Abstract community class.
__eq__(other)
Two communities are the same if they have the same name. This is used to simplify caching.
_generate_temporal_graph(start=None, end=None, kind='co_contributor')
Generate a graph based only on posts created after start (>) and before end (<).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start | datetime | | None |
end | datetime | | None |
kind | str | One of 'co_contributor', 'commenter'. | 'co_contributor' |
temporal_graph(start=None, end=None, kind='co_contributor')
Cached access to temporal graphs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start | None or datetime | Start of graph snapshot. | None |
end | None or datetime | End of graph snapshot. | None |
kind | str | 'co_contributor' or 'commenter'. | 'co_contributor' |
datatypes
CommunityDataLevel
Bases: Enum
View on community.
TODO
Document properly
MetricReturnType
Bases: Enum
Category of representation of metrics' return type.
DATAFRAME = 'dataframe'
class-attribute
Metric's return values as series in a Pandas.DataFrame.
PLAIN = 'plain'
class-attribute
Metric's original return value (not modified).
TABLE = 'table'
class-attribute
Metric's return value(s) in table format.
helpers
aggregate(dict_of_series, aggregations=[np.mean, np.min, np.max, np.std, np.sum])
Applies a number of aggregations to the series supplied as values in dict_of_series. Keys are the names of the series; the name of each aggregation is appended to the series name as "(agg-name)".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dict_of_series | dict of indicator_name:Pandas.Series | | required |
aggregations | list of aggregation functions | | [np.mean, np.min, np.max, np.std, np.sum] |
Returns:
Type | Description |
---|---|
dict | Formatted indicator_name: aggregated series. |
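A minimal sketch of how such an aggregation helper might work (illustrative only; the actual implementation may differ):

```python
import numpy as np
import pandas as pd

def aggregate(dict_of_series, aggregations=(np.mean, np.min, np.max, np.std, np.sum)):
    """Apply each aggregation to every series; the aggregation's name is
    appended to the series name as "(agg-name)"."""
    results = {}
    for name, series in dict_of_series.items():
        for agg in aggregations:
            results[f"{name} ({agg.__name__})"] = agg(series)
    return results

agg = aggregate({"posts per user": pd.Series([1, 2, 3])})
```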
apply_to_initial_posts(community, new_cols, func)
Applies func to initial posts (community.posts where post_position_in_thread==1). Returns a DataFrame with the topic_column field as index. Columns in the returned DataFrame are named according to the strings in new_cols; column values follow the order of the values returned by func.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
new_cols | list of str | | required |
func | function | Function to apply to each initial post from community.posts. | required |
The returned DataFrame is indexed by thread ids.
as_table(func)
Decorator that returns results as table, indexed with community name. TODO: document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | | | required |
create_co_contributor_graph(link_data, node_data, node_col, group_col, node_attributes, connected=True)
Creates a networkx.Graph with nodes=users and edges if two users have contributed to the same thread. Edge weights = number of threads where two users co-contributed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
link_data | | | required |
node_data | | | required |
node_col | | | required |
group_col | | | required |
node_attributes | | | required |
connected | | | True |
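The edge-weighting logic can be sketched without the graph library (input shape and names here are hypothetical; the real helper returns a networkx.Graph):

```python
from itertools import combinations
from collections import Counter

def co_contribution_weights(link_data):
    """link_data: iterable of (thread_id, contributor) pairs.
    Returns a Counter mapping each unordered user pair to the number of
    threads in which both users contributed (the edge weight)."""
    threads = {}
    for thread_id, user in link_data:
        threads.setdefault(thread_id, set()).add(user)
    weights = Counter()
    for users in threads.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return weights

w = co_contribution_weights([
    ("t1", "ann"), ("t1", "bob"),
    ("t2", "ann"), ("t2", "bob"), ("t2", "cat"),
])
```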
create_commenter_graph(link_data, node_data, node_col, group_col, node_attributes, conntected=True)
Creates a networkx.DiGraph with nodes=users and directed edges a->b if a has replied to an initial post by b. Edge weight is the number of comments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
link_data | | | required |
node_data | | | required |
node_col | | | required |
group_col | | | required |
node_attributes | | | required |
conntected | | | True |
flat(df, columns='community_name')
Returns a pivoted version of df with flattened index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | Pandas.DataFrame | required |
columns | str | Column name to pivot on. | 'community_name' |
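Assuming flat() amounts to a pandas pivot followed by flattening the resulting column MultiIndex, a sketch (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "community_name": ["OEM", "OSM"],
    "number of posts": [10, 20],
})
# Pivot on community_name so each community becomes its own column...
pivoted = df.pivot(columns="community_name")
# ...then flatten the resulting (value, community) MultiIndex columns
# into plain string labels.
pivoted.columns = [" ".join(map(str, c)) for c in pivoted.columns.to_flat_index()]
```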
generate_indicator_results(posts, initial_post, feedback, indicator_text, column, aggs=[np.sum, np.mean, np.min, np.max, np.std])
Returns results from column in the DataFrames posts, initial_post, and feedback as different aggregations (sum, mean, ...). The initial post is only aggregated as sum. Output is a dict of df/agg: value, e.g. "posts indicator_text (mean)": value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
posts | | | required |
initial_post | | | required |
feedback | | | required |
indicator_text | | | required |
column | | | required |
join_df(func)
Decorator that joins results to existing dataframe in community. TODO: document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | | | required |
merge_dfs(dfs, only_unique=False)
Wrapper for Pandas.merge(). Merges DataFrames. TODO: document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dfs | Iterable[pd.DataFrame] | | required |
only_unique | bool | | False |
num_words(text)
Counts the number of words in a text. HTML tags and comments are detected and excluded from the count.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to count words in. | required |
Returns:
Name | Type | Description |
---|---|---|
count | int | Number of words. |
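A sketch of such a counter, assuming HTML is stripped with regular expressions (the actual implementation may use a proper HTML parser):

```python
import re

def num_words(text):
    """Count words, excluding HTML tags and comments from the count."""
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)  # drop HTML comments
    text = re.sub(r"<[^>]+>", " ", text)                      # drop HTML tags
    return len(text.split())

count = num_words("<p>Hello <!-- ignored --> <b>world</b></p>")
```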
series_most_common(series)
Get most common element from Pandas.Series.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
series | pd.Series | Pandas.Series | required |
where_all(conditions)
Concatenates logical conditions with and.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
conditions | | | required |
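Assuming the conditions are boolean pandas Series, the concatenation can be sketched with functools.reduce:

```python
from functools import reduce
import operator
import pandas as pd

def where_all(conditions):
    """AND-combine an iterable of boolean Series into a single mask."""
    return reduce(operator.and_, conditions)

s = pd.Series([1, 2, 3, 4])
mask = where_all([s > 1, s < 4])  # True only where both conditions hold
```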
word_occurrences(text, words)
Counts the number of occurrences of specified words
in text
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | A text with words. | required |
words | list of str | Words. | required |
Returns:
Name | Type | Description |
---|---|---|
occurrences | dict of str:int | |
labelling
InnovationLabels
Bases: Labels
TODO: add documentation
from_limesurvey(limesurvey_results, drop_labellers=None)
Adds label entries from Limesurvey results format. Limesurvey results can contain multiple labelled threads per response. For each thread i and associated url and labels, the data must contain one column, e.g.: "thread1" (=url), "labelA1", "labelB1", ..., "thread2", "labelA2", "labelB2", ...
Parameters:
Name | Type | Description | Default |
---|---|---|---|
limesurvey_results | str (path to file) or Pandas.DataFrame | | required |
LabelCollection
TODO: add documentation
all_label_names()
property
TODO: add documentation
by_level(level)
TODO: Add documentation
Parameters:
Name | Type | Description | Default |
---|---|---|---|
level | | | required |
labels()
property
TODO: add documentation
LabelStats
This class provides metrics and visualizations to analyze the annotations made by the labellers.
Available metrics are
- % agreement ("a_0")
- Cohen's kappa (two labellers)
- Fleiss' kappa (multiple labellers)
- Krippendorff's alpha (multiple labellers, missing data)
Labellers' annotations can furthermore be evaluated against a subsample of "goldstandard" annotations, allowing labellers to be associated with a quality score.
TODO refactor --> move visualizations to visualizations.py ?
See [1] for a comparison of inter-rater agreement metrics.
[1] Xinshu Zhao, Jun S. Liu & Ke Deng (2013) Assumptions behind Intercoder Reliability Indices, Annals of the International Communication Association, 36:1, 419-480, DOI: 10.1080/23808985.2013.11679142
_melt_goldstandard_agreement(data)
Prepares interrater_agreement
dataframe for plotting (e.g., with
seaborn): wide format to long format, shorten label names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Pandas.DataFrame | Label names as index, "labellers"- | required |
Returns:
Type | Description |
---|---|
Pandas.DataFrame with columns: index: shortened label names; labellers: labeller names; variable: metric name; value: metric value. |
cohen_kappa()
Get Cohen's kappa for all labels, using scikit-learn implementation
sklearn.metrics.cohen_kappa_score
.
Returns NaN if number of labellers != 2.
Returns:
Type | Description |
---|---|
dict | (label name, kappa) pairs. |
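The library defers to sklearn.metrics.cohen_kappa_score; for illustration, the underlying formula (observed vs. chance agreement) can be computed directly:

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_a)
    # Observed agreement: fraction of cases where both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected (chance) agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

kappa = cohen_kappa([1, 0, 1, 1], [1, 0, 0, 1])
```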
complete_agreement()
Get percentage of cases where all labellers agree (per label).
Returns:
Name | Type | Description |
---|---|---|
agreement | Pandas.DataFrame | Columns: 'label', '% perfect agreement', '% n'. |
fleiss_kappa()
Get Fleiss kappa for all labels, based on Statsmodels
implementation (statsmodels.stats.inter_rater.fleiss_kappa
).
Returns NaN if number of labellers < 2.
Returns:
Type | Description |
---|---|
dict | (label name, kappa) pairs. |
interrater_agreement()
Calculates the overall interrater agreement for all labellers in data. If the number of labellers > 2, all values for Cohen's/Fleiss' kappa will be NaN.
Returns:
Type | Description |
---|---|
agreement dataframe |
krippendorff_alpha()
Get Krippendorff's alpha using the krippendorff package.
See also: Andrew F. Hayes & Klaus Krippendorff (2007) Answering the Call for a Standard Reliability Measure for Coding Data, Communication Methods and Measures, 1:1, 77-89, DOI: 10.1080/19312450709336664
Returns:
Type | Description |
---|---|
Pandas.DataFrame |
pairwise_interrater_agreement(goldstandard=None, min_comparisons=1)
Calculates the agreement metrics for all combinations of two labellers. If goldstandard is set (name of labeller), only comparisons with the goldstandard are calculated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
goldstandard | | Name of labeller. | None |
min_comparisons | | Minimum number of shared labelled cases. | 1 |
Returns:
Type | Description |
---|---|
Agreement dataframe with a "labellers" column. |
plot_goldstandard_agreement(kind='label_boxplots', goldstandard=None, data=None)
Plot the labellers' agreement with the goldstandard. Provides different plots through kind:
- label_boxplots: values: each labeller's agreement with goldstandard; x: metric, y: boxplot of values
- labellers_points: values: each labeller's agreement with goldstandard; grid with one column per metric; x: labels, y: values
Parameters:
Name | Type | Description | Default |
---|---|---|---|
kind | | One of {'label_boxplots', 'labellers_points'}. | 'label_boxplots' |
goldstandard | | Name of labeller to use as goldstandard (used if data is None; generates data). | None |
data | | Agreement dataframe generated by | None |
Labels
Bases: ABC
TODO: add documentation
__init__(data=None, cols=DEFAULT_COLS, filter=None)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.DataFrame | | None |
cols | dict | | DEFAULT_COLS |
filter | | | None |
append(data, cols=DEFAULT_COLS, drop_labellers=None)
TODO: add documentation
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.DataFrame | | required |
cols | dict | | DEFAULT_COLS |
drop_labellers | | | None |
data_by_label(format='sklearn', dropna=False)
TODO: add documentation
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | | | 'sklearn' |
dropna | | | False |
labellers()
property
TODO: add documentation
rating_table(label_name, communities=None, custom_filter=None, allow_missing_data=False)
Get the rating table for one label to be used, e.g.,
with statsmodels.stats.inter_rater
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
label_name | | Label to be returned in table. | required |
communities | | List of communities to include in table, | None |
custom_filter | | | None |
allow_missing_data | | Whether to drop columns with missing ratings. | False |
Returns:
Type | Description |
---|---|
Rating table: labels as a 2-dim table with raters (labellers) in rows and ratings in columns. |
set_filter(f)
TODO: add documentation
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | | | required |
metrics
basic
Basic metrics based on counts, dates etc. of posts, contributors.
By level of observation / concept:
topics
community
agg_number_of_posts_per_interval(community, interval)
Number of posts per interval
.
Total number of posts in community per interval
(parameter).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
interval | str | The interval over which to aggregate. See | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
agg_posts_per_topic(community)
Min, max, and average number of posts authored per topic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
contributors_per_interval(community, interval)
Number of users that have authored at least one post in time interval.
TODO
- document
- add to TOC
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
interval | | | required |
lorenz(community)
Distribution of posts (in analogy to lorenz curve). Returns (x,y) where x is the (least-contributing) bottom x% of users, and y the proportion of posts made by them.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
report | | % contributors, % posts |
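The (x, y) construction can be sketched from per-user post counts (posts_per_user is a hypothetical input; the real metric derives it from community.posts):

```python
import numpy as np

def lorenz(posts_per_user):
    """(x, y): bottom x% of (least-contributing) users vs. the
    proportion of posts they authored."""
    counts = np.sort(np.asarray(posts_per_user))  # ascending: least first
    y = np.cumsum(counts) / counts.sum()          # cumulative share of posts
    x = np.arange(1, len(counts) + 1) / len(counts)  # cumulative share of users
    return x, y

x, y = lorenz([1, 1, 2, 6])
```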
number_of_contributors_per_topic(community)
Number of different contributors that have authored at least one post in a thread.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
number_of_posts(community)
Total number of posts authored by community.
TODO
document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
number_of_posts_per_topic(community)
Number of posts per topic.
TODO
- add to toc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
report | | |
number_of_words(community)
The number of words in a post (removing html).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
post_dates_per_topic(community)
Date of first post, second post, and last post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:date | |
post_delays_per_topic(community)
Delays (in days) between first and second post, and first and last post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
posts_per_interval(community, interval)
Number of posts authored by community per time interval.
TODO
- document
- add to TOC
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
interval | | | required |
posts_word_occurrence(community, words, normalize=True)
Counts the occurrence of a set of words in each post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
words | list of str | List of words to count in post texts. | required |
normalize | bool | Normalize occurrence count by text length. | True |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
cached_metrics
This is a collection of all cachable functions that are used in the calculation of indicators. The cache is implemented using functools.lru_cache with maxsize=None. Caching is commonly done at least on community level (pici.Community is hashable). Examples of when using a cache makes sense:
- calculating the similarity of post texts (done once for all combinations)
- generating "temporal networks" (filtered representations of networks, depending on dates of posts)
It is recommended to define cached parts of indicators here.
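The pattern can be illustrated with a minimal hashable Community stand-in (hypothetical; only name-based equality matters, as in Community.__eq__ above):

```python
from functools import lru_cache

class Community:
    """Minimal stand-in: equality and hash are based on the name only."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return self.name == other.name
    def __hash__(self):
        return hash(self.name)

calls = []

@lru_cache(maxsize=None)
def expensive_metric(community):
    calls.append(community.name)  # record actual (non-cached) computations
    return len(community.name)

expensive_metric(Community("OEM"))
expensive_metric(Community("OEM"))  # equal hash/equality: served from cache
```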
_comments_by_contributor(community, contributor, date_limit=None)
Get all comments made by contributor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | User name. | required |
date_limit | | Date in string format, e.g. '2020-01-15'. | None |
Returns comments by the specified user (before the specified date_limit).
_contribution_regularity(community, contributor, start, end)
Get the contribution regularity of contributor as the percentage of days that contributor posted in the forum, between the dates start and end.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | | required |
start | | | required |
end | | | required |
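A sketch of the regularity computation, assuming post dates come as a list-like of timestamps (names are illustrative):

```python
import pandas as pd

def contribution_regularity(post_dates, start, end):
    """Share of days in [start, end] on which at least one post was made."""
    dates = pd.to_datetime(pd.Series(post_dates))
    dates = dates[(dates >= start) & (dates <= end)]
    days_posted = dates.dt.normalize().nunique()       # distinct active days
    total_days = (pd.Timestamp(end) - pd.Timestamp(start)).days + 1
    return days_posted / total_days

reg = contribution_regularity(
    ["2020-01-01 10:00", "2020-01-01 18:00", "2020-01-03 09:00"],
    "2020-01-01", "2020-01-04",
)
```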
_initial_post_author_network_metric(initial_post, community, metric, kind)
Get a cached network metric for the author of an initial post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_post | | | required |
community | | | required |
metric | | | required |
thread_date | | | required |
kind | | | required |
Returns:
Type | Description |
---|---|
The value of the metric. |
_replies_to_own_topics(community, contributor, date_limit=None)
The number of replies made to initial posts by specified contributor in community.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | | required |
date_limit | | Date in string format, e.g. '2020-01-15'. | None |
If date_limit is provided, only threads & replies posted before the date limit are considered.
_temporal_text_similarity_dict(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio')
Returns a dictionary of post-text:1xn-similarity-matrix for similarity subgraph filtered by date.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
date | | | required |
text_col | | | 'preprocessed_text__words_no_stop' |
similarity_metric | | | 'token_sort_ratio' |
_temporal_text_similarity_network(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)
Create a subview graph of the text similarity network created by _text_similarity_network() by filtering out all nodes (=posts) where post.date is > date.
_text_similarity_network(community, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)
Create a text-similarity network for all posts in community, using
textacy.representations.network.build_similarity_network()
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
text_col | | | 'preprocessed_text__words_no_stop' |
similarity_metric | | | 'token_sort_ratio' |
only_initial_posts | | | True |
_threads_by_contributor(community, contributor, date_limit=None)
Get all threads initiated by contributor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | User name. | required |
date_limit | | Date in string format, e.g. '2020-01-15'. | None |
Returns threads initiated by the specified user (before the specified date_limit).
distinctiveness
initial_post_text_distance(community, similarity_metric='token_sort_ratio')
Calculates the text distance of initial posts to previously authored initial posts as a measure of distinctiveness.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
similarity_metric | | | 'token_sort_ratio' |
agg_method | | | required |
elaboration
basic_text_based_elaboration(community, col_n_words='preprocessed_text__n_words', col_n_words_no_stop='preprocessed_text__n_words_no_stop', col_syllables='preprocessed_text__n_syllables', col_avg_syllables='preprocessed_text__avg_syllables_per_word', col_smog_index='preprocessed_text__smog_index', col_auto_readability='preprocessed_text__automated_readability_index', col_coleman_liau='preprocessed_text__coleman_liau_index', col_flesch_kincaid='preprocessed_text__flesch_kincaid_grade_level', col_frac_uppercase='preprocessed_text__frac_uppercase', col_frac_punctuation_marks='preprocessed_text__frac_punctuation_marks')
Provides basic text-based elaboration indicators, such as number of words, number of syllables, and different readability scores.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
col_n_words | | Column name in community.posts to use for word count. | 'preprocessed_text__n_words' |
col_n_words_no_stop | | | 'preprocessed_text__n_words_no_stop' |
col_syllables | | Column name in community.posts to use for mean number | 'preprocessed_text__n_syllables' |
col_avg_syllables | | | 'preprocessed_text__avg_syllables_per_word' |
col_smog_index | | | 'preprocessed_text__smog_index' |
col_auto_readability | | | 'preprocessed_text__automated_readability_index' |
col_coleman_liau | | | 'preprocessed_text__coleman_liau_index' |
col_flesch_kincaid | | | 'preprocessed_text__flesch_kincaid_grade_level' |
col_frac_uppercase | | | 'preprocessed_text__frac_uppercase' |
col_frac_punctuation_marks | | | 'preprocessed_text__frac_punctuation_marks' |
experience
initiator_experience_by_commenter_network_out_deg_centrality(community)
Determines a thread initiator's 'experience' by their out-degree centrality in the commenter network at the time of thread creation, i.e., the number of users the initiator has 'commented on' (has replied to in a user's thread).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
initiator_experience_by_past_contributions(community, ignore_temporal_dependency=True, use_rounded_date=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
ignore_temporal_dependency | | | True |
use_rounded_date | | | False |
helpfulness
initiator_helpfulness_by_contribution_regularity(community, lookback_days=100)
Calculates the contribution regularity of the initiator of each thread. Contribution regularity is the percentage of past days in which the initiator posted in the forum. Past days is limited by the lookback_days parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
lookback_days | | | 100 |
initiator_helpfulness_by_foreign_thread_comment_frequency(community)
This indicator measures initiator helpfulness by the frequency of comments by the thread's initiator that were posted in threads with a different initiator ('foreign threads').
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
initiator_helpfulness_by_top_commenter_status(community, contributor, k=90)
Calculates whether a thread's initiator has top commenter status. A 'top commenter' has posted more comments than the k-th percentile (default: k=90).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | | required |
k | | | 90 |
idea_popularity
idea_popularity_by_number_of_unique_users_commenting(community)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
network_position
Metrics using the community's graph object (representation of contributor network).
By level of observation:
contributors
- [contributor_degree][pici.metrics.network.contributor_degree]
- [contributor_centralities][pici.metrics.network.contributor_centralities]
- [contributor_communities][pici.metrics.network.contributor_communities]
co_contributor_centralities(community)
Contributor centralities.
Includes degree centrality, betweenness centrality, and eigenvector centrality.
Using networkx
implementation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
co_contributor_communities(community, leiden_lib='cdlib')
Find communities within the contributor network.
Uses the weighted Leiden algorithm (Traag et al., 2018) as implemented in cdlib.algorithms.leiden or leidenalg.
Traag, Vincent, Ludo Waltman, and Nees Jan van Eck. From Louvain to Leiden: guaranteeing well-connected communities. arXiv preprint arXiv:1810.08473 (2018).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
leiden_lib | | Which Leiden alg. implementation to use, 'cdlib' or 'leidenalg'. | 'cdlib' |
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
node_communities_map | dict of node:list(communities) | List of communities a contributor belongs to. See https://cdlib.readthedocs.io/en/latest/reference/classes/node_clustering.html. |
co_contributor_degree(community)
Number of contributors each contributor has co-authored with in a thread.
Using implementation of networkx.Graph.degree
.
TODO
document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
initiator_centrality_in_co_contributor_network(community, k=None)
TODO: implement using _initial_post_author_network_metric()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
k | | | None |
reports
Reports are groups of metrics evaluated for all communities under analysis. See also: building reports.
Reports by level of observation:
community
posts_contributors_per_interval(pici, interval)
Number of contributors and posts for each time interval
.
TODO
- document
- add to TOC
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pici | | | required |
interval | | | required |
Returns:
Name | Type | Description |
---|---|---|
report | | |
status_reputation
initiator_prestige_by_commenter_network_in_deg_centrality(community)
Determines a thread initiator's 'prestige' by their degree centrality in the commenter network at the time of thread creation, i.e., the number of users that have commented on at least one of their threads at that time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
number_of_replies_to_topics_initiated_by_thread_initiator(community, ignore_temporal_dependency=True)
Counts the number of replies to topics initiated by the thread's initiator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
ignore_temporal_dependency | | Whether to simply count all replies, | True |
pici
Pici
TODO
Add documentation.
Examples:
from communities import OEMCommunityFactory, OSMCommunityFactory, PPCommunityFactory
p = Pici(
communities={
'OpenEnergyMonitor': OEMCommunityFactory,
'OpenStreetMap': OSMCommunityFactory,
'PreciousPlastic': PPCommunityFactory,
},
start='2017-01-01',
end='2017-12-01',
cache_nrows=5000
)
__init__(communities=None, labels=[], cache_dir='cache', cache_nrows=None, start=None, end=None)
Loads communities.
Communities can be loaded from cache or scraped. Loaded data can be restricted either by the number of rows loaded from cache (cache_nrows), or by setting start and end dates (filter on publication dates of posts).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
communities | dict of str: pici.CommunityFactory | Dictionary of communities. Communities are provided as | None |
cache_dir | str | Path to folder that contains cache files. | 'cache' |
cache_nrows | int | Number of rows to load from cache (None (default): load all rows). | None |
start | str | Start-date for filtering posts. String format must be valid input for | None |
end | str | End-date for filtering posts. String format must be valid input for | None |
get_metrics(level=None, returntype=None, unwrapped=False, select_func=set.intersection)
Get all available metrics that are defined for the communities. The select_func parameter defaults to set.intersection, meaning that only those metrics are returned that exist for all communities. Metrics can be filtered by level and returntype.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
level | | | None |
returntype | | | None |
unwrapped | | 'Unwrap' the returned metric functions from their | False |
select_func | | | set.intersection |
Returns:
Type | Description |
---|---|
dict of str:func | metricname: metric |
get_preprocessors(level=None, returntype=None, unwrapped=False, select_func=set.intersection)
Get all available preprocessors that are defined for the communities. The select_func parameter defaults to set.intersection, meaning that only those preprocessors are returned that exist for all communities. Preprocessors can be filtered by level and returntype.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
level | | | None |
returntype | | | None |
unwrapped | | 'Unwrap' the returned preprocessor functions from their | False |
select_func | | | set.intersection |
Returns:
Type | Description |
---|---|
dict of str:func | preprocessorname: preprocessor |
pipelines
Pipelines
_append_preprocessing_results(results)
staticmethod
Appends Series generated by preprocessing to according datalevel objects of a Community.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results | | Should have format ([(datalevel, Series), (datalevel, Series), ...], Community). | required |
preprocessors
posts
number_of_words(community)
Adds the number of words in each post as int to community.posts.
post_position_in_thread(community)
Adds each post's position in thread (as int, starting with 1) to community.posts.
preprocessed_text(community, n_topics=10)
This preprocessor supplies cleaned text, text statistics (using Textacy) and sentiment statistics (TextBlob). The following columns are added to Community.posts:
- clean
- all_words
- words_no_stop
- n_words_no_stop
- frac_uppercase
- frac_punctuation_marks
- avg_syllables_per_word
- sentiment_polarity
- sentiment_subjectivity
- n_words
- n_chars
- n_long_words
- n_unique_words
- n_syllables
- n_syllables_per_word
- entropy
- ttr
- segmented_ttr
- hdd
- automated_readability_index
- flesch_reading_ease
- smog_index
- coleman_liau_index
- flesch_kincaid_grade_level
- gunning_fog_index
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community |
required |
rounded_date(community, round_dates_to='7D')
Round the post dates according to the specified frequency. If round_dates_to is None, this preprocessor does nothing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
round_dates_to | | Frequency to round the initial posts' dates to; see https://pandas.pydata.org/docs/user_guide/timeseries.html | '7D' |
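With pandas, the rounding itself is essentially a one-liner (column names here are hypothetical):

```python
import pandas as pd

posts = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-06", "2020-01-09"]),
})
# Round each post date to the nearest 7-day bin and store it as a new column.
posts["rounded_date"] = posts["date"].dt.round("7D")
```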
topics
thread_text(community)
Adds column thread_text to community.topics. Supplies the texts of all posts in a thread as a tuple of strings, in order of post creation date (starting with the initial post).
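A sketch of how such a per-thread tuple can be built with pandas (column names are hypothetical):

```python
import pandas as pd

posts = pd.DataFrame({
    "thread_id": ["t1", "t1", "t2"],
    "date": pd.to_datetime(["2020-01-02", "2020-01-01", "2020-01-03"]),
    "text": ["reply", "initial", "only post"],
})
# Sort by creation date so the initial post comes first, then collect
# each thread's post texts into a tuple.
thread_text = (
    posts.sort_values("date")
    .groupby("thread_id")["text"]
    .apply(tuple)
)
```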
registries
MetricRegistry
Bases: FuncExposer
This class exposes all methods decorated with @metric as
its own methods and passes the community
parameter to them.
PreprocessorRegistry
Bases: FuncExposer
This class exposes all methods decorated with @community_preprocessor as
its own methods and passes the community
parameter to them.
ReportRegistry
Bases: FuncExposer
This class exposes all methods decorated with @report as
its own methods and passes the communities
parameter to them.
reporting
Metric
TODO: add documentation
Report
TODO: add documentation
metric(level, returntype)
A decorator for community metrics.
The parameters level and returntype determine how, and using which level of observation (topics, posts, etc.), the metric's results are represented.
- Only methods using this decorator are available as metrics through pici.Community.metrics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
level | pici.datatypes.CommunityDataLevel | The metric's data level. Determines to which 'view' of pici.Community the metric's results are appended. | required |
returntype | pici.datatypes.MetricReturnType | Data type of metric's return value. | required |
Returns:
Type | Description |
---|---|
Either the plain metric value, or the value(s) appended to community data. Type determined by returntype. |
preprocessor(level)
A decorator for preprocessors.
report(func)
TODO: add documentation
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | | | required |
visualizations
plot_lorenz_curves(pici)
Plots the %posts vs %contributors Lorenz curves for all communities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pici | | | required |