metrics

`basic`

Basic metrics based on counts, dates, etc. of posts and contributors.

By level of observation / concept:

topics

community

`agg_number_of_posts_per_interval(community, interval)`

Number of posts per interval.

Total number of posts in the community per interval (set by the `interval` parameter).

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required
`interval`	`str`	The interval over which to aggregate. See `pandas.Timedelta` (https://pandas.pydata.org/docs/user_guide/timedeltas.html)	required

Returns:

Name	Type	Description
`results`	`dict of str:int`	`number of posts per <interval>`
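As a rough illustration of this aggregation, the sketch below buckets post timestamps into fixed-width windows. It is not pici's implementation: it uses only the standard library, takes a `datetime.timedelta` rather than a `pandas.Timedelta` string, and the helper name and sample dates are hypothetical. Windows without posts are omitted here.

```python
from collections import Counter
from datetime import datetime, timedelta


def posts_per_interval(post_dates, interval):
    """Count posts in consecutive windows of width `interval`.

    Windows start at the earliest post date. Hypothetical helper,
    not part of pici.
    """
    if not post_dates:
        return {}
    start = min(post_dates)
    counts = Counter()
    for post_date in post_dates:
        # Index of the window this post falls into.
        bucket = (post_date - start) // interval
        counts[start + bucket * interval] += 1
    return dict(sorted(counts.items()))


# Made-up post dates for illustration.
dates = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 3),
    datetime(2020, 1, 9),
    datetime(2020, 1, 22),
]
weekly = posts_per_interval(dates, timedelta(days=7))
```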

`agg_posts_per_topic(community)`

Min, max, and average number of posts authored per topic.

Parameters:

Name	Type	Description	Default
`community`			required

Returns:

Name	Type	Description
`results`	`dict of str:int`	`<agg> posts per topic`

`contributors_per_interval(community, interval)`

Number of users that have authored at least one post in each time interval.

TODO

document
add to TOC

Parameters:

Name	Type	Description	Default
`community`			required
`interval`			required

`lorenz(community)`

Distribution of posts (in analogy to the Lorenz curve). Returns (x, y), where x is the (least-contributing) bottom x% of users and y is the proportion of posts made by them.

Parameters:

Name	Type	Description	Default
`community`			required

Returns:

Name	Type	Description
`report`		% contributors, % posts
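To make the (x, y) pairs concrete, here is a minimal sketch of a Lorenz-style curve computed from a plain list of per-user post counts. The function name and input format are assumptions for illustration, and it expects at least one user and one post. pici derives these values from a `community` object instead.

```python
def lorenz_points(posts_per_user):
    """Return (x, y) lists for a Lorenz-style curve of post counts.

    x[i] is the share of users (least active first) and y[i] the share
    of all posts written by them. Hypothetical helper, not part of pici.
    """
    counts = sorted(posts_per_user)  # least-contributing users first
    total = sum(counts)
    n = len(counts)
    x, y = [0.0], [0.0]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        x.append(i / n)
        y.append(running / total)
    return x, y


# Four users with 1, 1, 2 and 6 posts: the bottom 75% wrote 40% of posts.
x, y = lorenz_points([6, 1, 2, 1])
```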

`number_of_contributors_per_topic(community)`

Number of different contributors that have authored at least one post in a thread.

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required

Returns:

Name	Type	Description
`results`	`dict of str: int`	`number of contributors`

`number_of_posts(community)`

Total number of posts authored by community.

TODO

document

Parameters:

Name	Type	Description	Default
`community`			required

`number_of_posts_per_topic(community)`

Number of posts per topic.

TODO

add to toc

Parameters:

Name	Type	Description	Default
`community`			required

Returns:

Name	Type	Description
`report`		number of posts

`number_of_words(community)`

The number of words in a post (after removing HTML).

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required

Returns:

Name	Type	Description
`results`	`dict of str:int`	`number of words`
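One way to count words after stripping markup is sketched below with the standard library's `html.parser`. This is an assumption about the general approach, not pici's actual preprocessing, and the class and function names are hypothetical. Because text chunks are joined with spaces, a word split by inline tags would be counted twice.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html_text):
    """Number of whitespace-separated words once tags are removed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return len(" ".join(parser.chunks).split())


n_words = count_words("<p>Hello <b>maker</b> community!</p>")
```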

`post_dates_per_topic(community)`

Date of first post, second post, and last post.

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required

Returns:

Name	Type	Description
`results`	`dict of str:date`	`first post date` `second post date` `last post date`

`post_delays_per_topic(community)`

Delays (in days) between first and second post, and first and last post.

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required

Returns:

Name	Type	Description
`results`	`dict of str:int`	`delay first last post` `delay first second post`
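The two delays can be sketched for a single topic as follows. The helper is hypothetical and makes two assumptions: delays are truncated to whole days, and a topic with only one post has no first-to-second delay.

```python
from datetime import datetime


def post_delays(post_dates):
    """Delays in whole days for one topic, given its post dates.

    Hypothetical helper, not part of pici.
    """
    ordered = sorted(post_dates)
    first, last = ordered[0], ordered[-1]
    second = ordered[1] if len(ordered) > 1 else None
    return {
        "delay first second post": (
            (second - first).days if second is not None else None
        ),
        "delay first last post": (last - first).days,
    }


delays = post_delays(
    [datetime(2020, 1, 10), datetime(2020, 1, 1), datetime(2020, 1, 3)]
)
```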

`posts_per_interval(community, interval)`

Number of posts authored by community per time interval.

TODO

document
add to TOC

Parameters:

Name	Type	Description	Default
`community`			required
`interval`			required

`posts_word_occurrence(community, words, normalize=True)`

Counts the occurrence of a set of words in each post.

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required
`words`	`list of str`	List of words to count in post texts.	required
`normalize`	`bool`	Normalize occurrence count by text length.	`True`

Returns:

Name	Type	Description
`results`	`dict of str:int`	`occurrence of <word>` for each provided `word`
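A minimal sketch of the counting for one post text follows. It assumes that "text length" means the number of word tokens and that matching is case-insensitive. pici may define both differently, and the function name is hypothetical.

```python
import re


def word_occurrence(text, words, normalize=True):
    """Count how often each word occurs in `text`.

    With normalize=True, counts are divided by the number of tokens
    (an assumed definition of text length).
    """
    tokens = re.findall(r"\w+", text.lower())
    result = {}
    for word in words:
        count = tokens.count(word.lower())
        if normalize:
            count = count / len(tokens) if tokens else 0.0
        result[f"occurrence of {word}"] = count
    return result


raw = word_occurrence(
    "Print the part, then print again", ["print", "part"], normalize=False
)
```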

`cached_metrics`

This is a collection of all cacheable functions that are used in the calculation of indicators. The cache is implemented using `functools.lru_cache` with `maxsize=None`. Caching is usually done at least at the community level (`pici.Community` is hashable). Examples of when using a cache makes sense:

calculating the similarity of post texts (done once for all combinations)
generating "temporal networks" (filtered representations of networks, depending on dates of posts)

It is recommended to define cached parts of indicators here.

`_comments_by_contributor(community, contributor, date_limit=None)`

Get all comments authored by the specified contributor (before `date_limit`, if provided).

Parameters:

Name	Description	Default
`community`		required
`contributor`	User name	required
`date_limit`	Date in string format, e.g. '2020-01-15'	`None`

`_contribution_regularity(community, contributor, start, end)`

Get the contribution regularity of contributor as the percentage of days that contributor posted in the forum, between the dates start and end.

Parameters:

Name	Type	Description	Default
`community`			required
`contributor`			required
`start`			required
`end`			required
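As a hedged illustration of this definition, the sketch below computes the share of days with at least one post. It assumes that both `start` and `end` are included and that the result is a percentage between 0 and 100. The function and its inputs are hypothetical rather than pici's own.

```python
from datetime import date


def contribution_regularity(post_dates, start, end):
    """Percentage of days in [start, end] with at least one post."""
    total_days = (end - start).days + 1
    # A set keeps each active day once, however many posts it has.
    active_days = {d for d in post_dates if start <= d <= end}
    return 100 * len(active_days) / total_days


regularity = contribution_regularity(
    [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 20)],
    start=date(2020, 1, 1),
    end=date(2020, 1, 10),
)
```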

`_initial_post_author_network_metric(initial_post, community, metric, kind)`

Get a cached network metric for the author of an initial post.

Parameters:

Name	Type	Description	Default
`initial_post`			required
`metric`			required
`community`			required
`thread_date`			required
`kind`			required

Returns:

Type	Description
	The value of the metric.

`_replies_to_own_topics(community, contributor, date_limit=None)`

The number of replies made to initial posts by the specified contributor in community. If `date_limit` is provided, only threads and replies posted before the date limit are considered.

Parameters:

Name	Description	Default
`community`		required
`contributor`		required
`date_limit`	Date in string format, e.g. '2020-01-15'	`None`

`_temporal_text_similarity_dict(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio')`

Returns a dictionary of post-text:1xn-similarity-matrix for similarity subgraph filtered by date.

Parameters:

Name	Type	Description	Default
`community`			required
`date`			required
`text_col`			`'preprocessed_text__words_no_stop'`
`similarity_metric`			`'token_sort_ratio'`

`_temporal_text_similarity_network(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)`

Create a subview graph of the text similarity network created by `_text_similarity_network()` by filtering out all nodes (=posts) whose post date is later than `date`.

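Parameters:

Name	Type	Description	Default
`community`			required
`date`			required
`text_col`			`'preprocessed_text__words_no_stop'`
`similarity_metric`			`'token_sort_ratio'`
`only_initial_posts`			`True`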

`_text_similarity_network(community, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)`

Create a text-similarity network for all posts in community, using textacy.representations.network.build_similarity_network().

Parameters:

Name	Type	Description	Default
`community`			required
`text_col`			`'preprocessed_text__words_no_stop'`
`similarity_metric`			`'token_sort_ratio'`
`only_initial_posts`			`True`

`_threads_by_contributor(community, contributor, date_limit=None)`

Get all threads initiated by the specified contributor (before `date_limit`, if provided).

Parameters:

Name	Description	Default
`community`		required
`contributor`	User name	required
`date_limit`	Date in string format, e.g. '2020-01-15'	`None`

`distinctiveness`

`initial_post_text_distance(community, similarity_metric='token_sort_ratio')`

Calculates the text distance of initial posts to previously authored initial posts as a measure of distinctiveness.

Parameters:

Name	Type	Description	Default
`community`			required
`similarity_metric`			`'token_sort_ratio'`
`agg_method`			required
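The idea of measuring distinctiveness against earlier initial posts can be sketched with the standard library. This is only an approximation under stated assumptions: `difflib.SequenceMatcher` stands in for the `token_sort_ratio` metric, and the distance to the most similar earlier post is used as the aggregate, which may differ from what `agg_method` selects in pici. Both helpers are hypothetical.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a, b):
    """Similarity in [0, 1] of two texts after sorting their tokens."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


def distances_to_previous(initial_posts):
    """Distance of each post to its most similar predecessor.

    `initial_posts` must be ordered by date. The first post has no
    predecessor, so its distance is None.
    """
    distances = []
    for i, text in enumerate(initial_posts):
        previous = initial_posts[:i]
        if not previous:
            distances.append(None)
            continue
        best = max(token_sort_similarity(text, p) for p in previous)
        distances.append(1.0 - best)
    return distances


distances = distances_to_previous(
    ["printer bed leveling", "leveling bed printer", "laser cutter settings"]
)
```

Only posts that precede a given post are compared, which mirrors the "previously authored" restriction in the description.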

`elaboration`

`basic_text_based_elaboration(community, col_n_words='preprocessed_text__n_words', col_n_words_no_stop='preprocessed_text__n_words_no_stop', col_syllables='preprocessed_text__n_syllables', col_avg_syllables='preprocessed_text__avg_syllables_per_word', col_smog_index='preprocessed_text__smog_index', col_auto_readability='preprocessed_text__automated_readability_index', col_coleman_liau='preprocessed_text__coleman_liau_index', col_flesch_kincaid='preprocessed_text__flesch_kincaid_grade_level', col_frac_uppercase='preprocessed_text__frac_uppercase', col_frac_punctuation_marks='preprocessed_text__frac_punctuation_marks')`

Provides basic text-based elaboration indicators, such as number of words, number of syllables, and different readability scores.

Parameters:

Name	Description	Default
`col_flesch_kincaid`		`'preprocessed_text__flesch_kincaid_grade_level'`
`col_coleman_liau`		`'preprocessed_text__coleman_liau_index'`
`col_auto_readability`		`'preprocessed_text__automated_readability_index'`
`col_smog_index`		`'preprocessed_text__smog_index'`
`col_syllables`	column name in community.posts to use for mean number	`'preprocessed_text__n_syllables'`
`col_n_words`	column name in community.posts to use for word count	`'preprocessed_text__n_words'`
`community`		required

`experience`

`initiator_experience_by_commenter_network_out_deg_centrality(community)`

Determines a thread initiator's 'experience' by their out-degree centrality in the commenter network at the time of thread creation, i.e., the number of users the initiator has 'commented on' (has replied to in a user's thread).

Parameters:

Name	Type	Description	Default
`community`			required
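The temporal aspect of this indicator, counting only comments made before the thread was created, can be illustrated with a plain edge list. The sketch returns the raw out-degree (number of distinct users commented on) without normalizing it to a centrality. The tuple format and function name are assumptions for illustration.

```python
from datetime import date


def commenter_out_degree(comments, user, before):
    """Number of distinct users whose threads `user` replied to before `before`.

    `comments` is a list of (commenter, thread_initiator, date) tuples.
    Hypothetical helper, not part of pici.
    """
    targets = {
        initiator
        for commenter, initiator, when in comments
        if commenter == user and initiator != user and when < before
    }
    return len(targets)


# Made-up comment history for illustration.
comments = [
    ("ann", "bob", date(2020, 1, 1)),
    ("ann", "bob", date(2020, 1, 2)),
    ("ann", "cat", date(2020, 1, 3)),
    ("ann", "dan", date(2020, 1, 5)),
    ("bob", "ann", date(2020, 1, 1)),
]
experience = commenter_out_degree(comments, "ann", before=date(2020, 1, 4))
```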

`initiator_experience_by_past_contributions(community, ignore_temporal_dependency=True, use_rounded_date=False)`

Parameters:

Name	Type	Description	Default
`ignore_temporal_dependency`			`True`
`community`			required

`helpfulness`

`initiator_helpfulness_by_contribution_regularity(community, lookback_days=100)`

Calculates the contribution regularity of the initiator of each thread. Contribution regularity is the percentage of past days on which the initiator posted in the forum. The number of past days considered is limited by the `lookback_days` parameter.

Parameters:

Name	Type	Description	Default
`lookback_days`			`100`
`community`			required

`initiator_helpfulness_by_foreign_thread_comment_frequency(community)`

This indicator measures initiator helpfulness by the frequency of comments by the thread's initiator that were posted in threads with a different initiator ('foreign threads').

Parameters:

Name	Type	Description	Default
`community`			required

`initiator_helpfulness_by_top_commenter_status(community, contributor, k=90)`

Calculates whether a thread's initiator has top commenter status. A 'top commenter' has posted more comments than the k-th percentile (default: k=90).

Parameters:

Name	Type	Description	Default
`community`			required
`contributor`			required
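A minimal sketch of the percentile check follows. It assumes a nearest-rank percentile over the comment counts of all contributors. The exact percentile definition used by pici is not stated here, and the function and data are hypothetical.

```python
import math


def is_top_commenter(comment_counts, contributor, k=90):
    """True if `contributor` posted more comments than the k-th percentile.

    `comment_counts` maps user names to their number of comments.
    Uses the nearest-rank percentile (an assumed definition).
    """
    ordered = sorted(comment_counts.values())
    rank = max(1, math.ceil(k * len(ordered) / 100))
    threshold = ordered[rank - 1]
    return comment_counts[contributor] > threshold


# Ten made-up users with 1 to 10 comments each.
counts = {f"user{i}": i for i in range(1, 11)}
top = is_top_commenter(counts, "user10")
```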

`idea_popularity`

`idea_popularity_by_number_of_unique_users_commenting(community)`

Parameters:

Name	Type	Description	Default
`community`			required

`network_position`

Metrics using the community's graph object (representation of contributor network).

By level of observation:

contributors

[contributor_degree][pici.metrics.network.contributor_degree]
[contributor_centralities][pici.metrics.network.contributor_centralities]
[contributor_communities][pici.metrics.network.contributor_communities]

`co_contributor_centralities(community)`

Contributor centralities.

Includes degree centrality, betweenness centrality, and eigenvector centrality. Uses the networkx implementations.

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required

`co_contributor_communities(community, leiden_lib='cdlib')`

Find communities within the contributor network.

Uses the weighted Leiden algorithm (Traag et al., 2018) as implemented in cdlib.algorithms.leiden or leidenalg.

Traag, Vincent, Ludo Waltman, and Nees Jan van Eck. From Louvain to Leiden: guaranteeing well-connected communities. arXiv preprint arXiv:1810.08473 (2018).

Parameters:

Name	Type	Description	Default
`leiden_lib`		Which Leiden alg. implementation to use, 'cdlib' or	`'cdlib'`
`community`			required

Returns:

Name	Type	Description
`node_communities_map`	`dict of node:list(communities)`	List of communities a contributor belongs to. See `cdlib.NodeClustering.to_node_community_map` (https://cdlib.readthedocs.io/en/latest/reference/classes/node_clustering.html).

`co_contributor_degree(community)`

Number of contributors each contributor has co-authored with in a thread.

Uses `networkx.Graph.degree`.

TODO

document

Parameters:

Name	Type	Description	Default
`community`	`pici.Community`		required
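What this degree counts can be shown without a graph library. The sketch below derives, for each contributor, the number of distinct other users who posted in the same thread. The input format and the function are assumptions for illustration and not pici's network-based implementation.

```python
from itertools import combinations


def co_contributor_degree(threads):
    """Map each contributor to their number of distinct co-contributors.

    `threads` maps a thread id to the list of users who posted in it.
    Hypothetical helper, not part of pici.
    """
    neighbours = {}
    for users in threads.values():
        unique = set(users)
        for user in unique:
            # Make sure contributors without co-authors still appear.
            neighbours.setdefault(user, set())
        for a, b in combinations(unique, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {user: len(others) for user, others in neighbours.items()}


degrees = co_contributor_degree(
    {"t1": ["ann", "bob", "ann"], "t2": ["bob", "cat"], "t3": ["dan"]}
)
```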

`initiator_centrality_in_co_contributor_network(community, k=None)`

TODO: implement using _initial_post_author_network_metric()

Parameters:

Name	Type	Description	Default
`community`			required
`k`			`None`

`reports`

Reports are groups of metrics evaluated for all communities under analysis. See also: building reports.

Reports by level of observation:

community

summary

`posts_contributors_per_interval(pici, interval)`

Number of contributors and posts for each time interval.

TODO

document
add to TOC

Parameters:

Name	Type	Description	Default
`pici`			required
`interval`			required

Returns:

Name	Type	Description
`report`		number of posts per `interval`, number of contributors per `interval`

`summary(pici)`

Summarizes communities by posting behavior.

Parameters:

Name	Type	Description	Default
`pici`	`pici.Pici`		required

Returns:

Name	Type	Description
`report`		number of posts, number of posts per day (aggregated), number of posts per month (aggregated)

`status_reputation`

`initiator_prestige_by_commenter_network_in_deg_centrality(community)`

Determines a thread initiator's 'prestige' by their in-degree centrality in the commenter network at the time of thread creation, i.e., the number of users that have commented on at least one of their threads at that time.

Parameters:

Name	Type	Description	Default
`community`			required

`number_of_replies_to_topics_initiated_by_thread_initiator(community, ignore_temporal_dependency=True)`

Number of replies to topics initiated by the thread's initiator.

Parameters:

Name	Type	Description	Default
`ignore_temporal_dependency`		Whether to simply count all replies,	`True`
`community`			required