
metrics

basic

Basic metrics based on counts, dates, etc. of posts and contributors.

By level of observation / concept:

topics

community

agg_number_of_posts_per_interval(community, interval)

Number of posts per interval.

Total number of posts in community per interval (parameter).

Parameters:

Name Type Description Default
community pici.Community required
interval str

The interval over which to aggregate. See pandas.Timedelta (https://pandas.pydata.org/docs/user_guide/timedeltas.html)

required

Returns:

Name Type Description
results dict of str:int
  • number of posts per <interval>
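As a rough illustration of per-interval aggregation, the sketch below counts posts in fixed-width buckets. It is not the pici implementation: `count_posts_per_interval` is a hypothetical helper, `datetime.timedelta` stands in for a parsed pandas.Timedelta string, and anchoring the buckets at the earliest post is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_posts_per_interval(post_dates, interval):
    """Count posts in fixed-width buckets of length `interval`.

    Buckets start at the earliest post date (an assumption of this sketch).
    """
    if not post_dates:
        return {}
    start = min(post_dates)
    # timedelta // timedelta yields the integer bucket index of each post.
    counts = Counter((d - start) // interval for d in post_dates)
    # Map each bucket index back to the start time of its bucket.
    return {start + i * interval: n for i, n in sorted(counts.items())}


dates = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 3),
    datetime(2020, 1, 9),
    datetime(2020, 1, 20),
]
per_week = count_posts_per_interval(dates, timedelta(days=7))
```

Intervals without posts are simply absent from the result here; a real aggregation may prefer to report them with a count of zero.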

agg_posts_per_topic(community)

Min, max, and average number of posts authored per topic.

Parameters:

Name Type Description Default
community required

Returns:

Name Type Description
results dict of str:int
  • <agg> posts per topic
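The aggregation can be pictured with a minimal standard-library sketch. The helper name and the exact result keys are hypothetical; only the min/max/average idea comes from the description above.

```python
from collections import Counter
from statistics import mean


def agg_posts_per_topic_sketch(post_topics):
    """Aggregate the number of posts per topic.

    `post_topics` holds one topic id per post (assumed input shape).
    """
    counts = list(Counter(post_topics).values())
    return {
        "min posts per topic": min(counts),
        "max posts per topic": max(counts),
        "mean posts per topic": mean(counts),
    }


# Topic "a" has three posts, "b" has one, "c" has two.
stats = agg_posts_per_topic_sketch(["a", "a", "a", "b", "c", "c"])
```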

contributors_per_interval(community, interval)

Number of users that have authored at least one post in time interval.

TODO
  • document
  • add to TOC

Parameters:

Name Type Description Default
community required
interval required

lorenz(community)

Distribution of posts across contributors (in analogy to the Lorenz curve). Returns (x, y), where x is the bottom x% of users (the least-contributing ones) and y is the proportion of posts made by them.

Parameters:

Name Type Description Default
community required

Returns:

Name Type Description
report
  • % contributors
  • % posts
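To make the Lorenz-curve analogy concrete, here is a minimal sketch that computes the cumulative shares from a list of post authors. `lorenz_points` is a hypothetical helper, and returning fractions in [0, 1] rather than percentages is an assumption.

```python
from collections import Counter


def lorenz_points(post_authors):
    """Return (x, y): cumulative share of contributors vs. share of posts.

    Contributors are ordered from least to most active.
    """
    counts = sorted(Counter(post_authors).values())
    if not counts:
        return [0.0], [0.0]
    total_posts = sum(counts)
    n = len(counts)
    x, y, running = [0.0], [0.0], 0
    for i, c in enumerate(counts, start=1):
        running += c
        x.append(i / n)                   # share of contributors so far
        y.append(running / total_posts)   # share of posts they authored
    return x, y


authors = ["ann"] * 6 + ["bob"] * 3 + ["cy"]
x, y = lorenz_points(authors)
```

In this toy data the least active third of contributors authored 10% of the posts, and the bottom two thirds authored 40%.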

number_of_contributors_per_topic(community)

Number of different contributors that have authored at least one post in a thread.

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str: int
  • number of contributors

number_of_posts(community)

Total number of posts authored by community.

TODO

document

Parameters:

Name Type Description Default
community required

number_of_posts_per_topic(community)

Number of posts per topic.

TODO
  • add to toc

Parameters:

Name Type Description Default
community required

Returns:

Name Type Description
report
  • number of posts

number_of_words(community)

The number of words in a post (after removing HTML).

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str:int
  • number of words
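One possible way to count words after stripping HTML, using only the standard library, is sketched below. How pici actually removes markup and tokenizes text is not stated here, so the parser-based approach and whitespace tokenization are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html_text):
    """Count whitespace-separated words after dropping HTML tags."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Join with spaces so text from adjacent block elements is not merged.
    return len(" ".join(parser.chunks).split())


n_words = count_words("<p>Hello <b>big</b> world!</p><p>Bye</p>")
```

Joining text chunks with a space keeps "world!" and "Bye" apart, at the cost of splitting a word that has inline markup in its middle.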

post_dates_per_topic(community)

Date of first post, second post, and last post.

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str:date
  • first post date
  • second post date
  • last post date

post_delays_per_topic(community)

Delays (in days) between first and second post, and first and last post.

Parameters:

Name Type Description Default
community pici.Community required

Returns:

Name Type Description
results dict of str:int
  • delay first last post
  • delay first second post
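For a single topic, the two delays could be derived as in the following sketch. The helper is hypothetical; truncating to whole days and reporting None for topics with a single post are assumptions.

```python
from datetime import datetime


def post_delays(post_dates):
    """Delays in whole days from the first post to the second and last post."""
    ordered = sorted(post_dates)
    first, last = ordered[0], ordered[-1]
    second = ordered[1] if len(ordered) > 1 else None
    return {
        # None when the topic never received a second post.
        "delay first second post": (second - first).days if second else None,
        "delay first last post": (last - first).days,
    }


delays = post_delays(
    [datetime(2020, 1, 31), datetime(2020, 1, 1), datetime(2020, 1, 4)]
)
```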

posts_per_interval(community, interval)

Number of posts authored by community per time interval.

TODO
  • document
  • add to TOC

Parameters:

Name Type Description Default
community required
interval required

posts_word_occurrence(community, words, normalize=True)

Counts the occurrence of a set of words in each post.

Parameters:

Name Type Description Default
community pici.Community required
words list of str

List of words to count in post texts.

required
normalize bool

Normalize occurrence count by text length.

True

Returns:

Name Type Description
results dict of str:int
  • occurrence of <word> for each provided word
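The effect of normalize can be illustrated with a small sketch for a single post text. Dividing by the number of tokens is only one reading of "text length" (character length is another), and the helper name and tokenization are hypothetical.

```python
import re


def word_occurrence(text, words, normalize=True):
    """Count how often each word occurs in `text` (case-insensitive).

    With normalize=True, counts are divided by the number of tokens,
    which is an assumption about what "text length" means.
    """
    tokens = re.findall(r"\w+", text.lower())
    result = {}
    for word in words:
        count = tokens.count(word.lower())
        if normalize:
            count = count / len(tokens) if tokens else 0.0
        result[f"occurrence of {word}"] = count
    return result


text = "Print the part, then print again"
raw = word_occurrence(text, ["print", "part"], normalize=False)
relative = word_occurrence(text, ["print", "part"], normalize=True)
```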

cached_metrics

This is a collection of all cacheable functions that are used in the calculation of indicators. The cache is implemented using functools.lru_cache with maxsize=None. Caching is commonly done at least at the community level (pici.Community is hashable). Examples of when caching makes sense:

  • calculating the similarity of post texts (done once for all combinations)
  • generating "temporal networks" (filtered representations of networks, depending on dates of posts)

It is recommended to define cached parts of indicators here.
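The caching pattern described above can be sketched as follows. `ToyCommunity` is a hypothetical stand-in for pici.Community (hashing by name is an assumption), and `_total_text_length` is an invented example of an expensive helper.

```python
from functools import lru_cache


class ToyCommunity:
    """Hypothetical stand-in for pici.Community; hashable by its name."""

    def __init__(self, name, posts):
        self.name = name
        self.posts = tuple(posts)

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, ToyCommunity) and self.name == other.name


@lru_cache(maxsize=None)  # unbounded cache, keyed by the (hashable) arguments
def _total_text_length(community):
    # Imagine an expensive computation here; it runs once per community.
    return sum(len(post) for post in community.posts)


community = ToyCommunity("demo", ["hello", "world!"])
first = _total_text_length(community)   # computed
second = _total_text_length(community)  # served from the cache
info = _total_text_length.cache_info()
```

Because lru_cache keys on argument hashes and equality, every argument of a cached function has to be hashable, which is why dates are passed as strings or other immutable values rather than as mutable objects.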

_comments_by_contributor(community, contributor, date_limit=None)

Get all comments posted by contributor.

Parameters:

Name Type Description Default
community required
contributor

User name

required
date_limit

Date in string format, e.g. '2020-01-15'

None

Returns:

Type Description

Comments by the specified user (before the specified date_limit).

_contribution_regularity(community, contributor, start, end)

Get the contribution regularity of contributor as the percentage of days that contributor posted in the forum, between the dates start and end.

Parameters:

Name Type Description Default
community required
contributor required
start required
end required
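A minimal sketch of the regularity computation, assuming that both start and end are inclusive and that the result is expressed on a 0 to 100 scale (neither is confirmed by the description above):

```python
from datetime import date


def contribution_regularity(post_dates, start, end):
    """Percentage of days in [start, end] with at least one post."""
    total_days = (end - start).days + 1  # inclusive range (assumption)
    if total_days <= 0:
        return 0.0
    # A set collapses several posts on the same day into one active day.
    active_days = {d for d in post_dates if start <= d <= end}
    return 100.0 * len(active_days) / total_days


posts = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 20)]
regularity = contribution_regularity(posts, date(2020, 1, 1), date(2020, 1, 10))
```

Here the contributor was active on 2 of the 10 days in the window; the post on 20 January falls outside it.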

_initial_post_author_network_metric(initial_post, community, metric, kind)

Get a cached network metric for the author of an initial post.

Parameters:

Name Type Description Default
initial_post required
metric required
community required
thread_date required
kind required

Returns:

Type Description

The value of the metric.

_replies_to_own_topics(community, contributor, date_limit=None)

The number of replies made to initial posts by specified contributor in community.

Parameters:

Name Type Description Default
community required
contributor required
date_limit

Date in string format, e.g. '2020-01-15'

None

Returns:

Type Description

Number of replies to initial posts by contributor. If date_limit is provided, only threads and replies posted before the date limit are considered.

_temporal_text_similarity_dict(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio')

Returns a dictionary of post-text:1xn-similarity-matrix for similarity subgraph filtered by date.

Parameters:

Name Type Description Default
community required
date required
text_col 'preprocessed_text__words_no_stop'
similarity_metric 'token_sort_ratio'

_temporal_text_similarity_network(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)

Create a subview graph of the text similarity network created by _text_similarity_network() by filtering out all nodes (= posts) where post.date > date.

Parameters:

Name Type Description Default
community required
date required
text_col 'preprocessed_text__words_no_stop'
similarity_metric 'token_sort_ratio'
only_initial_posts True
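The date filter can be pictured with plain Python data structures. This sketch builds a filtered copy rather than a graph subview, and the edge-list representation and helper name are hypothetical.

```python
from datetime import date


def temporal_subgraph(edges, node_dates, cutoff):
    """Keep only posts dated on or before `cutoff`, plus edges among them."""
    keep = {node for node, posted in node_dates.items() if posted <= cutoff}
    kept_edges = [(u, v, w) for u, v, w in edges if u in keep and v in keep]
    return keep, kept_edges


node_dates = {
    "p1": date(2020, 1, 1),
    "p2": date(2020, 1, 15),
    "p3": date(2020, 2, 1),
}
# Each edge is (post, post, similarity weight).
edges = [("p1", "p2", 0.8), ("p2", "p3", 0.4), ("p1", "p3", 0.1)]
nodes, kept = temporal_subgraph(edges, node_dates, date(2020, 1, 31))
```

Post p3 is dated after the cutoff, so it is dropped together with both edges that touch it.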

_text_similarity_network(community, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)

Create a text-similarity network for all posts in community, using textacy.representations.network.build_similarity_network().

Parameters:

Name Type Description Default
community required
text_col 'preprocessed_text__words_no_stop'
similarity_metric 'token_sort_ratio'
only_initial_posts True

_threads_by_contributor(community, contributor, date_limit=None)

Get all threads initiated by contributor.

Parameters:

Name Type Description Default
community required
contributor

User name

required
date_limit

Date in string format, e.g. '2020-01-15'

None

Returns:

Type Description

Threads initiated by the specified user (before the specified date_limit).

distinctiveness

initial_post_text_distance(community, similarity_metric='token_sort_ratio')

Calculates the text distance of initial posts to previously authored initial posts as a measure of distinctiveness.

Parameters:

Name Type Description Default
community required
similarity_metric 'token_sort_ratio'
agg_method required
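The idea of measuring distinctiveness against earlier posts can be sketched with the standard library. This is an approximation only: `token_sort_similarity` imitates a token-sort ratio with difflib instead of the library behind 'token_sort_ratio', the 0 to 1 scale is an assumption, and taking the distance to the most similar earlier post is just one possible aggregation.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a, b):
    """Similarity in [0, 1] after sorting tokens, so word order is ignored."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


def distances_to_earlier_posts(texts):
    """For each text (in posting order), distance to the closest earlier text."""
    result = []
    for i, text in enumerate(texts):
        earlier = texts[:i]
        if not earlier:
            result.append(None)  # the first post has nothing to compare with
            continue
        best = max(token_sort_similarity(text, e) for e in earlier)
        result.append(1.0 - best)
    return result


distances = distances_to_earlier_posts(
    ["warped print bed", "print bed warped", "qqq"]
)
```

The second text is a reordering of the first and gets distance 0, while the third shares no characters with the earlier posts and gets distance 1.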

elaboration

basic_text_based_elaboration(community, col_n_words='preprocessed_text__n_words', col_n_words_no_stop='preprocessed_text__n_words_no_stop', col_syllables='preprocessed_text__n_syllables', col_avg_syllables='preprocessed_text__avg_syllables_per_word', col_smog_index='preprocessed_text__smog_index', col_auto_readability='preprocessed_text__automated_readability_index', col_coleman_liau='preprocessed_text__coleman_liau_index', col_flesch_kincaid='preprocessed_text__flesch_kincaid_grade_level', col_frac_uppercase='preprocessed_text__frac_uppercase', col_frac_punctuation_marks='preprocessed_text__frac_punctuation_marks')

Provides basic text-based elaboration indicators, such as number of words, number of syllables, and different readability scores.

Parameters:

Name Type Description Default
col_flesch_kincaid 'preprocessed_text__flesch_kincaid_grade_level'
col_coleman_liau 'preprocessed_text__coleman_liau_index'
col_auto_readability 'preprocessed_text__automated_readability_index'
col_smog_index 'preprocessed_text__smog_index'
col_syllables

column name in community.posts to use for mean number

'preprocessed_text__n_syllables'
col_n_words

column name in community.posts to use for word count

'preprocessed_text__n_words'
community required

experience

initiator_experience_by_commenter_network_out_deg_centrality(community)

Determines a thread initiator's 'experience' by their out-degree centrality in the commenter network at the time of thread creation, i.e., the number of users the initiator has 'commented on' (has replied to in a user's thread).

Parameters:

Name Type Description Default
community required
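A minimal sketch of the underlying count, assuming comments are available as (commenter, thread initiator, date) tuples. The helper is hypothetical and returns the raw number of distinct users; a degree centrality in the networkx sense would additionally be normalized by the number of other nodes.

```python
from datetime import date


def commenter_out_degree(comments, contributor, before):
    """Number of distinct users whose threads `contributor` replied to
    strictly before the date `before`."""
    targets = {
        initiator
        for commenter, initiator, posted in comments
        if commenter == contributor
        and initiator != contributor  # ignore replies in one's own threads
        and posted < before
    }
    return len(targets)


comments = [
    ("ann", "bob", date(2020, 1, 2)),
    ("ann", "bob", date(2020, 1, 5)),  # same target, counted once
    ("ann", "ann", date(2020, 1, 4)),  # reply in own thread, ignored
    ("bob", "ann", date(2020, 1, 3)),
    ("ann", "cy", date(2020, 2, 1)),
]
early = commenter_out_degree(comments, "ann", date(2020, 1, 10))
later = commenter_out_degree(comments, "ann", date(2020, 3, 1))
```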

initiator_experience_by_past_contributions(community, ignore_temporal_dependency=True, use_rounded_date=False)

Parameters:

Name Type Description Default
ignore_temporal_dependency True
community required

helpfulness

initiator_helpfulness_by_contribution_regularity(community, lookback_days=100)

Calculates the contribution regularity of the initiator of each thread. Contribution regularity is the percentage of past days on which the initiator posted in the forum. The number of past days considered is limited by the lookback_days parameter.

Parameters:

Name Type Description Default
lookback_days 100
community required

initiator_helpfulness_by_foreign_thread_comment_frequency(community)

This indicator measures initiator helpfulness by the frequency of comments by the thread's initiator that were posted in threads with a different initiator ('foreign threads').

Parameters:

Name Type Description Default
community required

initiator_helpfulness_by_top_commenter_status(community, contributor, k=90)

Calculates whether a thread's initiator has top commenter status. A 'top commenter' has posted more comments than the k-th percentile (default: k=90).

Parameters:

Name Type Description Default
community required
contributor required
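The percentile threshold can be illustrated as follows. The nearest-rank percentile definition and both helper names are assumptions of this sketch; the indicator may use a different percentile method.

```python
import math
from collections import Counter


def percentile_nearest_rank(values, k):
    """k-th percentile (0 < k <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


def is_top_commenter(comment_authors, contributor, k=90):
    """True if `contributor` posted more comments than the k-th percentile
    of comment counts over all commenters."""
    counts = Counter(comment_authors)
    threshold = percentile_nearest_rank(counts.values(), k)
    return counts[contributor] > threshold


# Ten users; user u<i> has written i comments.
authors = [f"u{i}" for i in range(1, 11) for _ in range(i)]
top = is_top_commenter(authors, "u10")
not_top = is_top_commenter(authors, "u9")
```

With ten commenters, the 90th percentile by nearest rank is 9 comments, so only the user with 10 comments exceeds it.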

idea_popularity

idea_popularity_by_number_of_unique_users_commenting(community)

Parameters:

Name Type Description Default
community required

network_position

Metrics using the community's graph object (representation of contributor network).

By level of observation:

contributors

  • contributor_degree (pici.metrics.network.contributor_degree)
  • contributor_centralities (pici.metrics.network.contributor_centralities)
  • contributor_communities (pici.metrics.network.contributor_communities)

co_contributor_centralities(community)

Contributor centralities.

Includes degree centrality, betweenness centrality, and eigenvector centrality. Using networkx implementation.

Parameters:

Name Type Description Default
community pici.Community required

co_contributor_communities(community, leiden_lib='cdlib')

Find communities within the contributor network.

Uses the weighted Leiden algorithm (Traag et al., 2018) as implemented in cdlib.algorithms.leiden or leidenalg.

Traag, Vincent, Ludo Waltman, and Nees Jan van Eck. From Louvain to Leiden: guaranteeing well-connected communities. arXiv preprint arXiv:1810.08473 (2018).

Parameters:

Name Type Description Default
leiden_lib

Which Leiden alg. implementation to use, 'cdlib' or

'cdlib'
community required

Returns:

Name Type Description
node_communities_map dict of node:list(communities)

List of communities a contributor belongs to. See cdlib.NodeClustering.to_node_community_map (https://cdlib.readthedocs.io/en/latest/reference/classes/node_clustering.html).

co_contributor_degree(community)

Number of contributors each contributor has co-authored with in a thread.

Using implementation of networkx.Graph.degree.

TODO

document

Parameters:

Name Type Description Default
community pici.Community required
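To show what the degree measures, the sketch below builds the co-contributor relation by hand instead of calling networkx. The (thread id, author) input shape and the helper name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_contributor_degree_sketch(posts):
    """For each author, count the distinct other authors who posted in at
    least one of the same threads."""
    authors_by_thread = defaultdict(set)
    for thread_id, author in posts:
        authors_by_thread[thread_id].add(author)

    neighbors = defaultdict(set)
    for authors in authors_by_thread.values():
        # Every pair of authors in a thread is connected by an edge.
        for a, b in combinations(sorted(authors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    all_authors = {author for _, author in posts}
    return {author: len(neighbors[author]) for author in all_authors}


posts = [
    (1, "ann"), (1, "bob"), (1, "cy"),
    (2, "ann"), (2, "dee"),
    (3, "eve"),
]
degree = co_contributor_degree_sketch(posts)
```

An author who only ever posted alone in a thread, like "eve" here, ends up with degree 0.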

initiator_centrality_in_co_contributor_network(community, k=None)

TODO: implement using _initial_post_author_network_metric()

Parameters:

Name Type Description Default
community required
k None

reports

Reports are groups of metrics evaluated for all communities under analysis. See also: building reports.

Reports by level of observation:

community

posts_contributors_per_interval(pici, interval)

Number of contributors and posts for each time interval.

TODO
  • document
  • add to TOC

Parameters:

Name Type Description Default
pici required
interval required

Returns:

Name Type Description
report
  • number of posts per interval
  • number of contributors per interval

summary(pici)

Summarizes communities by posting behavior.

Parameters:

Name Type Description Default
pici pici.Pici required

Returns:

Name Type Description
report
  • number of posts
  • number of posts per day (aggregated)
  • number of posts per month (aggregated)

status_reputation

initiator_prestige_by_commenter_network_in_deg_centrality(community)

Determines a thread initiator's 'prestige' by their in-degree centrality in the commenter network at the time of thread creation, i.e., the number of users that have commented on at least one of their threads at that time.

Parameters:

Name Type Description Default
community required

number_of_replies_to_topics_initiated_by_thread_initiator(community, ignore_temporal_dependency=True)

Number of replies to topics initiated by the thread initiator.

Parameters:

Name Type Description Default
ignore_temporal_dependency

Whether to simply count all replies,

True
community required