metrics
basic
Basic metrics based on counts, dates etc. of posts, contributors.
By level of observation / concept:
topics
community
agg_number_of_posts_per_interval(community, interval)
Number of posts per interval.
Total number of posts in community per interval (parameter).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
interval | str | The interval over which to aggregate. See | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
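As a rough illustration of counting posts per interval, the sketch below groups post dates with the standard library only. The helper name `count_posts_per_interval` and the use of a `strftime` pattern to stand in for the `interval` parameter are assumptions made for this example, not pici's actual implementation.

```python
from collections import Counter
from datetime import date


def count_posts_per_interval(post_dates, fmt="%Y-%m"):
    """Count posts per interval.

    `fmt` is a strftime pattern used here as a stand-in for `interval`
    (hypothetical; the real function may expect a different format).
    """
    counts = Counter(d.strftime(fmt) for d in post_dates)
    # Sort by interval label so the result reads chronologically.
    return dict(sorted(counts.items()))


dates = [date(2020, 1, 3), date(2020, 1, 17), date(2020, 2, 1)]
per_month = count_posts_per_interval(dates)
print(per_month)
```

Passing `fmt="%Y"` would aggregate the same dates per year instead.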
agg_posts_per_topic(community)
Min, max, and average number of posts authored per topic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
contributors_per_interval(community, interval)
Number of users that have authored at least one post in time interval.
TODO
- document
- add to TOC
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
interval | | | required |
lorenz(community)
Distribution of posts (in analogy to the Lorenz curve). Returns (x, y), where x is the bottom x% of users (the least-contributing ones) and y is the proportion of posts made by them.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | report: - % contributors - % posts | required |
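The (x, y) pairs described above can be sketched as follows. This is a minimal stand-alone version that assumes a plain list of per-user post counts as input; the name `lorenz_points` is hypothetical, and the real function operates on a community object instead.

```python
def lorenz_points(posts_per_user):
    """Return (x, y): cumulative share of users vs. cumulative share of posts."""
    counts = sorted(posts_per_user)  # least-contributing users first
    total = sum(counts)
    n = len(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one user with at least one post")
    x, y = [0.0], [0.0]
    cumulative = 0
    for i, c in enumerate(counts, start=1):
        cumulative += c
        x.append(i / n)               # bottom share of users
        y.append(cumulative / total)  # share of posts made by them
    return x, y


x, y = lorenz_points([6, 1, 2, 1])
print(list(zip(x, y)))
```

In this toy input the bottom 75% of users account for 40% of all posts.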
number_of_contributors_per_topic(community)
Number of different contributors that have authored at least one post in a thread.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str: int | |
number_of_posts(community)
Total number of posts authored by community.
TODO
document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
number_of_posts_per_topic(community)
Number of posts per topic.
TODO
- add to toc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
report | | |
number_of_words(community)
The number of words in a post (removing html).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
post_dates_per_topic(community)
Date of first post, second post, and last post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:date | |
post_delays_per_topic(community)
Delays (in days) between first and second post, and first and last post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
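The two delays can be illustrated for a single topic with the standard library. The function name `post_delays` and the result keys are assumptions chosen for this sketch; the keys actually returned by pici are not documented here.

```python
from datetime import date


def post_delays(post_dates):
    """Delays in days between first/second and first/last post of one topic."""
    ordered = sorted(post_dates)
    first = ordered[0]
    # A topic with a single post has no second post, hence no first delay.
    second = ordered[1] if len(ordered) > 1 else None
    return {
        "first_to_second": (second - first).days if second is not None else None,
        "first_to_last": (ordered[-1] - first).days,
    }


topic_dates = [date(2021, 3, 15), date(2021, 3, 1), date(2021, 3, 4)]
delays = post_delays(topic_dates)
print(delays)
```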
posts_per_interval(community, interval)
Number of posts authored by community per time interval.
TODO
- document
- add to TOC
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
interval | | | required |
posts_word_occurrence(community, words, normalize=True)
Counts the occurrence of a set of words in each post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
words | list of str | List of words to count in post texts. | required |
normalize | bool | Normalize occurrence count by text length. | True |
Returns:
Name | Type | Description |
---|---|---|
results | dict of str:int | |
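A minimal sketch of word counting with optional normalization, for a single post text. It assumes that "text length" means the number of word tokens and that matching is case-insensitive; both are assumptions for illustration, as is the helper name `word_occurrence`.

```python
import re


def word_occurrence(text, words, normalize=True):
    """Count how often each word in `words` occurs in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = {w: tokens.count(w.lower()) for w in words}
    if normalize and tokens:
        # Assumption: normalize by the number of tokens in the text.
        return {w: c / len(tokens) for w, c in counts.items()}
    return counts


post = "The print failed. Print again!"
raw = word_occurrence(post, ["print", "nozzle"], normalize=False)
rel = word_occurrence(post, ["print", "nozzle"])
print(raw, rel)
```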
cached_metrics
This is a collection of all cacheable functions that are used in the calculation of indicators. The cache is implemented using functools.lru_cache with maxsize=None. Caching is commonly done at least on community level (pici.Community is hashable). Examples of when using a cache makes sense:
- calculating the similarity of post texts (done once for all combinations)
- generating "temporal networks" (filtered representations of networks, depending on dates of posts)
It is recommended to define cached parts of indicators here.
_comments_by_contributor(community, contributor, date_limit=None)
Get all comments authored by contributor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | User name | required |
date_limit | | Date in string format, e.g. '2020-01-15' | None |
specified user (before the specified date_limit).
_contribution_regularity(community, contributor, start, end)
Get the contribution regularity of contributor as the percentage of days on which contributor posted in the forum, between the dates start and end.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | | required |
start | | | required |
end | | | required |
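The regularity measure can be sketched as below. The sketch assumes that the window includes both start and end and that the result is expressed on a 0 to 100 scale; neither detail is confirmed by the documentation above, and `contribution_regularity` is a hypothetical name.

```python
from datetime import date


def contribution_regularity(post_dates, start, end):
    """Percentage of days in [start, end] with at least one post."""
    total_days = (end - start).days + 1  # assumption: inclusive window
    # A set collapses several posts on the same day into one active day.
    active_days = {d for d in post_dates if start <= d <= end}
    return 100 * len(active_days) / total_days


posts = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 3), date(2020, 2, 1)]
regularity = contribution_regularity(posts, date(2020, 1, 1), date(2020, 1, 10))
print(regularity)
```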
_initial_post_author_network_metric(initial_post, community, metric, kind)
Get a cached network metric for the author of an initial post.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_post | | | required |
metric | | | required |
community | | | required |
thread_date | | | required |
kind | | | required |
Returns:
Type | Description |
---|---|
| | The value of the metric. |
_replies_to_own_topics(community, contributor, date_limit=None)
The number of replies made to initial posts by specified contributor in community.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | | required |
date_limit | | Date in string format, e.g. '2020-01-15' | None |
contributor. If date_limit is provided, only threads & replies posted before the date limit are considered.
_temporal_text_similarity_dict(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio')
Returns a dictionary of post-text:1xn-similarity-matrix for similarity subgraph filtered by date.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
date | | | required |
text_col | | | 'preprocessed_text__words_no_stop' |
similarity_metric | | | 'token_sort_ratio' |
_temporal_text_similarity_network(community, date, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)
Create a subview graph of the text similarity network created by `_text_similarity_network()` by filtering out all nodes (=posts) where post.date is > date.
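Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
date | | | required |
text_col | | | 'preprocessed_text__words_no_stop' |
similarity_metric | | | 'token_sort_ratio' |
only_initial_posts | | | True |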
_text_similarity_network(community, text_col='preprocessed_text__words_no_stop', similarity_metric='token_sort_ratio', only_initial_posts=True)
Create a text-similarity network for all posts in community, using textacy.representations.network.build_similarity_network().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
text_col | | | 'preprocessed_text__words_no_stop' |
similarity_metric | | | 'token_sort_ratio' |
only_initial_posts | | | True |
_threads_by_contributor(community, contributor, date_limit=None)
Get all threads initiated by contributor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | User name | required |
date_limit | | Date in string format, e.g. '2020-01-15' | None |
specified user (before the specified date_limit).
distinctiveness
initial_post_text_distance(community, similarity_metric='token_sort_ratio')
Calculates the text distance of initial posts to previously authored initial posts as a measure of distinctiveness.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
similarity_metric | | | 'token_sort_ratio' |
agg_method | | | required |
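To make the idea of "distance to previously authored initial posts" concrete, here is a standard-library sketch. It substitutes difflib.SequenceMatcher on alphabetically sorted tokens for the 'token_sort_ratio' metric, and it aggregates with the maximum similarity to any earlier post; both choices are assumptions for illustration, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a, b):
    """Similarity in [0, 1] after sorting tokens (stand-in for token_sort_ratio)."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


def distance_to_previous(initial_posts):
    """For posts in chronological order: 1 - max similarity to earlier posts."""
    distances = []
    for i, text in enumerate(initial_posts):
        earlier = initial_posts[:i]
        if not earlier:
            distances.append(None)  # the first post has nothing to compare to
        else:
            best = max(token_sort_similarity(text, e) for e in earlier)
            distances.append(1 - best)
    return distances


posts = ["printer nozzle clogged", "clogged printer nozzle", "new filament colors"]
distances = distance_to_previous(posts)
print(distances)
```

The second post only reorders the words of the first, so its distance is zero, while the third post is clearly more distinctive.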
elaboration
basic_text_based_elaboration(community, col_n_words='preprocessed_text__n_words', col_n_words_no_stop='preprocessed_text__n_words_no_stop', col_syllables='preprocessed_text__n_syllables', col_avg_syllables='preprocessed_text__avg_syllables_per_word', col_smog_index='preprocessed_text__smog_index', col_auto_readability='preprocessed_text__automated_readability_index', col_coleman_liau='preprocessed_text__coleman_liau_index', col_flesch_kincaid='preprocessed_text__flesch_kincaid_grade_level', col_frac_uppercase='preprocessed_text__frac_uppercase', col_frac_punctuation_marks='preprocessed_text__frac_punctuation_marks')
Provides basic text-based elaboration indicators, such as number of words, number of syllables, and different readability scores.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_flesch_kincaid | | | 'preprocessed_text__flesch_kincaid_grade_level' |
col_coleman_liau | | | 'preprocessed_text__coleman_liau_index' |
col_auto_readability | | | 'preprocessed_text__automated_readability_index' |
col_smog_index | | | 'preprocessed_text__smog_index' |
col_syllables | | column name in community.posts to use for mean number | 'preprocessed_text__n_syllables' |
col_n_words | | column name in community.posts to use for word count | 'preprocessed_text__n_words' |
community | | | required |
experience
initiator_experience_by_commenter_network_out_deg_centrality(community)
Determines a thread initiator's 'experience' by their out-degree centrality in the commenter network at the time of thread creation, i.e., the number of users the initiator has 'commented on' (has replied to in a user's thread).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
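The "number of users the initiator has commented on" at thread creation time can be sketched without a graph library. The sketch assumes replies are available as (commenter, thread_initiator, date) tuples and returns a raw count, without the normalization that a centrality measure would usually apply; the helper name `commented_on_before` is hypothetical.

```python
from datetime import date


def commented_on_before(replies, user, when):
    """Number of distinct other users `user` replied to before `when`."""
    targets = {
        initiator
        for commenter, initiator, replied_on in replies
        if commenter == user and initiator != user and replied_on < when
    }
    return len(targets)


replies = [
    ("ann", "bob", date(2021, 1, 5)),
    ("ann", "cat", date(2021, 2, 1)),
    ("ann", "bob", date(2021, 2, 10)),  # repeated target counts once
    ("ann", "dan", date(2021, 6, 1)),   # after the thread was created
    ("bob", "ann", date(2021, 1, 6)),   # incoming edge, not counted
]
experience = commented_on_before(replies, "ann", date(2021, 3, 1))
print(experience)
```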
initiator_experience_by_past_contributions(community, ignore_temporal_dependency=True, use_rounded_date=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ignore_temporal_dependency | | | True |
community | | | required |
helpfulness
initiator_helpfulness_by_contribution_regularity(community, lookback_days=100)
Calculates the contribution regularity of the initiator of each thread. Contribution regularity is the percentage of past days on which the initiator posted in the forum. The number of past days considered is limited by the lookback_days parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lookback_days | | | 100 |
community | | | required |
initiator_helpfulness_by_foreign_thread_comment_frequency(community)
This indicator measures initiator helpfulness by the frequency of comments by the thread's initiator that were posted in threads with a different initiator ('foreign threads').
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
initiator_helpfulness_by_top_commenter_status(community, contributor, k=90)
Calculates whether a thread's initiator has top commenter status. A 'top commenter' has posted more comments than the k-th percentile (default: k=90).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
contributor | | | required |
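The percentile rule above can be illustrated with statistics.quantiles from the standard library. The input format (a dict of comment counts per user), the interpolation method, and the name `top_commenters` are assumptions for this sketch; the library may compute the percentile differently.

```python
from statistics import quantiles


def top_commenters(comment_counts, k=90):
    """Map each user to True if their count exceeds the k-th percentile."""
    values = list(comment_counts.values())
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cut = quantiles(values, n=100, method="inclusive")[k - 1]
    return {user: count > cut for user, count in comment_counts.items()}


counts = {"ann": 1, "bob": 2, "cat": 3, "dan": 4, "eve": 50}
status = top_commenters(counts)
print(status)
```

Only users strictly above the cut-off are flagged, so with the default k=90 at most about a tenth of the users qualify.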
idea_popularity
idea_popularity_by_number_of_unique_users_commenting(community)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
network_position
Metrics using the community's graph object (representation of contributor network).
By level of observation:
contributors
- contributor_degree
- contributor_centralities
- contributor_communities
co_contributor_centralities(community)
Contributor centralities.
Includes degree centrality, betweenness centrality, and eigenvector centrality.
Uses the networkx implementation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
co_contributor_communities(community, leiden_lib='cdlib')
Find communities within the contributor network.
Uses the weighted Leiden algorithm (Traag et al., 2018) as implemented in cdlib.algorithms.leiden or leidenalg.
Traag, Vincent, Ludo Waltman, and Nees Jan van Eck. From Louvain to Leiden: guaranteeing well-connected communities. arXiv preprint arXiv:1810.08473 (2018).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
leiden_lib | | Which Leiden alg. implementation to use, 'cdlib' or | 'cdlib' |
community | | | required |
Returns:
Name | Type | Description |
---|---|---|
node_communities_map | dict of node:list(communities) | List of communities a contributor belongs to. See https://cdlib.readthedocs.io/en/latest/reference/classes/node_clustering.html. |
co_contributor_degree(community)
Number of contributors each contributor has co-authored with in a thread.
Uses the implementation of networkx.Graph.degree.
TODO
document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | pici.Community | | required |
initiator_centrality_in_co_contributor_network(community, k=None)
TODO: implement using _initial_post_author_network_metric()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
k | | | None |
reports
Reports are groups of metrics evaluated for all communities under analysis. See also: building reports.
Reports by level of observation:
community
posts_contributors_per_interval(pici, interval)
Number of contributors and posts for each time interval.
TODO
- document
- add to TOC
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pici | | | required |
interval | | | required |
Returns:
Name | Type | Description |
---|---|---|
report | | |
status_reputation
initiator_prestige_by_commenter_network_in_deg_centrality(community)
Determines a thread initiator's 'prestige' by their in-degree centrality in the commenter network at the time of thread creation, i.e., the number of users that have commented on at least one of their threads at that time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
number_of_replies_to_topics_initiated_by_thread_initiator(community, ignore_temporal_dependency=True)
Number of replies to topics initiated by the thread's initiator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ignore_temporal_dependency | | Whether to simply count all replies, | True |
community | | | required |