helpers

aggregate(dict_of_series, aggregations=[np.mean, np.min, np.max, np.std, np.sum])

Applies each aggregation to every series supplied as a value in dict_of_series. The keys are the series names; in the result, the name of the aggregation is appended to each series name as "(agg-name)".

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| aggregations | | list of aggregation functions | [np.mean, np.min, np.max, np.std, np.sum] |
| dict_of_series | | dict of indicator_name:Pandas.Series | required |

Returns:

| Type | Description |
| --- | --- |
| | dict of formatted indicator_name: aggregated series |
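The naming scheme described above can be illustrated with a minimal sketch. This is not the library's implementation: `aggregate_sketch` and the sample data are hypothetical, and the sketch assumes that the aggregation name is taken from each function's `__name__`.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(dict_of_series, aggregations=(np.mean, np.sum)):
    """Apply every aggregation to every series and label each result.

    Assumption: the label is "<series name> (<function __name__>)".
    """
    results = {}
    for name, series in dict_of_series.items():
        for agg in aggregations:
            results[f"{name} ({agg.__name__})"] = agg(series)
    return results


# Hypothetical indicator data
indicators = {"replies": pd.Series([1, 2, 3])}
result = aggregate_sketch(indicators)
```

Here `result` holds one entry per series and aggregation, keyed `"replies (mean)"` and `"replies (sum)"`.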

apply_to_initial_posts(community, new_cols, func)

Applies func to the initial posts (rows of community.posts where post_position_in_thread==1). Returns a DataFrame indexed by the topic_column field. Columns in the returned DataFrame are named after the strings in new_cols, and their values follow the order of the values returned by func.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| community | | pici.Community | required |
| new_cols | | list of strings | required |
| func | | function to apply to each initial post from community.posts | required |

Returns:

DataFrame with columns named according to new_cols and indexed by thread-ids.
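The behaviour described for this function can be sketched on a plain DataFrame. The sketch below is an illustration only: it takes a posts DataFrame and an explicit `topic_column` argument instead of a pici.Community, and the column names `thread_id` and `text` are hypothetical.

```python
import pandas as pd


def apply_to_initial_posts_sketch(posts, topic_column, new_cols, func):
    """Apply func to each initial post and collect the results per thread."""
    # Initial posts are the rows at position 1 of their thread.
    initial = posts[posts["post_position_in_thread"] == 1]
    # func returns one value per entry in new_cols, in the same order.
    rows = [func(post) for _, post in initial.iterrows()]
    index = pd.Index(initial[topic_column], name=topic_column)
    return pd.DataFrame(rows, columns=new_cols, index=index)


# Hypothetical posts table
posts = pd.DataFrame({
    "thread_id": ["t1", "t1", "t2"],
    "post_position_in_thread": [1, 2, 1],
    "text": ["hello", "reply", "hi"],
})
initial_stats = apply_to_initial_posts_sketch(
    posts, "thread_id", ["n_chars"], lambda post: (len(post["text"]),)
)
```

The reply in thread `t1` is ignored, so `initial_stats` has one row per thread.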

as_table(func)

Decorator that returns results as a table indexed by community name. TODO: document

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| func | | | required |

create_co_contributor_graph(link_data, node_data, node_col, group_col, node_attributes, connected=True)

Creates a networkx.Graph in which the nodes are users and an edge connects two users who have contributed to the same thread. The edge weight is the number of threads in which the two users co-contributed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| link_data | | | required |
| node_data | | | required |
| node_col | | | required |
| group_col | | | required |
| node_attributes | | | required |
| connected | | | True |
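The edge-weight rule for the co-contributor graph can be sketched without networkx. The function below only computes the weights; it is a hypothetical illustration that takes `(thread_id, user)` pairs rather than the link_data and node_data arguments, whose exact layout is not documented here.

```python
from collections import Counter
from itertools import combinations


def co_contributor_weights(contributions):
    """Count, for each pair of users, the threads both contributed to.

    contributions: iterable of (thread_id, user) pairs (assumed layout).
    """
    users_by_thread = {}
    for thread_id, user in contributions:
        users_by_thread.setdefault(thread_id, set()).add(user)

    weights = Counter()
    for users in users_by_thread.values():
        # Sorting gives every undirected edge one canonical key.
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical contributions; ann posts twice in t1 but counts once.
weights = co_contributor_weights([
    ("t1", "ann"), ("t1", "bob"), ("t1", "ann"),
    ("t2", "ann"), ("t2", "bob"), ("t2", "cy"),
])
```

Each key of `weights` could then be added to a graph as a weighted edge.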

create_commenter_graph(link_data, node_data, node_col, group_col, node_attributes, conntected=True)

Creates a networkx.DiGraph in which the nodes are users and a directed edge a->b exists if a has replied to an initial post by b. The edge weight is the number of comments.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| link_data | | | required |
| node_data | | | required |
| node_col | | | required |
| group_col | | | required |
| node_attributes | | | required |
| conntected | | | True |
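The directed edge rule can likewise be sketched with the standard library. This is a hypothetical illustration, not the library's code: it takes `(thread_id, position_in_thread, user)` tuples, and it assumes that replies by the initial poster to their own thread are skipped, which the description above does not specify.

```python
from collections import Counter


def commenter_edge_weights(posts):
    """Count comments from each commenter to each initial-post author.

    posts: iterable of (thread_id, position_in_thread, user) tuples
    (assumed layout). Returns a Counter keyed by (commenter, author).
    """
    posts = list(posts)
    # The author of a thread is whoever wrote the post at position 1.
    initial_author = {
        thread: user for thread, position, user in posts if position == 1
    }
    weights = Counter()
    for thread, position, user in posts:
        author = initial_author.get(thread)
        # Assumption: self-replies do not create an edge.
        if position > 1 and author is not None and user != author:
            weights[(user, author)] += 1
    return weights


# Hypothetical posts
edges = commenter_edge_weights([
    ("t1", 1, "bob"), ("t1", 2, "ann"), ("t1", 3, "ann"), ("t1", 4, "bob"),
    ("t2", 1, "ann"), ("t2", 2, "bob"),
])
```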

flat(df, columns='community_name')

Returns a pivoted version of df with flattened index.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | Pandas.DataFrame | required |
| columns | str | Column name to pivot on. | 'community_name' |

generate_indicator_results(posts, initial_post, feedback, indicator_text, column, aggs=[np.sum, np.mean, np.min, np.max, np.std])

Aggregates column from the DataFrames posts, initial_post, and feedback using the functions in aggs (sum, mean, ...). The initial post is aggregated only as a sum. The output is a dict that maps a "DataFrame indicator_text (aggregation)" key to its value, e.g. "posts indicator_text (mean)": value.

Parameters:

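| Name | Type | Description | Default |
| --- | --- | --- | --- |
| posts | | | required |
| initial_post | | | required |
| feedback | | | required |
| indicator_text | | | required |
| column | | | required |
| aggs | | | [np.sum, np.mean, np.min, np.max, np.std] |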
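The shape of the output dict can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `indicator_results_sketch` is hypothetical, it names aggregations by string instead of passing NumPy functions, and the `feedback` and `initial_post` key prefixes are guessed from the documented `posts` example.

```python
import pandas as pd


def indicator_results_sketch(posts, initial_post, feedback, indicator_text,
                             column, aggs=("sum", "mean")):
    """Aggregate one column of three DataFrames into a flat dict."""
    results = {}
    # posts and feedback get every aggregation.
    for label, frame in (("posts", posts), ("feedback", feedback)):
        for agg in aggs:
            key = f"{label} {indicator_text} ({agg})"
            results[key] = frame[column].agg(agg)
    # The initial post is aggregated only as a sum.
    key = f"initial_post {indicator_text} (sum)"
    results[key] = initial_post[column].sum()
    return results


# Hypothetical word counts for one thread
posts = pd.DataFrame({"n_words": [10, 4, 6]})
initial_post = posts.iloc[:1]
feedback = posts.iloc[1:]
results = indicator_results_sketch(
    posts, initial_post, feedback, "number of words", "n_words"
)
```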

join_df(func)

Decorator that joins results to an existing DataFrame in the community. TODO: document

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| func | | | required |

merge_dfs(dfs, only_unique=False)

Wrapper for Pandas.merge(). Merges DataFrames, so that

TODO: document

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dfs | Iterable[pd.DataFrame] | | required |
| only_unique | bool | | False |

num_words(text)

Counts the number of words in a text. HTML tags and comments are excluded from the count.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | Text to count words in. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| count | int | Number of words. |
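A minimal sketch of this kind of count, assuming that simple regular expressions are good enough to strip markup and that words are separated by whitespace. `num_words_sketch` is a hypothetical name, and the library may parse HTML differently.

```python
import re


def num_words_sketch(text):
    """Count whitespace-separated words, ignoring HTML tags and comments."""
    # Remove comments first, since their content must not be counted.
    without_comments = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    # Replace each tag with a space so adjacent words stay separated.
    without_tags = re.sub(r"<[^>]+>", " ", without_comments)
    return len(without_tags.split())


count = num_words_sketch("<p>Hello <b>world</b></p><!-- hidden note -->")
```

In this example only "Hello" and "world" are counted.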

series_most_common(series)

Get most common element from Pandas.Series.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| series | pd.Series | Pandas.Series | required |
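One way to obtain the most common element is sketched below, using `value_counts`. Whether the library uses this approach, and how it resolves ties, is not stated here; `series_most_common_sketch` is a hypothetical name.

```python
import pandas as pd


def series_most_common_sketch(series):
    """Return the value that occurs most often in the series."""
    # value_counts sorts by frequency; idxmax returns the top value.
    return series.value_counts().idxmax()


most_common = series_most_common_sketch(pd.Series(["q", "a", "q"]))
```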

where_all(conditions)

Combines the given logical conditions with a logical AND.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| conditions | | | required |
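Combining conditions with AND can be sketched with `functools.reduce`. The sketch assumes that the conditions are boolean pandas Series (masks) of equal length, which the parameter table does not confirm; `where_all_sketch` and the sample columns are hypothetical.

```python
from functools import reduce
import operator

import pandas as pd


def where_all_sketch(conditions):
    """AND together an iterable of boolean masks, element by element."""
    return reduce(operator.and_, conditions)


# Hypothetical data: keep rows that satisfy both conditions.
df = pd.DataFrame({"words": [5, 50, 80], "replies": [3, 4, 0]})
mask = where_all_sketch([df["words"] > 10, df["replies"] > 0])
selected = df[mask]
```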

word_occurrences(text, words)

Counts the number of occurrences of specified words in text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | A text with words. | required |
| words | list of str | Words. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| occurrences | dict of str:int | A word (str), number of occurrences (int) dictionary |
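The returned dictionary can be illustrated with a short sketch. It assumes case-insensitive matching on whole words, which is not documented above, and `word_occurrences_sketch` is a hypothetical name.

```python
import re
from collections import Counter


def word_occurrences_sketch(text, words):
    """Map each requested word to how often it occurs in text."""
    # Assumption: matching is case-insensitive and on whole words.
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return {word: tokens[word.lower()] for word in words}


occurrences = word_occurrences_sketch(
    "The cat saw the other cat.", ["the", "cat", "dog"]
)
```

Words that never appear in the text are reported with a count of 0.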