helpers
aggregate(dict_of_series, aggregations=[np.mean, np.min, np.max, np.std, np.sum])
Applies a number of aggregations to the series supplied as values in dict_of_series. Keys are the names of the series; the name of each aggregation is appended to the series name as "(agg-name)".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
aggregations | | list of aggregation functions | [np.mean, np.min, np.max, np.std, np.sum] |
dict_of_series | | dict of indicator_name:Pandas.Series | required |
Returns:
Type | Description |
---|---|
dict of formatted indicator_name: aggregated series |
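The key-naming scheme described above can be illustrated with a minimal sketch. Note that aggregate_sketch is a hypothetical stand-in written for this example, not the library's implementation; it assumes that each aggregation function exposes a __name__ and is applied to the raw values of each series.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(dict_of_series, aggregations=(np.mean, np.sum)):
    """Hypothetical stand-in for aggregate(); not the library's code."""
    results = {}
    for name, series in dict_of_series.items():
        values = series.to_numpy()
        for func in aggregations:
            # Assumption: the "(agg-name)" suffix is taken from func.__name__.
            results[f"{name} ({func.__name__})"] = func(values)
    return results


aggregated = aggregate_sketch({"number of words": pd.Series([10, 20, 30])})
print(aggregated)
```

Under these assumptions, every series contributes one entry per aggregation, so the result has len(dict_of_series) * len(aggregations) keys.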
apply_to_initial_posts(community, new_cols, func)
Applies func to initial posts (rows of community.posts where post_position_in_thread==1). Returns a DataFrame with the topic_column field as index. Columns in the returned DataFrame are named according to the strings in new_cols, and their values follow the order of the values returned by func.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | pici.Community | required |
new_cols | | list of strings | required |
func | | function to apply to each initial post from community.posts | required |
The returned DataFrame is indexed by thread-ids.
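As a rough illustration of the behaviour described for apply_to_initial_posts, the sketch below works on a plain DataFrame instead of a pici.Community. The function name, the "thread_id" topic column, and the sample data are assumptions made for this example only.

```python
import pandas as pd


def apply_to_initial_posts_sketch(posts, topic_column, new_cols, func):
    """Hypothetical stand-in that takes a posts DataFrame directly."""
    # Keep only the first post of every thread.
    initial = posts[posts["post_position_in_thread"] == 1]
    # func returns one tuple of values per initial post.
    rows = [func(post) for _, post in initial.iterrows()]
    result = pd.DataFrame(
        rows, columns=new_cols, index=initial[topic_column].to_numpy()
    )
    return result.rename_axis(topic_column)


posts = pd.DataFrame(
    {
        "thread_id": [1, 1, 2],
        "post_position_in_thread": [1, 2, 1],
        "text": ["hello world", "a reply", "a b c"],
    }
)
result = apply_to_initial_posts_sketch(
    posts,
    topic_column="thread_id",
    new_cols=["n_words", "n_chars"],
    func=lambda post: (len(post["text"].split()), len(post["text"])),
)
print(result)
```

The order of the values returned by func determines which value ends up in which of the new_cols columns.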
as_table(func)
Decorator that returns results as a table, indexed by community name. TODO: document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | | | required |
create_co_contributor_graph(link_data, node_data, node_col, group_col, node_attributes, connected=True)
Creates a networkx.Graph whose nodes are users, with an edge between two users if they have contributed to the same thread. The edge weight is the number of threads in which the two users co-contributed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
link_data | | | required |
node_data | | | required |
node_col | | | required |
group_col | | | required |
node_attributes | | | required |
connected | | | True |
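The edge-weight rule can be sketched without networkx, using only the standard library. The helper co_contributor_weights and its input format, (thread_id, user) pairs, are assumptions for illustration; the documented function takes DataFrames plus column names, which presumably play the roles of user (node_col) and thread (group_col).

```python
from collections import Counter
from itertools import combinations


def co_contributor_weights(posts):
    """Hypothetical helper: posts is an iterable of (thread_id, user) pairs."""
    users_by_thread = {}
    for thread_id, user in posts:
        users_by_thread.setdefault(thread_id, set()).add(user)

    weights = Counter()
    for users in users_by_thread.values():
        # Every unordered pair of users in a thread counts once per thread.
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return weights


posts = [
    ("t1", "ann"), ("t1", "bob"), ("t1", "ann"),
    ("t2", "ann"), ("t2", "bob"), ("t2", "cat"),
]
weights = co_contributor_weights(posts)
print(weights)
```

Each (a, b) key with its count could then be added to an undirected graph as a weighted edge.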
create_commenter_graph(link_data, node_data, node_col, group_col, node_attributes, conntected=True)
Creates a networkx.DiGraph with nodes=users and directed edges a->b if a has replied to an initial post by b. Edge weight is the number of comments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
link_data | | | required |
node_data | | | required |
node_col | | | required |
group_col | | | required |
node_attributes | | | required |
conntected | | | True |
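The directed edge weights can be sketched in the same spirit, again with the standard library only. The helper commenter_weights and the (thread_id, position, user) input format are assumptions for this example; whether the library keeps replies by a thread's own author is not stated, so the sketch simply counts them like any other reply.

```python
from collections import Counter


def commenter_weights(posts):
    """Hypothetical helper: posts is a list of (thread_id, position, user)."""
    # The author of the post at position 1 started the thread.
    starters = {thread: user for thread, position, user in posts if position == 1}

    weights = Counter()
    for thread, position, user in posts:
        if position > 1 and thread in starters:
            # Directed edge: commenter -> author of the initial post.
            weights[(user, starters[thread])] += 1
    return weights


posts = [
    ("t1", 1, "bob"), ("t1", 2, "ann"), ("t1", 3, "ann"),
    ("t2", 1, "ann"), ("t2", 2, "bob"),
]
weights = commenter_weights(posts)
print(weights)
```

Here ann commented twice on a thread started by bob, so the edge ann->bob has weight 2, while bob->ann has weight 1.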
flat(df, columns='community_name')
Returns a pivoted version of df with flattened index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | Pandas.DataFrame | required |
columns | str | Column name to pivot on. | 'community_name' |
generate_indicator_results(posts, initial_post, feedback, indicator_text, column, aggs=[np.sum, np.mean, np.min, np.max, np.std])
Returns results from column in the DataFrames posts, initial_post, and feedback as different aggregations (sum, mean, ...). The initial post is only aggregated as sum. The output is a dict with one df/agg: value entry per combination, e.g. "posts indicator_text (mean)":value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
posts | | | required |
initial_post | | | required |
feedback | | | required |
indicator_text | | | required |
column | | | required |
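A minimal sketch of the output structure, under stated assumptions: the function below is a hypothetical stand-in, and the key prefixes "initial_post" and "feedback" are guesses modelled on the documented "posts indicator_text (mean)" example.

```python
import numpy as np
import pandas as pd


def generate_indicator_results_sketch(
    posts, initial_post, feedback, indicator_text, column, aggs=(np.sum, np.mean)
):
    """Hypothetical stand-in; key prefixes other than "posts" are assumed."""
    results = {}
    for label, df in (("posts", posts), ("feedback", feedback)):
        values = df[column].to_numpy()
        for agg in aggs:
            results[f"{label} {indicator_text} ({agg.__name__})"] = agg(values)
    # The initial post is only aggregated as a sum.
    results[f"initial_post {indicator_text} (sum)"] = np.sum(
        initial_post[column].to_numpy()
    )
    return results


results = generate_indicator_results_sketch(
    posts=pd.DataFrame({"n_words": [10, 20, 30]}),
    initial_post=pd.DataFrame({"n_words": [10]}),
    feedback=pd.DataFrame({"n_words": [20, 30]}),
    indicator_text="number of words",
    column="n_words",
)
print(results)
```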
join_df(func)
Decorator that joins results to existing dataframe in community. TODO: document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | | | required |
merge_dfs(dfs, only_unique=False)
Wrapper for Pandas.merge(). Merges DataFrames, so that
TODO: document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dfs | Iterable[pd.DataFrame] | | required |
only_unique | bool | | False |
num_words(text)
Counts the number of words in a text. HTML tags and comments are accounted for: they are not included in the count.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to count words in. | required |
Returns:
Name | Type | Description |
---|---|---|
count | int | Number of words. |
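One way such a count could work is sketched below. num_words_sketch is a hypothetical regex-based approximation, not the library's implementation, and it assumes that words are separated by whitespace once tags and comments have been removed.

```python
import re


def num_words_sketch(text):
    """Hypothetical approximation of num_words() using regular expressions."""
    # Drop HTML comments first, then any remaining tags.
    without_comments = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    without_tags = re.sub(r"<[^>]+>", " ", without_comments)
    return len(without_tags.split())


count = num_words_sketch("<p>Hello <b>brave</b> new world</p><!-- hidden note -->")
print(count)  # 4
```

Because tags are replaced by spaces, markup inside a word (for example a bold syllable) would split that word in this sketch; a real HTML parser would handle such cases more carefully.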
series_most_common(series)
Get most common element from Pandas.Series.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
series | pd.Series | Pandas.Series | required |
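A short sketch of one way to obtain the most common element with pandas. The function name is hypothetical, and how the library resolves ties is not documented, so the example avoids them.

```python
import pandas as pd


def series_most_common_sketch(series):
    """Hypothetical stand-in: label with the highest count in the series."""
    return series.value_counts().idxmax()


most_common = series_most_common_sketch(
    pd.Series(["question", "answer", "question", "thanks"])
)
print(most_common)  # question
```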
where_all(conditions)
Combines the given logical conditions with a logical and.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
conditions | | | required |
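The sketch below shows how a list of conditions might be folded into one mask. It assumes that conditions is a list of boolean pandas Series of equal length, which the documentation does not state; where_all_sketch is a hypothetical name.

```python
import operator
from functools import reduce

import pandas as pd


def where_all_sketch(conditions):
    """Hypothetical stand-in: element-wise AND over all conditions."""
    return reduce(operator.and_, conditions)


df = pd.DataFrame({"words": [5, 50, 500], "replies": [0, 3, 7]})
mask = where_all_sketch(
    [df["words"] > 10, df["replies"] > 1, df["words"] < 100]
)
selected = df[mask]
print(selected)
```

Only rows for which every condition holds remain in selected.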
word_occurrences(text, words)
Counts the number of occurrences of the specified words in text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | A text with words. | required |
words | list of str | Words. | required |
Returns:
Name | Type | Description |
---|---|---|
occurrences | dict of str:int | A |
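A minimal sketch of the documented input and output shapes. word_occurrences_sketch is hypothetical; the case-insensitive matching and the regex tokenisation are assumptions, since the documentation does not say how the text is split.

```python
import re
from collections import Counter


def word_occurrences_sketch(text, words):
    """Hypothetical stand-in returning a dict of word: count."""
    # Assumption: matching is case-insensitive and ignores punctuation.
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return {word: tokens[word.lower()] for word in words}


occurrences = word_occurrences_sketch(
    "Thanks, thanks a lot! Please help.", ["thanks", "help", "sorry"]
)
print(occurrences)  # {'thanks': 2, 'help': 1, 'sorry': 0}
```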