posts
number_of_words(community)
Adds the number of words in each post (as int) to `community.posts`.
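As a rough illustration of what such a word count could look like, here is a minimal pandas sketch. It is not the library's implementation: the helper `add_number_of_words`, the `text` column, and the `number_of_words` output column are hypothetical names, and words are simply taken to be whitespace-separated tokens.

```python
import pandas as pd


def add_number_of_words(posts: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Return a copy of `posts` with an integer word-count column.

    Assumption: a "word" is any whitespace-separated token; the real
    preprocessor may tokenize differently.
    """
    out = posts.copy()
    # Missing texts are treated as empty strings, i.e. zero words.
    out["number_of_words"] = (
        out[text_col].fillna("").str.split().str.len().astype(int)
    )
    return out


posts = pd.DataFrame({"text": ["Hello there world", "Thanks!", ""]})
result = add_number_of_words(posts)
print(result)
```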
post_position_in_thread(community)
Adds each post's position in its thread (as int, starting with 1) to `community.posts`.
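One plausible way to derive a 1-based position per thread is to order posts by date within each thread and number them. The sketch below assumes hypothetical `thread_id` and `date` columns and a hypothetical helper name; the library's actual column names and ordering rule may differ.

```python
import pandas as pd


def add_post_position(
    posts: pd.DataFrame, thread_col: str = "thread_id", date_col: str = "date"
) -> pd.DataFrame:
    """Return a copy of `posts` with a 1-based position within each thread.

    Assumption: position is defined by chronological order inside a thread.
    """
    # Sort so that cumcount() numbers posts chronologically per thread.
    out = posts.sort_values([thread_col, date_col]).copy()
    out["post_position_in_thread"] = out.groupby(thread_col).cumcount() + 1
    # Restore the original row order.
    return out.sort_index()


posts = pd.DataFrame(
    {
        "thread_id": ["a", "b", "a", "a", "b"],
        "date": pd.to_datetime(
            ["2021-01-03", "2021-01-01", "2021-01-01", "2021-01-02", "2021-01-05"]
        ),
    }
)
result = add_post_position(posts)
print(result)
```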
preprocessed_text(community, n_topics=10)
This preprocessor supplies cleaned text, text statistics (computed with Textacy), and sentiment statistics (computed with TextBlob). The following columns are added to `community.posts`:
- clean
- all_words
- words_no_stop
- n_words_no_stop
- frac_uppercase
- frac_punctuation_marks
- avg_syllables_per_word
- sentiment_polarity
- sentiment_subjectivity
- n_words
- n_chars
- n_long_words
- n_unique_words
- n_syllables
- n_syllables_per_word
- entropy
- ttr
- segmented_ttr
- hdd
- automated_index
- flesch_reading_ease
- smog_index
- coleman_liau_index
- flesch_kincaid_grade_level
- gunning_fog_index
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
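To give a feel for the kind of per-post statistics listed above, the following standard-library sketch computes simplified versions of four of them. The definitions here are assumptions made for illustration only. They are not taken from Textacy or from this preprocessor, and the helper `simple_text_stats` is hypothetical.

```python
import string


def simple_text_stats(text: str) -> dict:
    """Compute simplified, assumed versions of a few text statistics."""
    words = text.split()  # assumption: whitespace tokenization
    letters = [c for c in text if c.isalpha()]
    n_chars = len(text)
    # Assumption: uniqueness ignores case and surrounding punctuation.
    unique = {w.lower().strip(string.punctuation) for w in words}
    return {
        "n_words": len(words),
        # Share of alphabetic characters that are uppercase.
        "frac_uppercase": (
            sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
        ),
        # Share of all characters that are ASCII punctuation marks.
        "frac_punctuation_marks": (
            sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0
        ),
        # Type-token ratio: unique words divided by total words.
        "ttr": len(unique) / len(words) if words else 0.0,
    }


stats = simple_text_stats("Hello, HELLO world!")
print(stats)
```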
rounded_date(community, round_dates_to='7D')
Rounds the post dates to the specified frequency. If `round_dates_to` is None, this preprocessor does nothing; the default frequency is '7D'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
community | | | required |
round_dates_to | | Frequency to round the initial posts' dates to. See <https://pandas.pydata.org/docs/user_guide/timeseries.html>. | '7D' |
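The rounding described here can be sketched with pandas' `Series.dt.round`, which accepts frequency strings such as '7D'. The helper `round_dates` and the sample dates are hypothetical, and whether the preprocessor uses exactly this call is an assumption.

```python
import pandas as pd


def round_dates(dates: pd.Series, round_dates_to="7D") -> pd.Series:
    """Round a datetime Series to a frequency; None leaves it unchanged."""
    if round_dates_to is None:
        return dates
    # dt.round snaps each timestamp to the nearest multiple of the frequency.
    return dates.dt.round(round_dates_to)


dates = pd.Series(
    pd.to_datetime(["2021-01-01 10:00", "2021-01-04 23:30", "2021-01-09 08:15"])
)
weekly = round_dates(dates)           # default '7D'
daily = round_dates(dates, "D")       # nearest calendar day
unchanged = round_dates(dates, None)  # no rounding
print(weekly, daily, unchanged, sep="\n\n")
```

Fixed-frequency rounding in pandas works on multiples of the frequency counted from the Unix epoch, so '7D' bins do not necessarily line up with calendar weeks.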