posts

number_of_words(community)

Adds the number of words in each post as int to community.posts.
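As a rough illustration of what such a word count can look like, here is a minimal pandas sketch. It is not the library's implementation: the stand-in `posts` DataFrame, the `text` column name, the `number_of_words` output column name, and whitespace tokenization are all assumptions.

```python
import pandas as pd

# Hypothetical stand-in for community.posts; the "text" column name is an assumption.
posts = pd.DataFrame({"text": ["Hello world", "One two  three", ""]})

# Split each post on runs of whitespace and count the resulting tokens.
# The output column name is also an assumption.
posts["number_of_words"] = posts["text"].str.split().str.len().astype(int)
```

Splitting on whitespace treats punctuation attached to a word as part of that word, which is usually fine for a simple count.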

post_position_in_thread(community)

Adds each post's position in its thread (as int, starting with 1) to community.posts.
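One way to compute a 1-based position within a thread is a grouped cumulative count. The sketch below assumes hypothetical `thread_id` and `date` columns and that position is determined by post date; the actual column names and ordering rule used by the library may differ.

```python
import pandas as pd

# Hypothetical stand-in for community.posts; column names are assumptions.
posts = pd.DataFrame({
    "thread_id": ["a", "b", "a", "a", "b"],
    "date": pd.to_datetime(
        ["2024-01-03", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-05"]
    ),
})

# Order posts chronologically within each thread, then number them from 1.
posts = posts.sort_values(["thread_id", "date"])
posts["post_position_in_thread"] = posts.groupby("thread_id").cumcount() + 1

# Restore the original row order.
posts = posts.sort_index()
```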

preprocessed_text(community, n_topics=10)

This preprocessor supplies cleaned text, text statistics (using Textacy) and sentiment statistics (using TextBlob). The following columns are added to community.posts:

  • clean
  • all_words
  • words_no_stop
  • n_words_no_stop
  • frac_uppercase
  • frac_punctuation_marks
  • avg_syllables_per_word
  • sentiment_polarity
  • sentiment_subjectivity
  • n_words
  • n_chars
  • n_long_words
  • n_unique_words
  • n_syllables
  • n_syllables_per_word
  • entropy
  • ttr
  • segmented_ttr
  • hdd
  • automated_index
  • flesch_reading_ease
  • smog_index
  • coleman_liau_index
  • flesch_kincaid_grade_level
  • gunning_fog_index

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| community | | | required |
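To give a feel for a few of the simpler statistics listed above, here is a standard-library sketch. It does not use Textacy or TextBlob and is not the preprocessor's implementation: the helper name `simple_text_stats`, the tokenization, and the exact definitions (for example, fractions taken over all characters, and `ttr` as unique words divided by total words) are assumptions.

```python
import string


def simple_text_stats(text: str) -> dict:
    """Hypothetical helper computing a handful of per-post statistics."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    n_chars = len(text)
    n_words = len(words)
    n_unique = len(set(words))
    return {
        "n_words": n_words,
        "n_chars": n_chars,
        "n_unique_words": n_unique,
        # Type-token ratio: unique words over total words.
        "ttr": n_unique / n_words if n_words else 0.0,
        "frac_uppercase": (
            sum(c.isupper() for c in text) / n_chars if n_chars else 0.0
        ),
        "frac_punctuation_marks": (
            sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0
        ),
    }


stats = simple_text_stats("The cat saw the CAT!")
```

The readability indices and sentiment scores in the list depend on syllable counting and trained lexicons, so they are best left to the libraries named above.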

rounded_date(community, round_dates_to='7D')

Rounds the post dates to the specified frequency. If round_dates_to is None, this preprocessor does nothing.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| community | | | required |
| round_dates_to | | Frequency to round the initial posts' dates to; see <https://pandas.pydata.org/docs/user_guide/timeseries.html>. | '7D' |
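The rounding behaviour described here can be sketched with pandas' `Series.dt.round`, which accepts fixed frequency strings such as `'7D'`. The function name `round_post_dates`, the `date` input column, and the `rounded_date` output column are assumptions made for illustration, not the library's actual names.

```python
import pandas as pd


def round_post_dates(posts, round_dates_to="7D"):
    """Hypothetical helper: round a 'date' column to a fixed frequency."""
    # Mirror the documented behaviour: None means "do nothing".
    if round_dates_to is None:
        return posts
    out = posts.copy()
    out["rounded_date"] = out["date"].dt.round(round_dates_to)
    return out


# Hypothetical stand-in for community.posts.
posts = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-02 10:00", "2024-01-05 09:00", "2024-01-09 12:00"]
    ),
})
rounded = round_post_dates(posts)
```

With a fixed frequency like `'7D'`, pandas rounds to multiples of seven days counted from the Unix epoch, so posts from the same week can land in the same bin even if they fall on different calendar days.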