labelling

InnovationLabels

Bases: Labels

TODO: add documentation

from_limesurvey(limesurvey_results, drop_labellers=None)

Adds label entries from the Limesurvey results format. Limesurvey results can contain multiple labelled threads per response. For each thread i, the data must contain one column for the thread URL and one column per label, e.g.: "thread1" (= URL), "labelA1", "labelB1", ..., "thread2", "labelA2", "labelB2", ...

Parameters:

  • limesurvey_results (str path to a file, or pandas.DataFrame), required
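
A minimal input sketch (not taken from the package documentation), assuming the classes are importable from the labelling module, that InnovationLabels() can be constructed without arguments, and that "labelA"/"labelB" stand in for the actual label names:

    import pandas as pd
    from labelling import InnovationLabels  # assumed import path

    # One row per survey response; for each thread i there is one URL column
    # ("thread<i>") plus one column per label ("labelA<i>", "labelB<i>", ...).
    results = pd.DataFrame({
        "thread1": ["https://example.org/t/1", "https://example.org/t/2"],
        "labelA1": [1, 0],
        "labelB1": [0, 1],
        "thread2": ["https://example.org/t/3", None],
        "labelA2": [1, None],
        "labelB2": [0, None],
    })

    labels = InnovationLabels()
    labels.from_limesurvey(results)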

LabelCollection

TODO: add documentation

all_label_names() property

TODO: add documentation

by_level(level)

TODO: Add documentation

Parameters:

  • level, required

labels() property

TODO: add documentation

LabelStats

This class provides metrics and visualizations to analyze the annotations made by the labellers.

Available metrics are:
  • % agreement ("a_0")
  • Cohen's kappa (two labellers)
  • Fleiss' kappa (multiple labellers)
  • Krippendorff's alpha (multiple labellers, missing data)

Labellers' annotations can furthermore be evaluated against a subsample of "goldstandard" annotations, which makes it possible to assign each labeller a quality score.

TODO refactor --> move visualizations to visualizations.py ?

See [1] for a comparison of inter-rater agreement metrics.

[1] Xinshu Zhao, Jun S. Liu & Ke Deng (2013) Assumptions behind Intercoder Reliability Indices, Annals of the International Communication Association, 36:1, 419-480, DOI: 10.1080/23808985.2013.11679142
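
As a rough usage sketch, assuming stats is an already-constructed LabelStats instance (its constructor is not documented on this page), the documented metric methods can be queried like this:

    agreement = stats.complete_agreement()    # % of cases with full agreement, per label
    cohen     = stats.cohen_kappa()           # dict: label name -> Cohen's kappa (needs exactly 2 labellers)
    fleiss    = stats.fleiss_kappa()          # dict: label name -> Fleiss' kappa
    alpha     = stats.krippendorff_alpha()    # DataFrame with Krippendorff's alpha
    overall   = stats.interrater_agreement()  # combined agreement dataframe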

_melt_goldstandard_agreement(data)

Prepares the interrater_agreement dataframe for plotting (e.g., with seaborn): converts wide format to long format and shortens label names.

Parameters:

  • data (pandas.DataFrame with label names as index and a "labellers" column), required

Returns:

  pandas.DataFrame with columns:
  • index: shortened label names
  • labellers: labeller names
  • variable: metric name
  • value: metric value

cohen_kappa()

Get Cohen's kappa for all labels, using the scikit-learn implementation sklearn.metrics.cohen_kappa_score. Returns NaN if the number of labellers != 2.

Returns:

  dict of (label name, kappa)
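
For illustration, the underlying scikit-learn call on two labellers' ratings for a single label (a sketch with made-up ratings, not the class itself):

    from sklearn.metrics import cohen_kappa_score

    labeller_a = [1, 0, 1, 1, 0]  # ratings by labeller A for one label
    labeller_b = [1, 0, 0, 1, 0]  # ratings by labeller B for the same cases
    kappa = cohen_kappa_score(labeller_a, labeller_b)  # 1.0 = perfect, 0 = chance-level agreement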

complete_agreement()

Get percentage of cases where all labellers agree (per label).

Returns:

  agreement: pandas.DataFrame with columns 'label', '% perfect agreement', '% n'
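
A hedged sketch of the idea (not the actual implementation): per label, the share of cases where every labeller gave the same rating.

    import pandas as pd

    ratings = pd.DataFrame({        # rows: cases, columns: labellers (made-up data)
        "anna": [1, 0, 1, 1],
        "ben":  [1, 0, 0, 1],
        "cara": [1, 0, 1, 1],
    })
    perfect_agreement = (ratings.nunique(axis=1) == 1).mean() * 100  # % of cases with full agreement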

fleiss_kappa()

Get Fleiss' kappa for all labels, based on the statsmodels implementation (statsmodels.stats.inter_rater.fleiss_kappa). Returns NaN if the number of labellers < 2.

Returns:

  dict of (label name, kappa)
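
For illustration, the underlying statsmodels calls for a single label, starting from a cases x labellers table of category codes (made-up data):

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    table = np.array([    # rows: cases, columns: labellers
        [1, 1, 1],
        [0, 0, 1],
        [1, 1, 1],
        [0, 0, 0],
    ])
    counts, _ = aggregate_raters(table)  # cases x categories counts
    kappa = fleiss_kappa(counts)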

interrater_agreement()

Calculates the overall inter-rater agreement for all labellers in the data. If the number of labellers > 2, all values for Cohen/Fleiss kappa will be NaN.

Returns:

  agreement dataframe

krippendorff_alpha()

Get Krippendorff's alphas using the "krippendorff" package.

See also: Andrew F. Hayes & Klaus Krippendorff (2007) Answering the Call for a Standard Reliability Measure for Coding Data, Communication Methods and Measures, 1:1, 77-89, DOI: 10.1080/19312450709336664

Returns:

  pandas.DataFrame
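
For illustration, the underlying call in the krippendorff package for one label, with missing ratings encoded as np.nan (made-up data):

    import numpy as np
    import krippendorff

    reliability_data = np.array([      # rows: labellers, columns: cases
        [1, 0,      1, np.nan],
        [1, 0,      1, 0],
        [1, np.nan, 1, 0],
    ])
    alpha = krippendorff.alpha(reliability_data=reliability_data,
                               level_of_measurement="nominal")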

pairwise_interrater_agreement(goldstandard=None, min_comparisons=1)

Calculates the agreement metrics for all combinations of two labellers. If goldstandard is set (name of labeller), only comparisons with the goldstandard are calculated.

Parameters:

  • goldstandard (name of a labeller), default None
  • min_comparisons (minimum number of shared labelled cases), default 1

Returns:

  Agreement dataframe with a "labellers" column that contains:
  • (labeller A, labeller B) tuples (default), or
  • labeller B strings (if labeller A is set as goldstandard)
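
A usage sketch, assuming stats is an existing LabelStats instance and "expert" is a hypothetical labeller name:

    # All labeller pairs with at least 5 jointly labelled cases
    pairwise = stats.pairwise_interrater_agreement(min_comparisons=5)
    # -> "labellers" column holds (labeller A, labeller B) tuples

    # Only comparisons against the goldstandard labeller
    vs_gold = stats.pairwise_interrater_agreement(goldstandard="expert", min_comparisons=5)
    # -> "labellers" column holds the other labellers' names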

plot_goldstandard_agreement(kind='label_boxplots', goldstandard=None, data=None)

Plot the labellers' agreement with the goldstandard. Different plots are available through kind:
  • label_boxplots: each labeller's agreement with the goldstandard; x: metric, y: boxplot of values
  • labellers_points: each labeller's agreement with the goldstandard, as a grid with one column per metric; x: labels, y: values

Parameters:

  • kind (one of {'label_boxplots', 'labellers_points'}), default 'label_boxplots'
  • goldstandard (name of the labeller to use as goldstandard; used to generate data if data is None), default None
  • data (agreement dataframe generated by pairwise_interrater_agreement), default None
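
A usage sketch, assuming stats is an existing LabelStats instance and "expert" is a hypothetical goldstandard labeller:

    # Let the method generate the agreement data itself
    stats.plot_goldstandard_agreement(kind="label_boxplots", goldstandard="expert")

    # Or reuse a precomputed agreement dataframe
    data = stats.pairwise_interrater_agreement(goldstandard="expert")
    stats.plot_goldstandard_agreement(kind="labellers_points", data=data)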

Labels

Bases: ABC

TODO: add documentation

__init__(data=None, cols=DEFAULT_COLS, filter=None)

Parameters:

  • data (pd.DataFrame), default None
  • cols (dict), default DEFAULT_COLS

append(data, cols=DEFAULT_COLS, drop_labellers=None)

TODO: add documentation

Parameters:

  • data (pd.DataFrame), required
  • cols (dict), default DEFAULT_COLS
  • drop_labellers, default None

data_by_label(format='sklearn', dropna=False)

TODO: add documentation

Parameters:

  • dropna, default False

labellers() property

TODO: add documentation

rating_table(label_name, communities=None, custom_filter=None, allow_missing_data=False)

Get the rating table for one label to be used, e.g., with statsmodels.stats.inter_rater.

Parameters:

  • label_name (label to be returned in the table), required
  • communities (list of communities to include in the table), default None
  • allow_missing_data (whether to drop columns with missing ratings), default False

Returns:

  rating table: labels as a 2-dimensional table with raters (labellers) in rows and ratings in columns
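
A sketch of feeding the rating table into statsmodels, assuming labels is an instance of a Labels subclass (e.g. InnovationLabels) and "labelA" is a hypothetical label name; the transpose is an assumption based on the return description above (raters in rows), since aggregate_raters expects cases in rows:

    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    table = labels.rating_table("labelA", allow_missing_data=False)
    counts, _ = aggregate_raters(table.T)  # transpose: assumes raters are in rows
    kappa = fleiss_kappa(counts)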

set_filter(f)

TODO: add documentation

Parameters:

  • f, required