labelling
InnovationLabels
Bases: Labels
TODO: add documentation
from_limesurvey(limesurvey_results, drop_labellers=None)
Adds label entries from the Limesurvey results format. Limesurvey results can contain multiple labelled threads per response. For each thread i and its associated url and labels, the data must contain one column each, e.g.: "thread1" (=url), "labelA1", "labelB1", ..., "thread2", "labelA2", "labelB2", ...
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| limesurvey_results |  | String (path to file) or Pandas.DataFrame | required |
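For illustration, a hypothetical results frame in this wide layout might look as follows (the URLs and label names are made up):

```python
import pandas as pd

# Hypothetical Limesurvey export: one row per response, two labelled
# threads per response, each thread with two labels ("A" and "B").
limesurvey_results = pd.DataFrame({
    "thread1": ["https://example.org/t/101", "https://example.org/t/102"],
    "labelA1": [1, 0],
    "labelB1": [0, 1],
    "thread2": ["https://example.org/t/103", "https://example.org/t/104"],
    "labelA2": [1, 1],
    "labelB2": [0, 0],
})
```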
LabelCollection
TODO: add documentation
all_label_names()
property
TODO: add documentation
by_level(level)
TODO: Add documentation
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| level |  |  | required |
labels()
property
TODO: add documentation
LabelStats
This class provides metrics and visualizations to analyze the annotations made by the labellers.
Available metrics are:
- % agreement ("a_0")
- Cohen's kappa (two labellers)
- Fleiss' kappa (multiple labellers)
- Krippendorff's alpha (multiple labellers, missing data)
Labellers' annotations can furthermore be evaluated against a subsample of "goldstandard" annotations, allowing labellers to be associated with a quality score.
TODO refactor --> move visualizations to visualizations.py ?
See [1] for a comparison of inter-rater agreement metrics.
[1] Xinshu Zhao, Jun S. Liu & Ke Deng (2013) Assumptions behind Intercoder Reliability Indices, Annals of the International Communication Association, 36:1, 419-480, DOI: 10.1080/23808985.2013.11679142
_melt_goldstandard_agreement(data)
Prepares the interrater_agreement dataframe for plotting (e.g., with seaborn): converts wide format to long format and shortens label names.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data |  | Pandas.DataFrame with label names as index and a "labellers" column | required |
Returns:

| Type | Description |
|---|---|
|  | Pandas.DataFrame with columns: index (shortened label names), labellers (labeller names), variable (metric name), value (metric value) |
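The wide-to-long step corresponds to a pandas melt; a minimal sketch, assuming the wide frame has (unnamed) label names as index, a "labellers" column, and an arbitrary 15-character cutoff for shortening:

```python
import pandas as pd

def melt_agreement(wide: pd.DataFrame) -> pd.DataFrame:
    """Wide to long: one row per (label, labeller, metric)."""
    long = wide.reset_index().melt(
        id_vars=["index", "labellers"],  # keep label and labeller names
        var_name="variable",             # metric name
        value_name="value",              # metric value
    )
    # Shorten label names for readable plots (the cutoff is arbitrary).
    long["index"] = long["index"].str.slice(0, 15)
    return long
```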
cohen_kappa()
Get Cohen's kappa for all labels, using the scikit-learn implementation sklearn.metrics.cohen_kappa_score.
Returns NaN if the number of labellers != 2.
Returns:

| Type | Description |
|---|---|
|  | dict of (label name, kappa) |
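The underlying scikit-learn call, shown on toy ratings from two labellers for a single label:

```python
from sklearn.metrics import cohen_kappa_score

# Two labellers' ratings for one label on the same six cases.
rater_1 = [1, 0, 1, 1, 0, 1]
rater_2 = [1, 0, 1, 0, 0, 1]
print(f"Cohen's kappa: {cohen_kappa_score(rater_1, rater_2):.2f}")
```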
complete_agreement()
Get percentage of cases where all labellers agree (per label).
Returns:

| Name | Type | Description |
|---|---|---|
| agreement |  | Pandas.DataFrame with columns 'label', '% perfect agreement', '% n' |
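One way to compute this percentage for a single label with plain pandas (a sketch; the column names are made up):

```python
import pandas as pd

# Toy ratings for one label: rows are cases, columns are labellers.
ratings = pd.DataFrame({
    "labeller_a": [1, 0, 1, 1],
    "labeller_b": [1, 0, 0, 1],
    "labeller_c": [1, 0, 1, 1],
})
# A case counts as perfect agreement if all labellers chose the same value.
perfect = ratings.nunique(axis=1).eq(1)
print(f"% perfect agreement: {100 * perfect.mean():.1f}")
```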
fleiss_kappa()
Get Fleiss' kappa for all labels, based on the Statsmodels implementation (statsmodels.stats.inter_rater.fleiss_kappa).
Returns NaN if the number of labellers < 2.
Returns:

| Type | Description |
|---|---|
|  | dict of (label name, kappa) |
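The Statsmodels function expects a subjects × categories count table, which aggregate_raters can build from raw ratings (toy data):

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy ratings: rows are cases (subjects), columns are labellers.
ratings = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
])
table, _ = aggregate_raters(ratings)  # subjects x categories count table
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")
```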
interrater_agreement()
Calculates the overall interrater agreement for all labellers in the data. If the number of labellers > 2, all values for Cohen's/Fleiss' kappa will be NaN.
Returns:

| Type | Description |
|---|---|
|  | agreement dataframe |
krippendorff_alpha()
Get Krippendorff alphas using the `krippendorff` package.
See also: Andrew F. Hayes & Klaus Krippendorff (2007) Answering the Call for a Standard Reliability Measure for Coding Data, Communication Methods and Measures, 1:1, 77-89, DOI: 10.1080/19312450709336664
Returns:

| Type | Description |
|---|---|
| Pandas.DataFrame |  |
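Usage of the underlying package on toy data; reliability_data is raters × units, with np.nan marking missing ratings:

```python
import numpy as np
import krippendorff

# Rows: labellers; columns: cases; np.nan = case not rated by that labeller.
reliability_data = np.array([
    [1, 0, 1, np.nan, 0],
    [1, 0, 1, 1, np.nan],
    [np.nan, 0, 1, 1, 0],
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```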
pairwise_interrater_agreement(goldstandard=None, min_comparisons=1)
Calculates the agreement metrics for all combinations of two labellers. If goldstandard is set (the name of a labeller), only comparisons with the goldstandard are calculated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| goldstandard |  | Name of labeller | None |
| min_comparisons |  | Minimum number of shared labelled cases | 1 |
Returns:

| Type | Description |
|---|---|
|  | Agreement dataframe with a "labellers" column that contains the names of the compared labeller pair |
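A minimal sketch of the pairwise scheme, reduced to the % agreement metric for brevity (the helper name is hypothetical; the actual method computes the full metric set above):

```python
from itertools import combinations
import pandas as pd

def pairwise_a0(ratings: pd.DataFrame, min_comparisons: int = 1) -> pd.DataFrame:
    """ratings: cases x labellers for one label; NaN = case not rated."""
    rows = []
    for a, b in combinations(ratings.columns, 2):
        shared = ratings[[a, b]].dropna()  # cases rated by both labellers
        if len(shared) < min_comparisons:
            continue  # too few shared cases for a meaningful comparison
        rows.append({
            "labellers": f"{a}/{b}",
            "a_0": (shared[a] == shared[b]).mean(),  # % agreement
            "n": len(shared),
        })
    return pd.DataFrame(rows)
```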
plot_goldstandard_agreement(kind='label_boxplots', goldstandard=None, data=None)
Plot the labellers' agreement with the goldstandard. Provides different plots through kind:
- label_boxplots: each labeller's agreement with the goldstandard as values; x: metric, y: boxplot of values
- labellers_points: each labeller's agreement with the goldstandard as values; a grid with one column per metric, x: labels, y: values
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| kind |  | One of {'label_boxplots', 'labellers_points'} | 'label_boxplots' |
| goldstandard |  | Name of labeller to use as goldstandard (used if data is None to generate the data) | None |
| data |  | agreement dataframe generated by pairwise_interrater_agreement | None |
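A seaborn sketch of both kinds, assuming a long-format frame with the columns produced by _melt_goldstandard_agreement (toy values):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Toy long-format agreement frame (see _melt_goldstandard_agreement).
long = pd.DataFrame({
    "index": ["labelA", "labelA", "labelB", "labelB"],
    "labellers": ["ann", "bob", "ann", "bob"],
    "variable": ["a_0"] * 4,
    "value": [0.90, 0.80, 0.70, 0.95],
})

# label_boxplots: one box per metric over the labellers' agreement values.
sns.boxplot(data=long, x="variable", y="value")
plt.show()

# labellers_points: one facet per metric, labels on x, agreement on y.
sns.catplot(data=long, col="variable", x="index", y="value")
plt.show()
```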
Labels
Bases: ABC
TODO: add documentation
__init__(data=None, cols=DEFAULT_COLS, filter=None)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.DataFrame |  | None |
| cols | dict |  | DEFAULT_COLS |
append(data, cols=DEFAULT_COLS, drop_labellers=None)
TODO: add documentation
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.DataFrame |  | required |
| cols | dict |  | DEFAULT_COLS |
| drop_labellers |  |  | None |
data_by_label(format='sklearn', dropna=False)
TODO: add documentation
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dropna |  |  | False |
labellers()
property
TODO: add documentation
rating_table(label_name, communities=None, custom_filter=None, allow_missing_data=False)
Get the rating table for one label, to be used, e.g., with statsmodels.stats.inter_rater.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
communities |
List of communities to include in table, |
None
|
|
label_name |
label to be returned in table |
required | |
allow_missing_data |
whether to drop columns with missing ratings |
False
|
Returns:

| Type | Description |
|---|---|
|  | rating table: labels as a 2-dim table with raters (labellers) in rows and ratings in columns |
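A sketch of the hand-off to Statsmodels: since raters are in rows here, transpose so that subjects are in rows before aggregating:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy rating table as described above: raters in rows, ratings in columns.
table = np.array([
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
])
# Statsmodels expects subjects in rows, so transpose before aggregating.
counts, _ = aggregate_raters(table.T)
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")
```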
set_filter(f)
TODO: add documentation
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| f |  |  | required |