(六)RASA NLU意图分类器

共 19701字,需浏览 40分钟

 ·

2021-09-11 20:13


作者简介




原文:https://zhuanlan.zhihu.com/p/333309670

转载者:杨夕

面筋地址:https://github.com/km1994/NLP-Interview-Notes

个人笔记:https://github.com/km1994/nlp_paper_study


                   


RASA的逻辑是根据用户本轮说话的意图做分类,然后结合历史上下文,给出一个action。意图分类是后续策略选择的基础。

RASA支持的意图分类器有:

MitieIntentClassifier

使用MitieNLP的分类器,需要Tokenizer都使用MitieNLP,但是MitieIntentClassifier分类器里面已经自带Featurizer功能,所以不是必须配置的。简单来说,是基于稀疏线性核的一个多分类线性SVM。具体算法参考:

MITE : https://github.com/mit-nlp/MITIEhttps://github.com/mit-nlp/MITIE

SklearnIntentClassifier

使用Sklearn去做意图识别。sklearn也是通过SVM做意图识别,只是sklearn的SVM是通过grid search方法优化的,关于Grid Search参考

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

SklearnIntentClassifier使用时候需要将SVM的超参数配置上。具体配置如下:

pipeline:- name: "SklearnIntentClassifier" # Specifies the list of regularization values to # cross-validate over for C-SVM. # This is used with the ``kernel`` hyperparameter in GridSearchCV. C: [1, 2, 5, 10, 20, 100] # Specifies the kernel to use with C-SVM. # This is used with the ``C`` hyperparameter in GridSearchCV. kernels: ["linear"] # Gamma parameter of the C-SVM. "gamma": [0.1] # We try to find a good number of cross folds to use during # intent training, this specifies the max number of folds. "max_cross_validation_folds": 5 # Scoring function used for evaluating the hyper parameters. # This can be a name or a function. "scoring_function": "f1_weighted"

KeywordIntentClassifier

简单的关键字匹配意图分类,适用于小型项目,意图比较少的情况。当意图很多,相关性又很大的时候,关键词分类器无法区分。

关键字的匹配方式是,训练数据的整句话都作为关键字,去搜索用户说的话。因此写配置数据的时候,仔细设计那个训练数据很重要,关键字不能太长,这容易匹配不上意图,也不能太短,缺少意图的区分度。

DIETClassifier

DIET模型是Dual Intent and Entity Transformer的简称, 解决了对话理解问题中的2个问题,意图分类和实体识别。DIET使用的是纯监督的方式,没有任何预训练的情况下,无须大规模预训练是关键,性能好于fine-tuning Bert, 但是训练速度是bert的6倍。输入是用户消息和可选意图的稠密或者稀疏向量。输出是实体,意图和评分。

DIET体系结构基于两个任务共享的Transformer。实体标签序列通过Transformer后,输出序列进入顶层条件随机场(CRF)标记层预测,输出每个Token成为BIOE的概率。完整话语和意图标签经过Transformer输出到单个语义向量空间中。利用点积损失最大化与目标标签的相似度,最小化与负样本的相似度。具体DIET的算法参考:

DIET:Dual Intent and Entity Transformer——RASA论文翻译: https://zhuanlan.zhihu.com/p/337181983

如果只想将DIETClassifier用于意图分类,请将entity_recognition设置为False。如果只想进行实体识别,请将intent_classification设置为False。默认情况下,DIETClassifier同时执行这两项操作,即实体识别和意图分类都设置为True。

可以定义多个超参数来调整模型。如果要调整模型,请首先修改以下参数:

epochs:此参数设置算法将看到训练数据的次数(默认值:300)。一个epoch等于所有训练实例的一个向前传播和一个向后传播。有时模型需要更多的epoch来正确学习。epoch数越少,模型的训练速度就越快。

hidden_layers_sizes:此参数允许您为用户消息和意图定义前馈层的数量及其输出维度(默认值:文本:[],标签:[])。列表中的每个条目都对应一个前馈层。例如,如果设置text:[256,128],我们将在转换器前面添加两个前馈层。输入token的向量(来自用户消息)将被传递到这些层。第一层的输出维度为256,第二层的输出维度为128。如果使用空列表(默认行为),则不会添加前馈层。确保只使用正整数值。通常使用二次幂的数字,第二个值小于或等于前一个值。

embedding_dimension:该参数定义模型内部使用的嵌入层的输出维度(默认值:20)。我们在模型架构中使用了多个嵌入层。例如,在比较和计算损失之前,将完整的话语和意图的向量传递到嵌入层。

number_of_transformer_layers:此参数设置要使用的transformer层数(默认值:2)。transformer层的数量对应于要用于模型的transformer块。

transformer_size:此参数设置transformer中的单位数(默认值:256)。来自transformer的矢量将具有给定的transformer_size。

weight_sparsity:该参数定义模型中所有前馈层的内核权重的分数(默认值:0.8)。该值应介于0和1之间。如果将weight_sparsity设置为0,则不会将内核权重设置为0,该层将充当标准的前馈层。您不应该将weight_sparsity设置为1,因为这将导致所有内核权重为0,即模型无法学习。

一般来说,调整这些参数就可以获得比较好的模型。另外还有其他可以调整的参数,具体见下表。

+---------------------------------+------------------+--------------------------------------------------------------+
| Parameter | Default Value | Description |
+=================================+==================+==============================================================+
| hidden_layers_sizes | text: [] | Hidden layer sizes for layers before the embedding layers |
| | label: [] | for user messages and labels. The number of hidden layers is |
| | | equal to the length of the corresponding list. |
+---------------------------------+------------------+--------------------------------------------------------------+
| share_hidden_layers | False | Whether to share the hidden layer weights between user |
| | | messages and labels. |
+---------------------------------+------------------+--------------------------------------------------------------+
| transformer_size | 256 | Number of units in transformer. |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_transformer_layers | 2 | Number of transformer layers. |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_attention_heads | 4 | Number of attention heads in transformer. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_key_relative_attention | False | If 'True' use key relative embeddings in attention. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_value_relative_attention | False | If 'True' use value relative embeddings in attention. |
+---------------------------------+------------------+--------------------------------------------------------------+
| max_relative_position | None | Maximum position for relative embeddings. |
+---------------------------------+------------------+--------------------------------------------------------------+
| unidirectional_encoder | False | Use a unidirectional or bidirectional encoder. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_size | [64, 256] | Initial and final value for batch sizes. |
| | | Batch size will be linearly increased for each epoch. |
| | | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_strategy | "balanced" | Strategy used when creating batches. |
| | | Can be either 'sequence' or 'balanced'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| epochs | 300 | Number of epochs to train. |
+---------------------------------+------------------+--------------------------------------------------------------+
| random_seed | None | Set random seed to any 'int' to get reproducible results. |
+---------------------------------+------------------+--------------------------------------------------------------+
| learning_rate | 0.001 | Initial learning rate for the optimizer. |
+---------------------------------+------------------+--------------------------------------------------------------+
| embedding_dimension | 20 | Dimension size of embedding vectors. |
+---------------------------------+------------------+--------------------------------------------------------------+
| dense_dimension | text: 128 | Dense dimension for sparse features to use. |
| | label: 20 | |
+---------------------------------+------------------+--------------------------------------------------------------+
| concat_dimension | text: 128 | Concat dimension for sequence and sentence features. |
| | label: 20 | |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_negative_examples | 20 | The number of incorrect labels. The algorithm will minimize |
| | | their similarity to the user input during training. |
+---------------------------------+------------------+--------------------------------------------------------------+
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| ranking_length | 10 | Number of top actions to normalize scores for loss type |
| | | 'softmax'. Set to 0 to turn off normalization. |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
| | | Should be 0.0 < ... < 1.0 for 'cosine' similarity type. |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_negative_similarity | -0.4 | Maximum negative similarity for incorrect labels. |
| | | Should be -1.0 < ... < 1.0 for 'cosine' similarity type. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_maximum_negative_similarity | True | If 'True' the algorithm only minimizes maximum similarity |
| | | over incorrect intent labels, used only if 'loss_type' is |
| | | set to 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| scale_loss | False | Scale loss inverse proportionally to confidence of correct |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
| regularization_constant | 0.002 | The scale of regularization. |
+---------------------------------+------------------+--------------------------------------------------------------+
| negative_margin_scale | 0.8 | The scale of how important it is to minimize the maximum |
| | | similarity between embeddings of different labels. |
+---------------------------------+------------------+--------------------------------------------------------------+
| weight_sparsity | 0.8 | Sparsity of the weights in dense layers. |
| | | Value should be between 0 and 1. |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate | 0.2 | Dropout rate for encoder. Value should be between 0 and 1. |
| | | The higher the value the higher the regularization effect. |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate_attention | 0.0 | Dropout rate for attention. Value should be between 0 and 1. |
| | | The higher the value the higher the regularization effect. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_sparse_input_dropout | True | If 'True' apply dropout to sparse input tensors. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_dense_input_dropout | True | If 'True' apply dropout to dense input tensors. |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs | 20 | How often to calculate validation accuracy. |
| | | Set to '-1' to evaluate just once at the end of training. |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples | 0 | How many examples to use for hold out validation set. |
| | | Large values may hurt performance, e.g. model accuracy. |
+---------------------------------+------------------+--------------------------------------------------------------+
| intent_classification | True | If 'True' intent classification is trained and intents are |
| | | predicted. |
+---------------------------------+------------------+--------------------------------------------------------------+
| entity_recognition | True | If 'True' entity recognition is trained and entities are |
| | | extracted. |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_masked_language_model | False | If 'True' random tokens of the input message will be masked |
| | | and the model has to predict those tokens. It acts like a |
| | | regularizer and should help to learn a better contextual |
| | | representation of the input. |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_directory | None | If you want to use tensorboard to visualize training |
| | | metrics, set this option to a valid output directory. You |
| | | can view the training metrics after training in tensorboard |
| | | via 'tensorboard --logdir <path-to-given-directory>'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_level | "epoch" | Define when training metrics for tensorboard should be |
| | | logged. Either after every epoch ('epoch') or for every |
| | | training step ('minibatch'). |
+---------------------------------+------------------+--------------------------------------------------------------+
| featurizers | [] | List of featurizer names (alias names). Only features |
| | | coming from the listed names are used. If list is empty |
| | | all available features are used. |
+---------------------------------+------------------+--------------------------------------------------------------+
| checkpoint_model | False | Save the best performing model during training. Models are |
| | | stored to the location specified by `--out`. Only the one |
| | | best model will be saved. |
| | | Requires `evaluate_on_number_of_examples > 0` and |
| | | `evaluate_every_number_of_epochs > 0` |
+---------------------------------+------------------+--------------------------------------------------------------+
| split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each |
| | | one of them as a single entity. Can either be `True`/`False` |
| | | globally, or set per entity type, such as: |
| | | ``` |
| | | ... |
| | | - name: DIETClassifier |
| | | split_entities_by_comma: |
| | | address: True |
| | | ... |
| | | ... |
| | | ``` |
+---------------------------------+------------------+--------------------------------------------------------------+

FallbackClassifier

当意图识别的得分比较低时,使用该分类器决定是否给出nlu_fallback意图。注意,这个FallbackClassifier总是跟在其他意图分类器之后,对前一个意图分类提给出的意图及置信度进行判定。如果前一个意图分类器给出的意图预测置信度低于threshold,或者两个排名最高的意图的置信度得分接近时,FallbackClassifier实施回退操作。

回退意图的应答,可以通过规则来实现。

rules:
- rule: Ask the user to rephrase in case of low NLU confidence
steps:
- intent: nlu_fallback
- action: utter_please_rephrase

FallbackClassifier的配置参数有:

threshold:此参数设置预测nlu_fallback意图的阈值。如果前一个意图分类器预测的意图置信度小于threshold,则FallbackClassifier将返回一个置信度为1.0的nlu_fallback意图。

ambiguity_threshold:如果两个排名最高的意图的置信度得分之差小于ambiguity_threshold,FallbackClassifier将返回一个置信度为1.0的nlu_fallback意图。


浏览 101
点赞
评论
收藏
分享

手机扫一扫分享

分享
举报
评论
图片
表情
推荐
点赞
评论
收藏
分享

手机扫一扫分享

分享
举报