

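CNN+ATT is a neural network model built on sentence-level selective attention, used to build distantly supervised relation extraction systems. It is a well-known neural relation extraction (NRE) model.

This post contains my study notes on the original CNN+ATT paper, including a code implementation.

Original CNN+ATT paper: Neural Relation Extraction with Selective Attention over Instances.

Code repository: https://github.com/LuYF-Lemon-love/susu-knowledge-graph/tree/main/neural-relation-extraction/C%2B%2B .

Operating system: Ubuntu 18.04.6 LTS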


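Study notes on the original CNN+ATT paper

Neural Relation Extraction with Selective Attention over Instances was proposed in 2016 and published in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

Distant supervision has been widely used to discover new relational facts from text. However, it is inevitably accompanied by the wrong labelling problem, and the resulting noisy data substantially hurts relation extraction performance.

The paper's sentence-level selective attention model alleviates this wrong labelling problem. It uses a convolutional neural network (CNN) to embed the semantics of each sentence, and then applies sentence-level selective attention to dynamically reduce the weights of noisy instances (sentences).

The experimental results show that the model can make full use of the information in every sentence and effectively reduces the influence of wrongly labelled instances (sentences).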


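In recent years, various large-scale knowledge bases (Freebase, DBpedia, YAGO) have been built and widely used in many natural language processing (NLP) tasks, including web search and question answering. These knowledge bases consist of a large number of triples in the format (Microsoft, founder, Bill Gates).

Relation extraction (RE), the process of generating relational data from plain text, is an important NLP task.

(Mintz et al., 2009) [3] proposed distant supervision, which automatically generates training data by aligning a knowledge base with plain text. Distant supervision assumes that if two entities have a relation in the knowledge base, then every sentence containing both entities expresses that relation. For example, (Microsoft, founder, Bill Gates) is a relational fact in the knowledge base, so distant supervision treats every sentence containing these two entities as a positive example of the relation founder. However, the sentence “Bill Gates ’s turn to philanthropy was linked to the antitrust problems Microsoft had in the U.S. and the European union.” does not express the relation founder, yet it is still treated as a positive example of it.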
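The labelling rule described above can be sketched in a few lines. This is only an illustration of the distant supervision assumption, not code from the paper or the repository; the types `KnowledgeBase` and `Mention` and the function `distant_labels` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container types, used only in this sketch.
using EntityPair = std::pair<std::string, std::string>;
using KnowledgeBase = std::map<EntityPair, std::string>;  // (head, tail) -> relation

struct Mention {
    std::string head, tail, sentence;
};

// Distant supervision: every sentence that mentions a pair found in the KB
// inherits the KB relation, whether or not the sentence expresses it.
// Pairs that are not in the KB are labelled "NA".
std::vector<std::string> distant_labels(const KnowledgeBase &kb,
                                        const std::vector<Mention> &mentions) {
    std::vector<std::string> labels;
    labels.reserve(mentions.size());
    for (const Mention &m : mentions) {
        assert(!m.head.empty() && !m.tail.empty());  // a mention names both entities
        KnowledgeBase::const_iterator it = kb.find(EntityPair(m.head, m.tail));
        labels.push_back(it == kb.end() ? std::string("NA") : it->second);
    }
    return labels;
}
```

With a KB holding only (Microsoft, Bill Gates) -> founder, the antitrust sentence above would also be labelled founder, which is exactly the noise the paper sets out to handle.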

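Therefore, several works (Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012) adopt multi-instance learning to alleviate the wrong labelling problem of distant supervision. (Zeng et al., 2015) [4] combines multi-instance learning with a neural network model to alleviate it further. That method assumes that at least one sentence mentioning the two entities expresses their relation, and it selects only the most likely sentence for each entity pair during training and prediction. As a result, it loses the rich information contained in the neglected sentences.

This paper proposes a sentence-level attention-based convolutional neural network (CNN) for distantly supervised relation extraction. As shown in the figure below, the model uses a CNN to embed the semantics of sentences. Then, to use the information in all sentences, it represents the relation as a semantic composition of sentence embeddings. To address the wrong labelling problem, the model builds sentence-level attention over the semantic vectors of these instances, dynamically reducing the weights of noisy instances while increasing the weights of informative ones. Finally, the attention-weighted sum of the instance vectors is used as the feature vector for relation extraction.


  • Compared with existing neural relation extraction models, this model can make full use of the information in all instances (sentences) of each entity pair.

  • To address the wrong labelling problem in distant supervision, the paper proposes selective attention to de-emphasize noisy instances.

  • Experiments show that selective attention benefits both CNN models (CNN and PCNN) in relation extraction.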


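Relation extraction is an important NLP task, and supervised relation extraction has been studied extensively. (Mintz et al., 2009) [3] proposed distant supervision, which automatically generates training data by aligning a knowledge base with plain text.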

  1. (Riedel et al., 2010) models distant supervision for relation extraction as a multi-instance single-label problem.

  2. (Hoffmann et al., 2011; Surdeanu et al., 2012) adopt multi-instance multi-label learning in relation extraction.

Multi-instance learning was originally proposed to address the issue of ambiguously-labelled training data when predicting the activity of drugs (Dietterich et al., 1997)

(Bunescu and Mooney, 2007) connects weak supervision with multi-instance learning and extends it to relation extraction.

All feature-based methods depend heavily on the quality of the features produced by NLP tools, and therefore suffer from the error propagation problem.

Deep learning (Bengio, 2009) has been widely used in various areas, including computer vision, speech recognition and so on.

NLP tasks (successfully applied):

  1. part-of-speech tagging (Collobert et al., 2011)

  2. sentiment analysis (dos Santos and Gatti, 2014)

  3. parsing (Socher et al., 2013)

  4. machine translation (Sutskever et al., 2014)

  1. (Socher et al., 2012) uses a recursive neural network in relation extraction. They parse the sentences first and then represent each node in the parsing tree as a vector.

  2. (Zeng et al., 2014 [6]; dos Santos et al., 2015) adopt an end-to-end convolutional neural network for relation extraction.

  3. (Xie et al., 2016) attempts to incorporate the text information of entities for relation extraction.

Although deep learning methods have been very successful, these models still extract relations at the sentence level and lack sufficient training data. In addition, the multi-instance learning strategies of traditional methods are not easily applied to neural network models.

(Zeng et al., 2015) [4] combines at-least-one multi-instance learning with a neural network model to extract relations on distant supervision data. However, they assume that only one sentence is active for each entity pair. Hence, it loses a large amount of rich information contained in the neglected sentences.

Therefore, this paper proposes sentence-level selective attention over multiple instances (sentences), which can make full use of the information in all instances of each entity pair.

  1. The attention-based models have attracted a lot of interests of researchers recently.

  2. The selectivity of attention-based models allows them to learn alignments between different modalities.

It has been applied to various areas:

  1. image classification (Mnih et al., 2014)

  2. speech recognition (Chorowski et al., 2014)

  3. image caption generation (Xu et al., 2015)

  4. machine translation (Bahdanau et al., 2014).

To the best of our knowledge, this is the first effort to adopt attention-based model in distant supervised relation extraction.


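Given a set of sentences $\{x_1, x_2, \cdots, x_n\}$ and the two corresponding entities (head and tail), the model predicts the probability of each relation $r$.


  • Sentence Encoder. Given a sentence $x$ and two target entities (head and tail), a convolutional neural network (CNN) is used to build a vector representation $x$ of the sentence. (Both the original sentence and its vector are denoted by $x$.)

  • Selective Attention over Instances. Once the vector representations of all instances (sentences) are available, the model uses sentence-level selective attention to select the sentences that really express the relation and gives them higher weights.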


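As shown in the figure below, the CNN transforms a sentence $x$ into its vector representation $x$. First, the words in the sentence are transformed into dense real-valued feature vectors (word embeddings; real-valued here corresponds to the 32-bit float type in C/C++). Then a convolutional layer, a max-pooling layer and a non-linear activation function are used to build the vector representation $x$ of the sentence.


The input of the CNN is the raw words of the sentence $x$. The words are first transformed into low-dimensional vectors: the model maps each input word to a vector through a word embedding matrix. In addition, to specify the position of each entity pair, position embeddings are used for every word in the sentence.

Word Embeddings. Word embeddings aim to transform words from discrete symbols into distributed representations in a continuous vector space that capture their syntactic and semantic information. Given a sentence $x$ consisting of $m$ words $x = \{w_1, w_2, \cdots, w_m\}$, every word $w_i$ is represented by a real-valued vector. Word representations are encoded by the column vectors of an embedding matrix $\mathbf{V} \in \mathbb{R}^{d^a \times \mid V \mid}$, where $V$ is a fixed-size vocabulary.

Position Embeddings. In relation extraction, the words close to the target entities are usually informative for determining the relation between them. Similar to (Zeng et al., 2014) [6], position embeddings specified by the entity pair help the CNN keep track of how close each word is to the head and tail entities. A position embedding is defined as the combination of the relative distances from the current word to the head entity and to the tail entity. For example, in “Bill_Gates is the founder of Microsoft.”, the relative distance from the word “founder” to the head entity “Bill_Gates” is 3, and to the tail entity “Microsoft” it is 2.

In the figure above, the word embedding dimension $d^a$ is assumed to be 3 and the position embedding dimension $d^b$ is 1. Finally, the word embedding and position embeddings of every word are concatenated, turning the sentence into a vector sequence $w = \{w_1, w_2, \cdots, w_m\}$, where $w_i \in \mathbb{R}^d$ ($d = d^a + d^b \times 2$).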
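As a rough sketch of how the relative distances behind the position embeddings can be computed: the function below returns, for every token, the signed offset to an entity and clips it to a maximum distance. The sign convention (entity index minus token index) and the clipping to `[-limit, limit]` mirror what `init.h` does later in this post; the function name `relative_positions` is a hypothetical helper, and the paper's example quotes only the magnitudes (3 and 2).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Signed offset from each token to the entity at index entity_pos,
// clipped to the range [-limit, limit].
std::vector<int> relative_positions(int length, int entity_pos, int limit) {
    assert(length >= 0 && limit >= 0);
    assert(length == 0 || (entity_pos >= 0 && entity_pos < length));
    std::vector<int> offsets(length);
    for (int i = 0; i < length; i++)
        offsets[i] = std::max(-limit, std::min(limit, entity_pos - i));
    return offsets;
}
```

For the six tokens of “Bill_Gates is the founder of Microsoft” (head at index 0, tail at index 5), the token “founder” at index 3 gets offsets -3 and 2. Each offset would then be used as an index into a position embedding table, typically after shifting it to a non-negative range.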

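Convolution, Max-pooling and Non-linear Layers

The main challenges in relation extraction are:

  1. The length of sentences is variable.

  2. Important information can appear anywhere in a sentence.

Therefore, all local features should be used and the relation should be predicted globally. A convolutional layer can be used to merge all local features.

The convolutional layer first extracts local features with a window of length $l$ sliding over the sentence (a one-dimensional convolution); in the figure above, the window length is assumed to be 3. A max-pooling layer then combines all local features to obtain a fixed-size vector for each input sentence.

Convolution is defined as an operation between the vector sequence $w$ and a convolution matrix $W \in \mathbb{R}^{d^c \times (l \times d)}$, where $d^c$ is the sentence embedding size. The vector $q_i \in \mathbb{R}^{l \times d}$ is the concatenation of the word embeddings $w$ inside the $i$-th window:

$$q_i = w_{i - l + 1 : i} \quad\quad (1 \leq i \leq m + l - 1). \tag{1}$$

When the window slides near the boundary it may fall outside the sentence, so special padding tokens are added to the sentence. This means that all out-of-range input vectors $w_i$ ($i < 1$ or $i > m$) are treated as zero vectors.

The output of the $i$-th convolution filter is:

$$p_i = [Wq + b]_i, \tag{2}$$

where $b$ is the bias vector. The $i$-th element of the sentence vector $x \in \mathbb{R}^{d^c}$ is:

$$[x]_i = \max(p_i), \tag{3}$$

In Eqs. (2) and (3), the index $i$ of $p_i$ and $[x]_i$ runs over the $d^c$ convolution filters (the rows of $W$), unlike the window index $i$ of $q_i$ in Eq. (1); the max in Eq. (3) is taken over all window positions.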
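To make Eqs. (1) to (3) concrete, here is a minimal, unoptimized sketch of the convolution followed by max pooling. It assumes plain nested `std::vector` storage and a hypothetical function name `conv_max_pool`; it is not how the repository lays out its weights (which uses flat arrays split into word and position parts).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;

// words: m token vectors of size d; W: d_c rows of size l*d; b: d_c biases.
// Returns the sentence vector x of size d_c.
Vec conv_max_pool(const Mat &words, const Mat &W, const Vec &b, int l) {
    const int m = static_cast<int>(words.size());
    assert(m > 0 && l > 0 && W.size() == b.size());
    const int d = static_cast<int>(words[0].size());
    Vec x(W.size(), -std::numeric_limits<float>::infinity());
    // Window i (0-based end index) covers tokens i-l+1 .. i, giving m+l-1 windows (Eq. 1).
    for (int i = 0; i < m + l - 1; i++) {
        for (std::size_t k = 0; k < W.size(); k++) {
            assert(W[k].size() == static_cast<std::size_t>(l * d));
            float p = b[k];                      // Eq. (2): k-th filter applied to window i
            for (int j = 0; j < l; j++) {
                const int t = i - l + 1 + j;     // token index inside the window
                if (t < 0 || t >= m) continue;   // out-of-range tokens are zero vectors
                for (int c = 0; c < d; c++) p += W[k][j * d + c] * words[t][c];
            }
            x[k] = std::max(x[k], p);            // Eq. (3): max over window positions
        }
    }
    return x;
}
```

Because the max is taken per filter over all window positions, the output size depends only on the number of filters, not on the sentence length.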

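Furthermore, PCNN (Zeng et al., 2015) [4], a variant of CNN, adopts piecewise max pooling for relation extraction. Each convolution filter output $p_i$ is divided into three segments $(p_{i1}, p_{i2}, p_{i3})$ by the head and tail entities, and max pooling is performed on each segment separately:

$$[x]_{ij} = \max(p_{ij}), \tag{4}$$

The sentence vector element $[x]_i$ is the concatenation of the three pooled values $[x]_{ij}$.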
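Piecewise max pooling for a single filter can be sketched as below. The segment boundaries used here (up to and including the first entity, up to and including the second entity, the rest) are an assumption of this sketch, as is the requirement that all three segments are non-empty; `piecewise_max` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// p: outputs of one filter over all window positions.
// first/second: positions of the two entities, with first < second.
std::array<float, 3> piecewise_max(const std::vector<float> &p, int first, int second) {
    const int n = static_cast<int>(p.size());
    assert(0 <= first && first < second && second < n - 1);  // three non-empty segments
    auto seg_max = [&p](int lo, int hi) {                    // max over [lo, hi)
        return *std::max_element(p.begin() + lo, p.begin() + hi);
    };
    return {seg_max(0, first + 1), seg_max(first + 1, second + 1), seg_max(second + 1, n)};
}
```

Concatenating the three values over all $d^c$ filters gives a sentence vector of size $3 d^c$ instead of $d^c$.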

Finally, a non-linear activation function such as the hyperbolic tangent is applied.

$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

$$(\tanh x)' = \operatorname{sech}^2 x = \frac{1}{\cosh^2 x} = 1 - \tanh^2 x$$



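Suppose there is a set $S = \{x_1, x_2, \cdots, x_n\}$ of $n$ sentences, each of which contains the entity pair $(head, tail)$.

To use the information in all sentences, the model represents the set $S$ with a real-valued vector $s$ when predicting relation $r$. Naturally, the representation of $S$ depends on the representations $x_1, x_2, \cdots, x_n$ of all sentences. Each sentence representation $x_i$ carries information about whether the entity pair $(head, tail)$ holds relation $r$ in the input sentence $x_i$.

The set vector $s$ is computed as a weighted sum of the sentence vectors $x_i$:

$$s = \sum_i a_i x_i, \tag{5}$$

where $a_i$ is the weight of sentence vector $x_i$.

The paper defines $a_i$ in two ways:

Average: all sentences are assumed to contribute equally to $s$, so the embedding $s$ of the set $S$ is the average of all sentence vectors:

$$s = \sum_i \frac{1}{n} x_i, \tag{6}$$


Selective Attention: distant supervision inevitably brings the wrong labelling problem, so if every sentence is treated equally, wrongly labelled sentences introduce a large amount of noise during training and testing. The model therefore uses selective attention to de-emphasize the noisy sentences. $a_i$ is further defined as:

$$a_i = \frac{\exp(e_i)}{\sum_k \exp(e_k)}, \tag{7}$$

where $e_i$ is a query-based function that scores how well the input sentence $x_i$ matches the predicted relation $r$. The model uses the bilinear form, which achieved the best performance among the alternatives:

$$e_i = x_i A r, \tag{8}$$

where $A$ is a weighted diagonal matrix and $r$ is the query vector associated with relation $r$, i.e. the representation of relation $r$.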
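Eqs. (5), (7) and (8) can be sketched as follows. The sketch follows the paper in treating $A$ as diagonal, so it is stored as a vector `a_diag`; note that `init.h` later in this post declares `attention_weights` as a full `dimension_c * dimension_c` matrix per relation, so this is not a transcription of the repository code. The names `selective_attention` and `bag_vector` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Eq. (8) with diagonal A, then Eq. (7): softmax over the scores e_i.
Vec selective_attention(const std::vector<Vec> &xs, const Vec &a_diag, const Vec &r) {
    assert(!xs.empty() && a_diag.size() == r.size());
    Vec e(xs.size(), 0.0f);
    for (std::size_t i = 0; i < xs.size(); i++) {
        assert(xs[i].size() == r.size());
        for (std::size_t k = 0; k < r.size(); k++) e[i] += xs[i][k] * a_diag[k] * r[k];
    }
    const float mx = *std::max_element(e.begin(), e.end());  // shift for numerical stability
    float z = 0.0f;
    for (float &v : e) { v = std::exp(v - mx); z += v; }
    for (float &v : e) v /= z;
    return e;
}

// Eq. (5): the set vector s is the attention-weighted sum of the sentence vectors.
Vec bag_vector(const std::vector<Vec> &xs, const Vec &alpha) {
    assert(!xs.empty() && xs.size() == alpha.size());
    Vec s(xs[0].size(), 0.0f);
    for (std::size_t i = 0; i < xs.size(); i++)
        for (std::size_t k = 0; k < s.size(); k++) s[k] += alpha[i] * xs[i][k];
    return s;
}
```

Setting every weight to $1/n$ instead of calling `selective_attention` recovers the Average variant of Eq. (6).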

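Finally, the conditional probability $p(r \mid S, \theta)$ is defined through a softmax layer:

$$p(r \mid S, \theta) = \frac{\exp(o_r)}{\sum_{k=1}^{n_r} \exp(o_k)}, \tag{9}$$

where $n_r$ is the total number of relations and $o$ is the final output of the neural network, which holds the scores of all relation types and is defined as:

$$o = Ms + d, \tag{10}$$

where $d \in \mathbb{R}^{n_r}$ is a bias vector and $M$ is the representation matrix of relations (the matrix formed by the feature vectors of all relation types).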
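A small sketch of Eqs. (9) and (10), assuming $M$ is stored row by row (one row per relation) in nested vectors; the function name `relation_probabilities` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// M: n_r rows of size d_c (one row per relation); s: set vector; d: n_r biases.
Vec relation_probabilities(const std::vector<Vec> &M, const Vec &s, const Vec &d) {
    assert(!M.empty() && M.size() == d.size());
    Vec o(M.size());
    for (std::size_t r = 0; r < M.size(); r++) {          // Eq. (10): o = M s + d
        assert(M[r].size() == s.size());
        o[r] = d[r];
        for (std::size_t k = 0; k < s.size(); k++) o[r] += M[r][k] * s[k];
    }
    const float mx = *std::max_element(o.begin(), o.end());
    float z = 0.0f;
    for (float &v : o) { v = std::exp(v - mx); z += v; }  // Eq. (9), shifted for stability
    for (float &v : o) v /= z;
    return o;
}
```

The predicted relation is the index of the largest entry of the returned vector.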

(Zeng et al., 2015) [4] follows the assumption that at least one mention of the entity pair will reflect their relation, and only uses the sentence with the highest probability in each set for training. Hence, the method they adopted for multi-instance learning can be regarded as a special case of our selective attention in which the weight of the sentence with the highest probability is set to 1 and the others to 0.


Objective function. The cross-entropy objective is defined as:

$$J(\theta) = \sum_{i=1}^{s} \log p(r_i \mid S_i, \theta), \tag{11}$$

where $s$ is the number of sentence sets, $\theta$ denotes all parameters of the model, $S_i$ is the $i$-th sentence set and $r_i$ is its relation label. The model is optimized with stochastic gradient descent (SGD): mini-batches are randomly selected from the training set and training iterates until the model converges.

Dropout (Srivastava et al., 2014) [7] is applied on the output layer to prevent overfitting. Dropout is defined as an element-wise multiplication with a vector $h$ whose elements are Bernoulli random variables with probability $p$, so Eq. (10) is rewritten as:

$$o = M(s \circ h) + d. \tag{12}$$

In the test phase, the learned set representations are scaled by $p$, i.e. $\hat{s}_i = p s_i$. The scaled set vector $\hat{s}_i$ is finally used to predict relations.
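The two dropout modes can be sketched as below, assuming $p$ is the probability that an element is kept (each $h_k$ is 1 with probability $p$), which is what makes scaling by $p$ at test time match the expected training value. The function names are hypothetical, and the random generator is passed in so that the caller controls seeding.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<float>;

// Training (Eq. 12): multiply s element-wise by a mask h with h_k ~ Bernoulli(p).
Vec dropout_train(const Vec &s, float p, std::mt19937 &rng) {
    assert(p >= 0.0f && p <= 1.0f);
    std::bernoulli_distribution keep(p);
    Vec out(s.size());
    for (std::size_t k = 0; k < s.size(); k++) out[k] = keep(rng) ? s[k] : 0.0f;
    return out;
}

// Testing: no mask; scale the set vector by p instead.
Vec dropout_test(const Vec &s, float p) {
    assert(p >= 0.0f && p <= 1.0f);
    Vec out(s.size());
    for (std::size_t k = 0; k < s.size(); k++) out[k] = p * s[k];
    return out;
}
```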


Our experiments are intended to demonstrate that our neural models with sentence-level selective attention can alleviate the wrong labelling problem and take full advantage of informative sentences for distant supervised relation extraction.


For relation extraction, the dataset developed by (Riedel et al., 2010) [8] is widely used. It was generated by aligning Freebase relations with the New York Times corpus (NYT). Entity mentions are found using the Stanford named entity tagger (Finkel et al., 2005) and are further matched to the names of Freebase entities. The dataset has two parts, a training set and a test set: sentences from the 2005-2006 portion of the corpus are aligned and used as training instances, and the test instances are the aligned sentences from 2007. The dataset contains 53 relation types, including a special relation NA, which indicates that there is no relation between the head and tail entities.

| | number of sentences | number of entity pairs | number of relational facts (not NA) | number of sentences / number of entity pairs |
| --- | --- | --- | --- | --- |
| training set | 522,611 | 281,270 | 18,252 | 1.86 |
| testing set | 172,448 | 96,678 | 1,950 | 1.78 |

Relation extraction quality is evaluated by comparing the relational facts the model discovers in the test set with those in Freebase.

We evaluate our model in the held-out evaluation. It evaluates our model by comparing the relation facts discovered from the test articles with those in Freebase.

It assumes that the testing systems have similar performances in relation facts inside and outside Freebase.

Hence, the held-out evaluation provides an approximate measure of precision without time-consuming human evaluation.

Model performance is reported as aggregate precision/recall curves and as the precision of the top-N most confident predictions (Precision@N, P@N).
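P@N as used here can be sketched as: sort all predicted facts by confidence, keep the top N, and count how many of them appear in Freebase. The `Prediction` alias and the function `precision_at_n` are hypothetical names for this sketch; how the repository stores its predictions is not assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Prediction = std::pair<float, bool>;  // (confidence score, fact is in the KB)

// Precision among the n highest-scoring predictions.
double precision_at_n(std::vector<Prediction> preds, std::size_t n) {
    assert(n > 0 && n <= preds.size());
    std::sort(preds.begin(), preds.end(),
              [](const Prediction &a, const Prediction &b) { return a.first > b.first; });
    std::size_t correct = 0;
    for (std::size_t i = 0; i < n; i++) correct += preds[i].second ? 1 : 0;
    return static_cast<double>(correct) / static_cast<double>(n);
}
```

Sweeping N over the whole ranked list and also dividing the running count of correct facts by the total number of non-NA facts gives the points of a precision/recall curve.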



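Word embeddings are trained on the NYT corpus with the word2vec [9] tool. Words that appear more than 100 times in the corpus are kept as the vocabulary. When an entity has multiple words, its words are concatenated.


The model is tuned with three-fold validation on the training set, and grid search is used to determine the optimal parameters.

For training, the number of iterations over all the training data is set to 25.


| Parameter | Value |
| --- | --- |
| Window size $l$ | 3 |
| Sentence embedding size $d^c$ | 230 |
| Word dimension $d^a$ | 50 |
| Position dimension $d^b$ | 5 |
| Batch size $B$ | 160 |
| Learning rate $\lambda$ | 0.01 |
| Dropout probability $p$ | 0.5 |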


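To demonstrate the effect of sentence-level selective attention, different methods are compared through held-out evaluation. The CNN model and its variant PCNN proposed by Zeng et al. [4, 6] are used as sentence encoders (the authors implement them themselves and achieve results comparable to those originally reported). Both kinds of CNN are combined with sentence-level attention (ATT), with its naive baseline AVE (which represents each sentence set as the average vector of the sentences in the set), and with the multi-instance learning method ONE of Zeng et al. [4], and their performance is compared.


  1. the CNN model proposed in (Zeng et al., 2014) [6]

  2. the PCNN model proposed in (Zeng et al., 2015) [4]

For both CNNs, the comparison covers the plain model, the version with sentence-level attention (ATT), the naive version (AVE), and the at-least-one multi-instance learning version (ONE) [4].

Precision/recall curves of CNN, CNN+ONE, CNN+AVE, CNN+ATT

Precision/recall curves of PCNN, PCNN+ONE, PCNN+AVE, PCNN+ATT

From the figures above, the authors make the following observations:

  1. For both CNN and PCNN, the ONE method performs better than plain CNN/PCNN. The reason is that the original distantly supervised training data contains a lot of noise, and noisy data hurts relation extraction. ONE introduces multi-instance learning, which mitigates this problem to some extent.

  2. For both CNN and PCNN, the AVE method improves on plain CNN/PCNN. This indicates that considering more sentences benefits relation extraction, because noise can be reduced by the mutual complementation of information, and more instances also bring more information.

  3. For both CNN and PCNN, AVE performs similarly to ONE. Although AVE brings in the information of more sentences, it gives every sentence equal weight, so it also picks up negative information from wrongly labelled sentences, which hurts performance. It is therefore hard to say which of AVE and ONE is better.

  4. For both CNN and PCNN, ATT achieves the highest precision over the entire recall range compared with the other methods, including AVE. This indicates that the proposed selective attention is beneficial: it effectively filters out meaningless sentences, alleviates the wrong labelling problem in distantly supervised relation extraction, and makes the fullest possible use of every sentence.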


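In the original test set, 74,857 entity pairs correspond to only one sentence, almost 3/4 of all entity pairs. Since the advantage of selective attention lies in handling entity pairs with multiple sentences, the experiments compare CNN/PCNN+ONE, CNN/PCNN+AVE and the attention-based CNN/PCNN+ATT on entity pairs with different numbers of sentences, in the following three settings.

  • One: for each test entity pair, one sentence is randomly selected from its sentence set and used to predict the relation.

  • Two: for each test entity pair, two sentences are randomly selected from its sentence set and used to predict the relation.

  • All: for each test entity pair, all sentences in its sentence set are used to predict the relation.

Note that all sentences are used during training. The experiments report P@N, the precision of the N highest-scoring predictions, for P@100, P@200 and P@300, together with their mean. The table below compares the P@N of each model for entity pairs with different numbers of sentences.

From the table above, we can observe that:

  1. For both CNN and PCNN, ATT achieves the best performance in all test settings. This demonstrates the effectiveness of sentence-level selective attention for multi-instance learning.

  2. For both CNN and PCNN, AVE is comparable to ATT in the One setting. However, as the number of test sentences per entity pair increases, the performance of AVE shows almost no improvement; its P@100 and P@200 even drop gradually as the number of sentences grows. The reason is that AVE treats every sentence equally, so the noise in sentences that do not express any relation has a negative effect on relation extraction.

  3. In the One setting, CNN+AVE and CNN+ATT improve on CNN+ONE by 5 to 8 percentage points. Each entity pair has only one sentence in this setting, so the only difference between these methods comes from training. The results therefore show that using all sentences brings in more information, although it may also bring in some extra noise, and that this additional information improves the model during training.

  4. For both CNN and PCNN, ATT outperforms the other two baselines in the Two and All settings (over 5% and 9%). This indicates that, by taking more useful information into account, the relational facts ranked highly by CNN+ATT are more reliable, which benefits relation extraction.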


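To evaluate the proposed method, the authors select the following three feature-based methods for comparison.

  • Mintz (Mintz et al., 2009) is a traditional distantly supervised model.

  • MultiR (Hoffmann et al., 2011) proposes a probabilistic graphical model for multi-instance learning that handles overlapping relations.

  • MIML (Surdeanu et al., 2012) jointly models multiple instances and multiple relations (each entity pair may have several sentences and several relation types).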

We implement them with the source codes released by the authors.


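From the figure above, we can observe that:

  1. CNN/PCNN+ATT significantly outperforms all feature-based methods over the entire recall range. When recall is greater than 0.1, the performance of the feature-based methods drops quickly, whereas the paper's models keep a reasonable precision until recall reaches about 0.3. This shows that human-designed features cannot concisely express the semantic meaning of sentences and that the errors introduced by NLP tools hurt relation extraction. In contrast, CNN/PCNN+ATT learns the representation of each sentence automatically and can express its semantics well.

  2. PCNN+ATT performs much better than CNN+ATT over the entire recall range. This means that selective attention takes the global information of all sentences into account but does not improve the model's understanding and representation of an individual sentence. Hence, a better sentence encoder could improve the model further.


The table below shows two examples of selective attention from the test data. For each relation, it shows the corresponding sentences with high and low attention weights, with the entity pairs highlighted in bold.

From the table we find that: The former example is related to the relation employer of. The sentence with low attention weight does not express the relation between two entities, while the high one shows that Mel Karmazin is the chief executive of Sirius Satellite Radio. The latter example is related to the relation place of birth. The sentence with low attention weight expresses where Ernst Haefliger died, while the high one expresses where he was born.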

Conclusion and Future Works

In this paper, we develop CNN with sentence-level selective attention. Our model can make full use of all informative sentences and alleviate the wrong labelling problem for distant supervised relation extraction. In experiments, we evaluate our model on relation extraction task. The experimental results show that our model significantly and consistently outperforms state-of-the-art feature-based methods and neural network methods.

In the future, we will explore the following directions:

  • Our model incorporates multi-instance learning with neural network via instance-level selective attention. It can be used in not only distant supervised relation extraction but also other multi-instance learning tasks. We will explore our model in other area such as text categorization.

  • CNN is one of the effective neural networks for neural relation extraction. Researchers also propose many other neural network models for relation extraction. In the future, we will incorporate our instance-level selective attention technique with those models for relation extraction.



Code repository: https://github.com/LuYF-Lemon-love/susu-knowledge-graph/tree/main/neural-relation-extraction/C%2B%2B .

$ tree
│   ├── clean.sh
│   ├── init.h
│   ├── output
│   │   ├── attention_weights.txt
│   │   ├── conv_1d.txt
│   │   ├── position_vec.txt
│   │   ├── pr.txt
│   │   ├── relation_matrix.txt
│   │   └── word2vec.txt
│   ├── run.sh
│   ├── test.cpp
│   ├── test.h
│   └── train.cpp
├── data
│   ├── relation.txt
│   ├── test.txt
│   ├── train.txt
│   └── vec.bin
├── data.zip
├── papers
│   └── Neural Relation Extraction with Selective Attention over Instances.pdf
└── README.md

4 directories, 19 files



Link: https://pan.baidu.com/s/1SIswYS8vvuDAPiJd2L0d5A Extraction code: g90p .

The original data of NYT10 can be downloaded from:

Relation Extraction: NYT10 was originally released with the paper “Sebastian Riedel, Limin Yao, and Andrew McCallum. Modeling relations and their mentions without labeled text.”

Pre-Trained Word Vectors are learned from New York Times Annotated Corpus (LDC Data LDC2008T19), which should be obtained from LDC (https://catalog.ldc.upenn.edu/LDC2008T19).

The training set is generated by merging all training data of the manual and held-out datasets, deleting the data that overlaps with the test set, and using the remainder as training data.

To run the code, the dataset should be put in the folder data/ using the following format, containing four files

  • train.txt: training file, format (fb_mid_e1, fb_mid_e2, e1_name, e2_name, relation, sentence).

  • test.txt: test file, same format as train.txt.

  • relation.txt: all relations, one per line.

  • vec.bin: the pre-trained word embedding file.
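As an illustration of the `train.txt` / `test.txt` line format listed above, the sketch below splits one line into its five leading fields and the sentence tokens, stopping at the `###END###` marker. It reads whitespace-separated tokens, the same way `init.h` does with `fscanf`; the `Example` struct and `parse_line` function are hypothetical and not part of the repository.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one line of train.txt / test.txt.
struct Example {
    std::string e1_mid, e2_mid, e1_name, e2_name, relation;
    std::vector<std::string> words;
};

// Splits a line into (fb_mid_e1, fb_mid_e2, e1_name, e2_name, relation) and
// the sentence tokens that precede the "###END###" marker.
Example parse_line(const std::string &line) {
    Example ex;
    std::istringstream in(line);
    const bool ok = static_cast<bool>(
        in >> ex.e1_mid >> ex.e2_mid >> ex.e1_name >> ex.e2_name >> ex.relation);
    assert(ok && "a line must start with the five leading fields");
    (void)ok;  // silence unused-variable warnings when assertions are disabled
    std::string w;
    while (in >> w && w != "###END###") ex.words.push_back(w);
    return ex;
}
```

Because tokens are split on whitespace, multi-word entity names must already be joined with underscores (for example `belle_harbor`), as they are in the data files.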

$ tree
├── relation.txt
├── test.txt
├── train.txt
└── vec.bin

0 directories, 4 files
$ head relation.txt 
$ head test.txt 
m.01l443l	m.04t_bj	dave_holland	barry_altschul	NA	the occasion was suitably exceptional : a reunion of the 1970s-era sam rivers trio , with dave_holland on bass and barry_altschul on drums .	###END###
m.01l443l	m.04t_bj	dave_holland	barry_altschul	NA	tonight he brings his energies and expertise to the miller theater for the festival 's thrilling finale : a reunion of the 1970s sam rivers trio , with dave_holland on bass and barry_altschul on drums .	###END###
m.04t_bj	m.01l443l	barry_altschul	dave_holland	NA	the occasion was suitably exceptional : a reunion of the 1970s-era sam rivers trio , with dave_holland on bass and barry_altschul on drums .	###END###
m.04t_bj	m.01l443l	barry_altschul	dave_holland	NA	tonight he brings his energies and expertise to the miller theater for the festival 's thrilling finale : a reunion of the 1970s sam rivers trio , with dave_holland on bass and barry_altschul on drums .	###END###
m.0frkwp	m.04mh_g	ruth	little_neck	NA	shapiro -- ruth of little_neck , ny .	###END###
m.04mh_g	m.0frkwp	little_neck	ruth	NA	shapiro -- ruth of little_neck , ny .	###END###
m.02bv2x	m.01w7tkh	henry	nicole	NA	cherished grandmother of henry , stephanie , harrison and jill shapiro and nicole and eric beinhorn .	###END###
m.01w7tkh	m.02bv2x	nicole	henry	NA	cherished grandmother of henry , stephanie , harrison and jill shapiro and nicole and eric beinhorn .	###END###
m.0124lx	m.07hjs9	lewis	john_gross	NA	beloved wife of the late dr. frederick e. lane , and mother of joseph , ila lane gross , lewis , and edward ; mother-in-law of bobbi , john_gross , nancy , and judy .	###END###
m.0124lx	m.07hjs9	lewis	john_gross	NA	beloved wife of the late dr. frederick e. lane , and mother of joseph , ila lane gross , lewis , and edward ; mother-in-law of bobbi , john_gross , nancy , and judy .	###END###
$ head train.txt 
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	sen. charles e. schumer called on federal safety officials yesterday to reopen their investigation into the fatal crash of a passenger jet in belle_harbor , queens , because equipment failure , not pilot error , might have been the cause .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	but instead there was a funeral , at st. francis de sales roman catholic church , in belle_harbor , queens , the parish of his birth .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	rosemary antonelle , the daughter of teresa l. antonelle and patrick antonelle of belle_harbor , queens , was married yesterday afternoon to lt. thomas joseph quast , a son of peggy b. quast and vice adm. philip m. quast of carmel , calif. .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	one was for st. francis de sales roman catholic church in belle_harbor ; another board studded with electromechanical magnets will go under the pipes of an organ at the evangelical lutheran church of christ in rosedale , queens .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	the firefighter , whom a fire department official identified as joseph moore , of belle_harbor , queens , was taken to newyork-presbyterian\/weill cornell hospital , where he was in critical but stable condition last night , the police said .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	in st. francis de sales roman catholic church in belle_harbor , queens , the second verse of the opening hymn , '' be not afraid , '' seemed to connect katrina and sept. 11 : '' if you pass through raging waters in the sea , you shall not drown .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	on nov. 12 , while walking his dog near his home in belle_harbor , queens , he saw a passenger plane plunge to the ground .	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	colm j. neilson , of belle_harbor , queens , said he thought the conductors ' role was overrated . ''	###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	she is a daughter of marion i. rabbin and dr. murvin rabbin of belle_harbor , queens .###END###
m.0ccvx	m.05gf08	queens	belle_harbor	/location/location/contains	he is a son of vera and william lichtenberg of belle_harbor , queens .	###END###


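  • init.h: C++ header used for initialization, i.e. reading the training and test data.

  • test.h: C++ header used for model testing.

  • train.cpp: C++ source file used for model training.

  • test.cpp: C++ source file used for model testing.

  • run.sh: shell script that trains and tests the model.

  • clean.sh: shell script that removes temporary files.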

$ tree
├── clean.sh
├── init.h
├── output
│   ├── attention_weights.txt
│   ├── conv_1d.txt
│   ├── position_vec.txt
│   ├── pr.txt
│   ├── relation_matrix.txt
│   └── word2vec.txt
├── run.sh
├── test.cpp
├── test.h
└── train.cpp

1 directory, 12 files


// init.h
// created by LuYF-Lemon-love <luyanfeng_nlp@qq.com>
// This C++ file handles initialization, i.e. it reads the training and test data
// prerequisites:
//     ../data/vec.bin
//     ../data/relation.txt
//     ../data/train.txt
//     ../data/test.txt

// ##################################################
// Standard library includes
// ##################################################

#ifndef INIT_H
#define INIT_H

#include <cstdio>          // FILE, fscanf, fopen, fclose, fgetc, feof, fread
#include <cstdlib>         // malloc, calloc, free, rand, RAND_MAX
#include <cmath>           // exp, fabs
#include <cstring>         // memcpy
#include <cfloat>          // FLT_MAX
#include <cassert>         // assert
#include <pthread.h>       // pthread_create, pthread_join, pthread_mutex_t
#include <sys/time.h>      // timeval, gettimeofday
#include <vector>          // std::vector, std::vector::resize, std::vector::operator[], std::vector::push_back, std::vector::size
#include <map>             // std::map, std::map::operator[], std::map::clear, std::map::size
#include <string>          // std::string, std::string::c_str
#include <algorithm>       // std::sort, std::min
#include <utility>         // std::make_pair

// ##################################################
// Declare and define the hyperparameters
// ##################################################

#define INT int
#define REAL float

// batch: batch size
// num_threads: number of threads
// alpha: learning rate
// current_rate: init rate of learning rate
// reduce_epoch: reduce of init rate of learning rate per epoch
// epochs: epochs
// limit: maximum distance of the (head, tail) entity relative to each word in a sentence
// dimension_pos: position dimension
// window: window size
// dimension_c: sentence embedding size
// dropout_probability: dropout probability
// output_model: whether to save the model, 1: save, 0: do not save
// note: extra information appended to file names when saving the model, ("./output/word2vec" + note + ".txt")
// data_path: folder of data
// output_path: folder for output results (precision/recall curves) and models
INT batch = 40;
INT num_threads = 32;
REAL alpha = 0.00125;
REAL current_rate = 1.0;
REAL reduce_epoch = 0.98;
INT epochs = 25;
INT limit = 30;
INT dimension_pos = 5;
INT window = 3;
INT dimension_c = 230;
REAL dropout_probability = 0.5;
INT output_model = 0;
std::string note = "";
std::string data_path = "../data/";
std::string output_path = "./output/";

// ##################################################
// Declare and define the variables that hold the training and test data
// ##################################################

// word_total: vocabulary size, including "UNK"
// dimension: word embedding dimension
// word_vec (word_total * dimension): word embedding matrix
// word2id (word_total): word2id[name] -> id of the word name
INT word_total, dimension;
REAL *word_vec;
std::map<std::string, INT> word2id;

// relation_total: number of relations
// id2relation (relation_total): id2relation[id] -> relation name for id
// relation2id (relation_total): relation2id[name] -> relation id for name
INT relation_total;
std::vector<std::string> id2relation;
std::map<std::string, INT> relation2id;

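// position_min_head: minimum distance of the head entity relative to each word over the datasets (training and test), in theory -limit
// position_max_head: maximum distance of the head entity relative to each word over the datasets (training and test), in theory limit
// position_min_tail: minimum distance of the tail entity relative to each word over the datasets (training and test), in theory -limit
// position_max_tail: maximum distance of the tail entity relative to each word over the datasets (training and test), in theory limit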
// position_total_head = position_max_head - position_min_head + 1
// position_total_tail = position_max_tail - position_min_tail + 1
INT position_min_head, position_max_head, position_min_tail, position_max_tail;
INT position_total_head, position_total_tail;

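// bags_train: key -> (head entity + "\t" + tail entity + "\t" + relation name), value -> sentence indices (positions of the sentences in the training file)
// train_relation_list: relation id of each training sentence, in the order the sentences are read from the training file
// train_length: number of words of each training sentence, in the order the sentences are read from the training file
// train_sentence_list: the training sentences, in the order they are read from the training file
// train_position_head: distance of the head entity relative to each word for each training sentence, in theory in [0, 2 * limit], where the head entity's own word has the value limit
// train_position_tail: distance of the tail entity relative to each word for each training sentence, in theory in [0, 2 * limit], where the tail entity's own word has the value limit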
std::map<std::string, std::vector<INT> > bags_train;
std::vector<INT> train_relation_list, train_length;
std::vector<INT *> train_sentence_list, train_position_head, train_position_tail;

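// bags_test: key -> (head entity + "\t" + tail entity), value -> sentence indices (positions of the sentences in the test file)
// test_relation_list: relation id of each test sentence, in the order the sentences are read from the test file
// test_length: number of words of each test sentence, in the order the sentences are read from the test file
// test_sentence_list: the test sentences, in the order they are read from the test file
// test_position_head: distance of the head entity relative to each word for each test sentence, in theory in [0, 2 * limit], where the head entity's own word has the value limit
// test_position_tail: distance of the tail entity relative to each word for each test sentence, in theory in [0, 2 * limit], where the tail entity's own word has the value limit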
std::map<std::string, std::vector<INT> > bags_test;
std::vector<INT> test_relation_list, test_length;
std::vector<INT *> test_sentence_list, test_position_head, test_position_tail;

// ##################################################
// Declare and define the weight matrices of the model
// ##################################################

// position_vec_head (position_total_head * dimension_pos): position embedding matrix of the head entity
// position_vec_tail (position_total_tail * dimension_pos): position embedding matrix of the tail entity
REAL *position_vec_head, *position_vec_tail;

// conv_1d_word (dimension_c * window * dimension): weight matrix of the 1D convolution (word embeddings)
// conv_1d_position_head (dimension_c * window * dimension_pos): weight matrix of the 1D convolution (head-entity position embeddings)
// conv_1d_position_tail (dimension_c * window * dimension_pos): weight matrix of the 1D convolution (tail-entity position embeddings)
// conv_1d_bias (dimension_c): bias vector of the 1D convolution
REAL *conv_1d_word, *conv_1d_position_head, *conv_1d_position_tail, *conv_1d_bias;

// attention_weights (relation_total * dimension_c * dimension_c): attention weight matrices
std::vector<std::vector<std::vector<REAL> > > attention_weights;

// relation_matrix (relation_total * dimension_c): the representation matrix of relation
// relation_matrix_bias (relation_total): the bias vector of the representation matrix of relation
REAL *relation_matrix, *relation_matrix_bias;

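// ##################################################
// Declare and define copies of the model weights, used to compute the loss in each training batch
// ##################################################

// word_vec_copy (word_total * dimension): copy of the word embedding matrix; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
// position_vec_head_copy (position_total_head * dimension_pos): copy of the head-entity position embedding matrix; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
// position_vec_tail_copy (position_total_tail * dimension_pos): copy of the tail-entity position embedding matrix; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
REAL *word_vec_copy, *position_vec_head_copy, *position_vec_tail_copy;

// conv_1d_word_copy (dimension_c * window * dimension): copy of the 1D convolution weight matrix (word embeddings); because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
// conv_1d_position_head_copy (dimension_c * window * dimension_pos): copy of the 1D convolution weight matrix (head-entity position embeddings); because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
// conv_1d_position_tail_copy (dimension_c * window * dimension_pos): copy of the 1D convolution weight matrix (tail-entity position embeddings); because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
// conv_1d_bias_copy (dimension_c): copy of the 1D convolution bias vector; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch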
REAL *conv_1d_word_copy, *conv_1d_position_head_copy, *conv_1d_position_tail_copy, *conv_1d_bias_copy;

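// attention_weights_copy (relation_total * dimension_c * dimension_c): copy of the attention weight matrices; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch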
std::vector<std::vector<std::vector<REAL> > > attention_weights_copy;

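// relation_matrix_copy (relation_total * dimension_c): the copy of the representation matrix of relation; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch
// relation_matrix_bias_copy (relation_total): the copy of the bias vector of the representation matrix of relation; because the model is trained with multiple threads, the copy is used to compute the loss in each training batch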
REAL *relation_matrix_copy, *relation_matrix_bias_copy;

// Initialization function: reads the training data and the test data
void init() {
	printf("\n##################################################\n\nInit start...\n\n");

	INT tmp;

	// Read the pre-trained word embeddings
	FILE *f = fopen((data_path + "vec.bin").c_str(), "rb");
	tmp = fscanf(f, "%d", &word_total);
	tmp = fscanf(f, "%d", &dimension);
	word_vec = (REAL *)malloc((word_total + 1) * dimension * sizeof(REAL));
	word2id["UNK"] = 0;
	for (INT i = 1; i <= word_total; i++) {
		// Read the word: characters up to the next space, skipping newlines
		std::string name = "";
		while (1) {
			char ch = fgetc(f);
			if (feof(f) || ch == ' ') break;
			if (ch != '\n') name = name + ch;
		}
		word2id[name] = i;

		// Read the vector of this word and normalize it to unit L2 length
		long long last = i * dimension;
		REAL sum = 0;
		for (INT a = 0; a < dimension; a++) {
			tmp = fread(&word_vec[last + a], sizeof(REAL), 1, f);
			sum += word_vec[last + a] * word_vec[last + a];
		}
		sum = sqrt(sum);
		for (INT a = 0; a < dimension; a++)
			word_vec[last + a] = word_vec[last + a] / sum;
	}
	word_total += 1;
	fclose(f);

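	// Read relation.txt
	char buffer[1000];
	f = fopen((data_path + "relation.txt").c_str(), "r");
	while (fscanf(f, "%s", buffer) == 1) {
		relation2id[(std::string)(buffer)] = relation_total++;
		// assumed (not shown in the listing): keep the id -> name mapping in sync
		id2relation.push_back((std::string)(buffer));
	}
	fclose(f);
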
	// Read the training file (train.txt)
	position_min_head = 0;
	position_max_head = 0;
	position_min_tail = 0;
	position_max_tail = 0;
	f = fopen((data_path + "train.txt").c_str(), "r");
	while (fscanf(f, "%s", buffer) == 1)  {
		// Freebase ids of the two entities
		std::string e1 = buffer;
		tmp = fscanf(f, "%s", buffer);
		std::string e2 = buffer;

		// Entity names, then the relation name
		tmp = fscanf(f, "%s", buffer);
		std::string head_s = (std::string)(buffer);
		tmp = fscanf(f, "%s", buffer);
		std::string tail_s = (std::string)(buffer);
		tmp = fscanf(f, "%s", buffer);
		bags_train[e1 + "\t" + e2 + "\t" + (std::string)(buffer)].push_back(train_relation_list.size());
		INT relation_id = relation2id[(std::string)(buffer)];

		// Read the sentence tokens up to the ###END### marker
		INT len_s = 0, head_pos = 0, tail_pos = 0;
		std::vector<INT> sentence;
		while (fscanf(f, "%s", buffer) == 1) {
			std::string word = buffer;
			if (word == "###END###") break;
			INT word_id = word2id[word];
			if (word == head_s) head_pos = len_s;
			if (word == tail_s) tail_pos = len_s;
			// assumed (not shown in the listing): store the token and advance
			len_s++;
			sentence.push_back(word_id);
		}

		// assumed (not shown in the listing): record the relation id and sentence length
		train_relation_list.push_back(relation_id);
		train_length.push_back(len_s);

		INT *sentence_ptr = (INT *)calloc(len_s, sizeof(INT));
		INT *sentence_head_pos = (INT *)calloc(len_s, sizeof(INT));
		INT *sentence_tail_pos = (INT *)calloc(len_s, sizeof(INT));
		for (INT i = 0; i < len_s; i++) {
			sentence_ptr[i] = sentence[i];
			// Signed offsets to the entities, clipped to [-limit, limit]
			sentence_head_pos[i] = head_pos - i;
			sentence_tail_pos[i] = tail_pos - i;
			if (sentence_head_pos[i] >= limit) sentence_head_pos[i] = limit;
			if (sentence_tail_pos[i] >= limit) sentence_tail_pos[i] = limit;
			if (sentence_head_pos[i] <= -limit) sentence_head_pos[i] = -limit;
			if (sentence_tail_pos[i] <= -limit) sentence_tail_pos[i] = -limit;
			// Track the global minimum and maximum offsets
			if (sentence_head_pos[i] > position_max_head) position_max_head = sentence_head_pos[i];
			if (sentence_tail_pos[i] > position_max_tail) position_max_tail = sentence_tail_pos[i];
			if (sentence_head_pos[i] < position_min_head) position_min_head = sentence_head_pos[i];
			if (sentence_tail_pos[i] < position_min_tail) position_min_tail = sentence_tail_pos[i];
		}
		// assumed (not shown in the listing): keep the per-sentence arrays
		train_sentence_list.push_back(sentence_ptr);
		train_position_head.push_back(sentence_head_pos);
		train_position_tail.push_back(sentence_tail_pos);
	}
	fclose(f);


	// Read the test file (test.txt); same record layout as train.txt
	f = fopen((data_path + "test.txt").c_str(), "r");
	while (fscanf(f, "%s", buffer)==1)  {
		std::string e1 = buffer;
		tmp = fscanf(f, "%s", buffer);
		std::string e2 = buffer;

		tmp = fscanf(f, "%s", buffer);
		std::string head_s = (std::string)(buffer);
		tmp = fscanf(f, "%s", buffer);
		std::string tail_s = (std::string)(buffer);

		tmp = fscanf(f, "%s", buffer);
		// A test bag is keyed by (e1, e2) only; the relation is what gets predicted
		bags_test[e1 + "\t" + e2].push_back(test_relation_list.size());
		INT relation_id = relation2id[(std::string)(buffer)];

		INT len_s = 0 , head_pos = 0, tail_pos = 0;
		std::vector<INT> sentence;
		while (fscanf(f, "%s", buffer) == 1) {
			std::string word = buffer;
			if (word=="###END###") break;
			INT word_id = word2id[word];
			if (head_s == word) head_pos = len_s;
			if (tail_s == word) tail_pos = len_s;
			len_s++;
			sentence.push_back(word_id);
		}

		INT *sentence_ptr=(INT *)calloc(len_s, sizeof(INT));
		INT *sentence_head_pos=(INT *)calloc(len_s, sizeof(INT));
		INT *sentence_tail_pos=(INT *)calloc(len_s, sizeof(INT));
		for (INT i = 0; i < len_s; i++) {
			sentence_ptr[i] = sentence[i];
			sentence_head_pos[i] = head_pos - i;
			sentence_tail_pos[i] = tail_pos - i;
			if (sentence_head_pos[i] >= limit) sentence_head_pos[i] = limit;
			if (sentence_tail_pos[i] >= limit) sentence_tail_pos[i] = limit;
			if (sentence_head_pos[i] <= -limit) sentence_head_pos[i] = -limit;
			if (sentence_tail_pos[i] <= -limit) sentence_tail_pos[i] = -limit;
			if (sentence_head_pos[i] > position_max_head) position_max_head = sentence_head_pos[i];
			if (sentence_tail_pos[i] > position_max_tail) position_max_tail = sentence_tail_pos[i];
			if (sentence_head_pos[i] < position_min_head) position_min_head = sentence_head_pos[i];
			if (sentence_tail_pos[i] < position_min_tail) position_min_tail = sentence_tail_pos[i];
		}
		test_relation_list.push_back(relation_id);
		test_length.push_back(len_s);
		test_sentence_list.push_back(sentence_ptr);
		test_position_head.push_back(sentence_head_pos);
		test_position_tail.push_back(sentence_tail_pos);
	}
	fclose(f);

	// Shift the values in train_position_head, train_position_tail, test_position_head
	// and test_position_tail into the range [0, 2 * limit]
	for (INT i = 0; i < train_position_head.size(); i++) {
		INT len_s = train_length[i];
		INT *position = train_position_head[i];
		for (INT j = 0; j < len_s; j++)
			position[j] = position[j] - position_min_head;
		position = train_position_tail[i];
		for (INT j = 0; j < len_s; j++)
			position[j] = position[j] - position_min_tail;
	}

	for (INT i = 0; i < test_position_head.size(); i++) {
		INT len_s = test_length[i];
		INT *position = test_position_head[i];
		for (INT j = 0; j < len_s; j++)
			position[j] = position[j] - position_min_head;
		position = test_position_tail[i];
		for (INT j = 0; j < len_s; j++)
			position[j] = position[j] - position_min_tail;
	}

	position_total_head = position_max_head - position_min_head + 1;
	position_total_tail = position_max_tail - position_min_tail + 1;
}

// Print the key settings and data statistics
void print_information() {
	std::string save_model[] = {"The model will not be saved.", "The model will be saved."};

	printf("batch: %d\nnumber of threads: %d\nlearning rate: %.8f\n", batch, num_threads, alpha);
	printf("init_rate: %.2f\nreduce_epoch: %.2f\nepochs: %d\n\n", current_rate, reduce_epoch, epochs);
	printf("word_total: %d\nword dimension: %d\n\n", word_total, dimension);
	printf("limit: %d\nposition_total_head: %d\nposition_total_tail: %d\ndimension_pos: %d\n\n",
		limit, position_total_head, position_total_tail, dimension_pos);
	printf("window: %d\ndimension_c: %d\n\n", window, dimension_c);
	printf("relation_total: %d\ndropout_probability: %.2f\n\n", relation_total, dropout_probability);
	printf("%s\nnote: %s\n\n", save_model[output_model].c_str(), note.c_str());
	printf("folder of data: %s\n", data_path.c_str());
	printf("folder of output results (precision/recall curves) and models: %s\n\n", output_path.c_str());

	printf("number of training samples: %7d - average number of sentences per training sample: %.2f\n",
		INT(bags_train.size()), float(float(train_sentence_list.size()) / bags_train.size()));
	printf("number of testing samples:  %7d - average number of sentences per testing sample:  %.2f\n\n",
		INT(bags_test.size()), float(float(test_sentence_list.size()) / bags_test.size()));
	printf("Init end.\n\n");
}

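// Find the position of a given command-line argument
INT arg_pos(char *str, INT argc, char **argv) {
	INT a;
	for (a = 1; a < argc; a++) if (!strcmp(str, argv[a])) {
		// The flag is the last token, so its value is missing
		if (a == argc - 1) {
			printf("Argument missing for %s\n", str);
			exit(1);
		}
		return a;
	}
	return -1;
}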

// ##################################################
// Math functions
// ##################################################

// Hyperbolic tangent (tanh), saturated for large |value| to avoid overflow in exp
REAL calc_tanh(REAL value) {
	if (value > 20) return 1.0;
	if (value < -20) return -1.0;
	REAL sinhx = exp(value) - exp(-value);
	REAL coshx = exp(value) + exp(-value);
	return sinhx / coshx;
}

// Return a pseudo-random integer in [min, max)
INT get_rand_i(INT min, INT max) {
	INT d = max - min;
	INT res = rand() % d;
	if (res < 0)
		res += d;
	return res + min;
}

// Return a pseudo-random floating-point number in [min, max)
REAL get_rand_u(REAL min, REAL max) {
	return min + (max - min) * rand() / (RAND_MAX + 1.0);
}


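The position-feature handling in the listing above (clip each relative offset to [-limit, limit], then subtract the smallest offset so that every value indexes a row of the position embedding table) can be illustrated in isolation. The following is only a minimal sketch: `clipped_offsets` and `to_embedding_rows` are hypothetical names that do not appear in init.h, and the sketch uses `std::vector<int>` instead of the raw `INT *` buffers of the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of init.h): offset of the entity at index
// `entity_pos` relative to every token, clipped to [-limit, limit].
std::vector<int> clipped_offsets(int sentence_length, int entity_pos, int limit) {
	assert(sentence_length >= 0 && limit >= 0);
	std::vector<int> offsets(sentence_length);
	for (int i = 0; i < sentence_length; i++)
		offsets[i] = std::max(-limit, std::min(limit, entity_pos - i));
	return offsets;
}

// Hypothetical helper: subtract the smallest offset observed in the data so
// that every value becomes a non-negative row index into the embedding table.
std::vector<int> to_embedding_rows(const std::vector<int> &offsets, int global_min) {
	std::vector<int> rows(offsets.size());
	for (std::size_t i = 0; i < offsets.size(); i++)
		rows[i] = offsets[i] - global_min;
	return rows;
}
```

For a six-token sentence with the entity at index 1 and limit = 2, `clipped_offsets(6, 1, 2)` yields 1, 0, -1, -2, -2, -2; with a global minimum of -2, `to_embedding_rows` maps these to rows 3, 2, 1, 0, 0, 0. In the listing the global minimum is tracked separately for head and tail (`position_min_head`, `position_min_tail`).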

// test.h
// created by LuYF-Lemon-love <luyanfeng_nlp@qq.com>
// This C++ header is used for model testing
// Writes the precision/recall curves
// output:
//     ./output/pr + note + .txt
// Writes the model (optional)
// output:
//     ./output/word2vec + note + .txt
//     ./output/position_vec + note + .txt
//     ./output/conv_1d + note + .txt
//     ./output/attention_weights + note + .txt
//     ./output/relation_matrix + note + .txt

// ##################################################
// Include standard libraries and headers
// ##################################################

#ifndef TEST_H
#define TEST_H
#include "init.h"

// ##################################################
// Declare and define variables
// ##################################################

// predict_relation_vector: each element is a pair
// first -> (head entity + "\t" + tail entity + "\t" + predicted relation name)
// second.first -> 0 or 1 (0: the predicted relation is wrong, 1: it is correct)
// second.second -> probability assigned by the model to this relation
// The vector is sorted by that probability in descending order
std::vector<std::pair<std::string, std::pair<INT,double> > > predict_relation_vector;

// num_test_non_NA: number of test samples whose relation is not NA (each sample contains n sentences
//     that share the same head, relation (label) and tail)
// bags_test_key: keys of bags_test (head entity + "\t" + tail entity), in the iteration order of bags_test
// thread_first_bags_test (num_threads + 1): position in bags_test_key of the first sample of each thread
// test_mutex: mutex that synchronizes access to predict_relation_vector across threads
INT num_test_non_NA;
std::vector<std::string> bags_test_key;
std::vector<INT> thread_first_bags_test;
pthread_mutex_t test_mutex;

struct timeval test_start, test_end;

// Comparison function for std::sort()
// Sorts predict_relation_vector by predicted probability in descending order
bool cmp_predict_probability(std::pair<std::string, std::pair<INT,double> > a,
	std::pair<std::string, std::pair<INT,double> >b)
{
    return a.second.second > b.second.second;
}

// Compute the 1-D convolution of a sentence
std::vector<REAL> calc_conv_1d(INT *sentence, INT *test_position_head,
	INT *test_position_tail, INT sentence_length) {
	std::vector<REAL> conv_1d_result_k;
	conv_1d_result_k.resize(dimension_c, 0);
	for (INT i = 0; i < dimension_c; i++) {
		INT last_word = i * window * dimension;
		INT last_pos = i * window * dimension_pos;
		REAL max_pool_1d = -FLT_MAX;
		for (INT last_window = 0; last_window <= sentence_length - window; last_window++) {
			REAL sum = 0;
			INT total_word = 0;
			INT total_pos = 0;
			for (INT j = last_window; j < last_window + window; j++)  {
				INT last_word_vec = sentence[j] * dimension;
				for (INT k = 0; k < dimension; k++) {
					sum += conv_1d_word[last_word + total_word] * word_vec[last_word_vec + k];
					total_word++;
				}
				INT last_pos_head = test_position_head[j] * dimension_pos;
				INT last_pos_tail = test_position_tail[j] * dimension_pos;
				for (INT k = 0; k < dimension_pos; k++) {
					sum += conv_1d_position_head[last_pos + total_pos] * position_vec_head[last_pos_head + k];
					sum += conv_1d_position_tail[last_pos + total_pos] * position_vec_tail[last_pos_tail + k];
					total_pos++;
				}
			}

			// Corresponds to Eq. (3) in the paper, [x]_i = max(p_i), where x \in R^{d^c}
			if (sum > max_pool_1d) max_pool_1d = sum;
		}
		conv_1d_result_k[i] = max_pool_1d + conv_1d_bias[i];
	}

	for (INT i = 0; i < dimension_c; i++)
		conv_1d_result_k[i] = calc_tanh(conv_1d_result_k[i]);
	return conv_1d_result_k;
}

// Task executed by a single worker thread
void* test_mode(void *thread_id) 
{
	INT id;
	id = (unsigned long long)(thread_id);
	INT left = thread_first_bags_test[id];
	INT right;
	if (id == num_threads-1)
		right = bags_test_key.size();
	else
		right = thread_first_bags_test[id + 1];

	// Gold labels (relations) of the current sample
	std::map<INT,INT> sample_relation_list;

	for (INT i_sample = left; i_sample < right; i_sample++)
	{
		// Reset the gold labels so they do not leak from the previous sample
		sample_relation_list.clear();

		// 1-D convolution
		std::vector<std::vector<REAL> > conv_1d_result;
		INT bags_size = bags_test[bags_test_key[i_sample]].size();
		for (INT k = 0; k < bags_size; k++)
		{
			INT i = bags_test[bags_test_key[i_sample]][k];
			sample_relation_list[test_relation_list[i]] = 1;

			conv_1d_result.push_back(calc_conv_1d(test_sentence_list[i],
				test_position_head[i], test_position_tail[i], test_length[i]));
		}

		// Corresponds to Eq. (8) in the paper, e_i = x_iAr, where r is the query vector associated with
		// relation r. At prediction time the gold relation is unknown, so every relation is used as the query in turn.
		std::vector<float> result_final;
		result_final.resize(relation_total, 0.0);
		for (INT index_r = 0; index_r < relation_total; index_r++) {
			// Attention weight of each sentence
			std::vector<REAL> weight;
			REAL weight_sum = 0;
			for (INT k = 0; k < bags_size; k++)
			{
				REAL s = 0;
				for (INT i_r = 0; i_r < dimension_c; i_r++) 
				{
					REAL temp = 0;
					for (INT i_x = 0; i_x < dimension_c; i_x++)
						temp += conv_1d_result[k][i_x] * attention_weights[index_r][i_x][i_r];
					s += temp * relation_matrix[index_r * dimension_c + i_r];
				}
				s = exp(s);
				weight.push_back(s);
				weight_sum += s;
			}

			for (INT k = 0; k < bags_size; k++)
				weight[k] /= weight_sum;

			// Compute s, the representation of the sentence set
			std::vector<REAL> result_sentence;
			result_sentence.resize(dimension_c, 0.0);
			for (INT i = 0; i < dimension_c; i++) 
				for (INT k = 0; k < bags_size; k++)
					result_sentence[i] += conv_1d_result[k][i] * weight[k];

			// Probability that the relation with id index_r holds
			// (activations are scaled by dropout_probability at test time)
			std::vector<REAL> result_final_r;
			double temp = 0;
			for (INT i_r = 0; i_r < relation_total; i_r++) {
				REAL s = 0;
				for (INT i_s = 0; i_s < dimension_c; i_s++)
					s +=  dropout_probability * result_sentence[i_s] *
						relation_matrix[i_r * dimension_c + i_s];
				s += relation_matrix_bias[i_r];
				s = exp(s);
				result_final_r.push_back(s);
				temp += s;
			}
			result_final[index_r] = result_final_r[index_r]/temp;
		}

		// Store the probability of every non-NA relation for this test sample, under the mutex
		pthread_mutex_lock (&test_mutex);
		for (INT i_r = 1; i_r < relation_total; i_r++) 
			predict_relation_vector.push_back(std::make_pair(bags_test_key[i_sample] + "\t" + id2relation[i_r],
				std::make_pair(sample_relation_list.count(i_r), result_final[i_r])));
		pthread_mutex_unlock (&test_mutex);
	}
	return NULL;
}

// Test function
void test() {

	printf("##################################################\n\nTest start...\n\n");

	gettimeofday(&test_start, NULL);

	// Reset the state so that test() can be called more than once
	num_test_non_NA = 0;
	predict_relation_vector.clear();
	bags_test_key.clear();

	std::vector<INT> sample_sum;
	for (std::map<std::string, std::vector<INT> >::iterator it = bags_test.begin();
		it != bags_test.end(); it++)
	{
		std::map<INT, INT> sample_relation_list;
		for (INT i = 0; i < it->second.size(); i++)
		{
			INT pos = it->second[i];
			if (test_relation_list[pos] > 0)
				sample_relation_list[test_relation_list[pos]] = 1;
		}
		num_test_non_NA += sample_relation_list.size();
		bags_test_key.push_back(it->first);
		sample_sum.push_back(it->second.size());
	}

	// Split the bags across threads so that each thread gets a similar number of sentences
	for (INT i = 1; i < sample_sum.size(); i++)
		sample_sum[i] += sample_sum[i - 1];
	INT thread_id = 0;
	thread_first_bags_test.resize(num_threads + 1);
	for (INT i = 0; i < sample_sum.size(); i++)
		if (sample_sum[i] >= (sample_sum[sample_sum.size()-1] / num_threads) * thread_id)
		{
			thread_first_bags_test[thread_id] = i;
			thread_id += 1;
		}
	printf("Number of test samples for non NA relation: %d\n\n", num_test_non_NA);

	// Multi-threaded model testing
	pthread_t *pt = (pthread_t *)malloc(num_threads * sizeof(pthread_t));
	for (long a = 0; a < num_threads; a++)
		pthread_create(&pt[a], NULL, test_mode,  (void *)a);
	for (long a = 0; a < num_threads; a++)
		pthread_join(pt[a], NULL);
	free(pt);

	// Sort by predicted probability in descending order
	std::sort(predict_relation_vector.begin(),predict_relation_vector.end(), cmp_predict_probability);

	// Write the precision/recall curves
	REAL correct = 0;
	FILE* f = fopen((output_path + "pr" + note + ".txt").c_str(), "w");
	INT top_2000 = std::min(2000, INT(predict_relation_vector.size()));
	for (INT i = 0; i < top_2000; i++)
	{
		if (predict_relation_vector[i].second.first != 0)
			correct++;
		REAL precision = correct / (i + 1);
		REAL recall = correct / num_test_non_NA;
		if ((i+1) % 50 == 0)
			printf("precision/recall curves %4d / %4d - precision: %.3lf - recall: %.3lf\n", (i + 1), top_2000, precision, recall);
		fprintf(f, "precision: %.3lf  recall: %.3lf  correct: %d  predict_probability: %.2lf  predict_triplet: %s\n",
			precision, recall, predict_relation_vector[i].second.first, predict_relation_vector[i].second.second,
			predict_relation_vector[i].first.c_str());
	}
	fclose(f);

	gettimeofday(&test_end, NULL);
	long double time_use = (1000000 * (test_end.tv_sec - test_start.tv_sec)
		+ test_end.tv_usec - test_start.tv_usec) / 1000000.0;
	printf("\ntest use time - %02d:%02d:%02d\n\n", INT(time_use / 3600.0),
		INT(time_use) % 3600 / 60, INT(time_use) % 60);

	if (!output_model)return;

	// Write the word embeddings
	FILE *fout = fopen((output_path + "word2vec" + note + ".txt").c_str(), "w");
	fprintf(fout, "%d\t%d\n", word_total, dimension);
	for (INT i = 0; i < word_total; i++)
	{
		for (INT j = 0; j < dimension; j++)
			fprintf(fout, "%f\t", word_vec[i * dimension + j]);
		fprintf(fout, "\n");
	}
	fclose(fout);

	// Write the position embeddings
	fout = fopen((output_path + "position_vec" + note + ".txt").c_str(), "w");
	fprintf(fout, "%d\t%d\t%d\n", position_total_head, position_total_tail, dimension_pos);
	for (INT i = 0; i < position_total_head; i++) {
		for (INT j = 0; j < dimension_pos; j++)
			fprintf(fout, "%f\t", position_vec_head[i * dimension_pos + j]);
		fprintf(fout, "\n");
	}
	for (INT i = 0; i < position_total_tail; i++) {
		for (INT j = 0; j < dimension_pos; j++)
			fprintf(fout, "%f\t", position_vec_tail[i * dimension_pos + j]);
		fprintf(fout, "\n");
	}
	fclose(fout);

	// Write the 1-D convolution weight matrices and the corresponding bias vector
	fout = fopen((output_path + "conv_1d" + note + ".txt").c_str(), "w");
	fprintf(fout,"%d\t%d\t%d\t%d\n", dimension_c, window, dimension, dimension_pos);
	for (INT i = 0; i < dimension_c; i++) {
		for (INT j = 0; j < window * dimension; j++)
			fprintf(fout, "%f\t", conv_1d_word[i * window * dimension + j]);
		for (INT j = 0; j < window * dimension_pos; j++)
			fprintf(fout, "%f\t", conv_1d_position_head[i * window * dimension_pos + j]);
		for (INT j = 0; j < window * dimension_pos; j++)
			fprintf(fout, "%f\t", conv_1d_position_tail[i * window * dimension_pos + j]);
		fprintf(fout, "%f\n", conv_1d_bias[i]);
	}
	fclose(fout);

	// Write the attention weight matrices
	fout = fopen((output_path + "attention_weights" + note + ".txt").c_str(), "w");
	fprintf(fout,"%d\t%d\n", relation_total, dimension_c);
	for (INT r = 0; r < relation_total; r++) {
		for (INT i_x = 0; i_x < dimension_c; i_x++)
		{
			for (INT i_r = 0; i_r < dimension_c; i_r++)
				fprintf(fout, "%f\t", attention_weights[r][i_x][i_r]);
			fprintf(fout, "\n");
		}
	}
	fclose(fout);

	// Write relation_matrix and the corresponding bias vector
	fout = fopen((output_path + "relation_matrix" + note + ".txt").c_str(), "w");
	fprintf(fout, "%d\t%d\t%f\n", relation_total, dimension_c, dropout_probability);
	for (INT i_r = 0; i_r < relation_total; i_r++) {
		for (INT i_s = 0; i_s < dimension_c; i_s++)
			fprintf(fout, "%f\t", relation_matrix[i_r * dimension_c + i_s]);
		fprintf(fout, "\n");
	}
	for (INT i_r = 0; i_r < relation_total; i_r++) 
		fprintf(fout, "%f\t", relation_matrix_bias[i_r]);
	fprintf(fout, "\n");
	fclose(fout);

	printf("The model was saved successfully to: %s\n\n", output_path.c_str());
}

#endif

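The bag-level attention that test.h computes for every candidate relation (Eq. (8): e_i = x_i A r, Eq. (7): softmax over the sentences of the bag, Eq. (5): s = sum_i alpha_i x_i) is easier to follow outside the threading code. Below is a minimal standalone sketch, not the author's implementation. It assumes a diagonal A, which matches how the listing initializes `attention_weights` to the identity and only updates diagonal entries, and the names `bag_attention` and `bag_representation` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: attention weights alpha_i over the sentences of one bag.
//   x      - sentence vectors x_i (all of the same dimension)
//   a_diag - diagonal of the attention matrix A (assumption: A is diagonal)
//   r      - query vector of the relation
std::vector<double> bag_attention(const std::vector<std::vector<double> > &x,
	const std::vector<double> &a_diag, const std::vector<double> &r) {
	assert(!x.empty() && a_diag.size() == r.size());
	std::vector<double> e(x.size(), 0.0);
	for (std::size_t i = 0; i < x.size(); i++) {
		assert(x[i].size() == r.size());
		for (std::size_t d = 0; d < r.size(); d++)
			e[i] += x[i][d] * a_diag[d] * r[d];  // e_i = x_i A r
	}
	// Softmax over the bag. Subtracting the maximum score is a numerical
	// stability tweak that the listing does not use (it exponentiates raw scores).
	double max_e = *std::max_element(e.begin(), e.end());
	std::vector<double> alpha(x.size());
	double total = 0.0;
	for (std::size_t i = 0; i < x.size(); i++) {
		alpha[i] = std::exp(e[i] - max_e);
		total += alpha[i];
	}
	for (std::size_t i = 0; i < x.size(); i++)
		alpha[i] /= total;
	return alpha;
}

// Hypothetical helper: bag representation s = sum_i alpha_i * x_i.
std::vector<double> bag_representation(const std::vector<std::vector<double> > &x,
	const std::vector<double> &alpha) {
	assert(!x.empty() && x.size() == alpha.size());
	std::vector<double> s(x[0].size(), 0.0);
	for (std::size_t i = 0; i < x.size(); i++)
		for (std::size_t d = 0; d < s.size(); d++)
			s[d] += alpha[i] * x[i][d];
	return s;
}
```

With a zero query vector every sentence receives the same weight, so the bag representation is the plain average of the sentence vectors; a query aligned with one sentence shifts the weight toward that sentence, which is how noisy instances get down-weighted.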

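The precision/recall loop in `test()` can also be looked at on its own: predictions are sorted by probability in descending order, and after each prediction the running precision and recall are recorded. The sketch below is an illustration under stated assumptions rather than the code of test.h: `PrPoint` and `pr_curve` are hypothetical names, predictions are reduced to (probability, is_correct) pairs, and the curve is not truncated to the top 2000 entries as in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type: one point of the precision/recall curve.
struct PrPoint {
	double precision;
	double recall;
};

// Hypothetical helper: num_positive plays the role of num_test_non_NA,
// i.e. the number of gold non-NA facts in the test set.
std::vector<PrPoint> pr_curve(std::vector<std::pair<double, bool> > predictions, int num_positive) {
	assert(num_positive > 0);
	// Most confident predictions first
	std::sort(predictions.begin(), predictions.end(),
		[](const std::pair<double, bool> &a, const std::pair<double, bool> &b) {
			return a.first > b.first;
		});
	std::vector<PrPoint> curve;
	int correct = 0;
	for (std::size_t i = 0; i < predictions.size(); i++) {
		if (predictions[i].second)
			correct++;
		PrPoint p;
		p.precision = double(correct) / double(i + 1);
		p.recall = double(correct) / double(num_positive);
		curve.push_back(p);
	}
	return curve;
}
```

For the predictions (0.9, correct), (0.8, wrong), (0.7, correct) and two gold facts, the curve is (precision 1, recall 0.5), (0.5, 0.5), (2/3, 1).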

// train.cpp
// Usage:
//     Compile:
//           $ g++ train.cpp -o ./build/train -pthread -O3 -march=native
//     Run:
//           $ ./build/train
// created by LuYF-Lemon-love <luyanfeng_nlp@qq.com>
// This C++ file is used for model training

// ##################################################
// Include standard libraries and headers
// ##################################################

#include "init.h"
#include "test.h"

// bags_train_key: keys of bags_train (head entity + "\t" + tail entity + "\t" + relation name), in the iteration order of bags_train
// total_loss: total loss of the current epoch
// current_alpha: learning rate of the current epoch
// current_sample, final_sample: because training is multi-threaded, these two variables decide whether the current
//     batch is finished, after which the copies of the weight matrices (e.g. word_vec_copy) are refreshed
// train_mutex: mutex that synchronizes access to current_sample across threads
// len = bags_train.size()
// nbatches  =  len / (batch * num_threads)
std::vector<std::string> bags_train_key;
double total_loss = 0;
REAL current_alpha;
double current_sample = 0, final_sample = 0;
pthread_mutex_t train_mutex;
INT len;
INT nbatches;

struct timeval train_start, train_end;

// Compute the 1-D convolution of a sentence
// max_pool_window_k records, for each filter, the window start that produced the maximum
std::vector<REAL> calc_conv_1d(INT *sentence, INT *train_position_head,
	INT *train_position_tail, INT sentence_length, std::vector<INT> &max_pool_window_k) {
	std::vector<REAL> conv_1d_result_k;
	conv_1d_result_k.resize(dimension_c, 0);

	for (INT i = 0; i < dimension_c; i++) {
		INT last_word = i * window * dimension;
		INT last_pos = i * window * dimension_pos;
		REAL max_pool_1d = -FLT_MAX;
		for (INT last_window = 0; last_window <= sentence_length - window; last_window++) {
			REAL sum = 0;
			INT total_word = 0;
			INT total_pos = 0;
			for (INT j = last_window; j < last_window + window; j++)  {
				INT last_word_vec = sentence[j] * dimension;
				for (INT k = 0; k < dimension; k++) {
					sum += conv_1d_word_copy[last_word + total_word] * word_vec_copy[last_word_vec + k];
					total_word++;
				}
				INT last_pos_head = train_position_head[j] * dimension_pos;
				INT last_pos_tail = train_position_tail[j] * dimension_pos;
				for (INT k = 0; k < dimension_pos; k++) {
					sum += conv_1d_position_head_copy[last_pos + total_pos] * position_vec_head_copy[last_pos_head + k];
					sum += conv_1d_position_tail_copy[last_pos + total_pos] * position_vec_tail_copy[last_pos_tail + k];
					total_pos++;
				}
			}

			// Corresponds to Eq. (3) in the paper, [x]_i = max(p_i), where x \in R^{d^c}
			if (sum > max_pool_1d) {
				max_pool_1d = sum;
				max_pool_window_k[i] = last_window;
			}
		}
		conv_1d_result_k[i] = max_pool_1d + conv_1d_bias_copy[i];
	}

	for (INT i = 0; i < dimension_c; i++) {
		conv_1d_result_k[i] = calc_tanh(conv_1d_result_k[i]);
	}
	return conv_1d_result_k;
}

// Update the 1-D convolution weight matrices, the position embeddings and the word embeddings from the gradient
void gradient_conv_1d(INT *sentence, INT *train_position_head, INT *train_position_tail,
	std::vector<REAL> &conv_1d_result_k, std::vector<INT> &max_pool_window_k, std::vector<REAL> &grad_x_k)
{
	for (INT i = 0; i < dimension_c; i++) {
		// Skip filters whose gradient is numerically zero
		if (fabs(grad_x_k[i]) < 1e-8)
			continue;
		INT last_word = i * window * dimension;
		INT last_pos = i * window * dimension_pos;
		INT total_word = 0;
		INT total_pos = 0;
		// (tanh x)^{'} = sech^2x = \frac{1}{cosh^2x} = 1 - tanh^2x
		REAL grad_word_pos = grad_x_k[i] * (1 -  conv_1d_result_k[i] * conv_1d_result_k[i]);
		// Only the window selected by max pooling receives a gradient
		for (INT j = 0; j < window; j++)  {
			INT last_word_vec = sentence[max_pool_window_k[i] + j] * dimension;
			for (INT k = 0; k < dimension; k++) {
				conv_1d_word[last_word + total_word] -= grad_word_pos * word_vec_copy[last_word_vec + k];
				word_vec[last_word_vec + k] -= grad_word_pos * conv_1d_word_copy[last_word + total_word];
				total_word++;
			}
			INT last_pos_head = train_position_head[max_pool_window_k[i] + j] * dimension_pos;
			INT last_pos_tail = train_position_tail[max_pool_window_k[i] + j] * dimension_pos;
			for (INT k = 0; k < dimension_pos; k++) {
				conv_1d_position_head[last_pos + total_pos] -= grad_word_pos * position_vec_head_copy[last_pos_head + k];
				conv_1d_position_tail[last_pos + total_pos] -= grad_word_pos * position_vec_tail_copy[last_pos_tail + k];
				position_vec_head[last_pos_head + k] -= grad_word_pos * conv_1d_position_head_copy[last_pos + total_pos];
				position_vec_tail[last_pos_tail + k] -= grad_word_pos * conv_1d_position_tail_copy[last_pos + total_pos];
				total_pos++;
			}
		}
		conv_1d_bias[i] -= grad_word_pos;
	}
}

// Train on one sample (bag)
REAL train_bags(std::string bags_name)
{
	// ##################################################
	// Forward pass
	// ##################################################

	// 1-D convolution
	// relation: gold label (relation) of this training sample
	INT relation = -1;
	INT bags_size = bags_train[bags_name].size();
	std::vector<std::vector<INT> > max_pool_window;
	max_pool_window.resize(bags_size);
	std::vector<std::vector<REAL> > conv_1d_result;
	for (INT k = 0; k < bags_size; k++)
	{
		max_pool_window[k].resize(dimension_c);
		INT pos = bags_train[bags_name][k];
		if (relation == -1)
			relation = train_relation_list[pos];
		else
			assert(relation == train_relation_list[pos]);
		conv_1d_result.push_back(calc_conv_1d(train_sentence_list[pos], train_position_head[pos],
			train_position_tail[pos], train_length[pos], max_pool_window[k]));
	}

	// Attention weight of each sentence
	std::vector<REAL> weight;
	REAL weight_sum = 0;
	for (INT k = 0; k < bags_size; k++)
	{
		REAL s = 0;
		for (INT i_r = 0; i_r < dimension_c; i_r++) 
		{
			REAL temp = 0;
			for (INT i_x = 0; i_x < dimension_c; i_x++)
				temp += conv_1d_result[k][i_x] * attention_weights_copy[relation][i_x][i_r];
			s += temp * relation_matrix_copy[relation * dimension_c + i_r];
		}
		s = exp(s); 
		weight.push_back(s);
		weight_sum += s;
	}

	for (INT k = 0; k < bags_size; k++)
		weight[k] /= weight_sum;

	// Compute s, the representation of the sentence set
	std::vector<REAL> result_sentence;
	result_sentence.resize(dimension_c, 0.0);
	for (INT i = 0; i < dimension_c; i++) 
		for (INT k = 0; k < bags_size; k++)
			result_sentence[i] += conv_1d_result[k][i] * weight[k];

	// Compute the (unnormalized) probability of every relation
	std::vector<REAL> result_final;
	std::vector<INT> dropout;
	for (INT i_s = 0; i_s < dimension_c; i_s++)
		dropout.push_back((REAL)(rand()) / (RAND_MAX + 1.0) < dropout_probability);

	REAL sum = 0;
	for (INT i_r = 0; i_r < relation_total; i_r++) {
		REAL s = 0;
		for (INT i_s = 0; i_s < dimension_c; i_s++) {
			s += dropout[i_s] * result_sentence[i_s] * relation_matrix_copy[i_r * dimension_c + i_s];
		}
		s += relation_matrix_bias_copy[i_r];
		s = exp(s);
		result_final.push_back(s);
		sum += s;
	}

	// Compute the loss (negative log-likelihood of the gold relation)
	double loss = -(log(result_final[relation]) - log(sum));

	// ##################################################
	// Backward pass
	// ##################################################
	// Update relation_matrix, corresponding to Eq. (12) in the paper, o = M(s \circ h) + d
	std::vector<REAL> grad_s;
	grad_s.resize(dimension_c, 0.0);

	for (INT i_r = 0; i_r < relation_total; i_r++)
	{
		// The loss is cross-entropy and a negative label is 0, so the gradient with respect to
		// the score of a negative label (relation) is the predicted probability, result_final[i_r] / sum.
		// Writing it this way avoids an explicit softmax layer.
		REAL grad_final = result_final[i_r] / sum * current_alpha;
		// The positive label is 1, so the gradient with respect to the score of the positive label
		// (relation) is the predicted probability minus 1, result_final[i_r] / sum - 1.
		if (i_r == relation)
			grad_final -= current_alpha;

		for (INT i_s = 0; i_s < dimension_c; i_s++) 
		{
			REAL grad_i_s = 0;
			if (dropout[i_s] != 0)
			{
				grad_i_s += grad_final * relation_matrix_copy[i_r * dimension_c + i_s];
				relation_matrix[i_r * dimension_c + i_s] -= grad_final * result_sentence[i_s];
			}
			grad_s[i_s] += grad_i_s;
		}
		relation_matrix_bias[i_r] -= grad_final;
	}

	// Update the attention weight matrix and relation_matrix, corresponding to Eqs. (5), (7), (8) in the paper
	std::vector<std::vector<REAL> > grad_x;
	grad_x.resize(bags_size);

	for (INT k = 0; k < bags_size; k++)
		grad_x[k].resize(dimension_c, 0.0);

	for (INT i_r = 0; i_r < dimension_c; i_r++) 
	{
		REAL grad_i_s = grad_s[i_r];
		double a_denominator_sum_exp = 0;

		for (INT k = 0; k < bags_size; k++)
		{
			// grad_i_s * weight[k] corresponds to Eq. (5) in the paper
			grad_x[k][i_r] += grad_i_s * weight[k];
			for (INT i_x = 0; i_x < dimension_c; i_x++)
			{
				// Gradient through the numerator exp(e_i) of Eq. (7), with respect to x_i in Eq. (8)
				grad_x[k][i_x] += grad_i_s * conv_1d_result[k][i_r] * weight[k] *
					relation_matrix_copy[relation * dimension_c + i_r] *
					attention_weights_copy[relation][i_x][i_r];

				// Gradient through the numerator exp(e_i) of Eq. (7), with respect to r in Eq. (8)
				relation_matrix[relation * dimension_c + i_r] -= grad_i_s *
					conv_1d_result[k][i_r] * weight[k] * conv_1d_result[k][i_x] *
					attention_weights_copy[relation][i_x][i_r];
				// Gradient through the numerator exp(e_i) of Eq. (7), with respect to A in Eq. (8)
				// (only the diagonal of A is updated)
				if (i_r == i_x)
					attention_weights[relation][i_x][i_r] -= grad_i_s * conv_1d_result[k][i_r] *
						weight[k] * conv_1d_result[k][i_x] *
						relation_matrix_copy[relation * dimension_c + i_r];
			}

			// The derivative of 1/x is -1/x^2 and the derivative of exp(x) is exp(x),
			// so differentiating the denominator of Eq. (7) needs a sum over (exp(x_1), exp(x_2), ...)
			// and one extra factor of weight[k]
			a_denominator_sum_exp += conv_1d_result[k][i_r] * weight[k];
		}
		for (INT k = 0; k < bags_size; k++)
		{
			for (INT i_x = 0; i_x < dimension_c; i_x++)
			{
				// Gradient through the denominator of Eq. (7), with respect to x_i in Eq. (8)
				grad_x[k][i_x]-= grad_i_s * a_denominator_sum_exp * weight[k] *
					relation_matrix_copy[relation * dimension_c + i_r] *
					attention_weights_copy[relation][i_x][i_r];
				// Gradient through the denominator of Eq. (7), with respect to r in Eq. (8)
				relation_matrix[relation * dimension_c + i_r] += grad_i_s *
					a_denominator_sum_exp * weight[k] * conv_1d_result[k][i_x] *
					attention_weights_copy[relation][i_x][i_r];
				// Gradient through the denominator of Eq. (7), with respect to A in Eq. (8)
				if (i_r == i_x)
					attention_weights[relation][i_x][i_r] += grad_i_s * a_denominator_sum_exp *
						weight[k] * conv_1d_result[k][i_x] *
						relation_matrix_copy[relation * dimension_c + i_r];
			}
		}
	}

	// Update the 1-D convolution weight matrices, the position embeddings and the word embeddings
	for (INT k = 0; k < bags_size; k++)
	{
		INT pos = bags_train[bags_name][k];
		gradient_conv_1d(train_sentence_list[pos], train_position_head[pos], train_position_tail[pos],
			conv_1d_result[k], max_pool_window[k], grad_x[k]);
	}
	return loss;
}

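// Task executed by a single worker thread
void* train_mode(void *id) {
	while (true)
	{
		// Claim one sample of the current batch; stop once the batch is exhausted
		pthread_mutex_lock (&train_mutex);
		if (current_sample >= final_sample)
		{
			pthread_mutex_unlock (&train_mutex);
			break;
		}
		current_sample += 1;
		pthread_mutex_unlock (&train_mutex);
		INT i = get_rand_i(0, len);
		// Note: total_loss is accumulated outside the mutex
		total_loss += train_bags(bags_train_key[i]);
	}
	return NULL;
}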
// Training function
void train() {

	len = bags_train.size();
	nbatches  =  len / (batch * num_threads);

	for (std::map<std::string, std::vector<INT> >:: iterator it = bags_train.begin();
		it != bags_train.end(); it++)
		bags_train_key.push_back(it->first);

	// Allocate memory for the weight matrices of the model
	position_vec_head = (REAL *)calloc(position_total_head * dimension_pos, sizeof(REAL));
	position_vec_tail = (REAL *)calloc(position_total_tail * dimension_pos, sizeof(REAL));
	conv_1d_word = (REAL*)calloc(dimension_c * window * dimension, sizeof(REAL));
	conv_1d_position_head = (REAL *)calloc(dimension_c * window * dimension_pos, sizeof(REAL));
	conv_1d_position_tail = (REAL *)calloc(dimension_c * window * dimension_pos, sizeof(REAL));
	conv_1d_bias = (REAL*)calloc(dimension_c, sizeof(REAL));
	// One dimension_c x dimension_c attention matrix per relation, initialized to the identity
	attention_weights.resize(relation_total);
	for (INT i = 0; i < relation_total; i++)
	{
		attention_weights[i].resize(dimension_c);
		for (INT j = 0; j < dimension_c; j++)
		{
			attention_weights[i][j].resize(dimension_c);
			attention_weights[i][j][j] = 1.00;
		}
	}
	relation_matrix = (REAL *)calloc(relation_total * dimension_c, sizeof(REAL));
	relation_matrix_bias = (REAL *)calloc(relation_total, sizeof(REAL));

	// Allocate memory for the copies of the weight matrices
	word_vec_copy = (REAL *)calloc(dimension * word_total, sizeof(REAL));
	position_vec_head_copy = (REAL *)calloc(position_total_head * dimension_pos, sizeof(REAL));
	position_vec_tail_copy = (REAL *)calloc(position_total_tail * dimension_pos, sizeof(REAL));
	conv_1d_word_copy =  (REAL*)calloc(dimension_c * window * dimension, sizeof(REAL));
	conv_1d_position_head_copy = (REAL *)calloc(dimension_c * window * dimension_pos, sizeof(REAL));
	conv_1d_position_tail_copy = (REAL *)calloc(dimension_c * window * dimension_pos, sizeof(REAL));
	conv_1d_bias_copy =  (REAL*)calloc(dimension_c, sizeof(REAL));
	attention_weights_copy = attention_weights;
	relation_matrix_copy = (REAL *)calloc(relation_total * dimension_c, sizeof(REAL));
	relation_matrix_bias_copy = (REAL *)calloc(relation_total, sizeof(REAL));

	// Initialize the weight matrices of the model (uniform in [-bound, bound))
	REAL relation_matrix_init = sqrt(6.0 / (relation_total + dimension_c));
	REAL conv_1d_position_vec_init = sqrt(6.0 / ((dimension + dimension_pos) * window));

	for (INT i = 0; i < position_total_head; i++) {
		for (INT j = 0; j < dimension_pos; j++) {
			position_vec_head[i * dimension_pos + j] = get_rand_u(-conv_1d_position_vec_init,
				conv_1d_position_vec_init);
		}
	}
	for (INT i = 0; i < position_total_tail; i++) {
		for (INT j = 0; j < dimension_pos; j++) {
			position_vec_tail[i * dimension_pos + j] = get_rand_u(-conv_1d_position_vec_init,
				conv_1d_position_vec_init);
		}
	}
	for (INT i = 0; i < dimension_c; i++) {
		INT last = i * window * dimension;
		for (INT j = 0; j < window * dimension; j++)
			conv_1d_word[last + j] = get_rand_u(-conv_1d_position_vec_init, conv_1d_position_vec_init);
		last = i * window * dimension_pos;
		for (INT j = dimension_pos * window - 1; j >=0; j--) {
			conv_1d_position_head[last + j] = get_rand_u(-conv_1d_position_vec_init, conv_1d_position_vec_init);
			conv_1d_position_tail[last + j] = get_rand_u(-conv_1d_position_vec_init, conv_1d_position_vec_init);
		}
		conv_1d_bias[i] = get_rand_u(-conv_1d_position_vec_init, conv_1d_position_vec_init);
	}
	for (INT i = 0; i < relation_total; i++) 
	{
		for (INT j = 0; j < dimension_c; j++)
			relation_matrix[i * dimension_c + j] = get_rand_u(-relation_matrix_init, relation_matrix_init);
		relation_matrix_bias[i] = get_rand_u(-relation_matrix_init, relation_matrix_init);
	}

	printf("##################################################\n\nTrain start...\n\n");

	for (INT epoch = 1; epoch <= epochs; epoch++) {

		// Learning rate of the current epoch
		current_alpha = alpha * current_rate;

		current_sample = 0;
		final_sample = 0;
		total_loss = 0;

		gettimeofday(&train_start, NULL);

		for (INT i = 1; i <= nbatches; i++) {
			final_sample += batch * num_threads;
			// Refresh the read-only copies used by the forward pass of this batch
			memcpy(word_vec_copy, word_vec, word_total * dimension * sizeof(REAL));
			memcpy(position_vec_head_copy, position_vec_head, position_total_head * dimension_pos * sizeof(REAL));
			memcpy(position_vec_tail_copy, position_vec_tail, position_total_tail * dimension_pos * sizeof(REAL));
			memcpy(conv_1d_word_copy, conv_1d_word, dimension_c * window * dimension * sizeof(REAL));
			memcpy(conv_1d_position_head_copy, conv_1d_position_head, dimension_c * window * dimension_pos * sizeof(REAL));
			memcpy(conv_1d_position_tail_copy, conv_1d_position_tail, dimension_c * window * dimension_pos * sizeof(REAL));
			memcpy(conv_1d_bias_copy, conv_1d_bias, dimension_c * sizeof(REAL));
			attention_weights_copy = attention_weights;
			memcpy(relation_matrix_copy, relation_matrix, relation_total * dimension_c * sizeof(REAL));
			memcpy(relation_matrix_bias_copy, relation_matrix_bias, relation_total * sizeof(REAL));
			pthread_t *pt = (pthread_t *)malloc(num_threads * sizeof(pthread_t));
			for (long a = 0; a < num_threads; a++)
				pthread_create(&pt[a], NULL, train_mode,  (void *)a);
			for (long a = 0; a < num_threads; a++)
				pthread_join(pt[a], NULL);
			free(pt);
		}

		gettimeofday(&train_end, NULL);
		long double time_use = (1000000 * (train_end.tv_sec - train_start.tv_sec)
			+ train_end.tv_usec - train_start.tv_usec) / 1000000.0;

		printf("Epoch %d/%d - current_alpha: %.8f - loss: %f - %02d:%02d:%02d\n\n", epoch, epochs,
			current_alpha, total_loss / final_sample, INT(time_use / 3600.0),
			INT(time_use) % 3600 / 60, INT(time_use) % 60);

		// Evaluate on the test set after each epoch
		test();
		printf("Test end.\n\n##################################################\n\n");

		// Decay the learning-rate multiplier
		current_rate = current_rate * reduce_epoch;
	}
	printf("Train end.\n\n##################################################\n\n");
}

// ##################################################
// Main function
// ##################################################

void setparameters(INT argc, char **argv) {
	INT i;
	if ((i = arg_pos((char *)"-batch", argc, argv)) > 0) batch = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-threads", argc, argv)) > 0) num_threads = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-alpha", argc, argv)) > 0) alpha = atof(argv[i + 1]);
	if ((i = arg_pos((char *)"-init_rate", argc, argv)) > 0) current_rate = atof(argv[i + 1]);
	if ((i = arg_pos((char *)"-reduce_epoch", argc, argv)) > 0) reduce_epoch = atof(argv[i + 1]);
	if ((i = arg_pos((char *)"-epochs", argc, argv)) > 0) epochs = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-limit", argc, argv)) > 0) limit = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-dimension_pos", argc, argv)) > 0) dimension_pos = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-window", argc, argv)) > 0) window = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-dimension_c", argc, argv)) > 0) dimension_c = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-dropout", argc, argv)) > 0) dropout_probability = atof(argv[i + 1]);
	if ((i = arg_pos((char *)"-output_model", argc, argv)) > 0) output_model = atoi(argv[i + 1]);	
	if ((i = arg_pos((char *)"-note", argc, argv)) > 0) note = argv[i + 1];
	if ((i = arg_pos((char *)"-data_path", argc, argv)) > 0) data_path = argv[i + 1];
	if ((i = arg_pos((char *)"-output_path", argc, argv)) > 0) output_path = argv[i + 1];
}

void print_train_help() {
	std::string str = R"(
// ##################################################
// ./train [-batch BATCH] [-threads THREAD] [-alpha ALPHA]
//         [-init_rate INIT_RATE] [-reduce_epoch REDUCE_EPOCH]
//         [-epochs EPOCHS] [-limit LIMIT] [-dimension_pos DIMENSION_POS]
//         [-window WINDOW] [-dimension_c DIMENSION_C]
//         [-dropout DROPOUT] [-output_model 0/1]
//         [-note NOTE] [-data_path DATA_PATH]
//         [-output_path OUTPUT_PATH] [--help]

// optional arguments:
// -batch BATCH                   batch size. if unspecified, batch will default to [40]
// -threads THREAD                number of worker threads. if unspecified, num_threads will default to [32]
// -alpha ALPHA                   learning rate. if unspecified, alpha will default to [0.00125]
// -init_rate INIT_RATE           init rate of learning rate. if unspecified, current_rate will default to [1.0]
// -reduce_epoch REDUCE_EPOCH     reduce of init rate of learning rate per epoch. if unspecified, reduce_epoch will default to [0.98]
// -epochs EPOCHS                 number of epochs. if unspecified, epochs will default to [25]
// -limit LIMIT                   maximum distance of the (head, tail) entities relative to each word in a sentence. if unspecified, limit will default to [30]
// -dimension_pos DIMENSION_POS   position embedding size. if unspecified, dimension_pos will default to [5]
// -window WINDOW                 window size of the 1-D convolution. if unspecified, window will default to [3]
// -dimension_c DIMENSION_C       sentence embedding size. if unspecified, dimension_c will default to [230]
// -dropout DROPOUT               dropout probability. if unspecified, dropout_probability will default to [0.5]
// -output_model 0/1              [1] save the model, [0] do not save the model. if unspecified, output_model will default to [1]
// -note NOTE                     information you want to add to the filename, like ("./output/word2vec" + note + ".txt"). if unspecified, note will default to ""
// -data_path DATA_PATH           folder of data. if unspecified, data_path will default to "../data/"
// -output_path OUTPUT_PATH       folder for output results (precision/recall curves) and models. if unspecified, output_path will default to "./output/"
// --help                         print help information of ./train
// ##################################################
)";

	printf("%s\n", str.c_str());
}

INT main(INT argc, char **argv) {
	for (INT a = 1; a < argc; a++) if (!strcmp((char *)"--help", argv[a])) {
		print_train_help();
		return 0;
	}
	output_model = 1;
	setparameters(argc, argv);
	return 0;
}
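
The training loop in train.cpp multiplies `current_rate` by `reduce_epoch` after every epoch and prints `current_alpha`. A minimal sketch of the resulting schedule follows, assuming `current_alpha = alpha * current_rate` (that assignment is not visible in this excerpt). The helper name `alpha_at_epoch` is hypothetical and is not part of the repository.

```cpp
#include <cmath>

// Hypothetical helper, not part of train.cpp: learning rate used in a given
// 1-based epoch, assuming current_alpha = alpha * current_rate and that
// current_rate is multiplied by reduce_epoch once after every epoch.
double alpha_at_epoch(double alpha, double init_rate, double reduce_epoch, int epoch) {
	return alpha * init_rate * std::pow(reduce_epoch, epoch - 1);
}
```

With the defaults (`-alpha 0.00125`, `-init_rate 1.0`, `-reduce_epoch 0.98`) this gives 0.00125 for epoch 1 and roughly 0.00077 for epoch 25.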


// test.cpp
// Usage:
//     Compile:
//           $ g++ test.cpp -o ./build/test -pthread -O3 -march=native
//     Run:
//           $ ./build/test
// created by LuYF-Lemon-love <luyanfeng_nlp@qq.com>
// This C++ file is used to test the model
// It loads a trained model from disk
// prerequisites:
//     ./output/word2vec + note + .txt
//     ./output/position_vec + note + .txt
//     ./output/conv_1d + note + .txt
//     ./output/attention_weights + note + .txt
//     ./output/relation_matrix + note + .txt

// ##################################################
// Include standard libraries and header files
// ##################################################

#include "init.h"
#include "test.h"

// Load the model
void load_model() {
	// Allocate memory for the model's weight matrices
	position_vec_head = (REAL *)calloc(position_total_head * dimension_pos, sizeof(REAL));
	position_vec_tail = (REAL *)calloc(position_total_tail * dimension_pos, sizeof(REAL));

	conv_1d_word = (REAL*)calloc(dimension_c * dimension * window, sizeof(REAL));
	conv_1d_position_head = (REAL *)calloc(dimension_c * dimension_pos * window, sizeof(REAL));
	conv_1d_position_tail = (REAL *)calloc(dimension_c * dimension_pos * window, sizeof(REAL));
	conv_1d_bias = (REAL*)calloc(dimension_c, sizeof(REAL));

	// Assumption: attention_weights is a nested std::vector (declared in init.h)
	// of shape relation_total x dimension_c x dimension_c
	attention_weights.resize(relation_total);
	for (INT i = 0; i < relation_total; i++) {
		attention_weights[i].resize(dimension_c);
		for (INT j = 0; j < dimension_c; j++)
			attention_weights[i][j].resize(dimension_c);
	}

	relation_matrix = (REAL *)calloc(relation_total * dimension_c, sizeof(REAL));
	relation_matrix_bias = (REAL *)calloc(relation_total, sizeof(REAL));
	INT tmp;

	// Load the word embeddings
	FILE *fout = fopen((output_path + "word2vec" + note + ".txt").c_str(), "r");
	tmp = fscanf(fout,"%d%d", &word_total, &dimension);
	for (INT i = 0; i < word_total; i++)
		for (INT j = 0; j < dimension; j++)
			tmp = fscanf(fout, "%f", &word_vec[i * dimension + j]);
	fclose(fout);

	// Load the position embeddings
	fout = fopen((output_path + "position_vec" + note + ".txt").c_str(), "r");
	tmp = fscanf(fout, "%d%d%d", &position_total_head, &position_total_tail, &dimension_pos);
	for (INT i = 0; i < position_total_head; i++) {
		for (INT j = 0; j < dimension_pos; j++)
			tmp = fscanf(fout, "%f", &position_vec_head[i * dimension_pos + j]);
	}
	for (INT i = 0; i < position_total_tail; i++) {
		for (INT j = 0; j < dimension_pos; j++)
			tmp = fscanf(fout, "%f", &position_vec_tail[i * dimension_pos + j]);
	}
	fclose(fout);

	// Load the 1-D convolution weight matrices and the corresponding bias vector
	fout = fopen((output_path + "conv_1d" + note + ".txt").c_str(), "r");
	tmp = fscanf(fout, "%d%d%d%d", &dimension_c, &window, &dimension, &dimension_pos);
	for (INT i = 0; i < dimension_c; i++) {
		for (INT j = 0; j < window * dimension; j++)
			tmp = fscanf(fout, "%f", &conv_1d_word[i * window * dimension + j]);
		for (INT j = 0; j < window * dimension_pos; j++)
			tmp = fscanf(fout, "%f", &conv_1d_position_head[i * window * dimension_pos + j]);
		for (INT j = 0; j < window * dimension_pos; j++)
			tmp = fscanf(fout, "%f", &conv_1d_position_tail[i * window * dimension_pos + j]);
		tmp = fscanf(fout, "%f", &conv_1d_bias[i]);
	}
	fclose(fout);

	// Load the attention weight matrices
	fout = fopen((output_path + "attention_weights" + note + ".txt").c_str(), "r");
	tmp = fscanf(fout,"%d%d", &relation_total, &dimension_c);
	for (INT r = 0; r < relation_total; r++) {
		for (INT i_x = 0; i_x < dimension_c; i_x++)
			for (INT i_r = 0; i_r < dimension_c; i_r++)
				tmp = fscanf(fout, "%f", &attention_weights[r][i_x][i_r]);
	}
	fclose(fout);

	// Load relation_matrix and the corresponding bias vector
	fout = fopen((output_path + "relation_matrix" + note + ".txt").c_str(), "r");
	tmp = fscanf(fout, "%d%d%f", &relation_total, &dimension_c, &dropout_probability);
	for (INT i_r = 0; i_r < relation_total; i_r++) {
		for (INT i_s = 0; i_s < dimension_c; i_s++)
			tmp = fscanf(fout, "%f", &relation_matrix[i_r * dimension_c + i_s]);
	}
	for (INT i_r = 0; i_r < relation_total; i_r++)
		tmp = fscanf(fout, "%f", &relation_matrix_bias[i_r]);
	fclose(fout);
}

// ##################################################
// Main function
// ##################################################

void setparameters(INT argc, char **argv) {
	INT i;
	if ((i = arg_pos((char *)"-threads", argc, argv)) > 0) num_threads = atoi(argv[i + 1]);
	if ((i = arg_pos((char *)"-note", argc, argv)) > 0) note = argv[i + 1];
	if ((i = arg_pos((char *)"-data_path", argc, argv)) > 0) data_path = argv[i + 1];
	if ((i = arg_pos((char *)"-load_path", argc, argv)) > 0) output_path = argv[i + 1];
}

void print_test_help() {
	std::string str = R"(
// ##################################################
// ./test [-threads THREAD] [-note NOTE]
//        [-data_path DATA_PATH] [-load_path LOAD_PATH]
//        [--help]

// optional arguments:
// -threads THREAD                number of worker threads. if unspecified, num_threads will default to [32]
// -note NOTE                     information you want to add to the filename, like ("./output/word2vec" + note + ".txt"). if unspecified, note will default to ""
// -data_path DATA_PATH           folder of data. if unspecified, data_path will default to "../data/"
// -load_path LOAD_PATH           folder of pretrained models. if unspecified, load_path will default to "./output/"
// --help                         print help information of ./test
// ##################################################
)";

	printf("%s\n", str.c_str());
}

INT main(INT argc, char **argv) {
	for (INT a = 1; a < argc; a++) if (!strcmp((char *)"--help", argv[a])) {
		print_test_help();
		return 0;
	}
	setparameters(argc, argv);
	printf("Test end.\n\n##################################################\n\n");
	return 0;
}
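
`load_model` stores every `fscanf` return value in `tmp` without inspecting it and does not check the result of `fopen`, so a missing or truncated model file goes unnoticed. The sketch below shows one way to read a file laid out like `word2vec + note + .txt` (two integers for the shape, then the values) with those checks in place. `read_matrix` is a hypothetical helper, not part of test.cpp, and it assumes the values are stored as `float`.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper, not part of test.cpp: read a text matrix whose first
// two integers are the row and column counts, followed by rows * cols floats.
// Returns false if the file cannot be opened or is truncated.
bool read_matrix(const char *path, int &rows, int &cols, std::vector<float> &data) {
	FILE *fin = std::fopen(path, "r");
	if (fin == nullptr) return false;
	bool ok = std::fscanf(fin, "%d%d", &rows, &cols) == 2 && rows >= 0 && cols >= 0;
	if (ok) {
		data.assign(static_cast<std::size_t>(rows) * cols, 0.0f);
		// Stop at the first value that fails to parse
		for (std::size_t k = 0; ok && k < data.size(); k++)
			ok = std::fscanf(fin, "%f", &data[k]) == 1;
	}
	std::fclose(fin);
	return ok;
}
```

A caller can then abort with a clear message when `read_matrix` returns `false` instead of continuing with partially initialized weights.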



# run.sh
# Usage: $ bash run.sh
# created by LuYF-Lemon-love <luyanfeng_nlp@qq.com>
# This shell script trains and tests the model

# Create the build and output directories
echo ""
echo "##################################################"
echo ""
mkdir -p build
mkdir -p output
echo "The ./build and ./output directories were created successfully."

# compile
g++ train.cpp -o ./build/train -pthread -O3 -march=native
g++ test.cpp -o ./build/test -pthread -O3 -march=native

# train
./build/train


# clean.sh
# Usage: $ bash clean.sh
# created by LuYF-Lemon-love <luyanfeng_nlp@qq.com>
# This shell script cleans up temporary files

# Delete the compiled binaries
echo ""
echo "##################################################"
echo ""
rm -rf ./build
echo "The ./build directory was recursively deleted."
echo ""
echo "##################################################"
echo ""


$ ls
clean.sh  init.h  run.sh  test.cpp  test.h  train.cpp
$ bash run.sh 


The ./build and ./output directories were created successfully.


Init start...


batch: 40
number of threads: 32
learning rate: 0.00125000
init_rate: 1.00
reduce_epoch: 0.98
epochs: 25

word_total: 114043
word dimension: 50

limit: 30
position_total_head: 61
position_total_tail: 61
dimension_pos: 5

window: 3
dimension_c: 230

relation_total: 53
dropout_probability: 0.50


folder of data: ../data/
folder of outputing results (precion/recall curves) and models: ./output/

number of training samples:  281270 - average sentence number of per training sample: 1.86
number of testing samples:    96678 - average sentence number of per testing sample:  1.78

Init end.


Train start...

Epoch 1/25 - current_alpha: 0.00125000 - loss: 0.392392 - 00:04:00


Test start...

Number of test samples for non NA relation: 1950

precion/recall curves   50 / 2000 - precision: 0.280 - recall: 0.007
precion/recall curves  100 / 2000 - precision: 0.210 - recall: 0.011
precion/recall curves  150 / 2000 - precision: 0.187 - recall: 0.014
precion/recall curves  200 / 2000 - precision: 0.185 - recall: 0.019
precion/recall curves  250 / 2000 - precision: 0.164 - recall: 0.021
precion/recall curves  300 / 2000 - precision: 0.147 - recall: 0.023
precion/recall curves  350 / 2000 - precision: 0.134 - recall: 0.024
precion/recall curves  400 / 2000 - precision: 0.125 - recall: 0.026
precion/recall curves  450 / 2000 - precision: 0.120 - recall: 0.028
precion/recall curves  500 / 2000 - precision: 0.122 - recall: 0.031
precion/recall curves  550 / 2000 - precision: 0.125 - recall: 0.035
precion/recall curves  600 / 2000 - precision: 0.118 - recall: 0.036
precion/recall curves  650 / 2000 - precision: 0.118 - recall: 0.039
precion/recall curves  700 / 2000 - precision: 0.116 - recall: 0.042
precion/recall curves  750 / 2000 - precision: 0.109 - recall: 0.042
precion/recall curves  800 / 2000 - precision: 0.109 - recall: 0.045
precion/recall curves  850 / 2000 - precision: 0.105 - recall: 0.046
precion/recall curves  900 / 2000 - precision: 0.104 - recall: 0.048
precion/recall curves  950 / 2000 - precision: 0.102 - recall: 0.050
precion/recall curves 1000 / 2000 - precision: 0.101 - recall: 0.052
precion/recall curves 1050 / 2000 - precision: 0.102 - recall: 0.055
precion/recall curves 1100 / 2000 - precision: 0.099 - recall: 0.056
precion/recall curves 1150 / 2000 - precision: 0.097 - recall: 0.057
precion/recall curves 1200 / 2000 - precision: 0.094 - recall: 0.058
precion/recall curves 1250 / 2000 - precision: 0.092 - recall: 0.059
precion/recall curves 1300 / 2000 - precision: 0.090 - recall: 0.060
precion/recall curves 1350 / 2000 - precision: 0.087 - recall: 0.061
precion/recall curves 1400 / 2000 - precision: 0.085 - recall: 0.061
precion/recall curves 1450 / 2000 - precision: 0.083 - recall: 0.062
precion/recall curves 1500 / 2000 - precision: 0.085 - recall: 0.065
precion/recall curves 1550 / 2000 - precision: 0.084 - recall: 0.067
precion/recall curves 1600 / 2000 - precision: 0.083 - recall: 0.068
precion/recall curves 1650 / 2000 - precision: 0.082 - recall: 0.069
precion/recall curves 1700 / 2000 - precision: 0.081 - recall: 0.071
precion/recall curves 1750 / 2000 - precision: 0.081 - recall: 0.073
precion/recall curves 1800 / 2000 - precision: 0.081 - recall: 0.075
precion/recall curves 1850 / 2000 - precision: 0.082 - recall: 0.078
precion/recall curves 1900 / 2000 - precision: 0.080 - recall: 0.078
precion/recall curves 1950 / 2000 - precision: 0.080 - recall: 0.080
precion/recall curves 2000 / 2000 - precision: 0.079 - recall: 0.081

test use time - 00:01:21

Model saved successfully. Save directory: ./output/

Test end.


Epoch 25/25 - current_alpha: 0.00076973 - loss: 0.124694 - 00:04:05


Test start...

Number of test samples for non NA relation: 1950

precion/recall curves   50 / 2000 - precision: 0.760 - recall: 0.019
precion/recall curves  100 / 2000 - precision: 0.730 - recall: 0.037
precion/recall curves  150 / 2000 - precision: 0.720 - recall: 0.055
precion/recall curves  200 / 2000 - precision: 0.715 - recall: 0.073
precion/recall curves  250 / 2000 - precision: 0.692 - recall: 0.089
precion/recall curves  300 / 2000 - precision: 0.657 - recall: 0.101
precion/recall curves  350 / 2000 - precision: 0.634 - recall: 0.114
precion/recall curves  400 / 2000 - precision: 0.623 - recall: 0.128
precion/recall curves  450 / 2000 - precision: 0.611 - recall: 0.141
precion/recall curves  500 / 2000 - precision: 0.590 - recall: 0.151
precion/recall curves  550 / 2000 - precision: 0.587 - recall: 0.166
precion/recall curves  600 / 2000 - precision: 0.577 - recall: 0.177
precion/recall curves  650 / 2000 - precision: 0.565 - recall: 0.188
precion/recall curves  700 / 2000 - precision: 0.551 - recall: 0.198
precion/recall curves  750 / 2000 - precision: 0.540 - recall: 0.208
precion/recall curves  800 / 2000 - precision: 0.533 - recall: 0.218
precion/recall curves  850 / 2000 - precision: 0.519 - recall: 0.226
precion/recall curves  900 / 2000 - precision: 0.520 - recall: 0.240
precion/recall curves  950 / 2000 - precision: 0.511 - recall: 0.249
precion/recall curves 1000 / 2000 - precision: 0.501 - recall: 0.257
precion/recall curves 1050 / 2000 - precision: 0.492 - recall: 0.265
precion/recall curves 1100 / 2000 - precision: 0.484 - recall: 0.273
precion/recall curves 1150 / 2000 - precision: 0.477 - recall: 0.282
precion/recall curves 1200 / 2000 - precision: 0.469 - recall: 0.289
precion/recall curves 1250 / 2000 - precision: 0.468 - recall: 0.300
precion/recall curves 1300 / 2000 - precision: 0.463 - recall: 0.309
precion/recall curves 1350 / 2000 - precision: 0.454 - recall: 0.314
precion/recall curves 1400 / 2000 - precision: 0.449 - recall: 0.323
precion/recall curves 1450 / 2000 - precision: 0.440 - recall: 0.327
precion/recall curves 1500 / 2000 - precision: 0.436 - recall: 0.335
precion/recall curves 1550 / 2000 - precision: 0.428 - recall: 0.341
precion/recall curves 1600 / 2000 - precision: 0.426 - recall: 0.350
precion/recall curves 1650 / 2000 - precision: 0.419 - recall: 0.354
precion/recall curves 1700 / 2000 - precision: 0.413 - recall: 0.360
precion/recall curves 1750 / 2000 - precision: 0.406 - recall: 0.365
precion/recall curves 1800 / 2000 - precision: 0.398 - recall: 0.368
precion/recall curves 1850 / 2000 - precision: 0.395 - recall: 0.374
precion/recall curves 1900 / 2000 - precision: 0.389 - recall: 0.379
precion/recall curves 1950 / 2000 - precision: 0.383 - recall: 0.383
precion/recall curves 2000 / 2000 - precision: 0.379 - recall: 0.389

test use time - 00:01:19

Model saved successfully. Save directory: ./output/

Test end.


Train end.


$ tree
├── build
│   ├── test
│   └── train
├── clean.sh
├── init.h
├── output
│   ├── attention_weights.txt
│   ├── conv_1d.txt
│   ├── position_vec.txt
│   ├── pr.txt
│   ├── relation_matrix.txt
│   └── word2vec.txt
├── run.sh
├── test.cpp
├── test.h
└── train.cpp

2 directories, 14 files
$ bash clean.sh 


The ./build directory was recursively deleted.


$ tree
├── clean.sh
├── init.h
├── output
│   ├── attention_weights.txt
│   ├── conv_1d.txt
│   ├── position_vec.txt
│   ├── pr.txt
│   ├── relation_matrix.txt
│   └── word2vec.txt
├── run.sh
├── test.cpp
├── test.h
└── train.cpp

1 directory, 12 files
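
The precision/recall lines in the log can be read as follows: at `50 / 2000` after epoch 25, precision 0.760 means 38 of the top 50 predictions are correct, and 38 / 1950 ≈ 0.019 is the reported recall, where 1950 is the number of non-NA test samples. A minimal sketch of that computation is given below. It assumes predictions are ranked by score and the top `k` are evaluated; the actual code in test.h is not shown here, and `PRPoint` and `precision_recall_at_k` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct PRPoint { double precision; double recall; };

// Hypothetical helper: `scored` holds (score, is_correct) for every candidate
// prediction; `total_positive` is the number of non-NA facts in the test set.
PRPoint precision_recall_at_k(std::vector<std::pair<float, bool>> scored,
                              std::size_t k, std::size_t total_positive) {
	// Rank predictions by score, highest first
	std::sort(scored.begin(), scored.end(),
	          [](const std::pair<float, bool> &a, const std::pair<float, bool> &b) {
		          return a.first > b.first;
	          });
	k = std::min(k, scored.size());
	std::size_t correct = 0;
	for (std::size_t i = 0; i < k; i++) correct += scored[i].second ? 1 : 0;
	PRPoint p;
	p.precision = k ? static_cast<double>(correct) / k : 0.0;
	p.recall = total_positive ? static_cast<double>(correct) / total_positive : 0.0;
	return p;
}
```

Because both values share the numerator, precision and recall coincide when `k` equals the number of positive samples, which matches the `1950 / 2000` line of the log (0.383 for both).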



// ./train [-batch BATCH] [-threads THREAD] [-alpha ALPHA]
//         [-init_rate INIT_RATE] [-reduce_epoch REDUCE_EPOCH]
//         [-epochs EPOCHS] [-limit LIMIT] [-dimension_pos DIMENSION_POS]
//         [-window WINDOW] [-dimension_c DIMENSION_C]
//         [-dropout DROPOUT] [-output_model 0/1]
//         [-note NOTE] [-data_path DATA_PATH]
//         [-output_path OUTPUT_PATH] [--help]

// optional arguments:
// -batch BATCH                   batch size. if unspecified, batch will default to [40]
// -threads THREAD                number of worker threads. if unspecified, num_threads will default to [32]
// -alpha ALPHA                   learning rate. if unspecified, alpha will default to [0.00125]
// -init_rate INIT_RATE           init rate of learning rate. if unspecified, current_rate will default to [1.0]
// -reduce_epoch REDUCE_EPOCH     reduce of init rate of learning rate per epoch. if unspecified, reduce_epoch will default to [0.98]
// -epochs EPOCHS                 number of epochs. if unspecified, epochs will default to [25]
// -limit LIMIT                   maximum distance of the (head, tail) entities relative to each word in a sentence. if unspecified, limit will default to [30]
// -dimension_pos DIMENSION_POS   position embedding size. if unspecified, dimension_pos will default to [5]
// -window WINDOW                 window size of the 1-D convolution. if unspecified, window will default to [3]
// -dimension_c DIMENSION_C       sentence embedding size. if unspecified, dimension_c will default to [230]
// -dropout DROPOUT               dropout probability. if unspecified, dropout_probability will default to [0.5]
// -output_model 0/1              [1] save the model, [0] do not save the model. if unspecified, output_model will default to [1]
// -note NOTE                     information you want to add to the filename, like ("./output/word2vec" + note + ".txt"). if unspecified, note will default to ""
// -data_path DATA_PATH           folder of data. if unspecified, data_path will default to "../data/"
// -output_path OUTPUT_PATH       folder for output results (precision/recall curves) and models. if unspecified, output_path will default to "./output/"
// --help                         print help information of ./train
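
The `-limit` option bounds the relative distance between a word and an entity. With the default of 30, the training log reports `position_total_head: 61`, which is consistent with 2 × 30 + 1 distinct offsets. The sketch below shows one common way to turn a signed offset into a row of the position embedding table. It is an assumption about the scheme, not a copy of init.h, and `position_index` is a hypothetical name.

```cpp
#include <algorithm>

// Hypothetical helper: map a signed word-to-entity offset to a row of the
// position embedding table, assuming offsets are clipped to [-limit, limit]
// and then shifted to be non-negative.
int position_index(int offset, int limit) {
	int clipped = std::max(-limit, std::min(limit, offset));
	return clipped + limit;  // range [0, 2 * limit]
}
```

Under this scheme every word farther than `limit` tokens from the entity shares the same embedding row, which keeps the table small regardless of sentence length.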


// ./test [-threads THREAD] [-note NOTE]
//        [-data_path DATA_PATH] [-load_path LOAD_PATH]
//        [--help]

// optional arguments:
// -threads THREAD                number of worker threads. if unspecified, num_threads will default to [32]
// -note NOTE                     information you want to add to the filename, like ("./output/word2vec" + note + ".txt"). if unspecified, note will default to ""
// -data_path DATA_PATH           folder of data. if unspecified, data_path will default to "../data/"
// -load_path LOAD_PATH           folder of pretrained models. if unspecified, load_path will default to "./output/"
// --help                         print help information of ./test
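
Both `setparameters` functions read `argv[i + 1]` whenever `arg_pos` returns a positive index, so they depend on `arg_pos` never reporting a flag that has no value after it. The real `arg_pos` is not shown in this part of the post. The sketch below is a hypothetical stand-in named `find_flag` that illustrates the contract these call sites need.

```cpp
#include <cstring>

// Hypothetical stand-in for arg_pos (the real one is not shown here):
// returns the index of `flag` in argv, or -1 if the flag is absent or is the
// last argument and therefore has no value following it.
int find_flag(const char *flag, int argc, char **argv) {
	for (int a = 1; a < argc; a++) {
		if (std::strcmp(flag, argv[a]) == 0)
			return (a + 1 < argc) ? a : -1;
	}
	return -1;
}
```

With a helper like this, `./build/train -batch 80` yields index 1 for `-batch`, so `argv[2]` is safe to read, while a trailing `-note` with nothing after it is ignored rather than read out of bounds.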




Author: LuYF-Lemon-love
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit LuYF-Lemon-love when reposting!