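Preface

OpenKE is a knowledge graph representation learning toolkit developed by the Natural Language Processing and Computational Social Science Lab at Tsinghua University (THUNLP).

OpenKE code repository: https://github.com/thunlp/OpenKE

Operating system: Ubuntu 20.04.5 LTS

References

  1. OpenKE: A Toolkit for Knowledge Graph Representation Learning

  2. OpenKE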

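Project Overview

OpenKE is an open-source framework developed by THUNLP on top of TensorFlow and PyTorch for embedding knowledge graphs into low-dimensional continuous vector spaces. It provides fast, stable interfaces and implements many classic knowledge representation learning models. The framework is easy to extend, so designing new knowledge representation models on top of it is straightforward. Specifically, OpenKE has the following features:

  1. Simple interface design, so models can be deployed easily across different training environments.

  2. Optimized low-level data processing, so models train quickly.

  3. Lightweight C++ model implementations that also run fast in multi-threaded CPU environments.

  4. Pretrained embeddings for large-scale knowledge graphs that can be used directly in downstream tasks.

  5. Long-term maintenance to fix issues and meet new requirements.

OpenKE toolkit: https://github.com/thunlp/OpenKE

THUNLP has also open-sourced KRLPapers, a must-read paper list for knowledge graph representation learning that covers the classic published papers and surveys in the field. It pairs well with OpenKE.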

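General Introduction

A knowledge graph is a multi-relational graph made up of entities (nodes) and relations (edges of different types). Each edge is represented as a triple (head entity, relation, tail entity), also called a fact, which states that two entities are connected by a specific relation, for example (Beijing, capital, China). Although triples are effective for representing structured data, their underlying symbolic nature usually makes knowledge graphs hard to manipulate. Knowledge graph representation learning was introduced to address this: it maps entities and relations into a continuous vector space, which simplifies manipulation while preserving the original structure of the graph. The resulting entity and relation embeddings can then be applied to various tasks, such as knowledge graph completion, relation extraction, entity classification, and entity resolution.

[Table: scoring functions and model forms of some typical knowledge graph embedding models]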
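
As a rough illustration of what a scoring function computes, here is a minimal pure-Python sketch of two well-known ones: TransE, which scores a triple by the negative distance between h + r and t, and DistMult, which sums the element-wise product of h, r and t. The function names and the toy 3-dimensional vectors are assumptions made for illustration; this is not OpenKE's own implementation.

```python
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float], p: int = 1) -> float:
    """TransE: negative L_p distance between (h + r) and t; higher means more plausible."""
    diffs = [abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t)]
    if p == 1:
        return -sum(diffs)
    return -(sum(d ** p for d in diffs) ** (1.0 / p))


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: sum over dimensions of h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy embeddings (made-up values, chosen so that beijing + capital is close to china).
beijing = [0.2, 0.1, 0.5]
capital = [0.3, 0.4, -0.1]
china = [0.5, 0.5, 0.4]
tokyo = [-0.9, 0.8, 0.0]

true_score = transe_score(beijing, capital, china)
false_score = transe_score(tokyo, capital, china)
print(true_score > false_score)
```

A model trained with such a function should give true triples higher scores than corrupted ones, which is what the toy comparison at the end checks.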

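Design and Examples

The overall design has three layers: low-level data processing, mid-level model construction, and high-level training and evaluation strategies. Each layer is well encapsulated so that it is convenient to call. Modules in the different layers can be invoked with a small amount of code, which supports both training and deploying knowledge graph representation learning models.

[Figure: the three-layer design of OpenKE with sample code]

The code in the figure above is a usage example from an older version of OpenKE.

The models reproduced with OpenKE achieve results that are largely consistent with the best results reported in the published papers. The corresponding hyper-parameters and training code are included in the OpenKE toolkit as usage examples.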

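Conclusion

The OpenKE toolkit will be maintained and updated over the long term. Everyone is welcome to use OpenKE as a tool for academic research and application development in knowledge graph representation learning. If you run into any problems or have comments or suggestions while using it, please raise them. You are also welcome to join us in developing and improving OpenKE.

Related Papers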

[1] A Three-Way Model for Collective Learning on Multi-Relational Data. Nickel et al. Proceedings of ICML 2011.

[2] Translating Embeddings for Modeling Multi-relational Data. Bordes et al. Proceedings of NIPS 2013.

[3] Knowledge Graph Embedding by Translating on Hyperplanes. Wang et al. Proceedings of AAAI 2014.

[4] Learning Entity and Relation Embeddings for Knowledge Graph Completion. Lin et al. Proceedings of AAAI 2015.

[5] Knowledge Graph Embedding via Dynamic Mapping Matrix. Ji et al. Proceedings of ACL 2015.

[6] Embedding Entities and Relations for Learning and Inference in Knowledge Bases. Yang et al. Proceedings of ICLR 2015.

[7] Holographic Embeddings of Knowledge Graphs. Nickel et al. Proceedings of AAAI 2016.

[8] Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. Ji et al. Proceedings of AAAI 2016.

[9] Complex Embeddings for Simple Link Prediction. Trouillon et al. Proceedings of ICML 2016.

[10] RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. Sun et al. Proceedings of ICLR 2019.

[11] SimplE Embedding for Link Prediction in Knowledge Graphs. Kazemi et al. Proceedings of NIPS 2018.

OpenKE-PyTorch

This repository is a subproject of THU-OpenSK.

An Open-source Framework for Knowledge Embedding implemented with PyTorch.


More information is available on our website http://openke.thunlp.org/

Note: This document is an old version.

If you use the code, please cite the following paper:

@inproceedings{han2018openke,
title={OpenKE: An Open Toolkit for Knowledge Embedding},
author={Han, Xu and Cao, Shulin and Lv, Xin and Lin, Yankai and Liu, Zhiyuan and Sun, Maosong and Li, Juanzi},
booktitle={Proceedings of EMNLP},
year={2018}
}

Overview

This is an efficient implementation based on PyTorch for knowledge representation learning (KRL). We use C++ to implement some underlying operations such as data preprocessing and negative sampling. Each specific model is implemented in PyTorch with Python interfaces, so that there is a convenient platform for running models on GPUs. OpenKE comprises 4 repositories:

OpenKE-PyTorch: the project based on PyTorch, which provides the optimized and stable framework for knowledge graph embedding models.

OpenKE-Tensorflow1.0: OpenKE implemented with TensorFlow, also providing the optimized and stable framework for knowledge graph embedding models.

TensorFlow-TransX: light and simple version of OpenKE based on TensorFlow, including TransE, TransH, TransR and TransD.

Fast-TransX: efficient lightweight C++ implementations of TransE and its extended models utilizing the framework of OpenKE, including TransH, TransR, TransD, TranSparse and PTransE.

We are now developing a new version of OpenKE-PyTorch. The project has been completely reconstructed and is now faster and more extensible, and the code is easier to read and use. If you need the old version, please refer to the branch OpenKE-PyTorch(old).

New Features

  • RotatE

  • More enhancing strategies (e.g., adversarial training)

  • More scripts of the typical models for the benchmark datasets.

  • More extendable interfaces

Models

  • RESCAL, DistMult, ComplEx, Analogy, TransE, TransH, TransR, TransD, SimplE, RotatE, HolE

Experimental Settings

For each test triplet, the head is removed and replaced by each entity in the entity set in turn. The models first compute the scores of these corrupted triplets, which are then sorted so that we can obtain the rank of the correct entity. The whole procedure is repeated with the tail entity removed instead. We report the proportion of correct entities ranked in the top 10/3/1 (Hits@10, Hits@3, Hits@1). The mean rank (MR) and mean reciprocal rank (MRR) of the test triplets under this setting are also reported.
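
The metrics in this procedure can be sketched in a few lines of plain Python. This is a minimal illustration and not OpenKE's evaluation code: the helper names are hypothetical, and it assumes that a higher score means a more plausible triplet (for distance-based models the comparison would be reversed).

```python
from typing import Dict, List, Sequence


def rank_of_correct(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct entity among all candidates.

    Only strictly better candidates count, so ties are resolved optimistically.
    """
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s > target)


def summarize(ranks: List[int], ks: Sequence[int] = (10, 3, 1)) -> Dict[str, float]:
    """Compute mean rank (MR), mean reciprocal rank (MRR) and Hits@k from a list of ranks."""
    n = len(ranks)
    metrics = {"MR": sum(ranks) / n, "MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Scores of four candidate entities for one corrupted position; index 2 is the correct one.
example_rank = rank_of_correct([0.1, 0.9, 0.7, 0.3], correct_index=2)

# Ranks collected over several test triplets (made-up values).
metrics = summarize([1, 2, 4, 20])
print(example_rank, metrics)
```

Because MR is dominated by a few badly ranked triplets while MRR and Hits@k are not, the three kinds of metric are usually reported together.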

Some corrupted triplets may themselves appear in the training or validation set. Such triplets may be ranked above the test triplet, but this should not be counted as an error because both triplets are true. Hence, we remove the corrupted triplets that appear in the training, validation or test set, which ensures that no corrupted triplet is in the dataset. We report the proportion of correct entities ranked in the top 10/3/1 (Hits@10 (filter), Hits@3 (filter), Hits@1 (filter)) under this setting. The mean rank (MR (filter)) and mean reciprocal rank (MRR (filter)) of the test triplets under this setting are also reported.
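
The filtered setting can be sketched as follows for the tail-replacement case. The function name, the dictionary of candidate scores and the toy ids are hypothetical, and the sketch again assumes that a higher score is better; it is meant to show the filtering idea, not OpenKE's C++ implementation.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[int, int, int]  # (head, tail, relation) ids, in the (e1, e2, rel) order


def filtered_tail_rank(test: Triple, scores: Dict[int, float], known: Set[Triple]) -> int:
    """Rank the true tail among candidate tails, skipping corruptions that are known facts."""
    head, true_tail, rel = test
    target = scores[true_tail]
    rank = 1
    for candidate, score in scores.items():
        if candidate == true_tail:
            continue
        if (head, candidate, rel) in known:
            continue  # another true triplet: it must not count as an error
        if score > target:
            rank += 1
    return rank


test_triple = (0, 3, 0)
tail_scores = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.1}  # score of (0, candidate, 0) per candidate tail
known_triples = {(0, 1, 0), (0, 3, 0)}  # union of train, valid and test triplets

raw_rank = filtered_tail_rank(test_triple, tail_scores, set())
filter_rank = filtered_tail_rank(test_triple, tail_scores, known_triples)
print(raw_rank, filter_rank)
```

In this toy case the candidate tail 1 outranks the true tail, but (0, 1, 0) is itself a known fact, so it is skipped and the rank improves from 3 to 2.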

More details of the above-mentioned settings can be found in the TransE and ComplEx papers.

For large-scale entity sets, corrupting every test triplet with the whole entity set is time-consuming. Hence, we also provide an experimental setting named “type constraint”, which corrupts entities using limited entity sets determined by their relations.
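
A minimal sketch of the type-constraint idea, under the assumption that the allowed tail entities per relation are available as a dictionary (the names `candidate_tails` and `tail_types` are hypothetical, and falling back to all entities when a relation has no entry is a choice made for this sketch):

```python
from typing import Dict, Iterable, List, Set


def candidate_tails(rel: int, all_entities: Iterable[int], tail_types: Dict[int, Set[int]]) -> List[int]:
    """Return the tail candidates used to corrupt a triplet with relation rel."""
    allowed = tail_types.get(rel)
    if allowed is None:
        # No constraint recorded for this relation: use the whole entity set.
        return sorted(all_entities)
    return sorted(e for e in all_entities if e in allowed)


tail_types = {7: {2, 5}}  # relation 7 only ever takes entities 2 and 5 as tails
constrained = candidate_tails(7, range(6), tail_types)
unconstrained = candidate_tails(8, range(3), tail_types)
print(constrained, unconstrained)
```

With the constraint, relation 7 is evaluated against 2 candidates instead of 6, which is where the time saving on large entity sets comes from.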

Experiments

We have provided the hyper-parameters of some models to achieve state-of-the-art performance (Hits@10 (filter)) on FB15K237 and WN18RR. These scripts can be found in the folder "./examples/". Up to now, these models include TransE, TransH, TransR, TransD, DistMult, ComplEx. The results of these models are as follows:

Model          WN18RR  FB15K237  WN18RR (Paper*)  FB15K237 (Paper*)
TransE         0.512   0.476     0.501            0.486
TransH         0.507   0.490     -                -
TransR         0.519   0.511     -                -
TransD         0.508   0.487     -                -
DistMult       0.479   0.419     0.49             0.419
ComplEx        0.485   0.426     0.51             0.428
ConvE          0.506   0.485     0.52             0.501
RotatE         0.549   0.479     -                0.480
RotatE (+adv)  0.565   0.522     0.571            0.533

We are still trying more hyper-parameters and more training strategies (e.g., adversarial training and label smoothing regularization) for these models. Hence, this table is still subject to change. We welcome everyone to help us update this table and the hyper-parameters.

Installation

  1. Install PyTorch

  2. Clone the OpenKE-PyTorch branch:

       git clone -b OpenKE-PyTorch https://github.com/thunlp/OpenKE --depth 1
       cd OpenKE
       cd openke

  3. Compile C++ files:

       bash make.sh

  4. Quick Start:

       cd ../
       cp examples/train_transe_FB15K237.py ./
       python train_transe_FB15K237.py

Data

  • For training, datasets contain three files:

    • train2id.txt: training file, the first line is the number of triples for training. Then the following lines are all in the format (e1, e2, rel), which indicates there is a relation rel between e1 and e2. Note that train2id.txt contains ids from entity2id.txt and relation2id.txt instead of the names of the entities and relations. If you use your own datasets, please check the format of your training file. Files in the wrong format may cause a segmentation fault.

    • entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

    • relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.

  • For testing, datasets contain the following additional files:

    • test2id.txt: testing file, the first line is the number of triples for testing. Then the following lines are all in the format (e1, e2, rel).

    • valid2id.txt: validating file, the first line is the number of triples for validating. Then the following lines are all in the format (e1, e2, rel) .

    • type_constrain.txt: type constraining file, the first line is the number of relations. Then the following lines are type constraints for each relation. For example, the relation with id 1200 has 4 types of head entities, which are 3123, 1034, 58 and 5733. The relation with id 1200 has 4 types of tail entities, which are 12123, 4388, 11087 and 11088. You can get this file through n-n.py in folder benchmarks/FB15K.
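
Since a malformed file may crash the C++ loader, it can help to check a custom dataset before training. The following is a minimal sketch of such a check for train2id.txt-style content. The function name is hypothetical, and the sketch assumes whitespace-separated ids that are zero-based and smaller than the counts given in entity2id.txt and relation2id.txt.

```python
from typing import List, Tuple


def parse_triple_file(text: str, num_entities: int, num_relations: int) -> List[Tuple[int, int, int]]:
    """Parse train2id.txt-style content and check it for basic consistency."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("file is empty")
    declared = int(lines[0])  # first line: number of triples
    triples = []
    for lineno, line in enumerate(lines[1:], start=2):
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 ids, got {len(parts)}")
        e1, e2, rel = (int(p) for p in parts)  # order is (e1, e2, rel)
        if not (0 <= e1 < num_entities and 0 <= e2 < num_entities):
            raise ValueError(f"line {lineno}: entity id out of range")
        if not 0 <= rel < num_relations:
            raise ValueError(f"line {lineno}: relation id out of range")
        triples.append((e1, e2, rel))
    if declared != len(triples):
        raise ValueError(f"header says {declared} triples, found {len(triples)}")
    return triples


sample = "2\n0 1 0\n1 2 1\n"
parsed = parse_triple_file(sample, num_entities=3, num_relations=2)
print(parsed)
```

To check a real file, read it with `open(path).read()` and pass in the counts taken from the first lines of entity2id.txt and relation2id.txt.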

To do

The document of the new version of OpenKE-PyTorch will come soon.

examples

train_analogy_WN18RR.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_analogy_WN18RR.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import Analogy
from openke.module.loss import SoftplusLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
analogy = Analogy(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200
)

# define the loss function
model = NegativeSampling(
model = analogy,
loss = SoftplusLoss(),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 1.0
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 2000, alpha = 0.5, use_gpu = True, opt_method = "adagrad")
trainer.run()
analogy.save_checkpoint('./checkpoint/analogy.ckpt')

# test the model
analogy.load_checkpoint('./checkpoint/analogy.ckpt')
tester = Tester(model = analogy, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_complex_WN18RR.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_complex_WN18RR.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import ComplEx
from openke.module.loss import SoftplusLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
complEx = ComplEx(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200
)

# define the loss function
model = NegativeSampling(
model = complEx,
loss = SoftplusLoss(),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 1.0
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 2000, alpha = 0.5, use_gpu = True, opt_method = "adagrad")
trainer.run()
complEx.save_checkpoint('./checkpoint/complEx.ckpt')

# test the model
complEx.load_checkpoint('./checkpoint/complEx.ckpt')
tester = Tester(model = complEx, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_distmult_WN18RR.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_distmult_WN18RR.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import DistMult
from openke.module.loss import SoftplusLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
distmult = DistMult(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200
)

# define the loss function
model = NegativeSampling(
model = distmult,
loss = SoftplusLoss(),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 1.0
)


# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 2000, alpha = 0.5, use_gpu = True, opt_method = "adagrad")
trainer.run()
distmult.save_checkpoint('./checkpoint/distmult.ckpt')

# test the model
distmult.load_checkpoint('./checkpoint/distmult.ckpt')
tester = Tester(model = distmult, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_distmult_WN18RR_adv.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_distmult_WN18RR_adv.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import DistMult
from openke.module.loss import SigmoidLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
batch_size = 2000,
threads = 8,
sampling_mode = "cross",
bern_flag = 0,
filter_flag = 1,
neg_ent = 64,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
distmult = DistMult(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 1024,
margin = 200.0,
epsilon = 2.0
)

# define the loss function
model = NegativeSampling(
model = distmult,
loss = SigmoidLoss(adv_temperature = 0.5),
batch_size = train_dataloader.get_batch_size(),
l3_regul_rate = 0.000005
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 400, alpha = 0.002, use_gpu = True, opt_method = "adam")
trainer.run()
distmult.save_checkpoint('./checkpoint/distmult.ckpt')

# test the model
distmult.load_checkpoint('./checkpoint/distmult.ckpt')
tester = Tester(model = distmult, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_hole_WN18RR.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_hole_WN18RR.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import HolE
from openke.module.loss import SoftplusLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
hole = HolE(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 100
)

# define the loss function
model = NegativeSampling(
model = hole,
loss = SoftplusLoss(),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 1.0
)


# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 1000, alpha = 0.5, use_gpu = True, opt_method = "adagrad")
trainer.run()
hole.save_checkpoint('./checkpoint/hole.ckpt')

# test the model
hole.load_checkpoint('./checkpoint/hole.ckpt')
tester = Tester(model = hole, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_rescal_FB15K237.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_rescal_FB15K237.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import RESCAL
from openke.module.loss import MarginLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/FB15K237/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/FB15K237/", "link")

# define the model
rescal = RESCAL(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 50
)

# define the loss function
model = NegativeSampling(
model = rescal,
loss = MarginLoss(margin = 1.0),
batch_size = train_dataloader.get_batch_size(),
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 1000, alpha = 0.1, use_gpu = True, opt_method = "adagrad")
trainer.run()
rescal.save_checkpoint('./checkpoint/rescal.ckpt')

# test the model
rescal.load_checkpoint('./checkpoint/rescal.ckpt')
tester = Tester(model = rescal, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_rotate_WN18RR_adv.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_rotate_WN18RR_adv.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import RotatE
from openke.module.loss import SigmoidLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
batch_size = 2000,
threads = 8,
sampling_mode = "cross",
bern_flag = 0,
filter_flag = 1,
neg_ent = 64,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
rotate = RotatE(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 1024,
margin = 6.0,
epsilon = 2.0,
)

# define the loss function
model = NegativeSampling(
model = rotate,
loss = SigmoidLoss(adv_temperature = 2),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 0.0
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 6000, alpha = 2e-5, use_gpu = True, opt_method = "adam")
trainer.run()
rotate.save_checkpoint('./checkpoint/rotate.ckpt')

# test the model
rotate.load_checkpoint('./checkpoint/rotate.ckpt')
tester = Tester(model = rotate, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_simple_WN18RR.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_simple_WN18RR.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import SimplE
from openke.module.loss import SoftplusLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
simple = SimplE(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200
)

# define the loss function
model = NegativeSampling(
model = simple,
loss = SoftplusLoss(),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 1.0
)


# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 2000, alpha = 0.5, use_gpu = True, opt_method = "adagrad")
trainer.run()
simple.save_checkpoint('./checkpoint/simple.ckpt')

# test the model
simple.load_checkpoint('./checkpoint/simple.ckpt')
tester = Tester(model = simple, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_transd_FB15K237.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_transd_FB15K237.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import TransD
from openke.module.loss import MarginLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/FB15K237/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/FB15K237/", "link")

# define the model
transd = TransD(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim_e = 200,
dim_r = 200,
p_norm = 1,
norm_flag = True)


# define the loss function
model = NegativeSampling(
model = transd,
loss = MarginLoss(margin = 4.0),
batch_size = train_dataloader.get_batch_size()
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 1000, alpha = 1.0, use_gpu = True)
trainer.run()
transd.save_checkpoint('./checkpoint/transd.ckpt')

# test the model
transd.load_checkpoint('./checkpoint/transd.ckpt')
tester = Tester(model = transd, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_transe_FB15K237.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_transe_FB15K237.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import TransE
from openke.module.loss import MarginLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/FB15K237/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/FB15K237/", "link")

# define the model
transe = TransE(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200,
p_norm = 1,
norm_flag = True)


# define the loss function
model = NegativeSampling(
model = transe,
loss = MarginLoss(margin = 5.0),
batch_size = train_dataloader.get_batch_size()
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 1000, alpha = 1.0, use_gpu = True)
trainer.run()
transe.save_checkpoint('./checkpoint/transe.ckpt')

# test the model
transe.load_checkpoint('./checkpoint/transe.ckpt')
tester = Tester(model = transe, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_transe_WN18_adv_sigmoidloss.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_transe_WN18_adv_sigmoidloss.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import TransE
from openke.module.loss import SigmoidLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/WN18RR/",
batch_size = 2000,
threads = 8,
sampling_mode = "cross",
bern_flag = 0,
filter_flag = 1,
neg_ent = 64,
neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
transe = TransE(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 1024,
p_norm = 1,
norm_flag = False,
margin = 6.0)


# define the loss function
model = NegativeSampling(
model = transe,
loss = SigmoidLoss(adv_temperature = 1),
batch_size = train_dataloader.get_batch_size(),
regul_rate = 0.0
)

# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 3000, alpha = 2e-5, use_gpu = True, opt_method = "adam")
trainer.run()
transe.save_checkpoint('./checkpoint/transe_2.ckpt')

# test the model
transe.load_checkpoint('./checkpoint/transe_2.ckpt')
tester = Tester(model = transe, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_transh_FB15K237.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_transh_FB15K237.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import TransH
from openke.module.loss import MarginLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/FB15K237/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/FB15K237/", "link")

# define the model
transh = TransH(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200,
p_norm = 1,
norm_flag = True)

# define the loss function
model = NegativeSampling(
model = transh,
loss = MarginLoss(margin = 4.0),
batch_size = train_dataloader.get_batch_size()
)


# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 1000, alpha = 0.5, use_gpu = True)
trainer.run()
transh.save_checkpoint('./checkpoint/transh.ckpt')

# test the model
transh.load_checkpoint('./checkpoint/transh.ckpt')
tester = Tester(model = transh, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

train_transr_FB15K237.py

link: https://github.com/thunlp/OpenKE/blob/OpenKE-PyTorch/examples/train_transr_FB15K237.py .

import openke
from openke.config import Trainer, Tester
from openke.module.model import TransE, TransR
from openke.module.loss import MarginLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
in_path = "./benchmarks/FB15K237/",
nbatches = 100,
threads = 8,
sampling_mode = "normal",
bern_flag = 1,
filter_flag = 1,
neg_ent = 25,
neg_rel = 0)

# dataloader for test
test_dataloader = TestDataLoader(
in_path = "./benchmarks/FB15K237/",
sampling_mode = 'link')

# define the model
transe = TransE(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim = 200,
p_norm = 1,
norm_flag = True)

model_e = NegativeSampling(
model = transe,
loss = MarginLoss(margin = 5.0),
batch_size = train_dataloader.get_batch_size())

transr = TransR(
ent_tot = train_dataloader.get_ent_tot(),
rel_tot = train_dataloader.get_rel_tot(),
dim_e = 200,
dim_r = 200,
p_norm = 1,
norm_flag = True,
rand_init = False)

model_r = NegativeSampling(
model = transr,
loss = MarginLoss(margin = 4.0),
batch_size = train_dataloader.get_batch_size()
)

# pretrain transe
trainer = Trainer(model = model_e, data_loader = train_dataloader, train_times = 1, alpha = 0.5, use_gpu = True)
trainer.run()
parameters = transe.get_parameters()
transe.save_parameters("./result/transr_transe.json")

# train transr
transr.set_parameters(parameters)
trainer = Trainer(model = model_r, data_loader = train_dataloader, train_times = 1000, alpha = 1.0, use_gpu = True)
trainer.run()
transr.save_checkpoint('./checkpoint/transr.ckpt')

# test the model
transr.load_checkpoint('./checkpoint/transr.ckpt')
tester = Tester(model = transr, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)

Closing Remarks

My fiftieth blog post is done, and I'm happy!

Today is another day full of hope.