zincbase package

The main Zincbase package.

See README.md for some simple docs.

zincbase.zincbase module

class zincbase.zincbase.KB

Bases: object

Knowledge Base Class

>>> kb = KB()
>>> kb.__class__
<class 'zincbase.KB'>
add_node_to_trained_kg(sub, pred, ob)
attr(node_name, attributes)

Set attributes on an existing graph node.

Parameters:
  • node_name (str) – Name of the node
  • attributes (dict) – Dictionary of attributes to set
Example:
>>> kb = KB()
>>> kb.store('eats(tom, rice)')
0
>>> kb.attr('tom', {'is_person': True})
>>> kb.node('tom')
{'is_person': True}
bfs(start_node, target_node, max_depth=10, reverse=False)

Find a path from start_node to target_node.
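
Example (a minimal sketch; the exact format of the returned paths is not documented here):
>>> kb = KB()
>>> kb.store('knows(tom, shamala)')
0
>>> kb.store('knows(shamala, deepika)')
1
>>> paths = kb.bfs('tom', 'deepika')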

binary_classify(subject, pred, ob)

Predict whether the triple (subject, pred, ob) is true or not.

build_kg_model(cuda=False, embedding_size=256, gamma=24, model_name='RotatE', node_attributes=[], attr_loss_to_graph_loss=1.0, pred_loss_to_graph_loss=1.0, pred_attributes=[])

Build the dictionaries and the knowledge graph embedding (KGE) model.

Parameters:
  • node_attributes (list) – List of node attributes to include in the model. If a node doesn't possess the attribute, it is treated as zero. Currently, attributes must be floats.
  • pred_attributes (list) – List of predicate attributes to include in the model.
  • attr_loss_to_graph_loss (float) – Factor by which attribute loss is scaled relative to graph loss: 0 takes only graph loss into account; math.inf takes only attribute loss into account.
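Example (a hedged sketch; 'population' is a hypothetical float node attribute, not part of the bundled dataset):
>>> kb = KB()
>>> kb.seed(555)
>>> kb.from_csv('./assets/countries_s1_train.csv', delimiter='\t')
>>> kb.attr('india', {'population': 1.4})
>>> kb.build_kg_model(cuda=False, embedding_size=100, node_attributes=['population'])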
create_binary_classifier(pred, ob)

Creates a binary classifier (SVM) for pred(?, ob) using embeddings from the trained model. Automatically compensates for class imbalance.

Follow it with binary_classify(sub, pred, ob) to predict whether the relation holds or not.

May be useful because, although the model can estimate a probability for (sub, pred, ob), it is not obvious what threshold to use to decide what constitutes True vs False.

Example:
>>> kb = KB()
>>> kb.seed(555)
>>> kb.from_csv('./assets/countries_s1_train.csv', delimiter='\t')
>>> kb.build_kg_model(cuda=False, embedding_size=100)
>>> kb.train_kg_model(steps=2000, batch_size=1, verbose=False, neg_to_pos=4)
>>> _ = kb.create_binary_classifier('locatedin', 'asia')
>>> kb.binary_classify('india', 'locatedin', 'asia')
True
>>> kb.binary_classify('brazil', 'locatedin', 'asia')
False
create_multi_classifier(pred)

Build a classifier (SVM) for a predicate that classifies a subject into one of the object entities in the KB that have that predicate relation. Automatically compensates for class imbalance.

Example:
>>> kb = KB()
>>> kb.from_csv('./assets/countries_s1_train.csv', delimiter='\t')
>>> kb.seed(555)
>>> kb.build_kg_model(cuda=False, embedding_size=40)
>>> kb.train_kg_model(steps=1000, batch_size=1, verbose=False)
>>> _ = kb.create_multi_classifier('locatedin')
>>> kb.multi_classify('philippines', 'locatedin')
'south_eastern_asia'
delete_edge_attr(sub, pred, ob, attributes)

Delete attributes previously set on a predicate between subject and object. To set the attribute in the first place, see also edge_attr.

Parameters:
  • sub (str) – Subject node/entity
  • pred (str) – Predicate between subject and object
  • ob (str) – Object node/entity
  • attributes (list) – List of attributes to delete.
Returns:

False if attribute was not present, else None.
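
Example (a minimal sketch; see also the edge_attr example below):
>>> kb = KB()
>>> kb.store('eats(tom, rice)')
0
>>> kb.edge_attr('tom', 'eats', 'rice', {'used_to': 1.0})
>>> kb.delete_edge_attr('tom', 'eats', 'rice', ['used_to'])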

delete_rule(rule_idx)

Delete a rule from the KB.

Parameters:rule_idx – The index of the rule in the KB. Returned when the rule was added. May be int (if it was a real rule) or str (if it was a negative example - preceded by ~).
Example:
>>> kb = KB()
>>> kb.store('a(a)')
0
>>> kb.delete_rule(0)
True
edge(sub, pred, ob)

Returns an edge and its attributes.

Parameters:
  • sub (str) – Subject node/entity
  • pred (str) – Predicate between subject and object
  • ob (str) – Object node/entity
Example:
>>> kb = KB()
>>> kb.store('eats(tom, rice)')
0
>>> kb.edge_attr('tom', 'eats', 'rice', {'used_to': 1.0})
>>> kb.edge('tom', 'eats', 'rice')
{'used_to': 1.0}
edge_attr(sub, pred, ob, attributes)

Set attributes on a predicate between subject and object. Useful, for example, to encode time or truthiness.

Note that if any of the specified attributes have been previously set, this updates them with new values. To delete a set edge attribute, see also delete_edge_attr.

Parameters:
  • sub (str) – Subject node/entity
  • pred (str) – Predicate between subject and object
  • ob (str) – Object node/entity
  • attributes (dict) – Attributes to set on the individual edge. Must be floats.
Example:
>>> kb = KB()
>>> kb.store('eats(tom, rice)')
0
>>> kb.edge_attr('tom', 'eats', 'rice', {'used_to': 1.0})
>>> kb.edge('tom', 'eats', 'rice')
{'used_to': 1.0}
>>> kb.edge_attr('tom', 'eats', 'rice', {'still_does': 1.0})
>>> kb.edge('tom', 'eats', 'rice')
{'used_to': 1.0, 'still_does': 1.0}
entities

All the entities in the KB.

Returns generator:
 Generator of all the entities
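Example (a minimal sketch, assuming entities yields node names):
>>> kb = KB()
>>> kb.store('eats(tom, rice)')
0
>>> sorted(kb.entities)
['rice', 'tom']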
estimate_triple_prob(sub, pred, ob)

Estimate the probability of the triple (sub, pred, ob) according to the trained model.
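
Example (a hedged sketch, mirroring the training pipeline from create_binary_classifier above; the exact value depends on training):
>>> kb = KB()
>>> kb.seed(555)
>>> kb.from_csv('./assets/countries_s1_train.csv', delimiter='\t')
>>> kb.build_kg_model(cuda=False, embedding_size=100)
>>> kb.train_kg_model(steps=2000, batch_size=1, verbose=False, neg_to_pos=4)
>>> prob = kb.estimate_triple_prob('india', 'locatedin', 'asia')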

estimate_triple_prob_with_attrs(sub, pred, ob, pred_prop)
filter(filter_condition, candidate_nodes=None)

Filter (i.e. query) nodes by attributes.

Parameters:
  • filter_condition (function) – Test function
  • candidate_nodes (List) – Nodes to test (optional; defaults to whole graph)
Example:
>>> kb = KB()
>>> kb.store('person(tom)')
0
>>> kb.attr('tom', {'cats': 0})
>>> list(kb.filter(lambda x: x['cats'] < 1))
['tom']
fit_knn(entities=None)

Fit an unsupervised sklearn kNN to the embeddings of entities.

Parameters:entities (list) – The entities that should be part of the kNN. Defaults to all entities if not specified.
from_csv(csvfile, header=None, start=0, size=None, delimiter=', ')
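
Example (loading tab-delimited triples into the KB, as used in the examples throughout this page):
>>> kb = KB()
>>> kb.from_csv('./assets/countries_s1_train.csv', delimiter='\t')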
from_triples(triples)

Stores facts from a list of tuples into the KB.

Parameters:triples (list) – List of tuples each of the form (subject, pred, object)
Example:
>>> kb = KB()
>>> kb.from_triples([('b', 'a', 'c')])
>>> len(list(kb.query('a(b, c)')))
1
get_embedding(entity)
get_most_likely(sub, pred, ob, candidates=None, k=1)

Return the k most likely triples to satisfy the input triple. One of sub, pred, or ob may be '?'.

Parameters:
  • candidates (list<str>) – Candidate entities/predicates. If None or not specified, this function will generate possible candidates from the rest of the triple.
  • k (int) – The k in top k.
Example:
>>> kb = KB()
>>> kb.from_csv('./assets/countries_s1_train.csv', delimiter='\t')
>>> kb.seed(555)
>>> kb.build_kg_model(cuda=False, embedding_size=100)
>>> kb.train_kg_model(steps=2000, batch_size=2, verbose=False, neg_to_pos=4)
>>> kb.get_most_likely('austria', 'neighbor', '?', k=2) # doctest:+ELLIPSIS
[{'prob': 0.9673, 'triple': ('austria', 'neighbor', 'germany')}, {'prob': 0.9656, 'triple': ('austria', 'neighbor', 'liechtenstein')}]
>>> kb.get_most_likely('?', 'neighbor', 'austria', candidates=list(kb.entities), k=2)
[{'prob': 0.9467, 'triple': ('slovenia', 'neighbor', 'austria')}, {'prob': 0.94, 'triple': ('liechtenstein', 'neighbor', 'austria')}]
>>> kb.get_most_likely('austria', '?', 'germany', k=3)
[{'prob': 0.9673, 'triple': ('austria', 'neighbor', 'germany')}, {'prob': 0.664, 'triple': ('austria', 'locatedin', 'germany')}]
get_nearest_neighbors(entity, k=1)

Get the nearest neighbors to entity (in embedding space), according to the previously fit kNN (see fit_knn).

Parameters:
  • entity (str) – An entity
  • k (int) – How many neighbors
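Example (a hedged sketch; assumes a model trained as in the examples above, and the return format is not documented here):

kb.fit_knn()                                        # fit kNN over all entity embeddings
neighbors = kb.get_nearest_neighbors('india', k=3)
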
load_all(dirname='.', cuda=False)

Load KB (and model, if it exists) from the specified directory.

Parameters:
  • dirname (str) – Directory from which to load zb.pkl and (if present) pytorch_model.dict
  • cuda (bool) – If the model exists, it will be loaded; set this to True if you want it on the GPU.
multi_classify(subject, pred)

Predict the object for subject according to the multi-classifier previously trained on pred.

neighbors(node)

Return neighbors of node and predicates that connect them.

Parameters:node (str) – Name of the node
Returns:List[(node_name, List[predicate])]
Example:
>>> kb = KB()
>>> kb.store('knows(tom, shamala)')
0
>>> kb.neighbors('tom')
[('shamala', [{'pred': 'knows'}])]
node(node_name)

Get a node, and its attributes, from the graph.

Parameters:node_name (str) – Name of the node
Returns:The node and its attributes.
Example:
>>> kb = KB()
>>> kb.store('eats(tom, rice)')
0
>>> kb.node('tom')
{}
>>> kb.attr('tom', {'is_person': True})
>>> kb.node('tom')
{'is_person': True}
plot(density=1.0)

Plots a network diagram of the nodes and edges (triples) in the KB.

Parameters:density (float) – Probability (0-1) that a given edge will be plotted, useful to thin out dense graphs for visualization.
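For example, to thin out a dense graph (a minimal sketch):

kb.plot(density=0.5)  # each edge is drawn with probability 0.5
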
predicates

All the predicates (aka relations) in the KB.

Returns generator:
 Generator of all the predicates
query(statement)

Query the KB.

Parameters:statement (str) – A rule to query on.
Returns:Generator of alternative bindings to variables that match the query
Example:
>>> kb = KB()
>>> kb.store('a(a)')
0
>>> kb.query('a(X)') #doctest: +ELLIPSIS
<generator object KB._search at 0x...>
>>> list(kb.query('a(X)'))
[{'X': 'a'}]
save_all(dirname='.')

Save current KB to the directory specified. Saves the (state dict of the) PyTorch model as well, if it has been built.

Parameters:dirname (str) – Directory in which to save the files. Creates the directory if it doesn’t already exist.
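Example (a hedged sketch; '/tmp/zb_checkpoint' is an arbitrary directory):

kb.save_all('/tmp/zb_checkpoint')   # writes zb.pkl, plus the model state dict if built
kb2 = KB()
kb2.load_all('/tmp/zb_checkpoint')  # restores the KB (and model, if saved); see load_all above
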
seed(seed)

Seed the RNGs for PyTorch, NumPy, and Python itself.

Parameters:seed (int) – random seed
Example:
>>> KB().seed(555)
solidify(predicate)

Query the KB (with Prolog) and ‘solidify’ facts in the KB, making them part of the graph, so that the NN can be trained.

Parameters:predicate (str) – A predicate (that's a rule, not a fact; otherwise, what's the point?)
Example:
>>> kb = KB()
>>> kb.store('is(tom, human)')
0
>>> kb.store('has_part(shamala, head)')
1
>>> kb.store('is(X, human) :- has_part(X, head)')
2
>>> next(kb.query('is(tom, human)'))
True
>>> kb.to_triples()
[('tom', 'is', 'human'), ('shamala', 'has_part', 'head')]
>>> kb.solidify('is')
1
>>> kb.to_triples()
[('tom', 'is', 'human'), ('shamala', 'has_part', 'head'), ('shamala', 'is', 'human')]
store(statement, node_attributes=[], edge_attributes={})

Store a fact/rule in the KB.

It is possible to store 'false' facts (negative examples) by preceding the predicate with a tilde (~). In this case, they do not appear in the graph and cannot be queried, but may assist when building the model.

Parameters:
  • statement (str) – Fact or rule to store in the KB.
  • node_attributes (list<dict>) – List of length 2 with each element being a dict of items to set on the nodes (in order subject, object).
  • edge_attributes (dict) – Dictionary of attributes to set on the edge. May include truthiness which, if < 0, automatically makes the rule a negative example.
Returns:

the id of the fact/rule

Example:
>>> KB().store('a(a)')
0
to_tensorboard_projector(embeddings_filename, labels_filename, filter_fn=None)

Convert the KB’s trained embeddings to 2 files suitable for https://projector.tensorflow.org. This outputs only entity embeddings, not relation embeddings, a visualization of which may not be interpretable.

Parameters:
  • embeddings_filename (str) – Filename to output embeddings to, tsv format.
  • labels_filename (str) – Filename to output labels to, one label per row.
  • filter_fn (function) – Only include the embeddings/labels for which filter_fn(label) returns True
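Example (a hedged sketch; the filter function and filenames here are hypothetical):

kb.to_tensorboard_projector('embeddings.tsv', 'labels.tsv',
                            filter_fn=lambda label: label != 'unknown')
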
to_triples(data=False)

Convert all facts in the KB to a list of triples, each of length 3 (or 7 if data=True). Any fact that is not arity 2 will be ignored.

Note:While the Prolog style representation uses pred(subject, object), the triple representation is (subject, pred, object).
Parameters:data (bool) – Whether to return subject, predicate and object attributes as elements 4, 5, and 6 of the triple. The 7th element of the triple is usually False, but is True when the fact/triple is a negative example.
Returns:list of triples (tuples of length 3 or 7 if data=True)
Example:
>>> kb = KB()
>>> kb.store('a(b, c)')
0
>>> kb.to_triples()
[('b', 'a', 'c')]
>>> kb.store('a(a)')
1
>>> kb.to_triples()
[('b', 'a', 'c')]
>>> kb.attr('b', {'an_attribute': 'xyz'})
>>> kb.to_triples()
[('b', 'a', 'c')]
>>> kb.to_triples(data=True)
[('b', 'a', 'c', {'an_attribute': 'xyz'}, {}, {}, False)]
train_kg_model(steps=1000, batch_size=512, lr=0.001, reencode_triples=False, neg_to_pos=128, neg_ratio=1.0, verbose=True)

Train a KG model on the KB.

Parameters:
  • steps (int) – Number of training steps
  • batch_size (int) – Batch size for training
  • lr (float) – Initial learning rate for Adam optimizer
  • reencode_triples (bool) – If a node has been added since last training, set this to True
  • neg_to_pos (int) – Ratio of generated negative samples to real positive samples
  • neg_ratio (float) – How often real (user-supplied) negative examples should appear, versus real positives plus generated negatives. Smaller (but > 0) means more often.

Negative Examples

Negative examples can be added to a Zincbase KB in two ways. Either:

  • Prefix a rule with ~, such as ~likes(tom, sprouts)
  • Give it a truthiness attribute that’s less than zero.

Concretely, this looks like:

kb.store('~likes(tom, sprouts)')
kb.store('likes(tom, sprouts)', edge_attributes={'truthiness': -1})

Negative examples are fed into the KG model as part of the usual training regime; you may control how frequently this happens with the neg_ratio kwarg of KB.train_kg_model.

Note that you can specify truthiness as something you want the model to learn to predict (i.e. specify pred_attributes=['truthiness'] when you call build_kg_model). But negative truthiness takes the example out of this normal flow: only examples with 0 <= truthiness <= 1 take part in 'proper' training where the predicate prediction is taken into account.
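
A hedged sketch of this (assuming, per the signature above, that estimate_triple_prob_with_attrs takes the attribute name as its pred_prop argument):

kb = KB()
kb.store('likes(tom, sushi)', edge_attributes={'truthiness': 1.0})
kb.store('likes(tom, sprouts)', edge_attributes={'truthiness': -1.0})  # negative example
kb.build_kg_model(cuda=False, embedding_size=100, pred_attributes=['truthiness'])
kb.train_kg_model(steps=1000, batch_size=1, verbose=False)
kb.estimate_triple_prob_with_attrs('tom', 'likes', 'sushi', 'truthiness')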

Anecdotally, negative examples do not help much, or only help with small datasets.