A3SL: Learning Interpretable Relational Structures of Hinge-loss Markov Random Fields
Introduction
Further, guiding this exploration with domain-specific semantic constraints helps in learning meaningful structures that can also achieve good prediction performance.
Contributions
In this work, we present asynchronous advantage actor-critic for structure learning (A3SL), a deep RL algorithm for learning interpretable structures of HL-MRFs. A3SL uses semantic constraints encoding domain-specific insights to guide the discovery of structures, while maintaining good prediction performance and the ability to learn from real-world data instances.
Highlights of A3SL
- We encourage the learning of diverse models through exploration by including a diversity constraint on the actions (see the sketch after this list). This can help the domain expert choose the model with the best semantic meaning from a group of models that have similar prediction performance.
- We evaluate our structure learning algorithm on two important real-world computational social science applications: i) modeling recovery from alcohol use disorder (AUD), and ii) detecting bullying in online interactions. We choose these applications because there is existing work on applying HL-MRFs in these domains, which allows a direct comparison of the rules learned by A3SL with the existing manually specified models. We show that the structures learned by our algorithm achieve better prediction performance than structures learned using a greedy structure learning algorithm (Greedy-SL), structure learning algorithms developed for Markov logic networks (MLNs) (hypergraph lifting and Grafting-light [Zhu et al., 2010]), and manually defined model structures.
- We demonstrate that our model is able to learn complex clauses that encode network interactions, such as the friend network, and meaningful latent variables formed from complex combinations of features and target variables, guided by semantic constraints. We also demonstrate that our learned clauses resemble the manually specified clauses capturing the same semantic meaning, while discovering additional clauses via exploration.
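The paper's exact diversity constraint is not reproduced here; the following is a minimal sketch of one plausible form based on Jaccard similarity, assuming clauses are represented as sets of predicate names (the function names, weight, and predicates are illustrative).

```python
def jaccard(c1, c2):
    """Similarity between two clauses, each a set of predicate names."""
    return len(c1 & c2) / len(c1 | c2)

def diversity_penalty(new_clause, learned_clauses, weight=1.0):
    """Penalize actions that regenerate clauses similar to ones already
    in the model, encouraging exploration of diverse structures."""
    if not learned_clauses:
        return 0.0
    return weight * max(jaccard(new_clause, c) for c in learned_clauses)

# Example with hypothetical predicates:
model = [frozenset({"Friend", "UsesAlcoholWord", "Recovers"})]
print(diversity_penalty(frozenset({"Friend", "Recovers"}), model))  # ~0.667
```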
A3SL: Asynchronous Advantage Actor-Critic Structure Learning Algorithm for Learning HL-MRF Structures
We present our asynchronous advantage actor-critic structure learning algorithm, A3SL, which adapts a recently developed neural policy gradient algorithm, asynchronous advantage actor-critic (A3C), to the structure learning problem. A3C is one of the most general and successful learning agents to date and has been shown to perform well in both discrete and continuous action spaces. A3C's ability to asynchronously execute multiple agents in parallel on multiple instances of the environment offers both algorithmic benefits, allowing effective integration of feedforward and recurrent deep neural network architectures into RL algorithms, and practical benefits in speed, making it an appropriate choice for our structure learning problem.
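To make the asynchronous worker pattern concrete, below is a minimal, self-contained sketch in which a tabular softmax policy and a scalar baseline stand in for A3SL's actor and critic networks; the toy reward, vocabulary, and hyperparameters are all illustrative assumptions, and a lock keeps the sketch simple where A3C itself uses lock-free asynchronous updates.

```python
import threading
import numpy as np

# Hypothetical setup: a small predicate vocabulary plus an END token.
VOCAB_SIZE = 10            # candidate predicates + END
END = VOCAB_SIZE - 1       # index of the END token
MAX_LEN = 4                # maximum clause length L
LR = 0.01

# Shared parameters: a tabular softmax policy and a scalar baseline stand in
# for the actor and critic networks of the full algorithm.
theta = np.zeros(VOCAB_SIZE)
baseline = np.array([0.0])
lock = threading.Lock()

def policy_probs():
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reward(clause):
    # Toy stand-in: the real reward combines HL-MRF prediction performance
    # with interpretability and semantic-constraint terms.
    return -abs(len(clause) - 2)   # prefer two-predicate bodies

def worker(seed):
    rng = np.random.default_rng(seed)
    for _ in range(500):
        clause, steps = [], []
        # Generate a clause as a sequence of predicates, stopping at END or L.
        while len(clause) < MAX_LEN:
            p = policy_probs()
            a = rng.choice(VOCAB_SIZE, p=p)
            steps.append((a, p))
            if a == END:
                break
            clause.append(a)
        R = reward(clause)
        with lock:  # each worker asynchronously updates the shared parameters
            adv = R - baseline[0]            # advantage estimate
            for a, p in steps:
                grad = -p
                grad[a] += 1.0               # d log pi(a) / d theta
                theta[:] += LR * adv * grad
            baseline[0] += LR * adv          # critic-style baseline update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned predicate preferences:", np.round(policy_probs(), 3))
```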
Structure Learning Problem Definition
The structure learning problem can be modeled as a search over a space of structures and a sequential decision problem using reinforcement learning. Since HL-MRFs are specified using templated first-order logic clauses (also referred to as rules) over continuous variables using the probabilistic programming language PSL, our problem translates to learning the structure of these clauses. A first-order logic clause has the form body → head, where in PSL the head can contain only one predicate and the body has the form b1 ∧ b2 ∧ … ∧ bN, with N < L, where L denotes the maximum length of a clause. The order of the predicates bi within the body, and the order of the clauses in the model, do not matter.
A clause of the structure b1 ∧ b2 ∧ … ∧ bN → head can be represented as a sequence, as shown below.
The clause can be represented as the sequence b0, b1, b2, …, bN, END, where END signifies the end of the sequence. If the current length of the sequence reaches L, we stop appending to it. Note that when we generate a sequence we do not need to specify beforehand which predicate is the head of the clause, since the underlying model is undirected. After clause generation, we can randomly choose any predicate bi and use its negation ¬bi as the head. In practice, we always choose the target predicate as the head for ease of interpretability.
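As a concrete illustration, the following is a minimal sketch of how a generated sequence maps to a clause; the predicate names are hypothetical, and since predicate order within the body does not matter, the body is canonicalized as a set.

```python
END = "END"

def sequence_to_clause(seq, head_pred=None):
    """Convert a generated predicate sequence into a (body, head) clause.

    seq: list of predicate names terminated by END.
    head_pred: the predicate whose negation becomes the head; defaults
    to the last predicate (in practice, the target predicate).
    """
    preds = [p for p in seq if p != END]
    head = head_pred if head_pred is not None else preds[-1]
    body = sorted(set(preds) - {head})  # order within the body is irrelevant
    return body, "!" + head

# Hypothetical predicates from the AUD domain:
seq = ["Friend", "UsesAlcoholWord", "Recovers", END]
body, head = sequence_to_clause(seq, head_pred="Recovers")
print(" ^ ".join(body), "->", head)   # Friend ^ UsesAlcoholWord -> !Recovers
```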
A3SL Reinforcement Learning Setup
Here, we define the problem space and reinforcement learning algorithm setup.
A3SL Objective Function
In the objective, we use a combination of the HL-MRF objective function and interpretability terms.
The interpretability terms consist of constraints on the total number of clauses, on the maximum possible length of a clause, and domain-specific semantic constraints, as given in the figure below.
The combination of semantic constraints and a performance-based utility allows our algorithm to learn structures that are both interpretable and data-driven, optimizing for both criteria while remaining able to correct domain-specific intuitions that do not hold in the data.
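Schematically, and with illustrative notation (the weights λ1, λ2, λ3, the clause budget K, and the violation measure viol(·) below are assumptions, not the paper's exact terms), the reward for a candidate clause set C takes a form such as:

R(C) = −L_HL-MRF(C) − λ1 · max(0, |C| − K) − λ2 · Σ_{c ∈ C} max(0, len(c) − L) − λ3 · Σ_{s ∈ S} viol(s, C)

where L_HL-MRF(C) is the HL-MRF training objective under clause set C, |C| is the number of clauses, len(c) is the length of clause c, and viol(s, C) measures the degree to which C violates semantic constraint s in the constraint set S.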
Experimental Evaluation
We conduct experiments to answer the following questions:
- How well do our learned model structures perform in real-world prediction problems?
- How well do our learned rules capture the desirable characteristics present in models designed by human experts?
Modeling Recovery from AUD
To model recovery from AUD, we use the dataset from Zhang et al. [2018]. The dataset contains 302 users attending Alcoholics Anonymous (AA), each labeled with recovery or relapse, along with up to the most recent 3,200 tweets per user, for a total of 274,595 AA-user tweets. The 302 AA users have 76,183 friends in the dataset, with up to 3,200 tweets per friend, for a total of 14,921,997 friend tweets.
Here, we highlight some interesting rules learned by A3SL. The following diagram provides an example of a rule in which a friend of an AA user uses an alcohol-related word in their tweet, which could potentially influence the sobriety of that AA user.
In another rule learned by the model, shown below, retweeting tweets containing sober words is considered a sign that the user is recovering.
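In PSL-style first-order logic, with hypothetical predicate and variable names (the actual learned rules appear in the diagrams above), these two rules take roughly the form:

Friend(U, F) ∧ Tweets(F, T) ∧ ContainsAlcoholWord(T) → ¬Recovers(U)
Retweets(U, T) ∧ ContainsSoberWord(T) → Recovers(U)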
Table 1 shows the 5-fold cross-validation results on predicting recovery and relapse of AA-attending users. We observe that A3SL achieves a statistically significant improvement in prediction performance (rejection threshold p = 0.01) when compared with the manually specified model on the same dataset [Zhang et al., 2018], Greedy-SL, and a logistic regression baseline.