
A3SL: Learning Interpretable Relational Structures of Hinge-loss Markov Random Fields

Introduction

Machine-learning models that offer interpretability and ease of specification without compromising modeling power can positively impact the many downstream application domains that use them.
We focus on one such recently developed, inherently interpretable model that efficiently combines logic and uncertainty: hinge-loss Markov random fields (HL-MRFs). In this work, we present an approach to learn meaningful HL-MRF structures by adapting a deep reinforcement learning (RL) algorithm, asynchronous advantage actor-critic (A3C).
A3C's ability to tackle huge state spaces asynchronously, together with scalable inference in HL-MRFs, paves the way for a scalable structure learning algorithm that allows for diverse exploration.

Further, guiding this exploration using domain-specific semantic constraints is helpful in learning meaningful structures that can potentially also achieve good prediction performance.

Contributions

In this work, we present asynchronous advantage actor-critic for structure learning (A3SL), a deep RL algorithm for learning interpretable structures of HL-MRFs. A3SL takes into account semantic constraints that encode domain-specific insights to guide the discovery of structures, while simultaneously maintaining good prediction performance and the ability to learn from real-world data instances.

A3SL is the first structure learning algorithm that directly optimizes for interpretability, networked dependency structures, and meaningful latent variables in the problem formulation itself, thus yielding structures with enhanced interpretability and semantic coherence.

A3SL: An Asynchronous Advantage Actor-Critic Algorithm for Learning HL-MRF Structures

We present our asynchronous advantage actor-critic structure learning algorithm, A3SL, which adapts a recently developed neural policy-gradient algorithm, asynchronous advantage actor-critic (A3C), to the structure learning problem. A3C is among the most general and successful RL agents to date and has been shown to perform well in both discrete and continuous action spaces. Its ability to asynchronously execute multiple agents in parallel on multiple instances of the environment offers both algorithmic advantages, allowing effective integration of feedforward and recurrent deep neural network architectures into RL algorithms, and practical benefits in speed, making it an appropriate choice for our structure learning problem.
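For background, A3SL adapts the generic A3C actor update, which follows the gradient of the log-policy weighted by an n-step advantage estimated with a learned value function V. The form below is the standard A3C update, not an A3SL-specific equation:

```latex
% Generic A3C actor gradient with an n-step advantage estimate (background only).
\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\, A(s_t, a_t; \theta, \theta_v),
\qquad
A(s_t, a_t; \theta, \theta_v) \;=\; \sum_{i=0}^{k-1} \gamma^{i} r_{t+i}
  \;+\; \gamma^{k} V(s_{t+k}; \theta_v) \;-\; V(s_t; \theta_v)
```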

Structure Learning Problem Definition

The structure learning problem can be modeled as a space search and sequential decision problem using reinforcement learning. Since HL-MRFs are specified using templated first-order logic clauses (also referred to as rules) in continuous space using the probabilistic programming language PSL, our problem translates to learning the structure of these clauses. A first-order logic clause has the form body → head, where in PSL the head can contain only one predicate and the body has the form b1 ∧ b2 ∧ … ∧ bN, with N < L, where L denotes the maximum length of a clause. The order of the predicates bi within the body of a clause, and the order of clauses in the list of clauses, do not matter.

A clause of the structure b1 ∧ b2 ∧ … ∧ bN → head can be represented in different ways, as shown below.

One such representation is a sequence b0, b1, b2, …, bN, END, where END signifies the end of the sequence. If the current length of the sequence reaches L, we stop appending to it. Note that when generating a sequence we do not need to specify beforehand which predicate is the head of the clause, since the model is inherently undirected. After clause generation, we can randomly choose any predicate ¬bi as the head; in practice, we always choose the target predicate as the head for ease of interpretability.
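As a minimal sketch (assuming hypothetical predicate names and a simple string rendering, not the actual A3SL code), a generated sequence can be turned into a clause as follows:

```python
# Minimal sketch of the sequence representation of a clause; predicate names
# and the rendering below are illustrative, not the actual A3SL implementation.
END = "END"
MAX_CLAUSE_LENGTH = 4  # corresponds to L, the maximum clause length

def sequence_to_rule(sequence, head):
    """Render a generated predicate sequence as a 'body → head' clause string."""
    body = [p for p in sequence if p not in (END, head)]
    return " ∧ ".join(body) + " → " + head

# A generated sequence ends with END (or is truncated at MAX_CLAUSE_LENGTH predicates).
sequence = ["Friend(A, B)", "UsesAlcoholWord(B)", "Relapse(A)", END]
assert len(sequence) - 1 <= MAX_CLAUSE_LENGTH

# In practice, the target predicate is chosen as the head for interpretability.
print(sequence_to_rule(sequence, head="Relapse(A)"))
# Friend(A, B) ∧ UsesAlcoholWord(B) → Relapse(A)
```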

A3SL Reinforcement Learning Setup

Here, we define the problem space and reinforcement learning algorithm setup.

Environment ε: Our environment consists of predicates for features (denoted by X), target variables (Y), and latent variables (Z), together with observed data for X and ground-truth data for Y; the latent variables have no ground-truth data.
State st: Each intermediate state st at time t comprises either a partially constructed or a complete set of first-order logic clauses, denoted by C. The asynchronous nature of A3SL helps combat the combinatorial explosion in the state space, making it an appropriate fit for the problem.
Action at: In action at at time t, the algorithm either adds a new predicate to the current clause or chooses to return the clause by appending END to it. The algorithm can choose from the action space of all the defined predicates X, Y, Z, and their negated counterparts.
Transition: The environment evolves by updating the current clause c with the predicate chosen during at.
The goal of the agent is to maximize the expected return from each state st. During training, the agent learns to maximize the cumulative reward, which drives it toward the corresponding optimal model structure. The cumulative reward is the prediction performance of the model once the RL algorithm finishes building a set of rules (sketched below).
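The sketch below is hypothetical; the class and helper names (ClauseEnv, step, evaluate_model, max_clauses) are our own illustrative choices rather than the A3SL implementation:

```python
# Hypothetical sketch of the clause-construction environment described above.
END = "END"

class ClauseEnv:
    def __init__(self, predicates, max_clause_length, max_clauses, evaluate_model):
        # Action space: all defined predicates (and their negated counterparts) plus END.
        self.actions = list(predicates) + [END]
        self.max_len = max_clause_length      # L, the maximum clause length
        self.max_clauses = max_clauses        # number of clauses to build per episode
        self.evaluate_model = evaluate_model  # returns prediction performance of a clause set
        self.clauses, self.current = [], []

    def step(self, action):
        """Apply action a_t: extend the current clause, or close it with END."""
        if action == END or len(self.current) >= self.max_len:
            if self.current:
                self.clauses.append(list(self.current))
            self.current = []
        else:
            self.current.append(action)

        done = len(self.clauses) >= self.max_clauses
        # The reward is the model's prediction performance, granted only once the
        # full set of rules has been built (zero for all intermediate steps).
        reward = self.evaluate_model(self.clauses) if done else 0.0
        state = (tuple(map(tuple, self.clauses)), tuple(self.current))
        return state, reward, done
```
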
A3SL Objective Function

In the objective, we use a combination of the HL-MRF objective function and interpretability terms.

The interpretability terms consist of constraints on the total number of clauses and the maximum possible length of a clause, together with domain-specific semantic constraints, as given in the figure below.

The combination of semantic constraints and a performance-based utility allows our algorithm to learn structures that are both interpretable and data-driven, optimizing for both while being able to override any domain-specific intuitions that do not hold in the data.
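Purely as an illustration (the exact terms appear in the figure referenced above, and the symbols P, V_sem, and the λ weights below are our own notation), such an objective can be sketched as prediction performance minus weighted penalties on the number of clauses, clause lengths beyond the maximum L, and violated semantic constraints:

```latex
% Illustrative form only: performance of the clause set C minus interpretability penalties.
R(C) \;=\; P(C)
  \;-\; \lambda_{1}\,\lvert C\rvert
  \;-\; \lambda_{2} \sum_{c \in C} \max\bigl(0,\ \operatorname{len}(c) - L\bigr)
  \;-\; \lambda_{3}\, V_{\mathrm{sem}}(C)
```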

Experimental Evaluation

We conduct experiments to answer the following questions:

  • How well do our learned model structures perform in real-world prediction problems?
  • How well do our rules capture the desirable characteristics present in models designed by human experts?
Modeling Recovery in AUD

To model recovery from alcohol use disorder (AUD), we use the dataset from Zhang et al. [2018]. In this dataset, there are 302 users attending Alcoholics Anonymous (AA), labeled with recovery or relapse. For each AA-attending user, the dataset contains their most recent tweets (up to 3,200 per user), for a total of 274,595 AA-user tweets. For the 302 AA users, there are 76,183 friends in the dataset. For each friend, the dataset likewise contains up to 3,200 tweets, for a total of 14,921,997 friend tweets.

Here, we highlight some interesting rules learned by A3SL. The following diagram provides an example of a rule in which a friend of an AA user uses an alcohol-related word in a tweet, which could potentially influence the sobriety of that AA user.

In another rule learned by the model, specified below, retweeting tweets containing sober words is considered a sign that the user is recovering.
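Purely for illustration (the actual learned rules are shown in the figures above), rules of the two kinds just described might look like the following; the predicate names are hypothetical stand-ins, not the exact predicates learned by A3SL:

```python
# Illustrative paraphrases of the two rules described above; the predicate
# names are hypothetical stand-ins, not the exact predicates learned by A3SL.
illustrative_rules = [
    # A friend tweeting an alcohol-related word may influence the AA user's sobriety.
    "Friend(User, F) ∧ TweetsAlcoholWord(F) → Relapse(User)",
    # Retweeting tweets that contain sober words is taken as a sign of recovery.
    "Retweets(User, T) ∧ ContainsSoberWord(T) → Recovers(User)",
]
for rule in illustrative_rules:
    print(rule)
```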

Table 1 shows the 5-fold cross-validation results on predicting recovery and relapse of AA-attending users. We observe that A3SL achieves statistically significantly better prediction performance (rejection threshold p = 0.01) than the manually specified model on the same dataset [Zhang et al., 2018], Greedy-SL, and a logistic regression baseline.