
A3SL: Learning Interpretable Relational Structures of Hinge-loss Markov Random Fields

Introduction

Machine-learning models that offer interpretability and ease of specification without compromising modeling power can positively impact the many downstream application domains that use them.
We focus on one such recently developed, inherently interpretable model that efficiently combines logic and uncertainty: hinge-loss Markov random fields (HL-MRFs). In this work, we present an approach to learn meaningful HL-MRF structures by adapting a deep reinforcement learning (RL) algorithm, asynchronous advantage actor-critic (A3C).
A3C's ability to tackle huge state spaces asynchronously, together with scalable inference in HL-MRFs, paves the way for a scalable structure learning algorithm that allows for diverse exploration.

Further, guiding this exploration using domain-specific semantic constraints is helpful in learning meaningful structures that can potentially also achieve good prediction performance.

Contributions

In this work, we present asynchronous advantage actor-critic for structure learning (A3SL), a deep RL algorithm for learning interpretable structures of HL-MRFs. A3SL takes into account semantic constraints that encode domain-specific insights to guide the discovery of structures, while simultaneously maintaining good prediction performance and the ability to learn from real-world data instances.

A3SL is the first structure learning algorithm that directly optimizes for interpretability, networked dependency structures, and meaningful latent variables in the problem formulation itself, thus yielding structures with enhanced interpretability and semantic coherence.

A3SL: An Asynchronous Advantage Actor-Critic Algorithm for Learning HL-MRF Structures

We present our asynchronous advantage actor-critic structure learning algorithm, A3SL, which adapts a recently developed neural policy-gradient algorithm, asynchronous advantage actor-critic (A3C), to the structure learning problem. A3C is among the most general and successful RL agents to date and has been shown to perform well in both discrete and continuous action spaces. Its ability to asynchronously execute multiple agents in parallel on multiple instances of the environment offers both algorithmic advantages, allowing effective integration of feedforward and recurrent deep neural network architectures into RL algorithms, and practical benefits in speed, making it an appropriate choice for our structure learning problem.
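For background, A3SL adapts the generic A3C actor update, which follows the gradient of the log-policy weighted by an n-step advantage estimated with a learned value function V. The form below is the standard A3C update, not an A3SL-specific equation:

```latex
% Generic A3C actor gradient with an n-step advantage estimate (background only).
\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\, A(s_t, a_t; \theta, \theta_v),
\qquad
A(s_t, a_t; \theta, \theta_v) \;=\; \sum_{i=0}^{k-1} \gamma^{i} r_{t+i}
  \;+\; \gamma^{k} V(s_{t+k}; \theta_v) \;-\; V(s_t; \theta_v)
```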

Structure Learning Problem Definition

The structure learning problem can be modeled as a space search and sequential decision problem using reinforcement learning. Since HL-MRFs are specified using templated first-order logic clauses (also referred to as rules) in continuous space using the probabilistic programming language PSL, our problem translates to learning the structure of these clauses. A first-order logic clause has the form body → head, where in PSL the head can contain only one predicate and the body has the form b1 ∧ b2 ∧ … ∧ bN, with N < L, where L denotes the maximum length of a clause. The order of the predicates bi within the body of a clause, and the order of clauses in the list of clauses, do not matter.

A clause of the structure b1 ∧ b2 ∧ … ∧ bN → head can be represented in different ways, as shown below.

One such representation is a sequence b0, b1, b2, …, bN, END, where END signifies the end of the sequence. If the current length of the sequence reaches L, we stop appending to it. Note that when generating a sequence we do not need to specify beforehand which predicate is the head of the clause, since the model is inherently undirected. After clause generation, we can randomly choose any predicate ¬bi as the head; in practice, we always choose the target predicate as the head for ease of interpretability.
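As a minimal sketch (assuming hypothetical predicate names and a simple string rendering, not the actual A3SL code), a generated sequence can be turned into a clause as follows:

```python
# Minimal sketch of the sequence representation of a clause; predicate names
# and the rendering below are illustrative, not the actual A3SL implementation.
END = "END"
MAX_CLAUSE_LENGTH = 4  # corresponds to L, the maximum clause length

def sequence_to_rule(sequence, head):
    """Render a generated predicate sequence as a 'body → head' clause string."""
    body = [p for p in sequence if p not in (END, head)]
    return " ∧ ".join(body) + " → " + head

# A generated sequence ends with END (or is truncated at MAX_CLAUSE_LENGTH predicates).
sequence = ["Friend(A, B)", "UsesAlcoholWord(B)", "Relapse(A)", END]
assert len(sequence) - 1 <= MAX_CLAUSE_LENGTH

# In practice, the target predicate is chosen as the head for interpretability.
print(sequence_to_rule(sequence, head="Relapse(A)"))
# Friend(A, B) ∧ UsesAlcoholWord(B) → Relapse(A)
```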

A3SL Reinforcement Learning Setup

Here, we define the problem space and reinforcement learning algorithm setup.

Environment ε: Our environment consists of predicates for features (denoted by X), target variables (Y), and latent variables (Z), together with observed data for X and ground-truth data for Y; the latent variables have no ground-truth data.
State st: Each intermediate state st at time t comprises either a partially constructed or a complete set of first-order logic clauses, denoted by C. The asynchronous nature of A3SL helps combat the combinatorial explosion in the state space, making it an appropriate fit for the problem.
Action at: In action at at time t, the algorithm either adds a new predicate to the current clause or chooses to return the clause by appending END to it. The algorithm can choose from the action space of all the defined predicates X, Y, Z, and their negated counterparts.
Transition: The environment evolves by updating the current clause c with the predicate chosen during at.
The goal of the agent is to maximize the expected return from each state st. During training, the agent learns to maximize the cumulative reward, which drives it toward the corresponding optimal model structure. The cumulative reward is the prediction performance of the model once the RL algorithm finishes building a set of rules (sketched below).
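The sketch below is hypothetical; the class and helper names (ClauseEnv, step, evaluate_model, max_clauses) are our own illustrative choices rather than the A3SL implementation:

```python
# Hypothetical sketch of the clause-construction environment described above.
END = "END"

class ClauseEnv:
    def __init__(self, predicates, max_clause_length, max_clauses, evaluate_model):
        # Action space: all defined predicates (and their negated counterparts) plus END.
        self.actions = list(predicates) + [END]
        self.max_len = max_clause_length      # L, the maximum clause length
        self.max_clauses = max_clauses        # number of clauses to build per episode
        self.evaluate_model = evaluate_model  # returns prediction performance of a clause set
        self.clauses, self.current = [], []

    def step(self, action):
        """Apply action a_t: extend the current clause, or close it with END."""
        if action == END or len(self.current) >= self.max_len:
            if self.current:
                self.clauses.append(list(self.current))
            self.current = []
        else:
            self.current.append(action)

        done = len(self.clauses) >= self.max_clauses
        # The reward is the model's prediction performance, granted only once the
        # full set of rules has been built (zero for all intermediate steps).
        reward = self.evaluate_model(self.clauses) if done else 0.0
        state = (tuple(map(tuple, self.clauses)), tuple(self.current))
        return state, reward, done
```
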
A3SL Objective Function

In the objective, we use a combination of the HL-MRF objective function and interpretability terms.

The interpretability terms consist of constraints on the total number of clauses and the maximum possible length of a clause, together with domain-specific semantic constraints, as given in the figure below.

The combination of semantic constraints and a performance-based utility allows our algorithm to learn structures that are both interpretable and data-driven, optimizing for both while being able to override any domain-specific intuitions that do not hold in the data.
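Purely as an illustration (the exact terms appear in the figure referenced above, and the symbols P, V_sem, and the λ weights below are our own notation), such an objective can be sketched as prediction performance minus weighted penalties on the number of clauses, clause lengths beyond the maximum L, and violated semantic constraints:

```latex
% Illustrative form only: performance of the clause set C minus interpretability penalties.
R(C) \;=\; P(C)
  \;-\; \lambda_{1}\,\lvert C\rvert
  \;-\; \lambda_{2} \sum_{c \in C} \max\bigl(0,\ \operatorname{len}(c) - L\bigr)
  \;-\; \lambda_{3}\, V_{\mathrm{sem}}(C)
```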

Experimental Evaluation

We conduct experiments to answer the following questions:

  • How well do our learned model structures perform in real-world prediction problems?
  • How well do our rules capture the desirable characteristics present in models designed by human experts?
Modeling Recovery in AUD

To model recovery from alcohol use disorder (AUD), we use the dataset from Zhang et al. [2018]. In this dataset, there are 302 users attending Alcoholics Anonymous (AA), labeled with recovery or relapse. For each AA-attending user, the dataset contains their most recent tweets (up to 3,200 per user), for a total of 274,595 AA-user tweets. For the 302 AA users, there are 76,183 friends in the dataset. For each friend, the dataset likewise contains up to 3,200 tweets, for a total of 14,921,997 friend tweets.

Here, we highlight some interesting rules learned by A3SL. The following diagram provides an example of a rule in which a friend of an AA user uses an alcohol-related word in a tweet, which could potentially influence the sobriety of that AA user.

In another rule learned by the model, specified below, retweeting tweets containing sober words is considered a sign that the user is recovering.
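Purely for illustration (the actual learned rules are shown in the figures above), rules of the two kinds just described might look like the following; the predicate names are hypothetical stand-ins, not the exact predicates learned by A3SL:

```python
# Illustrative paraphrases of the two rules described above; the predicate
# names are hypothetical stand-ins, not the exact predicates learned by A3SL.
illustrative_rules = [
    # A friend tweeting an alcohol-related word may influence the AA user's sobriety.
    "Friend(User, F) ∧ TweetsAlcoholWord(F) → Relapse(User)",
    # Retweeting tweets that contain sober words is taken as a sign of recovery.
    "Retweets(User, T) ∧ ContainsSoberWord(T) → Recovers(User)",
]
for rule in illustrative_rules:
    print(rule)
```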

Table 1 shows the 5-fold cross-validation results on predicting recovery and relapse of AA-attending users. We observe that A3SL achieves statistically significantly better prediction performance (rejection threshold p = 0.01) than the manually specified model on the same dataset [Zhang et al., 2018], Greedy-SL, and a logistic regression baseline.