TY - JOUR
T1 - Inferring generative model structure with static analysis
AU - Varma, Paroma
AU - He, Bryan
AU - Bajaj, Payal
AU - Khandwala, Nishith
AU - Banerjee, Imon
AU - Rubin, Daniel
AU - Ré, Christopher
N1 - Funding Information:
Acknowledgments: We thank Shoumik Palkar, Stephen Bach, and Sen Wu for their helpful conversations and feedback. We are grateful to Darvin Yi for his assistance with the DDSM dataset-based experiments and associated deep learning models. We acknowledge the use of the bone tumor dataset annotated by Drs. Christopher Beaulieu and Bao Do and carefully collected over his career by the late Henry H. Jones, M.D. (aka “Bones Jones”). This material is based on research sponsored by the Defense Advanced Research Projects Agency (DARPA) under agreement number FA8750-17-2-0095. We gratefully acknowledge the support of the DARPA SIMPLEX program under No. N66001-15-C-4043, DARPA FA8750-12-2-0335 and FA8750-13-2-0039, DOE 108845, the National Science Foundation (NSF) Graduate Research Fellowship under No. DGE-114747, the Joseph W. and Hon Mai Goodman Stanford Graduate Fellowship, the National Institutes of Health (NIH) U54EB020405, the Office of Naval Research (ONR) under awards No. N000141210041 and No. N000141310129, the Moore Foundation, the Okawa Research Grant, American Family Insurance, Accenture, Toshiba, and Intel. This research was supported in part by affiliate members and other supporters of the Stanford DAWN project: Intel, Microsoft, Teradata, and VMware. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, NSF, NIH, ONR, or the U.S. Government.
Publisher Copyright:
© 2017 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2017
Y1 - 2017
AB - Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects the quality of the training labels, but is difficult to learn without any ground truth labels. We instead rely on weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the amount of data required to learn structure. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and the number of relations identified, improving over the standard sample complexity, which is exponential in n for learning nth-degree relations. Empirically, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
UR - http://www.scopus.com/inward/record.url?scp=85046998872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046998872&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85046998872
SN - 1049-5258
VL - 2017-December
SP - 240
EP - 250
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017
Y2 - 4 December 2017 through 9 December 2017
ER -