BERT is a strong and generalizable architecture that can be transferred to a variety of NLP tasks. But it is also very large, which can make it very slow. In a recent analysis, SigOpt Machine Learning Engineer Meghana Ravikumar explored this tradeoff between size and performance for BERT on SQuAD 2.0.
In her first talk, Meghana explained how she set up a Multimetric Bayesian Optimization experiment to explore this tradeoff. In this talk, she builds on that discussion, explaining how she used insights from training runs and automated hyperparameter tuning to explore the tradeoff in greater depth and to draw specific conclusions about the impact of reduced model size on performance for this task.
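For readers curious what a multimetric experiment looks like in code, here is a minimal sketch using the SigOpt Core Python client. The metric names, parameter names, bounds, and the `train_and_evaluate` helper are all illustrative assumptions, not details from Meghana's actual experiment.

```python
# Minimal sketch of a multimetric SigOpt experiment (illustrative only).
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

# Two competing metrics: model quality (maximize) and model size (minimize).
experiment = conn.experiments().create(
    name="BERT size vs. performance (sketch)",
    metrics=[
        dict(name="f1_score", objective="maximize"),
        dict(name="model_size", objective="minimize"),
    ],
    parameters=[
        dict(name="num_layers", type="int", bounds=dict(min=2, max=12)),
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-3)),
    ],
    observation_budget=50,
)

# Standard SigOpt loop: fetch a suggestion, evaluate it, report both metrics.
while experiment.progress.observation_count < experiment.observation_budget:
    suggestion = conn.experiments(experiment.id).suggestions().create()
    f1, size = train_and_evaluate(suggestion.assignments)  # hypothetical helper
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[
            dict(name="f1_score", value=f1),
            dict(name="model_size", value=size),
        ],
    )
    experiment = conn.experiments(experiment.id).fetch()
```

Because the two objectives pull in opposite directions, the optimizer surfaces a Pareto frontier of configurations rather than a single best point, which is what makes the size-versus-performance tradeoff explicit.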
More specifically, Meghana will explain how SigOpt easily integrates into and helps organize her modeling process. She'll walk through critical points of her modeling workflow, describe how she leveraged SigOpt to make informed decisions, and explain how to: