A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or can be very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experimentations on various image classification tasks. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
All content for Department of Statistics is the property of Oxford University and is served directly from their servers
with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or can be very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experimentations on various image classification tasks. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
Graduate Lecture - Thursday 3rd June 2021, with Dr Fergus Boyles. Department of Statistics, University of Oxford. Drug discovery is a long and laborious process, with ever growing costs and dwindling productivity making it ever more difficult to bring new medicines to the market in an affordable and timely fashion. There is a long history of applying statistical modelling and machine learning to problems in drug discovery, and, as in many fields, there is growing excitement about the potential of modern machine learning techniques to both automate and accelerate time-consuming tasks, and to enable previously unfeasible experiments. In this talk I will describe the drug discovery pipeline and introduce computer-aided drug discovery. Drawing on my own research and that of others, I will explain how machine learning is currently being applied to problems in drug discovery and highlight challenges and pitfalls that remain to be addressed.
Department of Statistics
A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or can be very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experimentations on various image classification tasks. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/