Home
Categories
EXPLORE
True Crime
Comedy
Society & Culture
Business
Sports
TV & Film
Technology
About Us
Contact Us
Copyright
© 2024 PodJoint
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts112/v4/4c/78/c9/4c78c94e-1daa-45a1-8679-d10129ff5eb3/mza_14431068762409693551.jpg/600x600bb.jpg
Department of Statistics
Oxford University
35 episodes
8 months ago
A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or can be very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experimentations on various image classification tasks. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
Show more...
Education
RSS
All content for Department of Statistics is the property of Oxford University and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or can be very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experimentations on various image classification tasks. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
Show more...
Education
https://is1-ssl.mzstatic.com/image/thumb/Podcasts112/v4/4c/78/c9/4c78c94e-1daa-45a1-8679-d10129ff5eb3/mza_14431068762409693551.jpg/600x600bb.jpg
Ethics from the perspective of an applied statistician
Department of Statistics
39 minutes
3 years ago
Ethics from the perspective of an applied statistician
Professor Denise Lievesley discusses ethical issues and codes of conduct relevant to applied statisticians. Statisticians work in a wide variety of different political and cultural environments which influence their autonomy and their status, which in turn impact on the ethical frameworks they employ. The need for a UN-led fundamental set of principles governing official statistics became apparent at the end of the 1980s when countries in Central Europe began to change from centrally planned economies to market-oriented democracies. It was essential to ensure that national statistical systems in such countries would be able to produce appropriate and reliable data that adhered to certain professional and scientific standards. Alongside the UN initiative, a number of professional statistical societies adopted codes of conduct. Do such sets of principles and ethical codes remain relevant over time? Or do changes in the way statistics are compiled and used mean that we need to review and adapt them? For example as combining data sources becomes more prevalent, record linkage, in particular, poses privacy and ethical challenges. Similarly obtaining informed consent from units for access to and linkage of their data from non-survey sources continues to be challenging. Denise draws on her earlier role as a statistician in the United Nations, working with some 200 countries, to discuss some of the ethical issues she encountered then and how these might change over time. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/
Department of Statistics
A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or can be very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experimentations on various image classification tasks. Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales; http://creativecommons.org/licenses/by-nc-sa/2.0/uk/