Workshop on Learning Architectures, Representations, and Optimization for Speech and Visual Information Processing
a workshop in conjunction with
The 28th International Conference on Machine Learning (ICML 2011)
Time: 9am-5:30pm, July 2, 2011
Room: Grand-I Architectures
The program is available here: https://icml2011speechvision.wordpress.com/program/
Overview
This workshop brings together and informs researchers and students from the diverse communities of machine learning, speech recognition, computer vision, signal processing, cognitive science of human auditory and visual perception, optimization, and applied mathematics to further research on deep learning models for real-world applications in computer vision and speech. Special focus is placed on both the commonality and the uniqueness of speech and vision problems, and on how unified learning paradigms and representations can be developed to address problems that have until now been tackled largely by disparate communities.
Through invited talks and panel discussions, we will attempt to address the central topics in learning representations and architectures today, as well as the associated optimization techniques. The workshop also invites paper submissions on the most recent developments in unsupervised learning and hierarchical learning algorithms, theoretical foundations, inference and optimization, semi-supervised and transfer learning, and applications to real-world tasks in speech processing and computer vision. Papers will be presented as oral or poster presentations. Detailed topics of presentations are expected to include (but are not limited to) the following:
- Development of learning models, e.g., deep belief nets, deep neural nets, deep Boltzmann machines, high-order sparse coding, hierarchical generative models, temporal and/or recursive models with deep structure, generative models motivated by physical processes of human speech production and of natural image formation, discriminative models motivated by human speech and visual perception, etc.
- Algorithms for probabilistic inference, optimization strategies when the objective is non-convex, and large-scale implementations associated with the above models.
- Learning biologically inspired feature hierarchies in human visual and auditory signal processing.
- Novel representations via the use of side information in unsupervised feature learning, e.g., spatial correlations in images, sequential dynamics and temporal/spectral correlations in speech, physical constraints in speech production, perceptual constraints in vision, and other prior knowledge.
- Theoretical understanding of the role of unsupervised feature learning in building complex predictive models. Under which conditions does the feature hierarchy provide better regularization or achieve higher statistical efficiency?
- Successes, failures, and lessons learned in real-world applications including understanding of natural scenes, recognition of objects and events, speech recognition under controlled environments, large-vocabulary speech recognition under realistic acoustic environments, auditory coding of speech and music, etc.
Motivation
In recent years, there has been a lot of interest in algorithms that learn hierarchical representations from unlabeled data. Unsupervised learning and deep learning methods, such as sparse coding, restricted Boltzmann machines, deep belief networks, convolutional architectures, recursive compositional models, and hierarchical generative models, have been successfully applied to a variety of tasks in computer vision and speech processing with highly promising results. In this workshop, we will bring together researchers who are interested in learning representations and architectures and in developing efficient and robust optimization algorithms for speech and visual information processing, review the recent technical progress, and discuss the challenges and future research directions.
Impact and expected outcomes
This workshop aims to stimulate vigorous interactions among researchers in machine learning, neural networks, speech recognition, and computer vision. It is intended to accelerate deep learning research and its applications to speech and visual information processing, as researchers in disparate areas learn from each other and jointly establish the foundation of the architectural, representational, and optimization aspects of deep learning related to these two major classes of applications.
With this workshop, we plan to hold in-depth discussions on the current state of the art and the next big challenges in learning representations and architectures, and to propose research directions to the research community. We also hope to stimulate the exchange of ideas among all other members of the ICML community.
In addition to the main presentations, the workshop will include a panel discussion session. The main topics of the discussion will include:
- How to build hierarchical systems
- Principles underlying learning of hierarchical systems: sparsity, reconstruction, (if supervised) what kind of supervision, how to learn and use invariances, how to learn and use variability, etc.
- Similarities and differences of computer vision and speech recognition problems; hand-crafted features (e.g., SIFT vs. MFCC/PLP); learned features; nature of the variability in speech and natural images; nature of the invariance in speech and natural images
- Critiques of the current approaches
- Real-world applications and benchmark datasets
- Scalability: efficiency during training and inference; how to distribute training on massive data over many machines
- Major milestones and goals for the next 5 or 10 years
Panel discussions will be led by the members of the organizing committee as well as by prominent representatives of the vision and speech processing communities.
Call for papers
If you are interested in presenting your work, please submit an extended abstract (1-2 pages in conference proceedings format) via email to icml2011ws.visionspeech@gmail.com. Accepted contributions will be presented as posters.
Invited speakers
Andrew Ng (Stanford University)
Fei-Fei Li (Stanford University)
John Platt (Microsoft Research)
Pedro Domingos (U. Washington)
Xuedong Huang (Microsoft)
Steven Greenberg (Silicon Speech)
Panelists
Andrew Ng (Stanford University)
Fei-Fei Li (Stanford University)
John Platt (Microsoft Research)
Pedro Domingos (U. Washington)
Xuedong Huang (Microsoft)
Steven Greenberg (Silicon Speech)
Guangbin Huang (Nanyang Technological University)
Key Dates
Paper Submission Deadline: May 9, 2011 (extended)
Paper Acceptance Notification: May 20, 2011
Camera Ready Submission: June 10, 2011
Workshop Date: July 2, 2011
Organizers
Li Deng, Microsoft Research
Honglak Lee, University of Michigan
Kai Yu, NEC Laboratories America