
Invited speakers

  • Devi Parikh
  • David Blei
  • Hinrich Schütze
  • Devi Parikh

    Title: Words, Pictures, and Common Sense

    Wouldn't it be nice if machines could understand content in images and communicate this understanding as effectively as humans? Such technology would be immensely powerful, be it for helping a visually impaired user navigate a world built by the sighted, assisting an analyst in extracting relevant information from a surveillance feed, educating a child playing a game on a touch screen, providing information to a spectator at an art gallery, or interacting with a robot. As computer vision and natural language processing techniques mature, we are closer to achieving this dream than we have ever been.

    Visual Question Answering (VQA) is one step in this direction. Given an image and a natural language question about the image (e.g., “What kind of store is this?”, “How many people are waiting in the queue?”, “Is it safe to cross the street?”), the machine’s task is to automatically produce an accurate natural language answer (“bakery”, “5”, “Yes”). In this talk, I will present our dataset, some neural models, and open research questions in free-form and open-ended VQA. I will also show a teaser of the next step: Visual Dialog. Instead of answering individual questions about an image in isolation, can we build machines that can hold a sequential natural language conversation with humans about visual content? (A toy sketch of the image-plus-question-to-answer interface appears after this entry.)

    While machines are getting better at superficially connecting words to pictures, interacting with them quickly reveals that they lack a certain common sense about the world we live in. Common sense is a key ingredient in building intelligent machines that make "human-like" decisions when performing tasks -- be it automatically answering natural language questions, or understanding images and videos. How can machines learn this common sense? While some of this knowledge is explicitly stated in human-generated text (books, articles, blogs, etc.), much of this knowledge is unwritten. While unwritten, it is not unseen! The visual world around us is full of structure bound by commonsense laws. But machines today cannot learn common sense directly by observing our visual world because they cannot accurately perform detailed visual recognition in images and videos. We argue that one solution is to give up on photorealism. We propose to leverage abstract scenes -- cartoon scenes made from clip art by crowdsourced workers -- to teach our machines common sense. I will demonstrate how knowledge learned from this abstract world can be used to solve commonsense textual tasks.



    Biography:
    Devi Parikh is an Assistant Professor in the School of Interactive Computing at Georgia Tech, and a Visiting Researcher at Facebook AI Research (FAIR). From 2013 to 2016, she was an Assistant Professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech. From 2009 to 2012, she was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She has held visiting positions at Cornell University, University of Texas at Austin, Microsoft Research, MIT, and Carnegie Mellon University. She received her M.S. and Ph.D. degrees from the Electrical and Computer Engineering department at Carnegie Mellon University in 2007 and 2009 respectively. She received her B.S. in Electrical and Computer Engineering from Rowan University in 2005. Her research interests include computer vision and AI in general, and visual recognition problems in particular. Her recent work involves exploring problems at the intersection of vision and language, and leveraging human-machine collaboration for building smarter machines. She has also worked on other topics such as ensembles of classifiers, data fusion, inference in probabilistic models, 3D reassembly, barcode segmentation, computational photography, interactive computer vision, contextual reasoning, hierarchical representations of images, and human-debugging. She is a recipient of an NSF CAREER award, a Sloan Research Fellowship, an Office of Naval Research (ONR) Young Investigator Program (YIP) award, an Army Research Office (ARO) Young Investigator Program (YIP) award, an Allen Distinguished Investigator Award in Artificial Intelligence from the Paul G. Allen Family Foundation, four Google Faculty Research Awards, an Amazon Academic Research Award, an Outstanding New Assistant Professor award from the College of Engineering at Virginia Tech, a Rowan University Medal of Excellence for Alumni Achievement, Rowan University's 40 under 40 recognition, and a Marr Best Paper Prize awarded at the International Conference on Computer Vision (ICCV).

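
    The following is a small, illustrative sketch of the VQA interface described above: an image representation and a tokenized question go in, and a score over a fixed set of frequent answers comes out. It is not the speaker's model; it assumes PyTorch, and all names, dimensions, and the elementwise-product fusion are choices made for the example.

    import torch
    import torch.nn as nn

    class ToyVQA(nn.Module):
        """Toy answer-classification VQA model: encode question, fuse with image, score answers."""
        def __init__(self, vocab_size, num_answers, img_dim=2048, hidden=512):
            super().__init__()
            self.word_emb = nn.EmbeddingBag(vocab_size, hidden)  # bag-of-words question encoder
            self.img_proj = nn.Linear(img_dim, hidden)           # project precomputed image features
            self.classifier = nn.Linear(hidden, num_answers)     # score a fixed answer vocabulary

        def forward(self, question_ids, image_feat):
            q = self.word_emb(question_ids)                      # (batch, hidden)
            v = torch.tanh(self.img_proj(image_feat))            # (batch, hidden)
            return self.classifier(q * v)                        # elementwise-product fusion, then answer scores

    # Toy usage with random stand-ins for a real image feature and a tokenized question.
    model = ToyVQA(vocab_size=10000, num_answers=1000)
    question = torch.randint(0, 10000, (1, 6))   # e.g. "what kind of store is this"
    image = torch.randn(1, 2048)                 # e.g. pooled CNN features
    print(model(question, image).argmax(dim=1))  # index of the highest-scoring answer, e.g. "bakery"

    In practice the image features would come from a pretrained vision model and the question from a recurrent or attention-based encoder; the sketch only fixes the shape of the task.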

  • David Blei

    Title: Probabilistic Topic Models and User Behavior

    Topic modeling algorithms analyze a document collection to estimate its latent thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles on a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills. Behavior data is essential both for making predictions about users (such as for a recommendation system) and for understanding how a collection and its users are organized.

    I will review the basics of topic modeling and describe our recent research on collaborative topic models, which simultaneously analyze a collection of texts and its corresponding user behavior. We studied collaborative topic models on 80,000 scientists' libraries from Mendeley and 100,000 users' click data from the arXiv. Collaborative topic models enable interpretable recommendation systems, capturing scientists' preferences and pointing them to articles of interest. Further, these models can organize the articles according to the discovered patterns of readership. For example, we can identify articles that are important within a field and articles that transcend disciplinary boundaries. (A toy numerical sketch of this idea appears after this entry.)



    Biography:
    David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data. David has received several awards for his research, including a Sloan Fellowship (2010), an Office of Naval Research Young Investigator Award (2011), a Presidential Early Career Award for Scientists and Engineers (2011), a Blavatnik Faculty Award (2013), and the ACM-Infosys Foundation Award (2013). He is a fellow of the ACM.

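
    Below is a toy numerical sketch of the collaborative topic modeling idea, loosely following the structure of collaborative topic regression: an article's latent vector is its topic proportions plus an offset learned from reader behavior, and a user's predicted interest is a dot product with their preference vector. It assumes NumPy; all sizes and values are illustrative, and a real model would fit these quantities jointly rather than draw them at random.

    import numpy as np

    rng = np.random.default_rng(0)
    K, n_users, n_articles = 10, 5, 8

    theta = rng.dirichlet(np.ones(K), size=n_articles)    # per-article topic proportions (stand-in for LDA output)
    epsilon = 0.1 * rng.standard_normal((n_articles, K))  # per-article offset capturing readership beyond the text
    v = theta + epsilon                                   # article latent vectors
    u = rng.standard_normal((n_users, K))                 # user preference vectors

    scores = u @ v.T                                      # predicted interest of each user in each article
    print(scores.argmax(axis=1))                          # top recommended article per user

    # Articles with a large offset relative to their topic proportions are read more
    # broadly than their text alone predicts -- one way to surface the articles that
    # "transcend disciplinary boundaries" mentioned in the abstract.
    print(np.linalg.norm(epsilon, axis=1).round(3))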

  • Hinrich Schütze

    Title: Don't cram two completely different meanings into a single !&??@#^$% vector! Or should you?

    It is tempting to interpret a high-dimensional embedding space cartographically, i.e., as a map in which each point represents a distinct, identifiable meaning -- just as cities and mountains on a real map represent distinct, identifiable geographic locations. On this interpretation, ambiguous words pose a problem: how can two completely different meanings be in the same location? Instead of learning a single embedding for an ambiguous word, should we rather learn a different embedding for each of its senses (as has often been proposed)? In this talk, I will take a fresh look at this question, drawing on simulations with pseudowords, sentiment analysis experiments, psycholinguistics, and -- if time permits -- lexicography. (A toy illustration of the single-vector-versus-per-sense contrast appears after this entry.)




    Biography:
    Hinrich Schütze is Professor of Computational Linguistics and Director of the Center for Information and Language Processing at LMU Munich. He received his PhD from Stanford University's Department of Linguistics in 1995 and, from 1995 to 2004 and again in 2008/09, worked on natural language processing and information retrieval technology at Xerox PARC, at several Silicon Valley startups, and at Google. He coauthored Foundations of Statistical Natural Language Processing (with Chris Manning) and Introduction to Information Retrieval (with Chris Manning and Prabhakar Raghavan). His research is motivated by a fundamental question that computational linguists face today: Is domain knowledge about language dispensable (as many in deep learning seem to believe), or can linguistics and statistical NLP learn and benefit from each other?

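
    The toy illustration below (NumPy, with made-up vectors) shows the tension the title points at: a single vector for an ambiguous word ends up between its senses, while per-sense vectors stay close to their own neighborhoods. Whether the merged vector actually hurts downstream tasks is precisely what the talk examines.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(1)
    dim = 50
    finance_sense = rng.standard_normal(dim)  # "bank" as a financial institution
    river_sense = rng.standard_normal(dim)    # "bank" as the side of a river

    # A single vector trained on mixed contexts roughly averages the two senses.
    single_vector = 0.5 * (finance_sense + river_sense)

    money = finance_sense + 0.3 * rng.standard_normal(dim)  # a neighbor word from finance contexts
    shore = river_sense + 0.3 * rng.standard_normal(dim)    # a neighbor word from river contexts

    print("single vector vs 'money':", round(cosine(single_vector, money), 2))
    print("single vector vs 'shore':", round(cosine(single_vector, shore), 2))
    print("finance sense vs 'money':", round(cosine(finance_sense, money), 2))
    print("river sense   vs 'shore':", round(cosine(river_sense, shore), 2))
    # The per-sense vectors match their own neighborhoods more sharply than the
    # merged vector, which is the intuition behind multi-sense embeddings.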