(Repost) The major advancements in Deep Learning in 2016


Pablo, Tue, Dec 6, 2016, in MACHINE LEARNING, DEEP LEARNING, GAN

Deep Learning has been the core topic in the Machine Learning community for the last couple of years, and 2016 was no exception. In this article, we will go through the advancements we think have contributed the most (or have the potential) to move the field forward, and how organizations and the community are making sure that these powerful technologies are used in a way that is beneficial for all.

One of the main challenges researchers have historically struggled with has been unsupervised learning. We think 2016 has been a great year for this area, mainly because of the vast amount of work on Generative Models.

Moreover, the ability to communicate naturally with machines has also been one of the dream goals, and several approaches have been presented by giants like Google and Facebook. In this context, 2016 was all about innovation in the Natural Language Processing (NLP) problems that are crucial to reaching this goal.

Unsupervised learning

Unsupervised learning refers to the task of extracting patterns and structure from raw data without extra information, as opposed to supervised learning where labels are needed.

The classical approach to this problem using neural networks has been autoencoders. The basic version consists of a Multilayer Perceptron (MLP) where the input and output layers have the same size, and a smaller hidden layer is trained to recover the input. Once trained, the output of the hidden layer corresponds to a data representation that can be useful for clustering, dimensionality reduction, improving supervised classification and even data compression.
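To make the idea concrete, here is a minimal sketch of an autoencoder in plain Python. It is a simplification under stated assumptions: the layers are linear (no activation function), training is plain stochastic gradient descent on the squared reconstruction error, and the toy data set, the layer sizes and all function names are ours, chosen only for illustration.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def reconstruction_error(data, w_enc, w_dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        y = matvec(w_dec, matvec(w_enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)

def train_autoencoder(data, n_hidden, epochs=300, lr=0.05, seed=0):
    """Train a linear autoencoder; returns (encoder, decoder) weights."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            h = matvec(w_enc, x)   # hidden representation (the "code")
            y = matvec(w_dec, h)   # attempted reconstruction of x
            # Gradient of 0.5 * ||y - x||^2 with respect to y.
            err = [yi - xi for yi, xi in zip(y, x)]
            # Back-propagate the error to the hidden layer before updating.
            grad_h = [sum(w_dec[i][j] * err[i] for i in range(n_in))
                      for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):
                for k in range(n_in):
                    w_enc[j][k] -= lr * grad_h[j] * x[k]
    return w_enc, w_dec

# Toy data: 4-dimensional points that really only have 2 degrees of freedom.
data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0, 1.0], [0.5, 0.2, 0.5, 0.2]]

untrained = train_autoencoder(data, n_hidden=2, epochs=0)
trained = train_autoencoder(data, n_hidden=2, epochs=300)
error_before = reconstruction_error(data, *untrained)
error_after = reconstruction_error(data, *trained)
code = matvec(trained[0], data[0])  # 2-dimensional representation of a sample
```

The hidden vector `code` is the compressed representation mentioned above; a real autoencoder would add non-linearities and would normally be built with one of the frameworks listed at the end of this post.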

Generative Adversarial Networks (GANs)

Recently, a new approach based on generative models has emerged. Called Generative Adversarial Networks (GANs), it has enabled models to tackle unsupervised learning. GANs are a real revolution. Such has been the impact of this research that, in a presentation, Yann LeCun (one of the fathers of Deep Learning) said that GANs are the most important idea in Machine Learning in the last 20 years.

Although GANs were introduced in 2014 by Ian Goodfellow, it is in 2016 that they have started to show their real potential. Improved training techniques and better architectures (Deep Convolutional GAN) introduced this year have fixed some of the previous limitations, and new applications (we list some of them later) are revealing how powerful and flexible they can be.

The intuitive idea

Imagine an aspiring painter who wants to do art forgery (G), and someone who wants to earn his living by judging paintings (D). You start by showing D some examples of work by Picasso. Then G produces paintings in an attempt to fool D every time, making him believe they are Picasso originals. Sometimes G succeeds; however, as D starts learning more about Picasso's style (by looking at more examples), G has a harder time fooling D, so he has to do better. As this process continues, not only does D get really good at telling apart what is a Picasso and what is not, but G also gets really good at forging Picasso paintings. This is the idea behind GANs.

Technically, GANs consist of a constant push between two networks (thus “adversarial”): a generator (G) and a discriminator (D). Given a set of training examples (such as images), we can imagine that there is an underlying distribution (x) that governs them. With GANs, G will generate outputs and D will decide whether or not they come from the same distribution as the training set.

G will start from some noise z, so the generated images are G(z). D takes real images (from the distribution) and fake ones (from G) and classifies them: D(x) and D(G(z)).

How a GAN works.
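As a rough illustration of how the two outputs D(x) and D(G(z)) drive training, the sketch below computes the losses usually associated with the original GAN formulation, assuming D outputs the probability that its input is real. The function names are ours, and the generator loss shown is the commonly used "non-saturating" variant rather than the only possible choice.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy that D minimises: real samples should score 1, fakes 0.

    d_real holds values of D(x), d_fake holds values of D(G(z)),
    all strictly between 0 and 1.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D(G(z)) to be close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that cannot tell real from fake answers 0.5 everywhere.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates them well has a lower loss.
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
```

Training alternates between lowering `discriminator_loss` by updating D and lowering `generator_loss` by updating G, which is the push and pull described above.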

D and G are both learning at the same time, and once G is trained it knows enough about the distribution of the training samples that it can generate new samples that share very similar properties:

Images generated by a GAN.

These images were generated by a GAN trained on CIFAR-10. If you pay attention to the details, you can see they are not actually real objects. However, there is something to them that captures a certain concept and makes them look real from a distance.

InfoGAN

Recent developments have extended the GAN idea not only to approximate the data distribution, but also to learn interpretable, useful vector representations of the data. These vector representations need to capture rich information (the same as in autoencoders) and also need to be interpretable, meaning that we can distinguish the parts of the vector that contribute to a specific type of transformation in the generated outputs.

The InfoGAN model proposed by OpenAI researchers in August addresses this issue. In a nutshell, InfoGAN is able to generate representations that contain information about the dataset in an unsupervised way. For instance, when applied to the MNIST dataset it is able to infer the type of number (1, 2, 3, …), the rotation and the width of the generated samples without the need for manually tagged data.
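The sketch below illustrates only what "interpretable parts of the vector" means in practice; it is not InfoGAN itself. It assumes a generator input made of a categorical code, a few continuous codes and unstructured noise (the sizes 10, 2 and 62 are illustrative assumptions), and shows how one would vary a single code while holding everything else fixed. The part that makes the codes meaningful, a training term that maximises the mutual information between the codes and the generated output, is left out.

```python
import random

def sample_latent(rng, n_categories=10, n_continuous=2, n_noise=62):
    """Build one generator input: [one-hot category | continuous codes | noise]."""
    category = rng.randrange(n_categories)
    one_hot = [1.0 if i == category else 0.0 for i in range(n_categories)]
    continuous = [rng.uniform(-1.0, 1.0) for _ in range(n_continuous)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
    return one_hot + continuous + noise

def sweep_code(latent, index, values):
    """Return copies of `latent` that differ only at position `index`."""
    variants = []
    for value in values:
        variant = list(latent)
        variant[index] = value
        variants.append(variant)
    return variants

rng = random.Random(0)
latent = sample_latent(rng)
# In this layout, index 10 is the first continuous code.
variants = sweep_code(latent, index=10, values=[-1.0, 0.0, 1.0])
```

Feeding each entry of `variants` to a trained generator would, if that code had learned to represent rotation, produce the same digit at three different angles.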

Conditional GANs

Another extension of GANs is a class of models called Conditional GANs (cGANs). These models are able to generate samples taking into account external information (a class label, text, another image), using it to force G to generate a particular type of output. Several applications of this idea have recently surfaced.

You can check more about generative models in this blog post or in this talk by Ian Goodfellow.
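One simple way to feed external information to a Conditional GAN is to append it to the inputs of both networks. The sketch below assumes the condition is a class label encoded as a one-hot vector and that inputs are flat lists of numbers; the helper names are ours, and real models often inject the condition in more elaborate ways.

```python
import random

def one_hot(label, n_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]

def generator_input(noise, label, n_classes):
    """G receives the noise z together with the condition."""
    return noise + one_hot(label, n_classes)

def discriminator_input(sample, label, n_classes):
    """D judges a sample together with the condition it should match."""
    return sample + one_hot(label, n_classes)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
g_in = generator_input(noise, label=3, n_classes=10)
d_in = discriminator_input([0.2, 0.4], label=1, n_classes=3)
```

Because D also sees the label, G is penalised not only for unrealistic samples but also for realistic samples of the wrong class.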

Natural Language Processing

In order to be able to have fluent conversations with machines, several issues need to be solved first: text understanding, question answering and machine translation.

Text understanding

Salesforce MetaMind has built a new model called Joint Many-Tasks (JMT) with the objective of creating a single model able to learn five common NLP tasks:

  • Part-of-speech tagging: assign a part of speech to each word, such as noun, verb or adjective.
  • Chunking: also called shallow parsing. It involves a range of tasks, like finding noun or verb groups.
  • Dependency parsing: identify syntactic relationships (such as an adjective modifying a noun) between words.
  • Semantic relatedness: measure the semantic distance between two sentences. The result is a real-valued score.
  • Textual entailment: determine whether a premise sentence entails a hypothesis sentence. Possible classes: entailment, contradiction and neutral.

The magic behind this model is that it is end-to-end trainable. This means it allows collaboration between different layers, so that lower-layer tasks (which are less complex) improve thanks to the results from higher layers (more complex tasks). This is something new compared to older ideas, which could only use lower layers to improve higher-level ones, but not the other way around. As a result, this model achieves state-of-the-art results in all tasks but POS tagging (where it came out in second place).

Question Answering

MetaMind also presented a new model called Dynamic Coattention Network (DCN) for the question answering problem, which builds on a pretty intuitive idea.

Imagine I was going to give you a long text and ask you a question about it. Would you prefer to read the text first and then be asked the question, or be given the question before you actually start reading the text? Naturally, knowing in advance what the question will be conditions you, so you know what to pay attention to. Otherwise, you would have to pay equal attention to everything and keep track of every detail and dependency, to cover all possible future questions.

DCN does the same thing. First, it generates an internal representation of the document conditioned on the question that it is trying to answer, and then it starts iterating over a list of possible answers, converging to the final answer.
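The sketch below shows the general idea of conditioning a document representation on a question, using plain dot-product attention. It is deliberately much simpler than DCN: the real model attends in both directions (document to question and question to document) and adds an iterative answer decoder, neither of which appears here. The vectors and function names are made up for illustration.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def question_conditioned_summary(doc_vectors, question_vector):
    """Weight each document word by its relevance to the question."""
    scores = [dot(d, question_vector) for d in doc_vectors]
    weights = softmax(scores)
    dim = len(question_vector)
    summary = [sum(w * d[i] for w, d in zip(weights, doc_vectors))
               for i in range(dim)]
    return weights, summary

# Three document "words" and a question that is most similar to the first.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question_vector = [2.0, 0.0]
weights, summary = question_conditioned_summary(doc_vectors, question_vector)
```

The same document yields a different `summary` for every question, which is the "knowing the question in advance" effect described above.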

Machine Translation

In September, Google presented a new model used by their translation service called Google Neural Machine Translation (GNMT). This model is trained separately for each pair of languages, such as Chinese-English.

A new GNMT version was announced in November. It goes a step further, training a single model that is able to translate between multiple pairs of languages. The only difference from the previous model is that GNMT now takes a new input that specifies the target language. It also enables zero-shot translation, meaning that it is able to translate a pair of languages that it wasn’t trained on.
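The extra input can be as simple as an artificial token placed at the start of the source sentence. The sketch below assumes a token of the form `<2xx>`, where `xx` is a language code; the helper name is ours, and the snippet covers only this preprocessing step, not the translation model.

```python
def add_target_token(sentence, target_lang):
    """Prefix a source sentence with a token naming the target language."""
    return f"<2{target_lang}> {sentence}"

# The same English sentence, prepared for two different target languages.
examples = [add_target_token("Hello, how are you?", lang) for lang in ("es", "ja")]
```

Since the rest of the model is shared across all language pairs, the token is the only thing telling it which language to produce, and that is what makes it possible to request a pair that never appeared together during training.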

GNMT results show that training it on multiple pairs of languages is better than training on a single pair, demonstrating that it is able to transfer the “translation knowledge” from one language pair to another.

Community

Several corporations and entrepreneurs have created non-profits and partnerships to discuss the future of Machine Learning and to make sure that these impressive technologies are used properly, in favor of the community.

OpenAI is a non-profit organization that aims to collaborate with the research and industry community and to release its results to the public for free. It was created in late 2015 and started delivering its first results (publications like InfoGAN, platforms like Universe and (un)conferences) in 2016. The motivation behind it is to make sure that AI technology is reachable for as many people as possible and, by doing so, to avoid the creation of AI superpowers.

In addition, a Partnership on AI was signed by Amazon, DeepMind, Google, Facebook, IBM and Microsoft. The goal is to advance public understanding of the field, support best practices and develop an open platform for discussion and engagement.

Another aspect worth highlighting is the openness of the research community. Not only can you find almost any publication on sites like Arxiv (or Arxiv-Sanity) for free, but you can also now replicate their experiments by using the same code. One useful tool is GitXiv, which links Arxiv papers with their open source project repository.

Open source tools are everywhere (as we highlighted in our 10 main takeaways from MLconf SF blogpost). They are used and created by researchers and companies. Here is a list of the most popular tools in 2016 for Deep Learning:

  • TensorFlow by Google.
  • Keras by François Chollet.
  • CNTK by Microsoft.
  • MXNET by Distributed (Deep) Machine Learning Community. Adopted by Amazon.
  • Theano by Université de Montréal.
  • Torch by Ronan Collobert, Koray Kavukcuoglu, Clement Farabet. Widely used by Facebook.

Final Thoughts

It is a great time to be part of the recent Machine Learning developments. As you can see, this year has been particularly exciting; the research is moving at such a rapid pace that it’s hard to keep up with the latest advancements. We are truly lucky to be living in an era where AI has been democratized.

At Tryolabs we are working on some very interesting projects with these technologies. We promise to keep you all posted with our findings and continue sharing experiences with the industry and all the interested developers out there.

We reviewed a lot in this post, but there were many other great developments that we had to leave out. If you feel we have not done enough justice to some of these, please feel free to say so in the comments below!

Update (12/07/2016): follow the discussion of this post on Hacker News and /r/MachineLearning. There are a lot of awesome contributions!


 