As artificial intelligence becomes increasingly critical to the everyday workflow of enterprises, including increasing usage within security, computer scientists in the AI community are attempting to make the process of actually coming up with these decisions work better.
Neural networks are used in the inference part of the overall process, and are critical to decision making. There has been a recent flurry of academic papers published that address how to optimize these networks from both a training and operational standpoint.
One paper, "Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm" from Charbel Sakr and Naresh Shanbhag of the University of Illinois at Urbana-Champaign, describes a "precision assignment methodology for neural network training in which all network parameters, i.e., activations and weights in the feedforward path, gradients and weight accumulators in the feedback path, are assigned close to minimal precision. The precision assignment is derived analytically and enables tracking the convergence behavior of the full precision training, known to converge a priori."
So what does this mean?
This approach reduces the complexity of the neural network by using minimal precision, which also reduced the amount of work needed to train the algorithm. The paper describes an attempt to bypass the usual training methods that involve the stochastic gradient descent algorithm, which can make things very slow and energy hungry.
This differs from the approach described in the paper "Exploring Weight Symmetry in Deep Neural Networks" by Xu Shell Hu, Sergey Zagoruyko, and Nikos Komodakis of Université Paris-Est, École des Ponts ParisTech in Paris. Here, they propose imposing "symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines."
One might expect a significant drop in accuracy due to reduction in the number of parameters brought about by the symmetry constraints. However, the authors show that this
is not the case. Further, they find that depending on network size, symmetry can have little or no negative effect on network accuracy.
In the paper, the researchers write that "symmetry parameterizations satisfy universal approximation property for single hidden layer networks" with 25% less parameters yielding only a 0.2% accuracy loss.
Other researchers are looking at totally different aspects of the overall process.
In a paper entitled "Stanza: Distributed Deep Learning with Small Communication Footprint," Xiaorui Wu, Hong Xu, and Bo Li of the City University of Hong Kong, along with Yongqiang Xiong of Microsoft, look at how each compartmentalized part of a parameter server system works to train the complete model.
They found that the data transfer between the convolutional layers and the fully connected layers has a non-negligible impact on training time."
Their paper proposes layer separation in distributed training. The majority of the nodes will only train the convolutional layers, and the rest will train the fully connected layers only. Gradients and parameters of the fully connected layers no longer need to be exchanged across the cluster, which substantially reduces the data transfer volume and the time needed to do the data transfers.
In their conclusion, the researchers determined that on Amazon EC2 instances with an Nvidia Tesla V100 GPU and 10GB bandwidth, their system -- called Stanza -- is 1.34x to 13.9x faster for common deep learning models.
These are all ways to optimizing existing neural net architectures that remain similar in concept to those that were first developed. How that architecture may change for the better remains to be seen.
— Larry Loeb has written for many of the last century's major "dead tree" computer magazines, having been, among other things, a consulting editor for BYTE magazine and senior editor for the launch of WebWeek.