Sunday, April 24, 2022

Efficient training of deep networks with unitary matrices


Yann LeCun has given a few informal talks and published a couple of monographs about why "big data" is frequently not a good approach to solving difficult classification problems. He cites his own failures at building models that look at radiological pictures (X-rays) to answer yes/no questions about a patient: almost any skilled radiologist solves these problems easily, but ML models cannot. The spectacular failures of IBM Watson in other medical fields are another example. LeCun is looking for methods of using "good data" instead of "big data" to solve several narrow problems, and then generalizing the approach to bigger problems with bigger data.

Two of the major problems in applying recurrent neural net (RNN) deep learning to long data sequences are setting the initial conditions and stabilizing the learning process. Training normally iterates a process of applying a linear transformation and then a pointwise nonlinearity to the state data. Because the backpropagated gradient is multiplied by the recurrent matrix at every time step, it can shrink toward zero or blow up toward infinity (the vanishing/exploding gradient problem), and learning fails completely.
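To see the mechanism, here is a minimal NumPy sketch (my own illustration, not anything from the paper; the sizes and scaling factors are arbitrary) that pushes a gradient vector back through T linear steps and measures its norm:

import numpy as np

rng = np.random.default_rng(0)
N, T = 64, 200  # state size, sequence length

def backprop_norm(W, T):
    # In a linear RNN the backpropagated gradient is multiplied by the
    # transpose of the recurrent matrix W at every time step, so its
    # long-run behavior is governed by the spectral radius of W.
    g = rng.standard_normal(N)
    g /= np.linalg.norm(g)
    for _ in range(T):
        g = W.T @ g
    return np.linalg.norm(g)

A = rng.standard_normal((N, N)) / np.sqrt(N)  # spectral radius close to 1

print(backprop_norm(0.7 * A, T))  # radius < 1: the gradient vanishes (~1e-31)
print(backprop_norm(1.3 * A, T))  # radius > 1: the gradient explodes (~1e+22)

# An orthogonal (real unitary) matrix preserves the norm exactly.
Q, _ = np.linalg.qr(A)
print(backprop_norm(Q, T))        # stays at 1.0 no matter how large T is

The last line is the whole motivation for the paper below: a unitary recurrent matrix keeps the gradient norm constant for any sequence length.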

Bobak Kiani and a few co-authors, including Yann LeCun, have published a cool trick (the "projUNN" paper) for preventing these failures: keep the recurrent matrix unitary throughout training. The eigenvalues of a unitary matrix all have magnitude exactly one, so repeated multiplication can neither shrink nor amplify the gradient. Not only does training stay stable, but each update runs in O(kN^2) time, where N is the state dimension and k is the rank of the gradient update. The authors claim the new algorithm is faster than previous methods for training unitary networks, and that even with k=1 it is nearly as accurate.
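A minimal sketch of the general idea in NumPy, not the authors' algorithm: the function names and learning rate here are my own, and the full SVDs below cost O(N^3); the whole point of projUNN is to exploit the rank-k structure of the update so a step costs only O(kN^2).

import numpy as np

def project_to_unitary(A):
    # Closest unitary matrix to A in Frobenius norm (polar decomposition).
    W, _, Vh = np.linalg.svd(A)
    return W @ Vh

def rank_k(G, k):
    # Best rank-k approximation of the gradient G (truncated SVD).
    W, s, Vh = np.linalg.svd(G)
    return (W[:, :k] * s[:k]) @ Vh[:k, :]

def unitary_step(U, G, lr=0.05, k=1):
    # One training step that keeps U on the unitary manifold:
    # apply a rank-k version of the gradient, then project back.
    return project_to_unitary(U - lr * rank_k(G, k))

# Usage: start from a random orthogonal (real unitary) matrix and check
# that one step leaves it unitary.
rng = np.random.default_rng(0)
N = 32
U = project_to_unitary(rng.standard_normal((N, N)))
G = rng.standard_normal((N, N))          # stand-in for a real gradient
U = unitary_step(U, G)
print(np.allclose(U @ U.T, np.eye(N)))   # True

Because the updated matrix is projected back after every step, the eigenvalue-magnitude-one property that stabilizes the gradient is preserved throughout training.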
