PUBLICATIONS

    Q. Qiu, J. Lezama, A. Bronstein, G. Sapiro, ForestHash: Semantic hashing with shallow random forests and tiny convolutional networks, Proc. European Conf. on Computer Vision (ECCV)

    Hash codes are efficient data representations for coping with the ever-growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf, and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity across trees, rendering the random forests approach to hashing challenging. To address this, we propose to first randomly group the arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem that can be handled using a lightweight CNN weak learner. Such a random class grouping scheme enables code uniqueness by enforcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, while performing on par with state-of-the-art image classification techniques and utilizing a more compact and efficient scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of lightweight CNNs, instead of simply going deeper.
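
    The leaf-code construction described above is easy to sketch in Python. The snippet below is a minimal illustration of the simple starting scheme only (one bit per visited tree leaf, using off-the-shelf trees); the paper's random class grouping, CNN weak learners, low-rank loss, and information-theoretic code aggregation are omitted, and all names are illustrative.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def leaf_codes(forest, X):
            # apply() returns, for every sample, the index of the leaf it reaches in each tree
            leaves = forest.apply(X)                            # (n_samples, n_trees)
            blocks = []
            for t, tree in enumerate(forest.estimators_):
                block = np.zeros((X.shape[0], tree.tree_.node_count), dtype=np.uint8)
                block[np.arange(X.shape[0]), leaves[:, t]] = 1  # '1' at the visited leaf, '0' elsewhere
                blocks.append(block)
            return np.hstack(blocks)                            # concatenated per-tree codes

        # Toy usage on random data
        rng = np.random.default_rng(0)
        X, y = rng.normal(size=(200, 16)), rng.integers(0, 4, size=200)
        forest = RandomForestClassifier(n_estimators=4, max_depth=3).fit(X, y)
        H = leaf_codes(forest, X)

    In the paper, such per-tree codes are not used directly; the aggregation step compresses them into a single near-optimal hash per class.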

    E. Schwartz, R. Giryes, A. M. Bronstein, DeepISP: Towards learning an end-to-end image processing pipeline, IEEE Trans. on Image Processing

    We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in the objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.

    E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein, RepMet: Representative-based metric learning for classification and one-shot object detection, arXiv:1806.04728

    Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that jointly learns the embedding space and the data distribution of the training categories in a single training process. Our method improves upon leading algorithms for DML-based object classification. Furthermore, it opens the door to a new computer vision task, few-shot object detection, since the proposed DML architecture can be naturally embedded as the classification head of any standard object detector. In numerous experiments, we achieve state-of-the-art classification results on a variety of fine-grained datasets, and offer the community a benchmark on the few-shot detection task, performed on the ImageNet-LOC dataset.
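
    As a rough sketch of how such a DML architecture can serve as a classification head, the following PyTorch snippet scores an embedding against learned per-class representatives; the class name, the single spherical-Gaussian scoring, and all parameters are illustrative assumptions, not the paper's exact formulation.

        import torch
        import torch.nn as nn

        class RepresentativeHead(nn.Module):
            def __init__(self, n_classes, k_reps, dim, sigma=0.5):
                super().__init__()
                # k_reps learned representatives per class in the embedding space
                self.reps = nn.Parameter(torch.randn(n_classes, k_reps, dim))
                self.sigma = sigma

            def forward(self, emb):                                   # emb: (batch, dim)
                n, k, d = self.reps.shape
                dist = torch.cdist(emb, self.reps.reshape(n * k, d))  # (batch, n * k)
                dist = dist.reshape(-1, n, k).min(dim=2).values       # nearest representative per class
                return torch.softmax(-dist ** 2 / (2 * self.sigma ** 2), dim=1)

    Because such a head consumes only an embedding vector, it can replace the linear classifier of any backbone or detector, which is what enables the few-shot detection use case.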

    C. Baskin, E. Schwartz, E. Zheltonozhskii, N. Liss, R. Giryes, A. M. Bronstein, A. Mendelson, UNIQ: Uniform noise injection for non-uniform quantization of neural networks, arXiv:1804.10969

    We present a novel method for training a neural network amenable to inference in low-precision arithmetic with quantized weights and activations. The training is performed in full precision with random noise injection emulating quantization noise. In order to circumvent the need to simulate realistic quantization noise distributions, the weight distributions are uniformized by a non-linear transformation, and uniform noise is injected. This procedure emulates a non-uniform k-quantile quantizer at inference time, which adapts to the specific distribution of the quantized parameters. As a by-product of injecting noise into the weights, we find that activations can also be quantized down to 8 bits with only a minor accuracy degradation. The method achieves state-of-the-art results for training low-precision networks on ImageNet. In particular, we observe no degradation in accuracy for MobileNet and ResNet-18/34/50 on ImageNet with as low as 4-bit quantization of weights, and our solution achieves state-of-the-art accuracy in the low computational budget regime compared to similar models.
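
    A minimal PyTorch sketch of the training-time noise injection, assuming a per-tensor uniform quantization grid; the paper's uniformizing transformation and k-quantile quantizer are omitted, so this illustrates the principle rather than the method itself.

        import torch

        def noisy_weights(w, n_bits=4):
            # Step of a uniform grid spanning the weight range
            step = (w.max() - w.min()) / (2 ** n_bits - 1)
            # Uniform noise in [-step/2, step/2] stands in for quantization error;
            # used in the forward pass during training, while the stored weights stay full precision
            return w + (torch.rand_like(w) - 0.5) * step

    At inference time the noise is dropped and the weights are actually quantized, so during training the network sees perturbations emulating the rounding it will encounter when deployed.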

    S. Vedula, O. Senouf, A. M. Bronstein, O. V. Michailovich, M. Zibulevsky, Towards CT-quality ultrasound imaging using deep learning, arXiv:1710.06304

    The cost-effectiveness and practical harmlessness of ultrasound imaging have made it one of the most widespread tools for medical diagnosis. Unfortunately, beamforming-based image formation produces granular speckle noise, blurring, shading and other artifacts. To overcome these effects, the ultimate goal would be to reconstruct the tissue acoustic properties by solving a full wave propagation inverse problem. In this work, we take a step towards this goal, using multi-resolution convolutional neural networks (CNN). As a result, we are able to reconstruct CT-quality images from the reflected ultrasound radio-frequency (RF) data obtained by simulation from real CT scans of a human body. We also show that the CNN is able to imitate existing computationally heavy despeckling methods, thereby saving orders of magnitude in computations and making them amenable to real-time applications.

    O. Litany, T. Remez, A. Bronstein, Cloud Dictionary: Sparse coding and modeling for point clouds, arXiv:1612.04956

    With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving. Parsimony-based algorithms have shown great success on images and videos where data points are sampled on a regular Cartesian grid. We propose an adaptation of these techniques to irregularly sampled signals by using continuous dictionaries. We present an example application in the form of point cloud denoising.

    T. Remez, O. Litany, R. Giryes, A. Bronstein, Deep class-aware denoising, arXiv:1701.01698

    The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categorized into a small set of semantic classes. However simple, this observation has not been exploited in image denoising until now. In this paper, we demonstrate how the reconstruction quality improves when a denoiser is aware of the type of content in the image. To this end, we first propose a new fully convolutional deep neural network architecture which is simple yet powerful, as it achieves state-of-the-art performance even without being class-aware. We further show that a significant boost in performance of up to 0.4 dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.

    T. Remez, O. Litany, R. Giryes, A. Bronstein, Deep convolutional denoising of low-light images, arXiv:1701.01687

    The Poisson distribution is used for modeling noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing like spectral imaging or astronomy, the problem is relevant to regular photography now more than ever due to the booming market for mobile cameras. The restricted form factor limits the amount of absorbed light, thus calling for computational post-processing. In this paper, we make use of the powerful framework of deep convolutional neural networks for Poisson denoising. We demonstrate that by training the same network with images having a specific peak value, our denoiser outperforms the previous state-of-the-art by a large margin both visually and quantitatively. Being flexible and data-driven, our solution obviates the heavy ad hoc engineering used in previous methods and is an order of magnitude faster. We further show that by adding a reasonable prior on the class of the image being processed, another significant boost in performance is achieved.
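
    The peak-value noise model referred to above can be emulated as follows; this NumPy sketch shows the common way of synthesizing training pairs for such a denoiser (the default peak of 4 photons is an arbitrary illustrative choice, not a value from the paper).

        import numpy as np

        def poisson_corrupt(clean, peak=4.0, seed=0):
            rng = np.random.default_rng(seed)
            clean = clean / clean.max()           # normalize to [0, 1]
            counts = rng.poisson(clean * peak)    # photon counts at the chosen peak value
            return counts / peak                  # rescale back to the image range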

    Y. Choukroun, A. Shtern, A. Bronstein, R. Kimmel, Hamiltonian operator for spectral shape analysis, arXiv:1611.01990

    Many shape analysis methods treat the geometry of an object as a metric space that can be captured by the Laplace-Beltrami operator. In this paper, we propose to adapt the classical Hamiltonian operator from quantum mechanics to the field of shape analysis. To this end, we study the addition of a potential function to the Laplacian as a generator for dual spaces in which shape processing is performed. We present a general optimization approach for solving variational problems involving the basis defined by the Hamiltonian, using perturbation theory for its eigenvectors. The suggested operator is shown to produce better functional spaces to operate with, as demonstrated on different shape analysis tasks.
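
    In the notation standard to spectral shape analysis, with Δ the (positive semidefinite) Laplace-Beltrami operator of the shape and V a scalar potential function on it, the operator studied above and its eigenbasis are

        H = \Delta + V, \qquad H \phi_i = \lambda_i \phi_i ,

    so the choice of potential V controls where the eigenfunctions localize, yielding the task-adapted functional spaces used in place of the Laplacian eigenbasis.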

    T. Remez, O. Litany, S. Yoseff, H. Haim, A. Bronstein, FPGA system for real-time computational extended depth of field imaging using phase aperture coding, arXiv:1608.01074

    We present a proof-of-concept end-to-end system for computational extended depth of field (EDOF) imaging. The acquisition is performed through a phase-coded aperture implemented by placing a thin wavelength-dependent optical mask inside the pupil of a conventional camera lens, as a result of which each color channel is focused at a different depth. The reconstruction process receives the raw Bayer image as the input, and performs blind estimation of the output color image in focus at an extended range of depths using a patch-wise sparse prior. We present a fast non-iterative reconstruction algorithm operating with constant latency in fixed-point arithmetic and achieving real-time performance in a prototype FPGA implementation. The output of the system, on simulated and real-life scenes, is qualitatively and quantitatively better than the result of clear-aperture imaging followed by state-of-the-art blind deblurring.

    O. Litany, T. Remez, A. Bronstein, Image reconstruction from dense binary pixels, arXiv:1512.01774
    T. Remez, O. Litany, A. Bronstein, A Picture is Worth a Billion Bits: Real-time image reconstruction from dense binary pixels, arXiv:1510.04601

    The pursuit of smaller pixel sizes at ever-increasing resolution in digital image sensors is mainly driven by the stringent price and form-factor requirements of sensors and optics in the cellular phone market. Recently, Eric Fossum proposed a novel concept of an image sensor with dense sub-diffraction-limit one-bit pixels (jots), which can be considered a digital emulation of silver halide photographic film. This idea has recently been embodied as the EPFL Gigavision camera. A major bottleneck in the design of such sensors is the image reconstruction process, producing a continuous high dynamic range image from oversampled binary measurements. The extreme quantization of the Poisson statistics is incompatible with the assumptions of most standard image processing and enhancement frameworks. The recently proposed maximum-likelihood (ML) approach addresses this difficulty, but suffers from image artifacts and has impractically high computational complexity. In this work, we study a variant of a sensor with binary threshold pixels and propose a reconstruction algorithm combining an ML data fitting term with a sparse synthesis prior. We also show an efficient hardware-friendly real-time approximation of this inverse operator. Promising results are shown on synthetic data as well as on HDR data emulated using multiple exposures of a regular CMOS sensor.
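
    Schematically, the proposed reconstruction combines the two ingredients named above: writing b for the binary measurements, D for a synthesis dictionary, and z for the sparse code of the latent image x = Dz, the objective has the form (notation illustrative, not the paper's exact formulation)

        \min_z \; -\log p(b \mid Dz) + \lambda \|z\|_1 ,

    where the first term is the ML fit to the thresholded Poisson measurements and the second is the sparse synthesis prior; the real-time variant is an efficient hardware-friendly approximation of this inverse operator.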

    P. Sprechmann, A. M. Bronstein, G. Sapiro, Supervised non-negative matrix factorization for audio source separation, Chapter in Excursions in Harmonic Analysis (R. Balan, M. Begue, J. J. Benedetto, W. Czaja, K. Okoudjou, Eds.), Birkhäuser

    Source separation is a widely studied problem in signal processing. Despite the steady progress reported in the literature, it is still considered a significant challenge. This chapter first reviews the use of non-negative matrix factorization (NMF) algorithms for solving source separation problems, and then proposes a new approach to supervised training in NMF. Matrix factorization methods have received a lot of attention in recent years in the audio processing community, producing particularly good results in source separation. Traditionally, NMF algorithms consist of two separate stages: a training stage, in which a generative model is learned, and a testing stage, in which the pre-learned model is used in a high-level task such as enhancement, separation, or classification. As an alternative, we propose a task-supervised NMF method for the adaptation of the basis spectra learned in the first stage to enhance the performance on the specific task used in the second stage. We cast this problem as a bilevel optimization program efficiently solved via stochastic gradient descent. The proposed approach is general enough to handle sparsity priors of the activations, and allows non-Euclidean data terms such as beta-divergences. The framework is evaluated on speech enhancement.
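
    The bilevel program mentioned above has, schematically, the following form (symbols illustrative): the upper level tunes the basis spectra W for the task-specific loss, while the activations are constrained to solve the usual NMF fitting problem,

        \min_{W \ge 0} \; \ell_{\mathrm{task}}\bigl(H^\star(W)\bigr)
        \quad \text{s.t.} \quad
        H^\star(W) = \arg\min_{H \ge 0} \; D_\beta(V \,\|\, W H) + \lambda \|H\|_1 ,

    with D_β a beta-divergence data term and the ℓ1 penalty expressing the sparsity prior on the activations; the outer problem is the one solved by stochastic gradient descent.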

    Q. Qiu, G. Sapiro, A. M. Bronstein, Random forests can hash, arXiv:1412.5083

    Hash codes are an efficient data representation needed to cope with the ever-growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forests, a technique that, together with deep learning, has shown spectacular results in classification, can also be extended to large-scale retrieval. A traditional random forest fails to enforce the consistency of hashes generated from each tree for the same class data, i.e., to preserve the underlying similarity, and it also lacks a principled way to aggregate codes across trees. We start with a simple hashing scheme, where independently trained random trees in a forest act as hashing functions. We then propose a subspace model as the splitting function, and show that it enforces hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.

    A. M. Bronstein, Spectral descriptors for deformable shapes, arXiv:1110.5015

    Informative and discriminative feature descriptors play a fundamental role in deformable shape analysis. For example, they have been successfully employed in correspondence, registration, and retrieval tasks. In recent years, significant attention has been devoted to descriptors obtained from the spectral decomposition of the Laplace-Beltrami operator associated with the shape. Notable examples in this family are the heat kernel signature (HKS) and the wave kernel signature (WKS). Laplacian-based descriptors achieve state-of-the-art performance in numerous shape analysis tasks; they are computationally efficient, isometry-invariant by construction, and can gracefully cope with a variety of transformations. In this paper, we formulate a generic family of parametric spectral descriptors. We argue that in order to be optimal for a specific task, the descriptor should take into account the statistics of the corpus of shapes to which it is applied (the “signal”) and those of the class of transformations to which it is made insensitive (the “noise”). While such statistics are hard to model axiomatically, they can be learned from examples. Following the spirit of the Wiener filter in signal processing, we show a learning scheme for the construction of optimal spectral descriptors and relate it to Mahalanobis metric learning. The superiority of the proposed approach is demonstrated on the SHREC’10 benchmark.
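
    For concreteness, the heat kernel signature mentioned above is the diagonal of the heat kernel: with λ_i and φ_i the eigenvalues and eigenfunctions of the Laplace-Beltrami operator,

        \mathrm{HKS}(x, t) = \sum_{i \ge 0} e^{-\lambda_i t} \, \phi_i(x)^2 ,

    and the parametric family proposed in the paper can be seen as replacing this fixed exponential weighting of the spectrum with one learned from examples.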

    D. Raviv, A. M. Bronstein, M. M. Bronstein, R. Kimmel, N. Sochen, Affine-invariant geodesic geometry of deformable 3D shapes, arXiv:1012.5936

    Natural objects can be subject to various transformations yet still preserve properties that we refer to as invariants. Here, we use definitions of affine-invariant arclength for surfaces in R^3 in order to extend the set of existing non-rigid shape analysis tools. In fact, we show that by re-defining the surface metric as its equi-affine version, the surface with its modified metric tensor can be treated as a canonical Euclidean object on which most classical Euclidean processing and analysis tools can be applied. The new definition of the metric is used to extend the fast marching method for computing geodesic distances on surfaces, where now the distances are defined with respect to an affine-invariant arclength. Applications of the proposed framework demonstrate its invariance, efficiency, and accuracy in shape analysis.
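
    For reference, one common way of writing the equi-affine metric alluded to above (up to the conventions of the paper): with b the second fundamental form of the surface, the re-defined metric tensor is

        \hat{g} = |\det b|^{-1/4} \, b ,

    so that arclength and geodesic distances computed with respect to this metric are invariant to volume-preserving affine transformations of the embedding space.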

    D. Raviv, A. M. Bronstein, M. M. Bronstein, R. Kimmel, N. Sochen, Affine-invariant diffusion geometry for the analysis of deformable 3D shapes, arXiv:1012.5933

    We introduce an (equi-)affine-invariant diffusion geometry by which surfaces that go through squeeze and shear transformations can still be properly analyzed. The definition of an affine-invariant metric enables us to construct an invariant Laplacian from which local and global geometric structures are extracted. Applications of the proposed framework demonstrate its power in generalizing and enriching the existing set of tools for shape analysis.

    A. M. Bronstein, M. M. Bronstein, R. Kimmel, Numerical geometry of non-rigid shapes, Springer

    Deformable objects are ubiquitous in the world surrounding us, on all levels from micro to macro. The need to study such shapes and model their behavior arises in a wide spectrum of applications, ranging from medicine to security. In recent years, non-rigid shapes have attracted growing interest, which has led to rapid development of the field, where state-of-the-art results from very different sciences – theoretical and numerical geometry, optimization, linear algebra, graph theory, machine learning and computer graphics, to mention several – are applied to find solutions.

    This book gives an overview of the current state of science in analysis and synthesis of non-rigid shapes. Everyday examples are used to explain concepts and to illustrate different techniques. The presentation unfolds systematically and numerous figures enrich the engaging exposition. Practice problems follow at the end of each chapter, with detailed solutions to selected problems in the appendix. A gallery of colored images enhances the text.

    This book will be of interest to graduate students, researchers and professionals in different fields of mathematics, computer science and engineering. It may be used for courses in computer vision, numerical geometry and geometric modeling and computer graphics or for self-study.