Efficient CNN with uncorrelated Bag of Features pooling
- URL: http://arxiv.org/abs/2209.10865v1
- Date: Thu, 22 Sep 2022 09:00:30 GMT
- Title: Efficient CNN with uncorrelated Bag of Features pooling
- Authors: Firas Laakom, Jenni Raitoharju, Alexandros Iosifidis and Moncef Gabbouj
- Abstract summary: Bag of Features (BoF) pooling has recently been proposed to reduce the complexity of the connection between convolutional and fully connected layers.
We propose an approach that builds on top of BoF pooling to boost its efficiency by ensuring that the items of the learned dictionary are non-redundant.
The proposed strategy yields an efficient variant of BoF and further boosts its performance, without any additional parameters.
- Score: 98.78384185493624
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Despite the superior performance of CNNs, deploying them on devices
with low computational power is still limited, as they are typically
computationally expensive. One key cause of the high complexity is the
connection between the convolutional layers and the fully connected layers,
which typically requires a high number of parameters. To alleviate this issue,
Bag of Features (BoF) pooling has recently been proposed. BoF learns a
dictionary that is used to compile a histogram representation of the input. In
this paper, we propose an approach that builds on top of BoF pooling to boost
its efficiency by ensuring that the items of the learned dictionary are
non-redundant. We propose an additional loss term, based on the pairwise
correlation of the items of the dictionary, which complements the standard
loss to explicitly regularize the model to learn a more diverse and rich
dictionary. The proposed strategy yields an efficient variant of BoF and
further boosts its performance, without any additional parameters.
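To make the mechanism concrete, here is a minimal PyTorch sketch of BoF pooling with a pairwise-correlation penalty on the learned dictionary. The RBF-style soft assignment, the sigma temperature, and the exact form of the penalty are assumptions inferred from the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoFPooling(nn.Module):
    """Bag-of-Features pooling: softly assign each spatial feature vector
    to K learned codewords and average the assignments into a K-bin
    histogram that replaces flatten-then-fully-connected pooling."""

    def __init__(self, in_channels, num_codewords, sigma=0.1):
        super().__init__()
        self.dictionary = nn.Parameter(torch.randn(num_codewords, in_channels))
        self.sigma = sigma  # assumed softness of the assignment

    def forward(self, x):  # x: (N, C, H, W) feature maps
        n, c, h, w = x.shape
        feats = x.permute(0, 2, 3, 1).reshape(n, h * w, c)  # (N, HW, C)
        dists = torch.cdist(feats, self.dictionary.unsqueeze(0).expand(n, -1, -1))
        memberships = F.softmax(-dists / self.sigma, dim=-1)  # (N, HW, K)
        return memberships.mean(dim=1)  # (N, K) histogram representation

    def decorrelation_loss(self):
        """Assumed penalty: mean squared pairwise correlation between
        dictionary items, pushing codewords to stay non-redundant."""
        centered = self.dictionary - self.dictionary.mean(dim=1, keepdim=True)
        v = F.normalize(centered, dim=1)
        corr = v @ v.t()  # (K, K) pairwise correlations
        off_diag = corr - torch.diag(torch.diag(corr))
        k = corr.shape[0]
        return (off_diag ** 2).sum() / (k * (k - 1))
```

During training, the penalty would simply be added to the task loss, e.g. loss = task_loss + lambda_corr * pool.decorrelation_loss(), where lambda_corr is a hypothetical weighting hyperparameter; since the penalty only constrains the existing dictionary, it adds no parameters.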
Related papers
- Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT [59.245414547751636]
We propose a circuit discovery framework alternative to activation patching.
Our framework suffers less from out-of-distribution issues and is more efficient in terms of complexity.
We dig into a small transformer trained on a synthetic task, Othello, and find a number of human-understandable fine-grained circuits inside it.
arXiv Detail & Related papers (2024-02-19T15:04:53Z)
- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition [39.308634515653914]
We advocate a novel lexical enhancement method, InterFormer, that effectively reduces the amount of computational and memory costs.
Compared with FLAT, it reduces unnecessary attention calculations in the "word-character" and "word-word" interactions.
This reduces the memory usage by about 50% and can use more extensive lexicons or higher batches for network training.
arXiv Detail & Related papers (2022-05-12T01:55:37Z)
- Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)
- NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs [15.289356276538662]
We propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary.
In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types.
Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks.
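A toy sketch of the anchor-based idea described above: each node is mapped to its k nearest anchors by hop distance, giving a fixed-size vocabulary instead of one embedding per entity. The BFS distances, the value of k, and the function names are illustrative assumptions; the actual NodePiece also hashes relational context and encodes the anchor tokens.

```python
from collections import deque
import numpy as np

def bfs_distances(adj, source):
    """Hop distance from `source` to every node over an adjacency list."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def tokenize_nodes(adj, anchors, k=2):
    """Map every node to its k nearest anchors: a fixed-size vocabulary
    of anchor tokens instead of one embedding per entity."""
    d = np.array([bfs_distances(adj, a) for a in anchors], dtype=float)
    d = np.where(np.isnan(d), np.inf, d)  # unreachable anchors
    return [tuple(np.argsort(d[:, v])[:k]) for v in range(len(adj))]

# toy graph: a 0-1-2-3-4 path with anchors {0, 4}
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(tokenize_nodes(adj, anchors=[0, 4], k=2))  # each node -> anchor ids
```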
arXiv Detail & Related papers (2021-06-23T03:51:03Z)
- PUDLE: Implicit Acceleration of Dictionary Learning by Backpropagation [4.081440927534577]
This paper offers the first theoretical proof for empirical results through PUDLE, a Provable Unfolded Dictionary LEarning method.
We highlight the minimization impact of loss, unfolding, and backpropagation on convergence.
We complement our findings through synthetic and image denoising experiments.
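A toy sketch of the kind of unfolded dictionary learning PUDLE analyzes: a fixed number of ISTA iterations for the sparse code is unrolled as layers, and the dictionary is learned by backpropagating a reconstruction loss through the unrolled computation. The step count, threshold, and initialization are assumptions, not PUDLE's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnfoldedDictLearner(nn.Module):
    """Unroll T ISTA iterations for the sparse code as network layers and
    learn the dictionary W by backpropagating a reconstruction loss
    through the unrolled computation."""

    def __init__(self, dim, n_atoms, n_steps=10, lam=0.1):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, n_atoms) / dim ** 0.5)
        self.n_steps, self.lam = n_steps, lam

    def forward(self, y):  # y: (batch, dim) signals
        L = torch.linalg.matrix_norm(self.W, 2) ** 2  # step size from spectral norm
        x = y.new_zeros(y.shape[0], self.W.shape[1])
        for _ in range(self.n_steps):  # unrolled ISTA iterations
            grad = (x @ self.W.t() - y) @ self.W
            x = F.softshrink(x - grad / L, (self.lam / L).item())
        return x @ self.W.t(), x  # reconstruction and sparse code

# training sketch: minimize reconstruction error by backprop
model = UnfoldedDictLearner(dim=64, n_atoms=128)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
y = torch.randn(32, 64)
recon, code = model(y)
F.mse_loss(recon, y).backward()
opt.step()
```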
arXiv Detail & Related papers (2021-05-31T18:49:58Z)
- The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification [0.0]
The Temporal Dictionary Ensemble (TDE) is more accurate than other dictionary-based approaches.
We show that HIVE-COTE, which incorporates TDE, is significantly more accurate than the current best deep learning approach.
This advance represents a new state of the art for time series classification.
arXiv Detail & Related papers (2021-05-09T05:27:42Z)
- Text Information Aggregation with Centrality Attention [86.91922440508576]
We propose a new way of obtaining aggregation weights, called eigen-centrality self-attention.
We build a fully-connected graph for all the words in a sentence, then compute the eigen-centrality as the attention score of each word.
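A minimal numpy sketch of the described mechanism: build a fully connected similarity graph over the words, take its principal eigenvector (via power iteration) as the eigen-centrality of each word, and use the scores as attention weights. The cosine similarity and the softmax normalization are assumptions.

```python
import numpy as np

def eigen_centrality_attention(word_vecs, n_iter=50):
    """Principal eigenvector of a fully connected word-similarity graph,
    used as per-word attention weights for sentence aggregation."""
    v = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sim = np.maximum(v @ v.T, 0.0)  # non-negative similarity graph
    score = np.full(len(word_vecs), 1.0 / len(word_vecs))
    for _ in range(n_iter):  # power iteration -> eigen-centrality
        score = sim @ score
        score /= np.linalg.norm(score)
    weights = np.exp(score) / np.exp(score).sum()  # softmax to attention
    return weights @ word_vecs  # aggregated sentence representation
```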
arXiv Detail & Related papers (2020-11-16T13:08:48Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
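The summary above is terse, so the following sketch only illustrates the general atom-coefficient decomposition the title refers to: every convolution kernel is a linear combination of a small shared bank of atoms, which acts as a structural constraint across kernels. The bank size, sharing pattern, and initialization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtomDecomposedConv2d(nn.Module):
    """Each (out, in) kernel is a linear combination of a small bank of
    k x k atoms; sharing one bank across layers couples their kernels."""

    def __init__(self, in_ch, out_ch, atoms):
        super().__init__()
        self.atoms = atoms  # shared (A, k, k) parameter bank
        self.coeff = nn.Parameter(torch.randn(out_ch, in_ch, atoms.shape[0]) * 0.1)

    def forward(self, x):
        flat = self.atoms.flatten(1)  # (A, k*k)
        w = (self.coeff @ flat).view(*self.coeff.shape[:2], *self.atoms.shape[1:])
        return F.conv2d(x, w, padding=self.atoms.shape[-1] // 2)

# two layers sharing one atom bank
atoms = nn.Parameter(torch.randn(8, 3, 3) * 0.1)
conv1 = AtomDecomposedConv2d(3, 16, atoms)
conv2 = AtomDecomposedConv2d(16, 32, atoms)
out = conv2(F.relu(conv1(torch.randn(1, 3, 32, 32))))
```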
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
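A minimal sketch of the compression idea: pooled hidden states query the full-resolution states, and subsequent layers operate on the shorter sequence, cutting FLOPs roughly in half per compression step. The strided mean pooling and the single-block structure are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunnelBlock(nn.Module):
    """Halve the sequence with strided mean pooling; the pooled states
    query the full-resolution states, and later layers run on the
    shorter (cheaper) sequence."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))

    def forward(self, h):  # h: (batch, seq, dim)
        q = F.avg_pool1d(h.transpose(1, 2), 2, stride=2).transpose(1, 2)
        attn_out, _ = self.attn(q, h, h)  # pooled queries, full keys/values
        h = q + attn_out
        return h + self.ff(h)  # output has seq/2 positions

hidden = torch.randn(2, 128, 256)
print(FunnelBlock(256)(hidden).shape)  # torch.Size([2, 64, 256])
```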
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.