Learning Structured Compressed Sensing with Automatic Resource Allocation
thanks: This work was supported by the Thuringian Ministry of Economic Affairs, Science and Digital Society (TMWWDG).

Han Wang*,‡, Eduardo Pérez*,‡, Iris A. M. Huijben, Hans van Gorp, Ruud van Sloun, Florian Römer*,‡ *Applied AI Signal Processing and Data Analysis, Fraunhofer Institute for Nondestructive Testing IZFP, Germany
Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands
Electronic Measurements and Signal Processing, Ilmenau University of Technology, Germany
Email: [email protected]
Abstract

Multidimensional data acquisition often requires extensive time and poses significant challenges for hardware and software regarding data storage and processing. Rather than designing a single compression matrix as in conventional compressed sensing, structured compressed sensing yields dimension-specific compression matrices, reducing the number of optimizable parameters. Recent advances in machine learning (ML) have enabled task-based supervised learning of subsampling matrices, albeit at the expense of complex downstream models. Additionally, the sampling resource allocation across dimensions is often determined in advance through heuristics. To address these challenges, we introduce Structured COmpressed Sensing with Automatic Resource Allocation (SCOSARA) with an information theory-based unsupervised learning strategy. SCOSARA adaptively distributes samples across sampling dimensions while maximizing Fisher information content. Using ultrasound localization as a case study, we compare SCOSARA to state-of-the-art ML-based and greedy search algorithms. Simulation results demonstrate that SCOSARA can produce high-quality subsampling matrices that achieve lower Cramér Rao Bound values than the baselines. In addition, SCOSARA outperforms other ML-based algorithms in terms of the number of trainable parameters, computational complexity, and memory requirements while automatically choosing the number of samples per axis.

Index Terms:
Unsupervised Learning, Information Theory, Compressed Sensing, Subsampling.

I Introduction

Many practical problems involve multidimensional data with several axes that represent different domains [1], e.g. space, time, Doppler shift, color channels, azimuth and elevation angles, etc. [2, 3, 4]. Such data is information-rich, but quickly yields large data volumes and may involve time-consuming measurement procedures.

Compressed sensing (CS) [5] addresses the data volume and, if done judiciously, reduces the measurement time by measuring only a small number of linear projections of the data. The process of collecting these projections is represented through a rectangular matrix referred to as a compression matrix. Such compression matrices are classically designed as random matrices [6]. Such designs are optimal, but only asymptotically for large matrices or in expectation, and are also difficult to implement in hardware. This motivates the systematic design of optimal subsampling matrices, which is the goal of this paper.

Subsampling matrices are a particular case of compression matrices with one-hot rows, whereby only a subset of the original samples are measured [7, 8, 9, 10]. Although this results in loss of information when compared to random linear projections, the result is a CS scheme that is easily implemented through omission, planning, or reprogramming of the original measurement procedure without the need for additional hardware [11]. However, the design of subsampling matrices involves two combinatorial optimization subproblems: (I) “choosing k𝑘kitalic_k out of n𝑛nitalic_n items”, and (II) “distributing k𝑘kitalic_k items into q𝑞qitalic_q bins”. The present work addresses these problems on several fronts.

Problem (I) naturally arises when subsampling data. Advances in ML architectures and task-based learning have made it possible to use ML to learn a subsampling matrix[8, 12, 13, 14]. Due to the typically large volume of multidimensional data, task-based ML can be time-intensive or prohibitively resource-intensive. For example, a commonly-used deep unrolled neural network LISTA [15, 16] grows linearly with the number of unrolled iterations and also scales with the product of the dimension sizes of the input data (e.g. transmitters ×\times× receivers ×\times× time domain samples ×\times× size of the region of interest).

Problem (II) appears when structured CS of multidimensional data is applied [12], where each axis is compressed separately, turning a large subsampling problem into several smaller ones. More explicitly, the number of design parameters is reduced from the product of data dimension lengths to the sum of the data dimension lengths. However, the number of samples must now be distributed among the dimensions of the data. This resource allocation problem is often addressed through heuristics [8, 14, 12].

We alleviate the large-scale nature of problem (I) by replacing task-based learning with Fisher information maximization and exploiting the structure in the data. The Fisher Information Matrix (FIM) provides an alternative to task-based optimization [17], as commonly used in multistatic localization [18]. The FIM and its inverse, the Cramér-Rao Bound (CRB), are known to accurately describe the performance of CS methods when the signal-to-noise ratio is high and quantization error is low [19, 20, 21, 22, 23]. This makes the unsupervised maximization of the trace of the FIM an attractive alternative to task-based approaches in optimizing subsampling matrices. We address problem (II) by formulating the structured subsampling problem so that the number of samples per axis is learned automatically during training.

In this work, we propose a framework for the design of structured subsampling matrices by means of ML-based unsupervised maximization of the trace of the FIM. The framework, dubbed Structured COmpressed Sensing with Automatic Resource Allocation (SCOSARA), automatically distributes the samples among the axes of multidimensional data.

II Structured subsampling

Data acquisition is performed across q𝑞qitalic_q different dimensions, and the number of samples in the i𝑖iitalic_ith dimension is Nisubscript𝑁𝑖N_{i}italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (with i=1,2,,q𝑖12𝑞i=1,2,\dots,qitalic_i = 1 , 2 , … , italic_q). The q𝑞qitalic_q-dimensional data array can be expressed as 𝐘N1×N2××Nq𝐘superscriptsubscript𝑁1subscript𝑁2subscript𝑁𝑞\mathbf{Y}\in\mathbb{R}^{N_{1}\times N_{2}\times\dots\times N_{q}}bold_Y ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT × ⋯ × italic_N start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, which can be vectorized as 𝐲NΠ𝐲superscriptsubscript𝑁Π\mathbf{y}\in\mathbb{C}^{N_{\Pi}}bold_y ∈ blackboard_C start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, with NΠ=Πi=1qNisubscript𝑁ΠsuperscriptsubscriptΠ𝑖1𝑞subscript𝑁𝑖N_{\Pi}=\Pi_{i=1}^{q}N_{i}italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT = roman_Π start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT the total number of samples. The sampling process is commonly represented as a linear transformation from the to-be-estimated signal 𝐱Ns𝐱superscriptsubscript𝑁s\mathbf{x}\in\mathbb{C}^{N_{\rm{s}}}bold_x ∈ blackboard_C start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT end_POSTSUPERSCRIPT to this measurement:

𝐲=𝐀𝐱+𝐧,𝐲𝐀𝐱𝐧\mathbf{y=Ax+n},\vspace{-0.1cm}bold_y = bold_Ax + bold_n , (1)

where 𝐧NΠ𝐧superscriptsubscript𝑁Π\mathbf{n}\in\mathbb{C}^{N_{\Pi}}bold_n ∈ blackboard_C start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT end_POSTSUPERSCRIPT represents circularly-symmetric Additive White Gaussian Noise (AWGN) following 𝒞𝒩(𝟎,𝚺)𝒞𝒩0𝚺\mathcal{CN}(\bm{0},\bm{\Sigma})caligraphic_C caligraphic_N ( bold_0 , bold_Σ ) and 𝐀NΠ×Ns𝐀superscriptsubscript𝑁Πsubscript𝑁s\mathbf{A}\in\mathbb{C}^{N_{\Pi}\times N_{\rm{s}}}bold_A ∈ blackboard_C start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is the signal model.

We introduce a subsampling matrix 𝐒{0,1}MΠ×NΠ𝐒superscript01subscript𝑀Πsubscript𝑁Π\mathbf{S}\in\{0,1\}^{M_{\Pi}\times N_{\Pi}}bold_S ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT end_POSTSUPERSCRIPT that selects MΠsubscript𝑀ΠM_{\Pi}italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT out of NΠsubscript𝑁ΠN_{\Pi}italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT data samples thanks to its row-wise one-hot structure, resulting in subsampled measurement 𝐲sMΠsubscript𝐲ssuperscriptsubscript𝑀Π\mathbf{y}_{\rm{s}}\in\mathbb{C}^{M_{\Pi}}bold_y start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT end_POSTSUPERSCRIPT through

𝐲s=𝐒𝐲.subscript𝐲s𝐒𝐲\mathbf{y}_{\rm{s}}=\mathbf{S}\mathbf{y}.\vspace{-0.1cm}bold_y start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = bold_Sy . (2)

Structured COmpressed Sensing (SCOS) compresses data along each i𝑖iitalic_ith dimension separately. Structured subsampling – the focus of this work – is an instance of SCOS, where each i𝑖iitalic_ith axis is subsampled separately. To this end, we introduce subsampling matrices 𝐒i{0,1}Mi×Nisubscript𝐒𝑖superscript01subscript𝑀𝑖subscript𝑁𝑖\mathbf{S}_{i}\in\{0,1\}^{M_{i}\times N_{i}}bold_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ { 0 , 1 } start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, with Mi<Nisubscript𝑀𝑖subscript𝑁𝑖M_{i}<N_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT < italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. This yields the following formulation:

𝐲s=(𝐒1𝐒2𝐒q)𝐲=𝐒𝐲.subscript𝐲stensor-productsubscript𝐒1subscript𝐒2subscript𝐒𝑞𝐲𝐒𝐲\mathbf{y}_{\rm{s}}=(\mathbf{S}_{1}\otimes\mathbf{S}_{2}\otimes\cdots\otimes% \mathbf{S}_{q})\mathbf{y}=\mathbf{Sy}.bold_y start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = ( bold_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⊗ bold_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⊗ ⋯ ⊗ bold_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ) bold_y = bold_Sy . (3)

This reduces the problem of choosing MΠsubscript𝑀ΠM_{\Pi}italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT out of NΠsubscript𝑁ΠN_{\Pi}italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT samples into q𝑞qitalic_q problems where Misubscript𝑀𝑖M_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT out of Nisubscript𝑁𝑖N_{i}italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT must be chosen. In general, (MΣ=i=1qMi)<MΠsubscript𝑀Σsuperscriptsubscript𝑖1𝑞subscript𝑀𝑖subscript𝑀Π(M_{\Sigma}=\sum_{i=1}^{q}M_{i})<M_{\Pi}( italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) < italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT and so the structured subsampling problem is simpler than the original one. However, the single hyperparameter MΠsubscript𝑀ΠM_{\Pi}italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT is replaced with q𝑞qitalic_q hyperparameters Misubscript𝑀𝑖M_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. In the next section, we elaborate on how a sampling budget is distributed across the axis, i.e. how we set all Misubscript𝑀𝑖M_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in SCOS.

III Automatic Resource Allocation

Given a total sampling budget MΣ=i=1qMisubscript𝑀Σsuperscriptsubscript𝑖1𝑞subscript𝑀𝑖M_{\Sigma}=\sum_{i=1}^{q}M_{i}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and the number of dimensions q𝑞qitalic_q in the data, the samples can be allocated in (M1q1)binomialsubscript𝑀1𝑞1\binom{M_{\sum}-1}{q-1}( FRACOP start_ARG italic_M start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT - 1 end_ARG start_ARG italic_q - 1 end_ARG ) number of ways, resulting in a combinatorial resource allocation problem. Instead of heuristically specifying the active elements Misubscript𝑀𝑖M_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT per axis, we automatically learn how to distribute the sampling budget. We achieve this by modifying the ML-based subsampling approach proposed in [24] so that it automatically performs resource allocation while learning which samples to preserve.

Refer to caption
Figure 1: Flowchart of automatic resource allocation in SCOSARA. There are two parallel ways to obtain the subsampling matrices and the regularizer: the top path perturbs the logits with Gumbel noise to approximate the process of sampling from a categorical distribution, while the lower path directly uses the logits to produce the regularizer.

In [24], the design of subsampling matrices is treated as sampling without replacement from a categorical distribution. The log probabilities, or logits, of the categorical distribution are learned in a gradient-based fashion by employing the straight-through Gumbel estimator [25, 26]. In the forward direction, the Gumbel-max trick is employed, i.e. Gumbel noise Gumbel(0,1)Gumbel01~{}\text{Gumbel}(0,1)Gumbel ( 0 , 1 ) is added to the logits, followed by the application of argmax to sample from the distribution over elements in the subsampling matrix. During backpropagation, ‘soft samples’ are drawn from the Gumbel-softmax distribution by relaxing the non-differentiable argmax function with a softmax, allowing the usage of backpropagation for the optimization of the logits.

Based on this procedure, the subsampling matrices 𝐒isubscript𝐒𝑖\mathbf{S}_{i}bold_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT can be learned based only on the total budget MΣsubscript𝑀ΣM_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT, without specifying each Misubscript𝑀𝑖M_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT separately. To this end, we introduce a single vector ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ containing NΣ=i=1qNisubscript𝑁Σsuperscriptsubscript𝑖1𝑞subscript𝑁𝑖N_{\Sigma}=\sum_{i=1}^{q}N_{i}italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT logits. The entries of ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ are ordered in the same order the dimensions of 𝐘𝐘\mathbf{Y}bold_Y are vectorized in. The Gumbel-softmax trick is then used to obtain a differentiable approximation of the process of sampling MΣsubscript𝑀ΣM_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT items without replacement out of the total NΣsubscript𝑁ΣN_{\Sigma}italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT. To do so, Gumbel noise 𝒈NΣ𝒈superscriptsubscript𝑁Σ\bm{g}\in\mathbb{R}^{N_{\Sigma}}bold_italic_g ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, 𝒈Gumbel(0,1)similar-to𝒈Gumbel(0,1)\bm{g}\sim\text{Gumbel(0,1)}bold_italic_g ∼ Gumbel(0,1) is added to the logits ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ, yielding the perturbed logits vector ϕ~NΣ~bold-italic-ϕsuperscriptsubscript𝑁Σ\tilde{\bm{\phi}}\in\mathbb{R}^{N_{\Sigma}}over~ start_ARG bold_italic_ϕ end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT end_POSTSUPERSCRIPT. Next, the softmax function is applied to ϕ~~bold-italic-ϕ\tilde{\bm{\phi}}over~ start_ARG bold_italic_ϕ end_ARG. To obtain MΣsubscript𝑀ΣM_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT samples without replacement, the largest entry of is replaced by -\infty- ∞ (i.e. the probability of choosing the same entry again is set to 0), and then softmax is applied again [24]. This process is repeated until an auxiliary matrix 𝚽~~𝚽\tilde{\bm{\Phi}}over~ start_ARG bold_Φ end_ARG of size MΣ×NΣsubscript𝑀Σsubscript𝑁ΣM_{\Sigma}\times N_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT is obtained.

Since the entries of the logits vector ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ are ordered according to the vectorization of 𝐘𝐘\mathbf{Y}bold_Y, the auxiliary matrix 𝚽~~𝚽\tilde{\bm{\Phi}}over~ start_ARG bold_Φ end_ARG exhibits the structure

𝚽~=[𝐒1𝐍1,2𝐍1,q𝐍2,1𝐒2𝐍2,q𝐍q,1𝐍q,2𝐒q].~𝚽matrixsubscript𝐒1subscript𝐍12subscript𝐍1𝑞subscript𝐍21subscript𝐒2subscript𝐍2𝑞subscript𝐍𝑞1subscript𝐍𝑞2subscript𝐒𝑞\displaystyle\tilde{\bm{\Phi}}=\begin{bmatrix}\mathbf{S}_{1}&\mathbf{N}_{1,2}&% \cdots&\mathbf{N}_{1,q}\\ \mathbf{N}_{2,1}&\mathbf{S}_{2}&\cdots&\mathbf{N}_{2,q}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{N}_{q,1}&\mathbf{N}_{q,2}&\cdots&\mathbf{S}_{q}\end{bmatrix}.over~ start_ARG bold_Φ end_ARG = [ start_ARG start_ROW start_CELL bold_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_N start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL bold_N start_POSTSUBSCRIPT 1 , italic_q end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_N start_POSTSUBSCRIPT 2 , 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL bold_N start_POSTSUBSCRIPT 2 , italic_q end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL start_CELL ⋮ end_CELL start_CELL ⋱ end_CELL start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL bold_N start_POSTSUBSCRIPT italic_q , 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_N start_POSTSUBSCRIPT italic_q , 2 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL bold_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] . (4)

The subsampling matrices 𝐒iMi×Nisubscript𝐒𝑖superscriptsubscript𝑀𝑖subscript𝑁𝑖\mathbf{S}_{i}\in\mathbb{R}^{M_{i}\times N_{i}}bold_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT appear along the diagonal blocks of 𝚽~~𝚽\tilde{\bm{\Phi}}over~ start_ARG bold_Φ end_ARG, whereas the off-diagonal blocks 𝐍i,jMi×Nj,1i,jqformulae-sequencesubscript𝐍𝑖𝑗superscriptsubscript𝑀𝑖subscript𝑁𝑗formulae-sequence1𝑖𝑗𝑞\mathbf{N}_{i,j}\in\mathbb{R}^{M_{i}\times N_{j}},1\leq i,j\leq qbold_N start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT , 1 ≤ italic_i , italic_j ≤ italic_q, contain only nuisance terms that are nonzero due to the usage of softmax.

Two crucial observations must be made regarding 𝚽~~𝚽\tilde{\bm{\Phi}}over~ start_ARG bold_Φ end_ARG. First, the sizes of the diagonal blocks are unknown because the Misubscript𝑀𝑖M_{i}italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are unknown. Second, since the entries of ϕ~~bold-italic-ϕ\tilde{\bm{\phi}}over~ start_ARG bold_italic_ϕ end_ARG change when the noise 𝐠𝐠\mathbf{g}bold_g is added and when ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ is modified through backpropagation, the matrix 𝚽~~𝚽\tilde{\bm{\Phi}}over~ start_ARG bold_Φ end_ARG is not available in practice. Instead, the order the samples are drawn in, is unknown and only 𝚷𝚽~𝚷~𝚽\bm{\Pi}\tilde{\bm{\Phi}}bold_Π over~ start_ARG bold_Φ end_ARG can be obtained, where 𝚷MΣ×MΣ𝚷superscriptsubscript𝑀Σsubscript𝑀Σ\bm{\Pi}\in\mathbb{R}^{M_{\Sigma}\times M_{\Sigma}}bold_Π ∈ blackboard_R start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT × italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is an unknown permutation matrix. Both problems are addressed by computing 𝚿~=(𝚷𝚽~)T(𝚷𝚽~)=𝚽~T𝚽~NΣ×NΣ~𝚿superscript𝚷~𝚽T𝚷~𝚽superscript~𝚽T~𝚽superscriptsubscript𝑁Σsubscript𝑁Σ\tilde{\bm{\Psi}}=(\bm{\Pi}\tilde{\bm{\Phi}})^{\mathrm{T}}(\bm{\Pi}\tilde{\bm{% \Phi}})=\tilde{\bm{\Phi}}^{\text{T}}\tilde{\bm{\Phi}}\in\mathbb{R}^{N_{\Sigma}% \times N_{\Sigma}}over~ start_ARG bold_Ψ end_ARG = ( bold_Π over~ start_ARG bold_Φ end_ARG ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ( bold_Π over~ start_ARG bold_Φ end_ARG ) = over~ start_ARG bold_Φ end_ARG start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT over~ start_ARG bold_Φ end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, yielding a matrix of the general form

𝚿~=[𝐒1T𝐒1+𝐍1𝐍1,2𝐍1,q𝐍2,1𝐒2T𝐒2+𝐍2𝐍2,q𝐍q,1𝐍q,2𝐒qT𝐒q+𝐍q].~𝚿matrixsuperscriptsubscript𝐒1Tsubscript𝐒1subscriptsuperscript𝐍1subscriptsuperscript𝐍12subscriptsuperscript𝐍1𝑞subscriptsuperscript𝐍21superscriptsubscript𝐒2Tsubscript𝐒2subscriptsuperscript𝐍2subscriptsuperscript𝐍2𝑞subscriptsuperscript𝐍𝑞1subscriptsuperscript𝐍𝑞2superscriptsubscript𝐒𝑞Tsubscript𝐒𝑞subscriptsuperscript𝐍𝑞\displaystyle\tilde{\bm{\Psi}}=\begin{bmatrix}\mathbf{S}_{1}^{\text{T}}\mathbf% {S}_{1}+\mathbf{N}^{\prime}_{1}&\mathbf{N}^{\prime}_{1,2}&\cdots&\mathbf{N}^{% \prime}_{1,q}\\ \mathbf{N}^{\prime}_{2,1}&\mathbf{S}_{2}^{\text{T}}\mathbf{S}_{2}+\mathbf{N}^{% \prime}_{2}&\cdots&\mathbf{N}^{\prime}_{2,q}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{N}^{\prime}_{q,1}&\mathbf{N}^{\prime}_{q,2}&\cdots&\mathbf{S}_{q}^{% \text{T}}\mathbf{S}_{q}+\mathbf{N}^{\prime}_{q}\end{bmatrix}.over~ start_ARG bold_Ψ end_ARG = [ start_ARG start_ROW start_CELL bold_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT bold_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 , italic_q end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 , 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT bold_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 , italic_q end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL start_CELL ⋮ end_CELL start_CELL ⋱ end_CELL start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_q , 1 end_POSTSUBSCRIPT end_CELL start_CELL bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_q , 2 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL bold_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT bold_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT + bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] . (5)

The block matrices along the diagonal of 𝚿~~𝚿\tilde{\bm{\Psi}}over~ start_ARG bold_Ψ end_ARG, i.e. 𝐒~i=𝐒iT𝐒i+𝐍iNi×Nisubscript~𝐒𝑖superscriptsubscript𝐒𝑖Tsubscript𝐒𝑖subscriptsuperscript𝐍𝑖superscriptsubscript𝑁𝑖subscript𝑁𝑖\tilde{\mathbf{S}}_{i}=\mathbf{S}_{i}^{\mathrm{T}}\mathbf{S}_{i}+\mathbf{N}^{% \prime}_{i}\in\mathbb{R}^{N_{i}\times N_{i}}over~ start_ARG bold_S end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + bold_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, can then be used to construct

𝐒~=𝐒~1𝐒~2𝐒~qNΠ×NΠ.~𝐒tensor-productsubscript~𝐒1subscript~𝐒2subscript~𝐒𝑞superscriptsubscript𝑁Πsubscript𝑁Π\displaystyle\vspace{-0.4cm}\tilde{\mathbf{S}}=\tilde{\mathbf{S}}_{1}\otimes% \tilde{\mathbf{S}}_{2}\otimes\cdots\otimes\tilde{\mathbf{S}}_{q}\in\mathbb{R}^% {N_{\Pi}\times N_{\Pi}}.\vspace{-0.4cm}over~ start_ARG bold_S end_ARG = over~ start_ARG bold_S end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⊗ over~ start_ARG bold_S end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⊗ ⋯ ⊗ over~ start_ARG bold_S end_ARG start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT end_POSTSUPERSCRIPT . (6)

The structure of 𝚿~~𝚿\tilde{\bm{\Psi}}over~ start_ARG bold_Ψ end_ARG also motivates the following observation. Consider the unperturbed logits ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ and apply the Gumbel-softmax trick without the noise 𝐠𝐠\mathbf{g}bold_g so as to obtain a new auxiliary matrix 𝚽𝚽\bm{\Phi}bold_Φ, and from it, 𝚿=𝚽T𝚽𝚿superscript𝚽T𝚽\bm{\Psi}=\bm{\Phi}^{\text{T}}\bm{\Phi}bold_Ψ = bold_Φ start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT bold_Φ. Since the rows of 𝚽𝚽{\bm{\Phi}}bold_Φ are non-negative and sum to one due to the softmaxsoftmax\operatorname*{softmax}roman_softmax function, maximizing Trace(𝚿)Trace𝚿\mathrm{Trace}(\bm{\Psi})roman_Trace ( bold_Ψ ) (equivalently, 𝚽F2superscriptsubscriptnorm𝚽F2\|{\bm{\Phi}}\|_{\mathrm{F}}^{2}∥ bold_Φ ∥ start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT) reduces the magnitude of the off-diagonal elements of 𝚿𝚿\bm{\Psi}bold_Ψ, thereby aligning the forward and backward passes of the straight-through estimator. This term can thus be used as a regularizer when optimizing the logits.

The overall procedure described throughout this section is illustrated in Figure 1. In the figure, the process of sampling without replacement is depicted by the repetition of the perturbed logits vector ϕ~~bold-italic-ϕ\tilde{\bm{\phi}}over~ start_ARG bold_italic_ϕ end_ARG MΣsubscript𝑀ΣM_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT times, followed by the addition of a masking matrix 𝐖𝐖\mathbf{W}bold_W of size MΣ×NΣsubscript𝑀Σsubscript𝑁ΣM_{\Sigma}\times N_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT whose entries are taken from {,0}0\{-\infty,0\}{ - ∞ , 0 }. The upper branch of the figure uses Gumbel noise 𝐠𝐠\mathbf{g}bold_g to construct the matrix 𝚿~~𝚿\tilde{\bm{\Psi}}over~ start_ARG bold_Ψ end_ARG used for sampling, while the lower branch is noiseless and yields the regularizing matrix 𝚿𝚿\bm{\Psi}bold_Ψ.

IV Optimization Target

Replacing task-based optimization because of its time and resource-intensive nature, we take FIM and CRB into account. The CRB is a lower bound for the variance of unbiased estimators [17]. However, CS methods are biased, the sources of the bias being amplitude errors [21], parameter discretization errors [22], and the enforcing of sparse support [23]. In spite of this, the cited works illustrate that the CRB correctly describes the behavior of practical CS estimators when the signal-to-noise ratio is high and the sampling grid is fine enough.

The CRB is computed as the inverse of FIM, which quantifies the amount of information that an observable random variable carries about the parameters of its distribution. The FIM is also closely related to the Restricted Isometry Property (RIP) [19]. The inter-column coherence of a model matrix is inversely proportional to the eigenvalues of the FIM [23, 20]. Similarly, the Restricted Isometry Constant (RIC) involved in the RIP is smaller when the eigenvalues of the FIM are large.

Therefore, instead of the downstream task such as directly computing the signal recovery, we try to maximize the trace of FIM for two reasons. First, it largely reduces the network and training complexity since it is estimator-agnostic. Second, the A-optimality criterion, which refers to the minimization of the trace of the CRB, is a commonly employed optimization target in the design of experiments. However, in the present setting, directly using the CRB would require multiple matrix inversions during training. The maximization of the trace of FIM can be seen as a heuristic for the CRB, as it can be formulated in terms of the harmonic mean of the eigenvalues of the CRB matrix and is therefore closely related to A-optimality.

In the derivation of the cost function, we let 𝐀𝐱𝐀𝐱\mathbf{Ax}bold_Ax obey a differentiable parametric model 𝐛𝐛\mathbf{b}bold_b with a real-valued parameter vector 𝝃𝝃\bm{\xi}bold_italic_ξ, so that the subsampled data 𝐲ssubscript𝐲s\mathbf{y}_{\rm{s}}bold_y start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT can be reformulated as:

𝐲s=𝐒(𝐀𝐱+𝐧)=𝐒(𝐛(𝝃)+𝐧)=𝝁+𝐧~,subscript𝐲s𝐒𝐀𝐱𝐧𝐒𝐛𝝃𝐧𝝁~𝐧\vspace{-0.2cm}\mathbf{y}_{\rm{s}}=\mathbf{S(Ax+n)}=\mathbf{S(b(\bm{\xi})+n)}=% \bm{\mu}+\tilde{\mathbf{n}},bold_y start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = bold_S ( bold_Ax + bold_n ) = bold_S ( bold_b ( bold_italic_ξ ) + bold_n ) = bold_italic_μ + over~ start_ARG bold_n end_ARG , (7)

where 𝐛𝐛\mathbf{b}bold_b denotes the noiseless fully sampled data and 𝝁𝝁\bm{\mu}bold_italic_μ corresponds to the noiseless compressed data. The noise 𝐧𝐧\mathbf{n}bold_n is circularly-symmetric AWGN characterized by the distribution 𝒞𝒩(𝟎,𝚺)𝒞𝒩0𝚺\mathcal{CN}(\bm{0},\bm{\Sigma})caligraphic_C caligraphic_N ( bold_0 , bold_Σ ) and 𝚺=σ2𝐈𝚺superscript𝜎2𝐈\bm{\Sigma}=\sigma^{2}\mathbf{I}bold_Σ = italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_I. Instead, since we can only compute 𝐒~~𝐒\tilde{\mathbf{S}}over~ start_ARG bold_S end_ARG from SCOSARA, following the notation in (6), we rewrite this as:

𝐲s=𝐒~(𝐛(𝝃)+𝐧)=𝝁~+𝐧~,subscript𝐲s~𝐒𝐛𝝃𝐧~𝝁~𝐧\vspace{-0.2cm}\mathbf{y}_{\rm{s}}=\tilde{\mathbf{S}}\mathbf{(b(\bm{\xi})+n)}=% \tilde{\bm{\mu}}+\tilde{\mathbf{n}},bold_y start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = over~ start_ARG bold_S end_ARG ( bold_b ( bold_italic_ξ ) + bold_n ) = over~ start_ARG bold_italic_μ end_ARG + over~ start_ARG bold_n end_ARG , (8)

where the covariance of 𝐧~~𝐧\tilde{\mathbf{n}}over~ start_ARG bold_n end_ARG is approximately given by σ2𝐒𝐒Tsuperscript𝜎2superscript𝐒𝐒T\sigma^{2}\mathbf{SS}^{\text{T}}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_SS start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT. Based on the Slepian-Bangs formalism, the FIM can be written as

𝓙𝓙\displaystyle\bm{\mathcal{J}}bold_caligraphic_J 2σ2{(𝝁𝝃)H(𝝁𝝃)}=2σ2{𝑱bH𝐒~T𝐒~𝑱b}absent2superscript𝜎2superscript𝝁𝝃H𝝁𝝃2superscript𝜎2superscriptsubscript𝑱𝑏Hsuperscript~𝐒T~𝐒subscript𝑱𝑏\displaystyle\approx\frac{2}{\sigma^{2}}\mathfrak{R}\left\{\left(\frac{% \partial\bm{\mu}}{\partial\bm{\xi}}\right)^{\text{H}}\left(\frac{\partial\bm{% \mu}}{\partial\bm{\xi}}\right)\right\}=\frac{2}{\sigma^{2}}\mathfrak{R}\left\{% \bm{J}_{b}^{\text{H}}\tilde{\mathbf{S}}^{\text{T}}\tilde{\mathbf{S}}\bm{J}_{b}\right\}≈ divide start_ARG 2 end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG fraktur_R { ( divide start_ARG ∂ bold_italic_μ end_ARG start_ARG ∂ bold_italic_ξ end_ARG ) start_POSTSUPERSCRIPT H end_POSTSUPERSCRIPT ( divide start_ARG ∂ bold_italic_μ end_ARG start_ARG ∂ bold_italic_ξ end_ARG ) } = divide start_ARG 2 end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG fraktur_R { bold_italic_J start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT start_POSTSUPERSCRIPT H end_POSTSUPERSCRIPT over~ start_ARG bold_S end_ARG start_POSTSUPERSCRIPT T end_POSTSUPERSCRIPT over~ start_ARG bold_S end_ARG bold_italic_J start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT } (9)

where ()HsuperscriptH(\cdot)^{\text{H}}( ⋅ ) start_POSTSUPERSCRIPT H end_POSTSUPERSCRIPT is the conjugate transpose, \mathfrak{R}fraktur_R refers to extracting the real-valued part and 𝑱bsubscript𝑱𝑏\bm{J}_{b}bold_italic_J start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT stands for the Jacobian matrix of data 𝐛𝐛\mathbf{b}bold_b with respect to parameters 𝝃𝝃\bm{\xi}bold_italic_ξ. Note that in the case when multiple targets coexist, the CRB of a single target can also adequately describe the estimation problem when they are well-spaced. In summary, the FIM-driven SCOSARA aims to maximize Fisher information, resulting in the following cost function:

=argminϕTrace(𝓙)Trace(𝚿).subscriptargminbold-italic-ϕTrace𝓙Trace𝚿\mathcal{L}=\operatorname*{argmin}_{\bm{\phi}}\,-\mathrm{Trace}(\bm{\mathcal{J% }})-\mathrm{Trace}(\bm{\Psi}).caligraphic_L = roman_argmin start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT - roman_Trace ( bold_caligraphic_J ) - roman_Trace ( bold_Ψ ) . (10)

V Evaluation

As an illustrative example of multidimensional data acquisition, we detail the application of the SCOS within the context of multichannel ultrasound localization. The simulation scenarios and experimental settings are similar to [12], but we use a larger data model whose parameters are summarized in TABLE I. The task is to localize a single scatterer in the Region of Interest (ROI) whose parameter vector can be formulated by 𝝃=[x,z,a,φ]𝝃𝑥𝑧𝑎𝜑\bm{\xi}=[x,z,a,\varphi]bold_italic_ξ = [ italic_x , italic_z , italic_a , italic_φ ], which corresponds to its two-dimensional coordinates, reflectivity coefficient, and phase. The dataset is generated by assuming a single scatterer with a varying reflectivity coefficient is randomly located in a given ROI.

TABLE I: Model parameters summary
Parameter NTsubscript𝑁TN_{\rm{T}}italic_N start_POSTSUBSCRIPT roman_T end_POSTSUBSCRIPT NRsubscript𝑁RN_{\rm{R}}italic_N start_POSTSUBSCRIPT roman_R end_POSTSUBSCRIPT NFsubscript𝑁FN_{\rm{F}}italic_N start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT Nsubscript𝑁N_{\sum}italic_N start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT NΠsubscript𝑁ΠN_{\Pi}italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT Msubscript𝑀M_{\sum}italic_M start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT
Value 64646464 64646464 113113113113 241241241241 462848462848462848462848 30220similar-to3022030\sim 22030 ∼ 220

We design/learn three subsampling matrices for reducing transmitters, receivers, and Fourier coefficients, denoted 𝐒TMT×NTsubscript𝐒Tsuperscriptsubscript𝑀Tsubscript𝑁T\mathbf{S}_{\rm{T}}\in\mathbb{R}^{M_{\rm{T}}\times N_{\rm{T}}}bold_S start_POSTSUBSCRIPT roman_T end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT roman_T end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_T end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, 𝐒RMR×NRsubscript𝐒Rsuperscriptsubscript𝑀Rsubscript𝑁R\mathbf{S}_{\rm{R}}\in\mathbb{R}^{M_{\rm{R}}\times N_{\rm{R}}}bold_S start_POSTSUBSCRIPT roman_R end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT roman_R end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_R end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, and 𝐒FMF×NFsubscript𝐒Fsuperscriptsubscript𝑀Fsubscript𝑁F\mathbf{S}_{\rm{F}}\in\mathbb{R}^{M_{\rm{F}}\times N_{\rm{F}}}bold_S start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_F end_POSTSUBSCRIPT end_POSTSUPERSCRIPT. They compress all three axes while retaining as much information about scatterer locations as possible. To demonstrate the performance of SCOSARA, we compare it against Uniform Compression and other SOTA algorithms, including ML-based methods DPS-topK [8] and J-DPS [12], and a Greedy Search algorithm. To provide comprehensive and fair comparisons, we consider the following points in the evaluation:

  • We trained SCOSARA first and then used its resource allocation scheme for other algorithms for two reasons: the baseline methods provide no resource allocation scheme, and the compression factor MΠ/NΠsubscript𝑀Πsubscript𝑁Π{M_{\Pi}}/{N_{\Pi}}italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT / italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT should be consistent.

  • We test 20202020 values of Msubscript𝑀M_{\sum}italic_M start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT from 30303030 to 220220220220 at intervals of 10101010, each has a distinct compression factor.

Baselines: The Uniform Compression scheme implies the equally spaced allocation of MTsubscript𝑀TM_{\rm{T}}italic_M start_POSTSUBSCRIPT roman_T end_POSTSUBSCRIPT choices among NTsubscript𝑁TN_{\rm{T}}italic_N start_POSTSUBSCRIPT roman_T end_POSTSUBSCRIPT, and so forth. The DPS-TopK refers to optimizing a vector of logits ϕΠNΠsubscriptbold-italic-ϕΠsuperscriptsubscript𝑁Π\bm{\phi}_{\Pi}\in\mathbb{R}^{N_{\Pi}}bold_italic_ϕ start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT end_POSTSUPERSCRIPT and using the Gumbel-topKtopK\operatorname*{topK}roman_topK trick in the forward pass to select MΣsubscript𝑀ΣM_{\Sigma}italic_M start_POSTSUBSCRIPT roman_Σ end_POSTSUBSCRIPT elements set to 1111. The J-DPS divides the long vector into three smaller dimension-specific ones. The Greedy Search algorithm iteratively discards the entry of ϕbold-italic-ϕ\bm{\phi}bold_italic_ϕ which reduces Trace(𝒥)Trace𝒥\rm{Trace}(\mathcal{J})roman_Trace ( caligraphic_J ) the least until the desired compression factor is achieved.

Results: We used an NVIDIA A100 GPU node to run all experiments, the DPS-TopK failed to operate due to computational constraints because it optimizes a large logits vector ϕΠsubscriptbold-italic-ϕΠ\bm{\phi}_{\Pi}bold_italic_ϕ start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT, which requires the generation of large matrices of shape (MΠ×NΠ)subscript𝑀Πsubscript𝑁Π(M_{\Pi}\times N_{\Pi})( italic_M start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_Π end_POSTSUBSCRIPT ) in the computational graph. After running other algorithms given all the values of Msubscript𝑀M_{\sum}italic_M start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT, at each compression factor, we apply the four SCOS schemes to the evaluation dataset and compute the CRB values based on their compressed data. The resulting four curves are illustrated in Fig. 2, which clearly shows that SCOSARA outperforms the other methods (i.e. it has the lowest CRB) for all compression factors.

Furthermore, we also evaluate the recovery performance in the multi-scatterer case. Knowing the distance of two pixels is approximately equal to half the wavelength in the simulation, we define three pairs of scatterers whose distances are two, three, and four pixels, respectively. We applied the four SCOS schemes given M=120subscript𝑀120M_{\sum}=120italic_M start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT = 120 along with the 2000200020002000-iteration complex FISTA, obtaining the reconstructed images. We computed the MSE ϵitalic-ϵ\epsilonitalic_ϵ and the number of nonzero elements 0\|\cdot\|_{0}∥ ⋅ ∥ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT for quantitative comparison. As illustrated in Fig. 3, SCOSARA outperforms with respect to image quality and localization accuracy.

Refer to caption
Figure 2: CRB as a function of the compression factor (higher factor denotes more selected samples).
Refer to caption
(a) Uniform compression
Refer to caption
(b) Greedy search
Refer to caption
(c) J-DPS
Refer to caption
(d) SCOSARA
Figure 3: Reconstructed images comparison in the multi-scatterer cases when M=120subscript𝑀120M_{\sum}=120italic_M start_POSTSUBSCRIPT ∑ end_POSTSUBSCRIPT = 120. The ground truth scatterers are represented by red circles.

VI Conclusion

In this work, we introduce the concept of FIM-driven SCOSARA, which can automatically allocate the sampling resources to each dimension while maximizing the Fisher information. Using ultrasound multichannel localization as a case study, we quantitatively compare SCOSARA with other baseline algorithms, where it shows superior performance in terms of CRB analysis and recovery performance. Preliminary results not included in this work show that the SCOSARA can be extended to allow preferential compression of user-specified axes by replacing the regularization term Trace{𝚿}Trace𝚿\text{Trace}\{\bm{\Psi}\}Trace { bold_Ψ } with Trace{𝐃𝚿}Trace𝐃𝚿\text{Trace}\{\mathbf{D}\bm{\Psi}\}Trace { bold_D bold_Ψ }, where 𝐃𝐃\mathbf{D}bold_D is a diagonal matrix. These results will be presented along with experiments on measurement data and comparisons against task-based algorithms by involving convolutional sparse coding algorithms [27] in a future manuscript.

References

  • [1] G. Dzemyda, O. Kurasova, and J. Zilinskas. Multidimensional data visualization. Methods and applications series: Springer optimization and its applications, 75(122):10–5555, 2013.
  • [2] J. Li and P. Stoica. MIMO radar signal processing. John Wiley & Sons, 2008.
  • [3] M. T. Vlaardingerbroek and J. A. Boer. Magnetic resonance imaging: theory and practice. Springer Science & Business Media, 2013.
  • [4] C. Holmes, B. W. Drinkwater, and P. D. Wilcox. Post-processing of the full matrix of ultrasonic transmit–receive array data for non-destructive evaluation. NDT & e International, 38(8):701–711, 2005.
  • [5] Y. C. Eldar and G. Kutyniok. Compressed sensing: theory and applications. Cambridge university press, 2012.
  • [6] D. L. Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289–1306, 2006.
  • [7] M. Lustig, D. Donoho, and J. M. Pauly. Sparse mri: The application of compressed sensing for rapid mr imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.
  • [8] I. A. Huijben, B. S. Veeling, K. Janse, M. Mischi, and R. JG. van Sloun. Learning sub-sampling and signal recovery with applications in ultrasound imaging. IEEE Transactions on Medical Imaging, 39(12):3955–3966, 2020.
  • [9] J. Kirchhof, S. Semper, C. W. Wagner, E. Pérez, F. Römer, and G. Del Galdo. Frequency subsampling of ultrasound nondestructive measurements: acquisition, reconstruction, and performance. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 68(10):3174–3191, 2021.
  • [10] H. Wang, E. Pérez, and F. Römer. Data-driven subsampling matrices design for phased array ultrasound nondestructive testing. In 2023 IEEE International Ultrasonics Symposium (IUS). IEEE, 2023.
  • [11] E. Pérez, J. Kirchhof, F. Krieg, and F. Römer. Subsampling approaches for compressed sensing with ultrasound arrays in non-destructive testing. Sensors, 20(23):6734, 2020.
  • [12] H. Wang, E. Zhou, Y.and Pérez, and F. Römer. Jointly learning selection matrices for transmitters, receivers and Fourier coefficients in multichannel imaging. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8691–8695, 2024.
  • [13] S. Mulleti, H. Zhang, and Y. C. Eldar. Learning to sample: Data-driven sampling and reconstruction of FRI signals. IEEE Access, 2023.
  • [14] H. Wang, E. Pérez, and F. Römer. Deep learning-based optimal spatial subsampling in ultrasound nondestructive testing. In 2023 31st European Signal Processing Conference (EUSIPCO), pages 1863–1867. IEEE, 2023.
  • [15] V. Monga, Y. Li, and Y. C. Eldar. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine, 38(2):18–44, 2021.
  • [16] K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th international conference on international conference on machine learning, pages 399–406, 2010.
  • [17] S. M. Kay. Fundamentals of statistical signal processing: estimation theory. Prentice-Hall, Inc., 1993.
  • [18] G. Fatima, P. Stoica, A. Aubry, A. De Maio, and P. Babu. Optimal placement of the receivers for multistatic target localization. IEEE Transactions on Radar Systems, 2024.
  • [19] J. D. Blanchard, C. Cartis, and J. Tanner. Compressed sensing: How sharp is the restricted isometry property? SIAM review, 53(1):105–125, 2011.
  • [20] C. D. Austin, E. Ertin, J. N. Ash, and R. L. Moses. On the relation between sparse sampling and parametric estimation. In 2009 IEEE 13th Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop, pages 387–392. IEEE, 2009.
  • [21] C. D. Austin, J. N. Ash, and R. L. Moses. Dynamic dictionary algorithms for model order and parameter estimation. IEEE Transactions on Signal Processing, 61(20):5117–5130, 2013.
  • [22] Y. Chi, L. L. Scharf, A. Pezeshki, and A. R. Calderbank. Sensitivity to basis mismatch in compressed sensing. IEEE Transactions on Signal Processing, 59(5):2182–2195, 2011.
  • [23] Z. Ben-Haim and Y. C. Eldar. The cramér-rao bound for estimating a sparse parameter vector. IEEE Transactions on Signal Processing, 58(6):3384–3389, 2010.
  • [24] Iris Huijben, Bastiaan S. Veeling, and Ruud JG. van Sloun. Deep probabilistic subsampling for task-adaptive compressed sensing. In 8th International Conference on Learning Representations, ICLR 2020, 2020.
  • [25] E. Jang, S. Gu, and B. Poole. Categorical Reparametrization with Gumble-Softmax. In International Conference on Learning Representations (ICLR 2017). OpenReview. net, 2017.
  • [26] C. Maddison, A. Mnih, and Y. Teh. The concrete distribution: A continuous relaxation of discrete random variables. In Proceedings of the International Conference on Learning Representations. International Conference on Learning Representations, 2017.
  • [27] H. Wang, Kvich Y., E. Pérez, F. Römer, and Y. C. Eldar. Efficient convolutional forward modeling and sparse coding in multichannel imaging. In 2024 32st European Signal Processing Conference (EUSIPCO). IEEE, 2024.