Machine Learning Academic Digest [1.10]

Mr. Green (MrGreen), arXiv Daily Academic Digest, 2022-05-05

cs.LG: 43 papers today

Graph (graph learning | graph neural networks | graph optimization, etc.) (2 papers)

【1】 MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs
Link: https://arxiv.org/abs/2201.02534

Authors: Qiaoyu Tan, Ninghao Liu, Xiao Huang, Rui Chen, Soo-Hyun Choi, Xia Hu
Abstract: We introduce a novel masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data. Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training. MGAE has two core designs. First, we find that masking a high ratio of the input graph structure, e.g., 70%, yields a nontrivial and meaningful self-supervisory task that benefits downstream applications. Second, we employ a graph neural network (GNN) as an encoder to perform message propagation on the partially masked graph. To reconstruct the large number of masked edges, a tailored cross-correlation decoder is proposed. It can capture the cross-correlation between the head and tail nodes of an anchor edge at multiple granularities. Coupling these two designs enables MGAE to be trained efficiently and effectively. Extensive experiments on multiple open datasets (Planetoid and OGB benchmarks) demonstrate that MGAE generally performs better than state-of-the-art unsupervised learning competitors on link prediction and node classification.
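The masking step described in the abstract above can be illustrated with a small sketch. This is not the authors' code: the function name `mask_edges`, the toy edge list, and the fixed seed are assumptions made for illustration, and only the uniform random split at a 70% ratio is shown (no encoder or decoder).

```python
import random


def mask_edges(edges, mask_ratio=0.7, seed=0):
    """Split an edge list into (visible, masked) by uniform random masking.

    The visible edges would be fed to the encoder; the masked edges would
    serve as reconstruction targets. Both roles are assumptions here.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_masked = int(round(mask_ratio * len(shuffled)))
    masked = shuffled[:n_masked]
    visible = shuffled[n_masked:]
    return visible, masked


# Hypothetical toy graph: 10 undirected edges on 6 nodes.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (2, 4), (3, 4), (3, 5), (4, 5), (0, 5)]
visible, masked = mask_edges(edges, mask_ratio=0.7)
```

With 10 edges and a ratio of 0.7, seven edges end up as reconstruction targets and three remain visible; which edges land in which set depends on the seed.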

【2】 Budget-aware Few-shot Learning via Graph Convolutional Network
Link: https://arxiv.org/abs/2201.02304

Authors: Shipeng Yan, Songyang Zhang, Xuming He
Affiliations: ShanghaiTech University
Abstract: This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples. A common problem setting in few-shot classification assumes a random sampling strategy in acquiring data labels, which is inefficient in practical applications. In this work, we introduce a new budget-aware few-shot learning problem that not only aims to learn novel object categories, but also needs to select informative examples to annotate in order to achieve data efficiency. We develop a meta-learning strategy for our budget-aware few-shot learning task, which jointly learns a novel data selection policy based on a Graph Convolutional Network (GCN) and an example-based few-shot classifier. Our selection policy computes a context-sensitive representation for each unlabeled data point by graph message passing, which is then used to predict an informativeness score for sequential selection. We validate our method by extensive experiments on the mini-ImageNet, tiered-ImageNet and Omniglot datasets. The results show our few-shot learning strategy outperforms baselines by a sizable margin, which demonstrates the efficacy of our method.

GAN | Adversarial | Attack | Generation (2 papers)

【1】 AugmentedPCA: A Python Package of Supervised and Adversarial Linear Factor Models
Link: https://arxiv.org/abs/2201.02547

Authors: William E. Carson IV, Austin Talbot, David Carlson
Affiliations: Department of Biomedical Engineering, Duke University, Durham, NC; Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA; Department of Civil and Environmental Engineering; Department of Biostatistics and Bioinformatics
Comments: NeurIPS 2021 (Learning Meaningful Representations of Life Workshop)
Abstract: Deep autoencoders are often extended with a supervised or adversarial loss to learn latent representations with desirable properties, such as greater predictivity of labels and outcomes or fairness with respect to a sensitive variable. Despite the ubiquity of supervised and adversarial deep latent factor models, these methods should demonstrate improvement over simpler linear approaches to be preferred in practice. This necessitates a reproducible linear analog that still adheres to an augmenting supervised or adversarial objective. We address this methodological gap by presenting methods that augment the principal component analysis (PCA) objective with either a supervised or an adversarial objective and provide analytic and reproducible solutions. We implement these methods in an open-source Python package, AugmentedPCA, that can produce excellent real-world baselines. We demonstrate the utility of these factor models on an open-source RNA-seq cancer gene expression dataset, showing that augmenting with a supervised objective results in improved downstream classification performance, produces principal components with greater class fidelity, and facilitates identification of genes aligned with the principal axes of data variance, with implications for the development of specific types of cancer.

【2】 Machine-learning-based arc selection for constrained shortest path problems in column generation
Link: https://arxiv.org/abs/2201.02535

Authors: Mouad Morabit, Guy Desaulniers, Andrea Lodi
Abstract: Column generation is an iterative method used to solve a variety of optimization problems. It decomposes the problem into two parts: a master problem, and one or more pricing problems (PP). The total computing time taken by the method is divided between these two parts. In routing or scheduling applications, the problems are mostly defined on a network, and the PP is usually an NP-hard shortest path problem with resource constraints. In this work, we propose a new heuristic pricing algorithm based on machine learning. By taking advantage of the data collected during previous executions, the objective is to reduce the size of the network and accelerate the PP, keeping only the arcs that have a high chance of being part of the linear relaxation solution. The method has been applied to two specific problems: the vehicle and crew scheduling problem in public transit and the vehicle routing problem with time windows. Reductions in computational time of up to 40% can be obtained.

Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (4 papers)

【1】 Leveraging Scale-Invariance and Uncertainity with Self-Supervised Domain Adaptation for Semantic Segmentation of Foggy Scenes
Link: https://arxiv.org/abs/2201.02588

Authors: Javed Iqbal, Rehan Hafiz, Mohsen Ali
Keywords: Foggy Scene Understanding, Semantic Segmentation, Self-supervised Learning, Domain Adaptation
Comments: Under Review
Abstract: This paper presents FogAdapt, a novel approach for domain adaptation of semantic segmentation for dense foggy scenes. Although significant research has been directed to reduce the domain shift in semantic segmentation, adaptation to scenes with adverse weather conditions remains an open question. Large variations in the visibility of the scene due to weather conditions, such as fog, smog, and haze, exacerbate the domain shift, thus making unsupervised adaptation in such scenarios challenging. We propose a self-entropy and multi-scale information augmented self-supervised domain adaptation method (FogAdapt) to minimize the domain shift in foggy-scene segmentation. Supported by the empirical evidence that an increase in fog density results in high self-entropy for segmentation probabilities, we introduce a self-entropy based loss function to guide the adaptation method. Furthermore, inferences obtained at different image scales are combined and weighted by the uncertainty to generate scale-invariant pseudo-labels for the target domain. These scale-invariant pseudo-labels are robust to visibility and scale variations. We evaluate the proposed model on real clear-weather scenes to real foggy scenes adaptation and synthetic non-foggy images to real foggy scenes adaptation scenarios. Our experiments demonstrate that FogAdapt significantly outperforms the current state-of-the-art in semantic segmentation of foggy images. Specifically, by considering the standard settings compared to state-of-the-art (SOTA) methods, FogAdapt gains 3.8% on Foggy Zurich, 6.0% on Foggy Driving-dense, and 3.6% on Foggy Driving in mIoU when adapted from Cityscapes to Foggy Zurich.
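The two ingredients named in the abstract above, self-entropy of segmentation probabilities and uncertainty-weighted combination across scales, can be sketched for a single pixel as follows. The abstract does not give the exact formulas, so the weighting rule used here (one minus normalised entropy) and the function names are assumptions, not FogAdapt's actual loss or fusion scheme. At least two classes are assumed.

```python
import math


def self_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_scales(per_scale_probs):
    """Combine per-scale predictions for one pixel.

    Each scale is weighted by its certainty, taken here as
    1 - entropy / log(n_classes). This weighting is an assumed stand-in
    for the uncertainty weighting mentioned in the abstract.
    """
    n_classes = len(per_scale_probs[0])
    max_entropy = math.log(n_classes)
    weights = [1.0 - self_entropy(p) / max_entropy for p in per_scale_probs]
    total = sum(weights)
    if total <= 0.0:
        # Every scale is maximally uncertain: fall back to a plain mean.
        weights = [1.0] * len(per_scale_probs)
        total = float(len(per_scale_probs))
    return [
        sum(w * p[c] for w, p in zip(weights, per_scale_probs)) / total
        for c in range(n_classes)
    ]


# Hypothetical predictions for one pixel at two image scales.
fused = fuse_scales([[0.9, 0.1], [0.5, 0.5]])
pseudo_label = max(range(len(fused)), key=lambda c: fused[c])
```

In this toy case the second scale is maximally uncertain, so it receives essentially zero weight and the fused prediction follows the confident scale.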

【2】 On robust risk-based active-learning algorithms for enhanced decision support
Link: https://arxiv.org/abs/2201.02555

Authors: Aidan J. Hughes, Lawrence A. Bull, Paul Gardner, Nikolaos Dervilis, Keith Worden
Affiliations: Department of Mechanical Engineering, University of Sheffield, UK; The Alan Turing Institute
Comments: 48 pages, 39 figures, submitted to Mechanical Systems and Signal Processing
Abstract: Classification models are a fundamental component of physical-asset management technologies such as structural health monitoring (SHM) systems and digital twins. Previous work introduced risk-based active learning, an online approach for the development of statistical classifiers that takes into account the decision-support context in which they are applied. Decision-making is considered by preferentially querying data labels according to expected value of perfect information (EVPI). Although several benefits are gained by adopting a risk-based active learning approach, including improved decision-making performance, the algorithms suffer from issues relating to sampling bias as a result of the guided querying process. This sampling bias ultimately manifests as a decline in decision-making performance during the later stages of active learning, which in turn corresponds to lost resource/utility. The current paper proposes two novel approaches to counteract the effects of sampling bias: semi-supervised learning, and discriminative classification models. These approaches are first visualised using a synthetic dataset, then subsequently applied to an experimental case study, specifically, the Z24 Bridge dataset. The semi-supervised learning approach is shown to have variable performance, with robustness to sampling bias dependent on the suitability of the generative distributions selected for the model with respect to each dataset. In contrast, the discriminative classifiers are shown to have excellent robustness to the effects of sampling bias. Moreover, it was found that the number of inspections made during a monitoring campaign, and therefore resource expenditure, could be reduced with the careful selection of the statistical classifiers used within a decision-supporting monitoring system.
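The expected value of perfect information mentioned in the abstract above has a standard textbook definition: the expected utility when the true state is revealed before acting, minus the best expected utility achievable under current beliefs. The sketch below computes it for a discrete decision problem; the two-state "healthy/damaged" example and its utilities are hypothetical and are not taken from the paper.

```python
def expected_value_of_perfect_information(p_states, utility):
    """EVPI for a discrete decision problem.

    p_states[s] is the probability of state s;
    utility[a][s] is the payoff of taking action a in state s.
    """
    # Best expected utility when acting on current beliefs only.
    best_without = max(
        sum(p * u for p, u in zip(p_states, row)) for row in utility
    )
    # Expected utility if the true state were known before acting.
    best_with = sum(
        p_states[s] * max(row[s] for row in utility)
        for s in range(len(p_states))
    )
    return best_with - best_without


# Hypothetical example: the structure is healthy (0.8) or damaged (0.2).
p = [0.8, 0.2]
utility = [
    [0.0, -100.0],   # action 0: do nothing
    [-10.0, -10.0],  # action 1: inspect and repair
]
evpi = expected_value_of_perfect_information(p, utility)
```

Here acting on beliefs alone gives an expected utility of -10 (repair), while knowing the state in advance gives -2, so a label for this observation would be worth 8 utility units. A risk-based query rule would prefer observations with a high value of this kind.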

【3】 Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
Link: https://arxiv.org/abs/2201.02229

Authors: Aparna Elangovan, Yuan Li, Douglas E. V. Pires, Melissa J. Davis, Karin Verspoor
Affiliations: School of Computing Technologies, RMIT University
Abstract: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. We use the IntAct PPI database to create a distant supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the use of the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high-confidence predictions. The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546,507 unique PTM-PPI triplets) PTM-PPI predictions, and filtered ~5700 (4584 unique) high-confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
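The filtering idea in the abstract above, keeping a prediction only when the ensemble is both confident on average and consistent across members, can be sketched as below. The thresholds `min_mean` and `max_std`, the function name, and the toy confidences are assumptions for illustration; the paper's actual thresholds and tuning procedure are not stated in the abstract.

```python
from statistics import mean, pstdev


def high_quality_predictions(member_confidences, min_mean=0.9, max_std=0.05):
    """Return the ids whose ensemble confidence is high and stable.

    member_confidences maps an item id to the confidences that each
    ensemble member assigned to the predicted class.
    Both thresholds are hypothetical.
    """
    kept = []
    for item, confs in member_confidences.items():
        if mean(confs) >= min_mean and pstdev(confs) <= max_std:
            kept.append(item)
    return kept


# Hypothetical confidences from a three-member ensemble.
confidences = {
    "a": [0.95, 0.96, 0.94],  # confident and consistent: kept
    "b": [1.00, 1.00, 0.75],  # high mean, but members disagree: dropped
    "c": [0.55, 0.56, 0.54],  # consistent, but low confidence: dropped
}
kept = high_quality_predictions(confidences)
```

Item "b" shows why the variation term matters: its mean confidence clears the threshold, yet one member disagrees strongly, so it is filtered out.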

【4】 3D Intracranial Aneurysm Classification and Segmentation via Unsupervised Dual-branch Learning
Link: https://arxiv.org/abs/2201.02198

Authors: Di Shao, Xuequan Lu, Xiao Liu
Affiliations: Deakin University, Pigdons Rd, Waurn Ponds, Australia
Comments: submitted for review (contact: xuequan.lu@deakin.edu.au)
Abstract: Intracranial aneurysms are common nowadays and how to detect them intelligently is of great significance in digital health. While most existing deep learning research has focused on medical images in a supervised way, we introduce an unsupervised method for the detection of intracranial aneurysms based on 3D point cloud data. In particular, our method consists of two stages: unsupervised pre-training and downstream tasks. As for the former, the main idea is to pair each point cloud with its jittered counterpart and maximise their correspondence. Then we design a dual-branch contrastive network with an encoder for each branch and a subsequent common projection head. As for the latter, we design simple networks for supervised classification and segmentation training. Experiments on the public dataset (IntrA) show that our unsupervised method achieves comparable or even better performance than some state-of-the-art supervised techniques, and it is most prominent in the detection of aneurysmal vessels. Experiments on ModelNet40 also show that our method achieves an accuracy of 90.79%, which outperforms existing state-of-the-art unsupervised models.

Reinforcement Learning (2 papers)

【1】 Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations
Link: https://arxiv.org/abs/2201.02571

Authors: Dmitry Ivanov, Mikhail Kiselev, Denis Larionov
Abstract: This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning (RL) tasks. This method combines two ideas: neural network pruning and taking into account input data correlations; it makes it possible to update neuron states only when changes in them exceed a certain threshold. It significantly reduces the number of multiplications when running neural networks. We tested different RL tasks and achieved a 20-150x reduction in the number of multiplications. There were no substantial performance losses; sometimes the performance even improved.
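One way to picture "update neuron states only when changes exceed a threshold" is a linear layer that caches its last input and output and, on each new input, propagates only the input components that moved by more than the threshold. The sketch below is an assumed delta-style illustration of that idea, not the authors' method; the class name `DeltaLinear` and the multiplication counter are hypothetical, and pruning is not shown.

```python
class DeltaLinear:
    """Linear layer y = W x that skips inputs whose change is below a threshold."""

    def __init__(self, weights, threshold):
        self.weights = weights                      # weights[j][i]: input i -> output j
        self.threshold = threshold
        self.cached_input = [0.0] * len(weights[0])
        self.output = [0.0] * len(weights)
        self.multiplications = 0                    # running count, for comparison

    def forward(self, x):
        for i, value in enumerate(x):
            delta = value - self.cached_input[i]
            if abs(delta) <= self.threshold:
                continue  # change too small: reuse the cached contribution
            for j, row in enumerate(self.weights):
                self.output[j] += row[i] * delta
                self.multiplications += 1
            self.cached_input[i] = value
        return list(self.output)


layer = DeltaLinear([[1.0, 2.0], [3.0, 4.0]], threshold=0.1)
first = layer.forward([1.0, 1.0])     # both inputs changed: 4 multiplications
second = layer.forward([1.05, 2.0])   # only the second input is propagated
```

After the two calls the layer has performed 6 multiplications where a dense layer would have needed 8; the second output is an approximation, because the small change in the first input was ignored. When consecutive observations are highly correlated, as is common in RL environments, most inputs fall under the threshold and the savings grow.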

【2】 Offline Reinforcement Learning for Road Traffic Control
Link: https://arxiv.org/abs/2201.02381

Authors: Mayuresh Kunjir, Sanjay Chawla
Affiliations: Qatar Computing Research Institute, Doha, Qatar
Comments: 8 pages
Abstract: Traffic signal control is an important problem in urban mobility with a significant potential for economic and environmental impact. While there is a growing interest in Reinforcement Learning (RL) for traffic control, the work so far has focussed on learning through interactions which, in practice, is costly. Instead, real experience data on traffic is available and could be exploited at minimal cost. Recent progress in offline or batch RL has enabled just that. Model-based offline RL methods, in particular, have been shown to generalize to the experience data much better than others. We build a model-based learning framework, A-DAC, which infers a Markov Decision Process (MDP) from a dataset with pessimistic costs built in to deal with data uncertainties. The costs are modeled through an adaptive shaping of rewards in the MDP, which provides better regularization of data compared to the prior related work. A-DAC is evaluated on a complex signalized roundabout using multiple datasets varying in size and in batch collection policy. The evaluation results show that it is possible to build high-performance control policies in a data-efficient manner using simplistic batch collection policies.

Clustering (1 paper)

【1】 k-Center Clustering with Outliers in Sliding Windows
Link: https://arxiv.org/abs/2201.02448

Authors: Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci
Abstract: Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks the best solution that disregards a given number $z$ of points of the dataset, called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window $W$ of the most recent data items. Our algorithms achieve $O(1)$ approximation and, remarkably, require a working memory linear in $k+z$ and only logarithmic in $|W|$. As a by-product, we show how to estimate the effective diameter of the window $W$, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of our theoretical results.
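To make the objective in the abstract above concrete, the sketch below evaluates the $k$-center cost of a given set of centers when the $z$ farthest points are disregarded as outliers. It only evaluates the objective on a static point set; it is not the sliding-window streaming algorithm of the paper, and the function name and toy points are assumptions. It uses `math.dist`, available from Python 3.8.

```python
import math


def radius_with_outliers(points, centers, z):
    """Largest point-to-nearest-center distance after dropping the z farthest points."""
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = dists[:max(len(dists) - z, 0)]
    return kept[-1] if kept else 0.0


# Hypothetical data: two tight groups plus one far-away noisy point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

plain_radius = radius_with_outliers(points, centers, z=0)
robust_radius = radius_with_outliers(points, centers, z=1)
```

With no outliers allowed, the single noisy point at x = 100 forces a radius of 90; allowing one outlier brings the radius down to 1, which is the sensitivity to noise that motivates the variant.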

Federated Learning | Privacy Protection | Encryption (2 papers)

【1】 Multi-Model Federated Learning
Link: https://arxiv.org/abs/2201.02582

Authors: Neelkamal Bhuyan, Sharayu Moharir
Abstract: Federated learning is a form of distributed learning with the key challenge being the non-identically distributed nature of the data in the participating clients. In this paper, we extend federated learning to the setting where multiple unrelated models are trained simultaneously. Specifically, every client is able to train any one of M models at a time and the server maintains a model for each of the M models, which is typically a suitably averaged version of the model computed by the clients. We propose multiple policies for assigning learning tasks to clients over time. In the first policy, we extend the widely studied FedAvg to multi-model learning by allotting models to clients in an i.i.d. stochastic manner. In addition, we propose two new policies for client selection in a multi-model federated setting which make decisions based on current local losses for each client-model pair. We compare the performance of the policies on tasks involving synthetic and real-world data and characterize the performance of the proposed policies. The key take-away from our work is that the proposed multi-model policies perform better than, or at least as well as, single-model training using FedAvg.
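The first policy in the abstract above, random assignment of models to clients followed by per-model averaging on the server, can be sketched as one round of bookkeeping. This is a simplified illustration under stated assumptions: the assignment is uniform, the server uses an unweighted mean rather than FedAvg's data-size weighting, local training is omitted, and all names are hypothetical.

```python
import random


def assign_models(num_clients, num_models, rng):
    """Give each client one model index, independently and uniformly at random."""
    return [rng.randrange(num_models) for _ in range(num_clients)]


def average_updates(assignment, client_weights, previous):
    """Per-model unweighted mean of the weight vectors returned by clients.

    assignment[c] is the model trained by client c in this round;
    client_weights[c] is the weight vector that client sends back.
    A model that no client trained keeps its previous weights.
    """
    new_global = []
    for m, old in enumerate(previous):
        updates = [w for a, w in zip(assignment, client_weights) if a == m]
        if not updates:
            new_global.append(list(old))
            continue
        dim = len(updates[0])
        new_global.append(
            [sum(u[d] for u in updates) / len(updates) for d in range(dim)]
        )
    return new_global


rng = random.Random(0)
sampled = assign_models(num_clients=5, num_models=3, rng=rng)

# Fixed assignment for a reproducible toy round with 3 clients and 3 models.
assignment = [0, 1, 0]
client_weights = [[1.0, 1.0], [5.0, 5.0], [3.0, 3.0]]
previous = [[0.0, 0.0], [0.0, 0.0], [9.0, 9.0]]
new_global = average_updates(assignment, client_weights, previous)
```

The loss-based selection policies proposed in the paper would replace `assign_models` with a rule that looks at each client-model pair's current local loss; that part is not sketched here.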

【2】 Auction-Based Ex-Post-Payment Incentive Mechanism Design for Horizontal Federated Learning with Reputation and Contribution Measurement
Link: https://arxiv.org/abs/2201.02410

Authors: Jingwen Zhang, Yuezhou Wu, Rong Pan
Abstract: Federated learning trains models across devices with distributed data, while protecting privacy and obtaining a model similar to that of centralized ML. A large number of workers with data and computing power are the foundation of federated learning. However, the inevitable costs prevent self-interested workers from serving for free. Moreover, due to data isolation, task publishers lack effective methods to select, evaluate and pay reliable workers with high-quality data. Therefore, we design an auction-based incentive mechanism for horizontal federated learning with reputation and contribution measurement. By designing a reasonable method of measuring contribution, we establish the reputation of workers, which is easy to decline and difficult to improve. Through reverse auctions, workers bid for tasks, and the task publisher selects workers by combining reputation and bid price. Under the budget constraint, winning workers are paid based on performance. We prove that our mechanism satisfies the individual rationality of the honest worker, budget feasibility, truthfulness, and computational efficiency.

Reasoning | Analysis | Understanding | Explanation (1 paper)

【1】 Explainable deep learning for insights in El Nino and river flows
Link: https://arxiv.org/abs/2201.02596

Authors: Yumin Liu, Kate Duffy, Jennifer G. Dy, Auroop R. Ganguly
Abstract: The El Nino Southern Oscillation (ENSO) is a semi-periodic fluctuation in sea surface temperature (SST) over the tropical central and eastern Pacific Ocean that influences interannual variability in regional hydrology across the world through long-range dependence or teleconnections. Recent research has demonstrated the value of Deep Learning (DL) methods for improving ENSO prediction as well as Complex Networks (CN) for understanding teleconnections. However, gaps in predictive understanding of ENSO-driven river flows include the black box nature of DL, the use of simple ENSO indices to describe a complex phenomenon, and the translation of DL-based ENSO predictions to river flow predictions. Here we show that eXplainable DL (XDL) methods, based on saliency maps, can extract interpretable predictive information contained in global SST and discover novel SST information regions and dependence structures relevant for river flows which, in tandem with climate network constructions, enable improved predictive understanding. Our results reveal additional information content in global SST beyond ENSO indices, develop new understanding of how SSTs influence river flows, and generate improved river flow predictions with uncertainties. Observations, reanalysis data, and earth system model simulations are used to demonstrate the value of the XDL-CN based methods for future interannual and decadal scale climate projections.

Detection (3 papers)

【1】 iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection
Link: https://arxiv.org/abs/2201.02331

Authors: Ramneet Kaur, Susmit Jha, Anirban Roy, Sangdon Park, Edgar Dobriban, Oleg Sokolsky, Insup Lee
Comments: Association for the Advancement of Artificial Intelligence (AAAI), 2022
Abstract: Machine learning methods such as deep neural networks (DNNs), despite their success across different domains, are known to often generate incorrect predictions with high confidence on inputs outside their training distribution. The deployment of DNNs in safety-critical domains requires detection of out-of-distribution (OOD) data so that DNNs can abstain from making predictions on those. A number of methods have been recently developed for OOD detection, but there is still room for improvement. We propose the new method iDECODe, leveraging in-distribution equivariance for conformal OOD detection. It relies on a novel base non-conformity measure and a new aggregation method, used in the inductive conformal anomaly detection framework, thereby guaranteeing a bounded false detection rate. We demonstrate the efficacy of iDECODe by experiments on image and audio datasets, obtaining state-of-the-art results. We also show that iDECODe can detect adversarial examples.
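The inductive conformal anomaly detection framework named in the abstract above turns a non-conformity score into a p-value by ranking it against scores from a held-out calibration set; flagging inputs whose p-value falls below a level epsilon bounds the false detection rate at epsilon. The sketch below shows that generic step only. The scores are placeholders: iDECODe's equivariance-based non-conformity measure and its aggregation method are not reproduced, and the function names are assumptions.

```python
def conformal_p_value(calibration_scores, test_score):
    """Smoothed fraction of calibration scores at least as large as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def is_out_of_distribution(calibration_scores, test_score, epsilon=0.05):
    """Flag the input when its conformal p-value falls below epsilon."""
    return conformal_p_value(calibration_scores, test_score) < epsilon


# Placeholder non-conformity scores from 99 in-distribution calibration inputs.
calibration = [float(s) for s in range(1, 100)]

p_typical = conformal_p_value(calibration, 50.0)   # mid-range score
p_extreme = conformal_p_value(calibration, 100.0)  # larger than every calibration score
```

A higher score means "less conforming" in this convention, so the extreme input receives the smallest possible p-value, 1/(n+1), and is flagged at epsilon = 0.05, while the mid-range input is not.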

【2】 Applications of Signature Methods to Market Anomaly Detection
Link: https://arxiv.org/abs/2201.02441

Authors: Erdinc Akyildirim, Matteo Gambara, Josef Teichmann, Syang Zhou
Affiliations: Department of Mathematics, ETH, Zurich, Switzerland; Department of Banking and Finance, University of Zurich, Zurich, Switzerland
Abstract: Anomaly detection is the process of identifying abnormal instances or events in data sets which deviate from the norm significantly. In this study, we propose a signature-based machine learning algorithm to detect rare or unexpected items in a given data set of time series type. We present applications of signature or randomized signature as feature extractors for anomaly detection algorithms; additionally, we provide an easy, representation-theoretic justification for the construction of randomized signatures. Our first application is based on synthetic data and aims at distinguishing between real and fake trajectories of stock prices, which are indistinguishable by visual inspection. We also show a real-life application by using transaction data from the cryptocurrency market. In this case, we are able to identify pump-and-dump attempts organized on social networks with F1 scores up to 88% by means of our unsupervised learning algorithm, thus achieving results that are close to the state-of-the-art in the field based on supervised learning.
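For readers unfamiliar with signatures as feature extractors, the sketch below computes the first two levels of the path signature of a piecewise-linear path by appending one segment at a time with Chen's identity. It illustrates the standard (truncated) signature only; the randomized signature construction from the paper is not shown, and the function name and toy path are assumptions.

```python
def signature_depth2(path):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `path`.

    Returns (level1, level2), where level1[i] is the total increment in
    coordinate i and level2[i][j] is the iterated integral of dx_i then dx_j.
    """
    dim = len(path[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(path, path[1:]):
        delta = [e - s for s, e in zip(start, end)]
        # Chen's identity for concatenating one linear segment:
        # new_level2 = level2 + level1 (x) delta + (delta (x) delta) / 2
        for i in range(dim):
            for j in range(dim):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(dim):
            level1[i] += delta[i]
    return level1, level2


# Toy 2-D path: move right by 1, then up by 1.
level1, level2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = 0.5 * (level2[0][1] - level2[1][0])
```

The level-2 terms depend on the order of the moves (here the x-then-y term is 1 and the y-then-x term is 0), which is what lets signature features distinguish trajectories with the same endpoints. In a detection pipeline, the flattened levels would be passed as features to an anomaly detector.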

【3】 Bayesian Online Change Point Detection for Baseline Shifts
Link: https://arxiv.org/abs/2201.02325

Authors: Ginga Yoshizawa
Affiliations: Intel K.K., Tokyo, Japan
Abstract: In time series data analysis, detecting change points on a real-time basis (online) is of great interest in many areas, such as finance, environmental monitoring, and medicine. One promising means to achieve this is the Bayesian online change point detection (BOCPD) algorithm, which has been successfully adopted in particular cases in which the time series of interest has a fixed baseline. However, we have found that the algorithm struggles when the baseline irreversibly shifts from its initial state. This is because with the original BOCPD algorithm, the sensitivity with which a change point can be detected is degraded if the data points are fluctuating at locations relatively far from the original baseline. In this paper, we not only extend the original BOCPD algorithm to be applicable to a time series whose baseline is constantly shifting toward unknown values but also visualize why the proposed extension works. To demonstrate the efficacy of the proposed algorithm compared to the original one, we examine these algorithms on two real-world data sets and six synthetic data sets.

Classification | Recognition (1 paper)

【1】 Negative Evidence Matters in Interpretable Histology Image Classification
标题：负证据在可解释组织学图像分类中的作用
链接：https://arxiv.org/abs/2201.02445

作者：Soufiane Belharbi,Marco Pedersoli,Ismail Ben Ayed,Luke McCaffrey,Eric Granger
机构： Dept. of Systems Engineering, ÉTS Montreal, Canada, Goodman Cancer Research Centre, Dept. of Oncology, McGill University, Montreal, Canada
备注：10 figures, under review
摘要：弱监督学习方法仅使用全局注释（如图像类标签），允许CNN分类器对图像进行联合分类，并产生与预测类相关的感兴趣区域。然而，在像素级没有任何指导的情况下，这种方法可能产生不准确的区域。众所周知，与自然图像相比，组织学图像的这一问题更具挑战性，因为对象不太突出，结构有更多变化，前景和背景区域有更强的相似性。因此，计算机视觉文献中用于CNN视觉解释的方法可能不直接适用。在这项工作中，我们提出了一种基于复合损失函数的简单而有效的方法，该方法利用了来自完全负样本的信息。我们的新损失函数包含两个互补项：第一个利用从CNN分类器收集的积极证据，而第二个利用来自训练数据集的完全消极样本。特别是，我们为预先训练的分类器配备了一个解码器，该解码器允许细化感兴趣的区域。利用同一分类器在像素级收集正证据和负证据来训练解码器。这使得能够利用数据中自然出现的完全负采样，而不需要任何额外的监控信号，并且仅使用图像类作为监控。与最近的几种相关方法相比，相对于使用三种不同主干的公众基准结肠癌GlaS和基于Camelyon16贴片的乳腺癌基准，我们展示了我们的方法带来的实质性改进。我们的结果显示了使用消极和积极证据的好处，即从分类器获得的证据和数据集中自然可用的证据。我们提供了两个术语的烧蚀研究。我们的代码是公开的。
摘要：Using only global annotations such as image class labels, weakly-supervised learning methods allow CNN classifiers to jointly classify an image and yield the regions of interest associated with the predicted class. However, without any guidance at the pixel level, such methods may yield inaccurate regions. This problem is known to be more challenging with histology images than with natural ones, since objects are less salient, structures have more variations, and foreground and background regions have stronger similarities. Therefore, methods in the computer vision literature for visual interpretation of CNNs may not directly apply. In this work, we propose a simple yet efficient method based on a composite loss function that leverages information from the fully negative samples. Our new loss function contains two complementary terms: the first exploits positive evidence collected from the CNN classifier, while the second leverages the fully negative samples from the training dataset. In particular, we equip a pre-trained classifier with a decoder that allows refining the regions of interest. The same classifier is exploited to collect both the positive and negative evidence at the pixel level to train the decoder. This makes it possible to take advantage of the fully negative samples that occur naturally in the data, without any additional supervision signals and using only the image class as supervision. Compared to several recent related methods, over the public benchmark GlaS for colon cancer and a Camelyon16 patch-based benchmark for breast cancer using three different backbones, we show the substantial improvements introduced by our method. Our results show the benefits of using both negative and positive evidence, i.e., the one obtained from a classifier and the one naturally available in datasets. We provide an ablation study of both terms. Our code is publicly available.

表征(1篇)

【1】 Audio representations for deep learning in sound synthesis: A review
标题：声音合成中深度学习的音频表征：综述
链接：https://arxiv.org/abs/2201.02490

作者：Anastasia Natsiou,Sean O'Leary
机构：Technological University of Dublin, Dublin, Ireland
摘要：深度学习算法的兴起使得许多研究人员不再使用经典的信号处理方法来产生声音。深度学习模型实现了富有表现力的语音合成、逼真的声音纹理和来自虚拟仪器的音符。然而，最合适的深度学习架构仍在研究中。架构的选择与音频表示紧密耦合。对于深度学习模型来说，声音的原始波形可能过于密集和丰富，无法有效处理，而且复杂性增加了训练时间和计算成本。此外，它并不代表声音的感知方式。因此，在许多情况下，原始音频已通过上采样、特征提取，甚至通过采用更高级别的波形图示，转换为压缩且更有意义的形式。此外，在所选择的形式的条件下，研究了附加的条件表示、不同的模型结构和许多用于评估重建声音的度量。本文概述了使用深度学习进行声音合成的音频表示。此外，它还介绍了使用深度学习模型开发和评估声音合成体系结构的最重要的方法，始终取决于音频表示。
摘要：The rise of deep learning algorithms has led many researchers to withdraw from using classic signal processing methods for sound generation. Deep learning models have achieved expressive voice synthesis, realistic sound textures, and musical notes from virtual instruments. However, the most suitable deep learning architecture is still under investigation. The choice of architecture is tightly coupled to the audio representations. A sound's original waveform can be too dense and rich for deep learning models to deal with efficiently - and complexity increases training time and computational cost. Also, it does not represent sound in the manner in which it is perceived. Therefore, in many cases, the raw audio has been transformed into a compressed and more meaningful form using upsampling, feature-extraction, or even by adopting a higher level illustration of the waveform. Furthermore, conditional on the form chosen, additional conditioning representations, different model architectures, and numerous metrics for evaluating the reconstructed sound have been investigated. This paper provides an overview of audio representations applied to sound synthesis using deep learning. Additionally, it presents the most significant methods for developing and evaluating a sound synthesis architecture using deep learning models, always depending on the audio representation.

优化|敛散性(1篇)

【1】 Optimality in Noisy Importance Sampling
标题：噪声重要抽样中的最优性
链接：https://arxiv.org/abs/2201.02432

作者：Fernando Llorente,Luca Martino,Jesse Read,David Delgado-Gómez
机构：Universidad Carlos III de Madrid, Leganés, Spain, Universidad Rey Juan Carlos, Fuenlabrada, Spain, École Polytechnique, Palaiseau, France
摘要：在这项工作中，我们分析了噪声重要性抽样（IS），即，正在对目标密度进行噪声评估。我们提出了一般框架，并推导了噪声IS估计的最优建议密度。最优方案包含噪声实现的方差信息，在噪声功率较高的区域提出点。我们还比较了最优方案的使用与先前在噪声IS框架中考虑的最优方法。
摘要：In this work, we analyze the noisy importance sampling (IS), i.e., IS working with noisy evaluations of the target density. We present the general framework and derive optimal proposal densities for noisy IS estimators. The optimal proposals incorporate the information of the variance of the noisy realizations, proposing points in regions where the noise power is higher. We also compare the use of the optimal proposals with previous optimality approaches considered in a noisy IS framework.
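
To make the noisy IS setting concrete, here is a minimal sketch of a self-normalized importance sampling estimator in which each target evaluation is corrupted by noise. It does not implement the optimal proposals derived in the paper. The target (an unnormalized standard normal), the multiplicative uniform noise with mean 1, the Gaussian proposal, and all names (`noisy_target`, `noisy_snis`, `noise_scale`, `proposal_std`) are assumptions made for illustration.

```python
import math
import random


def noisy_target(x, rng, noise_scale=0.3):
    """Noisy evaluation of an unnormalized N(0, 1) density (illustrative target).

    The noise is multiplicative, positive, and has mean 1, so the noisy
    evaluation is unbiased for the exact one.
    """
    exact = math.exp(-0.5 * x * x)
    noise = rng.uniform(1.0 - noise_scale, 1.0 + noise_scale)
    return exact * noise


def noisy_snis(f, n, rng, proposal_std=2.0):
    """Self-normalized IS estimate of E[f(X)] using noisy target evaluations."""
    num = 0.0
    den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, proposal_std)  # draw from the proposal q
        q = math.exp(-0.5 * (x / proposal_std) ** 2) / (
            proposal_std * math.sqrt(2.0 * math.pi)
        )
        w = noisy_target(x, rng) / q      # noisy importance weight
        num += w * f(x)
        den += w
    return num / den
```

For example, `noisy_snis(lambda x: x * x, 20000, random.Random(0))` estimates the second moment of the target, whose exact value is 1. The noise inflates the variance of the weights, which is why the choice of proposal matters in this setting.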

预测|估计(5篇)

【1】 Visual Attention Prediction Improves Performance of Autonomous Drone Racing Agents
标题：视觉注意预测提高自主无人机竞速智能体的性能
链接：https://arxiv.org/abs/2201.02569

作者：Christian Pfeiffer,Simon Wengeler,Antonio Loquercio,Davide Scaramuzza
备注：12 pages, 6 figures
摘要：人类驾驶无人机的速度比训练用于端到端自主飞行的神经网络要快。这可能与人类飞行员有效选择任务相关视觉信息的能力有关。这项工作研究了神经网络是否能够模仿人眼注视行为和注意力，从而提高神经网络在基于视觉的自主无人机竞赛中的性能。我们假设，在基于模拟器的无人机竞赛任务中，基于注视的注意力预测是一种有效的视觉信息选择和决策机制。我们使用18名人类无人驾驶飞机飞行员的眼睛注视和飞行轨迹数据来检验这一假设，以训练视觉注意预测模型。然后，我们使用该视觉注意力预测模型，通过模仿学习，为基于视觉的自主无人机比赛训练端到端控制器。我们将注意力预测控制器的无人机竞赛性能与使用原始图像输入和基于图像的抽象（即特征轨迹）的无人机竞赛性能进行比较。我们的研究结果表明，基于注意力预测的控制器性能优于基线，能够持续完成具有挑战性的赛道，成功率高达88%。此外，视觉注意预测和基于特征轨迹的模型在保持参考轨迹上的泛化性能优于基于图像的模型。我们的研究结果表明，人类视觉注意力预测提高了基于自主视觉的无人机竞赛代理的性能，为实现最终能够达到甚至超过人类性能的基于视觉、快速和敏捷的自主飞行迈出了重要的一步。
摘要：Humans race drones faster than neural networks trained for end-to-end autonomous flight. This may be related to the ability of human pilots to select task-relevant visual information effectively. This work investigates whether neural networks capable of imitating human eye gaze behavior and attention can improve neural network performance for the challenging task of vision-based autonomous drone racing. We hypothesize that gaze-based attention prediction can be an efficient mechanism for visual information selection and decision making in a simulator-based drone racing task. We test this hypothesis using eye gaze and flight trajectory data from 18 human drone pilots to train a visual attention prediction model. We then use this visual attention prediction model to train an end-to-end controller for vision-based autonomous drone racing using imitation learning. We compare the drone racing performance of the attention-prediction controller to those using raw image inputs and image-based abstractions (i.e., feature tracks). Our results show that attention-prediction based controllers outperform the baselines and are able to complete a challenging race track consistently with up to 88% success rate. Furthermore, visual attention-prediction and feature-track based models showed better generalization performance than image-based models when evaluated on hold-out reference trajectories. Our results demonstrate that human visual attention prediction improves the performance of autonomous vision-based drone racing agents and provides an essential step towards vision-based, fast, and agile autonomous flight that eventually can reach and even exceed human performances.

【2】 Churn prediction in online gambling
标题：在线赌博中的流失预测
链接：https://arxiv.org/abs/2201.02463

作者：Florian Merchie,Damien Ernst
备注：14 pages, 3 figures, submitted to Expert Systems with Applications
摘要：在业务保留中，防止流失一直是一个主要问题。这项工作有助于这一领域的形式化问题的流失预测的背景下，在线赌博作为一个二元分类任务。我们还提出了一种基于递归神经网络的算法。该算法以时间序列形式的在线赌博数据为例进行了测试，该数据可以通过递归神经网络进行有效处理。为了评估训练模型的性能，使用了标准的机器学习指标，如准确性、精确性和召回率。特别是对于这个问题，进行的实验可以评估特定架构的选择取决于最重要的度量。使用nBRC的体系结构有利于精确性，使用LSTM的体系结构具有更好的召回率，而基于GRU的体系结构允许更高的准确度并平衡其他两个指标。此外，进一步的实验表明，仅使用较新的时间序列历史来训练网络会降低结果的质量。我们还研究了在特定时刻$t$，其他时间$t^{\prime}>t$学习的模型的性能。结果表明，在时间$t$学习的模型在以下时刻$t^{\prime}>t$的性能保持良好，表明不需要以高速率刷新模型。然而，由于影响数据的一次性事件，模型的性能受到显著差异的影响。
摘要：In business retention, churn prevention has always been a major concern. This work contributes to this domain by formalizing the problem of churn prediction in the context of online gambling as a binary classification task. We also propose an algorithmic answer to this problem based on recurrent neural networks. This algorithm is tested with online gambling data in the form of time series, which can be efficiently processed by recurrent neural networks. To evaluate the performance of the trained models, standard machine learning metrics were used, such as accuracy, precision and recall. For this problem in particular, the experiments showed that the choice of a specific architecture depends on which metric is given the greatest importance. Architectures using nBRC favour precision, those using LSTM give better recall, while GRU-based architectures allow a higher accuracy and balance the other two metrics. Moreover, further experiments showed that using only the more recent time-series histories to train the networks decreases the quality of the results. We also study the performance of models learned at a specific instant $t$ when applied at other times $t^{\prime} > t$. The results show that the performance of the models learned at time $t$ remains good at the following instants $t^{\prime} > t$, suggesting that there is no need to refresh the models at a high rate. However, the performance of the models was subject to noticeable variance due to one-off events impacting the data.
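
The three metrics named in the abstract can be computed from binary churn labels as in the short sketch below. The function name `binary_metrics` and the convention that label 1 means "churned" are assumptions for illustration; the paper's own evaluation code is not shown in the abstract.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, with 1 = churn."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision: of the players flagged as churners, how many really churned.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the players who churned, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

A model that favours precision flags fewer non-churners by mistake, while one that favours recall misses fewer real churners. That is the trade-off behind the architecture comparison in the abstract.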

【3】 Spatial-Temporal Sequential Hypergraph Network for Crime Prediction
标题：时空序列超图网络在犯罪预测中的应用
链接：https://arxiv.org/abs/2201.02435

作者：Lianghao Xia,Chao Huang,Yong Xu,Peng Dai,Liefeng Bo,Xiyue Zhang,Tianyi Chen
机构：South China University of Technology, China, University of Hong Kong, Hong Kong, Communication and Computer Network Laboratory of Guangdong, China, Peng Cheng Laboratory, China, JD Finance America Corporation, USA
备注：IJCAI 2021 Research Paper
摘要：犯罪预测对于公共安全和资源优化至关重要，但由于以下两个方面的原因，犯罪预测极具挑战性：i）犯罪模式在时间和空间上的动态变化，犯罪事件在空间和时间域上分布不均；ii）不同类型犯罪（如盗窃、抢劫、袭击、破坏）之间的时间演化依赖性，揭示了犯罪的细粒度语义。为了应对这些挑战，我们提出了时空顺序超图网络（ST-SHN）来集体编码复杂的犯罪时空模式以及潜在的基于类别的犯罪语义关系。具体来说，为了处理长距离和全局环境下的时空动态，我们设计了一个集成超图学习范式的图结构消息传递体系结构。为了在动态环境中捕获不同类别的犯罪异构关系，我们引入了一种多通道路由机制来学习不同犯罪类型的时间演化结构依赖关系。我们在两个真实数据集上进行了大量实验，结果表明，与各种最先进的基线相比，我们提出的ST-SHN框架可以显著提高预测性能。源代码可从以下网址获得：https://github.com/akaxlh/ST-SHN.
摘要：Crime prediction is crucial for public safety and resource optimization, yet is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, with crime events distributed unevenly over both spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e.g., Theft, Robbery, Assault, Damage), which reveal fine-grained semantics of crimes. To tackle these challenges, we propose the Spatial-Temporal Sequential Hypergraph Network (ST-SHN) to collectively encode complex crime spatial-temporal patterns as well as the underlying category-wise crime semantic relationships. Specifically, to handle spatial-temporal dynamics under the long-range and global context, we design a graph-structured message passing architecture with the integration of the hypergraph learning paradigm. To capture category-wise crime heterogeneous relations in a dynamic environment, we introduce a multi-channel routing mechanism to learn the time-evolving structural dependency across crime types. We conduct extensive experiments on two real-world datasets, showing that our proposed ST-SHN framework can significantly improve the prediction performance as compared to various state-of-the-art baselines. The source code is available at: https://github.com/akaxlh/ST-SHN.

【4】 Forecasting emissions through Kaya identity using Neural Ordinary Differential Equations
标题：基于神经常微分方程的Kaya恒等式排放量预测
链接：https://arxiv.org/abs/2201.02433

作者：Pierre Browne,Aranildo Lima,Rossella Arcucci,César Quilodrán-Casas
备注：5 pages, 2 figures, Tackling Climate Change with Machine Learning workshop at ICML 2021
摘要：从Kaya恒等式开始，我们使用神经ODE模型在国家一级预测与碳排放相关的几个指标的演变：人口、人均GDP、GDP的能源强度、能源的碳强度。我们将该模型与基线统计模型VAR进行了比较，获得了良好的性能。我们得出结论，这种机器学习方法可用于产生广泛的结果，并为决策者提供相关的见解。
摘要：Starting from the Kaya identity, we used a Neural ODE model to predict the evolution of several indicators related to carbon emissions at the country level: population, GDP per capita, energy intensity of GDP, and carbon intensity of energy. We compared the model with a baseline statistical model (VAR) and obtained good performance. We conclude that this machine-learning approach can be used to produce a wide range of results and give relevant insight to policymakers.
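
For reference, the Kaya identity factors total emissions into the four indicators listed in the abstract. The symbols below ($P$ for population, $E$ for energy consumption) are notation chosen here, not taken from the paper:

```latex
\mathrm{CO_2}
  = P
  \times \frac{\mathrm{GDP}}{P}
  \times \frac{E}{\mathrm{GDP}}
  \times \frac{\mathrm{CO_2}}{E}
```

The factors are, in order, population, GDP per capita, energy intensity of GDP, and carbon intensity of energy. Forecasting each factor and multiplying them gives a forecast of total emissions.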

【5】 Time Series Forecasting Using Fuzzy Cognitive Maps: A Survey
标题：基于模糊认知图的时间序列预测研究综述
链接：https://arxiv.org/abs/2201.02297

作者：Omid Orang,Petrônio Cândido de Lima e Silva,Frederico Guimarães Gadelha
机构：Machine Intelligence and Data Science (MINDS) Laboratory, Federal University of Minas Gerais, Belo Horizonte, Brazil, Federal Institute of Education Science and Technology of Northern Minas Gerais, Januária Campus, Brazil
摘要：在用于时间序列预测的各种软计算方法中，模糊认知图（FCM）作为一种建模和分析复杂系统动力学的工具已经显示出显著的效果。FCM与递归神经网络有相似之处，可以归类为一种神经模糊方法。换句话说，FCMs是模糊逻辑、神经网络和专家系统的混合体，是模拟和研究复杂系统动态行为的有力工具。最有趣的特征是知识的可解释性、动态特性和学习能力。本文的目的主要是概述文献中提出的最相关和最新的基于FCM的时间序列预测模型。此外，本文还将介绍FCM模型的基本原理和学习方法。此外，本调查还为未来的研究提供了一些思路，以增强FCM的能力，从而应对现实世界实验中的一些挑战，如处理非平稳数据和可伸缩性问题。此外，为FCMs配备快速学习算法是该领域的主要关注点之一。
摘要：Among various soft computing approaches for time series forecasting, Fuzzy Cognitive Maps (FCMs) have shown remarkable results as a tool to model and analyze the dynamics of complex systems. FCMs have similarities to recurrent neural networks and can be classified as a neuro-fuzzy method. In other words, FCMs are a mixture of fuzzy logic, neural network, and expert system aspects, which act as a powerful tool for simulating and studying the dynamic behavior of complex systems. Their most interesting features are knowledge interpretability, dynamic characteristics and learning capability. The goal of this survey paper is mainly to present an overview of the most relevant and recent FCM-based time series forecasting models proposed in the literature. In addition, this article introduces the fundamentals of the FCM model and its learning methodologies. This survey also provides some ideas for future research to enhance the capabilities of FCMs in order to address challenges in real-world experiments, such as handling non-stationary data and scalability issues. Moreover, equipping FCMs with fast learning algorithms is one of the major concerns in this area.

其他神经网络|深度学习|模型|建模(12篇)

【1】 Bayesian Neural Networks for Reversible Steganography
标题：用于可逆隐写的贝叶斯神经网络
链接：https://arxiv.org/abs/2201.02478

作者：Ching-Chun Chang
机构：University of Warwick
摘要：深度学习的最新进展导致可逆隐写术的范式转变。可逆隐写术的一个基本支柱是可通过深层神经网络实现的预测建模。然而，对于一些分布不均和有噪声的数据，在推理中存在非平凡误差。鉴于此问题，我们建议考虑基于贝叶斯深度学习的理论框架的预测模型中的不确定性。贝叶斯神经网络可以看作是一种自感知的机器；也就是说，一台了解自身局限性的机器。为了量化不确定性，我们通过随机正向传递的蒙特卡罗抽样来近似后验预测分布。我们进一步表明，预测不确定性可以分解为任意不确定性和认知不确定性，并且这些数量可以在无监督的方式下学习。实验结果表明，贝叶斯不确定性分析提高了隐写容量失真性能。
摘要：Recent advances in deep learning have led to a paradigm shift in reversible steganography. A fundamental pillar of reversible steganography is predictive modelling which can be realised via deep neural networks. However, non-trivial errors exist in inferences about some out-of-distribution and noisy data. In view of this issue, we propose to consider uncertainty in predictive models based upon a theoretical framework of Bayesian deep learning. Bayesian neural networks can be regarded as self-aware machinery; that is, a machine that knows its own limitations. To quantify uncertainty, we approximate the posterior predictive distribution through Monte Carlo sampling with stochastic forward passes. We further show that predictive uncertainty can be disentangled into aleatoric and epistemic uncertainties and these quantities can be learnt in an unsupervised manner. Experimental results demonstrate an improvement delivered by Bayesian uncertainty analysis upon steganographic capacity-distortion performance.

【2】 Similarities and Differences between Machine Learning and Traditional Advanced Statistical Modeling in Healthcare Analytics
标题：医疗分析中机器学习与传统高级统计建模的异同
链接：https://arxiv.org/abs/2201.02469

作者：Michele Bennett,Karin Hayes,Ewa J. Kleczyk,Rajesh Mehta
机构：Kleczyk is also an Affiliated Graduate Faculty in the School of Economics at The University of Maine, and Business Analytics at Grand Canyon University
备注：16 pages, 2 figures
摘要：数据科学家和统计学家在确定解决分析难题的最佳方法（机器学习或统计建模）时经常会产生分歧。然而，在分析战场的不同方面，机器学习和统计建模是近亲，而不是对手。选择这两种方法或在某些情况下同时使用这两种方法是基于待解决的问题和所需的结果，以及可供使用的数据和分析环境。机器学习和统计建模是互补的，基于相似的数学原理，但只是在总体分析知识库中使用不同的工具。确定主要方法应基于待解决的问题以及经验证据，如数据的大小和完整性、变量数量、假设或缺乏，以及预期结果，如预测或因果关系。优秀的分析师和数据科学家应该精通这两种技术及其正确应用，从而为正确的项目使用正确的工具来实现预期的结果。
摘要：Data scientists and statisticians are often at odds when determining the best approach, machine learning or statistical modeling, to solve an analytics challenge. However, machine learning and statistical modeling are more cousins than adversaries on different sides of an analysis battleground. Choosing between the two approaches or in some cases using both is based on the problem to be solved and outcomes required as well as the data available for use and circumstances of the analysis. Machine learning and statistical modeling are complementary, based on similar mathematical principles, but simply using different tools in an overall analytics knowledge base. Determining the predominant approach should be based on the problem to be solved as well as empirical evidence, such as size and completeness of the data, number of variables, assumptions or lack thereof, and expected outcomes such as predictions or causality. Good analysts and data scientists should be well versed in both techniques and their proper application, thereby using the right tool for the right project to achieve the desired results.

【3】 Mirror Learning: A Unifying Framework of Policy Optimisation
标题：镜像学习：策略优化的统一框架
链接：https://arxiv.org/abs/2201.02373

作者：Jakub Grudzien Kuba,Christian Schroeder de Witt,Jakob Foerster
机构：Department of Statistics, University of Oxford, Department of Engineering Science
摘要：一般策略改进（GPI）和信赖域学习（TRL）是当代强化学习（RL）的主要框架，是解决马尔可夫决策过程（MDP）的核心模型。不幸的是，在数学形式上，它们对修改很敏感，因此，实现它们的实际实例化不会自动继承它们的改进保证。因此，可用的严格MDP解算器的范围很窄。事实上，许多最先进的（SOTA）算法，如TRPO和PPO，都没有被证明是收敛的。在本文中，我们提出了镜像学习（mirror learning），即RL问题的一般解决方案。我们揭示了GPI和TRL在这个更大的算法空间中只是一个小点，它具有单调改进特性并收敛到最优策略。我们表明，几乎所有用于RL的SOTA算法都是镜像学习的实例，因此表明它们的经验性能是其理论特性的结果，而不是近似类比的结果。令人兴奋的是，我们证明了镜像学习为具有收敛保证的策略学习方法开辟了一个全新的空间。
摘要：General policy improvement (GPI) and trust-region learning (TRL) are the predominant frameworks within contemporary reinforcement learning (RL), which serve as the core models for solving Markov decision processes (MDPs). Unfortunately, in their mathematical form, they are sensitive to modifications, and thus, the practical instantiations that implement them do not automatically inherit their improvement guarantees. As a result, the spectrum of available rigorous MDP-solvers is narrow. Indeed, many state-of-the-art (SOTA) algorithms, such as TRPO and PPO, are not proven to converge. In this paper, we propose mirror learning, a general solution to the RL problem. We reveal GPI and TRL to be but small points within this far greater space of algorithms which boasts the monotonic improvement property and converges to the optimal policy. We show that virtually all SOTA algorithms for RL are instances of mirror learning, and thus suggest that their empirical performance is a consequence of their theoretical properties, rather than of approximate analogies. Excitingly, we show that mirror learning opens up a whole new space of policy learning methods with convergence guarantees.

【4】 GenLabel: Mixup Relabeling using Generative Models
标题：GenLabel：使用生成模型的Mixup重标记
链接：https://arxiv.org/abs/2201.02354

作者：Jy-yong Sohn,Liang Shang,Hongxu Chen,Jaekyun Moon,Dimitris Papailiopoulos,Kangwook Lee
摘要：Mixup是一种数据增强方法，通过混合一对输入数据生成新的数据点。虽然混合通常会提高预测性能，但有时会降低性能。在本文中，我们首先通过对混合算法的理论和实证分析，找出造成这种现象的主要原因。为了解决这个问题，我们提出了GenLabel，这是一种简单而有效的重新标记算法，专为混音而设计。特别是，GenLabel通过使用生成模型学习类条件数据分布，帮助混合算法正确标记混合样本。通过广泛的理论和实证分析，我们表明，混搭技术与GenLabel技术结合使用，可以有效地解决上述现象，提高泛化性能和对抗鲁棒性。
摘要：Mixup is a data augmentation method that generates new data points by mixing a pair of input data. While mixup generally improves the prediction performance, it sometimes degrades the performance. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. In particular, GenLabel helps the mixup algorithm correctly label mixup samples by learning the class-conditional data distribution using generative models. Via extensive theoretical and empirical analysis, we show that mixup, when used together with GenLabel, can effectively resolve the aforementioned phenomenon, improving the generalization performance and the adversarial robustness.
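
As background for the relabeling problem, the sketch below shows standard mixup on a single pair of samples. It is not GenLabel itself: it only illustrates the conventional label interpolation that GenLabel is proposed to replace. Drawing the mixing coefficient from a Beta(alpha, alpha) distribution is the common mixup convention and is assumed here; the function name `mixup_pair` and its parameters are illustrative.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Mix two (input, one-hot label) pairs with a Beta-distributed coefficient."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    # Convex combination of the inputs.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    # Standard mixup assigns the same convex combination to the labels.
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam
```

The mixed label depends only on the coefficient and not on where the mixed point actually lands in input space. The abstract describes GenLabel as relabeling mixed samples using class-conditional distributions learned by generative models.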

【5】 Learning Multi-Tasks with Inconsistent Labels by using Auxiliary Big Task
标题：利用辅助大任务学习标签不一致的多任务
链接：https://arxiv.org/abs/2201.02305

作者：Quan Feng,Songcan Chen
机构：College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
摘要：多任务学习是通过在任务之间传递和利用公共知识来提高模型的性能。现有的MTL工作主要关注多个任务（MTs）之间的标签集通常相同的场景，因此它们可以用于跨任务学习。几乎很少有作品探讨这样的场景：每个任务只有少量的训练样本，而它们的标签集只是部分重叠，甚至没有重叠。由于这些任务之间的相关性信息较少，因此学习此类MTs更具挑战性。为此，我们提出了一个学习这些任务的框架，通过联合利用学习到的辅助大任务中的丰富信息，该辅助大任务有足够多的类来覆盖所有这些任务，以及这些部分重叠任务之间共享的信息。在我们使用学习到的辅助任务的相同神经网络结构来学习单个任务的实现中，关键思想是利用可用的标签信息自适应地修剪辅助网络的隐层神经元，为每个任务构建相应的网络，同时伴随着跨单个任务的联合学习。我们的实验结果表明，它的有效性与国家的最先进的方法相比。
摘要：Multi-task learning (MTL) aims to improve the performance of the model by transferring and exploiting common knowledge among tasks. Existing MTL works mainly focus on the scenario where label sets among multiple tasks (MTs) are usually the same, so that they can be utilized for learning across the tasks. However, few works explore the scenario where each task has only a small number of training samples and the label sets overlap only partially or not at all. Learning such MTs is more challenging because less correlation information is available among these tasks. For this, we propose a framework to learn these tasks by jointly leveraging both abundant information from a learnt auxiliary big task, with sufficiently many classes to cover those of all these tasks, and the information shared among those partially-overlapped tasks. In our implementation, which uses the same neural network architecture as the learnt auxiliary task to learn individual tasks, the key idea is to utilize available label information to adaptively prune the hidden layer neurons of the auxiliary network to construct a corresponding network for each task, accompanied by joint learning across individual tasks. Our experimental results demonstrate its effectiveness in comparison with the state-of-the-art approaches.

【6】 Learning to be adversarially robust and differentially private
标题：学习兼具对抗鲁棒性与差分隐私的模型
链接：https://arxiv.org/abs/2201.02265

作者：Jamie Hayes,Borja Balle,M. Pawan Kumar
备注：Preliminary work appeared at PPML 2021
摘要：我们研究了在同时追求鲁棒性和差分隐私的优化中出现的学习困难。我们首先以线性可分数据上的一个简单二元分类任务为例，研究了基于梯度下降的差分隐私对抗训练的收敛性。我们比较了隐私和非隐私设置下对抗风险与名义风险之间的差距，表明隐私优化引入的与数据维度相关的项加剧了学习鲁棒模型的难度。随后，我们讨论了对抗训练和差分隐私中哪些部分会损害优化，发现对抗扰动的大小和差分隐私中的裁剪范数都会增加损失曲面的曲率，这意味着泛化性能较差。
摘要：We study the difficulties in learning that arise from robust and differentially private optimization. We first study convergence of gradient descent based adversarial training with differential privacy, taking a simple binary classification task on linearly separable data as an illustrative example. We compare the gap between adversarial and nominal risk in both private and non-private settings, showing that the data dimensionality dependent term introduced by private optimization compounds the difficulties of learning a robust model. After this, we discuss what parts of adversarial training and differential privacy hurt optimization, identifying that the size of adversarial perturbation and clipping norm in differential privacy both increase the curvature of the loss landscape, implying poorer generalization performance.

【7】 ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
标题：ITSA：立体匹配网络中自动回避捷径和域泛化的信息论方法
链接：https://arxiv.org/abs/2201.02263

作者：WeiQin Chuah,Ruwan Tennakoon,Reza Hoseinnezhad,Alireza Bab-Hadiashar,David Suter
机构：RMIT University, Australia, Edith Cowan University (ECU), Australia
备注：11 pages, 4 figures
摘要：仅在合成数据上训练的最先进的立体匹配网络通常无法推广到更具挑战性的真实数据域。在本文中，我们试图揭示阻碍网络跨领域推广的一个重要因素：通过快捷学习的视角。我们证明了立体匹配网络中特征表示的学习受到合成数据伪影（快捷属性）的严重影响。为了缓解这个问题，我们提出了一种信息论的快捷方式避免（ITSA）方法来自动限制与快捷方式相关的信息被编码到特征表示中。因此，我们提出的方法通过最小化潜在特征对输入变化的敏感性来学习鲁棒和快捷不变特征。为了避免直接输入灵敏度优化带来的高昂计算成本，我们提出了一种有效可行的算法来实现鲁棒性。我们表明，使用这种方法，纯基于合成数据训练的最先进的立体匹配网络可以有效地推广到具有挑战性和以前看不见的真实数据场景。重要的是，所提出的方法增强了合成训练网络的鲁棒性，使其在挑战域外立体数据集方面优于微调网络（在真实数据上）。
摘要：State-of-the-art stereo matching networks trained only on synthetic data often fail to generalize to more challenging real data domains. In this paper, we attempt to unfold an important factor that hinders the networks from generalizing across domains: through the lens of shortcut learning. We demonstrate that the learning of feature representations in stereo matching networks is heavily influenced by synthetic data artefacts (shortcut attributes). To mitigate this issue, we propose an Information-Theoretic Shortcut Avoidance (ITSA) approach to automatically restrict shortcut-related information from being encoded into the feature representations. As a result, our proposed method learns robust and shortcut-invariant features by minimizing the sensitivity of latent features to input variations. To avoid the prohibitive computational cost of direct input sensitivity optimization, we propose an effective yet feasible algorithm to achieve robustness. We show that using this method, state-of-the-art stereo matching networks that are trained purely on synthetic data can effectively generalize to challenging and previously unseen real data scenarios. Importantly, the proposed method enhances the robustness of the synthetic trained networks to the point that they outperform their fine-tuned counterparts (on real data) for challenging out-of-domain stereo datasets.

【8】 Nonlocal Kernel Network (NKN): a Stable and Resolution-Independent Deep Neural Network
标题：非局部核网络(NKN)：一种稳定的与分辨率无关的深度神经网络
链接：https://arxiv.org/abs/2201.02217

作者：Huaiqian You,Yue Yu,Marta D'Elia,Tian Gao,Stewart Silling
机构：Department of Mathematics, Lehigh University, Bethlehem, PA, Computational Science and Analysis, Sandia National Laboratories, Livermore, CA, IBM Research, Yorktown Heights, NY, Center for Computing Research, Sandia National Laboratories, Albuquerque, NM
摘要：最近，神经算子已成为以神经网络形式设计函数空间之间解映射的流行工具。与经典的科学机器学习方法不同，传统的科学机器学习方法以固定的分辨率为单个输入参数实例学习已知偏微分方程（PDE）的参数，神经算子近似于一系列偏微分方程的解映射。尽管取得了成功，但迄今为止，神经算子的使用仅限于相对较浅的神经网络，并且仅限于学习隐藏的控制规律。在这项工作中，我们提出了一种新的非局部神经算子，我们称之为非局部核网络（nonlocalkernelnetwork，NKN），它与分辨率无关，以深度神经网络为特征，能够处理各种任务，如学习控制方程和分类图像。我们的NKN源于将神经网络解释为一个离散的非局部扩散反应方程，在无限层的限制下，该方程等价于一个抛物型非局部方程，其稳定性通过非局部向量演算进行分析。与神经算子积分形式的相似性使NKN能够捕获特征空间中的长期依赖性，而对节点间相互作用的连续处理使NKN的分辨率独立。与神经常微分方程的相似性，在非局部意义上重新解释，以及层间稳定的网络动力学，允许将NKN的最佳参数从浅网络推广到深网络。这一事实允许使用从浅到深的初始化技术。我们的测试表明，NKNs在学习控制方程和图像分类任务方面都优于基线方法，并且可以很好地推广到不同的分辨率和深度。
摘要：Neural operators have recently become popular tools for designing solution maps between function spaces in the form of neural networks. Differently from classical scientific machine learning approaches that learn parameters of a known partial differential equation (PDE) for a single instance of the input parameters at a fixed resolution, neural operators approximate the solution map of a family of PDEs. Despite their success, the uses of neural operators are so far restricted to relatively shallow neural networks and confined to learning hidden governing laws. In this work, we propose a novel nonlocal neural operator, which we refer to as nonlocal kernel network (NKN), that is resolution independent, characterized by deep neural networks, and capable of handling a variety of tasks such as learning governing equations and classifying images. Our NKN stems from the interpretation of the neural network as a discrete nonlocal diffusion reaction equation that, in the limit of infinite layers, is equivalent to a parabolic nonlocal equation, whose stability is analyzed via nonlocal vector calculus. The resemblance with integral forms of neural operators allows NKNs to capture long-range dependencies in the feature space, while the continuous treatment of node-to-node interactions makes NKNs resolution independent. The resemblance with neural ODEs, reinterpreted in a nonlocal sense, and the stable network dynamics between layers allow for generalization of NKN's optimal parameters from shallow to deep networks. This fact enables the use of shallow-to-deep initialization techniques. Our tests show that NKNs outperform baseline methods in both learning governing equations and image classification tasks and generalize well to different resolutions and depths.

【9】 Generalized quantum similarity learning
标题：广义量子相似学习
链接：https://arxiv.org/abs/2201.02310

作者：Santosh Kumar Radha,Casey Jao
机构：Agnostiq Inc., Front St W, Toronto, ON
摘要：物体之间的相似性在很大范围内都很重要。虽然可以使用现成的距离函数来衡量相似性，但它们可能无法捕捉相似性的内在含义，而这往往取决于基础数据和任务。此外，传统的距离函数限制相似性度量的空间是对称的，并且不允许直接比较来自不同空间的对象。我们建议使用量子网络（GQSim）来学习不需要具有相同维度的数据之间的任务相关（a）对称相似性。我们通过分析（对于简单情况）和数值（对于复杂情况）分析了此类相似性函数的性质，并表明这些相似性度量可以提取数据的显著特征。我们还证明了使用该技术导出的相似性度量是$（\epsilon、\gamma、\tau）$-良好的，从而在理论上保证了性能。最后，我们将此技术应用于三个相关应用——分类、图完成和生成建模。
摘要：The similarity between objects is significant in a broad range of areas. While similarity can be measured using off-the-shelf distance functions, they may fail to capture the inherent meaning of similarity, which tends to depend on the underlying data and task. Moreover, conventional distance functions limit the space of similarity measures to be symmetric and do not directly allow comparing objects from different spaces. We propose using quantum networks (GQSim) for learning task-dependent (a)symmetric similarity between data that need not have the same dimensionality. We analyze the properties of such similarity functions analytically (for a simple case) and numerically (for a complex case) and show that these similarity measures can extract salient features of the data. We also demonstrate that the similarity measure derived using this technique is $(\epsilon,\gamma,\tau)$-good, resulting in theoretically guaranteed performance. Finally, we conclude by applying this technique to three relevant applications: classification, graph completion, and generative modeling.

【10】 GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks
标题：GCWSNet：可扩展精确训练神经网络的广义一致加权抽样
链接：https://arxiv.org/abs/2201.02283

作者：Ping Li,Weijie Zhao
机构：Cognitive Computing Lab, Baidu Research, Bellevue, WA, USA
摘要：我们开发了“广义一致加权采样”（GCWS），用于对“powered GMM”（pGMM）内核（带有一个调优参数$p$）进行散列。事实证明，GCWS提供了一个数值稳定的方案，用于对原始数据应用功率变换，而不管$p$的大小和数据。功率变换通常能有效地提高性能，在许多情况下效果显著。我们将散列数据提供给各种公共分类数据集上的神经网络，并将我们的方法命名为“GCWSNet”。我们的大量实验表明，GCWSNet通常可以提高分类精度。此外，从实验中可以明显看出，GCWSNet的收敛速度大大加快。事实上，GCWS通常只需（少于）训练过程的一个历元即可达到合理的精度。这种特性是非常理想的，因为许多应用程序，如广告点击率（CTR）预测模型或数据流（即只看到一次的数据），通常只训练一个历元。另一个有益的副作用是，神经网络第一层的计算变成了加法而不是乘法，因为输入数据变成了二进制（并且非常稀疏）。提供了与（归一化）随机傅里叶特征（NRFF）的经验比较。我们还建议通过计数草图来减小GCWSNet的模型尺寸，并发展了分析使用计数草图对GCWS精度影响的理论。我们的分析表明，“8位”策略应该可以很好地工作，因为我们总是可以在GCWS哈希的输出上应用8位计数草图哈希，而不会对精度造成太大的影响。在训练深度神经网络时，还有许多其他方法可以利用GCWS。例如，可以对最后一层的输出应用GCW，以提高经过训练的深层神经网络的准确性。
摘要：We develop the "generalized consistent weighted sampling" (GCWS) for hashing the "powered-GMM" (pGMM) kernel (with a tuning parameter $p$). It turns out that GCWS provides a numerically stable scheme for applying power transformation on the original data, regardless of the magnitude of $p$ and the data. The power transformation is often effective for boosting the performance, in many cases considerably so. We feed the hashed data to neural networks on a variety of public classification datasets and name our method ``GCWSNet''. Our extensive experiments show that GCWSNet often improves the classification accuracy. Furthermore, it is evident from the experiments that GCWSNet converges substantially faster. In fact, GCWS often reaches a reasonable accuracy with merely (less than) one epoch of the training process. This property is much desired because many applications, such as advertisement click-through rate (CTR) prediction models, or data streams (i.e., data seen only once), often train just one epoch. Another beneficial side effect is that the computations of the first layer of the neural networks become additions instead of multiplications because the input data become binary (and highly sparse). Empirical comparisons with (normalized) random Fourier features (NRFF) are provided. We also propose to reduce the model size of GCWSNet by count-sketch and develop the theory for analyzing the impact of using count-sketch on the accuracy of GCWS. Our analysis shows that an ``8-bit'' strategy should work well in that we can always apply an 8-bit count-sketch hashing on the output of GCWS hashing without hurting the accuracy much. There are many other ways to take advantage of GCWS when training deep neural networks. For example, one can apply GCWS on the outputs of the last layer to boost the accuracy of trained deep neural networks.
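The hashing step this abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard consistent weighted sampling recipe (Gamma(2,1) and uniform random draws, evaluated in the log domain), applied to the GMM-transformed input with the exponent $p$ entering as a multiplier on the log weights. The names `gmm_transform` and `gcws_hash` and the `seed` parameter are hypothetical, and the input is assumed to have at least one nonzero entry.

```python
import numpy as np

def gmm_transform(x):
    """Split a real vector into nonnegative parts: [max(x, 0), max(-x, 0)]."""
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

def gcws_hash(x, p, k, seed=0):
    """Return k hash pairs (i_star, t_star) for vector x (sketch, see assumptions above)."""
    u = gmm_transform(np.asarray(x, dtype=float))
    d = u.size
    # The same seed must be used for every vector so the random draws are shared.
    rng = np.random.default_rng(seed)
    r = rng.gamma(2.0, 1.0, size=(k, d))
    c = rng.gamma(2.0, 1.0, size=(k, d))
    beta = rng.uniform(size=(k, d))

    nz = u > 0                       # zero weights can never be selected
    log_u = np.log(u[nz])
    # Working with p * log(u) avoids ever forming u ** p explicitly.
    t = np.floor(p * log_u / r[:, nz] + beta[:, nz])
    log_a = np.log(c[:, nz]) - r[:, nz] * (t - beta[:, nz] + 1.0)

    j = log_a.argmin(axis=1)         # position within the nonzero subset
    i_star = np.flatnonzero(nz)[j]   # map back to the full index range
    t_star = t[np.arange(k), j].astype(int)
    return i_star, t_star
```

Under these assumptions, two inputs hashed with the same seed should collide on the pair (index, level) with probability close to their pGMM similarity, so the fraction of matching pairs gives a kernel estimate.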

【11】 Comprehensive RF Dataset Collection and Release: A Deep Learning-Based Device Fingerprinting Use Case
Link: https://arxiv.org/abs/2201.02213

Authors: Abdurrahman Elmaghbub, Bechir Hamdaoui
Comments: This paper was presented at the IEEE GLOBECOM Workshop 2021
Abstract: Deep learning-based RF fingerprinting has recently been recognized as a potential solution for enabling newly emerging wireless network applications, such as spectrum access policy enforcement, automated network device authentication, and unauthorized network access monitoring and control. Real, comprehensive RF datasets are now needed more than ever to enable the study, assessment, and validation of newly developed RF fingerprinting approaches. In this paper, we present and release a large-scale RF fingerprinting dataset, collected from 25 different LoRa-enabled IoT transmitting devices using USRP B210 receivers. Our dataset consists of a large number of SigMF-compliant binary files representing the I/Q time-domain samples and their corresponding FFT-based files of LoRa transmissions. This dataset provides a comprehensive set of essential experimental scenarios, considering both indoor and outdoor environments and various network deployments and configurations, such as the distance between the transmitters and the receiver, the configuration of the considered LoRa modulation, the physical location of the conducted experiment, and the receiver hardware used for training and testing the neural network models.

【12】 Inferring Turbulent Parameters via Machine Learning
Link: https://arxiv.org/abs/2201.00732

Authors: Michele Buzzicotti, Fabio Bonaccorso, Luca Biferale
Abstract: We design a machine learning technique to solve the general problem of inferring physical parameters from the observation of turbulent flows, a relevant exercise in many theoretical and applied fields, from engineering to earth observation and astrophysics. Our approach is to train the machine learning system to regress the rotation frequency of the flow's reference frame from the observation of the flow's velocity amplitude on a 2d plane extracted from the 3d domain. The machine learning approach consists of a Deep Convolutional Neural Network (DCNN) of the same kind developed in computer vision. The training and validation datasets are produced by means of fully resolved direct numerical simulations. This study shows interesting results from two different points of view. First, from the machine learning point of view, it shows the potential of DCNN, reaching good results on a particularly complex problem that goes well outside the limits of human vision. Second, from the physics point of view, it provides an example of how machine learning can be exploited in data analysis to infer information that would be inaccessible otherwise. Indeed, by comparing DCNN with the other possible Bayesian approaches, we find that DCNN yields a much higher inference accuracy in all the examined cases.

Other (6 papers)

【1】 Generalized Category Discovery
Link: https://arxiv.org/abs/2201.02609

Authors: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman
Affiliations: Visual Geometry Group, University of Oxford; The University of Hong Kong
Comments: 13 pages, 6 figures
Abstract: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. Here, the unlabelled images may come from labelled classes or from novel ones. Existing recognition methods are not able to deal with this setting, because they make several restrictive assumptions, such as the unlabelled instances only coming from known (or unknown) classes and the number of unknown classes being known a priori. We address the more unconstrained setting, naming it 'Generalized Category Discovery', and challenge all these assumptions. We first establish strong baselines by taking state-of-the-art algorithms from novel category discovery and adapting them for this task. Next, we propose the use of vision transformers with contrastive representation learning for this open-world setting. We then introduce a simple yet effective semi-supervised $k$-means method to cluster the unlabelled data into seen and unseen classes automatically, substantially outperforming the baselines. Finally, we also propose a new approach to estimate the number of classes in the unlabelled data. We thoroughly evaluate our approach on public datasets for generic object classification including CIFAR10, CIFAR100 and ImageNet-100, and for fine-grained visual recognition including CUB, Stanford Cars and Herbarium19, benchmarking on this new setting to foster future research.
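One plausible reading of the semi-supervised $k$-means step mentioned in this abstract is sketched below. It is only an illustration under stated assumptions, not the authors' code: labelled points stay pinned to their known class in every iteration, unlabelled points move to the nearest centroid, and the extra centroids for novel classes are seeded with a farthest-point rule (an assumption made here for determinism). The names `sq_dists` and `semi_supervised_kmeans` are hypothetical, and every known class `0..n_known-1` is assumed to have at least one labelled example.

```python
import numpy as np

def sq_dists(points, centroids):
    """Squared Euclidean distances, shape (n_points, n_centroids)."""
    diff = points[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=-1)

def semi_supervised_kmeans(x_lab, y_lab, x_unlab, k, n_iter=100):
    """Cluster x_unlab into k clusters; clusters 0..n_known-1 are the labelled classes."""
    n_known = int(y_lab.max()) + 1
    # Known-class centroids come directly from the labelled data.
    centroids = [x_lab[y_lab == c].mean(axis=0) for c in range(n_known)]
    # Assumption: seed each novel centroid with the unlabelled point that is
    # farthest from all centroids chosen so far.
    for _ in range(k - n_known):
        nearest = sq_dists(x_unlab, np.stack(centroids)).min(axis=1)
        centroids.append(x_unlab[nearest.argmax()])
    centroids = np.stack(centroids)

    x_all = np.concatenate([x_lab, x_unlab])
    for _ in range(n_iter):
        # Only unlabelled points are reassigned; labels are never overridden.
        assign = sq_dists(x_unlab, centroids).argmin(axis=1)
        a_all = np.concatenate([y_lab, assign])
        new = centroids.copy()
        for c in range(k):
            members = x_all[a_all == c]
            if len(members) > 0:
                new[c] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return sq_dists(x_unlab, centroids).argmin(axis=1), centroids
```

Unlabelled points assigned to a cluster index below `n_known` are treated as belonging to a seen class, and the remaining clusters stand for unseen classes. The number of clusters `k` is taken as given here.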

【2】 Sparse PCA on fixed-rank matrices
Link: https://arxiv.org/abs/2201.02487

Author: Alberto Del Pia
Abstract: Sparse PCA is the optimization problem obtained from PCA by adding a sparsity constraint on the principal components. Sparse PCA is NP-hard and hard to approximate even in the single-component case. In this paper we settle the computational complexity of sparse PCA with respect to the rank of the covariance matrix. We show that, if the rank of the covariance matrix is a fixed value, then there is an algorithm that solves sparse PCA to global optimality, whose running time is polynomial in the number of features. We also prove a similar result for the version of sparse PCA which requires the principal components to have disjoint supports.

【3】 Neural calibration of hidden inhomogeneous Markov chains -- Information decompression in life insurance
Link: https://arxiv.org/abs/2201.02397

Authors: Mark Kiermayer, Christian Weiß
Affiliation: University of Applied Sciences Ruhr West, Department of Natural Sciences
Abstract: Markov chains play a key role in a vast number of areas, including life insurance mathematics. Standard actuarial quantities such as the premium value can be interpreted as compressed, lossy information about the underlying Markov process. We introduce a method to reconstruct the underlying Markov chain given collective information of a portfolio of contracts. Our neural architecture explainably characterizes the process by explicitly providing one-step transition probabilities. Further, we provide an intrinsic, economic model validation to inspect the quality of the information decompression. Lastly, our methodology is successfully tested on a realistic data set of German term life insurance contracts.

【4】 Stochastic Saddle Point Problems with Decision-Dependent Distributions
Link: https://arxiv.org/abs/2201.02313

Authors: Killian Wood, Emiliano Dall'Anese
Abstract: This paper focuses on stochastic saddle point problems with decision-dependent distributions in both the static and time-varying settings. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is in general computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two classes of solutions is bounded provided that the objective has a strongly-convex-strongly-concave payoff and Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration; moreover, we show convergence to a neighborhood in expectation and almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures the objective is strongly-convex-strongly-concave. Under this assumption, we show that primal-dual algorithms converge to the saddle points in a similar fashion.

【5】 A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation
Link: https://arxiv.org/abs/2201.02300

Author: Kohei Miyaguchi
Affiliation: IBM Research - Tokyo
Comments: AAAI22-AI4DO (workshop)
Abstract: We are concerned with the problem of hyperparameter selection for offline policy evaluation (OPE). OPE is a key component of offline reinforcement learning, which is a core technology for data-driven decision optimization without environment simulators. However, the current state-of-the-art OPE methods are not hyperparameter-free, which undermines their utility in real-life applications. We address this issue by introducing a new approximate hyperparameter selection (AHS) framework for OPE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters. We then derive four AHS methods, each of which has different characteristics such as convergence rate and time complexity. Finally, we verify the effectiveness and limitations of these methods with a preliminary experiment.

【6】 PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences
Link: https://arxiv.org/abs/2201.02273

Authors: Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yijing Zhou, Murray Patterson
Abstract: The origin of the COVID-19 pandemic is still unknown and is an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona-) viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating and preventing current and future pandemics. In coronaviruses, the surface (S) protein, or spike protein, is an important part of determining host specificity since it is the point of contact between the virus and the host cell membrane. In this paper, we classify the hosts of over five thousand coronaviruses from their spike protein sequences, segregating them into clusters of distinct hosts among avians, bats, camels, swines, humans and weasels, to name a few. We propose a feature embedding based on the well-known position-weight matrix (PWM), which we call PWM2Vec, and use it to generate feature vectors from the spike protein sequences of these coronaviruses. While our embedding is inspired by the success of PWMs in biological applications such as determining protein function, or identifying transcription factor binding sites, we are the first (to the best of our knowledge) to use PWMs in the context of host classification from viral sequences to generate a fixed-length feature vector representation. The results on real-world data show that, using PWM2Vec, we are able to perform comparably to baseline models. We also measure the importance of different amino acids using information gain to show the amino acids which are important for predicting the host of a given coronavirus.

Machine translation, for reference only
