This paper focuses on compressor systems associated with major production deferments. An advanced machine-learning approach is presented for determining anomalous behavior to predict a potential trip and probable root cause with sufficient warning to allow for intervention. This predictive-maintenance approach has the potential to reduce downtime associated with rotating-equipment failures.
Introduction
The first step in using a machine-learning system is to train the model to identify normal and abnormal operating conditions. The model can then classify real-time data from the equipment and indicate when the equipment’s performance strays outside the identified steady state. The ability to identify anomalies is a major difference between the proposed approach and traditional monitoring tools. With advances in digital technologies, correlations and warnings can be produced in a matter of minutes, allowing engineers to take appropriate preventive action when they receive a failure warning.
The authors used historical data from 2016 in their analysis of the system’s effectiveness in predicting failures. The proof-of-concept system correctly predicted 11 trip events over the course of the year, almost 50% of the 23 failures that occurred during that period. One of the more important findings was that the machine-learning model predicted many failures hours in advance; in one case, it gave 36 hours’ notice. The median warning time for eight events that were subsequently analyzed was approximately 7 hours.
Support Vector Machines (SVMs)
SVMs are used in this study as a classifier for detecting abnormal machine states. SVMs were developed for binary classification. Some authors have argued that the SVM classifier yields better results than techniques such as linear discriminant analysis and back-propagation neural networks.
The compressor operates under normal working conditions most of the time, which makes a two-class classification highly unbalanced. Because of this, one-class classification using an SVM is implemented. The algorithm is trained only on normal data and creates a representation of that data. When newly observed points differ substantially from the modeled class, they are labeled as outliers. Both linear and radial kernel functions are explored.
One of the properties of SVMs is that they can create a nonlinear decision boundary by projecting the data through a nonlinear function into a higher-dimensional space (Fig. 1). The one-class SVM creates a binary function that captures the region of the input space where most of the data exist. The resulting function returns +1 for the region defined by the training data points and –1 everywhere else.
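As a minimal sketch of this setup, assuming scikit-learn’s OneClassSVM and randomly generated stand-in data (the kernel choice, nu value, and array shapes below are illustrative, not the paper’s configuration):

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in data: rows are observations, columns are analog tags.
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(5000, 12))      # normal operating data used for training
X_new = rng.normal(size=(100, 12)) + 3.0    # later observations to be classified

# Fit the one-class SVM on normal data only; a radial kernel is shown,
# but kernel="linear" can be explored the same way.
scaler = StandardScaler().fit(X_normal)
model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
model.fit(scaler.transform(X_normal))

# predict() returns +1 inside the learned normal region and -1 everywhere else.
labels = model.predict(scaler.transform(X_new))
print("fraction flagged abnormal:", float(np.mean(labels == -1)))
```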
The low-pressure compressor (LPC), a production-critical piece of equipment, is bypassed or manually shut down when engineers notice an anomaly. This is the primary reason the data set contains many manual shutdowns rather than real trips. For instance, in an analysis of 2017 data, of 31 documented deferments, four were manual shutdowns, seven were process standby, and 19 were classified as breakdowns. Of these 19 breakdowns, only six were caused by a fault with the LPC; the remaining 13 resulted from problems with subsystems upstream of the LPC. It was, therefore, not possible to discern all failure modes of the LPC or repeated instances of the same failure mode. Accordingly, a one-class SVM was chosen as an appropriate strategy to model the normal working of the LPC. This would enable identification of any abnormal event, irrespective of the mode of failure.
To train the SVM for the LPC and its processes, approximately 300 analog tags were identified. Analog tags are fewer in number than digital tags, but their data points are continuous rather than discrete. These 300 tags were split into two models, LPC and process. The LPC model consisted of almost 230 input tags, and the process model consisted of almost 70 tags. The process model included a number of subsystems. The tags for each model underwent preprocessing steps before being fed into the respective SVM model.
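As an illustration of this split and the preprocessing, the following sketch assumes pandas and uses hypothetical tag names and thresholds; the low-variance filtering mirrors the Removal of Low-Variance Tags step described later, while the gap-filling is purely an assumption:

```python
import pandas as pd

# Hypothetical tag lists standing in for the paper's ~300 analog tags,
# split into the LPC model (~230 tags) and the process model (~70 tags).
lpc_tags = [f"LPC_TAG_{i:03d}" for i in range(230)]
process_tags = [f"PROC_TAG_{i:03d}" for i in range(70)]

def preprocess(df: pd.DataFrame, variance_threshold: float = 1e-6) -> pd.DataFrame:
    """Illustrative preprocessing: fill short gaps and drop near-constant
    (low-variance) tags before the data are fed to the one-class SVM."""
    df = df.ffill(limit=5)
    return df.loc[:, df.var() > variance_threshold]

# lpc_features = preprocess(raw_data[lpc_tags])
# process_features = preprocess(raw_data[process_tags])
```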
To identify the normal working condition of the LPC, two strategies were explored:
- Events log. Every event on the LPC is recorded as an error and later classified as a breakdown or other type of deferment.
- Process steady state. This is determined on the basis of the positioning of key valves that indicate the LPC is online and producing.

In addition, time margins were created around each deferment for both the LPC and process models. For the LPC model, data points up to 2 hours before and after a deferment were considered abnormal; for the process model, a margin of 6 hours was used.
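A minimal labeling sketch under these assumptions follows (pandas is assumed; the deferment timestamps, index, and helper name are hypothetical):

```python
import pandas as pd

def label_abnormal(index: pd.DatetimeIndex,
                   deferment_times: list,
                   margin_hours: int) -> pd.Series:
    """Mark every observation within +/- margin_hours of a deferment as abnormal."""
    abnormal = pd.Series(False, index=index)
    margin = pd.Timedelta(hours=margin_hours)
    for t in deferment_times:
        abnormal[(index >= t - margin) & (index <= t + margin)] = True
    return abnormal

# Example: 2-hour margin for the LPC model, 6-hour margin for the process model.
# lpc_abnormal = label_abnormal(data.index, deferments, margin_hours=2)
# process_abnormal = label_abnormal(data.index, deferments, margin_hours=6)
```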
Feature Engineering. Several methods, outlined in the complete paper, were adopted to generate features beyond the raw data in order to improve training and the accuracy of the SVM models.
The best-performing one-class SVM model defines the normal working condition of the machine on the basis of the process steady state and, for both the LPC and process models, uses data aggregated over a lagged rolling window to calculate the rate of change as a training feature. The test data stream is input every 10 minutes, and the same preprocessing steps used to train the model are applied. A rolling window of 1 week was identified as the best measure for smoothing the data and calculating the rate of change from the current observation. Model output contains a reduced list of tags, approximately 200 for the LPC model and 30 for the process model; the reduction in the number of tags is explained by the preprocessing steps Removal of Low-Variance Tags and Removal of Alarms. The output of the SVM model is Boolean, indicating a data point as either normal or abnormal. This continuous stream of output indicates the state of the LPC and its process in near-real time.
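The rolling-window feature can be sketched roughly as follows, assuming pandas and a datetime-indexed frame of tag values; only the 1-week window length comes from the paper, and the exact rate-of-change formula is an assumption:

```python
import pandas as pd

def add_rate_of_change(df: pd.DataFrame, window: str = "7D") -> pd.DataFrame:
    """Smooth each tag with a 1-week rolling mean, then express the current
    observation as a rate of change relative to that lagged baseline."""
    baseline = df.rolling(window).mean()
    rate_of_change = (df - baseline) / baseline.abs().clip(lower=1e-9)
    return rate_of_change.add_suffix("_roc")

# features = add_rate_of_change(lpc_features)   # fed to the one-class LPC SVM
```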
Whenever the one-class SVM model flips from normal to abnormal (from true to false), it is considered an alert, or a change in the state of the LPC. Whenever a flip is encountered, the values of all tags are placed in descending order, and the top 10 tags are picked and reported. Root-cause identification is thus a derived mechanism that takes the SVM output as its only input.
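One plausible reading of that flip-and-rank logic, sketched with pandas (the helper names, the use of scaled tag values, and the series shown in the comments are hypothetical):

```python
import pandas as pd

def detect_flips(states: pd.Series) -> pd.Series:
    """True at timestamps where the SVM output changes from normal (True) to abnormal (False)."""
    return states.shift(1, fill_value=True) & ~states

def top_root_cause_tags(tag_values: pd.Series, n: int = 10) -> list:
    """Rank the (scaled) tag values at the flip in descending order and return the top n."""
    return tag_values.sort_values(ascending=False).head(n).index.tolist()

# flips = detect_flips(svm_output)            # Boolean series, True at each alert
# for t in flips[flips].index:
#     print(t, top_root_cause_tags(scaled_data.loc[t]))
```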
The performance data for the LPC are stored in a time-series database. A work flow was built in a commercial software program to carry out the end-to-end extract/transform/load and data-processing task. This work flow is deployed on a server and run every 10 minutes. Each time the work flow is run, the software program ingests the previous 10 minutes of data, at a 1-minute frequency, for the approximately 300 selected compressor and process tags. These data are passed through various modules to be cleaned, transformed, and validated. The data are then passed through modules embedded within the work flow; standard libraries are used to generate the binary outlier classifications from the compressor data with an SVM algorithm. The software then outputs the results, which are simply a Boolean true or false indicating a failure prediction, to a structured-query-language database. When the results change from true to false and stay false for more than 10 minutes, the software sends an email alerting that a failure has been predicted.
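A compact sketch of one such 10-minute cycle is shown below; the fetch, SQL-write, alert, and state-lookup helpers are placeholders standing in for the commercial work flow and are not the paper’s actual implementation:

```python
def run_cycle(fetch_last_10_min, preprocess, scaler, model,
              write_to_sql, last_result_was_abnormal, send_alert_email):
    """One 10-minute cycle: pull the latest data, classify it, persist the result,
    and alert when the abnormal state has persisted for more than one cycle."""
    raw = fetch_last_10_min()                             # ~300 tags at 1-minute frequency
    features = preprocess(raw)                            # same steps used during training
    labels = model.predict(scaler.transform(features))    # +1 normal, -1 abnormal
    is_normal = bool((labels == 1).all())

    write_to_sql(timestamp=raw.index[-1], normal=is_normal)
    if not is_normal and last_result_was_abnormal():
        send_alert_email("LPC failure prediction: abnormal state for more than 10 minutes")
```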
The continuous output from the two SVM models can be fed into any dashboard for visual inspection. An autogenerated email is also configured to alert engineers about the state of the LPC and the top 10 tags that can be considered potential root causes. Accuracy is measured on the basis of the number of flips generated by the model in each time period (e.g., 1 week or 1 day). The model should not flip or raise alerts so frequently that it generates random noise, yet it should flag trips sufficiently in advance for engineers to act. The current best model for an LPC generated approximately 70 flips over a 6-month period, an average of 11 alerts a month, or roughly two alerts a week. However, the alerts tended to accumulate ahead of an impending trip, indicating a buildup; the random noise is therefore fewer than two alerts a week.
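As an illustration of that accuracy measure, flips can simply be counted per period from the Boolean output; this small sketch assumes pandas and a datetime-indexed Boolean series named svm_output, and is not part of the paper’s tooling:

```python
import pandas as pd

def flips_per_period(states: pd.Series, period: str = "W") -> pd.Series:
    """Count normal-to-abnormal flips of the Boolean SVM output per period."""
    flips = states.shift(1, fill_value=True) & ~states
    return flips.resample(period).sum()

# print(flips_per_period(svm_output, "W"))   # flips per week
# print(flips_per_period(svm_output, "D"))   # flips per day
```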
The root-cause identification has performed well and has shown clear patterns when similar types of failures were encountered. For instance, in events in which the feedback arm came loose and flow could not be controlled properly, leading to a trip, the algorithm identified “LPC Standard Flow” as the top root cause. In one instance, the alert was given almost 12 hours in advance, whereas, in others, it was given only a few minutes in advance; this is attributed to manual interventions that failed and resulted in trips. In most cases, sufficient warning time exists for engineers to take preventive measures.