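Recovery factor hard to pin down? Try an artificial neural network model (Part 2) - 石油圈

Abstract: Artificial neural networks (ANNs) are currently among the most popular modeling methods. How can they be used to calculate recovery factor, and how well do they perform? This article describes the ANN Oil RF Model in detail.

Compiled by TOM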

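When building an ANN, there are two important unknowns: the number of hidden layers and the number of neurons to include in the model. The usual approach is to keep the ANN as simple as possible. For most problems, one hidden layer is sufficient. With a well-behaved data set, the number of neurons can be minimized so that it is less than or equal to the number of model inputs (five for the oil recovery factor calculation).

Unfortunately, the study found that with a small number of neurons (five or fewer) the ANN could not handle the recovery factor problem. Specifically, with a low neuron count the network was unable to model the high and low recovery factor trends (Fig. 4).

Because the network had trouble finding an optimum solution covering the entire range of recovery factors at low neuron counts, multiple ANN models were built and evaluated. The objective was to find the number of neurons that yields a network whose outputs cover the entire range of recovery factors. Fig. 5 shows the slope and R² correlation factor (actual versus ANN-calculated recovery factor) for ANNs built with different numbers of neurons. Accuracy appears to peak when the network contains ten neurons.

When deciding how many neurons to include, keep in mind that the data can be over-fitted. Training a neural network means fitting data to a complex mathematical function. If a large number of neurons is used, the function matches the training data too precisely, and the results may be unacceptable when independent (blind test) data are run through the ANN.

The reason is that an overly complex network no longer generalizes the patterns in the training data and instead becomes a look-up table. In that situation the ANN has essentially reproduced the input-output pairs used in training and cannot accurately process new inputs that lie outside the table. With this caveat in mind, the final ANN Oil RF Model for sandstone reservoirs was built and trained with ten neurons.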

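Sandstone reservoirs

Fig. 6 compares the actual recovery factors of sandstone reservoirs with those generated by the ANN model for the training and blind test data:

Sandstone reservoirs:

1. 10 non-linear neurons in the ANN;
2. Data set: 264 reservoirs;
3. Training/blind: 218/46;
4. A random 20% of the training data set was used for validation checks during the training stage;
5. The blind data set was not part of the training;
6. Lines representing ±25% variation.

After the ANN Oil RF Model was trained, 70% of the ANN-calculated recovery factors were within ±25% of the actual values.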

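The equation representing the ANN Oil RF Model for sandstone reservoirs is given in Fig. 7. The parameter ranges used in the sandstone reservoir ANN Oil RF Model are as follows:

1. Oil in place: 10 to 55,000 MMbbl;
2. Permeability: 0.6 to 7,000 md;
3. Net pay: 10 to 1,800 ft;
4. Porosity: 5% to 35%;
5. Oil gravity: 15° to 55° API;
6. Oil viscosity: 0.1 cp to 88 cp.

Input data outside these ranges may produce unreasonable results. Note that even inputs within these ranges may produce unreasonable results, because there was not enough training data to cover all possible input combinations.

As an example of an ANN Oil RF Model calculation, take a sandstone water-drive reservoir with the following properties: oil in place of 10 million bbl, permeability of 10 md, net pay of 50 ft, porosity of 20%, oil gravity of 35° API, and viscosity of 0.13 cp. The calculated recovery factor is 34.5%.

If a parameter is missing when using the ANN Oil RF Model, it may be valid to use the average value of the missing parameter. For the sandstone reservoir data set, the averages are: Log(STOOIP) = 2.6524, Log(kh) = 4.3573, Log(viscosity) = 0.2167, Log(phi-h) = 3.2745, and oil API gravity = 33.5. If more than one parameter is missing, the ANN Oil RF Model should not be used.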

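Conclusions

An artificial neural network model was built to generate recovery factors for sandstone oil reservoirs. For 70% of the data, the model's predictions fall within ±25% of the actual recovery factor.

Although the results appear reasonable, they still show large scatter. This is common to all recovery factor correlations and results from the variability, and averaging, of the input data.

As with all recovery factor calculations, the results generated by the ANN Oil RF Model should be used with caution and checked against other techniques.

The ANN Oil RF Model is a first step toward a generic oil recovery factor model based on artificial neural networks. The next step is to increase the size of the training data set and to further refine the training data.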


Neural network-derived model accurately predicts oil recovery in water-drive reservoirs

The significant unknowns when building ANNs are the number of hidden layers and the number of neurons to include in the model. The generally accepted procedure is to make the ANN as simple as possible. For the majority of problems, one hidden layer is sufficient. With well-behaved data, it may be possible to minimize the neurons so that they are less than or equal to the number of inputs to the model (five in the oil recovery factor case).
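As a rough illustration of what a single-hidden-layer network computes, the sketch below runs one forward pass in plain Python. The `tanh` activation, the function name `ann_forward`, and every weight value are assumptions chosen for illustration. They are not the published model coefficients, and only two hidden neurons are shown to keep the example short.

```python
import math


def ann_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of tanh neurons feeding a single linear output.

    hidden_weights holds one list of weights per hidden neuron,
    each the same length as `inputs`.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Placeholder numbers only: five scaled inputs and two hidden neurons.
example_inputs = [0.1, -0.3, 0.5, 0.0, 0.2]
example_hidden_weights = [
    [0.2, -0.1, 0.4, 0.0, 0.3],
    [-0.5, 0.2, 0.1, 0.3, -0.2],
]
example_hidden_biases = [0.0, 0.1]
example_output_weights = [0.6, -0.4]
example_output_bias = 0.3

example_output = ann_forward(
    example_inputs,
    example_hidden_weights,
    example_hidden_biases,
    example_output_weights,
    example_output_bias,
)
print(example_output)
```

Adding neurons only lengthens the weight lists. The structure stays the same, which is why the neuron count is the main design choice discussed here.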

Unfortunately, it was found that when using a small number of neurons (five or fewer), the resulting ANN could not handle the recovery factor problem. Specifically, with a lower neuron count, the neural network was unable to model the high and low recovery factor trends (Fig. 4).

Since the neural network appeared to have trouble, at low neuron counts, finding an optimum solution that would cover the entire range of recovery factors, multiple ANN models were built and evaluated. The objective was to find the number of neurons that yields a network whose outputs are optimized over the entire range of expected recovery factors.

Figure 5 presents the slope and R² correlation factor (actual RF vs ANN RF) for ANNs built using different numbers of neurons. It appears that maximum accuracy of the ANN-calculated recovery factors occurs when the neural network contains ten neurons.
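The two diagnostics can be computed for any candidate network from paired actual and predicted recovery factors. The sketch below assumes the slope is the least-squares slope of predicted against actual values and that R² is the squared Pearson correlation. The article does not spell out its exact definitions, and the sample numbers are invented.

```python
def slope_and_r2(actual, predicted):
    """Least-squares slope of predicted vs actual, and squared Pearson correlation."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_ap = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    slope = s_ap / s_aa
    r2 = (s_ap ** 2) / (s_aa * s_pp)
    return slope, r2


# Invented recovery factors (percent), for illustration only.
actual_rf = [12.0, 25.0, 33.0, 41.0, 55.0]
ann_rf = [20.0, 27.0, 33.0, 38.0, 46.0]

slope, r2 = slope_and_r2(actual_rf, ann_rf)
print(f"slope={slope:.3f}, R2={r2:.3f}")
```

A slope well below 1 means the network compresses the range, over-predicting low recovery factors and under-predicting high ones, which matches the failure described for low neuron counts.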

One caveat that should be kept in mind when considering the number of neurons to include is the potential of over-fitting the data. Neural network training is an exercise in fitting data to a complex mathematical function. If the function is made to match the training data too precisely by using a large number of neurons, then when independent data (blind test data) are run through the network, there is a risk that the results may not be acceptable.

The reason for this is that rather than generalizing patterns in the training data, the network has become a look-up table, because the neural network is too complex. In this situation, the ANN has essentially regenerated the input-output results used in training and cannot accurately process new inputs that are not within the look-up table. Keeping the caveat in mind, the final ANN Oil RF Model for sandstone reservoirs was built and trained with ten neurons.
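One way to get a feel for the over-fitting risk is to count the adjustable parameters and compare the total with the number of training reservoirs. The sketch below assumes a fully connected network with one hidden layer, a bias on every neuron, and a single output. The article does not state these details, and `parameter_count` is a hypothetical helper.

```python
def parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases for a fully connected one-hidden-layer network."""
    hidden_params = n_hidden * (n_inputs + 1)    # input weights + one bias per hidden neuron
    output_params = n_outputs * (n_hidden + 1)   # hidden weights + one bias per output
    return hidden_params + output_params


N_INPUTS = 5               # inputs to the oil recovery factor model
N_TRAINING_RESERVOIRS = 218  # training set size quoted in the article

for n_hidden in (5, 10, 20, 40):
    n_params = parameter_count(N_INPUTS, n_hidden)
    ratio = N_TRAINING_RESERVOIRS / n_params
    print(f"{n_hidden:>2} neurons: {n_params:>3} parameters, "
          f"{ratio:.1f} training reservoirs per parameter")
# 5 neurons -> 36 parameters; 10 neurons -> 71 parameters
```

Under these assumptions, ten neurons leave roughly three training reservoirs per parameter, and the ratio drops quickly as neurons are added.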

SANDSTONE RESERVOIRS

Figure 6 presents the actual recovery factors compared to the ANN-generated recovery factors for both the training and blind test data sets for sandstone reservoirs:

Sandstone reservoirs
10 non-linear neurons in ANN
Dataset: 264 reservoirs
Training/blind: 218/46
A random 20% of the training data set was used for validation checks during the training stage
The blind data set was not part of the training
Lines representing +/– 25% variation
Correlation coefficient is 0.59 (the outliers cause the correlation coefficient to be low).

After training the ANN Oil RF Model, 70% of the ANN-calculated recovery factors were within +/–25% of the actual value.
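The data handling described above can be sketched as follows. The split sizes come from the article. The shuffling procedure, the seed, the rounding of the 20% validation share, and the reading of "within 25%" as relative to the actual value are assumptions, and both function names are hypothetical.

```python
import random


def split_reservoirs(n_total=264, n_blind=46, validation_fraction=0.20, seed=0):
    """Return index lists (fit, validation, blind) for the reservoir data set."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    blind = indices[:n_blind]              # never shown to the network during training
    training_pool = indices[n_blind:]      # 218 reservoirs with the defaults
    n_validation = round(validation_fraction * len(training_pool))
    validation = training_pool[:n_validation]
    fit = training_pool[n_validation:]
    return fit, validation, blind


def fraction_within(actual, predicted, tolerance=0.25):
    """Share of predictions within +/- tolerance (relative) of the actual value."""
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)


fit_idx, validation_idx, blind_idx = split_reservoirs()
print(len(fit_idx), len(validation_idx), len(blind_idx))  # 174 44 46
```

Applying `fraction_within` to the actual and ANN-calculated recovery factors would give the kind of figure the article reports as 70%.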

The resulting equation that represents the ANN Oil RF Model for sandstone reservoirs is given in Fig. 7. The data ranges for the parameters used in the sandstone reservoir ANN Oil RF Model are as follows:

Oil in place: 10 MMbbl to 55,000 MMbbl
Permeability: 0.6 md to 7,000 md
Net pay: 10 ft to 1,800 ft
Porosity: 5% to 35%
Oil gravity: 15° API to 55° API
Oil viscosity: 0.1 cp to 88 cp.

Using input data outside the ranges presented above may yield unreasonable results. It should be kept in mind that even using input data within the ranges presented above may still yield results that are unreasonable, since there was not enough training data to cover all possible input combinations.

As an example of an ANN Oil RF Model calculation, consider a sandstone water-drive oil reservoir with STOOIP = 10,000,000 bbl, permeability = 10 md, net pay = 50 ft, porosity = 20%, oil gravity = 35° API, and viscosity = 0.13 cp. The model calculates a recovery factor of 34.5% for this reservoir.
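Because inputs outside the training ranges may give unreasonable results, a simple pre-check can be run before the model is used. The sketch below encodes the published ranges and applies them to the example reservoir. The dictionary keys and the `out_of_range` helper are hypothetical names, and porosity is assumed to be entered in percent.

```python
# Published parameter ranges for the sandstone model (inclusive bounds assumed).
SANDSTONE_RANGES = {
    "oil_in_place_mmbbl": (10.0, 55000.0),
    "permeability_md": (0.6, 7000.0),
    "net_pay_ft": (10.0, 1800.0),
    "porosity_pct": (5.0, 35.0),
    "oil_gravity_api": (15.0, 55.0),
    "oil_viscosity_cp": (0.1, 88.0),
}


def out_of_range(reservoir):
    """Return the names of parameters that fall outside the training ranges."""
    return [
        name
        for name, (low, high) in SANDSTONE_RANGES.items()
        if not low <= reservoir[name] <= high
    ]


# The example reservoir: 10,000,000 bbl is 10 MMbbl, right at the lower bound.
example_reservoir = {
    "oil_in_place_mmbbl": 10.0,
    "permeability_md": 10.0,
    "net_pay_ft": 50.0,
    "porosity_pct": 20.0,
    "oil_gravity_api": 35.0,
    "oil_viscosity_cp": 0.13,
}

print(out_of_range(example_reservoir))  # [] -> every parameter is inside its range
```

Passing this check is necessary but not sufficient, since a combination of in-range values may still be one the training data never covered.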

If the ANN Oil RF model is going to be used with missing data, then it may be valid to use an average value for the missing parameter. For the sandstone reservoir data set, the averages are as follows: Log(STOOIP) = 2.6524, Log(kh) = 4.3573, Log(viscosity) = 0.2167, Log(phi-h) = 3.2745 and Oil API = 33.5. If more than one of the parameters is missing, the ANN Oil RF Model should not be used.
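The missing-data rule above can be sketched as a small helper that substitutes the data-set average for a single missing input and refuses to proceed when more than one is missing. The averages are those quoted in the text. The key names, the `fill_missing` function, and the sample input values are hypothetical, and the inputs are assumed to be already transformed the way the model expects (the article does not state the logarithm base).

```python
# Averages of the five model inputs for the sandstone data set, as quoted above.
SANDSTONE_AVERAGES = {
    "log_stooip": 2.6524,
    "log_kh": 4.3573,
    "log_viscosity": 0.2167,
    "log_phi_h": 3.2745,
    "oil_api": 33.5,
}


def fill_missing(inputs):
    """Replace one missing input (None or absent) with its data-set average."""
    missing = [name for name in SANDSTONE_AVERAGES if inputs.get(name) is None]
    if len(missing) > 1:
        raise ValueError(
            f"{len(missing)} inputs missing ({', '.join(missing)}); "
            "the model should not be used"
        )
    filled = dict(inputs)
    for name in missing:
        filled[name] = SANDSTONE_AVERAGES[name]
    return filled


# Placeholder transformed inputs with log(kh) unknown.
partial_inputs = {
    "log_stooip": 2.0,
    "log_kh": None,
    "log_viscosity": 0.1,
    "log_phi_h": 3.0,
    "oil_api": 35.0,
}

print(fill_missing(partial_inputs)["log_kh"])  # 4.3573
```

Raising an error for two or more missing inputs keeps the helper from silently producing a recovery factor that is mostly an average of averages.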

CONCLUSIONS

An artificial neural network was built to generate recovery factors for sandstone oil reservoirs. The resulting model predicted the actual recovery factors within +/–25% for 70% of the data.

Although the results appear to be reasonable, there is still large scatter in the results. This is common for all recovery factor correlations and is the result of the variability, and averaging, of the input data.

As is the case for all recovery factor calculations, the results generated by the ANN Oil RF Model should be used with caution and checked against other techniques.

The ANN Oil RF Model represents the first step in generating a generic oil recovery factor model using artificial neural networks. The next step is to increase the size of the training data set and to further refine the training data.
