Use Of Quality And Quantity Information Towards Evaluating The Importance Of Independent Variables In Yield Prediction
E. Momsen, J. Xu, D. W. Franzen, J. F. Nowatzki, K. Farahmand, A. M. Denton
North Dakota State University

Yield predictions based on remotely sensed data are not always accurate.  Adding meteorological and other data can help, but may also result in over-fitting.  Working with American Crystal Sugar, we were able to demonstrate that the relevance of independent variables can be tested much more reliably when not only yield but also quality attributes are known, such as the sugar content and the sugar lost to molasses for sugarbeets.  

 

The problem of potentially over-fitting the data when working with a large number of independent variables is known as the curse of dimensionality.  We show that the over-fitting problem can be effectively countered by increasing the number of dependent variables.  With a single dependent variable, an independent variable may be considered relevant merely because similar values of it accidentally coincide with similar values of the dependent variable.  When multiple dependent variables are used, such accidental matches across the combination of all of their values become much less likely.  For our predictions we use those independent attributes that affect the combination of dependent variables, and find that their predictive value is indeed higher than that of variables selected by conventional techniques that only consider yield.
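
The argument about accidental matches can be illustrated with a small simulation.  The sketch below is not the method used in this work: it assumes that relevance is judged by a simple Pearson correlation threshold, that all variables are pure noise, and that the dependent variables are independent of one another.  The function names, sample size, and threshold are hypothetical choices made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def false_positive_rate(n_dependent, n_samples=20, n_trials=2000,
                        threshold=0.3, seed=0):
    """Fraction of trials in which a pure-noise independent variable
    exceeds the correlation threshold for ALL dependent variables.

    Assumption: every variable is independent Gaussian noise, so any
    apparent relevance is accidental.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = [rng.gauss(0, 1) for _ in range(n_samples)]
        looks_relevant = True
        for _ in range(n_dependent):
            y = [rng.gauss(0, 1) for _ in range(n_samples)]
            if abs(pearson(x, y)) <= threshold:
                looks_relevant = False
                break
        if looks_relevant:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # One dependent variable (yield only) versus three
    # (e.g. yield, sugar content, sugar lost to molasses).
    print("1 dependent variable: ", false_positive_rate(1))
    print("3 dependent variables:", false_positive_rate(3))
```

Under these assumptions, the rate at which a noise variable passes the test drops sharply when it must match three dependent variables instead of one.  Real quality attributes are correlated with yield, so the reduction in practice would be smaller than in this idealized setting.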

 

In agricultural data, it has become increasingly common to collect not only the weight of the harvested crop but also quality information.  For sugarbeets, it is common to consider the sugar content and the sugar lost to molasses in addition to yield.  Together, these three variables allow determining much more precisely which independent variables affect plant growth than yield alone would.  We show that such reasoning can be applied very effectively to the problem of how to preprocess abundantly available data such as rainfall.  Using rainfall data with a finer granularity than the aggregate over the full year can clearly hold benefits.  In the absence of quantitative techniques, researchers and crop consultants have to decide how to preprocess rainfall data based on educated guessing alone.  We provide a computational approach to answering such questions quantitatively.  We show that the resulting recommendations can be confirmed by considering how accurately aggregate yield data can be predicted across different years.  We furthermore consider the problem of identifying subsets of data that correspond to similar growth conditions.  For this purpose, we evaluate how binary data, such as information on the presence or absence of a particular soil type, affect the combination of yield, sugar content, and sugar lost to molasses.  We find that creating subsets based on those variables that have the strongest effect improves the accuracy of the resulting regression models.
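
To make the rainfall preprocessing question concrete, the sketch below shows one way candidate granularities could be generated from a daily rainfall series before they are compared.  It assumes fixed-length, non-overlapping windows; the function name, the window lengths, and the rainfall values are hypothetical and are not taken from the study.

```python
def aggregate_rainfall(daily_mm, window_days):
    """Sum daily rainfall (in mm) over consecutive, non-overlapping windows.

    The last window may cover fewer days if the series length is not a
    multiple of window_days.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return [
        sum(daily_mm[start:start + window_days])
        for start in range(0, len(daily_mm), window_days)
    ]


# Made-up daily rainfall values in mm, for illustration only.
daily = [0.0, 5.0, 0.0, 12.0, 0.0, 0.0, 3.0, 0.0, 8.0, 0.0, 0.0, 1.0]

whole_period = aggregate_rainfall(daily, len(daily))  # one aggregate value
four_day = aggregate_rainfall(daily, 4)               # finer granularity

if __name__ == "__main__":
    print(whole_period)
    print(four_day)
```

Each resulting list is one candidate set of independent variables.  Which granularity is preferable would then be decided by how strongly the aggregated values affect the combination of dependent variables, as described above.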

 
Keywords: Quality data, yield prediction, variable selection