As each day passes, the amount of land on Earth available to store carbon naturally decreases. Land development competes with Net Zero goals, yet a growing population needs more housing and more energy. The planet's future depends on sustainable, climate-smart land management that balances growth with ecosystem resilience. Emissions reduction, food security, timber supply, water management, clean air, and biodiversity all depend on protecting land with valuable natural capital from mounting development pressure. Without proactive protection, unsustainable development may irreversibly deplete these resources. Identifying the parcels most likely to be developed enables the proactive land management that a climate-positive future requires.
Development risk is defined as the desirability of a piece of land for real estate development, whether commercial, industrial, or residential. Without proper mitigation planning, such development can permanently erase nature-based sources of carbon storage and removal while degrading a region's natural capital and climate resilience. Development risk is only actionable before a piece of land is developed, and it is most useful when acted on before market factors substantially affect land prices. To measure it, Upstream Carbon has developed an ensemble of machine learning models that produces a Greenfield Index. A parcel with a Greenfield Index close to 1 faces high development pressure; a parcel with an index close to 0 faces low development pressure.
The model uses publicly available datasets, which the Upstream Carbon team combines and prepares to train the machine learning models:
Development risk is driven largely by the location of the land. A parcel's physical attributes and its proximity to local amenities both signal how likely it is to be developed: the closer a parcel is to amenities, the higher its risk. The strongest predictors of development risk are:
This prediction model is a blended model trained on a proprietary dataset curated by Upstream Carbon. The team used automated machine learning to optimize across modeling approaches and to ensure that outcomes are risk-adjusted to customer needs. The model is updated whenever new parcel data are published, which currently happens every six months.
Training data were sampled so as to oversample large-acreage parcels (which are of greater interest for conservation planning), to keep sampling proportional to the area of the counties in which parcels are located, and to distribute samples across all counties. The training target was set to 1 for parcels whose use codes indicated development and to 0 for parcels whose use codes indicated undeveloped, vacant land. Already-conserved and municipally owned land was excluded from the training set.
The model was trained using five-fold cross-validation with a 20% holdout. Data were partitioned by municipality, so that the model was trained on parcels from one set of municipalities and validated on a separate set. Training optimized LogLoss, weighted to reward accurate predictions on larger parcels, and produced a weighted LogLoss of 0.583 and a weighted AUC of 0.745 on the cross-validation test sets. When parcels were ranked by predicted probability (from 0 to 1), the top 10% had an average observed development rate of 0.943 against an average predicted rate of 0.937, and the bottom 10% had an average observed rate of 0.329 against an average predicted rate of 0.352. Additional accuracy and performance reporting is available on request from matt@upstreamcarbon.com.
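The validation scheme and metrics described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Upstream Carbon's pipeline: it assumes each parcel's weight is its acreage, uses hypothetical helper names (`municipality_folds`, `weighted_log_loss`, `weighted_auc`, `decile_calibration`), and computes weighted AUC with one common pairwise definition. A 20% holdout could be carved out by grouping on municipality in the same way before the folds are assigned.

```python
import math
import random


def municipality_folds(municipalities, k=5, seed=0):
    """Assign each parcel a fold so all parcels from one municipality share a fold.

    Validating on fold f then means scoring municipalities the model never saw.
    """
    unique = sorted(set(municipalities))
    random.Random(seed).shuffle(unique)
    fold_of = {m: i % k for i, m in enumerate(unique)}
    return [fold_of[m] for m in municipalities]


def weighted_log_loss(y_true, y_prob, weights, eps=1e-15):
    """LogLoss where each parcel counts in proportion to its weight (assumed: acreage)."""
    total = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


def weighted_auc(y_true, y_prob, weights):
    """Weighted chance that a developed parcel outscores an undeveloped one.

    Each (developed, undeveloped) pair counts with weight w_i * w_j; ties count half.
    """
    pos = [(p, w) for y, p, w in zip(y_true, y_prob, weights) if y == 1]
    neg = [(p, w) for y, p, w in zip(y_true, y_prob, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one developed and one undeveloped parcel")
    num = 0.0
    for pp, wp in pos:
        for pn, wn in neg:
            if pp > pn:
                num += wp * wn
            elif pp == pn:
                num += 0.5 * wp * wn
    return num / (sum(w for _, w in pos) * sum(w for _, w in neg))


def decile_calibration(y_true, y_prob):
    """Observed vs. mean predicted development rate in the lowest and highest 10% of scores."""
    ranked = sorted(zip(y_prob, y_true))
    n = max(1, len(ranked) // 10)

    def rates(chunk):
        observed = sum(y for _, y in chunk) / len(chunk)
        predicted = sum(p for p, _ in chunk) / len(chunk)
        return (observed, predicted)

    return {"bottom_10pct": rates(ranked[:n]), "top_10pct": rates(ranked[-n:])}
```

Pairing weighted LogLoss with a decile comparison mirrors the two questions the reported numbers answer: how sharp the probabilities are overall, and whether the parcels flagged as highest or lowest risk develop at roughly the predicted rate.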
The current version of the model is an ensemble of three algorithms: XGBoost with early stopping, LightGBM on ElasticNet predictions, and a Nystroem kernel SVM classifier. Prediction explanations displayed in the application are produced using SHapley Additive exPlanations (SHAP). The predictions are then compared with other metrics of interest (e.g., carbon, soil quality) to measure the amount of natural capital at risk of loss.
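The text does not say how the three base models' outputs are blended or exactly how predictions are combined with metrics such as carbon. One simple possibility, shown here purely as an assumption, is a weighted average of the base models' probabilities (equal weights by default), multiplied by a per-parcel stock such as tonnes of carbon to give an expected quantity at risk. The function names, model keys, and numbers below are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def blend(base_probs: Dict[str, Sequence[float]],
          weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-model development probabilities.

    Assumption: a linear average blender; the actual blending method is not documented.
    """
    names = list(base_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total = sum(weights[name] for name in names)
    n = len(base_probs[names[0]])
    return [sum(weights[name] * base_probs[name][i] for name in names) / total
            for i in range(n)]


def natural_capital_at_risk(greenfield_index: Sequence[float],
                            stock: Sequence[float]) -> List[float]:
    """Hypothetical metric: development probability times the parcel's stock (e.g. tonnes C)."""
    return [g * s for g, s in zip(greenfield_index, stock)]


if __name__ == "__main__":
    probs = {
        "xgboost_early_stopping": [0.80, 0.30],
        "lightgbm_on_elasticnet": [0.70, 0.20],
        "nystroem_svm": [0.90, 0.40],
    }
    index = blend(probs)  # ≈ [0.8, 0.3] up to floating-point rounding
    print(index)
    print(natural_capital_at_risk(index, [1000.0, 5000.0]))  # ≈ [800.0, 1500.0]
```

Under this reading, a large, carbon-rich parcel with a moderate index can carry more natural capital at risk than a small parcel with a high index, which is why the index is paired with other metrics rather than used alone.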
Upstream Carbon's goal is to arm every organization for climate-smart and sustainable land management. Development risk is just one metric, and there are always more metrics to consider. Our team is excited to evaluate modeling and analytics opportunities with our customers beyond the current scope of the platform. Requests for custom models, comparison metrics, or other improvements can be sent to matt@upstreamcarbon.com.