Poverty Outcome Generation from Survey Data
Set up the configuration file and load some survey data (see standardized data formats for file schema).
datastore = DataStore('/path/to/config_file.yml')
outcomes_generator = SurveyOutcomesGenerator(datastore=datastore, clean_folders=True)
Calculate a PCA asset index and a proxy-means test (PMT). Use only binary and continuous columns in the asset index.
asset_index = outcomes_generator.asset_index(cols=['con1', 'con2', 'bin1', 'bin2'])
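For intuition, a PCA asset index is commonly taken to be the first principal component of the standardized asset columns. The following is a minimal NumPy sketch of that idea under this assumption; it is not the library's implementation, and the function name `pca_asset_index` and the synthetic columns are hypothetical.

```python
import numpy as np

def pca_asset_index(X):
    """Score each row on the first principal component of standardized columns."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # the loadings sum to a positive number.
    if loadings.sum() < 0:
        loadings = -loadings
    return Z @ loadings, loadings

# Synthetic data (an assumption for illustration): two continuous and two
# binary asset columns, all driven by one latent "wealth" variable.
rng = np.random.default_rng(0)
wealth = rng.normal(size=200)
X = np.column_stack([
    wealth + rng.normal(scale=0.5, size=200),                       # con1
    2 * wealth + rng.normal(scale=0.5, size=200),                   # con2
    (wealth + rng.normal(scale=0.5, size=200) > 0).astype(float),   # bin1
    (wealth + rng.normal(scale=0.5, size=200) > 0.5).astype(float), # bin2
])
index, loadings = pca_asset_index(X)
```

Categorical columns are left out here for the same reason as in the call above: PCA operates on numeric values, so unordered category codes would not carry a meaningful scale.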
Select five components to be used in the proxy-means test using forward selection of predictors with a linear regression. Calculate a proxy-means test with these components and obtain out-of-sample PMT predictions for the training survey.
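# Assumption: the estimator passed as `method` is scikit-learn's LinearRegression.
from sklearn.linear_model import LinearRegression

selected_cols, scores = outcomes_generator\
    .select_features('consumption',
                     ['con1', 'con2', 'cat1', 'cat2', 'bin1', 'bin2'],
                     5,
                     method=LinearRegression())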
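Forward selection can be sketched as a greedy loop: start with no predictors and, at each step, add the single column that most improves the fit. The sketch below assumes in-sample R² from ordinary least squares as the scoring rule and purely numeric columns; the library may score candidates differently (for example with cross-validation) and may encode categorical columns first. The names `r2_of_ols` and `forward_select` are hypothetical.

```python
import numpy as np

def r2_of_ols(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def forward_select(X, y, names, k):
    """Greedily pick up to k columns, one per step, maximizing R^2."""
    selected, scores = [], []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < k:
        # Score every candidate when added to the columns chosen so far.
        best_score, best_j = max(
            (r2_of_ols(X[:, selected + [j]], y), j) for j in remaining
        )
        selected.append(best_j)
        remaining.remove(best_j)
        scores.append(best_score)
    return [names[j] for j in selected], scores

# Synthetic data (an assumption for illustration): only f2 and f0 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=300)
names = ['f0', 'f1', 'f2', 'f3', 'f4', 'f5']
cols, scores = forward_select(X, y, names, 2)
```

Because each step only ever adds a column, the in-sample score cannot decrease from one step to the next; that is also why an out-of-sample score is the more informative measure of the final model.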
pmt = outcomes_generator.fit_pmt('consumption',
                                 selected_cols,
                                 model_name='linear',
                                 winsorize=False,
                                 scale=True)
[OUT] r2 score: 0.56
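"Out-of-sample predictions for the training survey" generally means that each row is predicted by a model that never saw that row during fitting. A minimal sketch of one way to do this, assuming k-fold cross-validation with a linear model, is shown here; the fold scheme the library actually uses is not stated above, and `out_of_sample_predictions` and `r2` are hypothetical helpers.

```python
import numpy as np

def out_of_sample_predictions(X, y, n_folds=5, seed=0):
    """Predict each row with a linear model fit on the other folds only."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, n_folds)
    A = np.column_stack([np.ones(n), X])  # add an intercept column
    preds = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        coef, *_ = np.linalg.lstsq(A[train_mask], y[train_mask], rcond=None)
        preds[test_idx] = A[test_idx] @ coef
    return preds

def r2(y, preds):
    """Coefficient of determination of predictions against observed values."""
    resid = y - preds
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=1.0, size=400)
preds = out_of_sample_predictions(X, y)
score = r2(y, preds)
```

An R² computed this way is usually somewhat lower than the in-sample value, since no prediction benefits from having been part of its own training data.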
Use the trained proxy-means test on another survey dataset.
predictions = outcomes_generator.pretrained_pmt('/path/to/other/data.csv', selected_cols, 'linear')
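When a trained PMT is reused on another survey, the new data should pass through the same column selection and the same scaling parameters that were learned on the training survey, rather than being rescaled on its own. The sketch below illustrates that with a linear model stored as a JSON-serializable dictionary; the storage format and the names `fit_linear_pmt` and `apply_linear_pmt` are assumptions for illustration, not how the library persists its models.

```python
import json
import numpy as np

def fit_linear_pmt(X, y):
    """Fit a scaled linear model and return its parameters as plain lists."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {'mean': mean.tolist(), 'std': std.tolist(), 'coef': coef.tolist()}

def apply_linear_pmt(model, X_new):
    """Apply stored parameters to new rows; nothing is re-estimated here."""
    Z = (X_new - np.array(model['mean'])) / np.array(model['std'])
    A = np.column_stack([np.ones(len(Z)), Z])
    return A @ np.array(model['coef'])

# Synthetic, noise-free data (an assumption for illustration).
rng = np.random.default_rng(2)
true_coef = np.array([1.0, -2.0, 0.5])
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_train = X_train @ true_coef + 3.0

# Round-trip through JSON as if the model had been saved and reloaded.
model = json.loads(json.dumps(fit_linear_pmt(X_train, y_train)))

X_other = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
predictions = apply_linear_pmt(model, X_other)
```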