We see two avenues for future work. First, we believe we can add value by synthesizing our analysis into a set of tools that help Prosper members make better decisions and help prosper.com achieve a better understanding (and, perhaps, smoother functioning) of its marketplace.
Second, there are specific areas, listed below, in which we hope to refine our models and methodology.
Feature selection: The textual features we extracted from loan descriptions were heavily influenced by the templates that Prosper provides for descriptions of loans. Filtering out the template prompts might yield more interesting or more discriminative words and phrases. Data from images (such as the type of object in each image, or whether the image shows the borrower or the borrower's children) could be incorporated as features. The evaluation function in feature selection algorithms such as sequential forward feature search could be changed to be more robust against false negatives, which would improve classification given the highly unequal priors of the unfunded-listing and fully-funded-loan classes.
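As a minimal sketch of how the evaluation function could be swapped in a sequential forward search, the code below scores candidate feature subsets with an F-beta measure, where beta > 1 penalizes false negatives more heavily than false positives. The toy rows, the feature names `a`, `b`, `c`, and the "predict positive if any selected feature is set" rule are hypothetical stand-ins, not our actual features or classifier, and we assume the rare class is labeled 1.

```python
def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score for the positive class (label 1); beta > 1 favors recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


def forward_select(candidates, evaluate, max_features):
    """Greedy forward search: add the best feature until the score stops improving."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break  # no candidate improves the evaluation function
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


# Hypothetical toy data: two positives (rare class) among six listings.
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 0},
]
y_true = [1, 1, 0, 0, 0, 0]


def make_evaluate(beta):
    """Build an evaluation function around a stand-in rule-based classifier."""
    def evaluate(subset):
        y_pred = [1 if any(row[f] for f in subset) else 0 for row in rows]
        return f_beta(y_true, y_pred, beta)
    return evaluate


selected_recall, score_recall = forward_select("abc", make_evaluate(2.0), 3)
selected_precision, score_precision = forward_select("abc", make_evaluate(0.5), 3)
```

On this toy data, the recall-weighted search (beta = 2) stops after picking the single high-recall feature `c`, while the precision-weighted search (beta = 0.5) picks `a` and then `b`. The point is only that the choice of evaluation function changes which features the search keeps.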
Classification: we hope to further refine classification algorithms by considering costs of misclassification.
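One simple way to account for misclassification costs, sketched here under assumptions of our own, is to keep the classifier unchanged and move its decision threshold. If a false positive costs `cost_fp` and a false negative costs `cost_fn` (both hypothetical parameters), predicting the positive class minimizes expected cost whenever the estimated positive-class probability is at least `cost_fp / (cost_fp + cost_fn)`.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost."""
    return cost_fp / (cost_fp + cost_fn)


def classify(p_positive, cost_fp, cost_fn):
    """Predict 1 when the expected cost of predicting 0 is at least that of predicting 1."""
    return 1 if p_positive >= cost_threshold(cost_fp, cost_fn) else 0


# Hypothetical costs: a missed positive is four times as costly as a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=4.0)
label_costly_miss = classify(0.3, cost_fp=1.0, cost_fn=4.0)
label_equal_costs = classify(0.3, cost_fp=1.0, cost_fn=1.0)
```

With equal costs the threshold is the usual 0.5. Making false negatives more expensive lowers it, so a listing with an estimated probability of 0.3 is labeled positive in the first case but not in the second.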
Bayes nets: future work includes performing optimal discretization of information variables, augmenting the variable set (the feature vector), and evaluating other structure-learning algorithms on a larger dataset.
Decision trees: avenues to explore include comparing accuracy under different node-splitting criteria, such as entropy. We might also use decision trees as a tool to help borrowers set a more appropriate loan amount or maximum interest rate.
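To make the comparison of splitting criteria concrete, the sketch below computes the impurity reduction of a candidate split under two common criteria, entropy and Gini impurity. The label lists are hypothetical, and the functions assume non-empty inputs; this illustrates the criteria themselves rather than any particular tree learner.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent, left, right, impurity):
    """Impurity reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node: two funded (1) and two unfunded (0) listings, split perfectly.
parent, left, right = [1, 1, 0, 0], [1, 1], [0, 0]
gain_entropy = split_gain(parent, left, right, entropy)
gain_gini = split_gain(parent, left, right, gini)
```

The two criteria usually rank splits similarly but not identically, which is why comparing resulting tree accuracy is an empirical question.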
HMMs: with hidden Markov models, we might consider the 2 N configuration with starting states that are not limited to state one. We might also further explore how accurately HMMs could flag loans in danger of defaulting if used by Prosper.
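Relaxing the starting-state restriction amounts to replacing an initial distribution concentrated on state one with a general one. The sketch below uses the standard forward algorithm to show how the likelihood of an observation sequence changes with that choice. The two-state model, its transition and emission probabilities, and the observation coding (0 for an on-time payment, 1 for a late payment) are all hypothetical and are not the parameters of our trained models.

```python
def forward_likelihood(pi, A, B, obs):
    """Likelihood of `obs` under an HMM via the forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state model: state 0 is "healthy", state 1 is "at risk".
A = [[0.9, 0.1], [0.0, 1.0]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [1, 1]  # two late payments in a row

lik_fixed_start = forward_likelihood([1.0, 0.0], A, B, obs)  # always start in state one
lik_free_start = forward_likelihood([0.5, 0.5], A, B, obs)   # either start state allowed
```

In this toy case, two consecutive late payments are far more likely when the model may begin in the "at risk" state, which suggests why a free initial distribution could matter for loans that are troubled from the outset.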
Human-aided classification: we hope to further explore Mechanical Turk as a tool to extract human judgments and to better understand how lenders assess borrower profiles beyond numerical features. There are a number of ways to improve the experimental design, such as pairing photos side by side and asking for a relative assessment of trustworthiness, or including more contextual information such as the listing description.
We also hope to compare the overall performance and advantages of each of the models.
Finally, prosper.com's wealth of data suggests a number of new areas to explore. It would be interesting to consider the dynamics of lender decision-making by studying how bid counts build up, or the social relationships suggested by longitudinal patterns of bids between members of close-knit networks. We might also look at the profiles of borrowers who are successful in the prosper.com marketplace to draw conclusions about the Prosper model's advantages and disadvantages compared to traditional lending institutions.