Prosper offers its data as a multi-gigabyte XML file. To be usable, the data must be imported into a relational database. For simplicity, we chose MySQL. Although Prosper runs an open source project for importing data, ProsperAPI, it currently supports only Microsoft SQL Server as an output format. We instead used "MyProImport", third-party MySQL import software on the prosper.org wiki.
Once the data was imported, we installed phpMyAdmin on a local server so that we could query the data collaboratively.
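To illustrate what an import of this kind involves, here is a minimal sketch of streaming a large XML export into a relational table. It is not how MyProImport or ProsperAPI work. The element names (Listing, Key, CreditGrade, Amount), the table layout, and the function name import_listings are assumptions made for illustration, and SQLite stands in for MySQL so the example runs with the Python standard library alone.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def import_listings(xml_source, conn, batch_size=1000):
    """Stream hypothetical <Listing> records from xml_source into a table.

    xml_source may be a file name or a binary file object.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "listing_key TEXT PRIMARY KEY, credit_grade TEXT, amount REAL)"
    )
    batch = []
    total = 0
    # iterparse hands back each element as its end tag is read, so the
    # multi-gigabyte file is never loaded into memory in one piece.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Listing":  # assumed record element name
            continue
        batch.append((
            elem.findtext("Key"),
            elem.findtext("CreditGrade"),
            float(elem.findtext("Amount")),
        ))
        elem.clear()  # drop the children of the record just consumed
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO listings VALUES (?, ?, ?)", batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO listings VALUES (?, ?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total
```

Inserting in batches rather than row by row keeps the number of database round trips low, which matters for a file of this size.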
Our first step after the import was an initial analysis of the data available from Prosper to learn some of its basic statistical properties. When we downloaded our data set in November 2008, prosper.com had 36 months of data, consisting of:
Prosper members span a range of credit grades, from AA to high risk. Early in the 36-month period covered by this data set, Prosper stopped accepting listings from people with no credit, though some earlier no-credit borrowers are still listed.
General statistics were computed for the following groups of data. The results are sorted by credit grade and give the minimum, maximum, mean, and variance of each column. These files are useful for answering questions such as "What is the average interest rate of someone with a B credit grade who defaulted?" (SQL queries for these results)
Amount borrowed by credit grade and loan status
Interest rates by credit grade and loan status
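As a rough illustration of how grouped statistics like these can be computed, the sketch below runs one aggregate query per group of credit grade and loan status. It is not the set of queries used for the files above. The table name loans and the columns credit_grade, status, and borrower_rate are hypothetical, and SQLite again stands in for MySQL. The variance is the population variance, computed as the mean of the squares minus the square of the mean, because SQLite has no built-in variance aggregate.

```python
import sqlite3

# Hypothetical schema: loans(credit_grade TEXT, status TEXT, borrower_rate REAL)
STATS_QUERY = """
SELECT credit_grade,
       status,
       COUNT(*)           AS n,
       MIN(borrower_rate) AS min_rate,
       MAX(borrower_rate) AS max_rate,
       AVG(borrower_rate) AS mean_rate,
       -- population variance: E[x^2] - (E[x])^2
       AVG(borrower_rate * borrower_rate)
         - AVG(borrower_rate) * AVG(borrower_rate) AS var_rate
FROM loans
GROUP BY credit_grade, status
ORDER BY credit_grade, status
"""


def rate_stats(conn):
    """Return one row of interest-rate statistics per (grade, status) pair."""
    return conn.execute(STATS_QUERY).fetchall()
```

On MySQL itself, the built-in VAR_POP() aggregate could replace the hand-written variance expression. A question such as the average rate for defaulted B-grade borrowers is then answered by reading the mean_rate value of the matching row.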
This plot shows the number of listings on the X axis, and the number of members with that number of listings on the Y axis:
Darker colors indicate more members.
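The counts behind a plot of this kind can be derived in two steps: count the listings per member, then count how many members share each listing count. The sketch below assumes the input is simply a sequence of member identifiers, one per listing. The function name and input format are illustrative and are not taken from the original analysis.

```python
from collections import Counter


def listings_per_member_distribution(member_keys):
    """Map 'number of listings' to 'number of members with that many'.

    member_keys holds one (hypothetical) member identifier per listing.
    """
    per_member = Counter(member_keys)    # member -> number of listings
    return Counter(per_member.values())  # number of listings -> member count
```

The keys of the returned counter correspond to positions on the X axis and the values to the member counts.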