The Data

Importing the data

Prosper offers its data as a multi-gigabyte XML file. Before the data can be used, it must be imported into a relational database; for simplicity, we chose MySQL. Prosper does run an open-source import project, ProsperAPI, but it currently supports only Microsoft SQL Server as an output format, so we used MyProImport, a third-party MySQL import tool listed on the wiki.
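We relied on MyProImport rather than writing our own importer. The general technique behind such a tool is still worth sketching: stream-parse the XML so the multi-gigabyte file is never loaded whole, and insert records in batches. The sketch below is not MyProImport's code. It uses Python's standard-library sqlite3 module as a stand-in for MySQL, and the element names (Loan, Key, CreditGrade, AmountBorrowed, BorrowerRate, LoanStatus), the loans table and its columns are hypothetical placeholders, not Prosper's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical field names: the real Prosper export may use different
# element names and many more fields per record.
FIELDS = ("Key", "CreditGrade", "AmountBorrowed", "BorrowerRate", "LoanStatus")


def _num(text):
    """Convert numeric text to float, keeping missing values as NULL."""
    return float(text) if text not in (None, "") else None


def import_loans(xml_source, conn, tag="Loan", batch_size=1000):
    """Stream <Loan> records from xml_source into a hypothetical 'loans' table.

    iterparse reads the file incrementally and each record is cleared once it
    has been converted, so the full document tree is never held in memory.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loans ("
        "loan_key TEXT PRIMARY KEY, credit_grade TEXT, "
        "amount_borrowed REAL, borrower_rate REAL, loan_status TEXT)"
    )
    insert = "INSERT INTO loans VALUES (?, ?, ?, ?, ?)"
    batch = []
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != tag:
            continue  # child elements are read as part of their parent record
        key, grade, amount, rate, status = (elem.findtext(f) for f in FIELDS)
        batch.append((key, grade, _num(amount), _num(rate), status))
        elem.clear()  # free the record's children and text
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
    conn.commit()


if __name__ == "__main__":
    sample = b"""<Loans>
      <Loan><Key>L1</Key><CreditGrade>B</CreditGrade>
        <AmountBorrowed>5000</AmountBorrowed><BorrowerRate>0.15</BorrowerRate>
        <LoanStatus>Current</LoanStatus></Loan>
    </Loans>"""
    conn = sqlite3.connect(":memory:")
    import_loans(BytesIO(sample), conn)
    print(conn.execute("SELECT * FROM loans").fetchall())
```

Batching the inserts with executemany avoids one round trip per record, which matters far more against a networked MySQL server than against an in-memory SQLite database.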

Once the data was imported, we installed phpMyAdmin on a local server so that we could easily query the data collaboratively.

Descriptive Statistics

After importing the data, our first step was an initial analysis of the Prosper data to learn its basic statistical properties. When we downloaded the data set in November 2008, it covered 36 months and consisted of:

Basic stats

Prosper members span a range of credit grades, from AA to high risk. Early in the 36-month period covered by this data set, Prosper stopped accepting listings from people with no credit history, though some listings from earlier no-credit borrowers remain in the data.

Several variables separated by status

For each of the following groups of loans, we computed general statistics (the minimum, maximum, mean, and variance of each variable), broken down and sorted by credit grade. These files answer questions such as "What is the average interest rate of a borrower with a B credit grade who defaulted?" (SQL queries for these results)

  1. Accepted loans and Loan Status is Current: html, csv
  2. Accepted loans and Loan Status is Paid: html, csv
  3. Accepted loans and Loan Status is Current: html, csv
  4. Accepted loans and Loan Status is Defaulted: html, csv
  5. Rejected Loans: html, csv
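Each of these summaries amounts to one grouped aggregate query per loan status. The following is a minimal sketch, not our actual queries: it assumes a hypothetical loans table with credit_grade, loan_status, borrower_rate and amount_borrowed columns. MySQL provides VARIANCE() (population variance) natively; SQLite does not, so the sketch registers an equivalent aggregate through sqlite3's create_aggregate, letting the query read the same as its MySQL counterpart.

```python
import sqlite3

# Columns that may be summarized. Whitelisting them keeps the interpolated
# column name below safe from SQL injection. Names are hypothetical.
NUMERIC_COLUMNS = {"amount_borrowed", "borrower_rate"}


class PopVariance:
    """Population variance (Welford's method), like MySQL's VARIANCE()."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, value):
        if value is None:  # SQL aggregates ignore NULLs
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        return self.m2 / self.n if self.n else None


def stats_by_grade(conn, column, status):
    """Return (grade, min, max, mean, variance) rows for loans with `status`."""
    if column not in NUMERIC_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    conn.create_aggregate("VARIANCE", 1, PopVariance)
    query = (
        f"SELECT credit_grade, MIN({column}), MAX({column}), "
        f"AVG({column}), VARIANCE({column}) "
        "FROM loans WHERE loan_status = ? "
        "GROUP BY credit_grade "
        "ORDER BY credit_grade"  # alphabetical order, not risk order
    )
    return conn.execute(query, (status,)).fetchall()
```

With this helper, the example question is answered by calling stats_by_grade(conn, "borrower_rate", "Defaulted") and reading the mean from the row whose grade is B.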

Amount borrowed and interest rate by credit grade and status

Amount borrowed by credit grade and loan status

Interest rates by credit grade and loan status
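These charts summarize the amount borrowed and the interest rate for each combination of credit grade and loan status. Assuming each chart plots the per-group mean, the aggregation can be sketched in plain Python over rows already fetched from the database. The row layout (credit_grade, loan_status, amount_borrowed, borrower_rate) is a hypothetical placeholder.

```python
from collections import defaultdict
from statistics import mean


def means_by_grade_and_status(rows):
    """Group (grade, status, amount, rate) rows and average each group.

    Returns {(grade, status): (mean_amount, mean_rate)}.
    """
    groups = defaultdict(list)
    for grade, status, amount, rate in rows:
        groups[(grade, status)].append((amount, rate))
    return {
        key: (mean(a for a, _ in pairs), mean(r for _, r in pairs))
        for key, pairs in groups.items()
    }


if __name__ == "__main__":
    sample = [("B", "Paid", 1000, 0.10), ("B", "Paid", 3000, 0.20),
              ("A", "Current", 2000, 0.08)]
    for (grade, status), (amount, rate) in sorted(means_by_grade_and_status(sample).items()):
        print(f"{grade:>2} {status:<9} mean amount {amount:8.2f}  mean rate {rate:.3f}")
```

Keying the dictionary on the (grade, status) pair gives one bar or point per combination, which is the shape both charts need.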

Members and listings

This plot shows how many listings members have posted: the X axis gives the number of listings, and the Y axis gives the number of members with that many listings.
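The data behind this plot is a histogram of a histogram: first count the listings per member, then count how many members share each listing count. A minimal sketch, assuming the listings have already been reduced to a sequence of member identifiers (a hypothetical input format):

```python
from collections import Counter


def listings_per_member_histogram(member_ids):
    """Map k -> number of members who posted exactly k listings.

    member_ids has one entry per listing, naming the member who posted it.
    """
    listings_per_member = Counter(member_ids)     # member -> listing count
    return Counter(listings_per_member.values())  # listing count -> members


if __name__ == "__main__":
    hist = listings_per_member_histogram(["a", "a", "b", "c", "c", "d", "e", "e", "e"])
    for k in sorted(hist):
        print(k, hist[k])  # X value (listings), Y value (members)
```

In SQL the same result is a nested GROUP BY, for example `SELECT n, COUNT(*) FROM (SELECT member_key, COUNT(*) AS n FROM listings GROUP BY member_key) AS per_member GROUP BY n;`, where the listings table and member_key column are again hypothetical.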


Geographical distribution of members

Darker colors indicate more members.
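One simple way to turn member counts into shades is linear binning: scale each region's count between the smallest and largest counts and map it to one of a fixed number of levels, with higher levels drawn darker. This is a sketch of the general technique under that assumption, not necessarily how this map was produced; the region keys and the number of levels are hypothetical.

```python
def shade_levels(counts, levels=5):
    """Assign each region a shade index in 0..levels-1 (higher = darker).

    counts maps a region name to its number of members. Counts are scaled
    linearly between the minimum and maximum, so the region with the most
    members always gets the darkest level.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    shades = {}
    for region, n in counts.items():
        if span == 0:
            shades[region] = levels - 1  # all regions equal: use one shade
        else:
            # n == hi would give index `levels`; clamp it into the top bin
            shades[region] = min(int((n - lo) / span * levels), levels - 1)
    return shades


if __name__ == "__main__":
    print(shade_levels({"CA": 100, "TX": 50, "WY": 0}))  # {'CA': 4, 'TX': 2, 'WY': 0}
```

Linear bins let a few very populous regions wash out the rest of the map; quantile or logarithmic bins are common alternatives when counts are heavily skewed.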
