I was hoping someone could give me some guidance. I have datasets containing details of the financial performance of individuals in different groups, as well as a dataset for individuals who are not in any group. I expected to see a difference in performance between individuals in groups and those not in a group, because of the differing levels of support they receive. I measure performance as an individual's ability to get a loan at the lowest possible rate. The data is from a peer-to-peer lending site where individuals lend to other individuals through a reverse-auction process, i.e., lenders bid down the rate at which they are willing to lend to the individual looking for the loan.
I have one independent variable, which is group membership. The dependent variables are the percentage of the loan funded (by other individuals), the lender rate, and the rate saving (i.e., the difference between the maximum rate the individual was willing to pay and the rate at which the proposed loan currently stands). I have also included some other variables that might influence the result, namely group type (whether the group is open or invite-only) and individual credit rating.
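A minimal sketch of how the extra variables could enter a regression, under several assumptions. Rate saving is computed as maximum rate minus current rate, group membership becomes a 0/1 dummy, and credit rating is assumed to have already been converted to a numeric score (a hypothetical coding; letter grades might be better entered as separate dummies). The loan records are made-up numbers, constructed so that saving = 1 + 2·in_group + 0.5·credit_score exactly, so the least-squares fit should recover those coefficients. Group type could be added the same way, as another dummy that is 1 only for members of invite-only groups.

```python
# Hypothetical sketch: the records and coding below are illustrative, not the real data.


def rate_saving(max_rate, current_rate):
    """Rate saving: borrower's maximum acceptable rate minus the current auction rate."""
    return max_rate - current_rate


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row (at or below col) with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up loans: (in_group dummy, credit score, max rate %, current rate %)
loans = [
    (0, 1, 20.0, 18.5),
    (0, 2, 18.0, 16.0),
    (0, 4, 15.0, 12.0),
    (1, 1, 20.0, 16.5),
    (1, 3, 17.0, 12.5),
    (1, 4, 15.0, 10.0),
]

X = [[1.0, g, c] for g, c, _, _ in loans]  # columns: intercept, group dummy, credit score
y = [rate_saving(mx, cur) for _, _, mx, cur in loans]
intercept, group_effect, credit_effect = ols(X, y)
print(f"group effect on rate saving: {group_effect:.2f} percentage points")
```

The coefficient on the group dummy is the estimated difference in rate saving between group and non-group borrowers with the same credit score. Significance needs standard errors as well; in practice statsmodels' formula interface (for example `smf.ols("rate_saving ~ in_group + credit_score", data=df).fit()` after `import statsmodels.formula.api as smf`, where `df` is a pandas DataFrame with those hypothetical column names) reports a standard error and p-value for each coefficient.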
For anyone interested, I have attached a summary of the datasets. My problem is that it is many years since I studied mathematics, and I'm struggling to see a way forward. I wish to confirm or reject a number of hypotheses:
P1 Borrowers that are members of groups are more likely to get funded than those that are not.
P2 Borrowers that are members of groups are more likely to secure a lower interest rate than those that are not.
P3 Borrowers that are members of groups are more likely to secure a greater rate saving than those that are not.
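P1 is a statement about proportions, so one option is a two-proportion z-test. The sketch below assumes "funded" means the loan reached 100% funding (the datasets record percentage funded, so that threshold is an assumption), and the counts are invented for illustration. It tests the one-sided alternative that the group share is higher, using the normal approximation, which needs reasonably large counts in each cell.

```python
import math


def two_proportion_z_test(funded_a, n_a, funded_b, n_b):
    """One-sided pooled z-test of H1: share funded in sample a > share funded in sample b."""
    p_a, p_b = funded_a / n_a, funded_b / n_b
    pooled = (funded_a + funded_b) / (n_a + n_b)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail standard normal probability: P(Z > z) = erfc(z / sqrt(2)) / 2
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Invented counts: 60 of 100 group loans fully funded vs 45 of 100 non-group loans.
z, p = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

The two-sided version of this test is equivalent to a chi-square test (without continuity correction) on the 2×2 table of group membership against funded/not funded. To allow for credit rating at the same time, logistic regression with a group dummy is the usual extension.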
While I can see some trends in the data, I'm not sure they are statistically significant. I was considering running a t-test comparing the average for the group individuals with the average for the non-group individuals. However, I recall that regression analysis may also help in deciding whether various factors have a significant influence on the dependent variables. Any help or advice on how I should proceed would be much appreciated.
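For P2 and P3, the t-test mentioned above would compare the mean lender rate (or mean rate saving) of the two samples. Since the group and non-group samples probably differ in size and spread, Welch's version, which does not assume equal variances, is a safer default than the pooled Student's t-test. A minimal standard-library sketch with made-up rates:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) and its Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a); variance() uses n - 1
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Made-up lender rates (%) for illustration only.
group_rates = [10.0, 11.0, 12.0, 13.0, 14.0]
non_group_rates = [12.0, 13.0, 14.0, 15.0, 16.0]
t, df = welch_t(group_rates, non_group_rates)
print(f"t = {t:.2f} on {df:.1f} degrees of freedom")  # prints "t = -2.00 on 8.0 degrees of freedom"
```

With SciPy installed, `scipy.stats.ttest_ind(group_rates, non_group_rates, equal_var=False)` computes the same statistic and also returns a p-value; passing `alternative="less"` (SciPy 1.6 or later) gives the one-sided test that group members get lower rates. A t-test looks at one outcome at a time and does not adjust for credit rating or group type, which is where regression with those variables as covariates comes in.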