I have a presence-absence map (a raster of 1s and 0s). The area where the phenomenon is present is much smaller than the area where it is absent.
What I would like to do is run a binomial regression with multiple independent variables and presence/absence as the dependent variable. To do this, I need to sample the whole area. My question is about sample sizes. I have run an algorithm to give me a representative sample size (9000 points for the area is statistically robust). However, I am struggling to figure out how these points should be distributed. At first I just took 9000 random points over the whole area, but this meant that only 70 points fell within the 'presence' area. When I subsequently run the regression analysis, there does not seem to be enough data in the presence area to really get a handle on the relationships.
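To make the problem concrete, here is a minimal sketch of the simple random draw on a synthetic raster. The raster, its size, and the roughly 0.8% prevalence (chosen to match 70 out of 9000) are assumptions for illustration, not the real map, and `simple_random_sample` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1000 x 1000 presence/absence raster; about 0.8% of cells are presence (1).
raster = (rng.random((1000, 1000)) < 0.008).astype(np.uint8)

def simple_random_sample(raster, n, rng):
    """Draw n distinct cells uniformly at random over the whole raster."""
    flat_idx = rng.choice(raster.size, size=n, replace=False)
    rows, cols = np.unravel_index(flat_idx, raster.shape)
    return rows, cols

rows, cols = simple_random_sample(raster, 9000, rng)
values = raster[rows, cols]

# Number of sampled points that landed on presence cells.
n_presence = int(values.sum())
print(n_presence, "of", values.size, "points fall in the presence area")
```

With a uniform draw, the expected number of presence points is simply the sample size times the prevalence, so a rare class will always be thinly sampled however the points happen to fall.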
So should I instead split the points between the two areas in proportion to their size, or should I take 9000 points from the presence area and 9000 points from the absence area, thus getting a representative sample of each?
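The two options can be sketched as follows, again on a synthetic raster. This reads "weight by area" as allocating the 9000 points in proportion to each class's area, which is an assumption, and `stratified_sample` is a hypothetical helper rather than a function from any GIS package.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2000 x 2000 raster with about 0.8% presence cells (1), the rest absence (0).
raster = (rng.random((2000, 2000)) < 0.008).astype(np.uint8)

def stratified_sample(raster, n_per_class, rng):
    """Sample cells separately within each class.

    n_per_class maps a raster value (0 or 1) to the number of cells to draw.
    Returns a dict mapping each value to a (rows, cols) tuple of index arrays.
    """
    flat = raster.ravel()
    out = {}
    for value, n in n_per_class.items():
        candidates = np.flatnonzero(flat == value)
        if n > candidates.size:
            raise ValueError(f"class {value} has only {candidates.size} cells")
        chosen = rng.choice(candidates, size=n, replace=False)
        out[value] = np.unravel_index(chosen, raster.shape)
    return out

total = 9000
prevalence = raster.mean()  # fraction of cells that are presence

# Option 1: split the 9000 points in proportion to each class's area.
n_pres = int(round(total * prevalence))
by_area = stratified_sample(raster, {1: n_pres, 0: total - n_pres}, rng)

# Option 2: 9000 points from each class, regardless of area.
equal = stratified_sample(raster, {1: 9000, 0: 9000}, rng)

print("by area:", n_pres, "presence,", total - n_pres, "absence")
print("equal:  ", len(equal[1][0]), "presence,", len(equal[0][0]), "absence")
```

In this sketch the area-proportional split gives about the same number of presence points as the simple random draw, because both follow the prevalence of the raster. The equal split only works if the presence area contains at least 9000 cells.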