I have data on spell durations which I would like to analyze with the Cox PH model. The defining feature of the data is the year (or, to be exact, the date) in which each spell started. Some spells have already ended, some will never end. The Kaplan-Meier estimates for the data show the following:
The Kaplan-Meier estimates suggest that there is a "fixed component", a share of spells that never end, and that it has increased significantly since the early years. I believe that this "fixed component" distorts my analysis: the data seem to violate some of the Cox PH assumptions. The following figure shows the Cox PH survivor functions (-----) relative to the Kaplan-Meier estimates. The Cox PH survivor functions have been adjusted for year dummies (and some other dummies).
The figure suggests, in my opinion, that something is wrong.
However, if I subtract, and later add back, the "fixed component" of each year, I get a much better match between the Cox PH survivor functions and the Kaplan-Meier estimates:
The problem is that the choice of "fixed components" is completely arbitrary.
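To make the "fixed component" concrete, here is a minimal sketch of one way it could be read off the data rather than picked by hand. It assumes the component is the level at which a start-year cohort's Kaplan-Meier curve flattens out, which is only plausible for cohorts with long follow-up. The function names and the toy data are made up for illustration; this is not necessarily the adjustment I applied above.

```python
from collections import Counter

def kaplan_meier(durations, ended):
    """Return [(t, S(t))] at each distinct event time.

    durations: observed spell lengths
    ended: True if the spell ended, False if it is still ongoing (censored)
    """
    events = Counter(t for t, e in zip(durations, ended) if e)
    exits = Counter(durations)  # every spell leaves the risk set at its own time
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

def plateau(curve):
    """Assumed 'fixed component': the last (lowest) Kaplan-Meier value."""
    return curve[-1][1] if curve else 1.0

# Toy cohort (illustrative numbers only): four spells ended, four still ongoing.
durations = [1, 2, 2, 3, 5, 5, 5, 5]
ended = [True, True, True, True, False, False, False, False]

km = kaplan_meier(durations, ended)
fixed = plateau(km)
# Survivor curve with the fixed component subtracted; it is added back afterwards.
excess = [(t, s - fixed) for t, s in km]
```

For recent cohorts the curve has not had time to flatten, so a plateau read off this way would overstate the fixed component. That is exactly the arbitrariness I am worried about.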
I would very much welcome any ideas on how to make the analysis more robust.
My ultimate goal is to estimate the survivor function for spells that started in 2010.