Interrupted

Unordered thoughts about programming, engineering and dealing with the people in the process.

How Long You Wait for an Answer on StackOverflow


I’m not an active StackOverflow contributor, something I recently decided should start to change.

I think it’s amazing how quickly any question gets answered, like freaking fast. If you use Google Reader to watch new questions filtered by tag, by the time you see a question it has almost certainly been answered already.

Fortunately all StackExchange data is open, so we can see exactly how fast that is. I used the online data browser, which is more than enough for the task.

I decided to consider only questions that have an accepted answer, since a question with many bogus answers should not be treated as answered at all.
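As a rough illustration of that filter, here is a minimal Python sketch. It is not the query I ran in the data browser; the record layout (`asked_at`, `accepted_at`) and the sample rows are made up for the example.

```python
from datetime import datetime


def minutes_to_accepted_answer(questions):
    """Return answer times in minutes, skipping questions with no accepted answer."""
    times = []
    for q in questions:
        accepted_at = q.get("accepted_at")  # hypothetical field name
        if accepted_at is None:
            # No accepted answer: the question is ignored entirely.
            continue
        delta = accepted_at - q["asked_at"]  # hypothetical field name
        times.append(delta.total_seconds() / 60)
    return times


# Made-up sample rows, only to show the shape of the data.
sample = [
    {"asked_at": datetime(2012, 3, 1, 10, 0), "accepted_at": datetime(2012, 3, 1, 10, 27)},
    {"asked_at": datetime(2012, 3, 1, 11, 0), "accepted_at": None},
    {"asked_at": datetime(2012, 3, 2, 9, 30), "accepted_at": datetime(2012, 3, 2, 12, 30)},
]

print(minutes_to_accepted_answer(sample))
```

The second question has no accepted answer, so it contributes nothing to the averages.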

tl;dr

The average answer time seems to depend on a mix of the maturity of the language and how many people are using it.

Hey, Haskell has pretty good answer times, at least considering its 33rd position in the TIOBE Index.

Not all questions are the same

Of course not all questions are the same. This is from the first query I ran.

This is an unfiltered query over all the questions from 2012. The average answer time is much higher than in the previous chart, around 1000 minutes. Looking at the data:

Language    Ans. Time (min)   Stdev (min)
c               934       7630.98957971267
c++            1036       7258.13498426685
clojure        1078       7485.94721484444
haskell        1199       9059.91937846459
php            1210       8588.58929278208
lua            1386       6569.08356022594
c#             1452       8875.00837073432
scala          1472      10707.9191188056
javascript     1490       9756.64151519177
java           1755      10541.6111024572
ruby           2124      11850.4353701107

The standard deviation is huge: a lot of questions took ages to get answered, which makes the average answer time meaningless.
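To see why a few very slow questions wreck the average, here is a small Python sketch with invented numbers (these are not values from the dataset): one question that waited about two weeks is enough to drag the mean far away from what a typical asker experiences.

```python
from statistics import mean, median, pstdev

# Invented answer times in minutes: eight quick answers and one outlier
# that waited roughly two weeks for an accepted answer.
answer_minutes = [5, 8, 12, 20, 27, 35, 60, 90, 20000]

avg = mean(answer_minutes)       # pulled up by the single outlier
mid = median(answer_minutes)     # barely notices the outlier
spread = pstdev(answer_minutes)  # population standard deviation

print(f"mean={avg:.0f} min, median={mid} min, stdev={spread:.0f} min")
```

With data shaped like this, the standard deviation ends up larger than the mean itself, the same pattern as in the table above.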

So I decided to take out questions with an answer time greater than 24 hours, as 92% of the questions get an accepted answer in less than 5 hours (here you can see the query used to get this table).

DifficultyGroup    Total     Average (min)    StandardDev (min)
Easy               47099                27     44.7263057563449
Medium               344               691    339.312936469053
Hard                1926              3769   2004.75027395979
Hell                1623             66865  96822.8840748525
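A grouping like the one in this table can be sketched in Python as follows. The cut-offs are my guesses for the example (only the 5-hour boundary for Easy is hinted at in the text); the boundaries used by the actual query are not shown here, and the sample times are invented.

```python
from statistics import mean, pstdev

# Hypothetical upper bounds in minutes. Only the 5-hour limit for "Easy"
# is suggested by the text; the other cut-offs are assumptions.
BUCKETS = [
    ("Easy", 5 * 60),
    ("Medium", 24 * 60),
    ("Hard", 7 * 24 * 60),
    ("Hell", float("inf")),
]


def difficulty(minutes):
    """Return the name of the first bucket whose upper bound exceeds `minutes`."""
    for name, upper in BUCKETS:
        if minutes < upper:
            return name
    return BUCKETS[-1][0]


def summarize(times):
    """Group answer times by difficulty and compute count, mean and stdev."""
    groups = {name: [] for name, _ in BUCKETS}
    for t in times:
        groups[difficulty(t)].append(t)
    # pstdev (population stdev) is used so a group with one item does not fail.
    return {
        name: {"total": len(vals), "average": mean(vals), "stdev": pstdev(vals)}
        for name, vals in groups.items()
        if vals
    }


# Invented answer times in minutes.
summary = summarize([10, 20, 600, 2000, 50000])
for name, stats in summary.items():
    print(name, stats)
```

Dropping the questions slower than 24 hours then amounts to keeping only the first two groups under these assumed cut-offs.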

It started to look like something:

This is the query.

You can see PHP running in front there, with a 68-minute average accepted answer time: either it’s too easy or there are too many of them.

If you want to see how the distribution looks when considering only answers accepted in less than 5 hours, that is the first picture on the page; the trend is there too.

What about the time?

Something unexpected: average answer time is almost unaffected by the time of day the question was asked. The only thing I see here is that Ruby programmers are being killed by the lunch break and C++ programmers slowly fade out as the day goes on, ain’t it?

This is the query.
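The idea behind that kind of query is just a group-by on the hour the question was asked. A minimal Python sketch, with made-up rows and assuming each row is an `(asked_at, answer_minutes)` pair, could look like this:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_by_hour(rows):
    """rows: iterable of (asked_at, answer_minutes). Returns {hour: average minutes}."""
    by_hour = defaultdict(list)
    for asked_at, minutes in rows:
        # The hour is taken in whatever timezone the timestamps are stored in.
        by_hour[asked_at.hour].append(minutes)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}


# Made-up rows: two questions asked around 9:00 and one around 13:00.
rows = [
    (datetime(2012, 5, 1, 9, 15), 30),
    (datetime(2012, 5, 2, 9, 45), 50),
    (datetime(2012, 5, 1, 13, 5), 120),
]

print(average_by_hour(rows))
```

Keep in mind that one stored hour mixes askers from every timezone, which may be part of why the curve looks so flat.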

There goes my idea of catching unanswered questions at night. It would be interesting to see how much cross-timezone answering is happening.

Conclusion

It would probably work better to run a regression against the complete dataset, using more features than just programming language and time of day, to automatically guess which questions are more likely to stay unanswered for a long time. Maybe next time.
