I’m not an active StackOverflow contributor, something I recently decided should start to change.
I think it’s amazing how fast any question gets answered, like freaking fast. If you use Google Reader to watch new questions filtered by tag, by the time you see a question it has almost certainly been answered already.
I decided to consider only the questions having an accepted answer, since questions with many bogus answers should not be treated as having an answer at all.
The average answer time seems to depend on a mix of how mature the language is and how many people are using it.
Not all questions are the same
Of course not all questions are the same. This is from the first query I ran.
This is an unfiltered query over all the questions from 2012. The average answer time is much higher than in the previous chart, around 1000 minutes. Looking at the data:
The standard deviation is huge: a lot of questions took ages to get answered, which makes the average answer time meaningless.
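To make that effect concrete, here is a small Python sketch using made-up answer times (they are not taken from the StackOverflow data). It only shows how a couple of very slow answers drag the mean and the standard deviation far away from what a typical question looks like.

```python
from statistics import mean, median, pstdev

# Made-up answer times in minutes: most are quick, two took days.
answer_minutes = [5, 8, 12, 20, 35, 60, 90, 4320, 20160]

def summarize(times):
    """Return mean, population standard deviation and median of the times."""
    return {
        "mean": mean(times),
        "stdev": pstdev(times),
        "median": median(times),
    }

stats = summarize(answer_minutes)
print(round(stats["mean"]))   # pulled up by the two slow answers
print(round(stats["stdev"]))  # even larger than the mean
print(stats["median"])        # what a typical question looks like
```

With numbers shaped like these, the median stays close to the quick answers while the mean does not, which is why filtering out the slowest questions changes the picture so much.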
So I decided to drop questions with an answer time greater than 24 hours, since 92% of the questions have an accepted answer in less than 5 hours (here you can see the query used to get this table).
DifficultyGroup   Total   Average   StandardDev
Easy              47099        27   44.7263057563449
Medium              344       691   339.312936469053
Hard               1926      3769   2004.75027395979
Hell               1623     66865   96822.8840748525
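As a quick sanity check on the 92% figure, the counts from the table can be turned into shares. This is only arithmetic on the Total column; the function name `shares` is mine.

```python
# Counts copied from the Total column of the table above.
totals = {"Easy": 47099, "Medium": 344, "Hard": 1926, "Hell": 1623}

def shares(counts):
    """Return each group's percentage of all questions."""
    grand_total = sum(counts.values())
    return {group: 100 * n / grand_total for group, n in counts.items()}

for group, pct in shares(totals).items():
    print(f"{group:<8}{pct:6.2f}%")
```

The Easy group comes out at a bit over 92% of the rows, which matches the number quoted above.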
It started to look like something:
This is the query.
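For readers who don't want to open the query, here is a rough Python sketch of the same idea: average time to the accepted answer per language, dropping anything slower than 24 hours. The rows are hypothetical and the function name is mine; the real numbers come from the query linked above.

```python
from collections import defaultdict
from datetime import datetime

CUTOFF_MINUTES = 24 * 60  # drop answers slower than 24 hours

# Hypothetical rows: (tag, question asked, accepted answer posted).
rows = [
    ("php", datetime(2012, 3, 1, 10, 0), datetime(2012, 3, 1, 10, 30)),
    ("php", datetime(2012, 3, 1, 11, 0), datetime(2012, 3, 1, 12, 30)),
    ("c++", datetime(2012, 3, 2, 9, 0), datetime(2012, 3, 2, 11, 0)),
    ("c++", datetime(2012, 3, 2, 9, 0), datetime(2012, 3, 5, 9, 0)),  # 3 days, dropped
]

def average_minutes_by_tag(rows, cutoff=CUTOFF_MINUTES):
    """Average minutes to the accepted answer per tag, ignoring slow outliers."""
    times = defaultdict(list)
    for tag, asked, answered in rows:
        minutes = (answered - asked).total_seconds() / 60
        if minutes <= cutoff:
            times[tag].append(minutes)
    return {tag: sum(v) / len(v) for tag, v in times.items()}

print(average_minutes_by_tag(rows))
```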
There you can see PHP out in front with an average accepted answer time of 68 minutes; either it’s too easy or there are too many of them.
If you want to see how the distribution looks when considering only answers accepted in less than 5 hours, that is the first picture on this page; the trend is there too.
What about the time?
Something unexpected: average answer time is almost unaffected by the time of day the question was asked. The only thing I see here is that Ruby programmers are being killed by the lunch break and C++ programmers slowly fade out as the day goes on, right?
This is the query.
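The grouping behind a chart like this can be sketched in a few lines of Python. The samples below are invented, the function name is mine, and the sketch assumes every timestamp is in the same timezone.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (tag, time the question was asked, minutes to accepted answer).
samples = [
    ("ruby", datetime(2012, 5, 7, 9, 15), 20),
    ("ruby", datetime(2012, 5, 8, 9, 40), 40),
    ("ruby", datetime(2012, 5, 7, 13, 5), 95),
    ("c++", datetime(2012, 5, 7, 9, 30), 50),
]

def average_by_tag_and_hour(samples):
    """Average answer time for each (tag, hour of day asked) bucket."""
    buckets = defaultdict(list)
    for tag, asked, minutes in samples:
        buckets[(tag, asked.hour)].append(minutes)
    return {key: sum(v) / len(v) for key, v in sorted(buckets.items())}

for (tag, hour), avg in average_by_tag_and_hour(samples).items():
    print(f"{tag:<5}{hour:02d}:00  {avg:.0f} min")
```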
There goes my idea of catching unanswered questions at night. It would be interesting to see how much cross-timezone answering is happening.
It would probably work better to run a regression against the complete dataset, using more features than just programming language and time of day, to automatically guess which questions are more likely to stay unanswered for a long time. Maybe next time.