Open
Description
Here is the timing breakdown for CommitCrawler:
is_binary check: ~20%
get file content (old and new versions): ~40%
toFileHeader: ~25%
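The old and new versions of a file are loaded for every commit, which is redundant work, since a commit's old version eventually shows up as the new version of another commit. If file contents were cached, the crawl should be about 20% faster (roughly half of the ~40% spent fetching content).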
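A minimal sketch of the caching idea, written in Python for illustration only. It assumes file contents can be looked up by a blob id; `BlobContentCache`, `load`, and the sample `store` and `history` are hypothetical names and data, not part of CommitCrawler.

```python
from collections import OrderedDict
from typing import Callable


class BlobContentCache:
    """Small LRU cache mapping a blob id to its content (hypothetical helper)."""

    def __init__(self, load: Callable[[str], bytes], capacity: int = 256) -> None:
        self._load = load            # function that actually reads the blob
        self._capacity = capacity    # maximum number of cached blobs
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.loads = 0               # number of real reads, for measurement

    def get(self, blob_id: str) -> bytes:
        if blob_id in self._items:
            # Cache hit: mark as most recently used and skip the read.
            self._items.move_to_end(blob_id)
            return self._items[blob_id]
        content = self._load(blob_id)
        self.loads += 1
        self._items[blob_id] = content
        if len(self._items) > self._capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return content


# Made-up history of a single file: (old_blob, new_blob) per commit,
# newest commit first. The old blob of one commit is the new blob of the next.
store = {"b1": b"v1", "b2": b"v2", "b3": b"v3"}
cache = BlobContentCache(store.__getitem__)
history = [("b2", "b3"), ("b1", "b2")]

for old_id, new_id in history:
    old_content = cache.get(old_id)
    new_content = cache.get(new_id)
```

In this toy history the two commits need four content lookups but only three real reads, because `b2` is served from the cache the second time. On a long linear history the saving should approach one read per commit.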
It is also worth checking whether the is_binary check is any faster when it is fed the file content instead of a stream. I tested on a repo with a low ratio of binary files (30 vs 2415), and the check still took ~20% of the time. I suspect the binary check may be loading the whole file.
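For comparison, here is a sketch of a binary check that only ever inspects a bounded prefix, whether it is given content or a stream. This is not the check CommitCrawler currently uses; the NUL-byte heuristic and the 8000-byte limit are assumptions (similar in spirit to what Git does), and the function names are hypothetical.

```python
import io
from typing import BinaryIO

SNIFF_BYTES = 8000  # assumed prefix size; tune as needed


def is_binary_bytes(content: bytes, sniff: int = SNIFF_BYTES) -> bool:
    """Treat content as binary if a NUL byte appears in the first `sniff` bytes."""
    return b"\0" in content[:sniff]


def is_binary_stream(stream: BinaryIO, sniff: int = SNIFF_BYTES) -> bool:
    """Same heuristic, but asks for at most `sniff` bytes from the stream."""
    return b"\0" in stream.read(sniff)


# Made-up sample data: a large text file and a small binary one.
text_file = b"hello world\n" * 10_000
binary_file = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

text_stream = io.BytesIO(text_file)
text_is_binary = is_binary_stream(text_stream)
bytes_consumed = text_stream.tell()  # how far the check read into the stream
```

If the existing check turns out to read the whole stream, bounding the read like this, or reusing the content that has already been loaded for the diff, would be the two options to benchmark against each other.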