I've been pretty quiet lately with the blogging. The main reason is that I've been working certain parts of my body off as I try to implement a new indexing architecture for the CDT. There is a lot of good news and a little bad news with this project. The good news is that I can now index Mozilla in 14 minutes on my laptop! In CDT 3.0, that took around 50 minutes, so this is an improvement of more than 70%. As well, as you change files, you hardly notice the indexer running, whereas in 3.0 it could take up to 12 seconds to deal with a change. I almost fell over when I got the first timing of 14 minutes, followed shortly by a dance of joy.
How did I do it? Well, I took a hint from the precompiled header feature that most compilers are starting to support. While indexing (and potentially during other parse activities as well), I skip over header files that I have already parsed and get their symbol information from the index instead. This required building a more structured database for the index, as opposed to the string-based flat table in 3.0. It turns out to be much faster, since parsing C, and especially C++, is a lot slower than a database lookup. This is why incremental times are so fast. I just didn't realize the whole reindex operation would be so fast as well (my target was 20 minutes for Mozilla).
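To make the idea concrete, here is a minimal sketch in Java of skipping headers that are already in the index. It is not the CDT implementation: the class name `HeaderSkippingIndexer`, the toy `decl` syntax standing in for real C/C++ declarations, and the in-memory maps standing in for the index database are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the CDT code base.
class HeaderSkippingIndexer {
    // Toy "file system": file name -> lines of source text.
    private final Map<String, List<String>> sources;
    // Stand-in for the index database: file name -> symbols visible in that file.
    private final Map<String, Set<String>> index = new HashMap<>();
    // Counts how many files were actually parsed (the expensive step).
    private int parseCount = 0;

    HeaderSkippingIndexer(Map<String, List<String>> sources) {
        this.sources = sources;
    }

    /** Returns the symbols visible in the file, parsing only what is not yet indexed. */
    Set<String> symbolsOf(String file) {
        Set<String> cached = index.get(file);
        if (cached != null) {
            return cached; // already indexed: a cheap lookup replaces a parse
        }
        parseCount++;
        Set<String> symbols = new LinkedHashSet<>();
        // Register the entry before parsing so a circular include terminates.
        index.put(file, symbols);
        for (String raw : sources.getOrDefault(file, List.of())) {
            String line = raw.trim();
            if (line.startsWith("#include")) {
                // Assumed form: #include "name.h"
                String header = line.substring("#include".length()).replace("\"", "").trim();
                symbols.addAll(symbolsOf(header));
            } else if (line.startsWith("decl ")) {
                // Toy declaration syntax (an assumption): "decl <name>"
                symbols.add(line.substring("decl ".length()).trim());
            }
        }
        return symbols;
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        Map<String, List<String>> src = new HashMap<>();
        src.put("common.h", List.of("decl Widget"));
        src.put("a.c", List.of("#include \"common.h\"", "decl runA"));
        src.put("b.c", List.of("#include \"common.h\"", "decl runB"));

        HeaderSkippingIndexer indexer = new HeaderSkippingIndexer(src);
        System.out.println("a.c: " + indexer.symbolsOf("a.c"));
        System.out.println("b.c: " + indexer.symbolsOf("b.c"));
        System.out.println("files parsed: " + indexer.parseCount());
    }
}
```

In this toy setup, two source files include the same header, yet only three parses happen rather than four, because the second request for `common.h` is answered from the index. The saving grows with the number of files that share a header.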
The bad news is that while it is incredibly faster, it does suffer from being young. There is less captured in the index than there was in 3.0: for Mozilla, about 20% fewer symbols. So searching for certain things isn't going to get you everything you were looking for. But I have been able to capture the high runners. More bad news is that we are getting spurious StackOverflow errors, because not all information is in the index and some of the algorithms we have for symbol resolution weren't prepared for that. As a result, the new index is only used for Search actions, where we can recover gracefully, and not for content assist and open declaration.
But back to the good news: as we work more on improving the contents of the index, I'll be able to direct all parser operations to it and make the CDT much more responsive for all operations (including my baby, content assist). And even as it is today, there is enough information there for the majority of workflows. Even the field engineers at QNX are extremely happy with it, and these are the front-line guys who need to make sure their customers are happy. More good news is that I'm getting more help with the indexer, both testing and coding. It's tough to do this as a one-man show, and I appreciate all the help I'm getting from the community.
With the new indexing framework in place in CDT 3.1, the opportunities for exciting new features are wide open. And one of the major objections to using the CDT on large, complex projects has been eased greatly. It's time to get the message out, now that I can lift my head away from the code!
Remember to eat and shower ;)
Congratulations!
I am mentally doing a happy dance over this as well. This was one of the pain points I had with using CDT up till now, as I watched it chug through an existing project for 50 minutes and then do it again as I made changes. I thought, "Awww, I'll just come back around version 4.0 when this gets worked out."
It's exciting to see these things happen. I myself would like to help out but must resign myself to the role of a lurker on the mailing lists for now until I understand the underlying code a bit better. :)
Ah, I misspoke. It wasn't between changes that the indexer had to do its work for another 50 minutes, it was with every restart! After bearing the pain of the initial indexing, which now that I think about it took a couple of hours with certain projects, I sure didn't want to close my environment or anything like that, because I knew what awaited me if I did :)
That doesn't happen to be fixed with CDT 3.1, does it?
Doug,
I'm an avid reader of your blog (and to a lesser degree the cdt-dev list) and all I can say is: keep up the excellent work.
Together with the Subclipse plugin, CDT will RULE THE WORLD! Or at least make my development life so much smoother :)
Is there a paper or a brief description of how the indexer works?
Which are the main classes in CVS?
I would like to port it to another Eclipse plugin.
Hi, I just wanted to comment on the speed versus accuracy issue.
I'm not sensitive to speed-of-editing issues, so most editors are pretty equivalent in my view.
I'm very concerned with speed (read: convenience) of code browsing while I'm maintaining code.
I'm even more concerned with accuracy (read: completeness) of searches (in Eclipse's case read: indexing) while I'm maintaining code, because I need to have confidence in the impact analyses that the searches enable. As in, "If I tweak this variable here, where else in this product could that impact something?"
My interpretation of the blog is that speed has been greatly improved, though accuracy is admittedly less. If I have read aright, that means that when using Eclipse CDT to do code maintenance impact analysis, I'll have to back the Searches with greps directly on the codebase filesystem hierarchy to make sure I'm not missing anything. Frankly, I'd rather have the emphasis put on accuracy and fine-tuning the language-sensitive Search features for C/C++. (For example, I seem to recall it not handling #define macro names very well a few months ago.)
That said, I love Eclipse and the idea that everyone will just provide plug-ins so the era of multiple, vendor-specific IDEs and their limited lifetimes will fade away (and I say this as a loyal CodeWright buyer who watched Borland buy it and bury it).
Thanks for working on a valuable product, more power to you.
Great feedback, John. We started the CDT indexing with that approach, to be 100% accurate. However, the cost in performance to get there just proved to be too high for many people, and we even found that some people would refuse to use the CDT at all because of it.
The good news is that, now that we have an architecture built for speed, we can go back and work the accuracy back towards 100%. And with the Full indexer still in tow, you will still have the option to pay the cost to get the accuracy that you may need.
Excellent work! I can't wait to use it. Thank you and the rest of the CDT team!
I want to use Eclipse and I want my team to use it, but when I use standard make, the indexer pretty much doesn't work, maybe because I don't set the include paths in standard make. Hopefully there could be improvements in this area. Also, managed make is not accurate (complete). It's a great product; I hope this can be done so that more people will like it.