Information on research I have conducted.
For my thesis, I designed, implemented, and evaluated a Java framework for performance prototyping, testing, and documentation. The framework allows system designers to rapidly prototype several implementation options, write a set of performance tests, and then compare the runtime performance of each prototype across a range of inputs. The core idea is to let system designers evaluate performance trade-offs much earlier in the development lifecycle.
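As a rough illustration of that workflow, the Java sketch below times two interchangeable prototypes of the same operation over several input sizes and reports the median time for each. It is a hypothetical example: the class `PerfHarness`, the method `measure`, and the two sample prototypes are illustrative names, not the thesis framework's actual API, and a real harness would also need JVM warm-up and more careful statistics.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: PerfHarness and measure() are illustrative names,
// not the actual API of the thesis framework.
class PerfHarness {

    // Accumulates prototype results so the JIT cannot discard the timed work.
    static long sink;

    // Runs every prototype on every input size and returns the median
    // elapsed time in nanoseconds, keyed by prototype name and then input size.
    // No JIT warm-up is performed, so early samples may be noisy.
    static Map<String, Map<Integer, Long>> measure(
            Map<String, Function<int[], Long>> prototypes,
            List<Integer> sizes,
            int repetitions) {
        if (repetitions < 1) {
            throw new IllegalArgumentException("repetitions must be >= 1");
        }
        Map<String, Map<Integer, Long>> results = new LinkedHashMap<>();
        for (Map.Entry<String, Function<int[], Long>> p : prototypes.entrySet()) {
            Map<Integer, Long> bySize = new LinkedHashMap<>();
            for (int n : sizes) {
                // Same deterministic input for every prototype: 0, 1, ..., n-1.
                int[] input = new int[n];
                for (int i = 0; i < n; i++) {
                    input[i] = i;
                }
                long[] samples = new long[repetitions];
                for (int r = 0; r < repetitions; r++) {
                    long start = System.nanoTime();
                    sink += p.getValue().apply(input);
                    samples[r] = System.nanoTime() - start;
                }
                Arrays.sort(samples);
                // Median sample (upper median when repetitions is even).
                bySize.put(n, samples[repetitions / 2]);
            }
            results.put(p.getKey(), bySize);
        }
        return results;
    }

    public static void main(String[] args) {
        // Two interchangeable prototypes of the same operation: summing an array.
        Map<String, Function<int[], Long>> prototypes = new LinkedHashMap<>();
        prototypes.put("loop", a -> {
            long s = 0;
            for (int x : a) {
                s += x;
            }
            return s;
        });
        prototypes.put("stream", a -> Arrays.stream(a).asLongStream().sum());

        Map<String, Map<Integer, Long>> results =
                measure(prototypes, List.of(1_000, 100_000, 1_000_000), 5);
        results.forEach((name, bySize) -> System.out.println(name + " " + bySize));
    }
}
```

Keeping every prototype behind the same `Function<int[], Long>` signature is what makes them interchangeable: the harness can feed identical inputs to each one and line the timings up side by side.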
I started with an existing Java implementation of the C/NC-value method (a technique for automated term recognition) created by Mike Klaas. This implementation was limited to corpora of about 22 MB because of the memory required to track the relationships between terms. I implemented an optimized version of the algorithm in C++ that could handle corpora roughly 20 times larger. I also implemented a reasonably accurate application that automatically identifies acronym definitions.
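For context, the C-value part of the method, as usually stated by Frantzi, Ananiadou and Mima, scores a multi-word candidate term a as log2|a| · f(a) when a is not nested in any longer candidate, and as log2|a| · (f(a) − (1/P(T_a)) · Σ_{b ∈ T_a} f(b)) otherwise. Here |a| is the number of words in a, f is corpus frequency, T_a is the set of longer candidates that contain a, and P(T_a) is the size of that set. The naive Java sketch below computes this score from a precomputed frequency table. It is neither Mike Klaas's implementation nor my C++ version; the class and method names are hypothetical, the NC-value context-word weighting is omitted, and the pairwise nesting check would not scale to a large corpus.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative C-value sketch; CValue, score() and nestedIn() are hypothetical
// names, not taken from any existing implementation. NC-value context
// weighting is omitted.
class CValue {

    // True if the word sequence 'inner' occurs contiguously inside 'outer'.
    static boolean nestedIn(String[] inner, String[] outer) {
        for (int start = 0; start + inner.length <= outer.length; start++) {
            boolean match = true;
            for (int k = 0; k < inner.length && match; k++) {
                match = inner[k].equals(outer[start + k]);
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    static double log2(int x) {
        return Math.log(x) / Math.log(2);
    }

    // freq maps each candidate term (words separated by single spaces)
    // to its corpus frequency f(a). Returns the C-value of every candidate.
    static Map<String, Double> score(Map<String, Integer> freq) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> a : freq.entrySet()) {
            String[] aWords = a.getKey().split(" ");
            int nestingCount = 0;    // P(T_a): longer candidates containing a
            long nestingFreqSum = 0; // sum of f(b) over b in T_a
            // Naive pairwise scan over all candidates: O(n^2) comparisons.
            for (Map.Entry<String, Integer> b : freq.entrySet()) {
                String[] bWords = b.getKey().split(" ");
                if (bWords.length > aWords.length && nestedIn(aWords, bWords)) {
                    nestingCount++;
                    nestingFreqSum += b.getValue();
                }
            }
            double f = a.getValue();
            if (nestingCount > 0) {
                f -= (double) nestingFreqSum / nestingCount;
            }
            result.put(a.getKey(), log2(aWords.length) * f);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up frequencies, for illustration only.
        Map<String, Integer> freq = new LinkedHashMap<>();
        freq.put("adenoid cystic basal cell carcinoma", 2);
        freq.put("basal cell carcinoma", 5);
        freq.put("cell carcinoma", 8);
        score(freq).forEach((term, c) -> System.out.printf("%.3f  %s%n", c, term));
    }
}
```

With these made-up frequencies, "cell carcinoma" is nested in both longer candidates, so its score is log2 2 · (8 − (5 + 2)/2) = 4.5, while the five-word term, which is not nested in anything, simply scores log2 5 · 2.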
Near the end of the project, I proposed a way to isolate sections of the lattice structure so that the cost of loading each section could be amortized across a large quantity of input data, although I did not have time to implement or test the idea. This approach would make the algorithm fully external (out-of-core) and allow it to scale to data sets of any size: an exciting, but incomplete, result.
I designed and implemented a system for sharing and organizing documents among multiple users. While building it, we compared the effectiveness of three organizational schemes: keywords, directories, and tags. The published paper describes the system's implementation and the results of a small user study comparing the effectiveness of each scheme.