Abstract:
It is clear that code reuse is important task in software development and maintenance. The problem in code reuse is, after download all relevant code, we need to identify most relevant code among those. In this paper we use keyword search with n-gram NLP technique using GitHub Application Program Interface (API). Before search the source code, we retrieve all Repository name in GitHub belongs to particular programing language (JAVA, C++, etc.), as well as we retrieve all .java file name if we search java libraries using GitHub API. Then we compare our keyword with this list, if the keyword is extracted from Software architecture is connected word, then we will split using Apache Camel Splitter. If the particular keyword related to any project, we download the project. Otherwise using WordNet, we use synonym and repeat above process again. For further relevancy, we will use a speech recognition technique (Dynamic Time Warping (DTW)) and a NLP technique (Part of Speech Tagging (POS)). Because of this is a part of the whole research, in this paper we will consider only GitHub API use.