
dc.contributor.advisor Perera, I
dc.contributor.author Pirapuraj, P
dc.date.accessioned 2017
dc.date.available 2017
dc.date.issued 2017
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/16083
dc.description.abstract Today a massive amount of source code is available on the Internet and can serve as a basis for code reuse. Developers can reduce time and resource costs by reusing this external open-source code in their own projects. Although a number of Code Search Engines (CSE) are available, finding the most relevant source code is often challenging. In this research, we propose a framework that addresses the problems developers face when searching for and reusing code. The framework takes as input a software architecture design (a class diagram) in XML format, extracts information from the XML file, and, based on that information, uses three types of crawler to fetch relevant projects from GitHub, SourceForge, and GoogleCode. The crawlers download a large number of projects, from which the most relevant ones must then be identified. This research focuses specifically on projects developed in Java. Each project contains a number of .java files, and every file is represented as an Abstract Syntax Tree (AST) in order to extract its identifiers (class names, method names, and attribute names) and comments. On one hand, we therefore have the identifiers extracted from the XML file (class diagram); on the other, the identifiers and action words (verbs) extracted from the downloaded projects. Action words are extracted from comments using Part of Speech (POS) tagging. The two groups of identifiers are then compared: each matched identifier earns a number of marks, the marks are summed, and if the total exceeds 50%, the .java file containing those identifiers is suggested as relevant code. Otherwise, synonyms of the identifiers are retrieved using WordNet and the matching process is repeated with the synonyms.
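The matching step described above can be illustrated with a minimal sketch. It assumes that every class-diagram identifier carries an equal mark, so the "total marks" become the fraction of design identifiers found in a .java file, and that identifiers have already been split into lowercase words. The `SYNONYMS` table is a hypothetical stand-in for the WordNet lookup the thesis uses; `match_score`, `is_relevant`, and `THRESHOLD` are illustrative names, not the author's code.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for WordNet: maps a word to a few synonyms.
SYNONYMS: Dict[str, Set[str]] = {
    "customer": {"client", "buyer"},
    "remove": {"delete", "erase"},
}

THRESHOLD = 0.5  # the abstract's "greater than 50%" rule


def match_score(design_ids: Iterable[str], file_ids: Iterable[str],
                use_synonyms: bool = False) -> float:
    """Share of class-diagram identifiers that also occur in a .java file.

    Every design identifier carries an equal mark (an assumption).
    """
    design = {d.lower() for d in design_ids}
    found = {f.lower() for f in file_ids}
    if not design:
        return 0.0
    hits = 0
    for ident in design:
        candidates = {ident}
        if use_synonyms:
            # Second pass: also accept any synonym of the design identifier.
            candidates |= SYNONYMS.get(ident, set())
        if candidates & found:
            hits += 1
    return hits / len(design)


def is_relevant(design_ids: List[str], file_ids: List[str]) -> bool:
    """Exact-match pass first; repeat with synonyms only if below threshold."""
    if match_score(design_ids, file_ids) > THRESHOLD:
        return True
    return match_score(design_ids, file_ids, use_synonyms=True) > THRESHOLD


if __name__ == "__main__":
    design = ["customer", "order", "remove"]  # from the class diagram
    source = ["client", "order", "delete"]    # from one .java file's AST
    print(match_score(design, source))        # only "order" matches exactly
    print(is_relevant(design, source))        # True once synonyms are used
```

Note that a score of exactly 50% is treated as not relevant, following the strict "greater than" wording of the abstract.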
For composite identifiers, a camel case splitter is used to separate the constituent words. If programmers do not follow the camel case naming convention, an N-gram technique is used to separate the words instead. The Stanford Spellchecker is used to identify abbreviated words. Evaluation of the framework gave an average accuracy of 95.25% across its four subsystems: project downloader (100%), identifier analyzer (94%), word finder (87%), and comments analyzer (100%). en_US
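The identifier-splitting step can be sketched as follows. A regular expression splits camelCase, PascalCase, and snake_case names; runs that are not dictionary words are then segmented by trying substrings (character n-grams) from longest to shortest and keeping the segmentation with the fewest words. This segmentation strategy and the tiny `DICTIONARY` are assumptions standing in for the thesis's N-gram technique and word list; abbreviation handling with the Stanford Spellchecker is not shown.

```python
import re
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical mini-dictionary; a real system would use a full English word list.
DICTIONARY = {"get", "set", "user", "name", "account", "balance", "total", "id"}

# Acronym before a capitalised word | capitalised/lowercase word | acronym | digits
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def camel_split(identifier: str) -> List[str]:
    """Split camelCase / PascalCase / snake_case identifiers into lowercase words."""
    return [w.lower() for w in _CAMEL.findall(identifier)]


def ngram_split(token: str) -> Optional[List[str]]:
    """Segment a lowercase run into dictionary words; fewest words wins.

    Returns None when no full segmentation exists.
    """
    @lru_cache(maxsize=None)
    def best(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(token):
            return ()
        result: Optional[Tuple[str, ...]] = None
        for j in range(len(token), i, -1):  # try longer n-grams first
            piece = token[i:j]
            if piece in DICTIONARY:
                rest = best(j)
                if rest is not None and (result is None or len(rest) + 1 < len(result)):
                    result = (piece,) + rest
        return result

    found = best(0)
    return list(found) if found is not None else None


def split_identifier(identifier: str) -> List[str]:
    """Camel case split first; fall back to n-gram segmentation per part."""
    words: List[str] = []
    for part in camel_split(identifier):
        if part in DICTIONARY or part.isdigit():
            words.append(part)
        else:
            # Keep the part unchanged if it cannot be segmented.
            words.extend(ngram_split(part) or [part])
    return words


if __name__ == "__main__":
    print(split_identifier("getUserName"))          # camel case convention
    print(split_identifier("getusername"))          # no convention: n-gram fallback
    print(split_identifier("accountbalanceTotal"))  # mixed
```

Running camel case splitting first keeps the common case cheap; the n-gram search only runs on parts that are not already known words.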
dc.language.iso en en_US
dc.subject COMPUTER SCIENCE AND ENGINEERING-Theses en_US
dc.subject SOFTWARE ARCHITECTURE en_US
dc.subject CODE SEARCH ENGINES (CSE) en_US
dc.subject SOURCE CODES en_US
dc.subject JAVA LANGUAGE en_US
dc.subject PART OF SPEECH (POS) TAGGING en_US
dc.title Analyzing source code identifiers for code reuse en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc (Major Component Research) en_US
dc.identifier.department Department of Computer Science & Engineering en_US
dc.date.accept 2017
dc.identifier.accno TH3497 en_US