Abstract:
Data is the most important component of machine
learning. In the bioinformatics field, data are highly
sensitive, so accessing them for a secondary purpose
(e.g., research) raises many legal and ethical issues.
As a result, in many bioinformatics studies, collecting
the data takes more time than the development phase.
Several studies have tried to address these legal and
ethical issues by anonymising the data through
encryption, de-identification, and perturbation of
potentially identifiable attributes. To some extent these
solutions reduced the risk of data breaches, but the
anonymised data performed poorly in analysis and
mining tasks.
Recently, generative adversarial
networks (GANs) have become a research focus in artificial
intelligence. The goal of a GAN is to estimate the underlying
distribution of real data samples and generate new samples
from that distribution. This paper reviews the use of GANs
to generate bioinformatics data sets and presents examples
of current research. To provide a useful and comprehensive
perspective, it categorises the reviewed studies by both the
type of bioinformatics data and the GAN architecture and workflow.
It also discusses the issues of using GANs to generate
bioinformatics data sets and suggests future
research directions. The author believes that this review
will provide valuable insights for researchers who apply GANs
to generate bioinformatics data sets.