Abstract:
Breast cancer is one of the most dangerous and common invasive cancers; it forms in the cells
of the human breast. It normally starts to develop in one of the three main parts of the breast,
called the ducts, the lobules, and the connective tissue, when cell growth in that part becomes
uncontrolled. It is more common in women, and almost one in eight women will be diagnosed
with breast cancer during their lifetime. Early and correct detection is crucial because it is
associated with more treatment options and increases the chance of survival.
However, the number of specialists who can correctly diagnose breast cancer is limited, and
newly employed medical practitioners may have insufficient knowledge of breast cancer
diagnosis.
Biopsy is the only diagnostic procedure for breast cancer that can determine whether a tumour
in the breast is cancerous or non-cancerous. In the real world, however, a pathologist who
diagnoses breast cancer through biopsy does not always interpret the result correctly. Research
has found that, overall, 75.3% of biopsy diagnoses are correctly interpreted.
This study was conducted to extract a set of understandable and detailed rules from breast
cancer data and to develop a biopsy diagnostic system that could help pathologists detect
breast cancer accurately. Four classification data mining algorithms, namely boosted C5.0,
C4.5, JRipper, and PART, were applied to the Wisconsin breast cancer dataset. The highest
accuracy, 99.31%, was obtained with the boosted C5.0 algorithm. In addition, a total of 89
biopsy diagnostic rules were extracted from the Wisconsin breast cancer dataset using the
boosted C5.0 algorithm. Finally, a web-based biopsy diagnostic system was developed based on
the boosted C5.0 biopsy diagnostic model. The R statistical programming language was used to
develop both the models and the biopsy diagnostic system.