Statistical Analysis and Modeling of Factors Influencing Lung Cancer

Final Report

The Dissertation submitted for the Degree of
MASTER OF SCIENCE

Department of Mathematics
Faculty of Engineering
University of Moratuwa
Sri Lanka
February, 2005

A. N. Ekanayake
02/8029
Master of Science (Full Time)

DECLARATION

I certify that the dissertation entitled "STATISTICAL ANALYSIS AND MODELING OF FACTORS INFLUENCING LUNG CANCERS IN SRI LANKA" is entirely my own work. It has not been accepted for any degree and it is not being submitted for any other degree.

Candidate: A. N. Ekanayake
Signature:
Date:

Supervisor: Dr. M. Indralingam
Signature:
Date:

ABSTRACT

Statistics show that lung cancer occupies the third position among the incidence rates of cancers in Sri Lankan males, and this rate is increasing yearly. This research is focused on two main areas: finding the factors associated with lung cancers, and studying the time to death after detection of a lung cancer, known as the survival time.

Data collection was done at the Cancer Institute, Maharagama (CIM), which is the largest hospital for treatment of the disease in Sri Lanka. Three sources of data were used in this study. The first was data in summary format at the CIM. The second was the file belonging to each patient. The third was the patient's detail form, which is filled in by the patient. Altogether, two hundred and sixty-two lung cancer patients came to the CIM in the study period from 1st January to 31st December 2002.

The findings of this research are as follows. Smoking is the main risk factor for lung cancers.
People whose occupations expose them to polluted air have a high risk of lung cancer. There is a genetic effect for lung cancer. Consuming alcohol and chewing betel are also considerable factors for lung cancer. Having tuberculosis is also a risk factor for lung cancer. Among the four types of lung cancer, viz. Adenocarcinoma, Squamous cell carcinoma, Small cell carcinoma and Large cell carcinoma, the most common types in Sri Lanka are Adenocarcinoma and Squamous cell carcinoma. The age, sex, religion and smoking habit of the patient are strongly related to those two types of lung cancer. A male older than 48 years who smokes is more susceptible to Squamous cell carcinoma than to Adenocarcinoma.

This research shows that the mean survival time of a lung cancer patient is approximately 6 months. The treatment given at the Cancer Institute, the stage of diagnosis and the sex of the patient affect survival time. A combination of treatments halves the risk of death compared with a single treatment. Our research shows that if a patient is diagnosed with lung cancer at the extended stage, he/she has eleven times the risk of death of a patient diagnosed at the localized stage. The risk of death for males is three times that for females.

ACKNOWLEDGEMENT

It is with the greatest respect and veneration that I express my sincere thanks to my main supervisor, Dr. M. Indralingam, the Coordinator of the postgraduate students of the Department of Mathematics, University of Moratuwa. My sincere thanks also go to Prof. G.T.F. De Silva, Dr. R. Lokupitiya and Mr. T.M.J.A. Cooray, Senior Lecturers of the Department of Mathematics, University of Moratuwa, and Mrs. C.P.N. Attygalle, Institute of Technology, University of Moratuwa. I have to appreciate the assistance of Dr. Roshini Sooriarachchi, Senior Lecturer of the Department of Statistics, University of Colombo. My special thanks go to Dr. Murali Vallipuranathan, Ministry of Health, and Dr. R.V. Rabel, Medical Officer, University of Moratuwa, who gave me kind support in the medical field. Without the knowledge, advice and vast experience imparted to me by my supervisors, I may not have been able to complete the project successfully.

I would like to take this opportunity to specially thank Prof. (Mrs) N. Rathnayake, Director of the Post Graduate unit of the University of Moratuwa, and ADB for granting me this scholarship to conduct this research, and the head of the Department and all the academic and non-academic staff members of the Department of Mathematics, University of Moratuwa. It is my obligation to thank all my friends for their support.

I would also like to thank Dr. Yasantha Ariyarathna, Director, Cancer Institute, Dr. (Mrs) Nirmala Gammanpila, and the staff of the record room and director's office of the Cancer Institute, Maharagama, for providing me with relevant information and data to carry out this research. The assistance given by Dr. Thushara Fernando and the staff members of the Planning division, Ministry of Health, is gratefully acknowledged. Special thanks are also due to the Medical Statistics Unit of the Census and Statistics Department.

I gratefully acknowledge the support and encouragement given by my loving parents and my sister. Finally, special thanks are due to my loving husband for his valuable advice and encouragement in all my endeavors in this research.
CONTENTS

DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
DEDICATION

1. INTRODUCTION
   1.1 Background
   1.2 What is a lung cancer?
       1.2.1 Introduction
       1.2.2 Understanding the cancer process
       1.2.3 Risk factors of lung cancer
       1.2.4 Types of lung cancer
       1.2.5 Symptoms and signs of lung cancer
       1.2.6 Diagnosing lung cancer
       1.2.7 Stage explanation
       1.2.8 Treatment for lung cancer
   1.3 Objectives of the study

2. LITERATURE REVIEW
   2.1 Studies carried out in Sri Lanka
   2.2 Studies carried out in other countries

3. METHODOLOGY AND TECHNIQUES USED
   3.1 Methodology
       3.1.1 Literature survey
       3.1.2 Study area and study population
       3.1.3 Data collection
   3.2 Techniques used
       3.2.1 Descriptive methods
       3.2.2 Univariate methods
       3.2.3 Model fitting
       3.2.4 Survival data analysis
       3.2.5 Special software used in analyzing

4. DESCRIPTIVE STUDY ON LUNG CANCER PATIENTS
   4.1 Personal details of lung cancer patients in Sri Lanka
   4.2 General information on lung cancer patients
   4.3 Habits of lung cancer patients
   4.4 Clinical features of lung cancer patients
   4.5 Identify the common lung cancers in Sri Lanka according to cell types
   4.6 Summary

5. LOGISTIC REGRESSION ANALYSIS TO LUNG CANCER PATIENTS
   5.1 Univariate analysis for determining variables affecting two common types of lung cancer in Sri Lanka
       5.1.1 Testing the relationship between two types of lung cancer with personal information
       5.1.2 Testing the relationship between two types of lung cancer with personal habits
   5.2 Fitting logistic model for two types of lung cancer; Adenocarcinoma and Squamous cell carcinoma
       5.2.1 Model fitting procedure
       5.2.2 Parameter estimates of the fitted model
       5.2.3 Goodness of fit of the model
   5.3 Further areas of study
   5.4 Summary

6. SURVIVAL DATA ANALYSIS FOR LUNG CANCER PATIENTS
   6.1 Some Definitions
   6.2 Descriptive Methods of Comparing Survival Time for Different Groups of Individuals
   6.3 Log-Rank test to compare survival times for different groups of individuals
   6.4 Modeling survival data
       6.4.1 The selection of the variables to be included in the model
       6.4.2 Parameter estimates of the selected model
   6.5 Model Diagnostics
       6.5.1 Plots of the Cox-Snell residuals
       6.5.2 Plots of Martingale residuals
       6.5.3 Plot of Deviance residuals
   6.6 Summary

7. CONCLUSIONS AND DISCUSSION
   7.1 Discussion
       7.1.1 Descriptive Study on lung cancer
       7.1.2 Association between common types of lung cancer viz. Adenocarcinoma and Squamous cell carcinoma and some variables
       7.1.3 Statistical analysis of time to death after detecting lung cancer
   7.2 Conclusions of the study
       7.2.1 Conclusions of the general findings
       7.2.2 Conclusions of analyzing time to death after detecting (survival time) a lung cancer
   7.3 Recommendations
   7.4 Problems of the study
   7.5 Limitations of the study
   7.6 Further studies

APPENDIX I
APPENDIX II
REFERENCES AND BIBLIOGRAPHY

LIST OF TABLES

Table 3.1: Format of data summarizing at Cancer Institute
Table 3.2: Data collecting format
Table 3.3: Contingency Table of Factor A by Factor B
Table 4.1: Frequency distribution of patients district wise
Table 4.2: Frequency distribution of patients ethnic group wise
Table 4.3: Frequency distribution of religion
Table 4.4: Frequency distribution of no. of children to the lung cancer patient
Table 4.5: Frequency distribution of occupation of lung cancer patient
Table 4.6: Frequency distribution of monthly income of lung cancer patients
Table 4.7: Frequency distribution of getting lung cancer previously
Table 4.8: Frequency distribution of gene effect
Table 4.9: Frequency distribution of type of the hospital referring from
Table 4.10: Frequency distribution of having or had TB
Table 4.11: Frequency distribution of years with smoking
Table 4.12: Frequency distribution of smoking quantity per day
Table 4.13: Frequency distribution of consuming alcohol
Table 4.14: Frequency distribution of patients chewing betel
Table 4.15: Frequency distribution of laterality
Table 4.16: Frequency distribution of diagnostic status
Table 4.17: Frequency distribution of diagnostic evidence
Table 4.18: Frequency distribution of stage of diagnosis
Table 4.19: Frequency distribution of treatment given at cancer institute
Table 4.20: Frequency distribution according to cell type of lung cancer
Table 5.1: Associated demographic variables with two types of lung cancer
Table 5.2: Associated family background variables with two types of lung cancer
Table 5.3: Associated personal habits with two types of lung cancer
Table 5.4: Factors used for fitting logistic model
Table 5.5: Details of adding main effects to null model
Table 5.6: Details of adding main effects to model containing smoking status
Table 5.7: Parameter estimates of the fitted model
Table 6.1: Median survival times for type of the lung cancer
Table 6.2: Median survival times for stage of diagnosis
Table 6.3: Median survival times for treatment given at Cancer Institute
Table 6.4: Median survival times by sex of the patient
Table 6.5: Median survival times by age
Table 6.6: Median survival times for ethnicity
Table 6.7: Median survival times for religion
Table 6.8: Median survival times for smoking habit of the patient
Table 6.9: Median survival times for chewing betel habit of the patient
Table 6.10: Median survival times for consuming alcohol habit of the patient
Table 6.11: Median survival times for genetic effect of the patient
Table 6.12: Median survival times for presenting Tuberculosis of the patient
Table 6.13: Log-Rank test results for variables considering in survival analysis
Table 6.14: Resulting -2 log L and degrees of freedom on fitting each variable separately
Table 6.15: Resulting -2 log L and degrees of freedom on adding each term to the model adjusted for treatment given
Table 6.16: Resulting -2 log L and degrees of freedom on adding each term to the model adjusted for treatment given and stage of diagnosis
Table 6.17: Resulting -2 log L and degrees of freedom on adding each term to the model adjusted for treatment given, stage of diagnosis and sex of the patient
Table 6.18: Parameter estimates of the selected model

LIST OF FIGURES

Figure 4.1: Age distribution of lung cancer patients
Figure 4.2: Lung cancer patients by sex
Figure 4.3: Age distribution of lung cancer patients by sex
Figure 4.4: District distribution of lung cancer
Figure 4.5: Lung cancer patients by their marital state
Figure 4.6: Lung cancer patients by their living status
Figure 4.7: Lung cancer patients by transferred hospital
Figure 4.8: Lung cancer patients by smoking status
Figure 4.9: Lung cancer patients by cell type
Figure 4.10: Age distribution in two types; Adenocarcinoma and Squamous cell carcinoma
Figure 4.11: Age distribution of two types of lung cancers by sex
Figure 4.12: Common two types of lung cancer patients by sex
Figure 4.13: Two types of lung cancer patients by their smoking status
Figure 6.1: Kaplan-Meier Estimates of Survivor functions for type of the lung cancer
Figure 6.2: Log-Cumulative Hazard plot for type of lung cancer patients
Figure 6.3: Kaplan-Meier Estimates of Survivor functions for stage of diagnosis
Figure 6.4: Log-Cumulative Hazard plot for stage of diagnosis of lung cancer patients
Figure 6.5: Kaplan-Meier Estimates of Survivor functions for treatment given
Figure 6.6: Log-Cumulative Hazard plot for treatment given at Cancer Institute
Figure 6.7: Kaplan-Meier Estimates of Survivor functions by sex of the patient
Figure 6.8: Log-Cumulative Hazard plot by sex of the patient
Figure 6.9: Kaplan-Meier Estimates of Survivor functions by age of the patient
Figure 6.10: Log-Cumulative Hazard plot by age of the patient
Figure 6.11: Kaplan-Meier Estimates of Survivor functions for ethnicity
Figure 6.12: Log-Cumulative Hazard plot for ethnicity
Figure 6.13: Kaplan-Meier Estimates of Survivor functions for religion
Figure 6.14: Log-Cumulative Hazard plot for religion
Figure 6.15: Kaplan-Meier Estimates of Survivor functions for smoking status
Figure 6.16: Log-Cumulative Hazard plot for smoking status
Figure 6.17: Kaplan-Meier Estimates of Survivor functions for chewing betel
Figure 6.18: Log-Cumulative Hazard plot for chewing betel
Figure 6.19: Kaplan-Meier Estimates of Survivor functions for consuming alcohol
Figure 6.20: Log-Cumulative Hazard plot for consuming alcohol
Figure 6.21: Kaplan-Meier Estimates of Survivor functions for genetic effect
Figure 6.22: Log-Cumulative Hazard plot for genetic effect
Figure 6.23: Kaplan-Meier Estimates of Survivor functions for presenting TB
Figure 6.24: Log-Cumulative Hazard plot for presenting TB
Figure 6.25: Plot of Cox-Snell residuals
Figure 6.26: Log-Cumulative Hazard plot of the Cox-Snell residuals
Figure 6.27: Plot of Martingale residuals Vs Rank of survival time
Figure 6.28: Plot of Martingale residuals Vs Stage of diagnosis
Figure 6.29: Plot of Martingale residuals Vs treatment given
Figure 6.30: Plot of Martingale residuals Vs sex of the patient
Figure 6.31: Plot of Deviance residuals Vs Rank of survival time