You will analyse data from a prospective cohort study. The Stata dataset tuberculosis_zambia_2017-18.dta (Stata version 13) contains the following variables, all of which were collected at baseline apart from the outcome variables (“event” and “exit_date”). Missing data are coded as “.” for all variables.
Research questions to be addressed:
- Does smoking increase the incidence rate of tuberculosis (TB) disease among household contacts of recently diagnosed tuberculosis cases, in Zambia?
- Does the effect of smoking on tuberculosis disease incidence rates differ according to an individual’s HIV status?
- Does the effect of smoking on tuberculosis disease incidence rates differ according to time since enrolment into the study (defined as (a) the first 12 months of follow-up in the study and (b) the time period >12 months after enrolment in the study)?
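The third question requires each participant's follow-up to be divided at 12 months after enrolment, so that events and person-time are attributed to the correct period. The following is a minimal sketch of that split in Python, offered only as an illustration of the idea (in Stata, `stsplit` performs the equivalent step after `stset`). The function name, the use of 365.25 days for 12 months, and the example dates are assumptions made for illustration and do not come from the dataset.

```python
from datetime import date

CUT_DAYS = 365.25  # assumed length of "12 months" in days


def split_follow_up(entry, exit_, event, cut_days=CUT_DAYS):
    """Split one person's follow-up at the 12-month cut-point.

    Returns a list of (period, person_days, event) records. The event
    indicator is 1 only in the period in which the person exited with TB.
    """
    total_days = (exit_ - entry).days
    if total_days <= 0:
        # No time at risk: such records need checking rather than analysing.
        return []
    if total_days <= cut_days:
        return [("<=12 months", total_days, event)]
    return [
        ("<=12 months", cut_days, 0),
        (">12 months", total_days - cut_days, event),
    ]


# Hypothetical participant: enrolled 1 March 2006, TB treatment started 1 September 2007.
records = split_follow_up(date(2006, 3, 1), date(2007, 9, 1), event=1)
for period, days, ev in records:
    print(period, round(days / 365.25, 2), "person-years, events:", ev)
```

Someone who exits after the cut-point contributes a full 12 months of event-free person-time to the first period, and the remainder of their follow-up, together with any event, to the second.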
In order to address the research questions the following analyses should be included:
- Univariable analysis: to identify potential risk factors for the incidence of TB disease among adult household contacts of TB cases. All characteristics given should be considered: smoking, HIV status, time since enrolment in the study, gender, alcohol consumption, age, area, education, number of durable household goods, etc.
Note: Time since enrolment in the study should compare the time period >12 months since enrolment in the study against the time period ≤12 months since enrolment in the study. Age is given as “age at entry” and this should be used, rather than “current age”.
- Stratification: Classical methods for stratified analysis, to obtain preliminary evidence about whether the effect of smoking is confounded or modified by any of the other potential risk factors.
- Regression analysis: building on the findings of the univariable and stratified analyses, to estimate the effect of smoking on the incidence of tuberculosis disease, controlled for confounding. Use either Poisson or Cox regression modelling, and justify your choice.
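To illustrate what the stratification step compares, the sketch below computes a crude rate ratio and a Mantel-Haenszel rate ratio pooled over strata. It is written in Python purely as an illustration of the arithmetic; in Stata the same quantities can be obtained with `strate` and `stmh`. The function names and all counts and person-years are hypothetical and were chosen only to make the calculation easy to follow.

```python
def rate_ratio(d1, y1, d0, y0):
    """Rate ratio: exposed rate (d1 / y1) divided by unexposed rate (d0 / y0)."""
    return (d1 / y1) / (d0 / y0)


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio pooled over strata.

    Each stratum is (d1, y1, d0, y0): events and person-years among the
    exposed, then events and person-years among the unexposed.
    """
    numerator = sum(d1 * y0 / (y1 + y0) for d1, y1, d0, y0 in strata)
    denominator = sum(d0 * y1 / (y1 + y0) for d1, y1, d0, y0 in strata)
    return numerator / denominator


# Hypothetical data: smokers vs non-smokers within two strata of a third variable.
strata = [
    (10, 1000, 10, 2000),  # stratum A: rates 0.010 vs 0.005
    (40, 1000, 10, 500),   # stratum B: rates 0.040 vs 0.020
]

# Crude rate ratio: ignore the strata and add up events and person-years.
crude = rate_ratio(
    sum(s[0] for s in strata), sum(s[1] for s in strata),
    sum(s[2] for s in strata), sum(s[3] for s in strata),
)
adjusted = mantel_haenszel_rate_ratio(strata)
stratum_specific = [rate_ratio(*s) for s in strata]

print("crude:", round(crude, 2))
print("stratum-specific:", [round(rr, 2) for rr in stratum_specific])
print("Mantel-Haenszel:", round(adjusted, 2))
```

In these made-up numbers the stratum-specific rate ratios are equal to each other but differ from the crude rate ratio, which is the pattern expected under confounding without effect modification. Stratum-specific rate ratios that differ from each other would instead point towards effect modification.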
Stata should be used for the analysis if possible (the dataset is a Stata dataset), and the Stata log file should be included as an appendix.
The report should comprise 5 A4 pages of text, double spaced. Arial 12-point font should be used and margins should be at least 2 cm. Tables (maximum 4) should be placed at the end of the report and do not count towards the 5-page length. Tables can be single spaced.
The report should be structured as follows:
- Analysis strategy: 1-1.5 pages, covering the initial examination/exploration of the data (checking for data errors/missing data, and how you treated missing and erroneous data), what the primary exposure is, which potential confounders/effect modifiers you will check for, and which data analysis techniques will be used and why.
- Results summary: 2-3 pages, presenting the results of the statistical tests carried out. The text can refer to tables, but because only 4 tables are allowed, not all results can be tabulated. The text should nevertheless make clear which analyses you performed, in enough detail to allow them to be reproduced.
- Conclusions: 1 page, discussing the results and how they relate to/answer the three research questions. The impact of potential biases should also be considered here.
Additional Notes for Report Writing:
There should be more detail on the strategy you used to analyse the data than is found in most papers;
You should not explain the background to the study or the methods of data collection;
You should not cite or discuss relevant literature but you should consider the internal validity of the study (and hence your results);
You should discuss the interpretation of your results and how they relate to the questions you were asked.
You may assume that the techniques you have used (e.g. chi-square test, Mantel-Haenszel method for stratified analysis, confidence interval, likelihood ratio test) are understood by the reader and do not need explanation, but you do need to say which techniques you have used.
– DON’T include formulae or Greek letters
– DON’T include variable names
– DON’T write “p=0.000”
– DON’T overdo the decimal places
– DON’T include unedited computer output
– DON’T rush into regression analyses before using simple methods to explore and understand the data
– DO start with simple analyses and techniques before moving on to more complex analyses using more advanced techniques
– DO take study design into account in analyses and in presenting and discussing results
– DO make sure tables have titles
– DO make sure tables can be understood without reading text
– DO try to ensure main argument of text can be understood without reference to tables or figures
– DO make sure that you have given a clear enough description of what you have done so that the reader can reproduce any numbers/results that you present
– DO present stratum-specific RRs (if necessary), DON’T present interaction parameters
– DO remember that you have been asked to address an epidemiological question, and your focus should be on addressing that question; statistical techniques are tools that help you address it, not an end in themselves.
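The points above about “p=0.000” and decimal places can be handled with a small formatting helper. The sketch below, in Python, shows one common reporting convention; the function name and the chosen thresholds are assumptions rather than requirements of this brief.

```python
def format_p(p):
    """Format a p-value for reporting, never as 'p=0.000'."""
    if p < 0.001:
        return "p<0.001"
    if p < 0.01:
        return f"p={p:.3f}"  # three decimals only for small p-values
    return f"p={p:.2f}"      # two decimals otherwise


for value in (0.00004, 0.0042, 0.0342, 0.47):
    print(format_p(value))
```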
The prevalence and incidence of tuberculosis disease are high in Zambia. HIV infection is known to be the single strongest risk factor for tuberculosis in this setting; the majority of newly diagnosed tuberculosis (TB) cases are HIV-positive. Age, gender, smoking, alcohol, and poverty have been identified as risk factors for TB in some other settings.
During 2006-8, in one province of Zambia, a random sample of TB patients who were diagnosed during this time period were asked if they were willing to participate in a cohort study. Among those sampled, 92% agreed to participate, and were subsequently visited at home. As part of the study, all adults who were members of the TB patient’s household were listed, and one was then randomly selected and asked if they were willing to participate in a cohort study. Of all adult household members who were randomly selected to participate, 90% agreed to participate.
One of the outcomes of the cohort study was the incidence of tuberculosis among the randomly selected adult household members who consented to participate. Adults were included in the analysis of TB incidence if they reported that they were not on TB treatment on the date of enrolment into the study. The follow-up period was intended to be 3 years, though some individuals were followed for over 4 years. 2,340 adult household contacts of TB patients agreed to participate, and 2,109 self-reported that they were not on TB treatment at the time of enrolment into the study.
At the time of the baseline visit, self-reported data on gender, age, area of residence, education, household assets, smoking history, and alcohol consumption were collected, and all participants were tested for HIV using a blood test. For all participants who tested HIV-positive on a first-line rapid test, a second (and sometimes third) confirmatory test was done. HIV status was determined using the results of all tests.
The start date of the cohort was defined as the date the adult household contact took part in the baseline survey. Household visits were scheduled for 18 and 36 months after the baseline visit. At each of these two follow-up visits, information was collected on whether the adult household contact had been diagnosed with TB since the last visit, and if so the date that TB treatment was started.
The “exit date” from the cohort study is the earliest of TB treatment start date, death, loss-to-follow-up (due to a change of residence, and then being unable to trace the participant, or because the participant did not want to continue in the study), and the date of the last household visit.
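The exit-date rule above can be sketched as taking the earliest of whichever dates are recorded, with the event indicator set only when that earliest date is the TB treatment start date. The Python below is a hedged illustration: the function and argument names are hypothetical, the example dates are invented, and giving TB priority when two dates coincide is an assumption that the description does not address.

```python
from datetime import date


def exit_date_and_event(tb_treatment_start=None, death=None,
                        lost_to_follow_up=None, last_visit=None):
    """Return (exit_date, event) as the earliest recorded date.

    event is 1 if follow-up ended with the start of TB treatment, else 0.
    Dates that were not recorded are passed as None and ignored.
    """
    # TB is listed first so that it wins ties on the same date (an assumption).
    candidates = [tb_treatment_start, death, lost_to_follow_up, last_visit]
    dated = [(d, i) for i, d in enumerate(candidates) if d is not None]
    if not dated:
        raise ValueError("no exit information recorded")
    exit_, which = min(dated)
    return exit_, int(which == 0)


# Hypothetical participant enrolled on 10 May 2006.
entry = date(2006, 5, 10)
exit_, event = exit_date_and_event(
    tb_treatment_start=date(2007, 11, 2),
    last_visit=date(2009, 5, 15),
)
person_years = (exit_ - entry).days / 365.25
print(exit_, event, round(person_years, 2))
```

Participants who die or are lost to follow-up before any TB diagnosis are censored at that date and contribute person-time but no event.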