Scientific ASIA

Disease Onset Prediction Using Statistical Model

Predicting the onset of disease could revolutionize the health care system by shifting it from reactive to preventive care. Besides improving patients' quality of life, prevention could make health care considerably more cost-effective. Countless genetic factors can affect the onset of common diseases like hypertension, heart disease, and type II diabetes.

DNA can affect how these diseases develop and manifest, but outlining the association between DNA and disease requires a reliable statistical model that works on huge datasets covering millions of patients. Matthew Robinson, an assistant professor in Austria, worked with other scientists to develop a novel mathematical model that improves the quality of predictions obtained from large datasets of patients' genomic data. The model works much as a physician does when discussing a family's medical history: it develops tailored predictions about a patient's health risks.

DNA is the hereditary material and consists of billions of chemical base pairs; a genetic marker is a short segment of the DNA sequence. The scientists collected several hundred thousand genetic markers for the study. The researchers then used Matthew Robinson's statistical model to link the composition of the markers with the onset of hypertension, type II diabetes, and heart disease in the patients in the database. Their primary interest was the patient's age at the time of disease onset; the fitted model can later be used to predict the probability that a disease will occur.
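The idea of linking marker composition to age at onset can be illustrated with a small sketch. This is not the published model: it uses ordinary ridge regression on simulated data as a stand-in, and it ignores patients who have not yet developed the disease. The function names, the 50 markers, the allele frequency of 0.3, and the baseline age of 60 are all assumptions made up for illustration.

```python
import numpy as np


def simulate_cohort(n_patients, marker_effects, rng, noise_sd=3.0):
    """Simulate genotypes (0, 1 or 2 copies of an allele) and ages at onset.

    Entirely synthetic: allele frequency 0.3 and baseline age 60 are
    illustrative assumptions, not values from the study.
    """
    n_markers = marker_effects.shape[0]
    genotypes = rng.binomial(2, 0.3, size=(n_patients, n_markers)).astype(float)
    noise = rng.normal(0.0, noise_sd, size=n_patients)
    age_at_onset = 60.0 + genotypes @ marker_effects + noise
    return genotypes, age_at_onset


def fit_ridge(genotypes, age_at_onset, penalty=1.0):
    """Estimate one weight per marker with ridge-penalised least squares."""
    marker_means = genotypes.mean(axis=0)
    centred = genotypes - marker_means
    target = age_at_onset - age_at_onset.mean()
    gram = centred.T @ centred + penalty * np.eye(centred.shape[1])
    weights = np.linalg.solve(gram, centred.T @ target)
    intercept = age_at_onset.mean() - marker_means @ weights
    return weights, intercept


def predict_age_at_onset(genotypes, weights, intercept):
    """Predicted age at onset is the intercept plus the weighted marker sum."""
    return intercept + genotypes @ weights


rng = np.random.default_rng(0)
true_effects = rng.normal(0.0, 0.5, size=50)  # hypothetical marker effects

# One simulated cohort to fit the weights, a second one to check predictions.
train_g, train_age = simulate_cohort(2000, true_effects, rng)
test_g, test_age = simulate_cohort(500, true_effects, rng)

weights, intercept = fit_ridge(train_g, train_age)
predicted = predict_age_at_onset(test_g, weights, intercept)
correlation = np.corrcoef(predicted, test_age)[0, 1]
print(f"held-out correlation: {correlation:.2f}")
```

Because every marker has an explicit weight, each prediction can be traced back to the markers that produced it. A real analysis would involve far more markers and a model suited to age-at-onset data, including patients who are still disease-free.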

To be clear, this model cannot reveal a direct relationship between genes and disease onset. It can only improve the predicted probabilities of the onset of a disease. Robinson's model relies on traceable statistical computations, which black-box models lack: in those, multiple layers of abstraction mean the prediction procedure cannot be easily comprehended by humans. A complete understanding of how the prediction model works mathematically is important for making appropriate ethical decisions when using large amounts of sensitive patient data, and it also gives researchers the ability to better explain how the predictions were made.

Effective models and large genomic datasets are required to take maximum advantage of the prediction method, but they bring several concerns with them, such as privacy and data security, which matter to both the health care system and researchers. Ethical approval is required before researchers can access anonymized patient data from state-funded biobanks, and strict data security measures must be maintained.

For Robinson's model, the team used data from the UK to construct the model and data from Estonia to test its predictive potential. The data collected from Estonia were also used to develop initial personalized risk assessments for disease onset, about which the patients were informed so that they could benefit from the study by taking preventive steps.

To sum up, the model by Robinson and colleagues is a prime example of, and a fundamental step in showing, the potential of large genomic datasets for preventive health care. Together with the infrastructure of biobanks, and if backed by a robust data protection system, this model could soon enable personalized predictive medicine.


Onset of disease, statistical model, prediction probability, hypertension, high blood pressure, type 2 diabetes, heart disease, genetic marker, biobanks, predictive medicine, genomic datasets

1 comment

  • Such a predictive statistical model is most essential for cancer, as most cancers are due to genetic alterations and DNA mutations. It would be very helpful to have such a predictive model for cancer. I would be interested to learn about this predictive model in more detail.