Paroxysmal Atrial Fibrillation and Electrocardiogram Predictors

John Rondoni

MAS.622J Fall 2002

Paroxysmal Atrial Fibrillation

Paroxysmal atrial fibrillation (PAF) is the result of irregular and repeated depolarization of the atria. As a result, the atria do not contract in a coordinated manner, which prevents the heart from operating normally. The onset of atrial fibrillation can be sudden and life threatening, so there is strong interest in being able to predict it accurately.

An electrocardiogram (ECG) is a measurement of the heart's electrical activity that allows physicians to accurately diagnose a wide range of disorders. ECGs vary in the number of leads used (one to twelve or more), clarity (absence of noise distorting the heart's signal), signal strength, and in other ways.

Below are two example ECGs: the first from a normal heart, the second from a heart undergoing atrial fibrillation.

The differences, while not extreme, indicate a significant change in the behavior of the heart. A normal ECG can be broken down into three primary components: the P-wave, QRS complex, and T-wave. These different components are illustrated below.

Each of the different waves corresponds to a different electrical activity of a normal heart. The P-wave is of special interest because it corresponds to atrial depolarization. As you can see from the second ECG above, the P-wave becomes distorted during atrial fibrillation, and so it is a key indicator of the onset of PAF.

Classification Strategies

The goal of this project is to evaluate several approaches to the accurate classification of patients who have PAF based solely on their ECG. In particular we will focus on the relatively new classification technique of tree augmented naïve Bayesian (TAN) networks.

TAN networks are a way to relax the strong attribute independence assumptions made when forming naïve Bayesian networks. They do this by introducing single-link dependencies between attributes. For example, below is a typical Bayesian network in which the class is dependent upon several attributes.

A TAN network would allow the addition of dependencies between the attributes that allow the network to more closely model the real world. The above network might be changed as shown below if it were a TAN network.

To understand whether TAN networks are an appropriate choice for the problem of PAF classification, their performance will be compared to that of naïve Bayesian networks, the C4.5 classifier, and FAN networks (a further extension of TAN which allows the introduction of an arbitrary number of links).
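As background, the attribute links in a TAN network are commonly chosen by weighting every attribute pair with its conditional mutual information given the class and then keeping a maximum-weight spanning tree. The sketch below illustrates that idea on discrete data. It is not the code used in this project, and the names `conditional_mutual_information` and `tan_tree` are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from co-occurrence counts."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y|c) / (p(x|c) p(y|c)) written directly in terms of counts.
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        total += (count / n) * math.log(ratio)
    return total


def tan_tree(columns, labels, root=0):
    """Return directed attribute links (parent, child) forming a tree.

    `columns` holds one list of discrete values per attribute. The class
    node is implicitly a parent of every attribute and is not listed.
    """
    d = len(columns)
    weight = {
        (i, j): conditional_mutual_information(columns[i], columns[j], labels)
        for i in range(d)
        for j in range(i + 1, d)
    }
    in_tree, edges = {root}, []
    while len(in_tree) < d:
        # Prim's algorithm: attach the outside attribute with the strongest link.
        candidates = [(i, j) for i in in_tree for j in range(d) if j not in in_tree]
        parent, child = max(candidates, key=lambda e: weight[(min(e), max(e))])
        edges.append((parent, child))
        in_tree.add(child)
    return edges
```

Each attribute ends up with at most one attribute parent in addition to the class, which is the single-link restriction described above.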

Data

The ECG data we will be using contains 100 two-channel ECG samples from 50 different patients. Each sample is thirty minutes in length and is sampled at 128 Hz. Half of the patients are known to have PAF, while the other half have never shown any signs of cardiac disorders. Of the 50 samples from patients known to have PAF, one sample from each patient is distant from any PAF episode and one sample directly precedes an episode.

Overall, the dataset varies widely in quality. Some samples have large amounts of noise, others are missing a channel of data, and still others are completely ideal. Errors and inconsistent data could be removed by hand; however, I chose not to do so. My goal is to evaluate the abilities of the above classifiers on real-world data, and doctors would never have time to manually edit ECGs to help in automated PAF detection.

Features

Research points to several possible choices of features for PAF detection. I will focus on three: R-R intervals, P-wave analysis, and power spectral densities.

Normalized ratios of long to short R-R intervals have been shown to be a good indicator of the onset of PAF when the intervals occur within six intervals of each other. P-wave morphology is one of the standard methods of detecting PAF; the two ECGs at the top of this page are a good example of P-wave changes during PAF. Finally, during fibrillation the atria are unable to coordinate their contractions and as a result tend to flutter instead of beat, as indicated by the changes in the P-wave. It therefore seems reasonable that the power spectral density (power per unit of frequency) of a heartbeat will change during fibrillation, and that it may differ significantly between patients who do and do not suffer from PAF.

Throughout my work, when unsure about the efficacy of a feature I generally chose to include it. This is primarily because the learning techniques I will be employing inherently tend to decide based on the most effective features and ignore the rest.

R-R Intervals

The R-R intervals were easily extracted from the QRS peak markings provided in the database. Unfortunately, these were very noisy, and there was no easy way to determine whether an unusual interval was the result of a QRS detection error, noise, or an actual phenomenon. As a result, I simply discarded the extreme outliers. Some investigation indicated that this was warranted, since nearly all such outliers were the result of QRS detection errors.

The R-R intervals provided three features: a maximum long to short interval ratio (of intervals within six QRS peaks of each other), an average long to short interval ratio, and the normalized maximum ratio.
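A minimal sketch of how these three features might be computed from QRS peak positions follows. It assumes peak positions are given as sample indices at 128 Hz and that outliers have already been discarded. The normalization used in this project is not specified here, so dividing the maximum ratio by the mean ratio is only a placeholder assumption, and the function names are hypothetical.

```python
def rr_intervals(peak_samples, fs=128.0):
    """Convert QRS peak positions (sample indices) to R-R intervals in seconds."""
    return [(b - a) / fs for a, b in zip(peak_samples, peak_samples[1:])]


def rr_ratio_features(intervals, window=6):
    """Long-to-short ratios over all interval pairs at most `window` apart."""
    ratios = []
    for i in range(len(intervals)):
        for j in range(i + 1, min(i + window + 1, len(intervals))):
            longer = max(intervals[i], intervals[j])
            shorter = min(intervals[i], intervals[j])
            ratios.append(longer / shorter)
    if not ratios:
        raise ValueError("need at least two R-R intervals")
    max_ratio = max(ratios)
    mean_ratio = sum(ratios) / len(ratios)
    return {
        "max_ratio": max_ratio,
        "mean_ratio": mean_ratio,
        # ASSUMPTION: placeholder normalization, not taken from the report.
        "normalized_max_ratio": max_ratio / mean_ratio,
    }
```

For example, `rr_ratio_features(rr_intervals(peaks))` returns one dictionary of features per ECG sample, where `peaks` is the list of QRS peak indices for that sample.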

P-wave Analysis

Extraction and analysis of the P-waves was difficult due to the immense size of the database and the problem of reliably and repeatedly locating anything within it. Below are the P-waves extracted from two different individuals; the one on the left suffers from PAF (however, don’t jump to the conclusion that these are typical P-waves for either group).

The P-waves translated into several features. I included the basic statistics of each sample’s averaged P-wave: maximum, mean, and variance. In addition, I treated each averaged P-wave as its own feature vector, reduced the entire space to a single dimension using a Fisher Linear Discriminant, and included the resulting scalar in the final feature vector.
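The Fisher Linear Discriminant step can be sketched as below: the projection direction is the within-class scatter matrix inverted and applied to the difference of the class means, and each averaged P-wave is then reduced to one scalar by projecting onto that direction. This is an illustrative sketch, not the MATLAB code used in this project. The small `ridge` term is an added assumption that keeps the scatter matrix invertible when there are fewer samples than P-wave points, and all function names are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of the deviations from the class mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fisher_direction(class0, class1, ridge=1e-6):
    """Direction w = Sw^-1 (m1 - m0); ridge is an assumed regularizer."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    d = len(m0)
    sw = [[s0[i][j] + s1[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    return solve(sw, [b - a for a, b in zip(m0, m1)])


def project(w, x):
    """Scalar feature: projection of one averaged P-wave onto w."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

The value returned by `project` is the single scalar that would be appended to a sample's feature vector.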

Power Spectral Density

The power spectral density (PSD) is a description of how the power of a time signal is distributed with respect to frequency. Instead of calculating the auto-PSD of each channel of each signal I computed the cross spectral density (CSD) across both channels contained in each sample. This is valid because each channel of the ECG recorded the same electrical events, simply from a different viewpoint.

This brings up an interesting point. Since the ECG is simply the sum of several independent electrical phenomena in the heart observed at different points, it is a perfect candidate for independent component analysis (ICA). Because we only have two sensed signals, there is no chance of reconstructing an accurate picture of all the independent electrical phenomena of the heart, but it is still possible to decouple two statistically independent signals in the ECG. Below are the first and second signals (second below first) from a PAF patient and a normal individual (PAF on left).

From these plots it appears that ICA may be an effective strategy. To turn this data into features I found the PSD of each of the decoupled signals (separately this time) and used their statistics as the descriptive features.

For all PSD and CSD data, I used the mean, variance, and an approximation of the slope from 12 Hz to 50 Hz as features, computed on both log and linear scales. Below are example log plots of two second ICA components; the PAF patient is again on the left.
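As a rough illustration of the spectral features in this section, the sketch below estimates a cross-spectrum with a direct DFT and then computes the mean, variance, and a least-squares slope over the 12 Hz to 50 Hz band. It is not the MATLAB code used in this project. The periodogram-style estimator, the least-squares slope, the small offset inside the logarithm, and the function names are all assumptions made for the example.

```python
import cmath
import math


def dft(signal):
    """Direct DFT of the mean-removed signal for bins 0..n/2 (O(n^2))."""
    n = len(signal)
    mean = sum(signal) / n
    return [sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n // 2 + 1)]


def cross_spectrum(a, b, fs=128.0):
    """Cross-periodogram magnitude of two equal-length channels.

    Passing the same channel twice gives that channel's own periodogram.
    """
    n = len(a)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    power = [abs(x.conjugate() * y) / (fs * n) for x, y in zip(dft(a), dft(b))]
    return freqs, power


def line_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def spectral_features(freqs, power, lo=12.0, hi=50.0, log_scale=False):
    """Mean, variance, and lo..hi slope of a spectrum on a linear or log scale."""
    eps = 1e-12  # ASSUMPTION: offset that avoids log10(0) on empty bins
    values = [math.log10(p + eps) for p in power] if log_scale else list(power)
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    band = [(f, v) for f, v in zip(freqs, values) if lo <= f <= hi]
    slope = line_slope([f for f, _ in band], [v for _, v in band])
    return {"mean": mean, "variance": variance, "slope": slope}
```

The direct DFT is quadratic in the window length, so a full thirty-minute sample would in practice be processed in short windows or with an FFT routine.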

Classification and Results

Before going into the various results, it should be noted that the TAN and FAN classifier used was not yet fully featured. Currently it only supports discrete input data. I had hoped to find a system that would allow the use of continuous data, but none was available. This gave the C4.5 and naïve Bayesian methods an advantage, since I did test them with continuous data. All tests were run with 10-fold cross validation on the entire test set.
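For reference, 10-fold cross validation can be sketched as follows: the samples are shuffled and split into ten folds, each fold is held out once for testing while the other nine are used for training, and accuracy is pooled over all held-out predictions. The `fit` and `predict` callables are hypothetical stand-ins for whichever classifier is being evaluated; they are not the interface of Weka or of the Bayesian Network Classifier Toolbox.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Fraction of samples classified correctly while held out."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i]
                       for i in test_idx)
    return correct / len(labels)
```

With 100 samples and ten folds, every model is trained on 90 samples and tested on the remaining 10.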

Continuous Data

C4.5	68%
Naïve Bayes	56%

Discrete Data

C4.5	59%
Naïve Bayes	25%
TAN	50%
FAN	25% (space constraint → Naïve Bayes)

Overall I was most impressed with the ability of TAN to handle the comparatively weak discrete data. Given how close the naïve Bayesian approach came to the C4.5 algorithm on the continuous data, I suspect TAN would perform at least as well if it could use continuous data. The FAN result was disappointing but unavoidable: the algorithm seems to require rather prohibitive amounts of space, and when that space is lacking it performs no better than naïve Bayes.

Future Work

The most glaring problem with this analysis is the lack of a continuous TAN and FAN classifier. Until one is available, I would not call these results broadly significant.

Additionally, I would like to see future work focus on bringing in more features from the medical world. For example, weight, age, smoking habits, and diet all influence one's risk for PAF. An ideal classifier would be able to take such outside variables into account when they are available.

Sources and Recognition

I am indebted to the researchers, particularly in the medical field, whose papers allowed me to understand and analyze this dataset despite limited exposure to cardiology. Furthermore, the machine learning packages below allowed me to quickly (and repeatedly) try new things with a great deal of flexibility. Nearly all of this work was done in MATLAB.

Medical Research

[1] “Some Important R-R Interval Based Paroxysmal Atrial Fibrillation Predictors”, G Krstacic, D Gamberger, T Smuc, A Krstacic; Institute for Cardiovascular Disease and Rehabilitation, Rudjer

[2] “Atrial fibrillation: current knowledge and recommendations for management”, S. Lévy, G. Breithardt, R. W. F. Campbell, A. J. Camm, J.-C. Daubert, M. Allessie, E. Aliot, A. Capucci, F. Cosio, H. Crijns, L. Jordaens, R. N. W. Hauer, F. Lombardi and B. Lüderitz on behalf of the Working Group on Arrhythmias of the European Society of Cardiology

[3] “Detecting electrocardiogram abnormalities with independent component analysis” Seong-Bin Yim, Steven E. Noel, and Harold H. Szu; The George Washington University,

[4] “Signal-Averaged P-Wave Abnormalities and Atrial Size in Patients With and Without Idiopathic Paroxysmal Atrial Fibrillation” Naoko Ishimoto, MD, Makoto Ito, MD, PhD, and Masahiko Kinoshita, MD, PhD. First Department of Internal Medicine, Shiga University of Medical Science, Shiga, Japan.

[5] “Clinical, electrocardiographic and electrophysiological predictors of atrial fibrillation development in different cardiac substrates” Paolo ZECCHI, Antonio DELLO RUSSO, Gemma PELARGONIO, Tommaso SANNA, Italo PORTO, Loredana MESSANO and Giuseppe DE MARTINO; Istituto di Cardiologia, Università Cattolica del Sacro Cuore, Rome, Italy

[6] “Non-invasive assessment of atrial fibrillation (AF) cycle length in man: potential application for studying AF” Carl J. MEURLING (a), Leif SÖRNMO (b), Martin STRIDH (b) and S. Bertil OLSSON (a) Department of Cardiology, Lund University, Lund, Sweden (b) Department of Applied Electronics, Lund Institute of Technology, Lund, Sweden

Software Packages

[7] Bayesian Network Classifier Toolbox, Jarek Sacha

[8] Weka 3, Ian Witten and Eibe Frank