Evaluation of data preprocessings for the comparison of GC-MS chemical profiles of seized cannabis samples.
Forensic Sci Int. 2020 Mar 03;310:110228
Authors: Slosse A, Van Durme F, Samyn N, Mangelings D, Vander Heyden Y
Cannabis is the most frequently used illicit drug in Belgium, where it is mainly cultivated indoors. To improve the fight against this drug, cannabis-profiling methods are required. Cannabis is a natural product and its chemical composition depends on many factors, which cause high heterogeneity and variability in the secondary metabolites and make this study challenging. The aim of this study is to combine cannabis profiling with statistical methodology to evaluate the intra- (within-) and inter- (between-) plantation variabilities, with the goal of defining a suitable approach for linking seized marijuana to given plantations. The data set contains 46 samples from 9 locations. The chemical profiles, consisting of data from eight cannabinoids, were obtained by gas chromatography-mass spectrometry. The raw data (peak areas) were pretreated with different preprocessing methods. The Pearson correlation coefficients between intra-location profiles were calculated after each pre-treatment, and the 95 % and 99 % confidence limits were determined. All preprocessed data were then compared with the internal standard normalization reference method, with the aim of minimizing the overlap between intra- and inter-location results, i.e. reducing the number of false positives, and obtaining the best discrimination. Furthermore, cross-validation was used to evaluate the model originating from the most efficient data pre-treatment technique. The best results were obtained when the peak areas were normalized to the internal standard and the fourth root was subsequently calculated. This pre-treatment reduced the false positives for both confidence limits to 11 % and 14 %, compared with 21 % and 27 % for the reference method. Cross-validation gave false positive results similar to those for the calibration set. In conclusion, preprocessing the data yields an improved model with a significant decrease in the number of false positives. A study of the model's predictive performance indicates that it is representative of the entire plantation information.
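The best-performing pre-treatment described above (normalization to the internal standard followed by the fourth root) and the Pearson comparison of two profiles can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the peak areas, and the internal standard areas are all hypothetical, and the abstract does not specify how the confidence limits were derived, so that step is left out.

```python
import math


def preprocess(peak_areas, internal_standard_area):
    """Normalize peak areas to the internal standard, then take the fourth root."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return [(area / internal_standard_area) ** 0.25 for area in peak_areas]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical peak areas for eight cannabinoids in two samples
# (illustrative numbers only, not data from the study).
sample_a = [1200.0, 85.0, 40.0, 15000.0, 320.0, 60.0, 900.0, 25.0]
sample_b = [1100.0, 90.0, 35.0, 14000.0, 300.0, 70.0, 950.0, 20.0]
is_area_a, is_area_b = 5000.0, 4800.0  # hypothetical internal standard areas

profile_a = preprocess(sample_a, is_area_a)
profile_b = preprocess(sample_b, is_area_b)
r = pearson(profile_a, profile_b)
print(f"Pearson r after IS normalization and fourth root: {r:.4f}")
```

In a workflow like the one described, such a coefficient would be computed for every pair of profiles and compared against a limit derived from the intra-location correlations. The fourth root compresses the dynamic range, so the most abundant cannabinoids dominate the correlation less than they would with raw normalized areas.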
PMID: 32169669 [PubMed – as supplied by publisher]