Methodology

The model is based on classification and regression trees (CART), chosen because of the character of the data: categorical variables with many categories, presumed nonlinear relationships, non-normally distributed variables, and multicollinearity. A regression tree is the variant of a decision tree that is generated when the dependent variable is continuous; the independent variables can be either continuous or discrete. We used the traditional CART tree-building technique (Breiman et al., 1984), which is a form of binary recursive partitioning. The data set is successively split into subgroups that are increasingly homogeneous in the values of the dependent variable. At each node, the CART algorithm searches for the variable and the splitting value that best discriminate between the two resulting child nodes. For regression-type problems, a least-squares deviation criterion is used. An advantage of the CART model is its simple interpretation: the result is visualized as a tree structure with hierarchically ordered rules, so it is easy to follow how groups of samples are divided according to the splitting criteria and to get an overview of the fate of these substances and of the environmental variables that affect their occurrence. For a detailed description of the methodology, see Kubošová et al. (2009).
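The split search at a single node can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes one continuous predictor and measures least-squares deviation as the summed squared error around each child node's mean. The function names `sse` and `best_split` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Find the binary split `x <= threshold` that minimizes the combined
    squared error of the two child nodes.

    Returns (threshold, combined_sse), or None if no split is possible
    (fewer than two distinct predictor values).
    """
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        # A threshold can only be placed between two distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        combined = sse(left) + sse(right)
        if best is None or combined < best[1]:
            best = (threshold, combined)
    return best
```

A full tree would repeat this search over every predictor at every node, keep the best split overall, and recurse into the two child nodes until a stopping rule is met. Categorical predictors need a search over groupings of categories instead of thresholds, which this sketch does not cover.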