Humboldt-Universität zu Berlin - Statistics


The DFG funds project "Flexible density regression methods"




Project participants

 

Abstract

The goal of this project is to develop a unified framework of flexible semiparametric regression methods for densities, in order to better describe and understand relationships between variables of interest. Limitations of traditional mean-oriented parametric modeling of (scalar) data have prompted a variety of extensions to other distributional characteristics (e.g. quantiles) or to multiple distributional parameters (distributional regression). More recently, the first methods that treat probability densities as objects of statistical analysis have been developed in functional and compositional data analysis. However, these two branches, regression with flexible distributions (individual-level approaches) and statistical analysis of distributions (density-level approaches), have so far been developed independently. Our mission is to join them for fruitful mutual enrichment, facilitating methodological developments.
To illustrate the need for flexible density regression methods, we refer to an example from gender economics: the distribution of the woman's share of a couple's labor income is important for questions about gender identity norms. It is a mixed distribution on [0,1], with positive probability mass at 0 and 1 (for single-income couples) and a continuous, often bimodal, density in between (for double-income couples). It is then of interest to relate this distribution to variables that may influence it, such as (the age of) children in the household, the year, or the region of residence. Clear differences occur, for instance, between the eastern and western states of Germany. The density is of particular interest because it makes shifts in probability mass, or bimodalities due to subgroups, easily visible, and because it extends well to discrete, continuous, mixed or bivariate distributions.
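The mixed structure of the income-share example can be made concrete with a small simulation. The following is only an illustrative sketch, not the project's method: the point masses (0.15 and 0.05), the two Beta components used for the bimodal continuous part, and the function names `simulate_share` and `summarize_mixed` are all assumptions chosen for illustration. It shows how point masses at 0 and 1 and a histogram density on (0,1) together account for total probability one.

```python
import random


def simulate_share(n, p0, p1, rng):
    """Draw n shares: mass p0 at 0, mass p1 at 1, otherwise a value in (0, 1)."""
    draws = []
    for _ in range(n):
        u = rng.random()
        if u < p0:
            draws.append(0.0)  # single-income couple, woman earns nothing
        elif u < p0 + p1:
            draws.append(1.0)  # single-income couple, woman earns everything
        elif rng.random() < 0.5:
            # Assumed bimodal continuous part: equal mixture of two Beta laws.
            draws.append(rng.betavariate(8, 16))
        else:
            draws.append(rng.betavariate(20, 20))
    return draws


def summarize_mixed(shares, bins=10):
    """Estimate the two point masses and a histogram density on (0, 1)."""
    n = len(shares)
    p0_hat = sum(1 for y in shares if y == 0.0) / n
    p1_hat = sum(1 for y in shares if y == 1.0) / n
    width = 1.0 / bins
    counts = [0] * bins
    for y in shares:
        if 0.0 < y < 1.0:
            counts[min(int(y / width), bins - 1)] += 1
    # Divide by n (not by the number of interior points) so that
    # p0_hat + p1_hat + integral of the density equals 1.
    density = [c / (n * width) for c in counts]
    return p0_hat, p1_hat, density


rng = random.Random(0)
shares = simulate_share(5000, p0=0.15, p1=0.05, rng=rng)
p0_hat, p1_hat, density = summarize_mixed(shares, bins=10)
total_mass = p0_hat + p1_hat + sum(density) * 0.1
```

Scaling the continuous part by the full sample size is a deliberate choice here: it keeps the discrete and continuous components on one common probability scale, so the whole mixed distribution can be handled as a single object.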
Such analyses require methods that can model the whole distribution flexibly as a function of covariates, without parametric assumptions such as normality, and that allow for linear, nonlinear and random effects. We will address this by developing suitable methods for density regression. Depending on the data situation, density regression methods are required, and will be developed, both for density-valued data (e.g. when a histogram is provided for administrative data or is used to summarize massive data) and for individual scalar data, when interest lies in modeling the conditional density given covariates. In our unified approach, these two scenarios are two sides of the same coin rather than two different branches of statistics. We will develop methods for continuous densities (e.g. income), discrete densities, i.e. compositional data (e.g. time use over discrete categories), and mixed densities (e.g. a woman's share of household labor income). Additionally, we will extend this to bivariate densities, e.g. when jointly considering both partners of a couple within a household, which also makes it possible to overcome restrictive correlation assumptions.
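The two data situations can be illustrated with a short sketch. This is not the project's methodology: it only shows, under simple assumptions, how individual observations grouped by a covariate yield one histogram density per covariate value (density-valued data), and how a discrete density arises as a composition. The records, the time-use figures, and the helper names `histogram_density` and `closure` are hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def histogram_density(values, lo, hi, bins):
    """Piecewise-constant density on [lo, hi] that integrates to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / (len(values) * width) for c in counts]


def closure(parts):
    """Rescale non-negative parts to sum to 1 (a discrete density)."""
    total = sum(parts)
    return [p / total for p in parts]


# Made-up individual-level records: (region, income in thousand euros).
records = [
    ("east", 1.8), ("east", 2.4), ("east", 2.1), ("east", 3.0),
    ("west", 2.2), ("west", 3.6), ("west", 4.1), ("west", 2.9),
]

# Individual-level view: one scalar response per record, with a covariate.
by_region = defaultdict(list)
for region, income in records:
    by_region[region].append(income)

# Density-level view: one histogram density per covariate value.
densities = {
    region: histogram_density(values, lo=0.0, hi=5.0, bins=5)
    for region, values in by_region.items()
}

# Discrete density as a composition: made-up daily hours for
# sleep, paid work, care work, and other activities.
time_use_hours = [8.0, 7.5, 2.5, 6.0]
time_use_share = closure(time_use_hours)
```

Here the same records feed both views: kept as rows they are individual scalar data with a covariate, and aggregated per region they become density-valued observations.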