Probabilistic Modelling for Business Data

Fitting continuous distributions and zero-inflated count models to interpret uncertainty, tails, and rare/repeated events in business data.

Type

Methodological academic work

Area

Probabilistic Modelling · Business Analytics

Tools

R · fitdistrplus · glmmTMB · DHARMa · VGAM

Techniques

MLE · Goodness-of-fit · AIC / BIC · Lognormal · Gamma · Zero-inflated models

Output

Probabilistic model comparison

Value

Academic work where I compared distributions and probabilistic models using R, maximum likelihood, and fit criteria to interpret uncertainty and business-data structure.

309

Observations in continuous fit

2,894.9

Selected Lognormal AIC

1,482.9

Selected ZINB AIC

Executive summary

Methodological case showing the ability to choose distributions according to data shape, compare models with objective criteria, and validate whether a model captures tails, overdispersion, or excess zeros.

Business context

Business data often includes skewed monetary variables, low-frequency events, many zeros, or concentrated counts. Modelling them properly avoids defaulting to normality and helps interpret uncertainty, risk, and operational variability.

My role

Individual academic work. I fitted the models in R, compared alternatives using AIC/BIC and goodness-of-fit tests, and checked simulated residuals for the count models.

Data & methods

  • Fitting continuous distributions to a positive, skewed monetary variable.
  • Comparison of Normal, Lognormal, Gamma, and Weibull with fitdistrplus.
  • Poisson, Negative Binomial, ZIP, and ZINB count models with glmmTMB.
  • Diagnostics for overdispersion, zero inflation, and simulated residuals with DHARMa.
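The zero-inflation diagnostic in the last item can be illustrated with a small simulation. This is a minimal sketch in standard-library Python, not the R/DHARMa code used in the project: it fits a plain Poisson, simulates datasets from it, and compares the observed number of zeros with the simulated ones. The function names (`rpois`, `zero_ratio_check`) and the synthetic counts are assumptions made for illustration only.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def zero_ratio_check(counts, n_sim=500, seed=0):
    """Compare observed zeros with zeros in datasets simulated from a fitted Poisson."""
    rng = random.Random(seed)
    n = len(counts)
    lam_hat = sum(counts) / n  # the Poisson MLE is the sample mean
    observed = sum(1 for y in counts if y == 0)
    simulated = [
        sum(1 for _ in range(n) if rpois(lam_hat, rng) == 0)
        for _ in range(n_sim)
    ]
    expected = sum(simulated) / n_sim
    # Share of simulated datasets with at least as many zeros as observed.
    p_upper = sum(1 for s in simulated if s >= observed) / n_sim
    return observed / expected, p_upper

data_rng = random.Random(42)
# Synthetic counts: 40% structural zeros, otherwise Poisson(3). Not the project data.
counts = [0 if data_rng.random() < 0.4 else rpois(3.0, data_rng) for _ in range(400)]
ratio, p_upper = zero_ratio_check(counts)
print(f"observed/expected zeros = {ratio:.2f}, simulated share >= observed = {p_upper:.3f}")
```

A ratio well above 1, together with a small simulated share, suggests the Poisson cannot reproduce the zeros in the data, which is the situation where ZIP or ZINB become candidates.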

Process

  1. Explore empirical data shape.
  2. Propose candidate distribution families.
  3. Estimate parameters by maximum likelihood.
  4. Compare visual fit and AIC/BIC criteria.
  5. Validate diagnostics of the selected model.
  6. Translate results into business interpretation.
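Steps 3 and 4 can be sketched for two of the candidate families. The example below is an assumption-laden illustration in standard-library Python, not the project's fitdistrplus workflow: it uses the closed-form maximum-likelihood estimates for the Normal and Lognormal, then computes AIC and BIC from the maximised log-likelihood. The data are synthetic and the helper names are hypothetical.

```python
import math
import random

def normal_loglik(xs):
    """Maximised Normal log-likelihood (MLE: sample mean, variance with divisor n)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def lognormal_loglik(xs):
    """Maximised Lognormal log-likelihood: Normal fit on log(x) plus the Jacobian term."""
    logs = [math.log(x) for x in xs]
    return normal_loglik(logs) - sum(logs)

def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

random.seed(1)
# Synthetic stand-in for a positive, skewed monetary variable (not the project data).
amounts = [random.lognormvariate(4.0, 0.8) for _ in range(300)]
n = len(amounts)

for name, ll in [("Normal", normal_loglik(amounts)),
                 ("Lognormal", lognormal_loglik(amounts))]:
    print(f"{name:10s} logLik={ll:10.1f} AIC={aic(ll, 2):10.1f} BIC={bic(ll, 2, n):10.1f}")
```

Both families have two parameters here, so the AIC and BIC rankings coincide with the log-likelihood ranking; the penalty terms only matter when candidates differ in parameter count, as with Poisson versus ZINB.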

Key findings

  • For positive, skewed monetary variables, the Lognormal can clearly outperform the Normal.
  • In count data with variance far above the mean, Poisson is insufficient because it forces the variance to equal the mean; the Negative Binomial adds a dispersion parameter.
  • Excess zeros require distinguishing structural zeros from sampling zeros.
  • AIC/BIC and residual diagnostics help avoid selecting models only by intuition.
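Two of these findings reduce to short calculations. The sketch below, again in standard-library Python with hypothetical names and illustrative values, computes a variance-to-mean dispersion index and splits the zero probability of a zero-inflated Poisson (ZIP) into its structural and sampling parts, using P(Y = 0) = pi + (1 - pi) * exp(-lambda).

```python
import math

def dispersion_index(counts):
    """Sample variance divided by sample mean; values near 1 are consistent with Poisson."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    return var / mean

def zip_zero_split(pi, lam):
    """Split the ZIP zero probability into structural and sampling components.

    pi  : probability of a structural zero (the unit can never produce an event)
    lam : Poisson mean for units that can produce events
    """
    structural = pi
    sampling = (1 - pi) * math.exp(-lam)
    return structural, sampling

example_counts = [0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 5, 9]  # illustrative values only
print(f"dispersion index = {dispersion_index(example_counts):.2f}")

structural, sampling = zip_zero_split(pi=0.3, lam=2.0)
print(f"structural zeros = {structural:.3f}, sampling zeros = {sampling:.3f}, "
      f"total P(Y=0) = {structural + sampling:.3f}")
```

In business terms, a structural zero might be an account that cannot purchase at all, while a sampling zero is an active account that happened not to purchase in the period.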

Business implications

  • Transferable to prices, costs, revenue, purchase frequency, incidents, leads per account, or risk events.
  • Useful for analysts who need to quantify uncertainty and avoid simplistic distribution assumptions.

Limitations

  • The cases are academic and methodological rather than applied to real business datasets.
  • One count dataset is not suitable as a highlighted public case due to its sensitive topic; it should be communicated generically if mentioned.
  • Does not include full multivariate predictive models.

What I would do next

  • Apply to real business datasets with explanatory variables.
  • Compare with Bayesian or machine-learning models depending on the case.
  • Create scenario simulators for operational decisions.

Assets

Summary (coming soon): methodological summary can be expanded.

Notebook (coming soon): R code available for a reproducible version.

Suggested visuals

Cullen and Frey plot.

Q-Q / P-P comparison.

AIC/BIC comparison table.

Zero-inflated count model diagram.

Related projects

01

Bayesian Conversion Rate Estimation for Bank Telemarketing Campaigns

Individual academic project · CRM Analytics · Bayesian Inference · Decision Science · FirstBayes / Excel / UCI Bank Marketing dataset

Individual project where I estimated bank telemarketing conversion using Binomial-Beta Bayesian inference and predictive distributions to translate uncertainty into operational marketing expectations.

Techniques

Binomial-Beta model · Posterior update · Prior sensitivity

02

Seasonal ARIMA Forecasting of Ozone Concentration

Individual academic project · Forecasting · Time Series · Business Planning · Stata / .do script

Individual project where I modelled a monthly ozone-concentration series using Stata, Box-Jenkins methodology, and SARIMA to generate forecasts with residual diagnostics and uncertainty intervals.

Techniques

Box-Jenkins · SARIMA · ACF / PACF

03

Hedonic Pricing Analysis of European Electric Vehicles

Individual academic project · Pricing Analytics · Econometrics · Market Intelligence · R / ggplot2 / lmtest

Individual project where I analysed how range, power, and segment influence European EV prices using R, OLS regression, interaction effects, and robust errors to extract pricing and product implications.

Techniques

Hedonic pricing · OLS regression · Log-log model