Probabilistic Modelling for Business Data
Fitting continuous distributions and zero-inflated count models to interpret uncertainty, tails, and rare/repeated events in business data.
Type
Methodological academic work
Area
Probabilistic Modelling · Business Analytics
Tools
R · fitdistrplus · glmmTMB · DHARMa · VGAM
Techniques
MLE · Goodness-of-fit · AIC / BIC · Lognormal · Gamma · Zero-inflated models
Output
Probabilistic model comparison
Value
Academic work where I compared distributions and probabilistic models using R, maximum likelihood, and fit criteria to interpret uncertainty and business-data structure.
Observations in continuous fit
Selected Lognormal AIC
Selected ZINB AIC
Executive summary
Methodological case showing the ability to choose distributions according to data shape, compare models with objective criteria, and validate whether a model captures tails, overdispersion, or excess zeros.
Business context
Business data often includes skewed monetary variables, low-frequency events, many zeros, or concentrated counts. Modelling them properly avoids defaulting to normality and helps interpret uncertainty, risk, and operational variability.
My role
Individual academic work. I fitted the models in R, compared alternatives using AIC/BIC and goodness-of-fit tests, and checked simulated residuals for the count models.
Data & methods
- Fitting continuous distributions to a positive, skewed monetary variable.
- Comparison of Normal, Lognormal, Gamma, and Weibull with fitdistrplus.
- Poisson, Negative Binomial, ZIP, and ZINB count models with glmmTMB.
- Diagnostics for overdispersion, zero inflation, and simulated residuals with DHARMa.
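The continuous comparison listed above can be sketched as follows. This is a minimal illustration, assuming the fitdistrplus package is installed; the `amount` vector is synthetic (simulated here, not the project's data) and the object names are hypothetical.

```r
# Sketch only: simulated data stand in for the positive, skewed monetary variable.
library(fitdistrplus)

set.seed(123)
amount <- rlnorm(500, meanlog = 3, sdlog = 0.6)  # hypothetical right-skewed amounts

# Fit each candidate family by maximum likelihood (the default method of fitdist).
candidates <- c(Normal = "norm", Lognormal = "lnorm",
                Gamma = "gamma", Weibull = "weibull")
fits <- lapply(candidates, function(d) fitdist(amount, d))

# Collect AIC and BIC; lower values indicate a better fit/complexity trade-off.
comparison <- data.frame(
  model = names(fits),
  AIC = sapply(fits, function(f) f$aic),
  BIC = sapply(fits, function(f) f$bic),
  row.names = NULL
)
comparison <- comparison[order(comparison$AIC), ]
print(comparison)

# Goodness-of-fit statistics (Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling).
gof <- gofstat(fits, fitnames = names(fits))
print(gof)
```

Sorting by AIC puts the preferred family in the first row; the goodness-of-fit statistics give a second, distance-based view of the same ranking.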
Process
- 01 Explore empirical data shape.
- 02 Propose candidate distribution families.
- 03 Estimate parameters by maximum likelihood.
- 04 Compare visual fit and AIC/BIC criteria.
- 05 Validate diagnostics of the selected model.
- 06 Translate results into business interpretation.
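To make steps 03 and 04 concrete, the sketch below estimates a Lognormal by maximum likelihood by hand in base R and derives AIC and BIC from the maximised log-likelihood. It is an illustration under stated assumptions, not the code used in the work: the sample `x` is simulated and the function name `nll_lnorm` is hypothetical.

```r
# Sketch only: hand-rolled maximum likelihood on a simulated sample.
set.seed(42)
x <- rlnorm(300, meanlog = 2, sdlog = 0.5)  # hypothetical skewed sample

# Negative log-likelihood of a Lognormal.
# sdlog is optimised on the log scale so that it stays positive.
nll_lnorm <- function(par, data) {
  -sum(dlnorm(data, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

# Minimise the negative log-likelihood (Nelder-Mead is optim's default method).
start <- c(mean(log(x)), log(sd(log(x))))
fit <- optim(par = start, fn = nll_lnorm, data = x)

# Back-transform to the natural parameterisation.
mle <- c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))

# Information criteria from the maximised log-likelihood.
k <- length(fit$par)   # number of estimated parameters
n <- length(x)         # sample size
loglik <- -fit$value
aic <- 2 * k - 2 * loglik
bic <- k * log(n) - 2 * loglik

print(mle)
print(c(AIC = aic, BIC = bic))
```

In practice a package such as fitdistrplus does this in one call; writing it out shows that AIC and BIC are simply the log-likelihood penalised by the number of parameters.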
Key findings
- For positive, skewed monetary variables, the Lognormal can clearly outperform the Normal.
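- In count data with variance far above the mean, a Poisson model, which assumes the variance equals the mean, understates variability; a Negative Binomial can absorb that overdispersion.
- Excess zeros require distinguishing structural zeros (units that cannot produce an event) from sampling zeros (units that could have produced one but did not in the observed period).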
- AIC/BIC and residual diagnostics help avoid selecting models only by intuition.
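The count-model findings above can be illustrated with a short sketch, assuming the glmmTMB and DHARMa packages are installed. The data are simulated with a known share of structural zeros, so the 30% figure, the Negative Binomial parameters, and all object names are assumptions made for the example rather than values from the work.

```r
# Sketch only: simulated zero-inflated, overdispersed counts.
library(glmmTMB)
library(DHARMa)

set.seed(2024)
n <- 1000
structural_zero <- rbinom(n, size = 1, prob = 0.3)  # assumed 30% of units never produce events
counts <- ifelse(structural_zero == 1, 0L, rnbinom(n, size = 1.5, mu = 4))
d <- data.frame(y = counts)

# Quick overdispersion check: a Poisson would need variance close to the mean.
print(c(mean = mean(d$y), variance = var(d$y)))

# Intercept-only versions of the four candidate count models.
models <- list(
  Poisson = glmmTMB(y ~ 1, family = poisson, data = d),
  NB      = glmmTMB(y ~ 1, family = nbinom2, data = d),
  ZIP     = glmmTMB(y ~ 1, ziformula = ~1, family = poisson, data = d),
  ZINB    = glmmTMB(y ~ 1, ziformula = ~1, family = nbinom2, data = d)
)

# Compare by information criteria; lower is better.
ic <- data.frame(
  model = names(models),
  AIC = sapply(models, AIC),
  BIC = sapply(models, BIC),
  row.names = NULL
)
print(ic[order(ic$AIC), ])

# Simulation-based residual checks on one candidate (here the ZINB).
res <- simulateResiduals(models$ZINB, n = 500)
print(testDispersion(res))
print(testZeroInflation(res))
```

Running the same two DHARMa tests on the Poisson fit is a useful contrast, since that is where dispersion and zero-inflation problems would be expected to show up.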
Business implications
- Transferable to prices, costs, revenue, purchase frequency, incidents, leads per account, or risk events.
- Useful for analysts who need to quantify uncertainty and avoid simplistic distribution assumptions.
Limitations
- The cases are academic and methodological rather than applied to real business datasets.
- One count dataset is not suitable as a highlighted public case due to its sensitive topic; it should be communicated generically if mentioned.
- Does not include full multivariate predictive models.
What I would do next
- Apply to real business datasets with explanatory variables.
- Compare with Bayesian or machine-learning models depending on the case.
- Create scenario simulators for operational decisions.
Assets
Suggested visuals
Cullen and Frey plot.
Q-Q / P-P comparison.
AIC/BIC comparison table.
Zero-inflated count model diagram.
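The first two visuals could be produced along these lines with fitdistrplus. This is a hedged sketch: the data are simulated, the choice of three candidate families is an assumption for the example, and the plots are written to the active graphics device.

```r
# Sketch only: plotting helpers from fitdistrplus on simulated data.
library(fitdistrplus)

set.seed(7)
amount <- rlnorm(400, meanlog = 3, sdlog = 0.6)  # hypothetical skewed amounts

# Cullen and Frey plot: locates the sample's skewness and kurtosis
# relative to common families, with bootstrap replicates.
cf <- descdist(amount, boot = 200)

# Fit the candidates to be compared visually.
fits <- list(
  fitdist(amount, "lnorm"),
  fitdist(amount, "gamma"),
  fitdist(amount, "weibull")
)
labels <- c("Lognormal", "Gamma", "Weibull")

# Q-Q plot (emphasises the tails) next to P-P plot (emphasises the centre).
par(mfrow = c(1, 2))
qqcomp(fits, legendtext = labels)
ppcomp(fits, legendtext = labels)
```

Placing the Q-Q and P-P panels side by side makes it easier to see whether a family that looks adequate in the centre of the distribution still misses the upper tail.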
Related projects
01
Bayesian Conversion Rate Estimation for Bank Telemarketing Campaigns
Individual academic project · CRM Analytics · Bayesian Inference · Decision Science · FirstBayes / Excel / UCI Bank Marketing dataset
Individual project where I estimated bank telemarketing conversion using Binomial-Beta Bayesian inference and predictive distributions to translate uncertainty into operational marketing expectations.
Area
CRM Analytics · Bayesian Inference · Decision Science
Techniques
Binomial-Beta model · Posterior update · Prior sensitivity
02
Seasonal ARIMA Forecasting of Ozone Concentration
Individual academic project · Forecasting · Time Series · Business Planning · Stata / .do script
Individual project where I modelled a monthly ozone-concentration series using Stata, Box-Jenkins methodology, and SARIMA to generate forecasts with residual diagnostics and uncertainty intervals.
Area
Forecasting · Time Series · Business Planning
Techniques
Box-Jenkins · SARIMA · ACF / PACF
03
Hedonic Pricing Analysis of European Electric Vehicles
Individual academic project · Pricing Analytics · Econometrics · Market Intelligence · R / ggplot2 / lmtest
Individual project where I analysed how range, power, and segment influence European EV prices using R, OLS regression, interaction effects, and robust errors to extract pricing and product implications.
Area
Pricing Analytics · Econometrics · Market Intelligence
Techniques
Hedonic pricing · OLS regression · Log-log model