Creating a statistical model

Overview

A formula describes how the columns of your dataset map onto the statistical model. For example, the table below has three columns: CO2 Uptake, Fertilization, and Water Stress. Here, CO2 Uptake is the response variable, i.e., the left side of the model, and Fertilization and Water Stress are the predictors, i.e., the right side of the model. The aim is to see how CO2 Uptake depends on Fertilization and Water Stress.

CO2 Uptake | Fertilization | Water Stress
0.01       | 0.5           | yes
0.01       | 0.5           | yes
0.6        | 0.5           | no
0.6        | 0.5           | no
0.3        | 10            | yes
0.2        | 10            | yes
0.01       | 10            | no
0.01       | 10            | no

Getting Started

To create your statistical model:

  1. Select a variable from the dropdown for the left side of your model.
  2. Choose predictor variables for the right side using the variable buttons.
  3. Add any necessary arithmetic operations using the operator buttons.
  4. Click the Create Statistical Model button to build your model.

Here is an example of a simple model:

[Figure: DefineModel]

Left Side of the Statistical Model

This section allows you to define the left side of your statistical model, which typically represents the dependent variable (the outcome you want to predict).

Features

  • Variable Selection: Use the dropdown menu to choose a variable for your model.
  • Create Model Button: Click this button to build the statistical model once you have selected your variables.

Right Side of the Statistical Model

The right side represents the independent variables (predictors) that explain the outcome.

Features

  • Variable Buttons: Show the predictor variables selected for your model.
  • Arithmetic Operators: Use the provided buttons to combine variables. Each operator button serves a specific purpose:
    • Add (+): Include an additional predictor variable.
    • Subtract (-): Remove a term from the model.
    • Multiply (*): Include the main effects of the variables and their interaction; a * b is shorthand for a + b + a:b.
    • Colon (:): Include only the interaction between two variables, without their main effects.
    • Divide (/): Include nested effects together with the main effect of the outer variable; a / b is shorthand for a + a:b.
    • Nested (%in%): Include the nested effect without the main effect; b %in% a contributes only the term a:b.
    • Interaction Level: Specify the highest order of interaction to include in the model.
    • Add Arithmetic Operations (I()): Wrap an expression in I() so that its operators are evaluated as ordinary arithmetic rather than as formula operators; for example, I(a + b) uses the sum of a and b as a single predictor.
  • Editable Code Area: A text area where you can manually edit the right side of the model if needed. This is useful for users who want to refine their formulas directly.
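
As a rough illustration of the operator shorthand described above, the following Python sketch expands a two-variable expression into the model terms it stands for. It assumes the buttons follow the usual R formula conventions, which the presence of %in% and I() suggests; the function name expand is hypothetical and is not part of the tool.

```python
def expand(left: str, op: str, right: str) -> list[str]:
    """Expand `left op right` into model terms (two variables only).

    Hypothetical helper for illustration; it assumes R-style formula semantics.
    """
    interaction = f"{left}:{right}"
    if op == "+":      # both main effects, no interaction
        return [left, right]
    if op == ":":      # interaction only
        return [interaction]
    if op == "*":      # main effects plus their interaction
        return [left, right, interaction]
    if op == "/":      # main effect of left, plus right nested within left
        return [left, interaction]
    if op == "%in%":   # left nested within right, no main effects
        return [interaction]
    raise ValueError(f"unsupported operator: {op}")


for op in ["+", ":", "*", "/", "%in%"]:
    print(f"fertilizer {op} water ->", " + ".join(expand("fertilizer", op, "water")))
```

Note that : and %in% produce the same term here; they differ in intent (interaction versus nesting) rather than in the column that ends up in the model.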

Basic Formula with ~

Formula:
weight = α + β₁(height) + ε

height | weight
150    | 50.71908
160    | 60.07763
170    | 70.42381
180    | 80.13322
190    | 90.86159

Formula:
weight ~ height

term        | estimate    | std.error | statistic | p.value
(Intercept) | -100.135956 | 2.1344248 | -46.91473 | 2.13e-05
height      | 1.003406    | 0.0125122 | 80.19408  | 4.30e-06

term | βs | Standard error of βs | t-statistic = βs / Standard error of βs | p.value
Baseline weight (predicted weight if height = 0) | -98.6001595 | 1.1248337 | -87.65755 | 3.3e-06
Effect of height on weight | 0.9944479 | 0.0065939 | 150.81351 | 6.0e-07
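
To make the columns of the results table less abstract, here is a minimal Python sketch that fits weight ~ height by ordinary least squares on the five data rows shown above. It assumes the tool reports a standard least-squares fit; the variable names are illustrative. The slope, its standard error, and the t-statistic should come out close to the first results table, up to rounding of the displayed data.

```python
from math import sqrt

height = [150, 160, 170, 180, 190]
weight = [50.71908, 60.07763, 70.42381, 80.13322, 90.86159]

n = len(height)
mean_x = sum(height) / n
mean_y = sum(weight) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in height)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, weight))

# Least-squares estimates of the slope and the intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual variance with n - 2 degrees of freedom (two estimated coefficients).
residuals = [y - (intercept + slope * x) for x, y in zip(height, weight)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)

# Standard error of the slope and its t-statistic (estimate / standard error).
se_slope = sqrt(sigma2 / sxx)
t_slope = slope / se_slope

print(f"intercept = {intercept:.4f}")
print(f"slope     = {slope:.6f}")
print(f"std.error = {se_slope:.7f}")
print(f"statistic = {t_slope:.3f}")
```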

Adding Multiple Predictors with +

Formula:
price = α + β₁(size) + β₂(bedrooms) + ε

price    | size | bedrooms
200.1808 | 1000 | 2
250.5359 | 1500 | 3
300.0354 | 2000 | 3
350.9145 | 2500 | 4
400.1657 | 3000 | 4

Formula:
price ~ size + bedrooms

term        | estimate   | std.error | statistic  | p.value
(Intercept) | 98.7921656 | 0.4884948 | 202.237890 | 0.0000244
size        | 0.0988738  | 0.0003674 | 269.119270 | 0.0000138
bedrooms    | 1.1958461  | 0.3471582 | 3.444672   | 0.0749271

term | βs | Standard error of βs | t-statistic = βs / Standard error of βs | p.value
Baseline price (for size = 0, bedrooms = 0) | 98.7921656 | 0.4884948 | 202.237890 | 0.0000244
Effect of size on price | 0.0988738 | 0.0003674 | 269.119270 | 0.0000138
Effect of bedrooms on price | 1.1958461 | 0.3471582 | 3.444672 | 0.0749271
  • The + operator adds size and bedrooms as main effects.
  • A main effect is the effect of one predictor on the response, modeled independently of the other predictors.
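
The following Python sketch shows one way the estimates for price ~ size + bedrooms can be obtained from the five data rows: build a design matrix with an intercept column and one column per main effect, then solve the normal equations. It assumes an ordinary least-squares fit; the solve helper is a hypothetical name written for this illustration, and the tool itself may use a different numerical method.

```python
size = [1000, 1500, 2000, 2500, 3000]
bedrooms = [2, 3, 3, 4, 4]
price = [200.1808, 250.5359, 300.0354, 350.9145, 400.1657]

# Design matrix: intercept column, then one column per main effect.
X = [[1.0, s, b] for s, b in zip(size, bedrooms)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Normal equations: (X'X) beta = X'y.
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, price)) for i in range(k)]

intercept, b_size, b_bedrooms = solve(xtx, xty)
print(f"(Intercept) = {intercept:.4f}")
print(f"size        = {b_size:.7f}")
print(f"bedrooms    = {b_bedrooms:.4f}")
```

Each coefficient is the change in price per unit of its predictor with the other predictor held fixed, which is what modeling the predictors as main effects means.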

Model with Main Effects and Interaction Using *

Formula:
yield = α + β₁(fertilizer_Low) + β₂(water_Low) + β₃(fertilizer_Low × water_Low) + ε

yield | fertilizer | water
80    | High       | High
55    | High       | Low
77    | High       | High
56    | High       | Low
35    | Low        | High
40    | Low        | Low
48    | Low        | High
53    | Low        | Low
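
The terms fertilizer_Low, water_Low, and their product in the equation above can be read as 0/1 indicator columns. The Python sketch below builds those columns for the data rows, assuming dummy (treatment) coding with High as the reference level for both factors, which is what the _Low suffixes suggest. The function name encode is hypothetical.

```python
rows = [
    (80, "High", "High"),
    (55, "High", "Low"),
    (77, "High", "High"),
    (56, "High", "Low"),
    (35, "Low", "High"),
    (40, "Low", "Low"),
    (48, "Low", "High"),
    (53, "Low", "Low"),
]


def encode(fertilizer: str, water: str) -> list[int]:
    """Return [intercept, fertilizer_Low, water_Low, fertilizer_Low x water_Low]."""
    f_low = 1 if fertilizer == "Low" else 0
    w_low = 1 if water == "Low" else 0
    # The interaction column is the product of the two indicator columns.
    return [1, f_low, w_low, f_low * w_low]


print("yield  [1, fertilizer_Low, water_Low, interaction]")
for y, fertilizer, water in rows:
    print(f"{y:5d}  {encode(fertilizer, water)}")
```

Under this coding the intercept corresponds to the High/High group, and the interaction column is 1 only for rows where both factors are Low.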

Formula:
yield ~ fertilizer * water

term          | estimate | std.error | statistic | p.value
(Intercept)   | 78.5     | 4.663690  | 16.832167 | 0.0000730
fertilizerLow | -37.0    | 6.595453  | -5.609926 | 0.0049602
waterLow      | -20.0    | 6.595453  | -3.027830 | 0.0054674