I’ve been gathering data about my hens’ eggs, like how many eggs are laid per day and by whom. One of my baby hens, ‘Ramsey’, started laying eggs on March 21st. I weighed her eggs each day and recorded the data. The weight appears to increase gradually over time.
|Day|Egg Weight (grams)|
I first experimented with a linear regression (y = mx + b) in Python to find the line of best fit. Once I plotted the data I could tell the trend was not linear, so I built a quadratic regression (y = ax^2 + bx + c) instead.
    # Set up Quadratic Regression
    def calculate_error(a, b, c, point):
        (x_point, y_point) = point
        y = a * x_point**2 + b*x_point + c  # Quadratic
        distance = abs(y - y_point)
        return distance

    def calculate_all_error(a, b, c, points):
        total_error = 0  # Set initial value before starting loop calculation
        for point in points:
            total_error += calculate_error(a, b, c, point)
        return total_error
I entered the egg weight data as a list (datapoints) and iterated over a range of a, b, and c values to find the combination that gives the smallest possible error, meaning the smallest total absolute distance between the regression curve and the actual values. I started with a, b, and c set to 0 and smallest_error set to infinity, then replaced them each time a combination produced a smaller error than the best one so far.
    # Ramsey Egg Data
    datapoints = [
        (0,39), (1,42), (2,42), (3,43), (4,47), (5,44),
        (6,44), (7,43), (8,44), (9,46), (10,50), (11,55)
    ]

    a_list = list(range(80,100))
    possible_as = [num * .001 for num in a_list]  # a from 0.080 to 0.099 in steps of 0.001
    b_list = list(range(-10,10))
    possible_bs = [num * .001 for num in b_list]  # b from -0.010 to 0.009 in steps of 0.001
    c_list = list(range(400,440))
    possible_cs = [num * .1 for num in c_list]  # c from 40.0 to 43.9 in steps of 0.1

    smallest_error = float("inf")
    best_a = 0
    best_b = 0
    best_c = 0

    # Try every (a, b, c) combination and keep the one with the smallest total error
    for a in possible_as:
        for b in possible_bs:
            for c in possible_cs:
                loop_error_calc = calculate_all_error(a, b, c, datapoints)
                if loop_error_calc < smallest_error:
                    best_a = a
                    best_b = b
                    best_c = c
                    smallest_error = loop_error_calc

    print(smallest_error, best_a, best_b, best_c)
    print("y = ",best_a,"x^2 + ",best_b,"x + ", best_c)
Ultimately I got the following results:
y = 0.084 x^2 + -0.01 x + 41.7
This gives a total absolute error of 19.828.
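One sanity check worth running on a grid search like this is whether any winning value sits on the boundary of the range that was searched. If it does, the true minimum may lie outside the grid. The following is a minimal sketch of that check, using the ranges and results from this post; the helper name `on_grid_edge` is made up for illustration.

```python
import math

# Search grids copied from the post.
possible_as = [num * .001 for num in range(80, 100)]
possible_bs = [num * .001 for num in range(-10, 10)]
possible_cs = [num * .1 for num in range(400, 440)]

# Best values reported in the post.
best_a, best_b, best_c = 0.084, -0.01, 41.7


def on_grid_edge(value, grid):
    """Hypothetical helper: True if value matches the first or last grid entry."""
    return math.isclose(value, grid[0]) or math.isclose(value, grid[-1])


for name, value, grid in [
    ("a", best_a, possible_as),
    ("b", best_b, possible_bs),
    ("c", best_c, possible_cs),
]:
    if on_grid_edge(value, grid):
        print(f"{name} = {value} is at the edge of its range; consider widening it")
    else:
        print(f"{name} = {value} is inside its range")
```

Here the best b of -0.01 is the lowest b value in the grid, so widening the b range (for example toward more negative values) would be a reasonable next experiment.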
This error feels big to me. I would like to get it as close to 0 as possible, or at least into single digits. One thing I may do is remove the data point for day 4 (47 grams), which was unusually large.
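To see how much each egg contributes to the total, the error can be broken down point by point. This is a small sketch, assuming the coefficients reported above; the names `residual` and `trimmed` are illustrative and not part of the original code.

```python
datapoints = [
    (0, 39), (1, 42), (2, 42), (3, 43), (4, 47), (5, 44),
    (6, 44), (7, 43), (8, 44), (9, 46), (10, 50), (11, 55),
]

# Coefficients reported in the post.
a, b, c = 0.084, -0.01, 41.7


def residual(point):
    """Absolute distance between the fitted curve and one data point."""
    x, y = point
    return abs(a * x**2 + b * x + c - y)


# Pair each residual with its point so the largest miss is easy to find.
residuals = [(residual(p), p) for p in datapoints]
total_error = sum(r for r, _ in residuals)
worst_error, worst_point = max(residuals)

print("total absolute error:", round(total_error, 3))
print("largest miss:", worst_point, round(worst_error, 3))

# Dropping the day 4 point only changes the input list;
# the grid search itself would then be rerun on `trimmed`.
trimmed = [p for p in datapoints if p != (4, 47)]
```

Keep in mind that the total error grows with the number of points, so dividing by `len(datapoints)` gives an average miss per egg that is easier to compare across datasets of different sizes.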
I also plotted the data in an Excel graph and added a quadratic regression line. The resulting regression line is y = 0.0972x^2 - 0.1281x + 41.525. This is close to my Python quadratic regression, but not the same. I’d like to figure out why the two differ when the model is the same. I believe this may have to do with the error formula: I am minimizing Total Absolute Error, whereas the more common standard is to minimize Mean Squared Error.
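One way to test that hunch is to fit the quadratic by least squares directly and score both fits under both metrics. The sketch below is written under a few assumptions: it solves the least-squares normal equations in plain Python to avoid extra dependencies, and the helper names (`solve_3x3`, `least_squares_quadratic`, and the two metric functions) are invented for illustration. Note also that Excel's b of -0.1281 falls outside the b range searched in the Python code (-0.010 to 0.009), which may account for part of the gap independently of the error formula.

```python
datapoints = [
    (0, 39), (1, 42), (2, 42), (3, 43), (4, 47), (5, 44),
    (6, 44), (7, 43), (8, 44), (9, 46), (10, 50), (11, 55),
]


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(rows[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (rows[r][n] - known) / rows[r][r]
    return solution


def least_squares_quadratic(points):
    """Fit y = a*x^2 + b*x + c by minimizing the sum of squared errors."""
    def s(power):   # sum of x**power
        return sum(x**power for x, _ in points)

    def sy(power):  # sum of x**power * y
        return sum(x**power * y for x, y in points)

    matrix = [
        [s(4), s(3), s(2)],
        [s(3), s(2), s(1)],
        [s(2), s(1), s(0)],
    ]
    return solve_3x3(matrix, [sy(2), sy(1), sy(0)])


def total_abs_error(coeffs, points):
    a, b, c = coeffs
    return sum(abs(a * x**2 + b * x + c - y) for x, y in points)


def mean_squared_error(coeffs, points):
    a, b, c = coeffs
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in points) / len(points)


ls_fit = least_squares_quadratic(datapoints)
grid_fit = (0.084, -0.01, 41.7)  # result of the grid search in this post

for label, fit in [("least squares", ls_fit), ("grid search", grid_fit)]:
    print(label, [round(v, 4) for v in fit],
          "total abs error:", round(total_abs_error(fit, datapoints), 3),
          "MSE:", round(mean_squared_error(fit, datapoints), 3))
```

The least-squares coefficients should land close to the Excel trendline, since Excel's polynomial trendline is also a least-squares fit. If the two fits rank differently under the two metrics, that supports the idea that the choice of error formula explains at least some of the difference.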