I’ve been gathering data about my hens’ eggs, like how many eggs are laid per day and by whom. One of my baby hens ‘Ramsey’ started laying eggs on March 21st. I weighed the eggs each day and recorded the data. The weight appears to increase gradually over time.
| Day | Egg Weight (grams) |
| --- | --- |
| 0 | 39 |
| 1 | 42 |
| 2 | 42 |
| 3 | 43 |
| 4 | 47 |
| 5 | 44 |
| 6 | 44 |
| 7 | 43 |
| 8 | 44 |
| 9 | 46 |
| 10 | 50 |
| 11 | 55 |
I experimented with fitting a linear regression (y = mx + b) in Python to find the line of best fit. When I plotted the data I could tell the trend was not linear, so I then constructed a quadratic regression (y = ax^2 + bx + c).
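For comparison, that first linear fit can be sketched with NumPy's `np.polyfit` (this is not the grid-search method used below; `polyfit` minimizes squared error, which is the standard least-squares line):

```python
import numpy as np

# Day numbers and egg weights from the table above
days = np.arange(12)
weights = np.array([39, 42, 42, 43, 47, 44, 44, 43, 44, 46, 50, 55])

# Degree-1 least-squares fit: returns (slope m, intercept b)
m, b = np.polyfit(days, weights, 1)
print(f"y = {m:.4f}x + {b:.4f}")
```

The residuals of this line curve upward at both ends, which is what suggests a quadratic model instead.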
```python
# Set up quadratic regression error functions
def calculate_error(a, b, c, point):
    """Absolute vertical distance between the curve and one data point."""
    (x_point, y_point) = point
    y = a * x_point**2 + b * x_point + c  # quadratic model
    distance = abs(y - y_point)
    return distance

def calculate_all_error(a, b, c, points):
    """Total absolute error across all data points."""
    total_error = 0  # set initial value before starting loop calculation
    for point in points:
        total_error += calculate_error(a, b, c, point)
    return total_error
```
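As a quick sanity check on these helpers, here is a tiny worked example (the functions are repeated so the snippet runs on its own): for the curve y = x^2, the point (2, 5) sits |4 − 5| = 1 unit from the curve.

```python
# Same helper functions as above, repeated so this snippet runs on its own
def calculate_error(a, b, c, point):
    (x_point, y_point) = point
    y = a * x_point**2 + b * x_point + c
    return abs(y - y_point)

def calculate_all_error(a, b, c, points):
    total_error = 0
    for point in points:
        total_error += calculate_error(a, b, c, point)
    return total_error

# For y = x^2 (a=1, b=0, c=0): (2, 5) is |4 - 5| = 1 away
print(calculate_error(1, 0, 0, (2, 5)))  # → 1
# (0, 0) is on the curve, (3, 10) is 1 away: total 0 + 1 + 1 = 2
print(calculate_all_error(1, 0, 0, [(0, 0), (2, 5), (3, 10)]))  # → 2
```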
I entered the egg weight data as a list (datapoints) and iterated over ranges of a, b, and c values to find the combination that gives the smallest possible error (the smallest total absolute distance between the regression curve and the actual values). I set initial values of a, b, and c to 0 and smallest_error to infinity, and replaced them each time a combination produced a smaller error than the previous best.
```python
# Ramsey egg data: (day, weight in grams)
datapoints = [
    (0, 39), (1, 42), (2, 42), (3, 43), (4, 47), (5, 44),
    (6, 44), (7, 43), (8, 44), (9, 46), (10, 50), (11, 55),
]

# Candidate coefficient values for the grid search
possible_as = [num * 0.001 for num in range(80, 100)]
possible_bs = [num * 0.001 for num in range(-10, 10)]
possible_cs = [num * 0.1 for num in range(400, 440)]

smallest_error = float("inf")
best_a = 0
best_b = 0
best_c = 0

for a in possible_as:
    for b in possible_bs:
        for c in possible_cs:
            loop_error_calc = calculate_all_error(a, b, c, datapoints)
            if loop_error_calc < smallest_error:
                best_a = a
                best_b = b
                best_c = c
                smallest_error = loop_error_calc

print(smallest_error, best_a, best_b, best_c)
print("y =", best_a, "x^2 +", best_b, "x +", best_c)
```
Ultimately I got the following results:
y = 0.084x^2 - 0.01x + 41.7

which gives a total absolute error of 19.828.
This error feels big to me; I would like to get it as close to 0 as possible, or at least into single digits. One thing I may try is removing the day 4 data point (47 grams), which was unusually large.
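One way to sketch that experiment: filter out day 4 and re-run the same grid search. The helper and grid are repeated (and restructured into a hypothetical `grid_search` function) so this snippet runs on its own; since the trimmed set is a subset of the full data, the best total error over it can only stay the same or drop.

```python
# Grid search with and without the day-4 outlier (self-contained sketch)
datapoints = [(0, 39), (1, 42), (2, 42), (3, 43), (4, 47), (5, 44),
              (6, 44), (7, 43), (8, 44), (9, 46), (10, 50), (11, 55)]

def total_abs_error(a, b, c, points):
    return sum(abs(a * x**2 + b * x + c - y) for (x, y) in points)

def grid_search(points):
    """Return (error, a, b, c) for the best combination on the same grid as above."""
    best = (float("inf"), 0, 0, 0)
    for a in [n * 0.001 for n in range(80, 100)]:
        for b in [n * 0.001 for n in range(-10, 10)]:
            for c in [n * 0.1 for n in range(400, 440)]:
                err = total_abs_error(a, b, c, points)
                if err < best[0]:
                    best = (err, a, b, c)
    return best

without_day4 = [p for p in datapoints if p[0] != 4]
err_all, *_ = grid_search(datapoints)
err_trimmed, a_t, b_t, c_t = grid_search(without_day4)
print(err_all, err_trimmed)
```

Note that a smaller total here is partly bookkeeping: with one point removed there is one less distance to add up, so the fairer comparison is error per point.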
I also plotted the data in an Excel graph and added a quadratic regression trendline. The resulting equation is y = 0.0972x^2 - 0.1281x + 41.525. This is close to my Python quadratic regression, but not the same, and I'd like to figure out why they differ when the models look similar. I believe it may come down to the error formula: I am minimizing Total Absolute Error, whereas the more common standard is to minimize Mean Squared Error.
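That hypothesis is easy to test: Excel's trendline is a least-squares fit, and NumPy's `np.polyfit` minimizes the same squared-error criterion, so fitting the data with it should land on Excel's coefficients rather than mine:

```python
import numpy as np

days = np.arange(12)
weights = np.array([39, 42, 42, 43, 47, 44, 44, 43, 44, 46, 50, 55])

# Degree-2 least-squares fit (same criterion as Excel's trendline);
# polyfit returns coefficients from highest degree down: a, b, c
a, b, c = np.polyfit(days, weights, 2)
print(f"y = {a:.4f}x^2 + {b:.4f}x + {c:.4f}")  # ≈ Excel's 0.0972, -0.1281, 41.525
```

If this matches Excel and not the grid-search result, that confirms the difference is the error metric (squared vs. absolute), not a bug in either fit.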
