Ramsey’s First Eggs – Python Loop Regressions

I’ve been gathering data about my hens’ eggs, like how many eggs are laid per day and by whom. One of my baby hens, ‘Ramsey’, started laying eggs on March 21st. I weighed the eggs each day and recorded the data, and the weight appears to increase gradually over time.

Day    Egg Weight (grams)
0      39
1      42
2      42
3      43
4      47
5      44
6      44
7      43
8      44
9      46
10     50
11     55

I experimented with creating a linear regression (y = mx + b) to find the line of best fit using Python. When I plotted the data I could tell it was not linear, so I then constructed a quadratic regression (y = ax^2 + bx + c).
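For reference, the linear version I tried first had the same shape as the quadratic code below; a minimal sketch (the function name here is just illustrative, not my original code) would be:

def calculate_error_linear(m, b, point):
  (x_point, y_point) = point
  y = m * x_point + b  # Linear prediction
  return abs(y - y_point)  # Absolute distance from the observed weight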

# Set up Quadratic Regression

def calculate_error(a, b, c, point):
  (x_point, y_point) = point
  y = a * x_point**2 + b * x_point + c # Predicted weight from the quadratic model
  distance = abs(y - y_point) # Absolute distance between prediction and observation
  return distance

def calculate_all_error(a, b, c, points):
  total_error = 0 # Set initial value before starting loop calculation

  for point in points:
    total_error += calculate_error(a, b, c, point)
  return total_error
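As a quick sanity check on these helpers (the coefficient values here are just illustrative, not fitted ones):

# With a = 0.1, b = 0, c = 39 the model predicts 0.1 * 2**2 + 39 = 39.4 at x = 2,
# so the error against the point (2, 42) is |39.4 - 42| = 2.6
print(calculate_error(0.1, 0, 39, (2, 42)))  # ~2.6
print(calculate_all_error(0.1, 0, 39, [(0, 39), (2, 42)]))  # 0 + 2.6 = ~2.6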

I entered the egg weight data as a list (datapoints) and iterated over ranges of a, b, and c values to find which combination would give the smallest error possible (the smallest total absolute distance between the regression curve and the actual values). I set initial values of a, b, and c to 0 and smallest_error to infinity, and updated (replaced) them each time the error value was smaller than before.

# Ramsey Egg Data
datapoints = [
  (0,39),
  (1,42),
  (2,42),
  (3,43),
  (4,47),
  (5,44),
  (6,44),
  (7,43),
  (8,44),
  (9,46),
  (10,50),
  (11,55)
]

a_list = list(range(80,100))
possible_as = [num * .001 for num in a_list] # candidate a values: 0.080 to 0.099
b_list = list(range(-10,10))
possible_bs = [num * .001 for num in b_list] # candidate b values: -0.010 to 0.009
c_list = list(range(400,440))
possible_cs = [num * .1 for num in c_list] # candidate c values: 40.0 to 43.9

smallest_error = float("inf")
best_a = 0
best_b = 0
best_c = 0

for a in possible_as:
  for b in possible_bs:
    for c in possible_cs:
      loop_error_calc = calculate_all_error(a, b, c, datapoints)
      if loop_error_calc < smallest_error:
        best_a = a
        best_b = b
        best_c = c
        smallest_error = loop_error_calc

print(smallest_error, best_a, best_b, best_c)
print("y = ",best_a,"x^2 + ",best_b,"x + ", best_c)

Ultimately I got the following results:

y = 0.084x^2 - 0.01x + 41.7

which gives a total absolute error of 19.828.

This error feels big to me. I would like to get it as close to 0 as possible, or at least into single digits. One thing I may do is remove the day-4 data point (47 grams), which was unusually large.
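If I go that route, a minimal sketch of dropping that point and re-scoring the fitted curve (reusing datapoints, calculate_all_error, and the best_* values from the code above) could look like this:

# Drop the unusually heavy day-4 egg (4, 47) and re-check the total error
datapoints_no_outlier = [(x, y) for (x, y) in datapoints if x != 4]
print(calculate_all_error(best_a, best_b, best_c, datapoints_no_outlier))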

I plotted the data in an Excel graph and added a quadratic regression line as well. The resulting regression line is y = 0.0972x^2 - 0.1281x + 41.525. This is close to my Python quadratic regression, but not the same. I’d like to figure out why these differ when the model is the same. I believe this may have to do with the error formula: I am minimizing Total Absolute Error, whereas the more common standard (and what Excel’s least-squares trendline minimizes) is Mean Squared Error.
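To compare on the same terms, a mean-squared-error version of my error function would look roughly like this (a sketch; the function name is my own):

def calculate_mse(a, b, c, points):
  total_squared_error = 0
  for (x_point, y_point) in points:
    y = a * x_point**2 + b * x_point + c  # Quadratic prediction
    total_squared_error += (y - y_point)**2  # Square the residual instead of taking abs
  return total_squared_error / len(points)  # Mean over all points

Swapping this in for calculate_all_error in the grid search should push the loop results closer to Excel’s coefficients.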

Note how the data points do not follow linear growth, hence quadratic time!