Your Personal Database (PostgreSQL)

I was reading Choose FI: Your Blueprint to Financial Independence and one of the chapters concluded with a question like:
“What would you do if you didn’t have to work?”

Something rose to the surface. Even if I didn’t need to work to earn money, I would still practice data analysis using SQL.

This awakened my desire to set up a SQL server-database for personal use. Back-end database access where I can write queries. I miss this dearly from my previous job, where I had an in-house electronic record system and superuser access. I’ve tasted the forbidden fruit and cannot go back to measly front-end, web-browser button clicking to configure reports with limited functionality and flexibility. The power of back-end querying is what I seek, but this is challenging when my company doesn’t currently have a database. Setting one up is notoriously hard, even for professional developers.

I emerged through some struggles to set up a personal SQL database so I can practice queries with my own data. I like the IDE called Datagrip by Jetbrains (free with a student email address) and PostgreSQL (also free) which is what I used in the previous job. Here’s how to set it up.

Step 1: Download PostgreSQL
It’s free.
https://www.postgresql.org/download/

Step 2: Install PostgreSQL and set up postgres User Password and Port.

The super user credential will be used to set up the database connection in the IDE.
Username is postgres (by default). You define the Password.

The default port of 5432 worked for me and should work for most people.

Step 3: Complete PostgreSQL Installation. Restart computer to apply downloaded updates.

Step 4: Download and set up DataGrip.
It’s free with a student email account. There are other free IDEs such as DBeaver too.
https://www.jetbrains.com/help/datagrip/postgresql.html

Step 5: Set up the database in DataGrip.
In the “Database” pane on top left, click the + icon > new Data Source > PostgreSQL.

Give it a name. I called it Personal Postgres.

Use localhost, port 5432, and Authentication type as User & Password.
Enter the User: Postgres and the Password you defined in step 2. Choose your Save password preference (Forever is convenient for a personal computer).

Test the connection. If it works, then hit Apply and OK.

Note: If you get an error message like this, that means the PostgreSQL was not installed correctly (step 2).
You MUST use the username and password. The “No Auth” feature did not work for me.

Step 6: Savor the connection!
The database will take a few minutes to connect to an online server so that you can use PostgreSQL SQL functions. If you have very strict firewall settings on your computer, you might need to allow Windows firewall or similar to allow the 5432 port connection.

If everything is good, you’ll get a small Connected status update on the bottom right Event Log:


In a future post, I’ll share how to upload your first database table from a CSV file.

Happy querying!

Advertisement

Jan & Feb Egg Count

As a Chicken Tender, I raise some of the happiest chickens on the planet.
And as an analytical data tender, I like to track how many eggs have been laid and by whom.

People ask me how many eggs I collect in a week, and I’m a nut about calculating costs too. So I will share monthly updates on how many eggs were gathered and the average cost per dozen. Figuring out these numbers is satisfying. It’ll be interesting to see how the egg count and cost per dozen changes as we approach June where the longer daylight results in more egg laying, and then tapers down as we approach winter solstice.

I currently raise 6 layer hens. They eat organic chicken feed and garden greens, range freely, and slurp up tasty worms like noodles.

Here’s our monthly Egg Count for January and February 2023:

Month (2023)Laying HensEggs per Day (avg)Eggs per WeekDozens per WeekEggs per MonthDozens per MonthFeed CostCost per Dozen
Jan62.719.41.6836.9$ 30.00$ 4.30
Feb63.827.72.31078.9$ 30.00$ 3.33

Observations:

– Two of the reinas (2+ year old hens) resumed laying when they finished their winter molt (feather shedding and regrowth) in late Jan and early Feb. This increased the Feb egg count.
– The three bebitas (1 year old hens) laid daily last summer, but this tapered down to a rate of 0.8 per hen (or 4 eggs every 5 days) in January. This is to be expected due to short daylight.

Awaken the bread machine!

Last year, I received a bread machine from a neighbor who had previously received it from another neighbor. It sat unused for years at her home so she passed it on. Alas, the tradition continued as it sat on my shelf for many months, yearning for yeast and flour.

I resolved that 2023 was going to be different. I would awaken this bread machine from its deep slumber.

Also, I jogged with a friend along the lake, and we chatted about schmalzkuchen (German beignets), sourdough, and this bread machine. Thank you to this friend for sharing the joy of running and making bread!

Here is a 1 pound loaf, which is the smallest size possible with this machine. The machine kneaded away my fears with its automatic mixing, gentle heat application as the dough rose (proofing), and finally the actual baking. It took 2.5 hours from start to finish with very little hands on and cleaning effort. This cube loaf is chewy and tastes like normal bread! The crust is a little tougher through than store-bought loaves, but I will eventually figure out how to amend this. The yield is a perfect amount for me to enjoy in one week, 1 to 2 half slices per day as part of breakfast or a snack with vegan butter and homemade invasive blackberry jam.

Also: The proofed dough is an exquisite texture that I highly recommend patting. Far better than a baby’s bum!

I was curious…
How much did this 1 pound loaf of homemade bread cost?

IngredientGrams in packageCost of packageCost/gramGrams in loafCost for loaf
Water240g (1 cup)$00240g$0
Olive Oil (organic)1832$15.99$0.00872828g$0.24
Honey
(organic, raw)
2040$19.99$0.00979930g$0.29
Salt453$2.00$0.0044158g$0.04
All-Purpose Flour
(organic, 5 lb bag)
2268$3.49$0.001539300g$0.46
Active Yeast (2 lb)908$7.49$0.0082494.2g
1.5 teaspoons
$0.03
Sunflower Seeds, Toasted, Unsalted, Shelled454$2.99$0.00658660g$0.40
1 Loaf Total670.2g
(about 1 lb)
$1.47
Note: All ingredients were from Costco except for the flour (Grocery Outlet) and Sunflower Seeds (Trader Joe’s).

So I made 1 week’s worth of organic bread for just 147 cents! Definitely a tasty and inexpensively repeatable activity. Next time I will mix sunflower seeds into the dough (I love seed-rich, “bird food” bread) and maybe use a third or half whole wheat flour for more nuttiness.

*Update February 19, 2023*
I’ve now made this recipe 4 times. By happy accident, I discovered that doubling the honey from 15g to 30g somehow produced a softer, chewier bread (did not dry out as quickly) without noticeable sweetness. I also add 60g of sunflower seeds which gets mixed into the dough and gives nice texture. Trader Joe’s has the best value on these. The final bread weight is lighter than adding all the raw ingredients due to water evaporating as it bakes.

If you have a bread machine, you can follow the Ingredient + Grams in Loaf columns to get the recipe. I poured the ingredients in this order (top to bottom) in the Breadman Plus machine.

Compounding Knowledge

It’s been one year since I started studying programming using Codecademy.com. I set out to study 4 to 5 times a week, every week, 1 lesson page at a time. My longest streak on record is 12 weeks in a row. I’ve completed 86% of the Learn Python 3 course (a hefty course that covers programming fundamentals) and finished the Command Line course too (Linux terminal is not so scary anymore!)

I just finished an online project called ‘Fending the Hacker’ where I read and write to CSV and JSON files programmatically with Python. I didn’t realize this till the end, but this project built on prior lesson topics:

  • Functions
  • Loops
  • Lists
  • Dictionaries
  • Modules (JSON, CSV)
  • Files – Programmatic File Reading/Writing

Looking back on what I’m comfortable with now and how much I’ve learned in one year amazes me. I don’t look back much nor often. But I recall a sinking, confused feeling about not understanding loops, when to use a function, and the purpose of lists and dictionaries. Now I can’t imagine doing any Python analysis or mini project without loops and lists at a minimum. I’m comfortable using them, something distinctly different from before.

This shows me the power of bite-sized but consistent practice. Most lesson topics are divided into about a dozen pages, and I do the reading and practice for 1-2 lesson pages each sitting. That’s 10 minutes or less of light and easy studying. I don’t let long stretches of days pass between each sitting. Recently I’ve shifted my Python study time to earlier in the day to ensure I get it done. I feel the power of compounding knowledge and love it. Is this what the power of compounding interest is also like? The journey along the way has actually been fun.

Onward to the next and final lesson of Python 3, Classes!

Ramsey’s First Eggs – Python Loop Regressions

I’ve been gathering data about my hens’ eggs, like how many eggs are laid per day and by whom. One of my baby hens ‘Ramsey’ started laying eggs on March 21st. I weighed the eggs each day and recorded the data. The weight appears to increase gradually over time.

DayEgg Weight (grams)
039
142
242
343
447
544
644
743
844
946
1050
1155

I experimented with creating a linear regression (y = mx + b) to find the line of best fit using Python. I plotted the data and could tell this was not linear, so then I constructed a quadratic regression (y = ax^2 + bx + c).

# Set up Quadratic Regression

def calculate_error(a, b, c, point):
  (x_point, y_point) = point
  y = a * x_point**2 + b*x_point + c # Quadratic
  distance = abs(y - y_point)
  return distance

def calculate_all_error(a, b, c, points):
  total_error = 0 # Set initial value before starting loop calculation

  for point in points:
    total_error += calculate_error(a, b, c, point)
  return total_error

I entered the egg weight data as a list (datapoints), and iterated over a range of a, b, and c values to find what combination of a, b, and c would give the smallest error possible (smallest absolute distance between the regression line and actual values). I set initial values of a, b, and c = 0 and smallest_error = infinity and updated (replaced) them each time the error value was smaller than before.

# Ramsey Egg Data
datapoints = [
  (0,39),
  (1,42),
  (2,42),
  (3,43),
  (4,47),
  (5,44),
  (6,44),
  (7,43),
  (8,44),
  (9,46),
  (10,50),
  (11,55)
]

a_list = list(range(80,100))
possible_as = [num * .001 for num in a_list] #your list comprehension here
b_list = list(range(-10,10))
possible_bs = [num * .001 for num in b_list] #your list comprehension here
c_list = list(range(400,440))
possible_cs = [num * .1 for num in c_list] #your list comprehension here

smallest_error = float("inf")
best_a = 0
best_b = 0
best_c = 0

for a in possible_as:
  for b in possible_bs:
    for c in possible_cs:
      loop_error_calc = calculate_all_error(a, b, c, datapoints)
      if loop_error_calc < smallest_error:
        best_a = a
        best_b = b
        best_c = c
        smallest_error = loop_error_calc

print(smallest_error, best_a, best_b, best_c)
print("y = ",best_a,"x^2 + ",best_b,"x + ", best_c)

Ultimately I got the following results:

y = 0.084 x^2 + -0.01 x + 41.7
Which gives a total error of 19.828.

This error feels big to me. I would like to get it as close to 0 as possible, or within single digits. One thing I may do is remove the data point of day 4, 47grams, which was unusually large.

I plotted the data in an Excel graph and added a quadratic regression line as well. The resulting regression line is y = 0.0972x2 – 0.1281x + 41.525. This is close to my Python quadratic regression, but not the same. I’d like to figure out why these differ when the model is similar. It believe this may have to do with formula of error calculation – I am using Total Absolute Error, whereas the more common standard is to get Mean Squared Error.

Note how the data points do not follow linear growth, hence quadratic time!

TidyTuesday Tidbits from Animal Crossing

There is a popular video game captivating folks at home these days called “Animal Crossing”. When I first heard this, I imagined animals crossing a road to get to the other side, like the old game Frog Xing or Frogger. Alas, the game has nothing to do with crossing busy traffic roads!

“Animal Crossing” is a Japanese game whose original title (Doubutsu No Mori どうぶつの森) translates to something more like “Animal Forest”. From what I gather, it’s a SIMs-like game where you build up an island, interacting with both virtual (in-game) and real characters (actual people). My friend was totally unfazed when I showed her a picture of fish whose face only a mother could love (aka a Sculpin) because she’d grown accustomed to catching valuable tarantulas by night and selling them by day for the game. A rather strange-sounding way to make money…

Every Tuesday, some wonderful people share an interesting data set that becomes a source of data analysis. It’s a great way to practice “data wrangling” – that is, filtering, cleaning, and taking care of problematic data in a quest to tame it, understand it, and find some interesting insights. It’s called #TidyTuesday and comes with a fresh dataset, background information, and article that makes use of the data. This past Tuesday the data just so happened to pertain to Animal Crossing!

The “Tidy” aspect of “TidyTuesday” refers to ‘cleaning’ or ‘tidying’ the data. In R Programming, there’s a package called the “Tidyverse” that comes with many tools and enhanced functions great for tidying data. Why on Tuesday? I’m not sure, but it does add a spark of excitement to this day of the week for me. #TidyFriday or #TidyThursday wouldn’t sound bad, either.

This week’s Animal Crossing data included stats on all of the villagers (computer characters) on the island and items that can be bought/sold. There’s a plethora of information on each, like a villager’s personality type, birthday, and theme song! I decided to practice my R data wrangling skills while answering curiosities that popped into my mind as I examined this data.

(Check out my github site for the full analysis & R code, which I’ll update every Tuesday as I’m able!)

Items Analysis

# Analysis 1: Which items can be bought with ‘miles’ and then sold for bells?
# Answer: 19 “Nook Inc” items from the Nook Miles System. Not the e-reader Nook, from the Tanuki character Tom Nook.

items_miles % filter(buy_currency == "miles" & sell_currency == "bells")
view(items_miles)

# Analysis 2: Which ‘items’ give the highest profit (sell minus buy value)?
# Answer: None. All items have a sell_value < buy_value.
I guess you can’t sell things for more than you bought them for…

# Analysis 3: Which items have a greatest difference between buy and sell value?
# Answer: Royal Crown, Crown, Gold Armor, Golden Casket (?!?), Grand Piano

items_bells % filter(buy_currency == "bells" & sell_currency == "bells")
items_bells %>% filter(sell_value > buy_value)
# Create value difference column and add to items_bells table
value_dif <- items_bells$buy_value - items_bells$sell_value items_bells$value_dif % top_n(wt = value_dif, n = 5) # Done!! Expand Console width to view last col. The trick was to create the value_dif in a new column and add it to the table. top_n(items_bells %>% filter(buy_value > sell_value), n = 5)

My question is…is a Golden Casket really what I think it is? Who would use this?? (I can only think of one person, who is rumored to have a golden toilet).

…oh! It’s real! (The casket is in the lower left corner, I believe.)

Golden Casket 1

# Analysis 4: Which category of items is the most expensive?
# Furniture, Hats, Bugs
# Cheapest: Flowers, Fruit, Photos, Socks, Tools

items_bells %>%
group_by(category) %>%
dplyr::summarize_at(vars(buy_value, sell_value), funs(mean(., na.rm = TRUE)))

Villagers Analysis

# Analysis 5: Are there more male or female characters?
# Answer: 187 Females, 204 males! Slightly more males!
# count() does group_by() + tally() https://dplyr.tidyverse.org/reference/tally.html
villagers %>%
count(gender)

# Analysis 6: Which Personality Types are the most common?
# Answer: Lazy, Normal, Cranky/Jock/Snooty, Peppy

# Analysis 6b:Are there any personality types with only 1 character?
# Nope! But the least common type is ‘uchi’ (a translation of “sisterly/big sister” in Japanese).

villagers %>%
count(personality) %>%
arrange(desc(n))

Uchi

To be honest…Agnes the black pig scares me…

# Analysis 7: Who has their own song?
# Four special villagers have their own song: Angus, Diva, Gwen, and Zell.

villagers %>%
add_count(song) %>%
filter(n == 1)

# Analysis 8: Who has a birthday today (5/6)?
# Answer: Tank the Rhino! He looks cuter than I imagined from his name.

villagers %>%
filter(birthday == "5-6")

Tank Rhino

Crazy Impoverished Asians

Who knew that the most impoverished minority race group in King County (containing Seattle) are Asians? The number of Asians living under the federal poverty line is significantly greater than Latinos, Blacks, and Native Americans. I was surprised when looking at Census Bureau data from 2015-2018. Hover over and interact with this Tableau visualization or “viz” to see the numbers.

At the two Seattle Food Banks I visited, I noticed there were a considerable number of elderly Chinese folks. However, I found this result to be surprising!

Click on the preview image below to access the Tableau viz and stats.

As of now, WordPress doesn’t let fully embedded Tableau graphics onto posts.
WordPress, please get on this!