Programming Fundamentals Part 3: Storing Information
This article series is based on rough drafts of what I intend to eventually turn into a series of lectures and courseware for my brogrammers and siscripters out there. Feedback is welcome, and if it proves useful, I would be happy to list you as a contributor.
2020 UPDATE: I have put together a course for Java which introduces the concepts I have described in these articles, but in greater depth and clarity. If you like my writing, I think you will love my video lectures:
Working Class Java: A Beginner’s Guide To OOP & Software Architecture Udemy Link| Skillshare Link w/ Free Trial
Contents
1. What Is A Program? — A set of instructions to be executed by an Information Processing System
2. The Problem Domain — How to design a program/application
3. Storing Information — How to Model Information (data) in an Information Processing System.
4. Logic And Errors — The two (primary) types of logic in an Information Processing System; how to handle errors properly
5. Separation Of Concerns — The most important Software Architecture principle I have ever come across
6. Proving Programs With Tests — An explanation of the theory, practice, and benefits of testing your software, and applying Test Driven Development
The problem with explaining things is that it can be very difficult to discuss or define one thing without using the definitions of other things which the student may not be aware of. One solution to this problem is to define new things only with reference to things which the average person (who does not necessarily have any experience with the subject) is likely to be familiar with. Another approach, which requires less assumed knowledge and fewer technical definitions, is to show an example of the thing itself and make observations about it.
In Part 1, I did my best to explain what I thought information looks like, without attempting to explain explicitly what I think it is. This is because, as far as I can see, the only means by which I can define information is by sharing information which I have learned and deduced about information.
Since this article series is about writing programs effectively, I will stop trying to explain what I think information truly means in a general philosophical sense, and if you have made it this far, I assure you that the rest of this article will read less like a Zen Koan:
“When both hands are clapped a sound is produced; listen to the sound of one hand clapping.”
What Is Information (as Far as a Computer is Concerned)?
“Messages sent or received which reduce uncertainty.”
*See Notes 1 and 2 for citations of the above definition.
This section may be a bit technical for beginners, but please continue on to the next section even if this one does not seem clear.
For the purposes of writing programs on present era computers, information should predominantly be thought of as streams of electrical impulses (which are a representation, in physical reality, of “0” and “1” or “On” and “Off”). Since human beings only recently discovered what the hell electricity even is, I would suggest taking “messages” in the above definition to mean collections of 0s and 1s being “fired” to and from the various hardware components of the system.
Not only are 0s and 1s a slightly less alien concept than digital circuits, but thinking in 0s and 1s also lets us take advantage of mathematics to reason about these messages, in the same way that an engineer uses physical equations to predict how far a cannonball will travel or how much load a bridge can bear.
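To make the idea of a message as a collection of 0s and 1s concrete, here is a minimal sketch in Python. The helper names `to_bits` and `from_bits` and the sample message are made up for this illustration, and UTF-8 is assumed as the text encoding; real hardware deals in electrical signals, not printed digits.

```python
# Illustrative sketch: view a short message as the 0s and 1s a computer stores.
# `to_bits` and `from_bits` are hypothetical helpers invented for this example.

def to_bits(message: str) -> str:
    """Return the UTF-8 bytes of `message` as groups of eight 0s and 1s."""
    return " ".join(format(byte, "08b") for byte in message.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse `to_bits`: turn groups of 0s and 1s back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
```

Because each character becomes a number, we can reason about the message with ordinary arithmetic, which is the point of preferring 0s and 1s over circuits.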
“Mathematics is a language plus reasoning; it is like a language plus logic. Mathematics is a tool for reasoning.”
- Richard P. Feynman, The Character of Physical Law
Moving even further from electrical impulses (I brought up that point just to briefly mention how things work under the hood), we will now look at a kind of language which is even better suited to human eyes than collections of 0s and 1s: the programming language.
What Is Information (as Far as a Programmer is Concerned)?
The following code block is pseudo-code: it is intended to look a bit like any programming language, but not exactly like any real one.
Thing Note (
    Integer: numberOfCharacters
    String: noteContents
    String: creationDate
)
For those uninitiated (which is fine!), “Integer” (or “Int”/“int”, as it is commonly abbreviated in most languages) is a type which indicates to the computer that we wish to store a whole number (one with no fractional part) in memory space. “String” refers to a sequence of characters like “ABCDEF” or “Hello World.”
If you have ever written a significant program in a high-level programming language, there is a very good chance that you have written something like the above. It probably used the word “class/object/struct” instead of “Thing”, but I deliberately used a more general synonym of those words. It also likely had some instructions called variables or fields that had names which were intended for humans to read (such as “creationDate”). These human-readable names also happen to be almost entirely arbitrary to the computer itself.
Supposing that my pseudo-code were actually capable of being compiled (a process which involves translating these human-readable instructions into machine-readable instructions), let us consider what we are telling the computer to do:
Thing Note (...)
Things (usually called classes) are the way in which we instruct a computer to build a definition out of conceptually related information.
(Refer to Part 2 on Problem Domains if the term “conceptually related information” is unclear.)
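For comparison with the pseudo-code, here is roughly how the same “Thing” might look in a real language. This is a sketch in Python using a dataclass; the snake_case field names, the sample text, and the date are assumptions made for the example, and the creation date is kept as a String only because the pseudo-code does so.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """A "Thing" that groups the three pieces of information about a note."""
    number_of_characters: int  # Integer: a whole number
    note_contents: str         # String: a sequence of characters
    creation_date: str         # String: kept as text, as in the pseudo-code


contents = "Buy eggs"
note = Note(
    number_of_characters=len(contents),
    note_contents=contents,
    creation_date="2020-01-01",  # made-up date for the example
)
print(note.number_of_characters)  # 8
```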
As far as the information itself is concerned, it is very important to also tell the computer if the information is a:
- Value: Something which does not change during the course of the program’s execution, such as the constant π (think immutable, final, constant)
- Variable: Something which is likely to change during the course of the program’s execution, such as the current system time in milliseconds
If we programmers are smart enough to tell the computer to do so, the computer can often store Values in special places in memory space, which results in more efficient programs and, by extension, a better user experience! Anyway, we will learn all about values and variables shortly; for now, we will focus on the “Things.”
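The difference between a Value and a Variable can be sketched in Python as follows. Python has no built-in keyword for constants, so this example assumes a frozen dataclass as one way of telling the language that a field must never change; the class name `Constants` is invented for the illustration. It says nothing about where Python actually places the data in memory.

```python
import math
import time
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Constants:
    """Every field of a frozen dataclass is a Value: set once, never changed."""
    pi: float = math.pi


constants = Constants()
try:
    constants.pi = 3.0   # attempt to change a Value
    changed = True
except FrozenInstanceError:
    changed = False      # Python refused the assignment

# A Variable: the current system time in milliseconds differs between reads.
first_reading = time.time_ns() // 1_000_000
time.sleep(0.01)
second_reading = time.time_ns() // 1_000_000

print(changed)  # False
print(first_reading, second_reading)
```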
Since I made the point earlier that names like “Note” or “creationDate” do not hold the same significance to a computer as they do to a person, you may wonder why we even bother with defining “Things” instead of just defining the values and variables separately, omitting the “Thing Note(…)” like so:
Integer: numberOfCharacters
String: noteContents
String: creationDate
Technically, we have not lost information by not using a “Thing”, but it turns out that there are many good reasons for grouping conceptually related information together as “Things” that directly benefit both programmers and computers alike:
Humans do not think primarily in random collections of numbers and characters, but we do think in objects especially well (particularly visual models of objects). Grouping also allows a programmer to look at code which he/she has never seen before and understand very quickly what the program does, without needing to look at large collections of random numbers and characters or work out how they relate to one another. Mind you, this is only possible if the writer of the program makes a special effort to pick good names and make good choices about what to put in a given “Thing”.
If we never grouped conceptually related information together, the computer would be left to store this information in an unstructured and almost certainly inefficient manner. During run time, the computer generates what I will call “virtual space”, or “memory space”, based on a program’s instructions.
In order to explain what I mean here in an intelligible way, please imagine yourself entering the biggest department store you have ever been to (such as Walmart or Costco if you are from NA). If we pretend that this store is actually the memory space of a program, then a well-ordered store would group conceptually related items into “sections” and “departments”. If I wish to purchase Eggs, then I would start by locating the grocery section of the store, and then the refrigerated department of that section.
Not only does this mean that I have an easier time locating what I want, but there is a more probabilistic and technical benefit to this grouping process. If the first item I select is “Eggs”, there is a far higher likelihood that I will also pick up “Cheese” (or other food items) during this trip to the store than that the next item I pick up will be “Motor Oil”. Knowing this, if I wanted to create the most efficient shopping experience in terms of footsteps taken per shopping trip, I would design my store based on what my shoppers are most likely to purchase together.
Let us imagine now, that we enter a department store with the same items, except that no departments, sections, or groups of a kind exist whatsoever. Upon searching for “Eggs”, we find one brand at the very front of the store beside some “Motor Oil,” but we are looking for the farm fresh brand as we are concerned about nutrient profile and ethics.
Over the course of 12 hours, we eventually run into an employee who holds a list of the 1,000,000 different products available in the store and their locations. It turns out that the brand of eggs we are looking for happens to be located in the store manager’s filing cabinet, second drawer. However, our employee regretfully explains that a stack of 37 Flat Screen TVs is currently obstructing the path to the manager’s office, so we would need to scale it in order to get there and back.
Grouping conceptually related information in “Things” actually helps the computer build a more efficient memory space, in the same sense as my analogy. Of course, a computer does not traverse memory space by walking, but it does traverse memory space in order to find the information it requires. By traversing, I pretty much mean looking at sequences of 0s and 1s from start to finish.
Although there are cases where we only want the computer to retrieve “noteContents” without being concerned about “creationDate”, there is still a high likelihood that if I look up “noteContents” for any reason during run time, I will shortly thereafter also look up “creationDate” (not always, but I care about the probability!). Knowing this, we can increase the efficiency of our information storage by keeping these values and variables beside each other in memory space. When I say beside each other, I literally mean that in a spatial sense, like blocks in Minecraft.
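To see what “beside each other” can look like, here is a sketch that uses Python’s ctypes module to describe the note the way a language like C would lay out a struct. Ordinary Python objects are not stored like this, so treat it purely as a way of making byte positions visible. The fixed buffer sizes (32 and 12 bytes) are assumptions chosen for the example.

```python
import ctypes


class Note(ctypes.Structure):
    """A C-style layout for the note, used only to make memory positions visible."""
    _fields_ = [
        ("number_of_characters", ctypes.c_int32),
        ("note_contents", ctypes.c_char * 32),  # fixed-size text buffer (assumed size)
        ("creation_date", ctypes.c_char * 12),  # fixed-size text buffer (assumed size)
    ]


# Each field descriptor reports where the field starts and how long it is.
for name, _ in Note._fields_:
    field = getattr(Note, name)
    print(name, "starts at byte", field.offset, "and is", field.size, "bytes long")

print("whole Note:", ctypes.sizeof(Note), "bytes")
# number_of_characters starts at byte 0 and is 4 bytes long
# note_contents starts at byte 4 and is 32 bytes long
# creation_date starts at byte 36 and is 12 bytes long
# whole Note: 48 bytes
```

Each field begins exactly where the previous one ends, so reading “noteContents” and then “creationDate” means looking at neighbouring bytes rather than wandering across the store.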
Summary
As a programmer, you will almost invariably find yourself instructing the computer to create “Things” which encapsulate (separate) information, as well as functions (which we will discuss in Part 4: Logic And Errors).
The purpose of this article was to give an overview of how a computer system stores information, and of the necessity of telling the computer how to store this information in an efficient way.
Not only can we directly and measurably increase the performance of our programs by using “Things” (which indicate to the computer that the information the “Thing” contains should be located nearby in memory space, as opposed to random distribution in memory space), but there is also great benefit in doing so for the humans that must read the programs.
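The benefit to human readers can be sketched with a small comparison in Python. The list names, sample notes, and the `newest` helper are all invented for this illustration. Without “Things”, related information lives in separate lists that only stay connected by position; with a “Thing”, each note carries its own information and the code reads closer to how we think about it.

```python
from dataclasses import dataclass

# Without "Things": parallel lists that are only related by index position.
contents_list = ["Buy eggs", "Call mom"]
dates_list = ["2020-01-01", "2020-01-02"]  # made-up dates, ISO format


@dataclass
class Note:
    number_of_characters: int
    note_contents: str
    creation_date: str


# With "Things": each Note keeps its related information together.
notes = [
    Note(len(text), text, date)
    for text, date in zip(contents_list, dates_list)
]


def newest(all_notes: list) -> Note:
    """Return the note with the latest date (ISO date strings sort correctly)."""
    return max(all_notes, key=lambda n: n.creation_date)


print(newest(notes).note_contents)  # Call mom
```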
If you would like to see some direct examples of these kinds of “Things” in an actual program, here are links to some of my open source learning projects (the language happens to be Kotlin):
SpaceNotes “Note Thing”
KotlinCalculator “Mathematical Operator Thing”
Notes:
1. According to MITx course 6.004.1x: Computation Structures 1: Digital Circuits: Information is defined as “Data communicated or received which resolves uncertainty about a particular fact or circumstance.”
2. According to Steven Pinker, Enlightenment Now, Page 19: “Information may be thought of as a reduction in entropy.”
I was hoping to discuss these two definitions within this article as they are very important, but there is simply too much philosophy for me to unpack in what are intended to be practical explanations of how to design programs well. Nevertheless, it is worth taking some time to consider that the word information points to something which is a first order characteristic of physical reality (read: it is kind of a big deal).
Support
Follow the wiseAss Community:
https://www.instagram.com/wiseassbrand/
https://www.facebook.com/wiseassblog/
https://twitter.com/wiseass301
http://wiseassblog.com/
https://www.linkedin.com/in/ryan-kay-808388114
Consider donating if you learned something:
https://www.paypal.me/ryanmkay