ChatGPT Vision vs Real World

Dr. John Wallin, Director of Computational and Data Science and Professor of Astronomy

Introduction

I finally got access to the Vision options within ChatGPT.   OpenAI announced the rollout in this blog post:

https://openai.com/blog/chatgpt-can-now-see-hear-and-speak

This feature was enabled on my phone yesterday, so I gave it a series of challenges to see how well it works.   The results were impressive, and a bit disturbing.   Each section represents a picture of something and a single-line prompt to explain it.  The answers were written by GPT.

ChatGPT had correct answers for each prompt.  It had a harder time with the ancient Greek transcription, though, and made some errors in the characters it presented.  However, I have spent the last 9 years working on this as a research problem, developing a highly trained set of neural networks in collaboration with my graduate student Matthew and others.  ChatGPT did the same task without any additional information, whereas traditional OCR models are basically useless with this kind of damaged manuscript.  The artistic interpretation of the diagram was also shocking.  ChatGPT figured out the visual language being used when I could not figure out what the image was doing.

Each case has a title, a short description of the image context that was not shared with ChatGPT, the image, the short prompt I typed, and the results from ChatGPT explaining what it saw.

Table of Contents

Introduction

Case Studies

Understanding the purpose of a circuit board - Note: the board name has been inked out so it can't cheat

Solving problems from Physics 4310 based on a picture of the homework

Understanding a circuit diagram

Interpreting Ingredients from a cereal box

Reading a technical diagram

Reading an Ancient Greek Papyrus

Interpreting a Highly Artistic Image of a Scientific Diagram

Conclusions

 

Case Studies

Understanding the purpose of a circuit board   - Note: the board name has been inked out so it can’t cheat.

Context not shared with ChatGPT:  I found this image on a website and then made sure the name of the circuit board was not visible.

Circuit board.jpg

PROMPT: What does this do?

The image you provided shows a printed circuit board (PCB) with various electronic components. Based on the visible components and labels, here's an analysis:

  1. Capacitors: The large white cylindrical components labeled "2.0 MFD ±10% 50 VDC" are capacitors. They store and release electrical energy. Their presence can indicate several functionalities like filtering, decoupling, or timing elements.
  2. Resistors: The colorful striped components are resistors. They limit or set the current flow through a particular path in a circuit.
  3. Integrated Circuits (ICs): There are two ICs on the board. These could be operational amplifiers, microcontrollers, or some other kind of specialized ICs. Without the exact part numbers or a clearer view, it's hard to determine their exact functionality.
  4. Diodes: I see at least one component that looks like a diode (black cylindrical component with a silver line). Diodes allow current to flow in only one direction and are often used for rectification, protection, or indication.
  5. Jumpers/Headers: Labeled as "Int+", "Int-", "Output", etc., these could be connections for inputs, outputs, or configuration settings.
  6. Potentiometers: Not clearly visible, but the "Bias", "Kp", "Ki", and "Kd" markings may indicate adjustable potentiometers or placeholders for them. These terms are related to Proportional-Integral-Derivative (PID) control, which is a method used in control systems to make a system behave in a desired manner. "Kp", "Ki", and "Kd" stand for the proportional, integral, and derivative gains, respectively.

From the visible components and markings, I'd hypothesize this circuit has something to do with control systems, possibly a PID controller or some signal processing related to it. The presence of terms related to PID control combined with the ICs suggests it might be processing input signals and generating a control output. However, to confirm the exact functionality, a schematic or more detailed information would be necessary.

Note: This is a broad analysis based on the visual information provided, and some components or connections might not be visible in the image. If you can provide more context or a clearer image, I might be able to give a more accurate assessment.

 

Solving problems from Physics 4310 based on a picture of the homework.

Context not shared with ChatGPT:  I took this image with my phone from a PHYS 4310 (E&M) homework assignment.  I had previously had it solve part A with a similar image.

Physics 4310.jpg

PROMPT:  Solve this problem

sure john.png

 

Understanding a circuit diagram

Context not shared with ChatGPT:  This is a circuit diagram of the circuit board shown above.  I shared this with ChatGPT in a separate thread, so it couldn’t use the previous context to understand the image.

Picture5.jpg

PROMPT: What does this do?

The schematic you've provided appears to be an analog circuit, specifically an implementation of a Proportional-Integral-Derivative (PID) controller using operational amplifiers (op-amps, labeled as TL084). Let's break down its components and functionality:

  1. Proportional (P) Component: This is represented by the op-amp labeled "prop". It takes the difference between the input signal and a reference (usually set to 0V or ground for simplicity), amplifying it. The amplification factor is set by the resistors around it. This provides an output that's directly proportional to the error.
  2. Integral (I) Component: This is represented by the op-amp labeled "int". It integrates the error over time, essentially providing a measure of the accumulated error. If a system error has persisted for an extended period, the integral term builds up (increases), and the controller responds by changing the control input in relation to a consistent error magnitude. The capacitor Cint is essential for the integration action.
  3. Derivative (D) Component: This is represented by the op-amp labeled "der". It predicts system behavior and thus provides a control output to counteract the rate of error change. The main purpose of the derivative term is to provide damping or shaping of the controller's response. The capacitor Cder is responsible for the differentiation action.
  4. Summing: The outputs from the P, I, and D components are summed together to form the final output (Vout). This summed signal is the control signal that is sent to whatever system is being controlled.
  5. Biasing and Protection: There are various other components in the circuit, like diodes (e.g., D1), which can be for protection or to introduce non-linear behavior, and resistors like R_bias, which might be used for biasing purposes.
  6. Power Supply: The op-amps require positive (+V) and negative (-V) power supplies. These power rails ensure the op-amps can output the necessary voltage levels for the system they're controlling.

In summary, this circuit is a PID controller implemented in analog electronics. PID controllers are used in various applications to control processes, ensuring they stay at a set point or follow a desired trajectory. The PID controller works by adjusting its output based on the difference between a desired set point (reference) and the actual system state (input).
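
For readers who have not met a PID controller before, the behavior ChatGPT describes can be sketched in a few lines of Python. This is a minimal discrete-time illustration, not the analog circuit itself: the names kp, ki, kd, and bias mirror the Kp, Ki, Kd, and Bias markings on the board, while the gain values, the time step, and the simple first-order "plant" being controlled are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete-time PID controller (software analogue of the op-amp stages)."""
    kp: float            # proportional gain (the "prop" stage)
    ki: float            # integral gain (the "int" stage)
    kd: float            # derivative gain (the "der" stage)
    bias: float = 0.0    # constant offset added to the output
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt                   # accumulated error
        derivative = (error - self.prev_error) / dt   # rate of change of error
        self.prev_error = error
        # Summing stage: add the three terms (plus bias) into one control signal.
        return (self.bias
                + self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


def simulate(steps: int = 2000, dt: float = 0.01) -> float:
    """Drive a hypothetical plant dx/dt = -x + u toward a setpoint of 1.0."""
    pid = PID(kp=2.0, ki=1.0, kd=0.05)  # illustrative gains, not from the board
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint=1.0, measurement=x, dt=dt)
        x += (-x + u) * dt  # forward-Euler step of the plant
    return x


if __name__ == "__main__":
    print(f"state after 20 simulated seconds: {simulate():.4f}")
```

With these assumed gains the simulated state settles close to the setpoint; on the real board the same tuning would be done by adjusting the corresponding components rather than by changing numbers in code.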

 

Interpreting Ingredients from a cereal box

Context not shared with ChatGPT:  I took this picture of a cereal box at breakfast this morning.

Picture6.jpg

PROMPT: Is this healthy?

The term "healthy" can be subjective, but let's break down the nutritional information of this product to provide an informed opinion:

  1. Calories: A serving without milk provides 150 calories, and with skim milk, it's 210 calories. This amount of calories is typical for many breakfast cereals.
  2. Fats: The cereal contains no saturated, trans, polyunsaturated, or monounsaturated fats. This is a positive aspect as saturated and trans fats are often linked to heart diseases.
  3. Cholesterol: There's no cholesterol in the cereal alone, but there is a minimal amount when consumed with milk.
  4. Sodium: The cereal has 300mg of sodium per serving, which is 13% of the Daily Value. A diet high in sodium can contribute to high blood pressure, so it's essential to monitor total daily intake.
  5. Carbohydrates: There are 36g of carbohydrates in the cereal alone. Of those, 3g come from dietary fiber and 14g from sugars. A diet rich in fiber can aid in digestion, while excessive sugar consumption can lead to various health issues, including weight gain and increased risk of heart disease.
  6. Proteins: The protein content is 3g without milk and 9g with milk. This isn't a significant source of protein compared to other foods.
  7. Vitamins and Minerals: The cereal seems to be fortified with various vitamins and minerals, including Vitamin D, iron, potassium, thiamin, riboflavin, niacin, vitamin B6, folate, and vitamin B12. These are beneficial for overall health.
  8. Ingredients: The primary ingredients are milled corn, sugar, and malt flavor. It's worth noting that sugar is a prominent ingredient, which could be a concern if one is trying to limit sugar intake.

In conclusion, this cereal has some nutritional benefits due to its vitamins and minerals, and the lack of unhealthy fats. However, its relatively high sugar content and moderate sodium level are areas of concern. Like many products, consuming this in moderation and as part of a balanced diet is key. If you have specific health goals or concerns, it's always best to consult with a nutritionist or healthcare professional.
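
Two of the numbers in the response above are easy to sanity-check. The sketch below is only an illustration: it assumes a sodium Daily Value of 2,300 mg and roughly 4 kcal per gram of carbohydrate, neither of which appears on the label as quoted, and the function names are made up for the example.

```python
SODIUM_DAILY_VALUE_MG = 2300   # assumed reference value, not read from the box
KCAL_PER_GRAM_OF_SUGAR = 4     # assumed general factor for carbohydrates


def percent_daily_value(amount_mg: float,
                        daily_value_mg: float = SODIUM_DAILY_VALUE_MG) -> int:
    """Return the amount as a whole-number percentage of the daily value."""
    return round(100 * amount_mg / daily_value_mg)


def sugar_calorie_share(sugar_g: float, total_kcal: float) -> float:
    """Return the fraction of a serving's calories that comes from sugar."""
    return sugar_g * KCAL_PER_GRAM_OF_SUGAR / total_kcal


if __name__ == "__main__":
    # Values quoted in the response: 300 mg sodium, 14 g sugar, 150 kcal.
    print(percent_daily_value(300))
    print(f"{sugar_calorie_share(14, 150):.0%}")
```

Under those assumptions the 13% sodium figure checks out, and a bit over a third of the calories in a dry serving come from sugar, which is consistent with the response flagging sugar as a concern.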

 

Reading a technical diagram

Context not shared with ChatGPT:  This diagram was taken from a paper I am writing on our Augmented Reality system to teach STEM classes.   The diagram was generated by my grad student Isaac yesterday, so this could not have been in its training set.

Picture7.jpg

PROMPT: Explain this to me

The image you've provided appears to be a flowchart outlining the execution flow for some sort of virtual lab environment. Let me break down its main components:

  1. User Login and Lab Placement: This seems to be the starting point for users. They would first log in, and based on certain criteria or selections, they are placed in a specific lab.
  2. Server Interactions:
    • Ask for Lab Files: Once logged in, the system communicates with the server to request the necessary files for the lab.
    • Lab Assets: These are the components that make up the lab environment, such as multimedia files, 3D models, or any other assets.
    • Upload Logs: It seems the user's interactions or activities within the lab are logged and then uploaded to the server. This could be for tracking progress, errors, or for analytical purposes.
    • Build Lab: After retrieving the necessary assets, the server initiates the building or rendering of the lab for the user.
    • Return to Lab, Select, or Exit: After completing a task or module, the user has the option to return to the main lab, select a different lab, or exit the environment.
  3. Lab Construction:
    • Modules: Each lab is composed of multiple modules. In the diagram, these are represented as 'Module 1' and 'Module X', indicating that there can be any number of modules in between. Each module seems to define a specific activity.
    • Clips: Within each module, there are 'clips'. These clips might represent specific tasks, instructions, or components of the module. They seem to contain audio instructions, 3D models, and interactive elements. The clips appear to guide the user through specific changes or interactions in the scene.

In essence, this flowchart provides a high-level view of how a user interacts with a virtual lab system, from login to selecting and navigating through lab modules, and how the server supports this interaction by providing necessary assets, building the lab environment, and tracking user activity.

Would you like a more detailed analysis or have any specific questions about any components?

 

Reading an Ancient Greek Papyrus

Context not shared with ChatGPT:  This image was again taken from some of my research.  With collaborators and my grad student Matthew, we created and trained neural networks to do this work.   ChatGPT just read the damn thing with no help.  It didn’t do nearly as good a job as our software, but normal OCR programs just choke on this kind of problem.

Picture8.png

 

PROMPT: What does this say?

This appears to be an ancient or medieval fragment, possibly written in Greek given the style of the characters. From the image provided, deciphering the exact content is challenging due to the deteriorated condition and fragmented nature of the text. However, I can attempt to identify some of the visible words or phrases based on the Greek letters present:

  • The characters "ΔΙΟΣ" can be seen, which might refer to "Δίος" (Dios), meaning "of Zeus."
  • The characters "ΛΟΓΟΣ" might be present, translating to "Logos," which can mean "word," "reason," or "argument," among other definitions.
  • I can also spot characters that might be part of the word "ΑΝΘΡΩΠΟΣ," meaning "human" or "man."

It's important to note that without a complete and clearer image, and without context, the exact meaning and interpretation of this fragment are speculative. If you want a precise translation and interpretation, consulting with a specialist in ancient Greek texts would be essential. If you're interested in using AI to decipher such texts, it might be an exciting area to integrate your expertise in neural networks and crowd-sourced data sets.

PROMPT:  Can you transcribe as many characters as possible. Use a . for characters where data is missing or it isn't clear enough to read.

Of course, John. I'll do my best to transcribe the visible characters, using a "." for those that are missing or unclear:

..Π..Π...ΔΙ...

..ΛΟ.ΓΟΥΣ..

...Ε...ΣΩ...

..ΓΚΟΡΟΣ..

ΠΟΛΥΤ...Ο...

..ΝΤΕΡΥ...

ΑΝΤΙΟΝ...

ΡΙΟΥΚΙ..

Please note that the transcription is highly tentative due to the fragmentary and deteriorated nature of the text. Some of the characters might not align perfectly with the original intent of the writer. If you have any specific goals with this transcription, I would recommend consulting with an expert in ancient Greek epigraphy or paleography to ensure accuracy.

 

Interpreting a Highly Artistic Image of a Scientific Diagram

Context not shared with ChatGPT:  This diagram was generated by DALL-E 3 as an artistic version of a Hertzsprung-Russell diagram.  It is simply bananas. I asked the generator to “go wild”, and it came up with this.  I couldn’t understand what the heck it was trying to do with this image, but ChatGPT could explain what its other artistic brain created.

Picture9.png

PROMPT: Can you interpret this highly artistic version of this scientific diagram? What is the underlying idea and what scientific chart may have inspired it?

Certainly, John.

The artwork is a vivid and imaginative representation of some aspect of the cosmos, presented as if it were a theater performance. Given the elements in the image and their arrangement, the art seems to be drawing from astronomical themes, potentially inspired by a Hertzsprung-Russell (HR) diagram, or some other celestial classification system.

Here's an interpretation:

  1. Theater Setting: The cosmos is being presented as a theater show, suggesting the grand spectacle of the universe, with its various elements playing their respective roles in the grand drama of existence.
  2. Planets and Stars: The central stage has representations of various celestial bodies, from planets to stars. This could be indicative of the lifecycle of stars and planets or their relative positions and significance.
  3. Mystical Creatures and Labels: Creatures like dragons, a large-mouthed beast, and other entities can be seen hovering above specific celestial objects. These creatures, combined with the labels, might be trying to represent specific astronomical or mythological concepts. The labels, such as "White Drars", "Star", "Gadts", and "Superions", could represent types or classifications of stars or celestial phenomena, though they don't correspond directly to standard astronomical terminology.
  4. Audience: The audience, attentively watching the show, might represent us, the observers on Earth, trying to understand the vastness and complexity of the universe from our limited perspective.
  5. Stage Characters: The figures on stage, dressed in elaborate robes, could be symbolic of guiding forces or entities that play a role in the cosmic dance. They might also symbolize constellations or groupings of stars.
  6. Hertzsprung-Russell (HR) Diagram Inspiration: The placement and categorization of various stars and celestial objects bring to mind the HR diagram, which classifies stars based on their absolute magnitude and spectral type. In an HR diagram, stars are categorized from hot to cool (usually on the x-axis) and from more luminous to less luminous (usually on the y-axis). The imaginative categories in this artwork could be an abstract representation of that.

In essence, the image seems to be an artistic blending of the science of astronomy with mythological and imaginative elements, presenting the universe as a grand theater for us to observe and understand. The underlying idea might be to highlight the beauty, complexity, and interrelation of various celestial elements while emphasizing our role as observers trying to make sense of it all.

 

Conclusions

Dang – this is powerful.  I have no idea what the implications of this are for society or for higher education.  I am not often speechless, but this just floored me.