Using SparkAI

SparkAI is now available for University staff to test and explore how they can harness the power of large language models. But just how does one get started?

The SparkAI logo, a superscript yellow asterisk before "Spark AI." in a white serif font on a background of University blue.

If you're entirely new to the subject of large language models, you may wish to start with our earlier article on using ChatGPT to get a feel for what large language models are useful for in the context of teaching and learning, and to flag some things they aren't very good at. As that article was written a while ago, there have been some improvements since; however, the general principles remain the same - great for generating text, still not truly creative, and never to be trusted entirely.

Why would you use Spark AI instead of ChatGPT?

The key difference between the standard online ChatGPT and Spark AI is how the data is managed. Data you enter will not be used to train the OpenAI ChatGPT model, and it is stored securely and does not leave Australia. This means you can save research data or student responses as a private data set (more on that later) and query it using generative AI capabilities without worrying about the data being released outside the University's control.

If you have already requested (and been granted) access, you may be wondering what to do when you open Spark AI and are faced with a big blue screen full of nothing. Well, let's start with something easy.

> In excel, how do I write a formula that checks one column has the value 2017 and another is not blank and get a count for the whole range?
To count the number of cells in one column that have the value "2017" and another column that is not blank, you can use the COUNTIFS function in Excel. Here's how you can write the formula:

`=COUNTIFS(A:A, 2017, B:B, "<>")`

In this formula, "A:A" represents the range of cells in the first column, "2017" is the value you want to check for in the first column, "B:B" represents the range of cells in the second column, and "<>" checks for cells that are not blank in the second column.

Make sure to adjust the column references (A:A and B:B) to match your actual data range.

Okay, complex formulae in Excel, check! (I tested that on some data that I had open, and it does work.)
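If you'd like to sanity-check the formula's logic outside Excel, the same count can be reproduced in a few lines of Python. This is just an illustrative sketch - the rows below are invented sample data, not anything from Spark AI:

```python
# Reproduce the COUNTIFS logic: count rows where column A equals 2017
# and column B is not blank. These sample rows are invented for illustration.
rows = [
    {"A": 2017, "B": "physiology"},
    {"A": 2017, "B": ""},            # blank B: excluded
    {"A": 2018, "B": "anatomy"},     # wrong year: excluded
    {"A": 2017, "B": "pharmacology"},
]

count = sum(1 for r in rows if r["A"] == 2017 and r["B"] != "")
print(count)  # → 2
```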

Let's have a look at some more useful applications outside of the general chatbot features, but first we'll have to bring in some data to do so.

NB: History is always saved in Spark AI. You can come back to your previous queries and projects, and they will still have all the previous inputs and outputs that were there from your last login.

Adding your own data sets

If you have some data that you want to analyse, you can upload it securely to Spark AI by selecting Data Sets from the buttons at the top of the page. Here you will be asked to fill out some metadata before uploading your data set. All of the settings have info icons that tell you what they do, but you can leave the defaults as they are and only start changing those once you're more familiar with the outputs and what you might want to modify based on your needs.

You can upload multiple files to create a data set. Select all that you want to add on the second screen, Upload them, and then Process them to finalise the data set. Depending on the size and number of files, this can take some time. In good news, you can navigate away from the data sets page while it's processing, and it won't disappear or have to be restarted.

As an example, you could take transcripts of your lectures (Mediaspace will generate these automatically; otherwise there are tools available to do so - note, however, that there may be privacy and intellectual property concerns if you use a tool external to the University), and then perform a thematic analysis for each lecture, or generate a list of learning outcomes for each individually and for all as a unit.

Creating your own projects

If you want to use different data sets for multiple purposes but keep the outputs separate, you might want to create different projects to house each line of enquiry. For example, you might have some student results for a whole semester, and wish to query this for assignment grades versus final result, but have a separate project for how to modify assessment tasks to better align with your curriculum and learning objectives.

NB: Tables are handled a bit strangely - these models are far better for textual analysis. (Remember, it's a large language model!)

Click on the Add project button, give it the required metadata, and select your data set. Note that you can only attach one data set per project, but it can contain multiple individual files. Also note the temperature slider - if you set this to 0, the model will not get creative, which is very important for data analysis tasks; a setting closer to 1 may be more useful if your project is to think of interesting ways to change assessment for the subject. The info button next to each setting gives a quick overview of what it does, but again we can probably stick with the defaults in most cases until we know what is going to work best (experience is the best teacher with these models).
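To see why temperature matters, it helps to picture what the setting does under the hood: it scales the model's next-token probabilities before sampling. The sketch below uses invented numbers for three hypothetical candidate tokens - it's a simplified illustration of the general mechanism, not Spark AI's actual implementation:

```python
import math

def apply_temperature(logits, temperature):
    """Convert raw model scores (logits) to probabilities, scaled by
    temperature. Lower temperature sharpens the distribution towards
    the most likely token; higher temperature flattens it. A setting
    of 0 corresponds to always picking the top token (no creativity)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = apply_temperature(logits, 0.2)   # near-deterministic: top token dominates
high = apply_temperature(logits, 1.5)  # flatter: more varied, "creative" output
```

With temperature 0.2, the first token receives nearly all of the probability mass; at 1.5, the mass is spread much more evenly, which is what makes higher settings feel more creative (and less reliable for analysis).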

Choose OpenAI ChatGPT 4 Turbo 128k (preview) as the model to use. I have no idea why you would not use the best and largest model when you have a choice to do so.

Querying your project

Now that you have data and a project to house it, you can start interrogating it. As an example, I took a 2nd-year Physiology subject's transcripts (with permission) and used that as the data set. Here are some results:

> What are the learning outcomes for this subject based on the text of these lectures?

The provided context data does not explicitly list the learning outcomes for the subject. Instead, it discusses the structure of the course, the teaching approach, the format of assessments, and the resources available to students. The lecturer emphasizes understanding how different systems in the body work together, the integration of systems, and the application of knowledge rather than rote memorization. The assessments are open book and open collaboration, with a focus on students expressing their understanding in their own words. Additionally, the lecturer mentions the use of bespoke Roslyn modules for certain topics in the subject.

To summarize, while the learning outcomes are not directly stated, the context suggests that the outcomes would likely involve students developing a deep understanding of physiology, the ability to integrate knowledge across different systems of the body, and the application of this knowledge in a practical, open-resource environment.

Hrmm, okay well that's not really what I wanted. Let's ask more directly:

> Based on your summary, generate 5 learning outcomes that match the lecture content.

Based on the lecture content provided, here are five potential learning outcomes for the subject:
1. Understand the fundamental concepts of physiology, including cellular physiology and the functioning of neurons.
2. Comprehend the integration of different systems in the body and how they work together to maintain life.
3. Apply physiological knowledge in practical scenarios, using open-book resources to demonstrate understanding rather than memorization.
4. Engage in peer-to-peer learning activities to enhance comprehension and retention of material, including explaining concepts to others and participating in workshops.
5. Develop problem-solving skills through application and integration-based activities, preparing for assessments that require a deeper level of understanding.

Not bad. How does that compare to those listed in the handbook?

  • Intended learning outcomes
    On completion of this subject, students should be able to:
    - Demonstrate a depth of knowledge in human physiology
    - Integrate their understanding across multiple physiological systems and how they interact
    - Apply their understanding of physiological systems to novel scenarios
    - Communicate their understanding to others in a meaningful and impactful way
  • Generic skills
    - Students should develop and enhance skills related to problem solving, integration, and application of complex topics and processes.

That's pretty spot-on! The only point missing is the 'depth of knowledge' in the subject, which was noted in the summary in the first exchange. This also demonstrates how you can interact with it as you would a colleague, discussing a topic and asking for clarification, further insights, new ideas etc.

You could also do this for each content block of the subject, getting individual ILOs for each one which you can then compare across subjects/degrees/universities, for example to use in determining prior learning when a student transfers to or from the university.

So now what?

An excellent question, but that's really up to you! You can now take some data, create a project, and query that data as though you had an assistant. It will obviously depend on what data you have available, and what you need to develop, change, analyse or transform.

You could also use the assistant to suggest ideas - I tried this with the data set used in the example above, and one suggestion was to create infographics. That's something I hadn't thought of before, so I asked it to make one. Note: Spark AI does not create images (that would be Stable Diffusion, Midjourney, DALL·E, etc), but it did create the outline for an infographic with notes on how to fill it in, e.g.:

Heat-Related Issues:
- Statistics on heat-related deaths, comparing natural disasters to heat stroke.
- Icons representing vulnerable groups: the elderly, young children, and athletes.
Comparative Physiology:
- Side-by-side images of a camel with a hump and a polar bear with insulating fat.
- Explanation of high surface area to volume ratio in animals adapted to heat.

If you have SES data, interview transcripts from focus groups, or other text-based responses, you could do a thematic analysis, then use the output to generate ideas as to how to modify your teaching or assessment strategy to e.g. improve student outcomes, discover challenges that students are facing, or understand external factors that may impact outcomes.
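As a very rough illustration of the kind of grouping a thematic analysis automates, here is a toy keyword-frequency count over some invented free-text responses. Spark AI does this far more capably over a real data set - this just shows the basic idea of surfacing recurring terms:

```python
from collections import Counter
import re

# Invented free-text responses, standing in for survey or focus-group data.
responses = [
    "The workload was heavy but the workshops helped a lot.",
    "More workshops please; the workload felt unmanageable at times.",
    "Feedback on assignments was slow, though the workshops were great.",
]

# Count word frequencies, ignoring common stop-words and very short words,
# as a crude first pass at spotting candidate themes.
stop = {"the", "was", "but", "a", "on", "at", "were", "felt", "though", "more"}
words = re.findall(r"[a-z]+", " ".join(responses).lower())
themes = Counter(w for w in words if w not in stop and len(w) > 3)
print(themes.most_common(3))
```

Here "workshops" and "workload" surface as the most frequent terms - the sort of signal a proper thematic analysis would then group, interpret, and connect back to your teaching or assessment strategy.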

There's a world of opportunity to explore in text-based analysis of whatever data you might have. Remember that you can 'converse' with the language models, and even ask for ideas if you're out of them. Try using the chat function as an assistant, getting it to do some of the tedious items on your to-do list. Create some data sets and set up projects to see what new insights you can derive from them. And most importantly, explore! The more familiar you become with these tools, the more you can get out of them.

More Information

Jairus Bowne