Feed aggregator
ORA-00054: RESOURCE BUSY AND ACQUIRE WITH NOWAIT SPECIFIED
PGIO Update
PGIO is the SLOB method for PostgreSQL. An old friend of mine, Bart Sjerps, has implemented a modified PGIO in Python and posted it on GitHub: https://github.com/bsjerps/pypgio
For more on PGIO:
https://dev.to/yugabyte/slob-on-yugabytedb-1a32
https://franckpachot.medium.com/do-you-know-what-you-are-measuring-with-pgbench-d8692a33e3d6
query performance tuning
SQL Firewall in 23c - UKOUG and Scripts
Posted by Pete On 27/11/23 At 03:18 PM
Writing DDL_EVENT Triggers
Easy-to-Follow RAG Pipeline Tutorial: Invoice Processing with ChromaDB & LangChain
sys_op_descend()
When you specify that a column used in an index should be a “descending” column Oracle uses the internal, undocumented, sys_op_descend() function to derive a value to store for that column in the index leaf block.
For many years I’ve claimed that this function simply takes the one’s-complement of a character value and appends 0xff to the result before storing it. This is nothing like the whole story, and in a recent comment to a note on descending indexes and finding max() values I was given another part of the story, in response to my confession that I did not know why a varchar2(128) had turned into a raw(193) – it’s the length, not the rawness, that puzzled me – when subject to sys_op_descend().
Here’s a little script to generate some data that helps to explain what’s going on.
rem
rem Script: sys_op_descend.sql
rem Author: Jonathan Lewis
rem Dated: Nov 2023
rem Purpose:
rem
rem Last tested
rem 19.11.0.0
rem
create table t1 (
v1 varchar2(10),
nv1 nvarchar2(10)
)
/
insert into t1 values('A','A');
insert into t1 values('AB','AB');
insert into t1 values('ABC','ABC');
commit;
create table t2 (
v1 varchar2(10),
nv1 nvarchar2(10)
)
/
insert into t2 values(chr(0), chr(0));
insert into t2 values(chr(1), chr(1));
insert into t2 values(chr(2), chr(2));
insert into t2 values(chr(3), chr(3));
insert into t2 values(chr(4), chr(4));
commit;
There’s nothing particularly special about the two tables and the data I’ve inserted; I’m just trying to generate patterns that make it easy to see what’s going on. I’ll be comparing a varchar2() and an nvarchar2() because my nvarchar2() column uses a multi-byte character set.
Let’s look at the (nearly raw) data from t1 where I’ve stored growing strings of printable characters. Here’s a simple query and its result set:
set linesize 50
break on row
select
dump(v1,16) v1,
dump(sys_op_descend(v1), 16) v1_desc,
dump(nv1,16) nv1,
dump(sys_op_descend(nv1), 16) nv1_desc
from
t1
/
V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 41
Typ=23 Len=2: be,ff
Typ=1 Len=2: 0,41
Typ=23 Len=4: fe,fb,be,ff
Typ=1 Len=2: 41,42
Typ=23 Len=3: be,bd,ff
Typ=1 Len=4: 0,41,0,42
Typ=23 Len=7: fe,fb,be,fe,fb,bd,ff
Typ=1 Len=3: 41,42,43
Typ=23 Len=4: be,bd,bc,ff
Typ=1 Len=6: 0,41,0,42,0,43
Typ=23 Len=10: fe,fb,be,fe,fb,bd,fe,fb,bc,ff
If you start with the first two columns of the output rows you can see: ‘A’ == 0x41, which becomes 0xbe, 0xff, following the “one’s complement with an appended 0xff” rule (0xff - 0x41 = 0xbe). The same pattern is visible for ‘AB’ and ‘ABC’.
When you look at the 3rd and 4th columns of each row (the nvarchar2), ‘A’ is now a 2-byte value (0x00, 0x41) which turns into the four bytes: 0xfe, 0xfb, 0xbe, 0xff. The value 0xbe is recognisable as the one’s-complement of 0x41 that appeared for the varchar2() values – but the 0x00 in the original nvarchar2() value seems to have turned into a two-byte 0xfe, 0xfb rather than the “expected” 0xff.
Why doesn’t Oracle use the “right” one’s complement for zero? Maybe because 0xff is the byte that’s supposed to mark the end of the string, and it’s important to avoid the risk of ambiguity. But now we have a new problem: Oracle is using 0xfe as the first of two bytes to represent the “descending” zero, and 0xfe is the one’s-complement of 0x01. So how is Oracle working around the fact that it would be a bad idea to have two possible meanings for the value 0xfe? That’s where the second data set comes in; here’s the same query, with results, run against the t2 table:
select
dump(v1,16) v1,
dump(sys_op_descend(v1), 16) v1_desc,
dump(nv1,16) nv1,
dump(sys_op_descend(nv1), 16) nv1_desc
from
t2
/
V1
--------------------------------------------------
V1_DESC
--------------------------------------------------
NV1
--------------------------------------------------
NV1_DESC
--------------------------------------------------
Typ=1 Len=1: 0
Typ=23 Len=3: fe,fe,ff
Typ=1 Len=2: 0,0
Typ=23 Len=3: fe,fd,ff
Typ=1 Len=1: 1
Typ=23 Len=3: fe,fa,ff
Typ=1 Len=2: 0,1
Typ=23 Len=3: fe,fc,ff
Typ=1 Len=1: 2
Typ=23 Len=2: fd,ff
Typ=1 Len=2: 0,2
Typ=23 Len=4: fe,fb,fd,ff
Typ=1 Len=1: 3
Typ=23 Len=2: fc,ff
Typ=1 Len=2: 0,3
Typ=23 Len=4: fe,fb,fc,ff
Typ=1 Len=1: 4
Typ=23 Len=2: fb,ff
Typ=1 Len=2: 0,4
Typ=23 Len=4: fe,fb,fb,ff
Looking at the last three groups of 4 lines we can see the varchar2() column following the “one’s complement, append 0xff” rule and the nvarchar2() following the additional “use 0xfe 0xfb for zero” rule; but for chr(0) and chr(1) the dumps need some further untangling. With the tests so far all we can say with any confidence is that “if you see the 0xfe byte then a different pattern is about to appear briefly”.
I don’t really need to do any more experiments to guess why, in my previous note, the descending varchar2(128) was projected as raw(193) (though you might want to play around to see what happens with strings of several consecutive zeros or ones). I suppose the size reported could vary with character set, but if I have a two-byte fixed-width character set and most of the data corresponds to basic ASCII characters then I’ll have a lot of characters where every other byte is a zero that encodes into two bytes when descending – so it makes sense for the projection to use a size derived as: 64 * 1 + 64 * 2 + 1 (for the trailing 0xff) = 193. Q.E.D.
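If you want to check a figure like this for yourself, one option is to create a descending index and look at the hidden virtual column that Oracle adds to support it. The following is only a sketch (the table name t3 and index name t3_nvd are mine, and I haven’t reproduced the output here): the data_type and data_length that user_tab_cols reports for the hidden column should show the raw type and the projected maximum length, which will vary with character set and version.
rem
rem Sketch: inspect the hidden column behind a descending index key
rem
create table t3 (nv1 nvarchar2(128));
create index t3_nvd on t3(nv1 desc);

select column_name, data_type, data_length, hidden_column
from user_tab_cols
where table_name = 'T3'
order by column_id
/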
Index sizing
An important side effect of this improved understanding is the realisation of what can happen to the size of an index when declared descending. For a simple demonstration, here’s a table with 4 columns and 50,000 rows copied from all_objects.object_name, using the nvarchar2() type for one of the pairs of copies.
create table t1a (
nva nvarchar2(128),
nvd nvarchar2(128),
va varchar2(128),
vd varchar2(128)
)
/
insert into t1a
select object_name, object_name, object_name, object_name
from all_objects
where
rownum <= 50000
;
create index t1a_nva on t1a(nva);
create index t1a_nvd on t1a(nvd desc);
create index t1a_va on t1a(va);
create index t1a_vd on t1a(vd desc);
execute dbms_stats.gather_table_stats(user,'t1a')
select index_name, leaf_blocks
from
user_indexes
where
table_name = 'T1A'
order by
index_name
/
select
column_name, avg_col_len
from user_tab_cols
where table_name = 'T1A'
order by
column_name
/
INDEX_NAME LEAF_BLOCKS
-------------------- -----------
T1A_NVA 590
T1A_NVD 854
T1A_VA 336
T1A_VD 343
4 rows selected.
COLUMN_NAME AVG_COL_LEN
-------------------- -----------
NVA 74
NVD 74
SYS_NC00005$ 111
SYS_NC00006$ 39
VA 38
VD 38
6 rows selected.
As you can see, the descending varchar2() index (backed by column sys_nc00006$) is only slightly larger than the corresponding ascending index, but the descending nvarchar2() index (backed by column sys_nc00005$) is closer to 50% larger because half the bytes in each object_name were zeros that have been replaced by the two-byte sequence 0xfe 0xfb. This is much worse than the “one byte per descending column per row” that I’ve been claiming for the last 20 or 30 years.
Why book reports aren’t a good homework assignment any more
So, everyone knows that the kids these days are using ChatGPT to write their book reports. Even Big Nate knows it!
But what about ChatGPT’s safeguards? Isn’t it supposed to have some kind of anti-cheating baked in, so it won’t just write essays for kids? Why doesn’t that work?
Sure, it does have safeguards… kind of. If you just ask it to write an essay, it responds with a “helpful” answer about how to write an essay. The thing is that these safeguards are incredibly easy to work around.
Let’s pretend we’re a student who has to write a book report on The Kingdom over the Sea, by Zohra Nabi. Here’s how to write it in 20 minutes without even touching the book.
1. Look up the book online in a search engine. Find a book review that describes it and skim the review. Note the main character’s name and what happens.
In this case, the Kingdom over the Sea was reviewed on Kirkus Reviews. The main character’s name is Yara, and the author’s name is Zohra Nabi.
2. Ask ChatGPT for the first paragraph.
Let’s pretend we’re a Grade 6 student asking ChatGPT what should go in the first paragraph of this book report. Here are the prompts I used:

Hi Im a Grade 6 student and I need to write a baper on a book called, the Kingdom over the Sea.

can you tell me what to put in the first paragraph

Of course! The first paragraph of your paper is usually the introduction, and it should give your reader a brief overview of the book and what your paper will be about. Here’s a basic structure for the first paragraph:
Start with a hook: Begin with an interesting or engaging sentence to grab your reader’s attention. It could be a quote from the book, a surprising fact, or a question related to the story.
Introduce the book: Mention the title of the book (“The Kingdom over the Sea”) and the author’s name if you know it.
Provide some context: Briefly describe the genre of the book (e.g., fantasy, adventure) and its setting (where and when the story takes place).
Mention the main idea: Give a hint about the main theme or what the story is about. For example, is it about a young hero going on an epic adventure, a mystery to be solved, or a journey to a magical land?
State your purpose: Finally, state that your paper will explore and analyze various aspects of the book. Mention the topics or themes you plan to discuss, such as characters, plot, or themes.
Here’s an example to get you started:
“In the enchanting world of ‘The Kingdom over the Sea,’ written by [Author’s Name], readers are transported to a magical land filled with adventures, mystery, and wonder. This fantasy novel, set in a distant realm, follows the journey of [Main Character’s Name] as he embarks on an epic quest. In this paper, we will delve into the captivating elements of this story, exploring the characters, plot twists, and the themes that make ‘The Kingdom over the Sea’ an unforgettable read.”
Feel free to customize it to fit your own thoughts and ideas. If you have any specific details about the book you’d like to include in the introduction, please let me know!
Ignore most of ChatGPT’s long, didactic answer. Grab the sample paragraph, fill in the missing “Author’s Name” and “Main Character’s Name”, and–this part is key–ask ChatGPT to rephrase it like a Grade 6 student wrote it. If you don’t, you’ll wind up with this problem:
Here’s the rewritten prompt:

please rewrite this text so it sounds like a Grade 6 student wrote it: “In the enchanting world of ‘The Kingdom over the Sea,’ written by Zohra Nabi, readers are transported to a magical land filled with adventures, mystery, and wonder. This fantasy novel, set in a distant realm, follows the journey of Yara as she embarks on an epic quest. In this paper, we will delve into the captivating elements of this story, exploring the characters, plot twists, and the themes that make ‘The Kingdom over the Sea’ an unforgettable read.”

“In ‘The Kingdom over the Sea’ by Zohra Nabi, it’s like you’re going on a journey to a super cool and magical place! This book is all about adventures, mysteries, and really amazing stuff. It’s a fantasy book, and it happens in a faraway world. We get to follow this awesome character named Yara, and she’s off on a super big adventure. In my paper, we’ll check out the cool stuff in the story, like the characters, surprises, and the things that make ‘The Kingdom over the Sea’ a book you won’t forget.”
And there’s the first paragraph of your book report — done!
5. Second paragraph: Paraphrase
For the second paragraph, just ask ChatGPT to paraphrase the whole review you found in step 1. Paste the entire review into the prompt. (I haven’t pasted the whole thing; the review starts with “Cryptic directions”.)

can you write this paragraph like a 6th grader wrote it “Cryptic directions…

“Yara’s mom left her some tricky clues that sent her from England to a super magical world. Her mom passed away and left her directions to go to a place called Zehaira, where she’d meet a sorceress named Leyla Khatoun. Yara wasn’t sure if this was real at first, but she really wanted to know more about her mom and their past because, on paper, they were from Iraq. When she got there, she found a world where the Sultan was against magic and had the sorcerers hiding from the Inquisition. On her way to find Leyla, Yara heard about a plan to wipe out all the sorcerers. But Leyla was busy running a secret sorcerer hideout and didn’t want to help the city sorcerers. Yara had to work really hard to get Leyla to see the big danger and to tell her the secrets her mom kept. And guess what? Yara discovers she has some magic of her own, which was a bit unexpected. But we already liked Yara a lot by then. The world in the story is full of magic and feels like an Arabian adventure, and the people and jinn in the story are really interesting. The ending was kind of quick, but it looks like there will be more exciting adventures in the next book.”
Now you have two paragraphs. Feed both of them to ChatGPT and ask it to write Paragraph 3.

Note: In the actual prompt, I pasted in the entire first and second paragraphs; I didn’t duplicate them here because that would be boring. But you get the idea.

“One of my favorite parts in ‘The Kingdom over the Sea’ was when Yara discovered her own magical abilities. It was a big surprise, and I thought it was really cool how the author revealed her powers. I also liked how the characters changed during the story. Yara became even braver, and it was awesome to see her grow. The themes in the book, like courage and friendship, are important in real life too. I thought they were well shown in the story. The author’s writing style made the book easy to read, and I felt like I was right there with Yara on her adventure. Overall, I really enjoyed ‘The Kingdom over the Sea,’ and I would totally recommend it to my friends because it’s full of magic and fun!”
Put the three paragraphs all together to get a finished 440-word book review that could plausibly come from an overly enthusiastic middle-schooler:
In ‘The Kingdom over the Sea’ by Zohra Nabi, it’s like you’re going on a journey to a super cool and magical place! This book is all about adventures, mysteries, and really amazing stuff. It’s a fantasy book, and it happens in a faraway world. We get to follow this awesome character named Yara, and she’s off on a super big adventure. In my paper, we’ll check out the cool stuff in the story, like the characters, surprises, and the things that make ‘The Kingdom over the Sea’ a book you won’t forget.
Yara’s mom left her some tricky clues that sent her from England to a super magical world. Her mom passed away and left her directions to go to a place called Zehaira, where she’d meet a sorceress named Leyla Khatoun. Yara wasn’t sure if this was real at first, but she really wanted to know more about her mom and their past because, on paper, they were from Iraq. When she got there, she found a world where the Sultan was against magic and had the sorcerers hiding from the Inquisition. On her way to find Leyla, Yara heard about a plan to wipe out all the sorcerers. But Leyla was busy running a secret sorcerer hideout and didn’t want to help the city sorcerers. Yara had to work really hard to get Leyla to see the big danger and to tell her the secrets her mom kept. And guess what? Yara discovers she has some magic of her own, which was a bit unexpected. But we already liked Yara a lot by then. The world in the story is full of magic and feels like an Arabian adventure, and the people and jinn in the story are really interesting. The ending was kind of quick, but it looks like there will be more exciting adventures in the next book.
One of my favorite parts in ‘The Kingdom over the Sea’ was when Yara discovered her own magical abilities. It was a big surprise, and I thought it was really cool how the author revealed her powers. I also liked how the characters changed during the story. Yara became even braver, and it was awesome to see her grow. The themes in the book, like courage and friendship, are important in real life too. I thought they were well shown in the story. The author’s writing style made the book easy to read, and I felt like I was right there with Yara on her adventure. Overall, I really enjoyed ‘The Kingdom over the Sea,’ and I would totally recommend it to my friends because it’s full of magic and fun!
Add a few spelling mistakes for verisimilitude, and the homework is done.
This took me about 20 minutes, start to finish: much less time than it would’ve taken me to read the book, think about it, come up with something to say, and write it down.
And if I really was a Grade 6 student, I would’ve lost the experience of reading a fantastic book, thinking about it, and reflecting on my own reaction to it. (I did read the book, and it really is lovely.)
What’s a teacher to do?
I don’t know, but here are a few ideas from a non-teacher:
- Use in-class quizzes, not book reports, to see who’s done the assigned reading.
- Ask for response papers that discuss a student’s personal reaction to a book, rather than summarizing the book.
- Ask for personalized answers in which students contextualize their response to a book with something in their own lives.
- Require handwritten homework so that, at the very least, students have to hand-copy what ChatGPT spits out. Even the act of copying can lead to reflection.
Using vector databases for context in AI
In the realm of Artificial Intelligence (AI), understanding and retaining context is a pivotal factor for decision-making and comprehension. Vector databases are the foundation for encapsulating your own data so that it can be used in conjunction with AI and LLMs, empowering these systems to absorb and retain rich contextual information.
Understanding Vector Databases
Vector databases are specialised data storage systems engineered to efficiently manage and retrieve vectorised data, also known as embeddings. They store information in a vector format, where each data entity is represented as a multidimensional numerical vector that encapsulates its attributes and relationships, preserving rich context. In other words, text, video or audio is translated into numbers with many attributes in a multidimensional space, and mathematics is then used to calculate the proximity between these numbers. Loosely speaking, that is what a neural network in an LLM does: it computes proximity (similarity) between vectors. The vector database is simply where the vectors are stored. If you don’t use a vector database with your LLM, under architectures like RAG, you cannot bring your own data or context into the model; all it will know is what it was trained on, which will usually be data from the public internet. Vector databases enable you to bring your own data to AI.
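To make the “mathematics” a little more concrete: one common proximity measure (though not the only one) is cosine similarity, which for two embedding vectors a and b is

cos(a, b) = (a · b) / (|a| |b|)

that is, the dot product of the vectors divided by the product of their lengths. A value close to 1 means the two items are semantically similar, a value near 0 means they are unrelated, and a vector database is, in essence, a system optimised to answer “which stored vectors are closest to this query vector?” at scale.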
Examples of Vector Databases
Several platforms offer vector databases, such as Pinecone, Faiss by Facebook, Annoy, Milvus, and Elasticsearch with dense vector support. These databases cater to diverse use cases, offering functionalities tailored to handle vast amounts of vectorised information, be it images, text, audio, or other complex data types.
Importance in AI Context
Within the AI landscape, vector databases play a pivotal role in serving specific data and context to AI models. In particular, in the Retrieval-Augmented Generation (RAG) architecture, where retrieval of relevant information is an essential part of content generation, vector databases act as repositories, storing precomputed embeddings of your own private data. These embeddings encode the semantic and contextual essence of your data, facilitating efficient retrieval in your AI apps and bots. Bringing a vector database into your AI apps, agents and chatbots means those applications can answer in terms of your own data, not just what the LLM was trained on.
Advantages for Organisations and AI Applications
Organisations can harness vector databases within RAG architectures to elevate their AI applications and enable them to use organisation-specific data:
Enhanced Contextual Understanding: By leveraging vector databases, AI models grasp nuanced contextual information, enabling more informed decision-making and more precise content generation based on specific and private organisational context.
Improved Efficiency in Information Retrieval: Vector databases expedite the retrieval of pertinent information by enabling similarity searches based on vector representations, augmenting the speed and accuracy of AI applications.
Scalability and Flexibility: These databases offer scalability and flexibility, accommodating diverse data types and expanding corpora, essential for the evolving needs of AI-driven applications.
Optimised Resource Utilisation: Vector databases streamline resource utilisation by efficiently storing and retrieving vectorised data, thus optimising computational resources and infrastructure.
Closing Thoughts
In the AI landscape, where the comprehension of context is paramount, vector databases emerge as linchpins, fortifying AI systems with the capability to retain and comprehend context-rich information. Their integration within RAG architectures not only elevates AI applications but also empowers organisations to glean profound insights, fostering a new era of context-driven AI innovation from data.
In essence, the power vested in vector databases will reshape the trajectory of AI, propelling it toward unparalleled contextualisation and intelligent decision-making based on in-house, organisation-owned data.
But the enigma persists: What precisely will be the data fuelling the AI model?
Unified Auditing on tables
Statistics Gathering using older version feature
503 Service Unavailable - ORA-28001: the password has expired
Oracle SE2 Licence NUP
How to extract the tag value from the XML tag with colon
Question about Leading Hint and outer Joined tables
UKOUG 2023 - Using Database Vault in Real Life
Posted by Pete On 20/11/23 At 09:44 AM
I started an AI newsletter just in time for OpenAI to implode
Last week, I decided to soft-launch a weekly newsletter to highlight three or four recent news stories in AI. When I started it, I had no idea this would be one of the biggest weeks in AI news all year. There was one giant story: OpenAI, makers of ChatGPT, fired their CEO, Sam Altman. And then it got messy.
You can read the rest of this week’s newsletter here.
Why a newsletter: I spend several hours a week reading about AI/machine learning. I get frustrated when I see overly-simple takes on AI in the media, on social media, etc. I figured that the best thing I could do about that would be to share what I’m learning every week.
What the newsletter is: A weekly email about some of the past week’s most interesting stories in AI, plus the occasional backgrounder.
What it’s not: Business-focused, highly technical, or comprehensive.
What you’ll get from AI Week:
- An interesting email about AI in your inbox every week.
- Background knowledge to contextualize all the breathless media stories about the powers, or dangers, of AI.
- A feeling for where the field is going, why it’s going there, and who’s powering it.
Where do I sign up? Right here:
gby_pushdown
This is a note that’s been awaiting completion for nearly 10 years. It’s about a feature (or, at least, a hint for the feature) that appeared in 10.2.0.5 to control some of the inter-process messaging that takes place in parallel execution.
It’s a very simple idea that can make a significant difference in CPU usage for large parallel queries – can you aggregate the raw data before distributing it between slave sets (minimising the volume of traffic), or should you simply distribute the data and aggregate late to avoid aggregating twice? The strategy of aggregating early is known as “group by pushdown”.
I’ll start with a script to create a simple data set and a trivial query with a parallel hint:
rem
rem Script: gby_pushdown.sql
rem Author: Jonathan Lewis
rem Dated: Nov 2023
rem
rem Last tested
rem 19.11.0.0
rem 12.2.0.1
rem
set linesize 180
set pagesize 60
set trimspool on
set tab off
create table t1
as
select
*
from
all_objects
where
rownum <= 50000
/
alter session set statistics_level = all;
set serveroutput off
prompt ==================================================
prompt Default (for this data) pushdown and hash group by
prompt ==================================================
set feedback only
select
/*+
qb_name(main)
parallel(t1 2)
-- no_gby_pushdown(@main)
-- no_use_hash_gby_for_pushdown(@main)
*/
owner, object_type, count(*) ct
from
t1
group by
owner, object_type
order by
owner, object_type
/
set feedback on
select * from table(dbms_xplan.display_cursor(format=>'outline allstats cost hint_report -predicate'));
In the absence of any hints (apart from the qb_name() and parallel() hints), the plan I get from the query is as follows:
Plan hash value: 2919148568
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | O/1/M |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 77 (100)| 96 |00:00:00.07 | 5 | 0 | | | |
| 1 | PX COORDINATOR | | 1 | | | 96 |00:00:00.07 | 5 | 0 | 11264 | 11264 | 1/0/0|
| 2 | PX SEND QC (ORDER) | :TQ10001 | 0 | 276 | 77 (8)| 0 |00:00:00.01 | 0 | 0 | | | |
| 3 | SORT GROUP BY | | 2 | 276 | 77 (8)| 96 |00:00:00.01 | 0 | 0 | 4096 | 4096 | 2/0/0|
| 4 | PX RECEIVE | | 2 | 276 | 77 (8)| 121 |00:00:00.01 | 0 | 0 | | | |
| 5 | PX SEND RANGE | :TQ10000 | 0 | 276 | 77 (8)| 0 |00:00:00.01 | 0 | 0 | | | |
| 6 | HASH GROUP BY | | 2 | 276 | 77 (8)| 121 |00:00:00.04 | 1043 | 991 | 1079K| 1079K| 2/0/0|
| 7 | PX BLOCK ITERATOR | | 2 | 50000 | 73 (3)| 50000 |00:00:00.02 | 1043 | 991 | | | |
|* 8 | TABLE ACCESS FULL| T1 | 26 | 50000 | 73 (3)| 50000 |00:00:00.01 | 1043 | 991 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
ALL_ROWS
OUTLINE_LEAF(@"MAIN")
FULL(@"MAIN" "T1"@"MAIN")
GBY_PUSHDOWN(@"MAIN")
USE_HASH_GBY_FOR_PUSHDOWN(@"MAIN")
END_OUTLINE_DATA
*/
You’ll notice in the Outline Data that Oracle has recorded the use_hash_gby_for_pushdown() hint and the gby_pushdown() hint. I’ll be repeating the query disabling each of these hints in turn – which is why the negative versions of the hints appear as comments in my original query.
If you look at operation 6 of the plan you can see that the optimizer has chosen to use a hash group by to aggregate the selected rows, reducing 50,000 rows to 121 rows. We could query v$pq_tqstat, or run the SQL Monitor report to get more detail about how much work each PX process did, but it’s fairly safe to assume that it was shared reasonably evenly between the two processes.
After aggregating their data the first layer of PX processes distributes the results by range (operation 5, PX Send Range) to the second layer of PX processes, which re-aggregate the much reduced data set. At this point Oracle chooses to aggregate by sorting (Sort Group By) as this will deal with the order by clause at the same time. (Note: you could tell Oracle to split the aggregation and ordering by adding a use_hash_aggregation hint to the query.)
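For reference, that hint would slot into the test query as in the sketch below; treat the placement (and the @main query block argument) as my assumption rather than something verified in this test:
select
/*+
qb_name(main)
parallel(t1 2)
use_hash_aggregation(@main)
*/
owner, object_type, count(*) ct
from
t1
group by
owner, object_type
order by
owner, object_type
/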
With my data set it’s fairly clear that it’s a good idea to do this “two stage” aggregation because the initial raw data is reduced by a very large factor by the first layer of PX processes before they forward the results – and the resulting reduction in inter-process messaging is a clear benefit.
There may be cases, of course, where some feature of the data pattern means that two-stage aggregation is a good idea, but aggregating by a sort is more efficient than an aggregation by hashing – a cardinality or clustering estimate might have persuaded the optimizer to make the wrong choice – so let’s see what happens to the plan if we enable the no_use_hash_gby_for_pushdown() hint:
Plan hash value: 3954212205
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 77 (100)| 96 |00:00:00.14 | 5 | | | |
| 1 | PX COORDINATOR | | 1 | | | 96 |00:00:00.14 | 5 | 11264 | 11264 | 1/0/0|
| 2 | PX SEND QC (ORDER) | :TQ10001 | 0 | 276 | 77 (8)| 0 |00:00:00.01 | 0 | | | |
| 3 | SORT GROUP BY | | 2 | 276 | 77 (8)| 96 |00:00:00.03 | 0 | 4096 | 4096 | 2/0/0|
| 4 | PX RECEIVE | | 2 | 276 | 77 (8)| 120 |00:00:00.03 | 0 | | | |
| 5 | PX SEND RANGE | :TQ10000 | 0 | 276 | 77 (8)| 0 |00:00:00.01 | 0 | | | |
| 6 | SORT GROUP BY | | 2 | 276 | 77 (8)| 120 |00:00:00.13 | 1043 | 9216 | 9216 | 2/0/0|
| 7 | PX BLOCK ITERATOR | | 2 | 50000 | 73 (3)| 50000 |00:00:00.07 | 1043 | | | |
|* 8 | TABLE ACCESS FULL| T1 | 26 | 50000 | 73 (3)| 50000 |00:00:00.02 | 1043 | | | |
---------------------------------------------------------------------------------------------------------------------------------------
It doesn’t really make any difference in this very small test case, though the sorting does seem to have taken a fraction of a second more CPU. The key change is that operation 6 has become a Sort Group By.
There is an interesting detail to look out for, though, in the Outline Data and Hint Report:
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
ALL_ROWS
OUTLINE_LEAF(@"MAIN")
FULL(@"MAIN" "T1"@"MAIN")
GBY_PUSHDOWN(@"MAIN")
END_OUTLINE_DATA
*/
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
1 - MAIN
- qb_name(main)
8 - MAIN / T1@MAIN
- parallel(t1 2)
I added the hint /*+ no_use_hash_gby_for_pushdown(@main) */ to the query, and the hint has clearly worked; but it’s not reported in the Hint Report, and it doesn’t appear in the Outline Data. This suggests that if you created an SQL Baseline for this query with this hint in place the plan would not reproduce because the SQL Baseline would be missing the critical hint. (To be tested – left as an exercise for readers.)
The next step is to enable the no_gby_pushdown() hint. For the purposes of the note this is the important one. It’s also the one that you are more likely to use as it’s fairly obvious (if you know the data) when it’s a good idea to use it. (In some cases, of course, the optimizer might have avoided “group by pushdown” when it should have used it – in which case you’d be using the gby_pushdown() hint.) Here’s the plan when I block “group by pushdown”.
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 77 (100)| 96 |00:00:00.08 | 5 | | | |
| 1 | PX COORDINATOR | | 1 | | | 96 |00:00:00.08 | 5 | 6144 | 6144 | 1/0/0|
| 2 | PX SEND QC (ORDER) | :TQ10001 | 0 | 276 | 77 (8)| 0 |00:00:00.01 | 0 | | | |
| 3 | SORT GROUP BY | | 2 | 276 | 77 (8)| 96 |00:00:00.11 | 0 | 4096 | 4096 | 2/0/0|
| 4 | PX RECEIVE | | 2 | 50000 | 73 (3)| 50000 |00:00:00.07 | 0 | | | |
| 5 | PX SEND RANGE | :TQ10000 | 0 | 50000 | 73 (3)| 0 |00:00:00.01 | 0 | | | |
| 6 | PX BLOCK ITERATOR | | 4 | 50000 | 73 (3)| 50091 |00:00:00.02 | 1051 | | | |
|* 7 | TABLE ACCESS FULL| T1 | 28 | 50000 | 73 (3)| 50091 |00:00:00.01 | 1051 | | | |
--------------------------------------------------------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
ALL_ROWS
OUTLINE_LEAF(@"MAIN")
FULL(@"MAIN" "T1"@"MAIN")
END_OUTLINE_DATA
*/
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3
---------------------------------------------------------------------------
1 - MAIN
- no_gby_pushdown(@main)
- qb_name(main)
7 - MAIN / T1@MAIN
- parallel(t1 2)
Key details to highlight here are:
- There’s only one aggregation step, appearing at operation 3 after the PX SEND/RECEIVE
- 50,000 rows are passed from slave set 1 (operation 6) to slave set 2 (operation 4).
- The no_gby_pushdown(@main) does appear in the Hint Report
- The no_gby_pushdown(@main) doesn’t appear in the Outline Data (which is, again, a little worrying).
Again, with this small data set, the change in plan isn’t going to make much difference to the end user, but you may find cases where one of the two options is clearly the better choice for keeping the user sufficiently happy while saving machine resources.
Summary
When using parallel query the optimizer may choose to aggregate the data in two steps, so that the current rowsource is reduced by one set of PX processes before it is passed to the next set of PX processes, which does a second aggregation step. This tends to be a good idea if the first set of processes can reduce the size of the data set by a large factor and save a lot of machine resources on the subsequent inter-process communication; but it is a bad idea if the data doesn’t reduce and a large volume of data ends up being aggregated in both sets of PX processes.
You can over-ride the optimizer’s choice with the [no_]gby_pushdown() hint.
Session Activity Stats
A little detail to remember when trouble-shooting at the session level – some of the information summarised in the Instance Activity figures (v$sysstat) is not available in the Session Activity figures (v$sesstat / v$mystat). The difference goes right down to the x$ objects, and here are two versions of a simple query I wrote for 19c to check for the missing statistics (you’ll have to be able to connect as SYS to get results from the first one):
rem
rem Script: mystat_missing.sql
rem Author: Jonathan Lewis
rem Dated: Nov 2023
rem Purpose:
rem
rem Last tested
rem 23.3.0.0
rem 19.11.0.0
rem 12.2.0.1
spool mystat_missing
set linesize 132
set pagesize 60
set trimspool on
set tab off
select
usd.indx, usd.ksusdnam, usd.ksusdcls
from
x$ksusd usd
where
usd.indx not in (
select sta.indx
from x$ksumysta sta
)
/
select
sys.statistic#, sys.name, sys.class
from
v$sysstat sys
where
sys.statistic# not in (
select mst.statistic#
from v$mystat mst
)
/
spool off
If you run the script against 12.2.0.1 you’ll find that there are no “missing” session stats, but when you upgrade to 19c (in my case 19.11.0.0) you’ll find a couple of dozen statistics reported. This was the output I got:
STATISTIC# NAME CLASS
---------- ---------------------------------------------------------------- ----------
2010 Instance Statistic test case 384
2011 cell RDMA reads unused 264
2012 cell RDMA writes 264
2013 cell pmem cache read hits unused 264
2014 cell pmem cache writes 264
2015 NS transactions 257
2016 NS transactions timed out 257
2017 NS transactions interrupted 257
2018 NS transactions not using all standbys 257
2019 NS transactions skipping send 257
2020 NS transaction setups 257
2021 NS transaction bytes loaded 320
2022 NS transaction bytes to network 320
2023 NS transaction bytes relogged 322
2024 NS transaction bytes logged 322
2025 NS transaction send time 320
2026 NS transaction setup time 320
2027 NS transaction confirm time 320
2028 NS recovery timeout interrupts 384
2029 NS recovery DTC full interrupts 384
2030 NS recovery fetch requests made 384
2031 NS recovery fetch ranges received 384
2032 NS recovery fetch requested bytes 384
2033 NS recovery fetch received bytes 384
2034 NS recovery fetch received time 384
2035 NS recovery fetch requests remade 384
26 rows selected.
Running the query against 23c Free (23.3) on the Developer VM created by Oracle, the number of “missing” statistics jumped to 1,052 – so I won’t list them. Given the size of the list I did a quick check to remind myself of how many statistics were actually present in v$sysstat, and that produced an interesting comparison.
--
-- 19.11.0.0
--
SQL> select count(*) ct_19c from v$sysstat;
CT_19C
----------
2036
--
-- 23.3.0.0 (on premises)
--
SQL> select count(*) ct_23c from v$sysstat;
CT_23C
----------
2661
So there are 600-ish extra stats available in 23c, but 1,000-ish stats that don’t appear at the session level. So if you’ve been taking advantage of some of the “new” 19c stats to help with trouble-shooting you may find that they disappear on the upgrade. More on that later.
If you look at the output I’ve listed above you might spot that all the missing stats satisfy the predicate “class >= 256”. In fact, the class column is a bit vector, and a more appropriate predicate would be: “bitand(class,256) = 256”. Either option gives you a fairly lazy way to do any checks you might be interested in. For example, after setting up a database link from a 19c instance to a 23c instance, I ran the following query to find out how many of the statistics that were visible in the 19c v$sesstat had changed their class to become invisible in the 23c v$sesstat.
select name from v$sysstat where class < 256
intersect
select name from V$sysstat@oracle23 where class >= 256
/
NAME
----------------------------------------------------------------
...
SMON posted for dropping temp segment
SMON posted for instance recovery
SMON posted for txn recovery for other instances
SMON posted for undo segment recovery
SMON posted for undo segment shrink
TBS Extension: bytes extended
TBS Extension: files extended
TBS Extension: tasks created
TBS Extension: tasks executed
...
commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanout failures: delayed log
commit cleanout failures: flashback
commit cleanout failures: hot backup in progress
commit cleanout failures: write disabled
...
db corrupt blocks detected
db corrupt blocks recovered
...
502 rows selected.
Of the 502 stats, a very large fraction were about In-Memory (prefixed IM), and there were a number that looked as if they were only relevant to background processes and therefore (to some extent, possibly) not of interest when debugging user activity. It’s also possible that some of the statistics would fall into different classes if certain features (e.g. hybrid columnar compression) were linked into the executable.
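If you prefer the bit-vector version of the class test mentioned above, a sketch along the following lines (using only the standard v$sysstat columns; I haven’t listed the output here) picks out the statistics carrying the class bit that all the “missing” statistics had:
select statistic#, name, class
from v$sysstat
where bitand(class, 256) = 256
order by statistic#
/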
Another query that might be of interest is a typical “what’s changed” query. What statistics are available in 23c that aren’t in 19c (and vice versa):
select
name, version
from (
select name, '23c' version from v$sysstat@oracle23
minus
select name, '23c' from V$sysstat
union all
select name, '19c' version from v$sysstat
minus
select name, '19c' from V$sysstat@oracle23
)
order by
version desc, name
/
Again there are so many that I don’t think it would be helpful to reproduce my results, but I’ll just say that there were 663 stats in 23c that weren’t in 19c, and 38 stats in 19c that weren’t in 23c. Of course, it’s possible (and I didn’t check carefully) that some of these new/”disappearing” statistics show up only because they’ve had a name change as the version changed.
A couple of the new 23c stats that I like the look of (and would like to see at the session level) are:
user commits PL/SQL
user commits Autonomous
I’m sure that there are more that will reveal themselves as (occasionally) useful over time, and further investigation is left to the reader.
Create MySQL In Vagrant Box
First, thank you to jazzfog (Pavel) for https://github.com/jazzfog/MySql-in-Vagrant as it was straightforward.
Too tired and too lazy to explain all the steps; however, the output should be self-explanatory.
pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$ ls -al
ls -al
total 25
drwxr-xr-x 1 pegasus 197121 0 Nov 19 06:37 .
drwxr-xr-x 1 pegasus 197121 0 Nov 19 06:34 ..
drwxr-xr-x 1 pegasus 197121 0 Nov 19 06:34 .git
-rw-r--r-- 1 pegasus 197121 31 Nov 19 06:34 .gitignore
drwxr-xr-x 1 pegasus 197121 0 Nov 19 06:35 .vagrant
drwxr-xr-x 1 pegasus 197121 0 Nov 19 06:37 MysqlData
-rw-r--r-- 1 pegasus 197121 635 Nov 19 06:34 Vagrantfile
drwxr-xr-x 1 pegasus 197121 0 Nov 19 06:34 provision
-rw-r--r-- 1 pegasus 197121 3010 Nov 19 06:34 readme.md
pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$
======================================================================
pegasus@Greek MINGW64 /c/vagrant
$ git clone https://github.com/jazzfog/MySql-in-Vagrant
Cloning into 'MySql-in-Vagrant'...
remote: Enumerating objects: 32, done.
remote: Total 32 (delta 0), reused 0 (delta 0), pack-reused 32
Receiving objects: 100% (32/32), 5.32 KiB | 5.32 MiB/s, done.
Resolving deltas: 100% (12/12), done.
pegasus@Greek MINGW64 /c/vagrant
$ ls
MySql-in-Vagrant/ vagrant/ vagrant-projects/
pegasus@Greek MINGW64 /c/vagrant
$ cd MySql-in-Vagrant/
pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$ ls
Vagrantfile provision/ readme.md
pegasus@Greek MINGW64 /c/vagrant/MySql-in-Vagrant (master)
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/trusty64'...
vagrant@vagrant-ubuntu-trusty-64:~$ mysql -V
mysql Ver 14.14 Distrib 5.5.62, for debian-linux-gnu (x86_64) using readline 6.3
vagrant@vagrant-ubuntu-trusty-64:~$ mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 37
Server version: 5.5.62-0ubuntu0.14.04.1 (Ubuntu)
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT version();
+-------------------------+
| version() |
+-------------------------+
| 5.5.62-0ubuntu0.14.04.1 |
+-------------------------+
1 row in set (0.00 sec)
mysql> STATUS;
--------------
mysql Ver 14.14 Distrib 5.5.62, for debian-linux-gnu (x86_64) using readline 6.3
Connection id: 37
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.5.62-0ubuntu0.14.04.1 (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 5 min 13 sec
Threads: 1 Questions: 113 Slow queries: 0 Opens: 48 Flush tables: 1 Open tables: 41 Queries per second avg: 0.361
--------------
mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.5.62 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.62-0ubuntu0.14.04.1 |
| version_comment | (Ubuntu) |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)
mysql>