ANALYZE

Syntax

SQLite docs for ANALYZE.

The following shapes of SQL statements are to be permitted:

  1. ANALYZE;

  2. ANALYZE schema_name;

  3. ANALYZE table_name;

  4. ANALYZE schema_name.table_name;

The recommended way of running ANALYZE is to use PRAGMA optimize, which is out of scope as ANALYZE is strictly a pre-requisite.

ANALYZE is affected by PRAGMA analysis_limit, support for which is also out of scope.

Semantics

The effect of running ANALYZE is that the sqlite_stat* table(s) are populated. SQLite docs describe sqlite_stat[1-4] tables, where:

PRAGMA compile_options; shows no ENABLE_STAT define in my fedora install, and ANALYZE only produces a sqlite_stat1, so starting there is a safe bet.

The sqlite_stat1 table is defined as CREATE TABLE sqlite_stat1(tbl,idx,stat);, where:

From experimentation:

Code Gen

The overall structure of each ANALYZE statement is:

  1. Create the sqlite_stat1 table if it doesn’t already exist OR delete the existing row if it does.

  2. For each table within each schema:

    1. Iff there is no primary key, count the rows and insert into the stats table with a NULL idx.

    2. Iff there is a primary key, stat_init() a stats object, stat_push() each row into it, and then store stat_get() into the stats column.

ANALYZE of a table without a primary key defined, with sqlite_stat1 pre-created:

sqlite> CREATE TABLE iiftest(a int, b int, c int);
sqlite> EXPLAIN ANALYZE iiftest;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     21    0                    0   Start at 21
1     Null           0     1     0                    0   r[1]=NULL
2     OpenWrite      3     4     0     3              0   root=4 iDb=0; sqlite_stat1
3     Rewind         3     9     0                    0
4       Column         3     0     2                    0   r[2]= cursor 3 column 0
5       Ne             3     8     2     BINARY-8       81  if r[2]!=r[3] goto 8
6       Rowid          3     4     0                    0   r[4]=sqlite_stat1.rowid
7       Delete         3     0     0     sqlite_stat1   2
8     Next           3     4     0                    1
9     OpenWrite      0     4     0     3              0   root=4 iDb=0; sqlite_stat1
10    OpenRead       4     2     0     3              0   root=2 iDb=0; iiftest
11    String8        0     11    0     iiftest        0   r[11]='iiftest'; iiftest
12    Count          4     13    0                    0   r[13]=count()
13    IfNot          13    18    0                    0
14    Null           0     12    0                    0   r[12]=NULL
15    MakeRecord     11    3     9     BBB            0   r[9]=mkrec(r[11..13])
16    NewRowid       0     5     0                    0   r[5]=rowid
17    Insert         0     9     5                    8   intkey=r[5] data=r[9]
18    LoadAnalysis   0     0     0                    0
19    Expire         0     0     0                    0
20    Halt           0     0     0                    0
21    Transaction    0     1     9     0              1   usesStmtJournal=0
22    String8        0     3     0     iiftest        0   r[3]='iiftest'
23    Goto           0     1     0                    0


ANALYZE of a table with a primary key defined, with sqlite_stat1 pre-created:

sqlite> CREATE TABLE stat_test(a int, b int, c int, primary key (a,b,c));
sqlite> EXPLAIN ANALYZE stat_test;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     45    0                    0   Start at 45
1     Null           0     1     0                    0   r[1]=NULL
2     OpenWrite      3     4     0     3              0   root=4 iDb=0; sqlite_stat1
3     Rewind         3     9     0                    0
4       Column         3     0     2                    0   r[2]= cursor 3 column 0
5       Ne             3     8     2     BINARY-8       81  if r[2]!=r[3] goto 8
6       Rowid          3     4     0                    0   r[4]=sqlite_stat1.rowid
7       Delete         3     0     0     sqlite_stat1   2
8     Next           3     4     0                    1
9     OpenWrite      0     4     0     3              0   root=4 iDb=0; sqlite_stat1
10    OpenRead       4     5     0     3              0   root=5 iDb=0; stat_test
11    String8        0     11    0     stat_test      0   r[11]='stat_test'
12    String8        0     12    0     sqlite_autoindex_stat_test_1 0   r[12]='sqlite_autoindex_stat_test_1'; Analysis for stat_test.sqlite_autoindex_stat_test_1
13    OpenRead       5     6     0     k(4,,,,)       0   root=6 iDb=0; sqlite_autoindex_stat_test_1
14    Integer        0     10    0                    0   r[10]=0
15    Integer        4     7     0                    0   r[7]=4
16    Integer        3     8     0                    0   r[8]=3
17    Count          5     9     0                    0   r[9]=count()
18    Function       0     7     6     stat_init(4)   0   r[6]=func(r[7..10])
19    Rewind         5     42    0                    0
20    Integer        0     7     0                    0   r[7]=0
21    Goto           0     33    0                    0
22      Integer        0     7     0                    0   r[7]=0
23      Column         5     0     9                    0   r[9]=sqlite_autoindex_stat_test_1.a
24      Ne             9     33    14    BINARY-8       128 if r[14]!=r[9] goto 33
25      Integer        1     7     0                    0   r[7]=1
26      Column         5     1     9                    0   r[9]=sqlite_autoindex_stat_test_1.b
27      Ne             9     34    15    BINARY-8       128 if r[15]!=r[9] goto 34
28      Integer        2     7     0                    0   r[7]=2
29      Column         5     2     9                    0   r[9]=sqlite_autoindex_stat_test_1.c
30      Ne             9     35    16    BINARY-8       128 if r[16]!=r[9] goto 35
31      Integer        3     7     0                    0   r[7]=3
32      Goto           0     36    0                    0
33      Column         5     0     14                   0   r[14]=sqlite_autoindex_stat_test_1.a
34      Column         5     1     15                   0   r[15]=sqlite_autoindex_stat_test_1.b
35      Column         5     2     16                   0   r[16]=sqlite_autoindex_stat_test_1.c
36      Function       1     6     9     stat_push(2)   0   r[9]=func(r[6..7])
37    Next           5     22    0                    0
38    Function       0     6     13    stat_get(1)    0   r[13]=func(r[6])
39    MakeRecord     11    3     9     BBB            0   r[9]=mkrec(r[11..13])
40    NewRowid       0     5     0                    0   r[5]=rowid
41    Insert         0     9     5                    8   intkey=r[5] data=r[9]
42    LoadAnalysis   0     0     0                    0
43    Expire         0     0     0                    0
44    Halt           0     0     0                    0
45    Transaction    0     1     9     0              1   usesStmtJournal=1
46    String8        0     3     0     stat_test      0   r[3]='stat_test'
47    Goto           0     1     0                    0


ANALYZE of a table with a primary key, without rowid, and with sqlite_stat1 pre-created:

sqlite> CREATE TABLE stat_test_norowid(a int, b int, c int, primary key (a,b,c)) without rowid;
sqlite> EXPLAIN ANALYZE stat_test_norowid;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     41    0                    0   Start at 41
1     Null           0     1     0                    0   r[1]=NULL
2     OpenWrite      3     4     0     3              0   root=4 iDb=0; sqlite_stat1
3     Rewind         3     9     0                    0
4       Column         3     0     2                    0   r[2]= cursor 3 column 0
5       Ne             3     8     2     BINARY-8       81  if r[2]!=r[3] goto 8
6       Rowid          3     4     0                    0   r[4]=sqlite_stat1.rowid
7       Delete         3     0     0     sqlite_stat1   2
8     Next           3     4     0                    1
9     OpenWrite      0     4     0     3              0   root=4 iDb=0; sqlite_stat1
10    OpenRead       4     7     0     k(3,,,)        0   root=7 iDb=0; stat_test_norowid
11    String8        0     11    0     stat_test_norowid 0   r[11]='stat_test_norowid'
12    String8        0     12    0     stat_test_norowid 0   r[12]='stat_test_norowid'; Analysis for stat_test_norowid.stat_test_norowid
13    OpenRead       5     7     0     k(3,,,)        0   root=7 iDb=0; sqlite_autoindex_stat_test_norowid_1
14    Integer        0     10    0                    0   r[10]=0
15    Integer        3     7     0                    0   r[7]=3
16    Integer        3     8     0                    0   r[8]=3
17    Count          5     9     0                    0   r[9]=count()
18    Function       0     7     6     stat_init(4)   0   r[6]=func(r[7..10])
19    Rewind         5     38    0                    0
20    Integer        0     7     0                    0   r[7]=0
21    Goto           0     30    0                    0
22      Integer        0     7     0                    0   r[7]=0
23      Column         5     0     9                    0   r[9]=sqlite_autoindex_stat_test_norowid_1.a
24      Ne             9     30    14    BINARY-8       128 if r[14]!=r[9] goto 30
25      Integer        1     7     0                    0   r[7]=1
26      Column         5     1     9                    0   r[9]=sqlite_autoindex_stat_test_norowid_1.b
27      Ne             9     31    15    BINARY-8       128 if r[15]!=r[9] goto 31
28      Integer        2     7     0                    0   r[7]=2
29      Goto           0     32    0                    0
30      Column         5     0     14                   0   r[14]=sqlite_autoindex_stat_test_norowid_1.a
31      Column         5     1     15                   0   r[15]=sqlite_autoindex_stat_test_norowid_1.b
32      Function       1     6     9     stat_push(2)   0   r[9]=func(r[6..7])
33    Next           5     22    0                    0
34    Function       0     6     13    stat_get(1)    0   r[13]=func(r[6])
35    MakeRecord     11    3     9     BBB            0   r[9]=mkrec(r[11..13])
36    NewRowid       0     5     0                    0   r[5]=rowid
37    Insert         0     9     5                    8   intkey=r[5] data=r[9]
38    LoadAnalysis   0     0     0                    0
39    Expire         0     0     0                    0
40    Halt           0     0     0                    0
41    Transaction    0     1     9     0              1   usesStmtJournal=1
42    String8        0     3     0     stat_test_norowid 0   r[3]='stat_test_norowid'
43    Goto           0     1     0                    0

Which is the same, just using the table name as the index name.


ANALYZE without sqlite_stat1 created:

sqlite> DROP TABLE sqlite_stat1;
sqlite> EXPLAIN ANALYZE;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     47    0                    0   Start at 47
1     ReadCookie     0     3     2                    0
2     If             3     5     0                    0
3     SetCookie      0     2     4                    0
4     SetCookie      0     5     1                    0
5     CreateBtree    0     2     1                    0   r[2]=root iDb=0 flags=1
6     OpenWrite      0     1     0     5              0   root=1 iDb=0
7     NewRowid       0     1     0                    0   r[1]=rowid
8     Blob           6     3     0                   0   r[3]= (len=6)
9     Insert         0     3     1                    8   intkey=r[1] data=r[3]
10    Close          0     0     0                    0
11    Close          0     0     0                    0
12    Null           0     4     5                    0   r[4..5]=NULL
13    Noop           4     0     4                    0
14    OpenWrite      3     1     0     5              0   root=1 iDb=0; sqlite_master
15    SeekRowid      3     17    1                    0   intkey=r[1]
16    Rowid          3     5     0                    0   r[5]= rowid of 3
17    IsNull         5     25    0                    0   if r[5]==NULL goto 25
18    String8        0     6     0     table          0   r[6]='table'
19    String8        0     7     0     sqlite_stat1   0   r[7]='sqlite_stat1'
20    String8        0     8     0     sqlite_stat1   0   r[8]='sqlite_stat1'
21    Copy           2     9     0                    0   r[9]=r[2]
22    String8        0     10    0     CREATE TABLE sqlite_stat1(tbl,idx,stat) 0   r[10]='CREATE TABLE sqlite_stat1(tbl,idx,stat)'
23    MakeRecord     6     5     4     BBBDB          0   r[4]=mkrec(r[6..10])
24    Insert         3     4     5                    0   intkey=r[5] data=r[4]
25    SetCookie      0     1     5                    0
26    ParseSchema    0     0     0     tbl_name='sqlite_stat1' AND type!='trigger' 0
27    OpenWrite      0     2     0     3              16  root=2 iDb=0; sqlite_stat1
28    OpenRead       5     3     0     4              0   root=3 iDb=0; casetest
29    String8        0     18    0     casetest       0   r[18]='casetest'; casetest
30    Count          5     20    0                    0   r[20]=count()
31    IfNot          20    36    0                    0
32    Null           0     19    0                    0   r[19]=NULL
33    MakeRecord     18    3     16    BBB            0   r[16]=mkrec(r[18..20])
34    NewRowid       0     12    0                    0   r[12]=rowid
35    Insert         0     16    12                   8   intkey=r[12] data=r[16]
36    OpenRead       5     2     0     3              0   root=2 iDb=0; iiftest
37    String8        0     18    0     iiftest        0   r[18]='iiftest'; iiftest
38    Count          5     20    0                    0   r[20]=count()
39    IfNot          20    44    0                    0
40    Null           0     19    0                    0   r[19]=NULL
41    MakeRecord     18    3     16    BBB            0   r[16]=mkrec(r[18..20])
42    NewRowid       0     12    0                    0   r[12]=rowid
43    Insert         0     16    12                   8   intkey=r[12] data=r[16]
44    LoadAnalysis   0     0     0                    0
45    Expire         0     0     0                    0
46    Halt           0     0     0                    0
47    Transaction    0     1     4     0              1   usesStmtJournal=1
48    Goto           0     1     0                    0

Note the lack of clearing the table, because it doesn’t exist. iiftest and casetest are the tables in the database at the time of running this command, leftover from previous work.

Functions

As can be seen during codegen, there’s three functions invoked: stat_init(), stat_get(), and stat_push(). These functions are defined in sqlite source as:

/*
** Three SQL functions - stat_init(), stat_push(), and stat_get() -
** share an instance of the following structure to hold their state
** information.
*/
typedef struct StatAccum StatAccum;
typedef struct StatSample StatSample;
struct StatSample {
  tRowcnt *anDLt;                 /* sqlite_stat4.nDLt */
};
struct StatAccum {
  sqlite3 *db;              /* Database connection, for malloc() */
  tRowcnt nEst;             /* Estimated number of rows */
  tRowcnt nRow;             /* Number of rows visited so far */
  int nLimit;               /* Analysis row-scan limit */
  int nCol;                 /* Number of columns in index + pk/rowid */
  int nKeyCol;              /* Number of index columns w/o the pk/rowid */
  u8 nSkipAhead;            /* Number of times of skip-ahead */
  StatSample current;       /* Current row as a StatSample */
};
stat_init()
/*
** Implementation of the stat_init(N,K,C,L) SQL function. The four parameters
** are:
**     N:    The number of columns in the index including the rowid/pk (note 1)
**     K:    The number of columns in the index excluding the rowid/pk.
**     C:    Estimated number of rows in the index
**     L:    A limit on the number of rows to scan, or 0 for no-limit
**
** Note 1:  In the special case of the covering index that implements a
** WITHOUT ROWID table, N is the number of PRIMARY KEY columns, not the
** total number of columns in the table.
**
** For indexes on ordinary rowid tables, N==K+1.  But for indexes on
** WITHOUT ROWID tables, N=K+P where P is the number of columns in the
** PRIMARY KEY of the table.  The covering index that implements the
** original WITHOUT ROWID table as N==K as a special case.
**
** This routine allocates the StatAccum object in heap memory. The return
** value is a pointer to the StatAccum object.  The datatype of the
** return value is BLOB, but it is really just a pointer to the StatAccum
** object.
*/
stat_push()
/*
** Implementation of the stat_push SQL function:  stat_push(P,C,R)
** Arguments:
**
**    P     Pointer to the StatAccum object created by stat_init()
**    C     Index of left-most column to differ from previous row
**    R     Rowid for the current row.  Might be a key record for
**          WITHOUT ROWID tables.
**
** The purpose of this routine is to collect statistical data and/or
** samples from the index being analyzed into the StatAccum object.
** The stat_get() SQL function will be used afterwards to
** retrieve the information gathered.
**
** This SQL function usually returns NULL, but might return an integer
** if it wants the byte-code to do special processing.
**
** The R parameter is only used for STAT4
*/
stat_get()
#define STAT_GET_STAT1 0          /* "stat" column of stat1 table */
#define STAT_GET_ROWID 1          /* "rowid" column of stat[34] entry */
#define STAT_GET_NEQ   2          /* "neq" column of stat[34] entry */
#define STAT_GET_NLT   3          /* "nlt" column of stat[34] entry */
#define STAT_GET_NDLT  4          /* "ndlt" column of stat[34] entry */

/*
** Implementation of the stat_get(P,J) SQL function.  This routine is
** used to query statistical information that has been gathered into
** the StatAccum object by prior calls to stat_push().  The P parameter
** has type BLOB but it is really just a pointer to the StatAccum object.
** The content to returned is determined by the parameter J
** which is one of the STAT_GET_xxxx values defined above.
**
** The stat_get(P,J) function is not available to generic SQL.  It is
** inserted as part of a manually constructed bytecode program.  (See
** the callStatGet() routine below.)  It is guaranteed that the P
** parameter will always be a pointer to a StatAccum object, never a
** NULL.
**
** If STAT4 is not enabled, then J is always
** STAT_GET_STAT1 and is hence omitted and this routine becomes
** a one-parameter function, stat_get(P), that always returns the
** stat1 table entry information.
*/

Limbo Implementation Plan

Backfill needed VDBE support for pre-created sqlite_stat1:

Then, implement the ANALYZE support:

And then fill in the rest of VDBE support needed for creating sqlite_stat1:

Then, adjust the previous work to permit creating the table as part of codegen:

Then, shore up testing:

Which leaves the work of supporting indexes (and covering indexes) until index support is added to limbo.