7. 基于向量的分析函数
SmartEDB 的一个强大特性是序列数据类型。除了能够以这种“列式”格式存储数据所带来的内存和处理效率之外,序列尤其适用于基于向量的统计函数。通常,这些 SQL 统计函数用于对“时间序列”(即一组相关的序列字段)执行分析操作。以下示例将通过对一个简单的数据库类进行查询来执行一些统计函数:
class Quote {
char<MAX_SYMBOL_LEN> symbol;
sequence<date asc> day;
sequence<float> low;
sequence<float> high;
sequence<float> open;
sequence<float> close;
sequence<uint4> volume;
unique tree<symbol> by_sym;
};
这些示例的数据取自 2013 年 IBM 的历史值。此示例数据以及执行选择语句的一组脚本和 shell 命令文件包含在目录 SmartEDB/samples/native/sql/xsql/scripts/financial 中。要创建 QuoteIBM 表并加载示例数据,请使用以下命令在该目录中打开 xSQL:
xsql -i -size 100m -f IBM-q1-2013.sql
序列间隔
第一个示例将提取属于指定间隔(2013 年 1 月)的数据。使用有序序列“日”来选择该间隔,然后通过以下查询将此间隔投影到其他(无序)序列“开盘”和“收盘”:
SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013,
open@Jan_2013 as "open", close@Jan_2013 as "close"
FROM Quote WHERE symbol='IBM';
symbol Jan_2013 open close
--------------------------------------------------
IBM 20130102 194.089996 196.350006
IBM 20130103 195.669998 195.270004
IBM 20130104 194.190002 193.990005
IBM 20130107 193.399994 193.139999
IBM 20130108 192.919998 192.869995
IBM 20130109 193.479996 192.320007
IBM 20130110 192.649994 192.880005
IBM 20130111 194.149994 194.449997
IBM 20130114 192.820007 192.619995
IBM 20130115 191.309998 192.500000
IBM 20130116 192.000000 192.589996
IBM 20130117 193.850006 193.649994
IBM 20130118 194.029999 194.470001
IBM 20130122 194.360001 196.080002
IBM 20130123 203.500000 204.720001
IBM 20130124 203.910004 204.419998
IBM 20130125 204.449997 204.970001
IBM 20130128 204.850006 204.929993
IBM 20130129 204.339996 203.899994
IBM 20130130 203.690002 203.520004
IBM 20130131 203.320007 203.070007
Selected records: 21
请注意,此查询可以从脚本文件 x3.sql 中运行:
XSQL>script x3.sql
请注意使用关键字“flattened”将序列字段转换为表,以便结果集显示为每组相应序列元素的单独行,标量字段符号值在每行重复。(还要注意,为 seq_search() 指定的边界是“包含的”。)
序列的算术运算
我们可以对序列元素进行算术运算。在下面的查询中,通过将“high”和“low”这两个序列中的每个对应元素相加,并除以 2,来对它们求平均值,从而生成一个结果序列:
SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013,
high@Jan_2013 as "high", low@Jan_2013 as "low",
(high@Jan_2013 + low@Jan_2013) / 2 as "average"
FROM Quote WHERE symbol='IBM';
symbol Jan_2013 high low average
------------------------------------------------------------------
IBM 20130102 196.350006 193.800003 195.075012
IBM 20130103 196.289993 194.440002 195.364990
IBM 20130104 194.460007 192.779999 193.619995
IBM 20130107 193.779999 192.339996 193.059998
IBM 20130108 193.300003 191.610001 192.455002
IBM 20130109 193.490005 191.649994 192.570007
IBM 20130110 192.960007 191.279999 192.119995
IBM 20130111 195.000000 192.899994 193.949997
IBM 20130114 193.279999 191.750000 192.514999
IBM 20130115 192.729996 190.389999 191.559998
IBM 20130116 193.179993 191.350006 192.264999
IBM 20130117 194.460007 193.240005 193.850006
IBM 20130118 195.000000 193.800003 194.399994
IBM 20130122 196.080002 194.009995 195.044998
IBM 20130123 208.580002 203.360001 205.970001
IBM 20130124 205.059998 203.080002 204.070007
IBM 20130125 205.179993 204.130005 204.654999
IBM 20130128 206.220001 204.289993 205.255005
IBM 20130129 205.729996 203.639999 204.684998
IBM 20130130 204.880005 203.190002 204.035004
IBM 20130131 204.470001 202.960007 203.714996
Selected records: 21
并且这个查询可以从脚本文件 x4.sql 中运行。
筛选序列元素
以下查询将提取 2013 年 2 月收盘价高于开盘价的日子:
SELECT symbol, seq_search(day, 20130201, 20130228) as Feb_2013,
close@Feb_2013 as Feb_Close, open@Feb_2013 as Feb_Open,
seq_filter(Feb_Close > Feb_Open, Feb_2013) as "up_Feb",
seq_filter(Feb_Close > Feb_Open, Feb_Open) as "up_Open",
seq_filter(Feb_Close > Feb_Open, Feb_Close) as "up_Close"
FROM Quote WHERE symbol='IBM';
symbol
Feb_2013{}
Feb_Close{}
Feb_Open{}
up_Feb{}
up_Open{}
up_Close{}
------------------------------------------------------------------------------
IBM
{20130201, 20130204, 20130205, 20130206, 20130207, 20130208,
20130211, 20130212, 20130213, 20130214, 20130215, 20130219,
20130220, 20130221, 20130222, 20130225, 20130226, 20130227,
20130228}
{205.179993, 203.789993, 202.789993, 201.020004, 199.740005, 201.679993,
200.160004, 200.039993, 200.089996, 199.649994, 200.979996, 200.320007,
199.309998, 198.330002, 201.089996, 197.509995, 199.139999, 202.330002,
200.830002}
{204.649994, 204.190002, 204.309998, 200.389999, 200.619995, 199.970001,
200.979996, 200.009995, 200.649994, 199.729996, 199.979996, 200.600006,
200.619995, 198.630005, 199.229996, 201.669998, 198.630005, 198.889999,
202.179993}
{20130201, 20130206, 20130208, 20130212, 20130215, 20130222,
20130226, 20130227}
{204.649994, 200.389999, 199.970001, 200.009995, 199.979996, 199.229996,
198.630005, 198.889999}
{205.179993, 201.020004, 201.679993, 200.039993, 200.979996, 201.089996,
199.139999, 202.330002}
Selected records: 1
“up_Feb”序列包含了 2013 年 2 月收盘价高于开盘价的日期。为了说明这一点,我们在“up_Feb”序列中突出显示了 2013 年 2 月 8 日的日期值,它对应于“up_Open”和“up_Close”序列的第三个元素,并且满足筛选条件;即 201.68 > 199.97。请注意,为了生成经过筛选的开盘价和收盘价序列,反复调用函数“seq_filter()”是必要的,这使得输出更易于理解,因为“Feb_2013”、“Feb_Open”和“Feb_Close”序列(无法抑制)在一定程度上掩盖了关键信息。请查看“分析函数库”页面,以获取更多示例以及对这些强大分析函数的进一步解释。