类关系
正如“类关系”页面中所解释的那样,SmartEDB 提供了多种实现表之间关系(连接)的方法,每种方法在内存开销和性能方面都有其独特的优势和成本。以下各节将介绍 Java 开发人员可用的不同技术。
索引连接
在 SQL 中,两个表(Department 和 Employee)之间简单的“一对多”关系可以按如下方式实现:
select e.name, d.name from Employee e
inner join Department d d.dept_no = d.dept_no;
要实现像Department > Employee示例那样的一对多关系,我们可以这样定义两个数据库类:
@Persistent()
class Department
{
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique tree
// index by "code"
String code;
String name;
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique Tree
// index by "dept_no"
int dept_no;
}
@Persistent()
@Index(name="byDept_EmployeeName", keys={@Key("dept_no"), @Key("name")}, unique=true)
class Employee
{
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique tree
// index by "name"
String name;
int dept_no;
}
在此,唯一哈希索引 dept_no 在 Department 类中充当“主键”,而唯一 B 树索引 byDept_EmployeeName 在 Employee 类中充当“外键”。请注意,byDept_EmployeeName 是由 Employee 类中的两个字段值组成的复合索引:dept_no 和 name。外键不一定是复合索引,仅在 dept_no 字段上创建一个简单的 B 树索引就足以形成连接,但使用复合索引还有一个额外的优势,即能够按 name 字段的字母顺序对具有相同 dept_no 值的所有实例进行排序。
为了展示连接是如何实现的,考虑以下代码片段(取自 SDK 示例“java/joins/Index_Join”),这些代码片段管理 Department 和 Employee 对象。首先,我们使用一些示例数据填充 Department 类:
// Create and insert Department objects
System.out.println("\nCreate Departments:\n");
con.startTransaction(Database.TransactionType.ReadWrite);
for (short i = 0; i < N_DEPARTMENTS; i++)
{
Department dept = new Department();
dept.name = DD[i].name;
dept.code = DD[i].code;
dept.dept_no = DD[i].dept_no;
con.insert(dept);
System.out.println("Code " + dept.code + ", Name " + dept.name +
", Dept_no = " + dept.dept_no);
}
con.commitTransaction();
然后我们插入Employee对象,将它们与适当的Department对象关联起来:
con.startTransaction(Database.TransactionType.ReadWrite);
// Find Department by code; extract dept_no; create Employee and assign name,
// dept_no
Cursor<Department> cursor = new Cursor<Department>(con, Department.class, "code");
for (short i = 0; i < N_EMPLOYEES; i++)
{
Department dept = cursor.find(ED[i].deptCode);
Employee emp = new Employee();
emp.dept_no = dept.dept_no;
emp.name = ED[i].name;
con.insert(emp);
System.out.println("\tEmployee: " + emp.name + "Dept_no: " + emp.dept_no);
}
con.commitTransaction();
请注意,我们在索引“Department.code”上实例化一个游标,并调用游标的 find() 方法定位到具有指定 deptCode 的 Department 对象;然后提取该 Department 对象的 dept_no 并存储到新的 Employee 对象中。这在后续查询中实现关联(关系)时,在 Department 对象和 Employee 对象之间创建了关联(关系)。
现在,为了在(即“连接”)这两个类之间建立关系,我们可以使用如下代码来显示给定员工所在部门的所有同事:
// Search for all Employee objects from a specified Employee's Department
String search_name = "William";
con.startTransaction(Database.TransactionType.ReadOnly);
// 1. Find the Employee object by name and extract dept_no
Cursor<Employee> cursor1 = new Cursor<Employee>(con, Employee.class, "name");
Employee emp1;
emp1 = cursor1.find(search_name);
// 2. Find the Department object by its dept_no and display the Department name
Cursor<Department> cursor2 = new Cursor<Department>(con, Department.class, "dept_no");
Department d = cursor2.find(emp1.dept_no);
System.out.println("\n\nFind " + search_name + "'s co-workers in " + d.name + " :\n");
// 3. Position the cursor in the byDept_EmployeeName compound index to the first
// object with this dept_no
con.startTransaction(Database.TransactionType.ReadWrite);
Cursor<Employee> cursor3 = new Cursor<Employee>(con, Employee.class,"byDept_EmployeeName");
{
if (cursor3.search(Cursor.Operation.GreaterOrEquals, emp1.dept_no, ""))
{
for (Employee e : cursor3)
{
if(e.dept_no != emp1.dept_no) // Exit loop when Dept_no is no longer equal
{
break;
}
else if ( !(e.name.equals(search_name)) ) // exclude search_name from results
{
System.out.println("\t" + e.name);
}
}
cursor3.moveNext();
}
}
con.commitTransaction();
请注意,我们使用索引 Employee.name 上的游标来查找具有指定名称的 Employee 对象;然后使用索引 Department.dept_no 上的游标通过找到的 Employee 的 dept_no 来查找 Department。接着,我们使用索引 Employee.byDept_EmployeeName 上的第三个游标定位到具有此 dept_no 的第一个对象,并在游标中滚动,直到 dept_no 不同为止。对于每个索引节点,我们从游标中获取 Employee 对象并检查其 name 字段。如果这不是我们正在为其查找同事的原始员工的名称,我们就显示该名称并调用 Cursor.moveNext() 方法以转到索引树中的下一个节点。
优点和缺点
这是一种许多开发人员熟悉的典型“关系型”风格的连接。虽然它直观且易于实现,但管理多个索引会带来显著的开销。当提交插入事务时,为每个“部门”对象创建一个哈希索引和两个 B 树索引节点,为每个“员工”对象创建两个 B 树索引节点。这会导致内存消耗,此外还有数据库对象本身占用的空间,而且 B 树索引结构需要“平衡”,这可能会消耗大量的处理器周期。因此,插入对象的整体性能不是最优的。
由于哈希索引 Idept_no 的存在,通过部门编号 dept_no 查找部门对象的操作效率很高。并且通过复合索引 Employee.byDept_EmployeeName 在员工对象上创建的游标有助于在索引树中“滚动”,从而按姓名字母顺序列出员工。
Autoid参考
上述索引连接示例的另一种技术是通过直接从“员工”对象引用其所属的“部门”来实现“外键”关系。一个自动 ID 字段可以实现此目的。例如,我们可以将这两个数据库类定义如下:
@Persistent(autoid = true)
class Department
{
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique tree
// index by "code"
String code;
String name;
}
@Persistent()
@Index(name="byDept_EmployeeName", keys={@Key("dept"), @Key("name")}, unique=true)
class Employee
{
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique tree
// index by "name"
String name;
long dept;
}
请注意,Department 类不再具有 dept_no 字段,因此也没有针对该字段的索引。取而代之的是,它具有注解 (autoid=true)。这实际上会导致为 Department 对象的 autoid 值维护一个“隐藏”的哈希索引,随着新对象的创建,这些值会自动递增。并且请注意,字段 dept 的类型为 long - 它将存储关联的 Department 对象的 autoid。
为了展示这种连接是如何实现的,请看以下代码片段(取自 SDK 示例“java/joins/Autoid_Ref”)。我们像上面的 Index_Join 示例那样填充 Department 类,但请注意,为了显示每个 Department 对象的自增 ID,我们从 Connection.insert() 方法中获取它:
// Create and insert Department objects
System.out.println("\nCreate Departments:\n");
con.startTransaction(Database.TransactionType.ReadWrite);
for (short i = 0; i < N_DEPARTMENTS; i++)
{
Department dept = new Department();
dept.name = DD[i].name;
dept.code = DD[i].code;
long autoid = con.insert(dept);
System.out.println("\t" + i + ") " + dept.code + ", " + dept.name + ", Autoid = " + autoid);
}
con.commitTransaction();
当插入Employee对象时,我们使用Cursor.getAutoId()方法从对应的Department对象中提取autoid,并将其分配给引用字段Employee.dept:
// Create and insert Employee objects
System.out.println("\nCreate employees and join each to a department:\n");
con.startTransaction(Database.TransactionType.ReadWrite);
// Find Department by code; extract dept_no; create Employee and assign name, dept_no
Cursor<Department> cursor = new Cursor<Department>(con, Department.class, "code");
for (short i = 0; i < N_EMPLOYEES; i++)
{
Department dept = cursor.find(ED[i].deptCode);
Employee emp = new Employee();
emp.name = ED[i].name;
// Assign the Department autoid for this Department object and insert new
// Employee object
emp.dept = cursor.getAutoId(); // Note that the method GetAutoId is of the
// Cursor class
con.insert(emp);
System.out.println("\t" + i + ") " + emp.name + ", Department.Autoid= " + emp.dept);
}
con.commitTransaction();
要显示给定员工所在部门的所有同事,其代码几乎与上面的 Index_Join 示例相同,唯一的区别在于我们在 Department 类上实例化游标时未指定索引字段。然后,我们通过调用 find 并传入 Employee.dept 的值来查找由其自动生成的主键标识的 Department 对象。这相当于在 Department.dept_no 索引上调用 Cursor.find(),只不过 dept_no 索引使用的是哈希索引,而自动生成的主键是自动产生的,而 dept_no 字段则必须指定一个唯一的值。同样的通过在复合索引 byDept_EmployeeName 上滚动游标的技术用于按字母顺序显示员工姓名:
// Search for all Employee objects from a specified Employee's Department
String search_name = "William";
con.startTransaction(Database.TransactionType.ReadOnly);
// 1. Find the Employee object by name and extract dept_no
Cursor<Employee> cursor1 = new Cursor<Employee>(con, Employee.class, "name");
Employee emp1;
emp1 = cursor1.find(search_name);
// 2. Find the Department object by its autoid and display the Department name
Cursor<Department> cursor2 = new Cursor<Department>(con, Department.class);
Department d = cursor2.find(emp1.dept);
System.out.println("\n\nFind " + search_name + "'s co-workers in " + d.name + " :\n");
// 3. Position the cursor in the byDept_EmployeeName compound index to the first
// object with this Department Autoid
con.startTransaction(Database.TransactionType.ReadWrite);
Cursor<Employee> cursor3 = new Cursor<Employee>(con, Employee.class, "byDept_EmployeeName");
{
if (cursor3.search(Cursor.Operation.GreaterOrEquals, emp1.dept, ""))
{
for (Employee e : cursor3)
{
if(e.dept != emp1.dept) // Exit loop when Dept_no is no longer equal
{
break;
}
else if ( !(e.name.equals(search_name)) ) // exclude serch_name from results
{
System.out.println("\t" + e.name);
}
}
cursor3.moveNext();
}
}
con.commitTransaction();
优点和缺点
这种实现方式通过省去字段 dept_no 而节省了一些内存,由于为自增 ID 生成了哈希索引,所以对 Department 对象的查找同样高效。但若放宽应用程序的要求,或许还能获得一些优化。
请注意,无论是这种关联方式还是索引连接的关联方式,其性能和内存消耗的主要瓶颈在于 B 树索引:Employee.byDept_EmployeeName、Employee.name 和 Department.name。例如,如果无需通过名称查找 Department 对象,则可以删除 Department.Name 索引。再者,如果无需按字母顺序显示给定部门的员工,则可以删除复合索引 Employee.byDept_EmployeeName。以下示例说明了这种方法。
autoids向量
与其在“员工”对象中存储“部门”对象的自增 ID,我们不如在“部门”对象中存储“员工”自增 ID 的数组或向量。例如(请参阅 SDK 示例“java/joins/Autoid_Vector”),我们可能会这样定义数据库类:
@Persistent(autoid = true)
class Department
{
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique tree
// index by "code"
String code;
String name;
long[] employees;
}
@Persistent(autoid = true)
//@Index(name="byDept_EmployeeName", keys={@Key("dept_no"), @Key("name")}, unique=true)
class Employee
{
@Indexable(type=Database.IndexType.BTree, unique=true) // Declare unique tree
// index by "name"
String name;
long dept;
}
请注意,这两个类都是通过 @Persistent(autoid = true) 来定义的。员工的自动 ID 将存储在部门类的员工引用向量中。我们将像上面的 Autoid_Ref 示例那样,在员工引用字段 dept 中使用部门的自动 ID,以便从员工对象高效地访问部门对象。
随着我们需要为 Department 对象中的动态自增 ID 值向量分配空间,并在插入 Employee 对象时为其分配这些自增 ID,填充数据库的代码变得稍微复杂了一些。因此,对于每个 Employee 对象,我们执行以下代码来分配空间,并将 Employee 自增 ID 插入到 Department 的 employees 向量中:
// Create and insert Department objects
System.out.println("\nCreate Departments:\n");
con.startTransaction(Database.TransactionType.ReadWrite);
for (short i = 0; i < N_DEPARTMENTS; i++)
{
Department dept = new Department();
dept.name = DD[i].name;
dept.code = DD[i].code;
// Allocate space for Employee autoids and initialize to 0
dept.employees = new long[VECTOR_SIZE];
for (short j = 0; j < VECTOR_SIZE; j++) dept.employees[j] = 0;
long autoid = con.insert(dept);
System.out.println("\t" + i + ") " + dept.code + ", " + dept.name +", Autoid=" + autoid);
}
con.commitTransaction();
// Create and insert Employee objects
System.out.println("\nCreate employees and join each to a department:\n");
for (short i = 0; i < N_EMPLOYEES; i++)
{
// Find the Department object for this Employee by the BTree index on
// Department.code
con.startTransaction(Database.TransactionType.ReadWrite);
Cursor<Department> cursor = new Cursor<Department>(con, Department.class,"code");
Department dept = cursor.find(ED[i].deptCode);
if (null != dept)
{
// Find the first vacant Employee vector element
for (short j = 0; j < VECTOR_SIZE; j++)
{
if (0 == dept.employees[j])
{
// Create Employee object and store its autoid in vector
// Department.employees
Employee emp = new Employee();
emp.name = ED[i].name;
// Assign the Department autoid for this Department object
// and insert new Employee object
emp.dept = cursor.getAutoId();
dept.employees[j] = con.insert(emp);
cursor.update(); // Assure that the inserted autoid is made permanent
System.out.println("\t" + i + ") " + emp.name +
", Department.Autoid=" + emp.dept);
break;
}
}
}
else
{
System.out.println("\tDepartment.code (" + ED[i].deptCode +
") not found!");
}
con.commitTransaction();
cursor.close();
}
现在,要显示给定员工所在部门的所有同事的代码,只需像上面的 Autoid_Ref 示例那样通过在不指定索引字段的情况下实例化 Department 类的 Cursor 来查找 Department 对象。然后我们调用 find 并传入 Employee.dept 的值,通过其自增 ID 查找 Department 对象。然后我们只需从向量中列出此部门的员工,如下所示:
// Search for all Employee objects from a specified Employee's Department
String search_name = "William";
con.startTransaction(Database.TransactionType.ReadOnly);
// 1. Find the Employee object by name and extract its autoid
Cursor<Employee> cursor1 = new Cursor<Employee>(con, Employee.class, "name");
Employee emp1 = cursor1.find(search_name);
if ( null != emp1 )
{
// 2. Find the Department object by its autoid and display the name
Cursor<Department> cursor2 = new Cursor<Department>(con, Department.class);
Department dept1 = cursor2.find(emp1.dept);
System.out.println("\n\n" + search_name + "'s co-workers in " + dept1.name +
" are:\n");
// 3. Scroll through the vector of Employee autoids, find the Employee object
// and display its name
Cursor<Employee> cursor3 = new Cursor<Employee>(con, Employee.class);
for (short j = 0; j < VECTOR_SIZE && 0 != dept1.employees[j]; j++)
{
// Skip if this is the autoid of the "search_name" object
if (cursor1.getAutoId() != dept1.employees[j])
{
Employee emp = cursor3.find(dept1.employees[j]);
System.out.println("\t" + emp.name);
}
}
}
优点和缺点
此实现方式显著减少了内存使用,并通过消除两个 B 树索引大幅减少了后台树结构维护工作,只是以不按字母顺序排列员工列表为代价。不过,要在应用程序中引入排序算法来解决这个问题其实很简单;但需要注意的是,如果每个部门的员工数量非常大,这种方法可能会变得不可行。