This document explains usage of the Raco library. Most users of Myria will interact through Myria python, so who is this document for?
scripts/myrial provides Raco functionality on the command line.
Print the help message
Generate a logical plan for a MyriaL query in examples/
scripts/myrial -l examples/join.myl
See the physical plan, MyriaX is the default algebra to use
Raco requires a catalog for MyriaL queries. All of the example queries in
use the catalog defined in
raco/catalog.py for formatting information.
scripts/myrial automatically searches for a
catalog.py file in the same directory
as the provided query. You can also provide a custom path.
scripts/myrial --catalog=examples/catalog.py -l examples/join.myl
To connect to the real catalog of a Myria instance, see using Raco from python.
Get the JSON used to submit the query plan to MyriaX REST interface
scripts/myrial -j example/join.myl
There is also a python string representation of the query plan. This is valid python code that you can give back to Raco.
# output the plan for the query (join.raco is just a file containing python code) scripts/myrial -r example/join.myl >join.raco cat join.raco # run raco on the plan in join.raco instead of on a MyriaL query scripts/myrial --plan join.raco
You can write python scripts to construct query plans and compile them to back end systems like the Myria JSON format.
Below is the boilerplate template for going from MyriaL query to JSON query plan, using the catalog from a running Myria instance. Fill in your Myria hostname and the query to translate.
import raco.myrial.parser as parser import raco.myrial.interpreter as interpreter from raco.backends.myria.catalog import MyriaCatalog from raco.backends.myria.connection import MyriaConnection # connect to your Myria instance's catalog connection = MyriaConnection( hostname=<url of your myria instance>, port=8753, ssl=False) catalog = MyriaCatalog(connection) _parser = parser.Parser() statement_list = _parser.parse(""" <put myrial query here> """) processor = interpreter.StatementProcessor(catalog, True) processor.evaluate(statement_list) # here we print the logical, physical, and json versions of the plan for illustration purposes print processor.get_logical_plan() print processor.get_physical_plan() print processor.get_json()
Using Raco’s python API, it is possible to manipulate the query plan at either the logical or physical level.
Often users of MyriaX want to partition a table.
This is possible in MyriaL with Store:
T1 = scan(public:vulcan:edgesConnected); store(T1, public:vulcan:edgesConnectedSort, [$0, $1, $3]);
In the past, MyriaL did not have this Shuffle syntax. However we could still easily build the query plan we wanted. Here is an example of using MyriaX shuffle to partition a table in the MyMergerTree astronomy application. Don’t feel daunted by the length of this example; most of the code is boilerplate to get the plan from a query. The action is at “this is the actual plan manipulation”.
from raco.catalog import FromFileCatalog import raco.myrial.parser as parser import raco.myrial.interpreter as interpreter import raco.algebra as alg from raco.expression.expression import UnnamedAttributeRef # get the schema catalog = FromFileCatalog.load_from_file("vulcan.py") _parser = parser.Parser() # We can have Raco start us with a plan that is close to the one we want by giving it a MyriaL query. # Here we start with scan, store. We'll modify it to get scan, shuffle, store. statement_list = _parser.parse(""" T1 = scan(public:vulcan:edgesConnected); store(T1, public:vulcan:edgesConnectedSort); """) processor = interpreter.StatementProcessor(catalog, True) processor.evaluate(statement_list) # we will add the shuffle into the logical plan p = processor.get_logical_plan() # This is the actual plan manipulation; just insert a Shuffle. Since the # operators are all unary (single-input) this just looks like linked-list insertion. tail = p.args.input p.args.input = alg.Shuffle(tail, [UnnamedAttributeRef(0), UnnamedAttributeRef(1), UnnamedAttributeRef(3)]) # Shuffle columns # output json query plan for MyriaX p = processor.get_physical_plan() p = processor.get_json() print p
Suppose Raco chooses to perform a join by shuffling both inputs. However, we may know that the right input is much smaller and so we really want to do a broadcast join.
from raco.catalog import FromFileCatalog import raco.myrial.parser as parser import raco.myrial.interpreter as interpreter import raco.backends.myria as alg from raco.expression.expression import UnnamedAttributeRef # get the schema catalog = FromFileCatalog.load_from_file("vulcan.py") _parser = parser.Parser() # Get the default Raco plan for the join statement_list = _parser.parse(""" T1 = scan(public:vulcan:edgesConnected); s = select * from T1 a, T1 b where b.currentTime=0 and a.nextGroup=b.currentGroup; store(s, public:vulcan:joined); """) processor = interpreter.StatementProcessor(catalog, True) processor.evaluate(statement_list) # will modify the physical plan, where a Join implementation is already chosen p = processor.get_physical_plan() # the plan is a symmetric hash join, shuffling both sides print "before mod: ", p # locate the MyriaSymmetricHashJoin operator join = p.input.input.input assert isinstance(join, alg.MyriaSymmetricHashJoin) # modify the right side to replace shuffle with broadcast rightChild = join.right.input.input join.right = alg.MyriaBroadcastConsumer(alg.MyriaBroadcastProducer(rightChild)) # modify the left side to remove the shuffle leftChild = join.left.input.input join.left = leftChild print "after mod: ", p # output json query plan for MyriaX p = processor.get_json() print p
Here we provide information on extending Raco.
Raco currently emits code for MyriaX (+ SQL query push down into Postgres), Grappa/C++, and SQL databases. It has limited support for SciDB and SPARQL.
Every backend goes in its own directory within
raco/backends. Here is an example:
raco/backends/myria/catalog.py: defines the Catalog interface for getting information about relations raco/backends/myria/myria.py: defines the physical algebra for Myria and a list of optimization rules raco/backends/myria/tests: tests specific to myria backend raco/backends/myria/__init__.py: provides convenient import of public members using raco.backends.myria
opt_rules, which is a list of optimization rules to apply in order.
MyriaOperator is the base class for operators in the physical algebra for Myria.
Compilation from a tree of
Operators to the target language can be implemented in any way you want.
For examples, see
compileme method and
Put your new operator for
If you need to also add an operator to the logical operator, put it in
The (non-experimental) optimization of query plans is done with a heuristic rule-based planner.
Raco provides many useful rules in
Rule is the super class of all rules.
A physical algebra provides an implementation of
opt_rules, which just returns an ordered list
of rules to apply. The optimizer applies each rule breadth first to the entire query plan tree, in the order specified by the list.
This algorithm is very simplistic, but it works out okay right now (see
raco/rules.pyor one of the languages in
raco/language/*.py. If it is a generic rule and you find it in one of the languages, please submit a pull request.
raco/rules.py. You must implement two methods:
firechecks if the rule is applicable to the given tree. If not then it should return the tree itself. If the rule does apply then
fireshould return a transformed tree. It is okay to mutate the input tree and return it: most of Raco’s rules are currently doing this instead of keeping the input immutable and copying the whole tree.
raco/backends/myria/myria.py) and instantiate your rule somewhere in the list returned by