Setting up Solr in production was easier than setting up ZooKeeper, likely because I had already played with Solr in development mode.
Make sure to set up ZooKeeper before Solr.
The key here was to follow the instructions.
To install Solr as a service, we will need Java, lsof and chkconfig:
yum install java-11-openjdk-devel
yum install chkconfig
yum install lsof
We can then download the Solr release:
wget "https://dlcdn.apache.org/solr/solr/9.9.0/solr-9.9.0.tgz"
Next, we extract just the install script from the tarball:
tar xzf solr-9.9.0.tgz solr-9.9.0/bin/install_solr_service.sh --strip-components=2
Once we have the install script, we can run it, passing the tarball to install from:
bash ./install_solr_service.sh solr-9.9.0.tgz
This should create the solr user and set up all the directories. Solr will save data to /var/solr/data and write logs to /var/solr/logs.
At this point Solr is also running. However, we still need to tell Solr about ZooKeeper and enable the extraction module.
We need to update /etc/default/solr.in.sh:
SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j2.xml"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"
SOLR_MODULES=extraction,ltr
ZK_HOST="localhost:2181,localhost:2182,localhost:2183"
The last two lines, SOLR_MODULES and ZK_HOST, are the ones I added at the bottom of the config.
Once the config is updated, we can restart Solr:
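If you would rather script that edit than do it by hand, here is a minimal sketch. The helper name set_if_missing is my own invention, not something Solr ships. It assumes an uncommented KEY= line means the setting is already there, and only appends when no such line is found.

```shell
# Hypothetical helper: append KEY=VALUE to a config file unless an
# uncommented KEY= line is already present.
# $1 = config file, $2 = key, $3 = value (written exactly as given)
set_if_missing() {
  if ! grep -q "^$2=" "$1" 2>/dev/null; then
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

# Example calls for the two settings above (run as root), left as
# comments so pasting this block changes nothing by itself:
# set_if_missing /etc/default/solr.in.sh SOLR_MODULES 'extraction,ltr'
# set_if_missing /etc/default/solr.in.sh ZK_HOST '"localhost:2181,localhost:2182,localhost:2183"'
```

Running it twice leaves a single line per key, so it is safe to rerun after a reinstall.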
service solr restart
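Solr may need a few seconds before it answers requests again after a restart. The following is a small sketch of a retry loop for waiting on it. The retry function is a hypothetical helper of mine, and the choice of tries and delay is arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds or we run out of tries.
# Usage: retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$tries" ]; then
      return 1          # give up: the command never succeeded
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, retry 10 2 curl -sf -o /dev/null http://localhost:8983/solr/admin/info/system would poll the system info endpoint every two seconds, up to ten times, and return non-zero if Solr never responds.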
Now we can create our first collection:
/opt/solr/bin/solr create -c posts --shards 2 -rf 2
This has created the posts collection. Now I want to populate the _text_ field that we will search against. This is the catch-all field, so I add a copy-field rule that copies every other field into it.
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' http://localhost:8983/solr/posts/schema
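To check that the rule was registered, the Schema API can list the copy-field rules for a collection. A minimal sketch follows. The two function names are hypothetical, and the base URL and collection name are simply the ones used in this post.

```shell
# Print the Schema API URL that lists copy-field rules.
# $1 = Solr base URL, $2 = collection name
copyfields_url() {
  printf '%s/%s/schema/copyfields\n' "$1" "$2"
}

# Fetch the rules for a collection (needs curl and a running Solr).
show_copyfields() {
  curl -s "$(copyfields_url "$1" "$2")"
}
```

Running show_copyfields http://localhost:8983/solr posts should return JSON that includes a rule with source * and dest _text_.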
Finally, I loaded in some data:
/opt/solr/bin/solr post -c posts /home/username/BLOG/json/*
I exported my blog posts as JSON so that I can easily ingest everything.
Now I can do a quick search to test everything:
curl "http://localhost:8983/solr/posts/select?q=sola~1&hl=true&hl.fl=title%20markdown"
This should give me back a list of results that contain the word sola or any word within an edit distance of 1 of it. It should also give me a highlighted context fragment from the title and markdown fields.
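To make the pieces of that URL easier to see, here is a small sketch that assembles it from its parts: the search term, the fuzzy edit distance after the ~, and the fields to highlight. The search_url function is a hypothetical helper. It only encodes the spaces between field names, so it assumes the term itself needs no URL encoding.

```shell
# Hypothetical helper: build a select URL with a fuzzy term and highlighting.
# $1 = term, $2 = max edit distance, remaining args = fields to highlight
search_url() {
  term="$1"
  distance="$2"
  shift 2
  # Join the remaining field names with spaces, then encode each space as %20.
  fields=$(printf '%s' "$*" | sed 's/ /%20/g')
  printf 'http://localhost:8983/solr/posts/select?q=%s~%s&hl=true&hl.fl=%s\n' \
    "$term" "$distance" "$fields"
}
```

With that in place, curl "$(search_url sola 1 title markdown)" sends the same kind of query, and changing the 1 to a 2 widens the match to words two edits away.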