- Bug
- Resolution: Done
- Medium
- El Alto Release
- El Alto - 2
During an el-alto deployment, the file org.osaaf.aaf.cassandra.props was once observed to be missing from the /opt/app/osaaf/local directory.
Multiple restarts did not solve the problem. It was only fixed by removing the directory /mnt/data/aaf/config/local/ and re-installing aaf. After that, everything worked.
On examining the startup scripts, I think I can see how this could happen. It would not happen often, as it is a race condition.
I think there are at least two places where this can happen.
It is made more likely because all the different aaf components wait on the same service to become active, so the configuration of all the "sub-services" is triggered at the same time. It was more likely in my deployment because I only had 4 nodes, so more aaf pods were on the same node.
I think this is what caused the problem, even though it seems very unlikely.
In the script
/opt/app/aaf_config/bin/agent.sh
#Should we clean up?
if [ ! -e "${LOCAL}/VERSION" ] || [ "${VERSION}" != "$(cat ${LOCAL}/VERSION)" ]; then
echo "Clean up directory ${LOCAL}"
rm -Rf ${LOCAL}/org.osaaf.aaf.*props ${LOCAL}/org.osaaf.aaf.p12
ls ${LOCAL}
fi
echo "${VERSION}" > $LOCAL/VERSION
This can race because echo "${VERSION}" > $LOCAL/VERSION truncates the file before writing it, so for a tiny amount of time another process sees VERSION as empty (or missing) and its if evaluates to true. I tested it with a continuous loop and it does fail.
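A loop along the following lines can be used to observe the window. This is only a sketch of such a test, not the exact loop I ran: the temporary directory, the version string "example-version" and the iteration count are all made up for illustration, and how often the condition fires (if at all) depends on timing.

```shell
#!/bin/sh
# Hypothetical reproduction. LOCAL, VERSION and the loop count are
# illustrative values, not the ones used by agent.sh.
LOCAL="$(mktemp -d)"
VERSION="example-version"
echo "${VERSION}" > "${LOCAL}/VERSION"

# Writer: rewrite VERSION unconditionally, as agent.sh does on every start,
# until the reader signals it to stop by creating a "stop" file.
(
    while [ ! -e "${LOCAL}/stop" ]; do
        echo "${VERSION}" > "${LOCAL}/VERSION"
    done
) &
writer=$!

# Reader: evaluate the same clean-up condition repeatedly and count how
# often it is true although the version never actually changed.
hits=0
i=0
while [ "${i}" -lt 500 ]; do
    if [ ! -e "${LOCAL}/VERSION" ] || [ "${VERSION}" != "$(cat "${LOCAL}/VERSION" 2>/dev/null)" ]; then
        hits=$((hits + 1))
    fi
    i=$((i + 1))
done

touch "${LOCAL}/stop"
wait "${writer}"
echo "clean-up condition was true ${hits} of ${i} times"
```

Any non-zero count means a concurrently starting component would have deleted the props files.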
I think that if the line
echo "${VERSION}" > $LOCAL/VERSION
is moved inside the if, the race is significantly less likely to happen, because VERSION is then only rewritten when it is missing or out of date.
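The suggested change might look like the sketch below. The LOCAL and VERSION values are placeholders so the snippet runs on its own; in agent.sh they are already set. Writing through a temporary file and renaming it is an extra assumption of mine, not something in the current script: it means a concurrent reader sees either the old or the new VERSION, never an empty one.

```shell
#!/bin/sh
# Sketch only. LOCAL and VERSION are placeholders for illustration;
# agent.sh already has its own values for both.
LOCAL="$(mktemp -d)"
VERSION="example-version"

# Should we clean up?
if [ ! -e "${LOCAL}/VERSION" ] || [ "${VERSION}" != "$(cat "${LOCAL}/VERSION")" ]; then
    echo "Clean up directory ${LOCAL}"
    rm -Rf "${LOCAL}"/org.osaaf.aaf.*props "${LOCAL}/org.osaaf.aaf.p12"
    ls "${LOCAL}"
    # Rewrite VERSION only when it was missing or stale.
    # Assumption: write to a temporary name, then rename, so the file
    # is never visible in a truncated state.
    tmp="${LOCAL}/VERSION.tmp.$$"
    echo "${VERSION}" > "${tmp}"
    mv -f "${tmp}" "${LOCAL}/VERSION"
fi
```

This narrows the window but does not close it: two components that both find VERSION missing on a fresh volume will still both run the clean-up.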
also in the script
/opt/app/aaf_config/bin/agent.sh
there is the check
#Only initialize once, automatically...
if [ ! -e $LOCAL/org.osaaf.aaf.props ]; then
...
echo "cm_always_ignore_ips=${cm_always_ignore_ips:=false}" >> $LOCAL/org.osaaf.aaf.props;
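but the actions in the ... can take some time, so another component can pass the same check before org.osaaf.aaf.props has been created and run the initialization a second time.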
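One possible way to serialize this block is a simple lock around the check, sketched below. Nothing here is in agent.sh today: the lock directory name .init.lock and the one-second retry are made up, LOCAL is a placeholder, and the sketch assumes that every component sees the same ${LOCAL} directory and that mkdir is atomic on the underlying volume.

```shell
#!/bin/sh
# Sketch only. LOCAL is a placeholder, and LOCKDIR is a hypothetical name.
LOCAL="$(mktemp -d)"
LOCKDIR="${LOCAL}/.init.lock"

# mkdir either creates the directory or fails, so only one concurrent
# caller gets past this loop at a time; the others wait and retry.
until mkdir "${LOCKDIR}" 2>/dev/null; do
    sleep 1
done
# Release the lock even if the script exits early.
trap 'rmdir "${LOCKDIR}" 2>/dev/null' EXIT

# Only initialize once, automatically...
if [ ! -e "${LOCAL}/org.osaaf.aaf.props" ]; then
    # ... the slow initialization steps from agent.sh would run here ...
    echo "cm_always_ignore_ips=${cm_always_ignore_ips:=false}" >> "${LOCAL}/org.osaaf.aaf.props"
fi

# Normal release of the lock.
rmdir "${LOCKDIR}"
trap - EXIT
```

A component that is killed while holding the lock would leave .init.lock behind and block the others, so a real change would also need a timeout or stale-lock check.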
- relates to
- AAF-965 Document moving to El Alto based Properties
- Closed